
The re module

Python supports regular expressions too. This section covers re, the regular expression module.

>>> import re
>>> dir(re)
['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M',
 'MULTILINE', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X',
 '_MAXCACHE', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
 '__name__', '__package__', '__spec__', '__version__', '_alphanum_bytes', '_alphanum_str', '_cache',
 '_cache_repl', '_compile', '_compile_repl', '_expand', '_locale', '_pattern_type', '_pickle', '_subx',
 'compile', 'copyreg', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'match', 'purge', 'search',
 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'sys', 'template']

To use regular expressions, first import the module. The most important functions in re are findall, match, search, and compile. The examples below explain each one:

 

#findall
>>> help(re.findall)
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.
    
    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.
    
    Empty matches are included in the result.
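
The flags argument accepts the following values:
re.I: ignore case
re.L: make the special classes \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode, where ^ and $ also match at the start and end of each line
re.S: make '.' match any character, including a newline (by default '.' does not match a newline)
re.U: make the special classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database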
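
The flags can be hard to picture from a list alone. The following is a minimal sketch of re.I and re.M, also showing that flags combine with the | operator. The sample text is made up for illustration and does not come from the original post.

```python
import re

# Hypothetical sample text, three lines long.
text = "Error: disk full\nerror: retry\nwarning: ERROR ignored"

# re.I makes the match case-insensitive.
all_errors = re.findall(r"error", text, re.I)

# Without re.M, ^ matches only at the very start of the string.
first_line_only = re.findall(r"^error", text, re.I)

# With re.M, ^ also matches right after each newline; flags combine with |.
line_starts = re.findall(r"^error", text, re.I | re.M)

print(all_errors)        # ['Error', 'error', 'ERROR']
print(first_line_only)   # ['Error']
print(line_starts)       # ['Error', 'error']
```

The third line starts with "warning", so its ERROR is found by the plain case-insensitive search but not by the anchored one.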
>>> a = 'this is a test string'
>>> re.findall(r'test', a)        # find every occurrence of 'test'
['test']
>>> a = 'this is a {{test}} string'
>>> pattern = r'{{(.*?)}}'        # a slightly more complex pattern with one group
>>> re.findall(pattern, a)
['test']
>>> pattern = r'a.c'
>>> test_str = 'abc a1c a*c a|c abd aed a\nc'
>>> re.findall(pattern, test_str)
['abc', 'a1c', 'a*c', 'a|c']
>>> re.findall(pattern, test_str, re.S)
['abc', 'a1c', 'a*c', 'a|c', 'a\nc']
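
The help text says that findall returns groups rather than whole matches when the pattern contains capturing groups, and tuples when there is more than one group. Here is a small sketch of the three cases. The key=value text is an invented example.

```python
import re

# Hypothetical input: space-separated key=value pairs.
text = "width=80 height=24 depth=8"

# No group: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One group: each item is only that group's text.
keys = re.findall(r"(\w+)=\d+", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)   # ['width=80', 'height=24', 'depth=8']
print(keys)    # ['width', 'height', 'depth']
print(pairs)   # [('width', '80'), ('height', '24'), ('depth', '8')]
```

Because the pairs come back as 2-tuples, dict(pairs) turns them directly into a dictionary.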
#re.search: scans the string and returns as soon as it finds the first match
>>> help(re.search)
Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.
>>> test_str = 'this is a test ,is good,is bad'
>>> pattern = r'is'
>>> b = re.search(pattern, test_str)   # keep the match object for inspection
>>> b
<_sre.SRE_Match object; span=(2, 4), match='is'>
>>> dir(b)
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__',
 '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos',
 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
>>> b.group()
'is'
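
The match object returned by search carries more than the matched text. As a rough sketch, the snippet below reads the position of the first match through span, start, and end, checks for None before using the result, and uses re.finditer (listed in dir(re) above) to collect the position of every match rather than only the first.

```python
import re

test_str = "this is a test ,is good,is bad"

m = re.search(r"is", test_str)
if m is not None:                    # search returns None when nothing matches
    found = m.group()                # the matched text
    position = m.span()              # (start, end) indexes into test_str
    start, end = m.start(), m.end()

# A pattern that does not occur anywhere yields None, not an exception.
missing = re.search(r"xyz", test_str)

# finditer yields one match object per non-overlapping match.
all_spans = [hit.span() for hit in re.finditer(r"is", test_str)]

print(found, position)   # is (2, 4)
print(missing)           # None
print(all_spans)         # [(2, 4), (5, 7), (16, 18), (24, 26)]
```

The first match sits inside the word "this", which is why its span is (2, 4) and not (5, 7).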
#re.match: matches only at the start of the string
>>> help(re.match)
Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.
>>> test_str
'this is a test ,is good,is bad'
>>> pattern
'is'
>>> b = re.match(pattern, test_str)
>>> b
>>> print(b)
None           # the string does not start with 'is', so nothing matches
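
The difference between match and search is easiest to see side by side on the same string. The sketch below also includes re.fullmatch (listed in dir(re) above), which requires the pattern to cover the entire string. The variable names are only illustrative.

```python
import re

test_str = "this is a test ,is good,is bad"

at_start = re.match(r"is", test_str)       # None: the string starts with 'this'
anywhere = re.search(r"is", test_str)      # finds the 'is' inside 'this'
prefix = re.match(r"this", test_str)       # matches: the pattern fits at index 0
whole = re.fullmatch(r"this", test_str)    # None: pattern must span the whole string

print(at_start)           # None
print(anywhere.span())    # (2, 4)
print(prefix.group())     # this
print(whole)              # None
```

In short, match anchors at the start, fullmatch anchors at both ends, and search does not anchor at all.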

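re.compile converts a regular expression into a pattern object, which makes matching more efficient when the same pattern is used repeatedly.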

>>> help(re.compile)
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
>>> pattern = r'{{(.*?)}}'
>>> pat_obj = re.compile(pattern)        # compile into a pattern object
>>> pat_obj
re.compile('{{(.*?)}}')
>>> test_str = 'this is a {{test}} str, {{}}'
>>> pat_obj.findall(test_str)
['test', '']                   # all matches are returned, including the empty one from {{}}
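
A compiled pattern pays off when it is applied to many strings. The following sketch reuses the pat_obj pattern from the session above across several lines and fills the placeholders in with the pattern object's sub method. The lines list and the values dictionary are hypothetical data invented for this example.

```python
import re

# Same pattern as in the session above, compiled once.
pat_obj = re.compile(r"{{(.*?)}}")

# Hypothetical input lines and replacement values.
lines = ["hello {{name}}", "no tags here", "{{a}} and {{b}}"]
values = {"name": "world", "a": "1", "b": "2"}

# Pattern objects offer the same operations as the module-level functions.
found = [pat_obj.findall(line) for line in lines]

# sub accepts a function that receives each match object
# and returns the replacement text.
filled = [pat_obj.sub(lambda m: values[m.group(1)], line) for line in lines]

print(found)    # [['name'], [], ['a', 'b']]
print(filled)   # ['hello world', 'no tags here', '1 and 2']
```

A placeholder name missing from values would raise KeyError here. Using values.get(m.group(1), "") is one way to tolerate unknown names.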

 
