
Python Basics day-13 [Modules: re, subprocess (unfinished)]

re (continued):

  re is greedy by default.

  Greedy mode: whenever a match is possible, a quantifier matches the longest string it can.

import re
s = 'askldlaksdabccccccccasdabcccalsdacbcccacbcccabccc'

res = re.findall('abc+', s)
print(res)

res = re.findall('abc+?', s)    # Append ? to the quantifier to turn off greedy matching.
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['abcccccccc', 'abccc', 'abccc']
['abc', 'abc', 'abc']

Process finished with exit code 0
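
The difference is easier to see when the quantifier has more room to vary. The following is a minimal sketch comparing `.*` with `.*?`; the sample string `text` is made up for illustration and is not from the example above.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = '<b>one</b> and <b>two</b>'

# Greedy: .* extends to the last </b> that still lets the pattern match.
greedy = re.findall(r'<b>.*</b>', text)

# Non-greedy: .*? stops at the first </b> it reaches.
lazy = re.findall(r'<b>.*?</b>', text)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```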

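Commonly used functions in the re module:

re.split(): similar to the string split() method, but more powerful because the separator is a regular expression.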

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

res = re.split(r'\d', s)
print(res)
res = re.split(r'(\d+)', s)    # Wrap the pattern in () to keep the separators in the result
print(res)


Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['askldlaksdab', 'ccccc.cccas', 'dabc', 'cc.alsdacbcccac.cccab', 'ccc']
['askldlaksdab', '8', 'ccccc.cccas', '8', 'dabc', '8', 'cc.alsdacbcccac.cccab', '8', 'ccc']

Process finished with exit code 0
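
One thing str.split() cannot do is split on several different separators at once. A small sketch of that, plus the `maxsplit` argument; the string `names` is a made-up example.

```python
import re

# Hypothetical input mixing two separators and uneven spacing.
names = 'alice, bob;carol;  dave'

# Split on a comma or semicolon, swallowing any whitespace that follows.
parts = re.split(r'[,;]\s*', names)

# maxsplit=1 stops after the first separator and leaves the rest untouched.
head_and_rest = re.split(r'[,;]\s*', names, maxsplit=1)

print(parts)          # ['alice', 'bob', 'carol', 'dave']
print(head_and_rest)  # ['alice', 'bob;carol;  dave']
```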

re.sub(): a replace operation, similar to the string replace() method.

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

res = re.sub('abc+', '123', s)
print(res)


Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
askldlaksdab8ccccc.cccas8d1238cc.alsdacbcccac.cccab8ccc

Process finished with exit code 0
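
re.sub() also goes beyond replace() in a few ways: the replacement can refer to captured groups, it can be a function, and re.subn() reports how many substitutions were made. A brief sketch with made-up inputs:

```python
import re

# Backreferences: \1, \2, \3 refer to the captured groups.
iso = re.sub(r'(\d{4})/(\d{2})/(\d{2})', r'\1-\2-\3', '2017/06/27')

# A function as the replacement: it receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b20')

# subn returns (new_string, number_of_substitutions); count limits the replacements.
limited, n = re.subn(r'8', '#', 'ab8cc8', count=1)

print(iso)         # 2017-06-27
print(doubled)     # a2 b40
print(limited, n)  # ab#cc8 1
```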

re.compile(): compiles a pattern into a reusable pattern object.

import re
s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'

obj = re.compile(r'\d+')   # Compile the pattern once into a pattern object
res = obj.findall(s)       # Call methods on the object to apply it
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['8', '8', '8', '8']

Process finished with exit code 0
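
A compiled pattern object offers the same operations as the module-level functions (search, finditer, sub, split and so on), so one compiled pattern can be reused for different tasks. A short sketch on the same string:

```python
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
digits = re.compile(r'\d+')

# search returns the first match object (or None if nothing matches).
first = digits.search(s)

# finditer yields match objects, which carry positions as well as text.
positions = [m.start() for m in digits.finditer(s)]

# The same object can also substitute.
masked = digits.sub('#', s)

print(first.group())  # 8
print(positions)      # [12, 24, 29, 51]
print(masked)
```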

A small regex exercise with a crawler (scraping xiaohuar.com)

import requests, re, json
url = 'http://www.xiaohuar.com/2014.html'    # Top-120 ranking page
def req():
    req_str = requests.get(url)
    # print('encoding', req_str.encoding)
    return req_str.text

def run():
    html = req()
    # Assumes requests fell back to Latin-1; re-encode and decode as GBK, the page's actual encoding
    html = html.encode('Latin-1').decode('gbk')
    # print(html)
    obj = re.compile(r'<div class="top-title">(.*?)</div>.*?<div class="title">.*?target="_blank">(.*?)</a></span></div>', re.S)   # Capture the rank number and the name/school
    res = obj.findall(html)
    return res

dic = {}
res = run()
for x in res:
    dic[x[0]] = x[1]
data = json.dumps(dic)       # Serialize
# 'w' rather than 'a': appending a second JSON document on a rerun would make json.load fail
with open('xiaohua.json', 'w', encoding='utf-8') as f:
    f.write(data)

with open('xiaohua.json', 'r', encoding='utf-8') as f:
    data = json.load(f)   # Deserialize
    print(data)
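
The matching and serialization steps can be tried without network access. The sketch below runs the same pattern over a hypothetical HTML fragment written to fit it (the real page markup may differ) and round-trips the result through JSON in memory.

```python
import json
import re

# Hypothetical fragment shaped to fit the pattern; not copied from the real page.
html = '''
<div class="top-title">TOP 1</div>
<div class="title"><span><a href="/p-1.html" target="_blank">Name A, School A</a></span></div>
<div class="top-title">TOP 2</div>
<div class="title"><span><a href="/p-2.html" target="_blank">Name B, School B</a></span></div>
'''

pattern = re.compile(
    r'<div class="top-title">(.*?)</div>.*?'
    r'<div class="title">.*?target="_blank">(.*?)</a></span></div>',
    re.S,  # let . match newlines so a single match can span several lines
)

# With two groups, findall returns a list of (rank, name) tuples.
pairs = pattern.findall(html)
ranking = dict(pairs)

# Serialize and deserialize in memory instead of going through a file.
restored = json.loads(json.dumps(ranking, ensure_ascii=False))
print(restored)  # {'TOP 1': 'Name A, School A', 'TOP 2': 'Name B, School B'}
```

Without re.S the pattern would find nothing here, because `.` would refuse to cross the line break between the two div elements.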

subprocess:

        The subprocess module lets a process spawn a new child process, connect to the child's stdin/stdout/stderr through pipes, and retrieve the child's return code.

import subprocess

s = subprocess.Popen('dir', shell=True, stdout=subprocess.PIPE)
print(s.stdout.read().decode('gbk'))    # A Chinese-locale Windows console emits GBK

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
 驱动器 E 中的卷没有标签。
 卷的序列号是 383D-453A

 E:\Python\DAY-15 的目录

2017/06/27  19:52    <DIR>          .
2017/06/27  19:52    <DIR>          ..
2017/06/27  19:52               338 3213.py
2017/06/27  19:47               778 tmp.py
2017/06/27  19:25             9,146 xiaohua.json
               3 个文件         10,262 字节
               2 个目录 117,877,260,288 可用字节


Process finished with exit code 0
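
The other pieces mentioned above (stdin, stderr and the return code) can be sketched in a platform-independent way by using the Python interpreter itself as the child process. The child command here is a made-up one-liner; `communicate()` sends the input, waits for the child to exit and collects both output streams.

```python
import subprocess
import sys

# Hypothetical child: read all of stdin and print it upper-cased.
child_code = 'import sys; print(sys.stdin.read().upper())'

proc = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # exchange str instead of bytes
)

# Write to the child's stdin, close it, then read stdout/stderr until exit.
out, err = proc.communicate('hello')

print(out.strip())      # HELLO
print(proc.returncode)  # 0
```

Passing the command as a list avoids shell=True, which is only needed for shell built-ins such as dir.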

 
