Traversing a Web Page with bs4
1. Basic HTML format
<html>
<head>
<title>This is a python demo page</title>
</head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>
</p>
</body>
</html>
1. Downward traversal
1.1 contents
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")

print(soup.body.contents)     # all direct children of <body>
print(soup.body.contents[1])  # .contents is a list, so indexing and other list operations work
print(soup.p.contents)        # soup.p is only the first <p> tag, so this lists the children of that first <p>
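Since .contents is a plain list, text nodes (including whitespace between tags) sit in it next to the tags. A minimal sketch of this, assuming a short inline HTML string (hypothetical, standing in for the demo page) so that it runs without network access; the helper name child_kinds is also made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical inline snippet standing in for the demo page.
html = (
    '<body><p class="title"><b>Title</b></p>\n'
    '<p class="course">Text <a id="link1">Basic</a></p></body>'
)
soup = BeautifulSoup(html, "html.parser")


def child_kinds(tag):
    """Return the class name of each direct child of the given tag."""
    return [type(child).__name__ for child in tag.contents]


print(isinstance(soup.body.contents, list))  # True
print(child_kinds(soup.body))                # ['Tag', 'NavigableString', 'Tag']
print(soup.p.contents)                       # children of the first <p> only
```

The NavigableString in the middle is the newline between the two <p> tags, which is why indexing into .contents does not always land on a tag.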
1.2 children and descendants
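print(soup.body.children)     # an iterator, meant to be looped over
# <list_iterator object at 0x01383D10>
print(soup.body.descendants)  # a generator, meant to be looped over
# <generator object descendants at 0x024B42A0>

for i in soup.body.children:     # direct children only
    print(i)
for j in soup.body.descendants:  # children, grandchildren, and so on
    print(j)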
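The difference between the two is easier to see when only tag names are collected. A small sketch, again assuming a hypothetical inline HTML string rather than the real demo page:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline snippet, not the real demo page.
html = "<body><p><b>Title</b></p><p>Text <a>Basic</a></p></body>"
soup = BeautifulSoup(html, "html.parser")

# .children stops at the first level below <body>.
child_names = [n.name for n in soup.body.children if isinstance(n, Tag)]

# .descendants walks the whole subtree in document order.
descendant_names = [n.name for n in soup.body.descendants if isinstance(n, Tag)]

print(child_names)       # ['p', 'p']
print(descendant_names)  # ['p', 'b', 'p', 'a']

# Text nodes are descendants too, so the full count is larger.
print(len(list(soup.body.descendants)))  # 7
```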
2. Upward traversal
2.1 parent
print(soup.a.parent)  # the tag that directly contains the first <a>
print(soup.p.parent)  # the tag that directly contains the first <p>
2.2 parents
for i in soup.p.parents:  # every ancestor, from the nearest outward
    print(i.prettify())
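Printing names instead of whole subtrees makes the upward walk easier to follow. A minimal sketch, assuming a hypothetical inline HTML string:

```python
from bs4 import BeautifulSoup

# Hypothetical inline snippet, not the real demo page.
html = "<html><body><p><a>Basic</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .parent is a single node: the direct container.
print(soup.a.parent.name)  # p

# .parents yields every ancestor, ending with the BeautifulSoup object itself.
ancestor_names = [node.name for node in soup.a.parents]
print(ancestor_names)      # ['p', 'body', 'html', '[document]']

# The BeautifulSoup object sits at the top and has no parent.
print(soup.parent)         # None
```

The last entry, '[document]', is the name of the BeautifulSoup object, so a loop over .parents always visits it after <html>.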
3. Sibling traversal (only among nodes under the same parent)
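The original post stops at this heading. As a hedged sketch of what sibling traversal looks like, the following uses bs4's next_sibling, previous_sibling, next_siblings, and find_next_sibling on a hypothetical inline HTML string (not the real demo page):

```python
from bs4 import BeautifulSoup

# Hypothetical inline snippet: two <a> tags sharing the same <p> parent.
html = '<p>Text <a id="link1">Basic</a> and <a id="link2">Advanced</a></p>'
soup = BeautifulSoup(html, "html.parser")
first = soup.a

# Neighbours can be text nodes, not only tags.
print(repr(str(first.previous_sibling)))  # 'Text '
print(repr(str(first.next_sibling)))      # ' and '

# .next_siblings yields every later node under the same parent.
later = [str(node) for node in first.next_siblings]
print(later)

# find_next_sibling skips text nodes and returns the next matching tag.
second = first.find_next_sibling("a")
print(second["id"])  # link2
```

Siblings never cross into another parent: the two <a> tags here are siblings of each other and of the surrounding text, but not of anything outside the <p>.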