Using BeautifulSoup
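The requests library can fetch a web page, but the data it returns is wrapped in HTML markup, so we can't use it directly. It first has to be converted into a form that is easier to work with.
BeautifulSoup parses that HTML so the tags can be stripped away, leaving the data and text content of the page.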
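A minimal sketch of the "strip the tags, keep the content" idea, assuming bs4 is installed. To keep it runnable offline, the HTML is a hard-coded string standing in for a page you would normally fetch with `requests.get(url).text`; the sample markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for a page fetched with requests
# (e.g. requests.get(url).text, not executed here)
raw_html = '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>'

soup = BeautifulSoup(raw_html, 'html.parser')

# get_text() concatenates every text node and drops the tags themselves
plain_text = soup.get_text()
print(plain_text)  # TitleSome bold text.

# A separator plus strip=True gives more readable output
readable = soup.get_text(' ', strip=True)
print(readable)  # Title Some bold text.
```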
1. Extracting the content of specific tags
A single tag
from bs4 import BeautifulSoup
html_sample = ('<html><body>'
               '<h1 id="title"> HelloWorld</h1>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')
# select() returns a list of every matching tag; [0] takes the first one
header = soup.select('h1')
print(header[0].text)
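Because `select()` always returns a list, the example above needs `header[0]`. A small sketch of the alternatives, using a shortened copy of the sample HTML: `select_one()` returns the first match directly (or `None`), and `get_text(strip=True)` removes the leading space that `.text` keeps.

```python
from bs4 import BeautifulSoup

html_sample = '<html><body><h1 id="title"> HelloWorld</h1></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

# select() always returns a list, even for a single match
headers = soup.select('h1')
print(len(headers))  # 1

# select_one() returns the first matching element itself
header = soup.select_one('h1')
print(repr(header.text))                  # ' HelloWorld' (leading space kept)
print(repr(header.get_text(strip=True)))  # 'HelloWorld'

# A selector with no match gives None instead of an IndexError on [0]
missing = soup.select_one('h2')
print(missing)  # None
```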
Multiple tags of the same type
from bs4 import BeautifulSoup
html_sample = ('<html><body>'
               '<h1 id="title"> HelloWorld</h1>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')
# Every <a> tag in the document is returned, in document order
links = soup.select('a')
for alink in links:
    print(alink.text)
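The same result can also be obtained with BeautifulSoup's `find_all()` method, which takes a tag name instead of a CSS selector. This sketch simply compares the two on the link part of the sample HTML; which one to use is mostly a matter of taste.

```python
from bs4 import BeautifulSoup

html_sample = ('<html><body>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')

# find_all('a') returns the same tags as select('a')
via_find_all = [a.get_text(strip=True) for a in soup.find_all('a')]
via_select = [a.get_text(strip=True) for a in soup.select('a')]
print(via_find_all)                # ['This is link1', 'This is link2']
print(via_find_all == via_select)  # True
```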
2. Selecting elements by CSS id or class
For an id, put # in front of it
from bs4 import BeautifulSoup
html_sample = ('<html><body>'
               '<h1 id="title"> HelloWorld</h1>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('#title')
# Prints a list containing the matching tag: [<h1 id="title"> HelloWorld</h1>]
print(header)
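Since an id should be unique within a page, a list is usually not what you want here. A sketch of two ways to get the element itself: the CSS route with `select_one('#title')`, and the keyword-argument route with `find(id='title')`. Both return the same tag from the parse tree.

```python
from bs4 import BeautifulSoup

html_sample = '<html><body><h1 id="title"> HelloWorld</h1></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

# select('#title') returns a list of matching elements
matches = soup.select('#title')
print(matches)  # [<h1 id="title"> HelloWorld</h1>]

# Two ways to get the single element rather than a list
by_css = soup.select_one('#title')
by_find = soup.find(id='title')
print(by_css is by_find)            # True: both point at the same tag in the tree
print(by_css.get_text(strip=True))  # HelloWorld
```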
For a class, put . in front of it
from bs4 import BeautifulSoup
html_sample = ('<html><body>'
               '<h1 id="title"> HelloWorld</h1>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')
# '.link' matches every element whose class list contains "link"
links = soup.select('.link')
for alink in links:
    print(alink.text)
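Note that `.link` matches any tag carrying that class, not just `<a>`, and that an element with several classes (e.g. `class="link external"`) still matches. The sketch below uses hypothetical markup (the `<p>` element, the `#1`/`#2` hrefs and the `external` class are made up for illustration) to show how `a.link` narrows the match and how `find_all(..., class_=...)` does the same.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <p> that also has class "link", and an <a> with two classes
html_sample = ('<html><body>'
               '<p class="link">A paragraph with class link</p>'
               '<a href="#1" class="link">This is link1</a>'
               '<a href="#2" class="link external">This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')

# '.link' matches any tag whose class list contains "link"
matched_tags = [el.name for el in soup.select('.link')]
print(matched_tags)  # ['p', 'a', 'a']

# 'a.link' restricts the match to <a> tags with that class
link_texts = [el.get_text() for el in soup.select('a.link')]
print(link_texts)  # ['This is link1', 'This is link2']

# find_all() equivalent; class_ has a trailing underscore because class is a Python keyword
count = len(soup.find_all('a', class_='link'))
print(count)  # 2
```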
3. Extracting the link URL (href) from <a> tags
from bs4 import BeautifulSoup
html_sample = ('<html><body>'
               '<h1 id="title"> HelloWorld</h1>'
               '<a href="http://www.mamicode.com/#" class="link">This is link1</a>'
               '<a href="http://www.mamicode.com/# link2" class="link"> This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')
links = soup.select('a')
for alink in links:
    # Tag attributes are read like dictionary keys
    print(alink['href'])
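Two practical caveats when collecting links, shown as a sketch on hypothetical markup: `alink['href']` raises a `KeyError` for an `<a>` without an `href`, whereas `alink.get('href')` returns `None`; and relative links usually need to be resolved against the page URL, here with the standard library's `urllib.parse.urljoin`. The `base_url` value is a made-up example, not something from the original page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: one relative link, one absolute link, one <a> with no href
html_sample = ('<html><body>'
               '<a href="/docs/page1.html">relative</a>'
               '<a href="https://example.com/page2.html">absolute</a>'
               '<a name="anchor-only">no href</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')

# Hypothetical URL of the page the HTML came from, used to resolve relative links
base_url = 'https://example.com/blog/post.html'

links = []
for alink in soup.select('a'):
    href = alink.get('href')  # None instead of KeyError when href is missing
    if href is None:
        continue
    links.append(urljoin(base_url, href))

print(links)  # ['https://example.com/docs/page1.html', 'https://example.com/page2.html']
```

Alternatively, the selector `a[href]` skips tags without an `href` before the loop even runs.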