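Python crawler: scraping the novel 《鬼吹灯之精绝古城》
I wanted to read the novel 《鬼吹灯之精绝古城》, but the web version is full of ads, has to be paged through one page at a time, and does not allow copying the text. So I wrote a small crawler that saves the chapters into a Word document to read at leisure.
The code is as follows: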
"""Scrape the novel 《鬼吹灯之精绝古城》."""
import os

from docx import Document
from selenium import webdriver


class DownloadFiles():
    def __init__(self):
        self.baseUrl = 'http://www.luoxia.com/guichui/'
        self.basePath = os.path.dirname(__file__)

    def makedir(self, name):
        path = os.path.join(self.basePath, name)
        if not os.path.exists(path):
            os.makedirs(path)
            print('The directory has been created.')
        else:
            print('The directory already exists.')
        # Switch into that directory so the document is saved there
        os.chdir(path)

    def connect(self, url):
        # webdriver.PhantomJS is the Selenium 3 API; later Selenium releases removed it
        driver = webdriver.PhantomJS()
        try:
            driver.get(url)
            print(url)
        except Exception:
            print('This page does not exist.')
            driver.quit()
            return None
        return driver

    def getContent(self):
        doc = Document()
        self.makedir('storyFiles')
        for page in range(27426, 27461):
            print('The page number is : ' + str(page))
            url = self.baseUrl + str(page) + '.htm'
            driver = self.connect(url)
            if driver is None:
                continue
            # find_elements_by_xpath is likewise the older Selenium API
            rList = driver.find_elements_by_xpath('//article/p')
            for r in rList:
                print(r.text)
                doc.add_paragraph(r.text)
            # Close the browser before moving on to the next page
            driver.quit()
        # python-docx writes the .docx format, so use that extension
        doc.save('guichuideng.docx')


if __name__ == '__main__':
    obj = DownloadFiles()
    obj.getContent()
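To make it clearer what the XPath '//article/p' in getContent picks out, here is a rough sketch of the same selection done with only the standard library. It is not part of the crawler above: the names ArticleParagraphParser, extract_paragraphs, and SAMPLE_HTML are made up for illustration, the sample markup is invented rather than copied from the site, and the parser assumes every tag is closed explicitly (real pages often are not that tidy).

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they are kept off the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleParagraphParser(HTMLParser):
    """Collect the text of every <p> that is a direct child of an <article>."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.paragraphs = []   # finished paragraph texts
        self._buffer = None    # text pieces of the paragraph being read
        self._p_depth = 0      # stack depth just outside that paragraph

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Start collecting only when the parent tag is <article>.
        if (tag == "p" and self._buffer is None
                and self.stack and self.stack[-1] == "article"):
            self._buffer = []
            self._p_depth = len(self.stack)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag, ignore it
        # Pop back to the matching opening tag.
        while self.stack.pop() != tag:
            pass
        # The paragraph ends once the stack is back to the <article> level.
        if self._buffer is not None and len(self.stack) <= self._p_depth:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of each <article>/<p> element in the given HTML."""
    parser = ArticleParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Invented sample markup, only to exercise the parser.
SAMPLE_HTML = (
    "<html><body><article><h1>Title</h1>"
    "<p>First <b>bold</b> line.</p>"
    "<div><p>nested</p></div>"
    "<p>Second line.</p>"
    "</article><p>outside</p></body></html>"
)

if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

The nested paragraph inside the div and the one outside the article are skipped, because '//article/p' only matches p elements whose parent is an article. Selenium's r.text returns the rendered visible text, so the result on a real page may differ slightly from this plain-text join.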