Python crawler: scraping the novel 《鬼吹灯之精绝古城》

2024-09-04 01:51:53, 225 views

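I wanted to read the novel 《鬼吹灯之精绝古城》, but the web version is full of ads, has to be paged through one chapter at a time, and does not allow copying. So I wrote a small crawler that saves the text to a Word document to read at leisure.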

The code is as follows:

"""
爬取《鬼吹灯之精绝古城》小说
"""
from selenium import webdriver
import os
from docx import Document

class DownloadFiles():

    def __init__(self):
        self.baseUrl = ‘http://www.luoxia.com/guichui/‘
        self.basePath = os.path.dirname(__file__)

    def makedir(self, name):
        path = os.path.join(self.basePath, name)
        isExist = os.path.exists(path)
        if not isExist:
            os.makedirs(path)
            print(‘File has been created.‘)
        else:
            print(‘The file is existed.‘)
        # 切换到该目录下
        os.chdir(path)

    def connect(self, url):
        try:
            driver = webdriver.PhantomJS()
            driver.get(url)
            print(url)
        except:
            "This page is not existed."
        return driver

    def getContent(self):
        doc = Document()
        self.makedir(‘storyFiles‘)
        for page in range(27426, 27461):
            print(‘The page number is : ‘ + str(page))
            url = self.baseUrl + str(page) + ‘.htm‘
            driver = self.connect(url)
            rList = driver.find_elements_by_xpath(‘//article/p‘)
            for r in rList:
                print(r.text)
                doc.add_paragraph(r.text)

        doc.save(‘guichuideng.doc‘)

if __name__ == ‘__main__‘:
    obj = DownloadFiles()
    obj.getContent()
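
The loop over `range(27426, 27461)` relies on the chapter pages having consecutive numeric IDs under the base URL. Below is a minimal sketch of just that URL construction, pulled out so it can be checked without launching a browser. The helper name `chapter_urls` is hypothetical and not part of the script above; the base URL and page IDs are the ones the script uses.

```python
def chapter_urls(base_url, first_page, last_page):
    """Return the chapter URLs for page IDs first_page..last_page (inclusive)."""
    # Hypothetical helper: assumes every chapter lives at <base_url><id>.htm
    return [f"{base_url}{page}.htm" for page in range(first_page, last_page + 1)]


# Same span as range(27426, 27461) in the script: IDs 27426 through 27460
urls = chapter_urls('http://www.luoxia.com/guichui/', 27426, 27460)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Printing the first and last URL before starting the crawl is a cheap way to confirm the range covers the chapters you expect.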
