简单的大众点评爬虫

首页 > 代码库 > 简单的大众点评爬虫

2024-08-07 17:02:28 218人阅读

一个很简单的爬虫，爬取中大周边地点的点评信息。

# -*- coding: utf-8 -*-import requestsimport reimport timedef placeSplider(name, star, url):    time.sleep(5)    res = requests.get(‘http://www.dianping.com‘+url)    text = res.text    longInfo = "<p class=\"desc J-desc\">(.*?)</p>"    longInfo_re = re.compile(longInfo, re.DOTALL)    longInfos = longInfo_re.findall(text)        info = "sml-rank-stars sml-str(.*?)\".*?<p class=\"desc\">(.*?)</p>"    info_re = re.compile(info, re.DOTALL)    results = info_re.findall(text)    #print result    #print ‘%d results‘ %len(results)    if len(results) == 0 or len(results[0]) < 2 or results[0][1].count(u‘人点评‘) > 0:        print u‘没有点评\n‘        return    fOut = open(‘D:\\%s.txt‘ %name, ‘w‘)    fOut.write(‘place star %s\n‘ %star)    for result in results:        star = result[0]        info = result[1]        if info.count(‘<span‘) > 0 or info.count(u‘仅售‘)>0:#去广告            print ‘‘            break        else:            if info[-6:] == u"......":#替换短评论为相应的长评论                info = info[:-6]                for i in longInfos:                    if i.count(info) > 0:                        info = i                        break            info = info.replace("<br/>", ‘‘)            info = info.replace("<br>", ‘‘)            info = info.replace("&nbsp;", ‘‘)            print star, info            fOut.write(‘%s\n‘ %star)            fOut.write(‘%s\n‘ %info.encode(‘u8‘))    fOut.close()for page in range(1, 6):    res = requests.get(‘http://www.dianping.com/search/keyword/206/0_%E4%B8%AD%E5%B1%B1%E5%A4%A7%E5%AD%A6/p‘+str(page))    text = res.text    href = "data-hippo-type=\"shop\" title=\"(.*?)\" target=\"_blank\" href=http://www.mamicode.com/"(.*?)\".*?sml-rank-stars sml-str(.*?)\""    href_re = re.compile(href, re.DOTALL)    result =  href_re.findall(text)    for place in result:        name = place[0]        url = place[1]        star = place[2]        print name, star, url        placeSplider(name, star, url)    time.sleep(5)

简单的大众点评爬虫

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > 简单的大众点评爬虫

简单的大众点评爬虫

看完仍有疑问？有类似问题直接问程序猿