Regular Expressions for Web Page Parsing


2024-07-29 22:15:04, 220 views

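When writing a crawler, the most tedious part is writing the regular expressions: each one has to be tried and debugged again and again, which takes a lot of time. So I wrote a web-based tool: enter the URL to crawl and a regular expression, and the page shows the data that was matched.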

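Approach: the idea is simple. The URL and the regular expression are sent to the server, the server fetches the page and applies the expression, and the result is returned to the front end. I used bootcss (Bootstrap) for the front end and bottle for the back end, with the processing done in Python. The code is short, but getting it to work took some effort. The parameter being passed is itself a URL, and the back end treats / as the boundary of a path parameter (/......./), so passing the value failed every time. I eventually worked around this by Base64-encoding the values before sending them. Note that Base64 is an encoding, not encryption.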
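The encoding step described above can be sketched as follows. This is a minimal Python 3 illustration rather than the original code: the helper names `encode_payload` and `decode_payload` are hypothetical, and it assumes the two values are joined with a comma, which is how main.py splits the path parameter.

```python
import base64


def encode_payload(url, pattern):
    """Base64-encode the URL and the regex, then join them with a comma.

    The comma is a safe separator because it is not part of the
    standard Base64 alphabet.
    """
    parts = [base64.b64encode(s.encode("utf-8")).decode("ascii")
             for s in (url, pattern)]
    return ",".join(parts)


def decode_payload(webstr):
    """Reverse encode_payload: split on the comma and decode both halves."""
    b64_url, b64_reg = webstr.split(",")
    url = base64.b64decode(b64_url).decode("utf-8")
    pattern = base64.b64decode(b64_reg).decode("utf-8")
    return url, pattern


if __name__ == "__main__":
    payload = encode_payload("http://example.com/a/b?x=1",
                             r"<title>(.*?)</title>")
    print(payload)
    print(decode_payload(payload))
```

Standard Base64 output can itself contain `/`. The route in main.py accepts that because its pattern matches any character, but `base64.urlsafe_b64encode` would be a reasonable alternative if a route did not accept slashes.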

webRegx.py

import urllib2
import re
import json

def getHtml(url):
    # Download the page and return its raw HTML
    html = urllib2.urlopen(url).read()
    return html

def getResult(url,reg):
    html = getHtml(url)
    reg = re.compile(reg)
    results = reg.findall(html)
    # Print the matches on the server console for debugging
    if len(results)>0:
        for result in results:
            print result
    else:
        print "no result"
    return json.dumps(results)

Note: the function must return JSON-formatted data at the end.
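To illustrate what the returned JSON looks like, here is a small Python 3 sketch that applies the same `findall` and `json.dumps` steps to an in-memory HTML string instead of a downloaded page. The function name `find_as_json` and the sample HTML are made up for illustration.

```python
import json
import re


def find_as_json(html, pattern):
    """Return all regex matches in html as a JSON array string."""
    matches = re.compile(pattern).findall(html)
    # ensure_ascii=False keeps non-ASCII matches readable in the output
    return json.dumps(matches, ensure_ascii=False)


if __name__ == "__main__":
    sample = '<a href="/a">First</a><a href="/b">Second</a>'
    # One capture group: a flat list of strings
    print(find_as_json(sample, r'href="(.*?)"'))
    # Two capture groups: a list of [href, text] pairs
    print(find_as_json(sample, r'<a href="(.*?)">(.*?)</a>'))
```

With one capture group the front end receives a flat array of strings. With several groups `findall` returns tuples, which `json.dumps` serializes as nested arrays, so the front end has to handle both shapes.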

main.py

from bottle import Bottle,run,static_file,template
from webRegx import getResult
import base64

app = Bottle()

@app.route('/')
def show():
    return template('templates/index')

@app.route('/jiexi/:webstr#.*?#',method='post')
def test(webstr):
    # webstr has the form "<base64 url>,<base64 regex>"
    base64_url,base64_reg = webstr.split(",")
    url = base64.b64decode(base64_url)  # decode the URL
    reg = base64.b64decode(base64_reg)  # decode the regular expression
    return getResult(url,reg)

@app.route('/templates/:filename')
def send_static(filename):
    return static_file(filename, root='./templates')

run(app, host='localhost', port=8080)
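For trying the /jiexi/ route without the front-end page, a request can be built from Python. The following Python 3 sketch only constructs the request and does not send it. The helper name `build_request` is hypothetical, and the default host and port are taken from the `run(...)` call above.

```python
import base64
import urllib.request


def build_request(url, pattern, server="http://localhost:8080"):
    """Build (but do not send) the POST request expected by /jiexi/."""
    def b64(text):
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    endpoint = "{}/jiexi/{},{}".format(server, b64(url), b64(pattern))
    # The route only accepts POST; the payload travels in the path,
    # so the request body is left empty.
    return urllib.request.Request(endpoint, data=b"", method="POST")


if __name__ == "__main__":
    req = build_request("http://example.com/", r"<title>(.*?)</title>")
    print(req.get_method(), req.full_url)
    # With the server running, the request could be sent with:
    #   body = urllib.request.urlopen(req).read()
```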

index.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="">
    <meta name="author" content="">

    <title>Sticky Footer Template for Bootstrap</title>

   <!-- Latest Bootstrap core CSS file -->
    <link rel="stylesheet" href="http://cdn.bootcss.com/bootstrap/3.2.0/css/bootstrap.min.css">

The query is sent via AJAX.

Final result: (screenshot)



