Problems I ran into recognizing captchas with Python on Ubuntu

2024-09-28 01:39:01

Python has dedicated libraries for recognizing text in images; the one I used is pytesseract.

About pytesseract

Python-tesseract is a wrapper for google’s Tesseract-OCR
( http://code.google.com/p/tesseract-ocr/ ). It is also useful as a
stand-alone invocation script to tesseract, as it can read all image types
supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
and others, whereas tesseract-ocr by default only supports tiff and bmp.
Additionally, if used as a script, Python-tesseract will print the recognized
text in stead of writing it to a file. Support for confidence estimates and
bounding box data is planned for future releases.

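In short:
1. Python-tesseract is a wrapper around Google's Tesseract-OCR engine and can also be run as a stand-alone script. It does not include the engine itself: it runs the tesseract executable, which has to be installed separately.
2. It recognizes the text in an image file and returns it as a string. When run as a script, it prints the text instead of writing it to a file.
3. Tesseract-OCR on its own only reads TIFF and BMP by default. Because Python-tesseract loads images through PIL, it can also handle JPEG, PNG, GIF and the other formats PIL supports.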

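So, what is PIL?
PIL, the Python Imaging Library, is the de facto standard image-processing library for Python. It is very powerful, yet its API is simple and easy to use. (The original PIL is no longer maintained. Its fork Pillow is a drop-in replacement that keeps the same "from PIL import Image" import and is installed with pip install Pillow.)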

Installing PIL

On Debian/Ubuntu Linux, install it directly with apt:

$ sudo apt-get install python-imaging

On macOS and other Linux distributions, install it with easy_install or pip. Set up a build environment (compiler toolchain) first:

$ sudo easy_install PIL

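If the installation fails, first install whatever packages the error messages report as missing (openjpeg, for example).
On Windows, download the exe installer from the official PIL website.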

Installing pytesseract

$ sudo pip install pytesseract

Testing image-to-text conversion

Once everything was installed on Ubuntu, I ran a quick test with a simple captcha image, test.png, placed in the same directory as the script:

# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# Open the captcha image and run OCR on it
image = Image.open('test.png')
code = pytesseract.image_to_string(image)
print(code)

It failed with this error:
OSError: [Errno 2] No such file or directory
I was completely baffled.
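Ugh. At first I assumed the script couldn't read the image file, but after searching online it turned out that tesseract-ocr itself wasn't installed. pytesseract works by running the tesseract program, so the missing file in the error is that executable, not test.png. So I installed it: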

$ sudo apt-get install tesseract-ocr

After that, the script worked perfectly.
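The OSError above is confusing because it does not say which file is missing. A minimal sketch, assuming Python 3.3+ for shutil.which, that checks for the tesseract executable up front and fails with an explicit message instead; tesseract_path and require_tesseract are hypothetical helper names, not part of pytesseract:

```python
import shutil


def tesseract_path():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def require_tesseract():
    """Raise a clear error instead of the bare OSError seen when tesseract is missing."""
    path = tesseract_path()
    if path is None:
        raise RuntimeError(
            "tesseract executable not found on PATH; "
            "on Debian/Ubuntu install it with: sudo apt-get install tesseract-ocr"
        )
    return path


if __name__ == "__main__":
    try:
        print("tesseract found at", require_tesseract())
    except RuntimeError as err:
        print(err)
```

Call require_tesseract() once near the top of a script, before the first pytesseract.image_to_string call. If tesseract is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd.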

Cleaning up the recognized text

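Continuing with my script, I noticed that the captchas contain only digits, so I map the letters that digits are easily misread as back to those digits: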

change = {
    'O': '0', 'o': '0',
    'I': '1', 'i': '1', 'L': '1', 'l': '1',
    'Z': '2', 'z': '2',
    'e': '3',
    'a': '4',
    'S': '5', 's': '5',
    'b': '6',
    'T': '7', 't': '7',
    'q': '9',
}

The replacement itself:

for x in change:
    text = text.replace(x, change[x])
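Every key in change is a single character, and no replacement digit is itself a key, so the loop above can also be written as one pass with str.maketrans and str.translate (Python 3 string methods). A sketch of that alternative using the same mapping; fix_digits is a hypothetical name:

```python
# Same letter-to-digit mapping as the change dict above
change = {
    'O': '0', 'o': '0',
    'I': '1', 'i': '1', 'L': '1', 'l': '1',
    'Z': '2', 'z': '2',
    'e': '3',
    'a': '4',
    'S': '5', 's': '5',
    'b': '6',
    'T': '7', 't': '7',
    'q': '9',
}

# Build the translation table once; every key must be a single character
table = str.maketrans(change)


def fix_digits(text):
    """Replace commonly misread letters with digits in a single pass over text."""
    return text.translate(table)


print(fix_digits("4O7l"))  # → 4071
```

The result is the same as the replace loop here because the replacements never create new keys. With a mapping where they could, the loop's output would depend on iteration order while translate's would not.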

Putting it all together:

# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# Run OCR on the captcha image
image = Image.open('test.png')
code = pytesseract.image_to_string(image)

# Map letters that digits are commonly misread as back to the digits
change = {
    'O': '0', 'o': '0',
    'I': '1', 'i': '1', 'L': '1', 'l': '1',
    'Z': '2', 'z': '2',
    'e': '3',
    'a': '4',
    'S': '5', 's': '5',
    'b': '6',
    'T': '7', 't': '7',
    'q': '9',
}

for x in change:
    code = code.replace(x, change[x])

print(code)
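Since the captchas here are digits only, one further step is to drop whatever the mapping cannot fix (spaces, the trailing newline in tesseract's output, stray punctuation) and check the length before submitting. A sketch, where normalize_captcha and an expected length of 4 are assumptions for illustration, not values taken from the site:

```python
import re

# Letter-to-digit corrections, as in the change dict above
change = {
    'O': '0', 'o': '0',
    'I': '1', 'i': '1', 'L': '1', 'l': '1',
    'Z': '2', 'z': '2',
    'e': '3',
    'a': '4',
    'S': '5', 's': '5',
    'b': '6',
    'T': '7', 't': '7',
    'q': '9',
}

EXPECTED_LENGTH = 4  # hypothetical: set this to the real captcha length


def normalize_captcha(raw):
    """Map misread letters to digits, then keep only the ASCII digits.

    Returns the digit string, or None if its length is not EXPECTED_LENGTH,
    so the caller can fetch a new captcha instead of submitting a bad guess.
    """
    for x in change:
        raw = raw.replace(x, change[x])
    digits = re.sub(r"[^0-9]", "", raw)  # drop every non-digit character
    return digits if len(digits) == EXPECTED_LENGTH else None


print(normalize_captcha("4O 7l\n"))  # → 4071
print(normalize_captcha("12"))       # → None
```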

Downloading the image with Python

The following script fetches the image and writes it to a local file:

# -*- coding: utf-8 -*-
import requests

r = requests.get(url="http://example/test.php")
data = r.content

# Write the raw bytes of the captcha image to disk
with open("captchatest.png", "wb") as f:
    f.write(data)
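The script saves whatever the server returns as captchatest.png. If the request goes wrong, the body may be an HTML error page rather than an image, and the OCR step then fails with a less obvious error. A small sketch that checks the leading "magic" bytes first; sniff_image_type is a hypothetical name, and only PNG, JPEG and GIF signatures are covered:

```python
# Leading "magic" bytes of a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'png', 'jpeg' or 'gif' based on the first bytes of data, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


print(sniff_image_type(b"\x89PNG\r\n\x1a\n...."))  # → png
print(sniff_image_type(b"<html>error</html>"))     # → None
```

With requests, one option is to call r.raise_for_status() and then check sniff_image_type(r.content) before writing the file.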

Login script

The final login script:

# -*- coding: utf-8 -*-
import requests
import pytesseract
from PIL import Image

# One session, so the captcha download and the login POST share cookies
s = requests.session()


def change_to_string():
    image = Image.open('captchatest.png')
    code = pytesseract.image_to_string(image)
    # Map letters that digits are commonly misread as back to the digits
    change = {
        'O': '0', 'o': '0',
        'I': '1', 'i': '1', 'L': '1', 'l': '1',
        'Z': '2', 'z': '2',
        'e': '3',
        'a': '4',
        'S': '5', 's': '5',
        'b': '6',
        'T': '7', 't': '7',
        'q': '9',
    }
    for x in change:
        code = code.replace(x, change[x])
    # Drop surrounding whitespace (tesseract output normally ends with a newline)
    return code.strip()


r = s.get(url="http://example/login.php")
data = r.content
with open("captchatest.png", "wb") as f:
    f.write(data)

# print(change_to_string())

rr = s.post(url="http://example/login.php",
            data={'username': 'c014', 'password': 'c014', 'captcha': change_to_string()})

print("---------------------")
print(rr.content)
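OCR on captchas is rarely right every time, so a practical script usually retries with a fresh captcha. A sketch of that loop under stated assumptions: get_captcha_text and submit are hypothetical callables you would build from the script above (download the image with the session and run change_to_string(); post the form), and how submit decides whether the login succeeded depends entirely on the target site:

```python
def login_with_retries(get_captcha_text, submit, max_attempts=5):
    """Try to log in up to max_attempts times, fetching a new captcha each time.

    get_captcha_text() -> str or None: fetch a captcha and return the OCR guess
        (None if the guess is unusable, e.g. wrong length).
    submit(code) -> bool: post the login form with code and report success.
    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        code = get_captcha_text()
        if not code:
            continue  # unreadable captcha: skip submitting and fetch a new one
        if submit(code):
            return attempt
    return None


# Example with stand-in functions: a wrong guess, an unreadable one, then a right one
guesses = iter(["12a4", None, "1234"])
result = login_with_retries(lambda: next(guesses), lambda code: code == "1234")
print(result)  # → 3
```

Passing the two steps in as functions keeps the site-specific parts (URLs, form fields, success check) out of the retry logic.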
