My tesseract learning notes (2)

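Preface: I spent about three weeks reading the documentation (mostly just skimming) and another two weeks setting up the environment, and finally got tesseract working. The recognition rate for Simplified Chinese is decent, above 95%. Below is a brief record of the installation and recognition process.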

I. System environment

  OS: CentOS 6.5

  Build environment: g++

  Dependencies: leptonica, opencv2.4.9, tesseract3.02

II. Installation

(1) leptonica

sudo yum -y install autoconf automake libtool
sudo yum -y install autoconf-archive
sudo yum -y install pkgconfig
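# On CentOS the development packages use the -devel suffix (the -dev names are Debian/Ubuntu packages)
sudo yum -y install libpng-devel
sudo yum -y install libjpeg-turbo-devel
sudo yum -y install libtiff-devel
sudo yum -y install zlib-devel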

wget http://www.leptonica.org/source/leptonica-1.68.tar.gz
tar xvzf leptonica-1.68.tar.gz
cd leptonica-1.68/
./configure
make && make install

(2) tesseract3.02

  For the tesseract installation, see here

  and also the instructions on the official site here

  ./autogen.sh
  ./configure --enable-debug LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" 
  make
  make install
  ldconfig

  Language data files:

  export TESSDATA_PREFIX=/some/path/to/tessdata

    Set it to point to the parent of your tessdata directory (for example, if your tessdata path is '/usr/local/share/tessdata', use 'export TESSDATA_PREFIX=/usr/local/share/').

    In other words, the TESSDATA_PREFIX environment variable must be set to the parent directory of the tessdata folder, not to the folder itself.
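The parent-directory rule is easy to get wrong, so here is a minimal sketch of it in shell. The helper name tessdata_prefix_for is hypothetical (it is not part of tesseract); it only strips the final 'tessdata' component from a path and prints what is left, with a trailing slash.

```shell
# Hypothetical helper: given the path of a tessdata directory, print the value
# TESSDATA_PREFIX should take (the parent directory, with a trailing slash).
tessdata_prefix_for() {
    tessdata_dir=${1%/}                 # drop one trailing slash, if any
    case "$tessdata_dir" in
        */tessdata) printf '%s/\n' "${tessdata_dir%/tessdata}" ;;
        *) echo "not a tessdata directory: $1" >&2; return 1 ;;
    esac
}

# Example with the path mentioned in the text.
TESSDATA_PREFIX=$(tessdata_prefix_for /usr/local/share/tessdata)
export TESSDATA_PREFIX
echo "$TESSDATA_PREFIX"    # /usr/local/share/
```

Putting the resulting export line into ~/.bashrc keeps the setting across logins.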

(3) opencv2.4.9

  $ sudo yum -y install gtk2-devel tbb-devel libpng-devel
  $ wget http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.9/opencv-2.4.9.zip
  $ unzip opencv-2.4.9.zip
  $ cd opencv-2.4.9
  $ mkdir build
  $ cd build
  $ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
  $ make -j2
  $ make install

III. Using the API

(1) Build process

  1. Set the PKG_CONFIG_PATH environment variable so that pkg-config can find 'tesseract.pc':

     $ echo 'export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig' >> ~/.bashrc

     $ source ~/.bashrc
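Sourcing ~/.bashrc repeatedly appends the same directory again each time, and an empty PKG_CONFIG_PATH leaves a leading colon. A possible alternative, sketched here with a hypothetical function name (add_pkg_config_dir), adds the directory only when it is not already listed:

```shell
# Append a directory to PKG_CONFIG_PATH only if it is not already listed.
add_pkg_config_dir() {
    dir=$1
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;    # already present, nothing to do
        *) PKG_CONFIG_PATH="${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$dir" ;;
    esac
    export PKG_CONFIG_PATH
}

add_pkg_config_dir /usr/local/lib/pkgconfig
add_pkg_config_dir /usr/local/lib/pkgconfig    # second call changes nothing
echo "$PKG_CONFIG_PATH"

# With tesseract.pc and opencv.pc visible to pkg-config, a build line could
# look like this (ocr_demo.cpp is a hypothetical source file name):
#   g++ ocr_demo.cpp -o ocr_demo $(pkg-config --cflags --libs tesseract opencv)
```

The commented g++ line assumes both libraries were installed under /usr/local as in the steps above, so that their .pc files sit in /usr/local/lib/pkgconfig.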

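  2. The build complained that opencv was missing several libraries (libcufft, libnpps, libnppi, libnppc, libcudart); see here for reference.

    These libraries live in cuda/lib64, so create symbolic links to them: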

    [root@localhost lib64]# ln -s /usr/local/cuda-6.5/lib64/libcufft.so.6.5 /usr/local/lib/libcufft.so
    [root@localhost lib64]# ln -s /usr/local/cuda-6.5/lib64/libnpps.so.6.5 /usr/local/lib/libnpps.so
    [root@localhost lib64]# ln -s /usr/local/cuda-6.5/lib64/libnppi.so.6.5 /usr/local/lib/libnppi.so
    [root@localhost lib64]# ln -s /usr/local/cuda-6.5/lib64/libnppc.so.6.5 /usr/local/lib/libnppc.so
    [root@localhost lib64]# ln -s /usr/local/cuda-6.5/lib64/libcudart.so.6.5 /usr/local/lib/libcudart.so

 

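    At run time, however, the program failed with the errors below. The loader looks for the versioned names such as libcufft.so.6.5, which the unversioned links do not provide: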

    error while loading shared libraries: libcufft.so.6.5: cannot open shared object file: No such file or directory

    error while loading shared libraries: libnpps.so.6.5: cannot open shared object file: No such file or directory

    error while loading shared libraries: libnppi.so.6.5: cannot open shared object file: No such file or directory

    error while loading shared libraries: libnppc.so.6.5: cannot open shared object file: No such file or directory

    error while loading shared libraries: libcudart.so.6.5: cannot open shared object file: No such file or directory

    

      For the fix, see here:

    When I run the test routine, I get the error: error while loading shared libraries: libcudart.so.6.5: cannot open shared object file: No such file or directory.

    The solution is to copy the respective libraries to /usr/local/lib:

    sudo cp /usr/local/cuda-6.5/lib64/libcudart.so.6.5 /usr/local/lib/libcudart.so.6.5 && sudo ldconfig

    sudo cp /usr/local/cuda-6.5/lib64/libcublas.so.6.5 /usr/local/lib/libcublas.so.6.5 && sudo ldconfig

    sudo cp /usr/local/cuda-6.5/lib64/libcurand.so.6.5 /usr/local/lib/libcurand.so.6.5 && sudo ldconfig
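To see which shared libraries are still unresolved after copying, the output of ldd can be filtered for "not found" entries. A minimal sketch follows; missing_libs is a hypothetical helper and ocr_demo a hypothetical binary name.

```shell
# Read `ldd` output on stdin and print the names of the libraries that the
# dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd ./ocr_demo | missing_libs
```

Every name it prints is a library that still has to be copied into /usr/local/lib (followed by ldconfig), in the same way as the three commands above.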

    

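    In the end recognition works, but the accuracy is somewhat lower than on Windows. The only difference between the two setups is the opencv version: 2.4.9 on Linux and 2.4.10 on Windows.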

    
