Building the Tesseract NDK library for Android

My project for my Android Components graduate class at FAU (COT6930) was a component that gives programmers easier access to Tesseract, the Optical Character Recognition (OCR) package maintained by Google.

The issue with this library is that it is written in C and C++. It contains not only the OCR library, but also the Leptonica Image Processing Library.

Available Resources

We have the Android Native Development Kit (NDK), which offers the means of using code written in C and/or C++ in our Android programming. However, the process of compiling and preparing those libraries is complex, lengthy and error-prone. Below I describe the process; once released, my component will give users more direct access to the OCR libraries in their own Android development.

We have available the excellent work of Robert Theis with his tess-two project. Tess-two uses the Tesseract project and adds the Android Java Native Interface (JNI) to allow compilation for the Android platform, complete in an Eclipse project.

Building Tesseract for Android

These are the steps to start working with the NDK. It is assumed that you already have the Android SDK installed. Otherwise you can look at my post here.

  • Download the NDK from the Android Development site.
  • Then extract the NDK to a folder on your computer. Make sure there are no spaces in the path name. I extracted it to “C:\Apps\android-ndk”.
  • Configure Eclipse to use the NDK:

    Setup Eclipse NDK

  • You may need to install C/C++ support in Eclipse (syntax highlighting, etc.)
    • Help->Install New Software->Select the Indigo site->Programming languages
    • Select: Eclipse C/C++ development tools. Binary runtime and user documentation.
  • Download the Tess-two project
  • Extract Tess-two into an Eclipse workspace.
  • Import tess-two project.
    • File -> Import -> Existing Projects into workspace -> tess-two directory.
  • Set up Eclipse to build the libraries automatically. Older versions of the NDK that lack the ‘ndk-build.cmd’ file cannot build automatically without Cygwin, so just download the newest version of the Android NDK. Follow the excellent instructions in this MobilePears article. Your build configuration should look something like this screenshot:

    Set-up Eclipse to Build Native Libraries

  • Refresh the project. That should trigger the build of the libraries. Go have a cup of coffee. It will take some time.
  • Right-click the project and choose Android Tools -> Fix Project Properties. Then right-click -> Properties -> Android and check Is Library.
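
To confirm that the build actually produced something, a small check along these lines may help. It is only a minimal sketch in plain Java, not part of tess-two: the class name `NativeLibCheck` and its two helpers are hypothetical, and it assumes the build writes its output to `libs/<abi>/*.so` under the project root, as the console output quoted in the comments below suggests. The `hasSpaces` helper mirrors the advice above about keeping spaces out of the NDK path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the name and methods are illustrative only.
final class NativeLibCheck {

    /** Returns true when the path contains a space character. */
    static boolean hasSpaces(Path path) {
        return path.toString().contains(" ");
    }

    /**
     * Lists every *.so file directly inside projectRoot/libs/<abi>/, sorted by path.
     * Returns an empty list when there is no libs folder.
     * The libs/<abi> layout is an assumption about the build output.
     */
    static List<Path> findSharedLibraries(Path projectRoot) throws IOException {
        Path libs = projectRoot.resolve("libs");
        if (!Files.isDirectory(libs)) {
            return new ArrayList<>();
        }
        // Depth 2 covers libs/<abi>/<file> without descending any further.
        try (Stream<Path> walk = Files.walk(libs, 2)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".so"))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // First argument: the project folder (defaults to the current folder).
        Path root = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath().normalize();
        // Optional second argument: the folder where the NDK was extracted.
        if (args.length > 1 && hasSpaces(Paths.get(args[1]))) {
            System.out.println("Warning: the NDK path contains spaces: " + args[1]);
        }
        List<Path> libraries = findSharedLibraries(root);
        if (libraries.isEmpty()) {
            System.out.println("No *.so files found under " + root.resolve("libs"));
        } else {
            libraries.forEach(p -> System.out.println("Found " + root.relativize(p)));
        }
    }
}
```

Run it with the tess-two project folder as the first argument and, optionally, the NDK folder as the second. If nothing is listed, the build most likely has not run or has not finished yet.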

Assuming that everything above worked fine, you will have a ready-to-use OCR library in more or less one day. It took an entire week in my case.

My component will have the compiled library ready to deploy with minimal effort. It will be coming soon.

  1. February 20, 2013 at 8:17 am

    Thank you so much, hopefully it runs 🙂

  2. herehong
    November 26, 2012 at 11:21 am

    I tried to follow this tutorial, but I still get an error:

    Could not find tess-two.apk

    What should I do now?

    Thanks

    • November 26, 2012 at 9:35 pm

      This project will not produce any .apk files. It produces Android libraries. Please check on the root of your project for a folder called “libs”. You will find several folders (representing different hardware) containing your libraries (*.so files).
      Tell us if that worked.

      • herehong
        November 26, 2012 at 10:00 pm

        Hi Gabriel ,

        My solution was to right-click the tess-two project -> Properties -> Android -> uncheck Is Library, and it worked for me.

        Thanks again *-*

        I also have a question:
        If I want to translate that string to other languages, what should I do now?

        Sorry for my bad English ._.

      • December 1, 2012 at 11:32 pm

        That should compile, but I’m not sure why you would want a native library as an .apk file.
        About the languages, I have not done any localization, but the place to look is the d.android.com site.

  3. October 8, 2012 at 1:10 am

    Hi,
    I have been trying to NDK-build tess-two for days without success. I tried on Ubuntu 11.04, but it always stopped after some time. Then I found your tutorial. I did exactly what you describe here, but every time it terminates after this:
    “SharedLibrary : libtess.so
    Install : libtess.so => libs/x86/libtess.so”

    It seems to be the same place where it stopped when I did this on Ubuntu. What can I do here? Can you make some suggestions?

    Thanks

    • October 8, 2012 at 10:47 pm

      Hi Buddhika,
      It seems to me that it stopped because it completed its job. The console entry says that the libraries (*.so files) are installed under the libs folder. Check under your libs folder. You should see your compiled libraries there. You did it. Congrats.

      • October 9, 2012 at 2:05 pm

        Hi Gabriel,

        Thanks for the reply, I have successfully built it. Thanks again for your amazing tutorial.
        By the way, one of the requirements in my project is to take a large image (from the camera), then split it and save small chunks of that image to be processed later by the OCR engine.
        For example, take a tabloid-sized auto classifieds newspaper page where each classified is presented in a box with an image. What I want is to extract those boxes from the large picture and save them on the SD card or somewhere. Do you think I can do this with Tesseract alone, or do I need to use libraries like OpenCV? Where should I begin? Your input is much appreciated.

        Thanks,
        Buddhika

      • October 9, 2012 at 9:48 pm

        Great Buddhika, I’m very glad to be of help.
        I’m planning to do some experimentation with image processing as well. But I will stay on the same track as Robert Theis with his tess-two, and use the Leptonica Image Processing Library. It is an actively developed library and has been proven to work on the Android platform. Check the links above in this same post. Give it a try and good luck. But come back and tell us how you are progressing.

  4. elkesiempregana
    October 5, 2012 at 4:30 am

    So am I supposed to be able to build those libraries in Windows? I read that all the tess… stuff only builds under Linux. I have W7 and I downloaded everything but Cygwin. Yes, I’m new to Eclipse and I got lost with projects, workspaces, libraries, builds, components, paths,…
    Sorry for my English.

    • October 6, 2012 at 10:33 pm

      That is correct. You can build Tess-two in Windows without Cygwin, using either Eclipse Indigo or Juno. I tested both. You do need the latest version of the NDK from the Android Development site (link above), and the builds are done automatically for you. Roll up your sleeves and start building things. Good luck.
