Stack Overflow: Any open source C# OCR library?
[+76] [7] Skuta
[2009-04-13 18:52:21]
[ c# open-source ocr ]
[ http://stackoverflow.com/questions/744966/any-open-source-c-ocr-library ] [DELETED]

I couldn't find anything on Google, so I'm asking: do you know of any free, open source OCR library for C#?

Edit: I need a C# option, without having to write wrappers for C++ or similar. The Google library's website doesn't work; I'm not sure whether it has ceased to exist or is just unavailable, which is very uncommon for Google websites.

The Google Code pages are back. I guess they sometimes also have to undergo some maintenance. And obviously Google Code is not run with the same availability as google.com - 0xA3
[+34] [2009-04-13 18:58:33] Benoit [ACCEPTED]

You can use Tesseract OCR in C# by following the instructions given in this question [1].

This blog post [2] might also be of interest. It seems they got the source to compile on Windows.

[1] http://stackoverflow.com/questions/30328/ocr-with-the-tesseract-interface
[2] http://maniish.wordpress.com/2007/03/03/tesseract-ocr-library-successfully-compiled-in-window/

(11) Or you can go directly to tessnet wrapper site: pixel-technology.com/freeware/tessnet2 - Marc Climent
EMGU now contains a wrapper for tesseract. See: stackoverflow.com/a/18070183/852208 - b_levitt
1
[+16] [2009-04-13 18:57:22] Srikar Doddi

You can work with the Office 2007 OCR engine. Look at this MSDN reference [1] for more information and sample code.

[1] http://msdn.microsoft.com/en-us/library/aa202819%28office.11%29.aspx

This solution is not open-source. - Benoit
Not open source, but it's whatever .NET has for now: codeproject.com/KB/office/OCRSampleApplication.aspx - Srikar Doddi
(6) While not open source, I have used this library successfully in C# apps and it worked well. +1 - Simucal
2
[+16] [2009-04-13 18:58:20] an0nym0usc0ward
can't load the website - Skuta
(1) Works fine for me. - Alex Fort
It doesn't look like a library for C#. - Skuta
(13) You should just stick to answers to the question. Not C# = not an answer. "You should just write your own" - FerretallicA
(1) It has a .NET compatible library, so it can be called from C#. - AaronLS
3
[+5] [2011-05-26 06:26:49] Simon Card

Check out the Microsoft Research project "OCR in the Cloud". There's example C# code provided for using it, although it runs on Windows Phone 7. I've deployed it to my phone and it works a treat. http://research.microsoft.com/en-us/um/redmond/projects/hawaii/students/default.aspx


4
[+4] [2009-04-13 19:03:45] 0xA3

Have a look at OCRopus [1]:

OCRopus is a [...] document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

(Quote from http://code.google.com/p/ocropus/)

OCRopus comes with C++ interfaces, though, so you would have to write your own wrapper classes to call it from C#.

[1] http://en.wikipedia.org/wiki/OCRopus
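
To give an idea of what such a wrapper class could look like, here is a minimal P/Invoke sketch. It assumes you first build a small native shim DLL that exposes the C++ engine through a plain C function. The names `ocr_shim.dll` and `recognize_file`, the buffer size, and the status-code convention are all hypothetical; they are not part of OCRopus or any real library.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeOcr
{
    // Hypothetical native export, assumed to be declared in the shim as:
    //   extern "C" int recognize_file(const char* path, char* out, int capacity);
    // and assumed to return 0 on success.
    [DllImport("ocr_shim.dll",
               CallingConvention = CallingConvention.Cdecl,
               CharSet = CharSet.Ansi)]
    private static extern int recognize_file(
        string imagePath, StringBuilder output, int capacity);

    // Managed entry point: hides the buffer handling and turns
    // native status codes into exceptions.
    public static string Recognize(string imagePath)
    {
        // Caller-allocated buffer that the native side fills with text.
        StringBuilder buffer = new StringBuilder(64 * 1024);
        int status = recognize_file(imagePath, buffer, buffer.Capacity);
        if (status != 0)
        {
            throw new InvalidOperationException(
                "Native OCR call failed with status " + status);
        }
        return buffer.ToString();
    }
}
```

With something like this in place, the rest of the application would only call `NativeOcr.Recognize("page.png")` and never touch the native side directly.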

(1) I have no idea how to write a wrapper class, and I'd like something that works out of the box =/ - Skuta
Writing a wrapper is actually not too difficult. If you are not familiar with it either try to find someone who can do it for you or ask for help here. - 0xA3
5
[+3] [2011-02-16 16:46:55] mellin

Try tessnet2 [1]:

Tesseract is a C++ open source OCR engine. Tessnet2 is a .NET assembly that exposes very simple methods to do OCR.

[1] http://www.pixel-technology.com/freeware/tessnet2/
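
For reference, a minimal usage sketch along the lines of the sample on the tessnet2 site. It assumes the project references the tessnet2 assembly and System.Drawing, and that the Tesseract 2.x English language files sit in a `tessdata` folder next to the executable. The class name and the folder location are my own choices, and the `Init` overload may differ between tessnet2 versions.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class Tessnet2Demo
{
    static void Main(string[] args)
    {
        // args[0]: path to the image to recognize.
        using (Bitmap image = new Bitmap(args[0]))
        {
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Assumed overload: data folder, language code, numeric-only mode.
            ocr.Init("tessdata", "eng", false);

            // Rectangle.Empty asks the engine to process the whole image.
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);

            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```

Each `Word` carries the recognized text together with a confidence value, so you can filter out doubtful results before using them.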

6
[+3] [2012-03-26 15:01:23] Baboon

Unfortunately, tessnet2 does not seem to be maintained anymore: the last version targets Tesseract 2.x, while Google is at 3.x.
Here [1] and there you can find talk of a potential tessnet3, but the main developer is in no hurry.

For a .NET wrapper around newer versions, look here [2].

[1] http://groups.google.com/group/tesseract-dev/browse_thread/thread/431c8075af25f5aa
[2] http://code.google.com/p/tesseractdotnet/

7