I couldn't find anything on Google, so I'm asking: do you know of a free, open-source OCR library for C#?
Edit: I need a C# option, without having to write wrappers around C++ or anything similar. The Google library's website doesn't work; I'm not sure whether it has ceased to exist or is just unavailable, which is very uncommon for Google websites.
You can use Tesseract OCR in C# by following the instructions given in this question [1].
This blog post [2] might also be of interest; it seems they got the source to compile on Windows.
[1] http://stackoverflow.com/questions/30328/ocr-with-the-tesseract-interface
You can use the Office 2007 OCR engine. See this MSDN reference [1] for more information and sample code.
[1] http://msdn.microsoft.com/en-us/library/aa202819%28office.11%29.aspx
Google's own: http://code.google.com/p/tesseract-ocr/
Check out the Microsoft Research project "OCR in the Cloud". Example C# code is provided for using it, although it runs on Windows Phone 7. I've deployed it to my phone and it works a treat. http://research.microsoft.com/en-us/um/redmond/projects/hawaii/students/default.aspx
Have a look at OCRopus [1]:
OCRopus is a [...] document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.
OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
(Quote from http://code.google.com/p/ocropus/)
OCRopus only provides C++ interfaces, though, so you would have to write your own C# wrapper classes.
[1] http://en.wikipedia.org/wiki/OCRopus
Try tessnet2 [1]:
Tesseract is a C++ open-source OCR engine. Tessnet2 is a .NET assembly that exposes very simple methods to do OCR.
[1] http://www.pixel-technology.com/freeware/tessnet2/
Unfortunately, tessnet2 no longer seems to be maintained: the last version targets Tesseract 2.x, while Google is at 3.x.
Here [1] and there you can find discussions about a potential tessnet3, but the main developer seems to be in no hurry.
For a .NET wrapper for newer versions, look here [2].
[1] http://groups.google.com/group/tesseract-dev/browse_thread/thread/431c8075af25f5aa