With the growing trend toward mobile and tablet applications, many organizations
are building a mobile version of their app or website. If your application
involves image processing, one of the most common requirements is optical
character recognition (OCR). With Windows 8 being new, you may well need to
implement OCR in a Windows 8 app.
LEADTOOLS helps us achieve this in a simple way. It is a free SDK available for
Windows 8. Any Windows Store application can easily be created with the help of
the LEAD Technologies imaging SDKs. The LEADTOOLS Optical Character Recognition
SDK provides native WinRT libraries that can run on any mobile, tablet or desktop
device. They can help you convert images to PDF, plain text and other formats, or
implement your own custom logic.
Features of LEADTOOLS WinRT
· Native WinRT binaries for Win32, x64 and ARM
· Develop Windows Store applications that target any Windows 8 desktop, tablet or mobile device
· Image viewer controls designed specifically for WinRT and Windows Store apps
  o Compatible with Expression Blend
  o Supports both mouse and multi-touch gesture input
  o Built-in interactive modes such as pan, scale, pinch and zoom, magnifying glass and more
  o Automatically scale images to fit, fit width and stretch to the control size
· Load, convert and save more than 150 image formats
· Advanced bit depth, color space and compression support for common formats including PDF, PDF/A, JPEG, JPEG 2000, TIFF, JBIG2 and more
OCR Features for WinRT
· Fast, accurate and reliable optical character recognition for use in any application or environment
· Choose from several built-in and custom dictionaries to improve OCR results
· Recognize text from over 30 languages and character sets including English, Spanish, French, German, Japanese, Chinese, Arabic and more
· Automatically detect the document's language
· Full page analysis and zonal recognition
· Unique color and bitonal image recognition
· Automated document image cleanup
Functional Code
In this example we let the user load an image, recognize the characters in it,
and return the recognized text.
Step 1 Create a Windows 8 Metro (Windows Store) app and add references to the
LEADTOOLS binaries for Windows 8.
Step 2 Initialize the LEADTOOLS OCR engine and prepare a document.
// Create an instance of the engine
string strEngineDirectory = Path.Combine(Windows.ApplicationModel.Package.Current.InstalledLocation.Path, @"OCR");
_ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
_ocrEngine.Startup(null, null, string.Empty, strEngineDirectory);
// Create the OCR document
_ocrDocument = _ocrEngine.DocumentManager.CreateDocument();
Step 3 Load an image and add it as a page to our document.
// Show the file picker
var picker = new FileOpenPicker();
picker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
picker.ViewMode = PickerViewMode.List;
foreach (var imageFormat in _imageFormats)
    picker.FileTypeFilter.Add(imageFormat.Extension);
var file = await picker.PickSingleFileAsync();
if (file == null)
    return;
// Create a LEADTOOLS stream from the file
ILeadStream leadStream = LeadStreamFactory.Create(file);
// Get the RasterCodecs object to load the image from the OCR engine
RasterCodecs codecs = _ocrEngine.RasterCodecsInstance;
// Load the image (first page only)
RasterImage rasterImage = await codecs.LoadAsync(leadStream, 0, CodecsLoadByteOrder.BgrOrGray, 1, 1);
// Remove any pages left over from a previous image
_ocrDocument.Pages.Clear();
// Add the loaded image to the OCR document as a new page
_ocrPage = _ocrDocument.Pages.AddPage(rasterImage, null);
Snapshot displaying image Loaded
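The loop in Step 3 iterates over _imageFormats, which the snippet above does not declare. Below is a minimal sketch of one way it could be defined, not necessarily how the original project does it. The ImageFormat class, the ImageFormats.Supported list and the chosen extensions are all hypothetical names and values introduced here for illustration; the only thing the Step 3 code relies on is an Extension property. As far as I know, FileOpenPicker.FileTypeFilter expects each extension to include the leading dot, so the sketch checks for that.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type (not part of LEADTOOLS or WinRT): the Step 3
// snippet only needs an Extension property on each element.
public sealed class ImageFormat
{
    public string Name { get; private set; }
    public string Extension { get; private set; }

    public ImageFormat(string name, string extension)
    {
        // The file picker filter takes extensions with a leading dot, e.g. ".jpg".
        if (string.IsNullOrEmpty(extension) || extension[0] != '.')
            throw new ArgumentException("Extension must start with '.'", "extension");
        Name = name;
        Extension = extension;
    }
}

public static class ImageFormats
{
    // Assumed list of formats; adjust it to whatever your app should accept.
    public static readonly IReadOnlyList<ImageFormat> Supported = new List<ImageFormat>
    {
        new ImageFormat("JPEG", ".jpg"),
        new ImageFormat("PNG", ".png"),
        new ImageFormat("TIFF", ".tif"),
        new ImageFormat("Bitmap", ".bmp"),
    };
}

public static class Program
{
    // Small console driver so the sketch can be tried outside a Store app.
    public static void Main()
    {
        foreach (var format in ImageFormats.Supported)
            Console.WriteLine(format.Name + ": " + format.Extension);
    }
}
```

In the page class, _imageFormats could then simply be assigned from ImageFormats.Supported before the picker is shown.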
Step 4 OCR to Text
Converting an image to raw text is very easy with the RecognizeText method, which returns the results as a string.
// Auto-zone the page
_ocrPage.AutoZone(null);
// Recognize the page and get the results as text
TextResults.Text = _ocrPage.RecognizeText(null);
Snapshot displaying image to text conversion
I hope you find this useful.
For any queries, you can contact me at saurabhpahuja@yahoo.co.in
Warm Regards,
Saurabh Pahuja