The Atalasoft OCR engine

Introduction

At Atalasoft, we’re excited to unveil the newest addition to our product line, Atalasoft OCR. This suite of objects, now available, provides an interface to OCR engines that makes integration into your .NET application a snap.

In the classes provided, we offer the best of all possible worlds: a multilayered approach to exposing engine capabilities that gets you up and running quickly, yet also lets you get down to the nitty-gritty details that are most important to you.

When you use the Atalasoft OCR engine in its most basic way, most of the work is in managing the user interface, not the OCR engine.

The following snippet of C# code demonstrates how to convert a set of image files into a single plain text file.

    static void Main(string[] args)
    {
        // create and initialize the engine
        ExperVisionEngine engine = new ExperVisionEngine(null, null);
        engine.Initialize();

        // select a file or set of files
        OpenImageFileDialog openDialog = new OpenImageFileDialog();
        openDialog.Multiselect = true;
        if (openDialog.ShowDialog() == DialogResult.OK)
        {
            SaveFileDialog saveFileDialog = new SaveFileDialog();
            saveFileDialog.Filter = "Text (*.txt)|*.txt";
            if (saveFileDialog.ShowDialog() != DialogResult.OK)
                return;
            try
            {
                // translate into a plain text file
                engine.Translate(
                    new FileSystemImageSource(openDialog.FileNames, true),
                    "text/plain", saveFileDialog.FileName);
            }
            catch (OcrException err)
            {
                System.Console.WriteLine("Error in OCR: " + err.Message);
            }
        }
        engine.ShutDown();
    }

As you can see, the interface is simple. You may also notice that the main use of the engine is the Translate method, which takes a set of images and writes the results to a file (or stream), using the given MIME type as the output format. Because output file types are described with the MIME standard, it is easy to ask the engine which output types it supports, as well as to augment or replace them!

The OcrEngine maintains a collection of objects that implement an interface called ITranslator. When you request that a set of images be translated to an output file format, the engine selects a translator that matches the MIME type.

If your task requires you to generate output in a particular format, it is short work to create your own object to translate the recognized text and images into the format that you need. You can add your new translator to the engine’s translator collection, or remove existing translators from it, as you see fit. You can even bypass the translator selection process entirely and simply supply the translator that you want to use.
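To make the selection mechanism concrete, here is a minimal, self-contained sketch of looking up a translator by MIME type. It deliberately does not use the Atalasoft classes: ISketchTranslator, PlainTextSketchTranslator, and SketchTranslatorCollection are hypothetical stand-ins invented for this illustration, and the real ITranslator interface and engine collection may expose different members.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for a translator. This is NOT the Atalasoft
// ITranslator interface; its members are assumptions for illustration.
public interface ISketchTranslator
{
    string MimeType { get; }
    void Write(IEnumerable<string> recognizedPages, TextWriter output);
}

// Writes each recognized page as plain text, separated by a form feed.
public sealed class PlainTextSketchTranslator : ISketchTranslator
{
    public string MimeType { get { return "text/plain"; } }

    public void Write(IEnumerable<string> recognizedPages, TextWriter output)
    {
        output.Write(string.Join("\f", recognizedPages));
    }
}

// Hypothetical collection that selects a translator by MIME type.
public sealed class SketchTranslatorCollection
{
    private readonly List<ISketchTranslator> translators =
        new List<ISketchTranslator>();

    public void Add(ISketchTranslator translator)
    {
        translators.Add(translator);
    }

    // Removes every translator registered for the MIME type.
    public bool Remove(string mimeType)
    {
        return translators.RemoveAll(t => SameType(t.MimeType, mimeType)) > 0;
    }

    public IEnumerable<string> SupportedMimeTypes
    {
        get { return translators.Select(t => t.MimeType).Distinct(); }
    }

    // The most recently added match wins, so adding a translator for an
    // existing MIME type effectively replaces the earlier one.
    public ISketchTranslator Find(string mimeType)
    {
        return translators.LastOrDefault(t => SameType(t.MimeType, mimeType));
    }

    // MIME type names are compared case-insensitively.
    private static bool SameType(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}

public static class TranslatorDemo
{
    // Translates two already-recognized pages and returns the output text.
    public static string Run()
    {
        var collection = new SketchTranslatorCollection();
        collection.Add(new PlainTextSketchTranslator());

        ISketchTranslator translator = collection.Find("TEXT/PLAIN");
        var writer = new StringWriter();
        translator.Write(new[] { "page one", "page two" }, writer);
        return writer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

In this sketch, Find returns null when no translator matches, so a caller can check for an unsupported output type before starting a long recognition job.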

OCR Engine Events

Through the familiar .NET event mechanism, you can get hooked into every step of document processing, allowing you to finely control how your images are handled. For example, you can request notification during the stage when an image is preprocessed to make it more palatable for the OCR engine, letting you alter what the engine will use for recognition.

In the following C# code snippet, you can see how to hook in your own code to do image preprocessing:

    static void Main(string[] args)
    {
        // create and initialize the engine
        ExperVisionEngine engine = new ExperVisionEngine(null, null);
        engine.Initialize();

        engine.PagePreprocessing +=
            new OcrPagePreprocessingEventHandler(engine_PagePreprocessing);
    }

    private static void engine_PagePreprocessing(
        object sender, OcrPagePreprocessingEventArgs e)
    {
        // override all options
        e.OptionsOut = 0;

        AtalaImage imageBW;
        // convert to black and white, if needed
        if (e.ImageIn.PixelFormat != PixelFormat.Pixel1bppIndexed)
            imageBW = e.ImageIn.GetChangedPixelFormat(
                                     PixelFormat.Pixel1bppIndexed);
        else
            imageBW = e.ImageIn;

        // Deskew the image
        AutoDeskewCommand deskew = new AutoDeskewCommand();
        AtalaImage imageDeskewed = deskew.ApplyToImage(imageBW);
        if (imageBW != imageDeskewed && imageBW != e.ImageIn)
            imageBW.Dispose();

        // Hand back to the engine
        e.ImageOut = imageDeskewed;
    }

As you can see, the amount of work to get hooked in is small, letting you concentrate on the task: processing the image in the way that you want.

The Atalasoft OCR objects let you hook into image processing, image segmentation, and output page construction. There are also events to let you track progress of the engine on a page as well as throughout an entire document. This lets you show your users what they need to know.
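The progress events themselves are not shown in this article, so the following is only a sketch of the general pattern using the standard .NET EventHandler<T> mechanism. SketchDocumentProcessor, SketchProgressEventArgs, and the DocumentProgress event are hypothetical names invented for illustration; the actual Atalasoft event names and argument types may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event arguments; not an Atalasoft type.
public sealed class SketchProgressEventArgs : EventArgs
{
    public SketchProgressEventArgs(int pageIndex, int pageCount)
    {
        PageIndex = pageIndex;
        PageCount = pageCount;
    }

    public int PageIndex { get; private set; }   // zero-based page just finished
    public int PageCount { get; private set; }   // total pages in the document
    public bool Cancel { get; set; }             // a handler may request a stop

    public int PercentComplete
    {
        get { return PageCount == 0 ? 100 : (PageIndex + 1) * 100 / PageCount; }
    }
}

// Hypothetical processor that reports progress after each page.
public sealed class SketchDocumentProcessor
{
    public event EventHandler<SketchProgressEventArgs> DocumentProgress;

    // Returns the number of pages processed before finishing or cancelling.
    public int Process(IList<string> pages)
    {
        int processed = 0;
        for (int i = 0; i < pages.Count; i++)
        {
            // Recognition of pages[i] would happen here in a real engine.
            processed++;

            var args = new SketchProgressEventArgs(i, pages.Count);
            EventHandler<SketchProgressEventArgs> handler = DocumentProgress;
            if (handler != null)
                handler(this, args);
            if (args.Cancel)
                break;
        }
        return processed;
    }
}

public static class ProgressDemo
{
    public static void Main()
    {
        var processor = new SketchDocumentProcessor();
        processor.DocumentProgress += (sender, e) =>
            Console.WriteLine("Page {0} of {1} ({2}%)",
                e.PageIndex + 1, e.PageCount, e.PercentComplete);

        processor.Process(new[] { "a.tif", "b.tif", "c.tif" });
    }
}
```

A handler like this is where a progress bar update or a Cancel button check would typically live, keeping user interface code out of the recognition loop.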

Contact Atalasoft directly for more details, or download a 30-day trial of our OCR engine today.
