Open Text Capture Recognition Engine
Features
- Precise recognition of all common hand-print and machine-print character sets
- Optimal accuracy through excellent image preprocessing
- Processing of forms and documents with different layouts, eg, dropout colour forms, black and white sources, fax images
- Intelligent colour filtering and binarisation of colour or greyscale documents
- Extended binary image processing with Advanced Imaging (ADI): correction of lines, dot shading and inverted print
- Barcode recognition
- Special classifiers facilitate worldwide use
- Processing of different types of forms in one batch – no manual presorting required
- Automatic location and reading of fields using search strings
- Additional improvement of recognition results using customer-specific dictionaries
- Text Layout Analysis (TLA) for creating and outputting text blocks
- Easy-to-use interfaces for system integration
- Scalable depending on throughput requirements
- Tools for the definition of form classification, for testing and analysis of results
- Comprehensive documentation with programming examples for integrating Open Text Capture Recognition Engine into different system environments
- Excellent price/performance ratio
Recognition engine
The RecoStar engine is the recognition application’s core component, containing the algorithms of the two leading international character recognition engines, RecoStar and PSW6120. In Open Text Capture Recognition Engine, both engines are used in parallel; the recognition results are voted on. The combination of different recognition methods and intelligent matching of preliminary results achieves a recognition quality that is not possible using one single recognition method or a conventional voter. The engines can be selected individually for each field so that their specific strengths are taken advantage of. Recognition in the basic RecoStar product only uses the RecoStar engine and does not include voting.
Character sets and classifiers
Hand print: numeric and alphanumeric, upper and lower case. Machine print: numeric and alphanumeric, OCR-A, OCR-B, Farrington 7B, E13B, CMC7.
Depending on the recognition engine specified and the texts to be read, special country-specific classifiers can also be used. These facilitate worldwide use of the engine.
Barcode
The following barcode types are supported:
1-D (Codabar, Code 128, Code 2 of 5, Code 3 of 9, Code 93, EAN 13, EAN 8, Interleaved 2 of 5, Patch Code, PostNet, UCC 128, UPC-A, UPC-E), 2-D (PDF 417)

Barcodes can be searched and read in various Open Text Capture Recognition Engine operating modes. They can also be hidden for subsequent processing.
Voting, classifiers, barcodes
The recognition data gained from parallel use of the integrated engines are optimised using a voting procedure. The combination of different recognition algorithms and intelligent matching of preliminary results achieves an excellent recognition rate of unequalled precision (extremely low error rate).

Voting examples
Read more about this topic in our white paper ‘Improving OCR & ICR Accuracy Through Expert Voting’.
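The voting idea described above can be illustrated with a minimal sketch. It is not the product's voter: the function `vote`, the `(character, confidence)` result format, the `reject_below` threshold and the `?` reject character are all assumptions made for illustration, and the sketch assumes the two engines return results that are already aligned character by character.

```python
from typing import List, Tuple

# Hypothetical result format: one (character, confidence in [0, 1]) pair per position.
Candidate = Tuple[str, float]


def vote(engine_a: List[Candidate], engine_b: List[Candidate],
         reject_below: float = 0.6, reject_char: str = "?") -> str:
    """Combine two aligned per-character results into one string."""
    if len(engine_a) != len(engine_b):
        raise ValueError("this sketch assumes already-aligned results")
    out = []
    for (char_a, conf_a), (char_b, conf_b) in zip(engine_a, engine_b):
        if char_a == char_b:
            # Both engines agree: accept the character.
            out.append(char_a)
            continue
        # Disagreement: take the more confident candidate, but reject the
        # position if even that candidate is below the threshold.
        best_char, best_conf = (char_a, conf_a) if conf_a >= conf_b else (char_b, conf_b)
        out.append(best_char if best_conf >= reject_below else reject_char)
    return "".join(out)


result = vote([("1", 0.9), ("7", 0.5), ("0", 0.4)],
              [("1", 0.8), ("1", 0.95), ("O", 0.5)])
```

Rejecting a position rather than guessing is what keeps the error rate low in a scheme like this: a rejected character can be passed to manual correction, whereas a wrong character goes unnoticed.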
Advanced Forms Handling (AFH)
The following features are provided as part of the RecoStar option Advanced Forms Handling (AFH): coordinate systems, orientation marks, string search, object search, box reading and line removal.
Coordinate system
A document-wide x/y coordinate system can be used as a measurement reference system. RecoStar provides four functions for measuring images/documents and positioning read zones. These functions search the visual information available for a useable coordinate system to reference read zones and images.
Orientation marks
The geometric objects printed on the document (angles, rectangles) are recognised. These are treated as measurement objects and are used to define the measurement reference system.
Line removal
When the function ‘Read from document fields’ is used, line removal is activated automatically.
Box reading
Boxes on forms and documents usually consist of a rectangular shape containing user instructions for entering information, such as characters or other markings. The task of the recognition process is to extract or verify content from boxes while ignoring the accompanying rectangular shape. RecoStar provides this function for single and nested boxes.
Rotation of forms
Once the form type has been determined, the form can be rotated in 90-degree steps and reprocessed without requiring a new definition of the form elements. This means that rotated and non-rotated forms can be processed in the same application with minimal effort.
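Why a rotated form needs no new definition can be shown with a short sketch: a read zone defined once can be mapped arithmetically onto the rotated image. The function `rotate_zone` and the `(x, y, width, height)` zone format with a top-left origin are assumptions for illustration, not the RecoStar interface.

```python
from typing import Tuple

Zone = Tuple[int, int, int, int]  # (x, y, width, height), origin at top-left


def rotate_zone(zone: Zone, page_width: int, page_height: int,
                quarter_turns: int) -> Tuple[Zone, Tuple[int, int]]:
    """Map a read zone onto a page rotated clockwise in 90-degree steps.

    Returns the rotated zone and the rotated page size (width, height).
    """
    x, y, w, h = zone
    for _ in range(quarter_turns % 4):
        # One clockwise quarter turn: the old top edge becomes the right edge,
        # so the new x is measured from the old bottom of the zone.
        x, y, w, h = page_height - (y + h), x, h, w
        page_width, page_height = page_height, page_width
    return (x, y, w, h), (page_width, page_height)


rotated, size = rotate_zone((10, 20, 30, 40), 100, 200, 1)
```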
Advanced Imaging (ADI) binary image processing
Efficient, intelligent image preprocessing can significantly improve recognition performance. Previous versions of Open Text Capture Recognition Engine already provided familiar binary image processing features for forms, such as line removal and dirt removal. Preparing business documents (business letters, invoices, delivery notes etc) requires dependable recognition quality and short processing times, eg, for automated capturing of incoming company mail. As of version 2.6, additional binary image processing functions are available for eliminating dot shading, correcting inverted print and complex line systems, and erasing hole punch markings.
From version 2.6, the binary image processing functions described here are provided as part of the Advanced Imaging (ADI) option. The image is first prepared for subsequent processing steps by eliminating any interfering contours. These may be caused by graphical elements, dirt, dot shading, inverted areas etc.
The following processing functions are used to eliminate interfering contours:
- Remove Shading: Detects small contours caused by low-grade paper or bad printing/scanning quality; these usually appear in large numbers (eg, noise, dot shading)

Eliminating noise

- GraphicLines: Independent of their rotational angle, lines (including broken lines or lines smudged into bordering objects) are recognised as logical objects and erased from the image – this process is accompanied by character reconstruction if necessary. Even small sections, should they be recognisable as elements of a box, can be removed successfully. If box lines are removed from a ‘box’ graphical object, the resulting read data are formatted as a text block

Box analysis

Slope correction, line removal with character reconstruction, layout reproduction for inhomogeneous lines

- InversPrint: Intelligent treatment of inverse text and boxes
- BarCode: Locating, reading and erasing barcodes from the image
- Punching: Removes contours of two or four-hole punch marks, as well as of the three-hole punch marks typically encountered in the US
- PaperArea: If there is a black border, this is removed. The image angle is corrected if necessary
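The idea behind removing small interfering contours can be sketched with a generic connected-component filter. This is a swapped-in textbook technique, not the ADI implementation; the function `remove_small_components`, the `min_pixels` parameter and the list-of-lists image format (1 = black, 0 = white) are assumptions for illustration.

```python
from typing import List


def remove_small_components(image: List[List[int]], min_pixels: int = 3) -> List[List[int]]:
    """Erase black connected components smaller than min_pixels (4-connectivity)."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component, collecting its pixels.
            stack = [(x, y)]
            seen[y][x] = True
            component = []
            while stack:
                cx, cy = stack.pop()
                component.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            # Small components are treated as noise and whitened.
            if len(component) < min_pixels:
                for cx, cy in component:
                    result[cy][cx] = 0
    return result


noisy = [[1, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
cleaned = remove_small_components(noisy, min_pixels=3)
```

A pure size threshold like this would also erase genuine small marks such as full stops or the dots on an "i", so a practical filter needs additional criteria.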
Colour filtering and greyscale image processing
Integrated image preprocessing for greyscale and colour images is provided by the Advanced Imaging (ADI) option. Image preprocessing is performed in three steps: colour filtering, greyscale image enhancement and binary image conversion. Supported standard formats include:
- TIFF (uncompressed), FAX, JPEG
- JFIF
- BMP
- PCX binary
- BO (RecoStar)
If required, in addition to the standard greyscale image extraction methods, digital colour filtering can be used instead of optical filters. A digital filter is capable of eliminating multiple colour backgrounds from a document. Depending on the application, digital colour filtering is defined with a DesignTool. Greyscale images created from digital colour filtering are subsequently processed and optimised for character recognition.
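Digital colour filtering with several dropout colours can be sketched as follows. The function `filter_colours`, the RGB distance test and the `tolerance` parameter are assumptions for illustration and do not describe how the DesignTool defines a filter. Pixels close to a dropout colour are turned white; all other pixels are converted to grey using the widely used ITU-R BT.601 luma weights.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def filter_colours(pixels: Sequence[RGB], dropout_colours: Sequence[RGB],
                   tolerance: int = 40) -> List[int]:
    """Return greyscale values (0 = black, 255 = white) with dropout colours removed."""
    limit = tolerance * tolerance
    grey = []
    for r, g, b in pixels:
        # A pixel is background if it lies within `tolerance` (Euclidean
        # distance in RGB space) of any dropout colour.
        is_background = any(
            (r - dr) ** 2 + (g - dg) ** 2 + (b - db) ** 2 <= limit
            for dr, dg, db in dropout_colours
        )
        if is_background:
            grey.append(255)
        else:
            # Integer form of 0.299 R + 0.587 G + 0.114 B, rounded to nearest.
            grey.append((299 * r + 587 * g + 114 * b + 500) // 1000)
    return grey


grey_values = filter_colours(
    [(250, 10, 10), (0, 0, 0), (255, 255, 255), (0, 0, 128), (10, 200, 20)],
    dropout_colours=[(255, 0, 0), (0, 210, 0)],
)
```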
The binarisation of greyscale images used by Open Text Document Technologies is optimised for character recognition. This is an improvement on the usual binary image conversion methods, which are optimised for processing photos and other graphic images. By analysing surrounding information and assigning dynamic thresholds, the algorithms can isolate the required text for character recognition even if the backgrounds interfere.
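The principle of dynamic thresholds derived from surrounding information can be illustrated with a generic local-mean threshold. This is a standard textbook method named here only as a stand-in; the product's own algorithm is not described in this document. The function `binarise` and the `radius` and `offset` parameters are assumptions for illustration.

```python
from typing import List


def binarise(image: List[List[int]], radius: int = 1, offset: int = 20) -> List[List[int]]:
    """Mark a pixel as ink (1) if it is darker than its local mean by more than offset."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sum the neighbourhood, clipped at the image borders.
            total = count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            # Integer form of: pixel < local mean - offset.
            row.append(1 if image[y][x] * count < total - offset * count else 0)
        result.append(row)
    return result


bright_background = binarise([[200, 200, 140, 200, 200]])
dark_background = binarise([[100, 100, 40, 100, 100]])
```

Both calls isolate the same centre pixel as ink. A single global threshold of 128 would miss the ink on the bright background (140) and classify the whole dark background (100) as ink.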

Example of colour image recognition
Text Layout Analysis
From version 2.6, Open Text Capture Recognition Engine provides Text Layout Analysis (TLA) as part of the Advanced Imaging (ADI) option. In Text Layout Analysis, read data are analysed in relation to their document position and subsequently compiled into representative text blocks.

Compilation of text blocks and lines
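Compiling read data into lines and text blocks from their positions can be sketched as below. The function `build_text_blocks`, the `(text, x, y, width, height)` word format and both tolerance parameters are assumptions for illustration, not the TLA interface. The sketch handles a single column only.

```python
from typing import List, Tuple

Word = Tuple[str, int, int, int, int]  # (text, x, y, width, height), origin at top-left


def build_text_blocks(words: List[Word], line_tolerance: int = 5,
                      block_gap: int = 20) -> List[List[str]]:
    """Group words into lines by vertical position, then lines into blocks by gap."""
    lines = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        _, _, y, _, h = word
        if lines and abs(y - lines[-1]["y"]) <= line_tolerance:
            # Close enough to the current line's top edge: same line.
            lines[-1]["words"].append(word)
            lines[-1]["bottom"] = max(lines[-1]["bottom"], y + h)
        else:
            lines.append({"y": y, "bottom": y + h, "words": [word]})

    blocks: List[List[str]] = []
    previous_bottom = None
    for line in lines:
        # Within a line, words are ordered left to right.
        text = " ".join(w[0] for w in sorted(line["words"], key=lambda w: w[1]))
        if previous_bottom is None or line["y"] - previous_bottom > block_gap:
            blocks.append([text])  # a large vertical gap starts a new block
        else:
            blocks[-1].append(text)
        previous_bottom = line["bottom"]
    return blocks


blocks = build_text_blocks([
    ("Road", 60, 12, 40, 10),
    ("Main", 10, 10, 40, 10),
    ("Springfield", 10, 25, 90, 10),
    ("Invoice", 10, 100, 60, 10),
])
```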

