CIB ocr technical manual (EN)

7. Properties Text-Recognition

DataFolder
OCRConfigs
OCRRegion
PaddingHorizontal
PaddingVertical
OutputFilename
OutputFormat
OutputText
OutputTextLength
OutputType
RegionTemplate

DataFolder

Property-Name       Data-Type   Type
DataFolder          String      Set

 

This property specifies the path to the Tesseract language package "tessdata".

If the property is empty, the folder "tessdata" is assumed to be located in the current folder.

 

Syntax 

DataFolder=<path> 

default=No input 
(current folder is used) 

 

Example 

DataFolder=C:\Test\Invoice 
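
The fallback rule can be illustrated with a short Python sketch. It is not part of CIB ocr: the function name resolve_tessdata_folder is hypothetical, and the sketch assumes that DataFolder names the folder that contains "tessdata" (the text above does not state this explicitly).

```python
import os

def resolve_tessdata_folder(data_folder: str = "") -> str:
    """Return the folder expected to hold the Tesseract language data.

    Assumption: DataFolder names the folder that contains "tessdata".
    An empty value falls back to the current working folder.
    """
    base = data_folder.strip() or os.getcwd()
    return os.path.join(base, "tessdata")
```

With an empty DataFolder, the sketch returns the "tessdata" folder below the current working folder.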

OCRConfigs

(Available from CIB ocr version 2.3.1 onwards)

Property-Name       Data-Type   Type
OCRConfigs          String      Set

 

This property specifies the names of the Tesseract config files to use.

All Tesseract config files should be located in $(TESSDATA_PREFIX)\tessdata\configs\ 

 

Syntax 

OCRConfigs=config_name1[;config_name2[;config_name3...]] 

 

Examples 

OCRConfigs=hocr 

or 

OCRConfigs=hocr;debug 
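
As a minimal sketch of how such a value maps to files, the following Python function splits an OCRConfigs string and builds the expected path of each config file. The function name config_paths and its tessdata_prefix parameter are hypothetical; only the semicolon syntax and the folder layout are taken from the text above.

```python
import ntpath  # Windows-style path handling on any platform

def config_paths(ocr_configs: str, tessdata_prefix: str) -> list[str]:
    """Split an OCRConfigs value and map each name to its expected file path."""
    # Names are separated by ";"; empty entries are skipped.
    names = [n.strip() for n in ocr_configs.split(";") if n.strip()]
    # Documented location: $(TESSDATA_PREFIX)\tessdata\configs\
    return [ntpath.join(tessdata_prefix, "tessdata", "configs", n) for n in names]
```

For OCRConfigs=hocr;debug the sketch yields two paths, one for "hocr" and one for "debug", both below the configs folder.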

OCRRegion

Property-Name       Data-Type   Type
OCRRegion           String      Set

 

This property specifies a rectangle on a page. 
This rectangle defines the scan area from which text is extracted. All characters outside this scan area are ignored. 
A rectangle is defined by two points (left,top and right,bottom). 
The origin is the top-left corner of the page; the unit is mm. 

 

Syntax 

OCRRegion=<onerectangle> 
<onerectangle>: <left> ";" <top> ";" <right> ";" <bottom> 

default=No input 

The whole page is scanned if no input is set or if the rectangle given by the coordinates is empty. 

 

Example 

OCRRegion=5;5;15;20 
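
The syntax can be illustrated with a small Python sketch that parses an OCRRegion value and converts it to pixel coordinates. The names Rect, parse_ocr_region and mm_to_pixels are hypothetical, as is the dpi parameter, since the manual does not state how the image resolution is obtained. The sketch also assumes that an "empty" rectangle means zero or negative width or height.

```python
from typing import NamedTuple, Optional

class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

def parse_ocr_region(value: str) -> Optional[Rect]:
    """Parse "<left>;<top>;<right>;<bottom>" (mm, origin at top-left).

    Returns None when the whole page should be scanned: no input, or an
    empty rectangle (assumed here: zero or negative width or height).
    """
    if not value.strip():
        return None
    parts = value.split(";")
    if len(parts) != 4:
        raise ValueError("OCRRegion needs exactly four values")
    rect = Rect(*(float(p) for p in parts))
    if rect.right <= rect.left or rect.bottom <= rect.top:
        return None
    return rect

def mm_to_pixels(rect: Rect, dpi: float) -> tuple:
    """Convert a region given in mm to pixel coordinates (25.4 mm per inch)."""
    return tuple(round(v * dpi / 25.4) for v in rect)
```

For the example value 5;5;15;20, the parsed rectangle is 10 mm wide and 15 mm high.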

PaddingHorizontal

Property-Name       Data-Type   Type
PaddingHorizontal   String      Set

 

This property adds horizontal padding to the rectangle determined by "OCRRegion" on a page. 
The main use case is text recognition on a single line. This property extends the OCRRegion in the horizontal direction so that context information in the image is not lost. 

The unit is %. For example, with an OCRRegion width of 100 and a PaddingHorizontal of 10, the region is extended by 10 pixels to the left and 10 pixels to the right. The center of the padded rectangle stays the same as the center of the OCRRegion. 

Syntax 

PaddingHorizontal=<integer_value> 

default=0 

Example 

PaddingHorizontal=10 

PaddingVertical

Property-Name       Data-Type   Type
PaddingVertical     String      Set

 

This property adds vertical padding to the rectangle determined by "OCRRegion" on a page. 
The main use case is text recognition on a single line. This property extends the OCRRegion in the vertical direction so that context information in the image is not lost. 

The unit is %. For example, with an OCRRegion height of 100 and a PaddingVertical of 10, the region is extended by 10 pixels at the top and 10 pixels at the bottom. The center of the padded rectangle stays the same as the center of the OCRRegion. 

Syntax 

PaddingVertical=<integer_value> 

default=0 

 

Example 

PaddingVertical=10 
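
The percentage arithmetic described for PaddingHorizontal and PaddingVertical can be sketched in Python as follows. The function pad_region is hypothetical and only mirrors the documented example; how CIB ocr treats a padded rectangle that extends beyond the page is not described here, so the sketch does no clamping.

```python
def pad_region(left, top, right, bottom, padding_horizontal=0, padding_vertical=0):
    """Extend a rectangle by a percentage of its own width and height.

    Each side grows by size * percent / 100, so the center stays fixed.
    No clamping to the page borders is done (behavior not documented).
    """
    dx = (right - left) * padding_horizontal / 100
    dy = (bottom - top) * padding_vertical / 100
    return (left - dx, top - dy, right + dx, bottom + dy)
```

With a region of width 100 and PaddingHorizontal=10, dx is 10, so the padded region is 120 wide and keeps its center.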

OutputFilename

Property-Name       Data-Type   Type
OutputFilename      String      Set

 

This property specifies the name of the output file. 
The property OutputFilename is optional. If it is empty, OutputTextLength and OutputText are used instead. 
The format of the file is defined by the property OutputFormat. 

 

Syntax 

OutputFilename=<name> 
<name>: name.ext  

default=No input, use of OutputTextLength and OutputText. 

 

Example 

OutputFilename=Rechnung.txt 

OutputFormat

Property-Name       Data-Type   Type
OutputFormat        String      Set

 

This property defines the format of the created output file. 

 

Syntax 

OutputFormat=<format> 
<format>: FormatText | FormatHocr | FormatHocrText 

default=FormatHocr 

 

Example 

OutputFormat=FormatHocr 

OutputText

Property-Name       Data-Type   Type
OutputText          String      Get

 

This property contains the result of the text recognition. 
If it is used, the size of the output buffer must also be determined with the property OutputTextLength. 

 

Syntax of output: 

[textstring] 

 

Example 

This is the recognized text. 

OutputTextLength

Property-Name       Data-Type   Type
OutputTextLength    String      Get

 

This property contains the length of the output result and specifies the required size of the output buffer. 

 

Syntax of output: 

[integer] (string representation)  

 

Example 

1000 

OutputType

Property-Name       Data-Type   Type
OutputType          String      Set

 

This property defines whether output should be in memory or to a file. 
This property is set automatically depending on whether OutputFilename is set. If OutputFilename is set, OutputType=File is set automatically; otherwise OutputType=Memory is set. 

Syntax 

OutputType=<type> 
<type>: File | Memory  

default=File 

 

Example 

OutputType=File 
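
The automatic selection can be summarized in a one-line Python sketch. The function effective_output_type is hypothetical and only restates the rule above; treating a whitespace-only filename as "not set" is an assumption.

```python
def effective_output_type(output_filename: str = "") -> str:
    """Return "File" if OutputFilename is set, otherwise "Memory"."""
    # Assumption: a value consisting only of whitespace counts as not set.
    return "File" if output_filename.strip() else "Memory"
```

With OutputFilename=Rechnung.txt the sketch selects File; with no filename the result is delivered in memory via OutputText and OutputTextLength.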

RegionTemplate

Property-Name       Data-Type   Type
RegionTemplate      String      Set

 

The property RegionTemplate contains the name of the XFDF file in which the OCRRegions are defined. 

 

Syntax 

RegionTemplate=<filename.xfdf> 

default=No input 

 

Example 

RegionTemplate=region.xfdf