CIB ocr technical manual (EN)
7. Properties Text-Recognition
OCRConfigs
OCRRegion
PaddingHorizontal
PaddingVertical
OutputFilename
OutputFormat
OutputText
OutputTextLength
OutputType
RegionTemplate
DataFolder
Property-Name | Data-Type | Type |
DataFolder | String | Set |
This property specifies the path to the Tesseract language package "tessdata".
If the property is empty, the folder "tessdata" is assumed to be located in the current folder.
Syntax
DataFolder=<path>
default=No input
(current folder is used)
Example
DataFolder=C:\Test\Invoice
OCRConfigs
(From CIB ocr version 2.3.1)
Property-Name | Data-Type | Type |
OCRConfigs | String | Set |
This property specifies the names of the Tesseract config files to use.
All Tesseract config files should be located in $(TESSDATA_PREFIX)\tessdata\configs\
Syntax
OCRConfigs=config_name1[;config_name2[;config_name3...]]
Examples
OCRConfigs=hocr
or
OCRConfigs=hocr;debug
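As a small illustration of the syntax above, the following Python sketch assembles an OCRConfigs line from a list of config names. The helper name build_ocr_configs is hypothetical and not part of CIB ocr; it only joins the names with semicolons and does not check that the config files exist.

```python
def build_ocr_configs(config_names):
    """Return an 'OCRConfigs=...' line for the given Tesseract config names.

    Hypothetical helper for illustration only; not part of CIB ocr.
    """
    # Drop empty entries and surrounding whitespace.
    names = [name.strip() for name in config_names if name.strip()]
    for name in names:
        # ';' is the separator, so it cannot appear inside a name.
        if ";" in name:
            raise ValueError(f"config name must not contain ';': {name!r}")
    return "OCRConfigs=" + ";".join(names)


print(build_ocr_configs(["hocr"]))           # OCRConfigs=hocr
print(build_ocr_configs(["hocr", "debug"]))  # OCRConfigs=hocr;debug
```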
OCRRegion
Property-Name | Data-Type | Type |
OCRRegion | String | Set |
This property specifies a rectangle on a page.
The rectangle defines the scan area from which text is extracted; all characters outside this scan area are ignored.
The rectangle is defined by two points: the top-left corner (left, top) and the bottom-right corner (right, bottom).
The origin is the top-left corner of the page; the unit is mm.
Syntax
OCRRegion=<onerectangle>
<onerectangle>: <left> ";" <top> ";" <right> ";" <bottom>
default=No input
The whole page is scanned if no input is set or if the rectangle given by the coordinates is empty.
Example
OCRRegion=5;5;15;20
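The following Python sketch shows one way to read an OCRRegion value and convert it from mm to pixels. It is not taken from CIB ocr: the names Rect, parse_ocr_region and mm_to_pixels are hypothetical, the dpi parameter is an assumption (the manual does not state how mm are mapped to image pixels), and treating a rectangle with zero or negative width or height as "empty" is also an assumption.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def parse_ocr_region(value: str) -> Optional[Rect]:
    """Parse '<left>;<top>;<right>;<bottom>' (in mm) into a Rect.

    Returns None for an empty value or an empty rectangle, meaning
    "scan the whole page". Hypothetical helper, not part of CIB ocr.
    """
    value = value.strip()
    if not value:
        return None
    parts = value.split(";")
    if len(parts) != 4:
        raise ValueError(f"expected 4 values separated by ';', got {len(parts)}")
    rect = Rect(*(float(p) for p in parts))
    # Assumption: "empty" means no positive width or height.
    if rect.right <= rect.left or rect.bottom <= rect.top:
        return None
    return rect


def mm_to_pixels(rect: Rect, dpi: float) -> tuple:
    """Convert a rectangle in mm to pixel coordinates (25.4 mm per inch)."""
    scale = dpi / 25.4
    return tuple(round(v * scale) for v in rect)


region = parse_ocr_region("5;5;15;20")
print(region)
print(mm_to_pixels(region, dpi=300))  # (59, 59, 177, 236)
```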
PaddingHorizontal
Property-Name | Data-Type | Type |
PaddingHorizontal | String | Set |
This property adds horizontal padding to the rectangle defined by “OCRRegion” on a page.
The main use case is text recognition on a single line. This property extends the OCRRegion in the horizontal direction so that context information in the image is not lost.
The unit is %. For example, with an OCRRegion width of 100 pixels and PaddingHorizontal=10, the region is extended by 10 pixels on the left and 10 pixels on the right. The center of the padded rectangle is the same as the center of the OCRRegion.
Syntax
PaddingHorizontal=<integer_value>
default=0
Example
PaddingHorizontal=10
PaddingVertical
Property-Name | Data-Type | Type |
PaddingVertical | String | Set |
This property adds vertical padding to the rectangle defined by “OCRRegion” on a page.
The main use case is text recognition on a single line. This property extends the OCRRegion in the vertical direction so that context information in the image is not lost.
The unit is %. For example, with an OCRRegion height of 100 pixels and PaddingVertical=10, the region is extended by 10 pixels at the top and 10 pixels at the bottom. The center of the padded rectangle is the same as the center of the OCRRegion.
Syntax
PaddingVertical=<integer_value>
default=0
Example
PaddingVertical=10
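The padding arithmetic for PaddingHorizontal and PaddingVertical can be sketched in Python as follows. This is only an interpretation of the example in the text (the percentage of the width or height is added on each side); the function name pad_region is hypothetical, and whether CIB ocr clips the padded rectangle at the page border is not stated in this manual, so no clipping is done here.

```python
def pad_region(left, top, right, bottom, padding_horizontal=0, padding_vertical=0):
    """Extend a rectangle by a percentage of its width/height on each side.

    Hypothetical helper for illustration only; not part of CIB ocr.
    """
    # Padding in % of the region width, applied to the left and the right.
    dx = (right - left) * padding_horizontal / 100
    # Padding in % of the region height, applied to the top and the bottom.
    dy = (bottom - top) * padding_vertical / 100
    return (left - dx, top - dy, right + dx, bottom + dy)


# A 100 x 100 region with 10 % padding in both directions grows by
# 10 units on every side; its center (100, 100) does not move.
print(pad_region(50, 50, 150, 150, padding_horizontal=10, padding_vertical=10))
# (40.0, 40.0, 160.0, 160.0)
```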
OutputFilename
Property-Name | Data-Type | Type |
OutputFilename | String | Set |
This property specifies the name of the output file.
The property OutputFilename is optional. If it is empty, OutputTextLength and OutputText are used instead.
The format/extension is determined by the property OutputFormat, which is described next.
Syntax
OutputFilename=<name>
<name>: name.ext
default=No input
(OutputTextLength and OutputText are used)
Example
OutputFilename=Rechnung.txt
OutputFormat
Property-Name | Data-Type | Type |
OutputFormat | String | Set |
This property defines the format of the created output file.
Syntax
OutputFormat=<format>
<format>: FormatText | FormatHocr | FormatHocrText
default=FormatHocr
Example
OutputFormat=FormatHocr
OutputText
Property-Name | Data-Type | Type |
OutputText | String | Get |
This property contains the result of the text recognition.
If it is used, the size of the output buffer must also be determined with the property OutputTextLength.
Syntax of output:
[textstring]
Example
This is the recognized text.
OutputTextLength
Property-Name | Data-Type | Type |
OutputTextLength | String | Get |
This property contains the length of the output result and specifies the required size of the output buffer.
Syntax of output:
[integer] (string representation)
Example
1000
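The interplay of OutputTextLength and OutputText can be sketched as a two-step read: first query the length, then use it as the buffer size for the text. The Python code below is only an illustration under assumptions. The callable get_property(name, buffer_size) is hypothetical and stands in for whatever property getter the calling application provides; the fake property store exists only to make the sketch runnable. Whether the reported length includes a terminating character is not stated here.

```python
def read_output_text(get_property):
    """Read the recognition result via OutputTextLength and OutputText.

    `get_property` is a hypothetical callable (name, buffer_size) -> str;
    it is not an actual CIB ocr function.
    """
    # OutputTextLength is delivered as the string representation of an integer.
    length = int(get_property("OutputTextLength", 32))
    # Use that length as the required size of the output buffer.
    return get_property("OutputText", length)


# Stand-in for a finished OCR job, only to make the sketch runnable.
_fake_result = "This is the recognized text."
_fake_properties = {
    "OutputTextLength": str(len(_fake_result)),
    "OutputText": _fake_result,
}


def fake_get_property(name, buffer_size):
    # Truncate to the buffer size, as a fixed-size buffer would.
    return _fake_properties[name][:buffer_size]


print(read_output_text(fake_get_property))  # This is the recognized text.
```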
OutputType
Property-Name | Data-Type | Type |
OutputType | String | Set |
This property defines whether the output is written to memory or to a file.
This property is set automatically depending on whether OutputFilename is set. If OutputFilename is set, OutputType=File is set; otherwise, OutputType=Memory is set.
Syntax
OutputType=<type>
<type>: File | Memory
default=File
Example
OutputType=File
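The automatic selection described above amounts to a simple rule, sketched here in Python. The function name effective_output_type is hypothetical; it only restates the rule from the text and is not CIB ocr code.

```python
def effective_output_type(output_filename):
    """Return 'File' if an output filename is set, otherwise 'Memory'.

    Hypothetical helper that restates the rule from the manual.
    """
    return "File" if output_filename else "Memory"


print(effective_output_type("Rechnung.txt"))  # File
print(effective_output_type(""))              # Memory
```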
RegionTemplate
Property-Name | Data-Type | Type |
RegionTemplate | String | Set |
The property RegionTemplate contains the name of the XFDF file in which the OCRRegions are defined.
Syntax
RegionTemplate=<filename.xfdf>
default=No input
Example
RegionTemplate=region.xfdf