CIB ocr technical manual (EN)
5. Use case: Calling CIB ocr via CIB job/CIB documentServer
(From CIB ocr version 2.3.0 and CIB job version 1.8.0)
To use CIB ocr via CIB job or CIB documentServer, a corresponding XML job definition has to be written. The same XML can be used as a CIB documentServer request. The following XML example uses CIB ocr to extract text from an image file with the Tesseract library.
In the load step, the input file is loaded into memory.
In the ocr step, a valid license has to be set first (LicenseCompany and LicenseKey). OCRLibraryName selects “Tesseract” as the OCR library, and OutputFormat is set to “FormatHocr”. The text to be extracted is in German (OCRLanguage = deu).
In the save step, the in-memory output is written to a file.
Example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
  <Comod>
    <defaults>
      <properties command="job">
        <property name="OutputMode">XML</property>
        <property name="UseInMemoryProcessing">1</property>
      </properties>
    </defaults>
    <jobs>
      <job name="tesseract" expected-result-code="404">
        <steps>
          <step name="LoadStep" command="load">
            <properties>
              <property name="InputFilename">./input/input.png</property>
            </properties>
          </step>
          <step name="OcrStep" expected-result-code="1000" command="ocr">
            <properties>
              <property name="LicenseCompany">CustomerLicensee</property>
              <property name="LicenseKey">4444-cccc-88888888</property>
              <property name="OCRLibraryName">Tesseract</property>
              <property name="DataFolder">.</property>
              <property name="OutputFormat">FormatHocr</property>
              <property name="TraceFilename">OCR_trace.log</property>
              <property name="OCRLanguage">deu</property>
              <property name="TracePreprocessOutput">0</property>
            </properties>
          </step>
          <step name="SaveStep" expected-result-code="0" command="save">
            <properties>
              <property name="OutputFilename">./ocr_out.html</property>
            </properties>
          </step>
        </steps>
      </job>
    </jobs>
  </Comod>
</root>
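When the same job has to be created for many input files, the XML can be generated instead of written by hand. The following is a minimal sketch in Python, not part of the CIB products: the helper names add_properties and build_ocr_job are hypothetical, and only the element, attribute and property names shown in the example above are used. The expected-result-code attributes and the optional trace properties are left out here, because this section does not describe their meaning.

```python
import xml.etree.ElementTree as ET


def add_properties(parent, values, command=None):
    """Append a <properties> element with one <property> child per entry.

    Hypothetical helper for illustration; not a CIB API.
    """
    props = ET.SubElement(parent, "properties")
    if command is not None:
        props.set("command", command)
    for name, value in values.items():
        prop = ET.SubElement(props, "property", name=name)
        prop.text = str(value)
    return props


def build_ocr_job(input_file, output_file, company, key, language="deu"):
    """Build a load -> ocr -> save job document mirroring the example above."""
    root = ET.Element("root")
    comod = ET.SubElement(root, "Comod")

    # Job-wide defaults, as in the manual's example.
    defaults = ET.SubElement(comod, "defaults")
    add_properties(
        defaults,
        {"OutputMode": "XML", "UseInMemoryProcessing": 1},
        command="job",
    )

    jobs = ET.SubElement(comod, "jobs")
    job = ET.SubElement(jobs, "job", name="tesseract")
    steps = ET.SubElement(job, "steps")

    # Step 1: load the input file into memory.
    load = ET.SubElement(steps, "step", name="LoadStep", command="load")
    add_properties(load, {"InputFilename": input_file})

    # Step 2: run OCR with Tesseract and produce hOCR output.
    ocr = ET.SubElement(steps, "step", name="OcrStep", command="ocr")
    add_properties(
        ocr,
        {
            "LicenseCompany": company,
            "LicenseKey": key,
            "OCRLibraryName": "Tesseract",
            "DataFolder": ".",
            "OutputFormat": "FormatHocr",
            "OCRLanguage": language,
        },
    )

    # Step 3: write the in-memory result to a file.
    save = ET.SubElement(steps, "step", name="SaveStep", command="save")
    add_properties(save, {"OutputFilename": output_file})

    return root


if __name__ == "__main__":
    job_xml = build_ocr_job(
        "./input/input.png",
        "./ocr_out.html",
        company="CustomerLicensee",
        key="4444-cccc-88888888",
    )
    print(ET.tostring(job_xml, encoding="unicode"))
```

The generated element tree can be serialized with ET.tostring and passed on in the same way as a hand-written job file. The license values shown are the placeholders from the example and have to be replaced with real license data.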