CIB pdf toolbox technical guide (EN)
10. Extract barcode information from PDF
(from CIB pdf toolbox Version 1.8.0 onwards)
With CIB pdf toolbox Join, barcodes can be read from barcode images in PDF documents. The barcode information can be output as XML or CSV.
CIB pdf toolbox generates a metafile from the PDF, which is then processed by the CIB ocr module. This functionality therefore requires the CIB ocr module with a corresponding license.
(from CIB pdf toolbox Version 1.13.3 onwards)
All CIB ocr string properties can be set on CIB pdf toolbox by adding the prefix "CibOcr" to their names. These properties are passed on to the CIB ocr module whenever an OCR or barcode reading request is made.
Example:
CIB ocr has the property "DatamatrixScanGap". If "CibOcrDatamatrixScanGap=100" is set on CIB pdf toolbox, it is passed on to CIB ocr as DatamatrixScanGap with the value "100" when CIB pdf toolbox is called.
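The naming rule in the example above can be illustrated with a minimal Python sketch. It only models the prefix handling described in the text; the function name `forwarded_ocr_properties` and the dictionary representation of properties are assumptions made for illustration and are not part of the CIB pdf toolbox API.

```python
CIB_OCR_PREFIX = "CibOcr"


def forwarded_ocr_properties(toolbox_properties):
    """Return the properties that would be passed on to CIB ocr, prefix removed.

    Hypothetical helper: it mirrors the naming rule from the text and is not
    toolbox code. Properties without the prefix are ignored.
    """
    return {
        name[len(CIB_OCR_PREFIX):]: value
        for name, value in toolbox_properties.items()
        if name.startswith(CIB_OCR_PREFIX) and len(name) > len(CIB_OCR_PREFIX)
    }


props = {"CibOcrDatamatrixScanGap": "100", "OutputFormat": "FormatBarcodeXml"}
print(forwarded_ocr_properties(props))  # {'DatamatrixScanGap': '100'}
```

Only the prefixed property appears in the result; OutputFormat stays with CIB pdf toolbox.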
Property description |
Type |
Functionality |
Kind |
BarcodeRange |
String |
This property contains a list of barcode types separated by semicolons. All images in the PDF are checked to see whether they are barcodes of the specified types. The search can also be restricted to areas of the PDF.
Syntax:
<barcoderangelist> ::= <barcodetypes> | <barcoderanges>
<barcoderanges> ::= <onebarcoderange> | <onebarcoderange> “;” <barcoderanges>
<onebarcoderange> ::= “{” <pagenumber> “}” | “{” <pagenumber> “;” <barcodetypes> “}” | “{” <pagenumber> “;” <barcodetype> “;” <rangeleft> “;” <rangebottom> “;” <rangeright> “;” <rangetop> “}”
<pagenumber> ::= <Integer>
<barcodetypes> ::= <onebarcodetype> | <onebarcodetype> “,” <barcodetypes>
<onebarcodetype> ::= “DataMatrix” | “Code128” | “CodeITF” | “Code39” | “Code39Extended” | “QR”
<rangeleft> ::= <Integer>
<rangebottom> ::= <Integer>
<rangeright> ::= <Integer>
<rangetop> ::= <Integer>
<rangeleft>, <rangebottom>, <rangeright> and <rangetop> are positive integers in mm; the origin (0;0) is the lower left corner of the PDF page.
"" (property is empty): All images of the PDF are checked for barcode information, using the two barcode types "QR" and "DataMatrix". (default)
An incorrectly assigned property results in error code 100.
See below for examples. |
Set |
BarcodeInfo |
String |
The retrieved barcode information is returned in this property. If an output file (OutputFilename) and a memory area (MemoryOutputCallback) are specified, the output is written to this file and also to memory.
Hint: If no barcode is found in the specified area, the property/output file remains empty.
The format of the output depends on the value of the OutputFormat property. Possible values are OutputFormat=FormatBarcodeXml and OutputFormat=FormatBarcodeCsv. The coordinate origin (0;0) is the lower left corner of the PDF page.
OutputFormat=FormatBarcodeXml:
<barcodeimagexml> ::= <empty value> | “<?xml version="1.0" encoding="UTF-8" standalone="yes"?><barcodeimages>” <barcodeimagelist> “</barcodeimages>”
<barcodeimagelist> ::= <onebarcodeimage> | <onebarcodeimage> <barcodeimagelist>
<onebarcodeimage> ::= “<barcodeimage><pagenumber>” <pagenumber> “</pagenumber><left>” <imageleft> “</left><bottom>” <imagebottom> “</bottom><right>” <imageright> “</right><top>” <imagetop> “</top><barcodeinfos>” <barcodeinfolist> “</barcodeinfos></barcodeimage>”
<pagenumber> ::= <Integer>
<imageleft> ::= <Integer>
<imagebottom> ::= <Integer>
<imageright> ::= <Integer>
<imagetop> ::= <Integer>
<barcodeinfolist> ::= <onebarcodeinfo> | <onebarcodeinfo> <barcodeinfolist>
<onebarcodeinfo> ::= “<barcodeinfo type="” <barcodetype> “">” <barcodecontent> “</barcodeinfo>”
<barcodetype> ::= “DataMatrix” | “Code128” | “CodeITF” | “Code39” | “Code39Extended” | “QR”
<barcodecontent> ::= <Text>
OutputFormat=FormatBarcodeCsv:
<barcodeimagecsv> ::= <empty value> | <barcodeimagelistcsv>
<barcodeimagelistcsv> ::= <onebarcodeimagerow> | <onebarcodeimagerow> <barcodeimagelistcsv>
<onebarcodeimagerow> ::= <pagenumber> “;” <imageleft> “;” <imagebottom> “;” <imageright> “;” <imagetop> “;” <barcodeinfolistcsv> <CR> <LF>
<pagenumber> ::= <Integer>
<imageleft> ::= <Integer>
<imagebottom> ::= <Integer>
<imageright> ::= <Integer>
<imagetop> ::= <Integer>
<barcodeinfolistcsv> ::= <onebarcodeinfocsv> | <onebarcodeinfocsv> “;” <barcodeinfolistcsv>
<onebarcodeinfocsv> ::= “;” | <barcodetype> “;” <barcodecontent>
<barcodetype> ::= “DataMatrix” | “Code128” | “CodeITF” | “Code39” | “Code39Extended” | “QR”
<barcodecontent> ::= “"” <Text> “"”
Hints regarding CSV:
See below for examples |
Get |
OCRDebug |
String |
This property is only relevant for technical test purposes. It controls the output of images from the PDF.
Possible values:
"1": Each intermediate step of the OCR-based barcode extraction is written to its own output file.
"0": No output for intermediate steps. (default)
For every image in the PDF document that is passed to the CIB ocr DLL, several files are written:
The files are written to the same directory as the output file. Their names have the following form: "output-file"__Page_"page-number"_Image_"image-number"_"file-extension"
The following applies to the "file-extension":
Example: |
Set |
BarcodeRecognitionMode |
String |
This property controls which method of barcode recognition is used. "RecognizeImages": The original method is used. CIB pdf toolbox passes the individual image objects to CIB ocr for recognition. (default) "RecognizePages": CIB pdf toolbox passes complete page images to CIB ocr for recognition. Note: RecognizePages can be slower than the original method because the complete image of each page has to be processed. |
|
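The BarcodeRange syntax and the FormatBarcodeXml output described in the table can be illustrated with a short Python sketch. It is not CIB code: the helpers `barcode_range` and `parse_barcode_xml` are hypothetical, the sample XML is hand-written from the grammar rather than produced by the toolbox, and the sketch assumes that each barcodeinfo element is well-formed XML with a `type` attribute. How the resulting string is set as a property, and how BarcodeInfo is read back, depends on the calling interface and is not shown.

```python
import xml.etree.ElementTree as ET

BARCODE_TYPES = ("DataMatrix", "Code128", "CodeITF", "Code39", "Code39Extended", "QR")


def barcode_range(page, types=(), area=None):
    """Build one "{...}" BarcodeRange entry (hypothetical helper).

    types: barcode type names from the grammar.
    area:  optional (left, bottom, right, top) in mm, origin at the lower left corner.
    """
    for name in types:
        if name not in BARCODE_TYPES:
            raise ValueError(f"unknown barcode type: {name}")
    parts = [str(int(page))]
    if area is not None:
        # The grammar as written lists a single type before the four coordinates.
        if len(types) != 1 or len(area) != 4:
            raise ValueError("an area needs exactly one type and four coordinates")
        parts.append(types[0])
        parts.extend(str(int(value)) for value in area)
    elif types:
        parts.append(",".join(types))
    return "{" + ";".join(parts) + "}"


def parse_barcode_xml(xml_text):
    """Return one dict per <barcodeimage>; an empty result yields []."""
    if not xml_text.strip():
        return []
    # Parse from bytes so the UTF-8 declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    images = []
    for image in root.iter("barcodeimage"):
        images.append({
            "page": int(image.findtext("pagenumber")),
            "box": tuple(int(image.findtext(tag))
                         for tag in ("left", "bottom", "right", "top")),
            "barcodes": [(info.get("type"), info.text or "")
                         for info in image.iter("barcodeinfo")],
        })
    return images


# Ranges are joined with ";" as in <barcoderanges>.
ranges = ";".join([
    barcode_range(1, ["QR"], (10, 20, 50, 60)),
    barcode_range(2, ["QR", "DataMatrix"]),
])
print(ranges)  # {1;QR;10;20;50;60};{2;QR,DataMatrix}

# Hand-written sample that follows the FormatBarcodeXml grammar.
sample = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<barcodeimages><barcodeimage><pagenumber>1</pagenumber>"
    "<left>10</left><bottom>20</bottom><right>50</right><top>60</top>"
    '<barcodeinfos><barcodeinfo type="QR">hello</barcodeinfo></barcodeinfos>'
    "</barcodeimage></barcodeimages>"
)
print(parse_barcode_xml(sample))
```

The builder rejects type names that are not in the grammar before the string is passed on, which avoids a round trip that would otherwise end in error code 100.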