CIB pdf toolbox technical guide (EN)

10. Extract barcode information from PDF

(from CIB pdf toolbox Version 1.8.0 onwards)

With CIB pdf toolbox Join, barcodes can be read from barcode images in PDF documents. The barcode information can be output as XML or CSV.

The CIB pdf toolbox generates a metafile from the PDF, which is then processed by the CIB ocr module. This functionality therefore requires the CIB ocr module with a corresponding license.

(from CIB pdf toolbox Version 1.13.3 onwards)

Any CIB ocr string property can be set on the CIB pdf toolbox by prefixing its name with "CibOcr". The CIB pdf toolbox passes these properties on to the CIB ocr module whenever an OCR or barcode reading request is made.

Example:

CIB ocr has the property "DatamatrixScanGap". If "CibOcrDatamatrixScanGap=100" is set on the CIB pdf toolbox, it is passed to CIB ocr as "DatamatrixScanGap" with the value "100" when the CIB pdf toolbox is called.
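The prefix rule can be illustrated with a small sketch. It only models the renaming described above; the function name and the dictionary representation of properties are assumptions made for illustration and are not part of the CIB pdf toolbox API.

```python
CIB_OCR_PREFIX = "CibOcr"  # prefix named in this guide


def forwarded_ocr_properties(toolbox_properties):
    """Return the properties that would reach CIB ocr, with the prefix removed.

    Hypothetical helper: it models the renaming rule only, not the real transfer.
    """
    forwarded = {}
    for name, value in toolbox_properties.items():
        # Only names that start with the prefix and have something after it qualify.
        if name.startswith(CIB_OCR_PREFIX) and len(name) > len(CIB_OCR_PREFIX):
            forwarded[name[len(CIB_OCR_PREFIX):]] = value
    return forwarded


props = {"CibOcrDatamatrixScanGap": "100", "OutputFormat": "FormatBarcodeXml"}
print(forwarded_ocr_properties(props))  # {'DatamatrixScanGap': '100'}
```

Properties without the prefix, such as OutputFormat in this sketch, stay with the CIB pdf toolbox.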

Property description

Type

Functionality

Kind

BarcodeRange

String

This property contains a list of barcode types separated by semicolons. All images in the PDF are checked to see if they are barcodes of the specified types. The search can also be restricted to areas of the PDF.

Syntax:

<barcoderangelist> ::= <barcodetypes> | <barcoderanges>

<barcoderanges> ::= <onebarcoderange> | <onebarcoderange> ";" <barcoderanges>

<onebarcoderange> ::= "{" <pagenumber> "}" | "{" <pagenumber> ";" <barcodetypes> "}" | "{" <pagenumber> ";" <onebarcodetype> ";" <rangeleft> ";" <rangebottom> ";" <rangeright> ";" <rangetop> "}"

<pagenumber> ::= <Integer>

<barcodetypes> ::= <onebarcodetype> | <onebarcodetype> "," <barcodetypes>

<onebarcodetype> ::= "DataMatrix" | "Code128" | "CodeITF" | "Code39" | "Code39Extended" | "QR"

<rangeleft> ::= <Integer>

<rangebottom> ::= <Integer>

<rangeright> ::= <Integer>

<rangetop> ::= <Integer>

<rangeleft>, <rangebottom>, <rangeright> and <rangetop> are positive integers in mm. The origin (0;0) is the lower left corner of the PDF page.

"" (property is empty): All images in the PDF are checked for barcode information, using the two barcode types "QR" and "DataMatrix". (default)

An invalid value for this property results in error code 100.

See below for examples.
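As a sketch of the syntax above, the following helper assembles a BarcodeRange value from the grammar as written (types joined by commas, ranges joined by semicolons). The function names are hypothetical, and the check that area values must not be negative is an assumption derived from the "positive integers" remark.

```python
BARCODE_TYPES = {"DataMatrix", "Code128", "CodeITF", "Code39", "Code39Extended", "QR"}


def _check_types(types):
    unknown = [t for t in types if t not in BARCODE_TYPES]
    if unknown:
        raise ValueError(f"unknown barcode types: {unknown!r}")


def one_range(page, types=(), area=None):
    """Build one <onebarcoderange>; area is (left, bottom, right, top) in mm."""
    types = list(types)
    _check_types(types)
    parts = [str(int(page))]
    if area is not None:
        # The grammar allows exactly one barcode type together with an area.
        if len(types) != 1:
            raise ValueError("an area requires exactly one barcode type")
        if len(area) != 4 or any(not isinstance(v, int) or v < 0 for v in area):
            raise ValueError("area must be four non-negative integers (mm)")
    if types:
        parts.append(",".join(types))
    if area is not None:
        parts.extend(str(v) for v in area)
    return "{" + ";".join(parts) + "}"


def barcode_ranges(*ranges):
    """Join several <onebarcoderange> strings into <barcoderanges>."""
    return ";".join(ranges)


value = barcode_ranges(
    one_range(1),
    one_range(2, ["QR", "DataMatrix"]),
    one_range(3, ["Code128"], (10, 20, 110, 60)),
)
print(value)  # {1};{2;QR,DataMatrix};{3;Code128;10;20;110;60}
```

Validating the value before the call avoids running into error code 100 for simple typing mistakes.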

Set

BarcodeInfo

String

The retrieved barcode information is returned in this property.

If both an output file (OutputFilename) and a memory area (MemoryOutputCallback) are specified, the output is written to the file and also to memory.

Hint:
If only one of the two (output file or memory area) is specified, or neither, the value of BarcodeInfo is not set at all.

If no barcode is found in the specified area, the property/output file remains empty.

The format of the output depends on the value of the OutputFormat property. Possible values are OutputFormat=FormatBarcodeXml and OutputFormat=FormatBarcodeCsv.

The coordinate origin (0;0) is the lower left corner of the PDF page.

OutputFormat=FormatBarcodeXml:

<barcodeimagexml> ::= <empty value> | '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><barcodeimages>' <barcodeimagelist> "</barcodeimages>"

<barcodeimagelist> ::= <onebarcodeimage> | <onebarcodeimage> <barcodeimagelist>

<onebarcodeimage> ::= "<barcodeimage><pagenumber>" <pagenumber> "</pagenumber><left>" <imageleft> "</left><bottom>" <imagebottom> "</bottom><right>" <imageright> "</right><top>" <imagetop> "</top><barcodeinfos>" <barcodeinfolist> "</barcodeinfos></barcodeimage>"

<pagenumber> ::= <Integer>

<imageleft> ::= <Integer>

<imagebottom> ::= <Integer>

<imageright> ::= <Integer>

<imagetop> ::= <Integer>

<barcodeinfolist> ::= <onebarcodeinfo> | <onebarcodeinfo> <barcodeinfolist>

<onebarcodeinfo> ::= '<barcodeinfo type="' <barcodetype> '">' <barcodecontent> "</barcodeinfo>"

<barcodetype> ::= "DataMatrix" | "Code128" | "CodeITF" | "Code39" | "Code39Extended" | "QR"

<barcodecontent> ::= <Text>
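A minimal sketch of reading the XML form with Python's standard library follows. The sample document is made up to match the grammar and is not real toolbox output. It assumes that each barcodeinfo element is well-formed, with the barcode type in the type attribute and the content as element text. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample that follows the grammar above (not real output).
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<barcodeimages><barcodeimage><pagenumber>1</pagenumber>"
    "<left>10</left><bottom>20</bottom><right>60</right><top>70</top>"
    '<barcodeinfos><barcodeinfo type="QR">hello</barcodeinfo></barcodeinfos>'
    "</barcodeimage></barcodeimages>"
)


def parse_barcode_xml(text):
    """Return one dict per <barcodeimage>; an empty value yields an empty list."""
    if not text.strip():
        return []
    root = ET.fromstring(text.encode("utf-8"))
    images = []
    for image in root.findall("barcodeimage"):
        images.append({
            "page": int(image.findtext("pagenumber")),
            "left": int(image.findtext("left")),
            "bottom": int(image.findtext("bottom")),
            "right": int(image.findtext("right")),
            "top": int(image.findtext("top")),
            "infos": [(info.get("type"), info.text or "")
                      for info in image.findall("barcodeinfos/barcodeinfo")],
        })
    return images


print(parse_barcode_xml(SAMPLE_XML))
```

The empty-value branch mirrors the case where no barcode is found and the property or output file stays empty.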

OutputFormat=FormatBarcodeCsv:

<barcodeimagecsv> ::= <empty value> | <barcodeimagelistcsv>

<barcodeimagelistcsv> ::= <onebarcodeimagerow> | <onebarcodeimagerow> <barcodeimagelistcsv>

<onebarcodeimagerow> ::= <pagenumber> ";" <imageleft> ";" <imagebottom> ";" <imageright> ";" <imagetop> ";" <barcodeinfolistcsv> <CR> <LF>

<pagenumber> ::= <Integer>

<imageleft> ::= <Integer>

<imagebottom> ::= <Integer>

<imageright> ::= <Integer>

<imagetop> ::= <Integer>

<barcodeinfolistcsv> ::= <onebarcodeinfocsv> | <onebarcodeinfocsv> ";" <barcodeinfolistcsv>

<onebarcodeinfocsv> ::= ";" | <barcodetype> ";" <barcodecontent>

<barcodetype> ::= "DataMatrix" | "Code128" | "CodeITF" | "Code39" | "Code39Extended" | "QR"

<barcodecontent> ::= '"' <Text> '"'

Hints regarding CSV:

  • If the original barcode text contains a quotation mark ("), it is replaced in the CSV output by two consecutive quotation marks ("").
  • Every barcode image has its own <onebarcodeimagerow> line. If a PDF image consists of an image and a mask and both contain barcode information, they are written as two <onebarcodeimagerow> lines with the same page number and the same area coordinates.
  • For every barcode image, the barcode information for the different barcode types is appended at the end of the line, separated by semicolons. Each entry consists of the barcode type followed by the barcode content.
  • To ensure that every CSV line has the same number of columns (separated by semicolons), the required number of ";;" is appended to the end of each CSV line (= <onebarcodeimagerow>).

See below for examples
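The CSV rules above (semicolon separator, doubled quotation marks, padding with empty type/content pairs) map onto Python's csv module. The following is a sketch only: the sample text is made up to match the grammar, and the function name is hypothetical.

```python
import csv
import io

# Made-up sample: the first row is padded with an empty type/content pair
# so that both rows have the same number of columns.
SAMPLE_CSV = (
    '1;10;20;60;70;QR;"say ""hi""";;\r\n'
    '2;5;5;40;40;DataMatrix;"A1";Code128;"B2"\r\n'
)


def parse_barcode_csv(text):
    """Return one dict per <onebarcodeimagerow>, skipping padding pairs."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=";", quotechar='"', doublequote=True)
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        page, left, bottom, right, top = (int(v) for v in fields[:5])
        rest = fields[5:]
        # The remaining columns come in (type, content) pairs.
        pairs = zip(rest[0::2], rest[1::2])
        infos = [(kind, content) for kind, content in pairs if kind]
        rows.append({"page": page, "left": left, "bottom": bottom,
                     "right": right, "top": top, "infos": infos})
    return rows


for row in parse_barcode_csv(SAMPLE_CSV):
    print(row["page"], row["infos"])
```

Passing newline="" to StringIO leaves the CR LF line endings untouched, so the csv module handles them itself.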

Get

OCRDebug

String

This property is only relevant for technical test purposes. It controls the output of images from the PDF.

Possible values:

"1"     The intermediate steps of barcode extraction using OCR are written to individual output files.

"0"     No output for intermediate steps. (default)

For every image in the PDF document that was passed to the CIB ocr DLL, the following files are written:

  • The image itself as a bitmap, as it was passed to OCR.
  • The barcode result of the CIB ocr DLL, if the barcode result is not empty. If the barcode result is empty, this file is not written.

The files are written to the same directory as the output file. The names of these files have the following form:

"output-file"__Page_"page-number"_Image_"image-number""file-extension"

The following applies to the "file-extension":

  • For the bitmap file: ".bmp"
  • For the barcode result of the CIB ocr DLL: "_BARCODE.txt"

Example:
If the output file is called "Output.xml", the following files are written for the 4th image on the 3rd page:
Output.xml__Page_3_Image_4.bmp
Output.xml__Page_3_Image_4_BARCODE.txt
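The naming pattern, as it appears in the example file names above, can be sketched as follows. The function name is hypothetical and is only meant for locating the debug files after a run.

```python
def ocr_debug_filenames(output_file, page, image):
    """Return (bitmap name, barcode result name) for one image on one page."""
    stem = f"{output_file}__Page_{page}_Image_{image}"
    return stem + ".bmp", stem + "_BARCODE.txt"


print(ocr_debug_filenames("Output.xml", 3, 4))
# ('Output.xml__Page_3_Image_4.bmp', 'Output.xml__Page_3_Image_4_BARCODE.txt')
```

The second file exists only if the barcode result for that image was not empty.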

Set

BarcodeRecognitionMode

String

This property controls which method of barcode recognition is used.

"RecognizeImages": The original method is used. CIB pdf toolbox provides separate image objects to CIB ocr for recognition. (default)

"RecognizePages": CIB pdf toolbox provides complete page images to CIB ocr for recognition.

Note:

RecognizePages may be slower than the original method, because a complete image of each page has to be analyzed.
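The documented values and defaults of the properties in this section can be collected in a small validation sketch. The function name and the dictionary representation are assumptions for illustration. How the properties are actually passed to the CIB pdf toolbox is not shown here, and OutputFormat is restricted to the two barcode formats only for the purpose of this sketch.

```python
# Values taken from this section; everything else is illustrative.
ALLOWED = {
    "OutputFormat": {"FormatBarcodeXml", "FormatBarcodeCsv"},
    "OCRDebug": {"0", "1"},
    "BarcodeRecognitionMode": {"RecognizeImages", "RecognizePages"},
}
DEFAULTS = {
    "BarcodeRange": "",  # empty: all images, types QR and DataMatrix
    "OCRDebug": "0",
    "BarcodeRecognitionMode": "RecognizeImages",
}


def barcode_job_properties(**overrides):
    """Merge overrides into the documented defaults and check known values."""
    props = dict(DEFAULTS)
    props.update(overrides)
    for name, allowed in ALLOWED.items():
        if name in props and props[name] not in allowed:
            raise ValueError(f"{name}={props[name]!r} is not one of {sorted(allowed)}")
    return props


print(barcode_job_properties(OutputFormat="FormatBarcodeCsv",
                             BarcodeRecognitionMode="RecognizePages"))
```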