CIB DeepER API Documentation (EN)

API Description

OCR Result: Page Layout with Text

The return value of the OCR POST request is a JSON object that stores the recognized text together with additional layout information.

An example OCR result looks like this:

    {
      "root": {
        "versions": [
          [
            "CIB deepER",
            "2.9.0"
          ]
        ],
        "angle": 0,
        "attributes": {},
        "id": "image_1",
        "image": "image1.png",
        "type": "image",
        "children": [
          {
            "attributes": {},
            "id": "page_1",
            "type": "page",
            "width": 2479,
            "height": 3508,
            "number": 1,
            "children": [
              {
                "attributes": {},
                "confidence": null,
                "top": 585,
                "width": 711,
                "id": "line_1",
                "type": "line",
                "left": 245,
                "height": 32,
                "text": "",
                "children": [
                  {
                    "attributes": {},
                    "confidence": 0.70,
                    "children": [],
                    "top": 585,
                    "width": 96,
                    "id": "word_1",
                    "type": "word",
                    "left": 245,
                    "height": 32,
                    "text": "Software"
                  },
                  {
                    "attributes": {},
                    "confidence": 0.74,
                    "children": [],
                    "top": 585,
                    "width": 153,
                    "id": "word_2",
                    "type": "word",
                    "left": 350,
                    "height": 32,
                    "text": "Entwicklung"
                  }
                ]
              },
              {
                "attributes": {},
                "confidence": null,
                "top": 626,
                "width": 147,
                "id": "line_2",
                "type": "line",
                "left": 247,
                "height": 32,
                "text": "",
                "children": [
                  {
                    "attributes": {},
                    "confidence": 0.72,
                    "children": [],
                    "top": 626,
                    "width": 147,
                    "id": "word_8",
                    "type": "word",
                    "left": 247,
                    "height": 32,
                    "text": "München"
                  }
                ]
              }
            ]
          }
        ]
      }
    }

The layout consists of a nested dictionary structure made up of several kinds of nodes (sub-dictionaries), arranged in this hierarchy:

  • Root node
    • Image node
      • Page node
        • Line node(s)
          • Word node(s)

All nodes except the Root node have the key 'children'. Its value is a list containing all nodes of the next lower level.
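Every node below the root keeps its sub-nodes in 'children', so a single recursive function can reach any level of the tree. The following is a minimal Python sketch. It assumes the response has already been parsed into a dict (for example with `json.loads`). The function `iter_nodes` and the abbreviated sample are illustrative only and are not part of the API.

```python
from __future__ import annotations

from typing import Any, Iterator, Optional

# Abbreviated copy of the example result above; only the keys used here are kept.
SAMPLE_RESULT = {
    "root": {
        "id": "image_1",
        "type": "image",
        "children": [
            {
                "id": "page_1",
                "type": "page",
                "children": [
                    {
                        "id": "line_1",
                        "type": "line",
                        "children": [
                            {"id": "word_1", "type": "word", "text": "Software", "children": []},
                            {"id": "word_2", "type": "word", "text": "Entwicklung", "children": []},
                        ],
                    },
                    {
                        "id": "line_2",
                        "type": "line",
                        "children": [
                            {"id": "word_8", "type": "word", "text": "München", "children": []},
                        ],
                    },
                ],
            }
        ],
    }
}


def iter_nodes(node: dict[str, Any], node_type: Optional[str] = None) -> Iterator[dict[str, Any]]:
    """Yield `node` and all of its descendants, depth-first in document order.

    If `node_type` is given, only nodes whose "type" equals it are yielded.
    """
    if node_type is None or node.get("type") == node_type:
        yield node
    for child in node.get("children", []):
        yield from iter_nodes(child, node_type)


image_node = SAMPLE_RESULT["root"]  # the Root node only wraps the Image node
print([n["id"] for n in iter_nodes(image_node, "line")])  # ['line_1', 'line_2']
print([n["text"] for n in iter_nodes(image_node, "word")])  # ['Software', 'Entwicklung', 'München']
```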


Root node

The Root node is the outermost JSON object. It contains only one key, 'root', whose value is the Image node.

Image node

The Image node stores information about the image: its file name ('image'), its deskew angle ('angle'), and the software name and version ('versions'). Its 'children' key holds a list whose only element is a Page node.
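The path from the Root node to the Page node can be written as a short Python sketch. It assumes the body of the OCR POST response is already available as a string. The request itself, its URL and its parameters are not shown here. `read_image_node` is a hypothetical helper name.

```python
from __future__ import annotations

import json
from typing import Any

# Abbreviated response body; in practice this is the body returned by the OCR POST request.
RESPONSE_BODY = """
{"root": {"versions": [["CIB deepER", "2.9.0"]], "angle": 0, "attributes": {},
          "id": "image_1", "image": "image1.png", "type": "image",
          "children": [{"attributes": {}, "id": "page_1", "type": "page",
                        "width": 2479, "height": 3508, "number": 1, "children": []}]}}
"""


def read_image_node(body: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Parse an OCR result and return its (Image node, Page node) pair."""
    result = json.loads(body)
    image = result["root"]  # Root node -> Image node
    if image.get("type") != "image":
        raise ValueError(f"expected an image node, got {image.get('type')!r}")
    # The Image node's children list is documented to hold exactly one Page node;
    # this unpacking raises ValueError if that ever does not hold.
    (page,) = image["children"]
    return image, page


image, page = read_image_node(RESPONSE_BODY)
print(image["image"], image["angle"], image["versions"])  # image1.png 0 [['CIB deepER', '2.9.0']]
print(page["id"], page["number"])  # page_1 1
```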

Page node

The Page node stores information such as the page size ('width' and 'height', in pixels, integer) and the page number ('number', integer, starting at 1). Its 'children' key holds a list of all Line nodes on the page.
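Line coordinates are absolute pixel values on a page of the given width and height. They can therefore be converted to page-relative values, for example to draw the boxes on a scaled preview. The following is a minimal Python sketch. It assumes that the origin (0, 0) is the top-left corner of the page image, which the documentation does not state explicitly. `relative_line_boxes` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

# Page node from the example above, reduced to the keys used here.
PAGE = {
    "id": "page_1",
    "type": "page",
    "width": 2479,
    "height": 3508,
    "number": 1,
    "children": [
        {"id": "line_1", "type": "line", "left": 245, "top": 585, "width": 711, "height": 32},
        {"id": "line_2", "type": "line", "left": 247, "top": 626, "width": 147, "height": 32},
    ],
}


def relative_line_boxes(page: dict[str, Any]) -> dict[str, tuple[float, float, float, float]]:
    """Map each Line node id to (left, top, right, bottom) as fractions of the page size.

    Assumes the origin is the top-left corner of the page, so that
    right = left + width and bottom = top + height.
    """
    page_w, page_h = page["width"], page["height"]
    boxes = {}
    for line in page["children"]:
        boxes[line["id"]] = (
            line["left"] / page_w,
            line["top"] / page_h,
            (line["left"] + line["width"]) / page_w,
            (line["top"] + line["height"]) / page_h,
        )
    return boxes


for line_id, box in relative_line_boxes(PAGE).items():
    print(f"page {PAGE['number']}, {line_id}:", ", ".join(f"{v:.3f}" for v in box))
```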

Line node

Line nodes store the coordinates of the line's bounding box ('top', 'left', 'height', 'width', in pixels, integer). The keys 'confidence' and 'text' are only filled at the lowest level, in the Word nodes. In Line nodes, 'confidence' is null and 'text' is an empty string. The Word nodes of each line are stored in the line's 'children' list. The text of a line can be reproduced by concatenating the texts of all its words, separated by blanks.
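The reconstruction rule fits in a few lines of Python. This sketch assumes the Word nodes appear in 'children' in reading order. The documentation describes the concatenation but does not state the order explicitly. `line_text` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any

# Line node "line_1" from the example above (coordinates omitted).
LINE = {
    "id": "line_1",
    "type": "line",
    "confidence": None,
    "text": "",
    "children": [
        {"id": "word_1", "type": "word", "confidence": 0.70, "text": "Software", "children": []},
        {"id": "word_2", "type": "word", "confidence": 0.74, "text": "Entwicklung", "children": []},
    ],
}


def line_text(line: dict[str, Any]) -> str:
    """Rebuild the text of a Line node from its Word nodes.

    The line's own "text" is empty, so the word texts are joined with a single
    blank, in the order in which they appear in "children".
    """
    return " ".join(word["text"] for word in line["children"] if word.get("type") == "word")


print(line_text(LINE))  # Software Entwicklung
```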

Word node

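Each Word node stores its coordinates (the same keys as in Line nodes), the recognized text ('text') and the recognition confidence ('confidence'), a float value between 0 and 1. Word nodes are the leaves of the tree, so their 'children' list is empty.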
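The confidence value can be used, for example, to flag words for manual review. In this Python sketch, the threshold of 0.73 is an arbitrary value picked for the illustration. It is not a recommendation or an API default. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Word nodes from the example above, reduced to text and confidence.
WORDS = [
    {"id": "word_1", "type": "word", "text": "Software", "confidence": 0.70},
    {"id": "word_2", "type": "word", "text": "Entwicklung", "confidence": 0.74},
    {"id": "word_8", "type": "word", "text": "München", "confidence": 0.72},
]

# Hypothetical cut-off chosen for this illustration only; it is not an API default.
REVIEW_THRESHOLD = 0.73


def words_below(words: list[dict[str, Any]], threshold: float = REVIEW_THRESHOLD) -> list[str]:
    """Return the texts of words whose confidence is below `threshold`."""
    return [w["text"] for w in words if w["confidence"] < threshold]


def mean_confidence(words: list[dict[str, Any]]) -> float:
    """Average word confidence; 0.0 for an empty list."""
    if not words:
        return 0.0
    return sum(w["confidence"] for w in words) / len(words)


print(words_below(WORDS))  # ['Software', 'München']
print(round(mean_confidence(WORDS), 2))  # 0.72
```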

Blanks are not stored explicitly in the 'text' values. A blank is always implied between two consecutive words of the same line. If the gap between two words is wider than a normal blank, the text is split into two separate Line nodes that lie next to each other.
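A wide gap produces separate Line nodes, so text that appears on one visual row can end up in several Line nodes. If you need those rows back, you can group lines by their vertical position. The following Python sketch does this with a simple overlap heuristic. The following are all assumptions for illustration and are not part of the API:

- the 50 % overlap ratio
- the " | " separator
- the function names
- the third sample line

```python
from __future__ import annotations

from typing import Any

# line_1 and line_2 follow the example above. line_3 is HYPOTHETICAL: a line that
# starts to the right of line_1 after a gap wider than a blank.
LINES = [
    {"id": "line_1", "left": 245, "top": 585, "width": 711, "height": 32,
     "children": [{"type": "word", "text": "Software"}, {"type": "word", "text": "Entwicklung"}]},
    {"id": "line_2", "left": 247, "top": 626, "width": 147, "height": 32,
     "children": [{"type": "word", "text": "München"}]},
    {"id": "line_3", "left": 1500, "top": 587, "width": 160, "height": 32,
     "children": [{"type": "word", "text": "Beispiel"}]},
]


def vertical_overlap(a: dict[str, Any], b: dict[str, Any]) -> int:
    """Number of pixel rows shared by two boxes (negative if they do not overlap)."""
    return min(a["top"] + a["height"], b["top"] + b["height"]) - max(a["top"], b["top"])


def group_rows(lines: list[dict[str, Any]], min_ratio: float = 0.5) -> list[list[dict[str, Any]]]:
    """Group Line nodes that sit side by side into visual rows.

    Heuristic (an assumption, not documented behaviour): a line joins an existing row
    if it overlaps the row's first line vertically by at least `min_ratio` of the
    smaller of the two heights. Rows come out top to bottom, lines left to right.
    """
    rows: list[list[dict[str, Any]]] = []
    for line in sorted(lines, key=lambda ln: (ln["top"], ln["left"])):
        for row in rows:
            first = row[0]
            if vertical_overlap(first, line) >= min_ratio * min(first["height"], line["height"]):
                row.append(line)
                break
        else:
            rows.append([line])
    return [sorted(row, key=lambda ln: ln["left"]) for row in rows]


def row_text(row: list[dict[str, Any]], separator: str = " | ") -> str:
    """Join the lines of one row; the separator marks the wide gap between them."""
    return separator.join(
        " ".join(w["text"] for w in line["children"] if w.get("type") == "word") for line in row
    )


for row in group_rows(LINES):
    print(row_text(row))
# Software Entwicklung | Beispiel
# München
```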