CIB DeepER API Documentation (EN)
API Description
OCR Result: Page Layout with Text
The return value of the OCR POST request is a JSON object that stores the recognized text together with additional layout information.
An example OCR result looks like this:
{
  "root": {
    "versions": [["CIB deepER", "2.9.0"]],
    "angle": 0,
    "attributes": {},
    "id": "image_1",
    "image": "image1.png",
    "type": "image",
    "children": [
      {
        "attributes": {},
        "id": "page_1",
        "type": "page",
        "width": 2479,
        "height": 3508,
        "number": 1,
        "children": [
          {
            "attributes": {},
            "confidence": null,
            "top": 585,
            "width": 711,
            "id": "line_1",
            "type": "line",
            "left": 245,
            "height": 32,
            "text": "",
            "children": [
              {
                "attributes": {},
                "confidence": 0.70,
                "children": [],
                "top": 585,
                "width": 96,
                "id": "word_1",
                "type": "word",
                "left": 245,
                "height": 32,
                "text": "Software"
              },
              {
                "attributes": {},
                "confidence": 0.74,
                "children": [],
                "top": 585,
                "width": 153,
                "id": "word_2",
                "type": "word",
                "left": 350,
                "height": 32,
                "text": "Entwicklung"
              }
            ]
          },
          {
            "attributes": {},
            "confidence": null,
            "top": 626,
            "width": 147,
            "id": "line_2",
            "type": "line",
            "left": 247,
            "height": 32,
            "text": "",
            "children": [
              {
                "attributes": {},
                "confidence": 0.72,
                "children": [],
                "top": 626,
                "width": 147,
                "id": "word_8",
                "type": "word",
                "left": 247,
                "height": 32,
                "text": "München"
              }
            ]
          }
        ]
      }
    ]
  }
}
The layout is a dictionary structure made up of several kinds of nodes (sub-dictionaries) that are nested hierarchically:
- Root node
  - Image node
    - Page node
      - Line node(s)
        - Word node(s)
All nodes (except root) have the attribute 'children', which is a list of all nodes of the next lower level. For Word nodes, which form the lowest level, this list is empty.
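Because every node below the root carries a 'children' list, the whole result can be walked with one small recursive function. The following is a minimal sketch, not part of the API: the helper name iter_nodes is hypothetical, and the sample dictionary is an abbreviated version of the example above.

```python
from typing import Any, Dict, Iterator


def iter_nodes(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


# Abbreviated OCR result: only 'id', 'type' and 'children' are kept.
result = {
    "root": {
        "id": "image_1",
        "type": "image",
        "children": [
            {
                "id": "page_1",
                "type": "page",
                "children": [
                    {
                        "id": "line_1",
                        "type": "line",
                        "children": [
                            {"id": "word_1", "type": "word", "children": []},
                            {"id": "word_2", "type": "word", "children": []},
                        ],
                    }
                ],
            }
        ],
    }
}

# Start at the Image node, which is the value of the 'root' key.
node_types = [node["type"] for node in iter_nodes(result["root"])]
print(node_types)  # ['image', 'page', 'line', 'word', 'word']
```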
Root node
The Root node has a single key, 'root', whose value is the Image node.
Image node
The Image node stores information about the image, such as its name ('image') and deskew angle ('angle'). The value of its 'children' key is a list whose only element is a Page node.
Page node
The Page node stores information such as the page dimensions in pixels ('width' and 'height', integers) and the page number (integer), starting at 1. Its 'children' key stores a list of all Line nodes.
Line node
Line nodes store the coordinates of the line ('top', 'left', 'height', 'width') as integers in pixels. The values 'confidence' and 'text' are only filled in at the lowest level, that is, in Word nodes; in the example above, each Line node has a 'confidence' of null and an empty 'text'. The Word nodes of a line are stored in its 'children' list. The text of a line can be reconstructed by concatenating the text of all its words, separated by blanks.
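The reconstruction of a line's text can be sketched in a few lines of Python. This is an illustration only: the function name line_text is hypothetical, and the sample Line node is reduced to the keys the function needs.

```python
from typing import Any, Dict


def line_text(line: Dict[str, Any]) -> str:
    """Join the 'text' of all Word nodes of a Line node with single blanks."""
    return " ".join(word["text"] for word in line["children"])


# Reduced copy of 'line_1' from the example result.
line = {
    "id": "line_1",
    "type": "line",
    "text": "",
    "children": [
        {"id": "word_1", "type": "word", "text": "Software"},
        {"id": "word_2", "type": "word", "text": "Entwicklung"},
    ],
}

text = line_text(line)
print(text)  # Software Entwicklung
```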
Word node
Each Word node stores the coordinates of the word, the recognized text, and the recognition confidence. The confidence is a float value between 0 and 1.
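A common use of these fields is to flag uncertain words and to locate them on the page. The sketch below is an assumption-laden illustration: the helper names and the threshold of 0.72 are hypothetical, and the right and bottom edges are assumed to be 'left' + 'width' and 'top' + 'height'.

```python
from typing import Any, Dict, List, Tuple


def word_box(word: Dict[str, Any]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels.

    Assumption: right = left + width and bottom = top + height.
    """
    left, top = word["left"], word["top"]
    return (left, top, left + word["width"], top + word["height"])


def low_confidence_words(
    words: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Return the Word nodes whose confidence is below the threshold."""
    return [w for w in words if w["confidence"] < threshold]


# Word nodes of 'line_1' from the example result, reduced to the needed keys.
words = [
    {"id": "word_1", "confidence": 0.70, "top": 585, "left": 245,
     "width": 96, "height": 32, "text": "Software"},
    {"id": "word_2", "confidence": 0.74, "top": 585, "left": 350,
     "width": 153, "height": 32, "text": "Entwicklung"},
]

uncertain = low_confidence_words(words, threshold=0.72)
print([w["text"] for w in uncertain])  # ['Software']
print(word_box(words[0]))              # (245, 585, 341, 617)
```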
Blanks are not listed explicitly in the 'text' values. A blank is always implied between two consecutive words of the same line. If the gap between two words is wider than a normal blank, the line is split into two separate lines positioned next to each other.
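Since a wide gap splits one visual row into several Line nodes, a consumer may want to regroup lines that sit next to each other. The following sketch shows one possible heuristic and is not behavior of the API: the function name group_rows, the tolerance of 10 pixels, and the sample coordinates are all hypothetical. It assumes that lines belonging to the same row have nearly identical 'top' values.

```python
from typing import Any, Dict, List


def group_rows(
    lines: List[Dict[str, Any]], tolerance: int = 10
) -> List[List[Dict[str, Any]]]:
    """Group Line nodes whose 'top' values differ by at most `tolerance` pixels.

    Each row is compared against its first line and sorted left to right.
    """
    rows: List[List[Dict[str, Any]]] = []
    for line in sorted(lines, key=lambda l: (l["top"], l["left"])):
        if rows and abs(line["top"] - rows[-1][0]["top"]) <= tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    for row in rows:
        row.sort(key=lambda l: l["left"])
    return rows


# Hypothetical Line nodes: the first two share a row, the third is below.
lines = [
    {"id": "line_1", "top": 585, "left": 245},
    {"id": "line_2", "top": 585, "left": 1200},
    {"id": "line_3", "top": 626, "left": 247},
]

row_ids = [[line["id"] for line in row] for row in group_rows(lines)]
print(row_ids)  # [['line_1', 'line_2'], ['line_3']]
```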