CIB DeepER API Documentation (EN)

Site: CIB eLearning
Course: CIB DeepER
Book: CIB DeepER API Documentation (EN)
Printed by: Guest user
Date: Sunday, 28 April 2024, 8:28 PM

API-Description

This API documentation is intended for CIB DeepER version 2.9.0.

Instead of working directly with the API, you can also use the text recognition module CIB ocr or the PDF module CIB pdf toolbox; see their respective documentation.

OCR Call

DeepER is accessed via a REST interface.

To request an OCR result, a POST request has to be sent to https://backend-ocr.cib.de/ocr/v4.

Credentials have to be provided via HTTP Basic Authentication.

Supported input formats are PNG and JPG. To process an image, it has to be provided as a file in the request body. The content type should be 'application/octet-stream'. The name of the form field must be 'file'.

In Python, a basic client implementation looks like this:

import requests

files = {
    'file': ('image.png', open(r'\path\to\image.png', 'rb'),
             'application/octet-stream'),
}
response = requests.post('https://backend-ocr.cib.de/ocr/v4',
                         files=files,
                         auth=('<<username>>', '<<password>>'))
json_response = response.json()

OCR Result: Page Layout with Text

The return value of the OCR POST request is a JSON object that stores the recognized text together with some additional layout information.

An example OCR result looks like this:

{
  "root": {
    "versions": [
      [
        "CIB deepER",
        "2.9.0"
      ]
    ],
    "angle": 0,
    "attributes": {},
    "id": "image_1",
    "image": "image1.png",
    "type": "image",
    "children": [
      {
        "attributes": {},
        "id": "page_1",
        "type": "page",
        "width": 2479,
        "height": 3508,
        "number": 1,
        "children": [
          {
            "attributes": {},
            "confidence": null,
            "top": 585,
            "width": 711,
            "id": "line_1",
            "type": "line",
            "left": 245,
            "height": 32,
            "text": "",
            "children": [
              {
                "attributes": {},
                "confidence": 0.70,
                "children": [],
                "top": 585,
                "width": 96,
                "id": "word_1",
                "type": "word",
                "left": 245,
                "height": 32,
                "text": "Software"
              },
              {
                "attributes": {},
                "confidence": 0.74,
                "children": [],
                "top": 585,
                "width": 153,
                "id": "word_2",
                "type": "word",
                "left": 350,
                "height": 32,
                "text": "Entwicklung"
              }
            ]
          },
          {
            "attributes": {},
            "confidence": null,
            "top": 626,
            "width": 147,
            "id": "line_2",
            "type": "line",
            "left": 247,
            "height": 32,
            "text": "",
            "children": [
              {
                "attributes": {},
                "confidence": 0.72,
                "children": [],
                "top": 626,
                "width": 147,
                "id": "word_8",
                "type": "word",
                "left": 247,
                "height": 32,
                "text": "München"
              }
            ]
          }
        ]
      }
    ]
  }
}


The layout consists of a dictionary structure containing various kinds of nodes (= sub-dictionaries), hierarchically nested:

  • Root node
    • Image node
      • Page node
        • Line node(s)
          • Word node(s)

All nodes (except the root node) have the attribute 'children', a list containing all nodes of the next lower level.
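This nesting can be traversed generically over the 'children' lists. A minimal sketch, assuming the response has already been parsed into a dictionary (the helper names are illustrative, not part of the API):

```python
def collect_words(node):
    """Recursively gather all word nodes below the given node."""
    if node.get('type') == 'word':
        return [node]
    words = []
    for child in node.get('children', []):
        words.extend(collect_words(child))
    return words

def words_of_result(ocr_result):
    """Collect every word node of an OCR result, starting at the image node."""
    return collect_words(ocr_result['root'])
```

For the example result above, this yields the word nodes for 'Software', 'Entwicklung' and 'München' in document order.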


Root node

The root node stores only one key, 'root', with the image node as its value.

Image node

The image node stores information about the image, such as its name and deskew angle. The key 'children' holds a list whose only element is a page node.

Page node

The page node stores information such as the page dimensions in pixels (integers) and the page number (integer), starting at 1. The 'children' key stores a list of all line nodes.

Line node

Line nodes store the coordinates (top, left, height, width in pixels, integers). The values 'confidence' and 'text' are only filled at the lowest level, i.e. in word nodes. For every line, these word nodes are stored in the value of 'children', which again is a list of nodes. The text of a line can be reconstructed by concatenating the text of all its words, separated by spaces.
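Under this rule, a line's text can be rebuilt from its word nodes; a short sketch, assuming the word nodes are already in reading order (the function name is illustrative):

```python
def line_text(line_node):
    """Concatenate the text of all word nodes of a line, separated by spaces."""
    return ' '.join(word['text'] for word in line_node['children'])
```

Applied to 'line_1' of the example result, this yields 'Software Entwicklung'.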

Word node

Each word node stores its coordinates, the recognized text and the recognition confidence. The confidence is a float value between 0 and 1.
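The confidence value can be used, for instance, to flag words that may need manual review; a sketch with an arbitrary illustrative threshold (the function name is not part of the API):

```python
def uncertain_words(word_nodes, threshold=0.75):
    """Return the word nodes whose confidence is below the given threshold."""
    return [w for w in word_nodes
            if w['confidence'] is not None and w['confidence'] < threshold]
```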

Spaces are not explicitly listed in the 'text' values. A space is always implied between two consecutive words of the same line. If the gap between two words is wider than a normal space, the line is split into two lines placed next to each other.

Plain Text Output

As an alternative, the OCR result can be received as plain text. For that, in addition to the image, a JSON part with the key-value pair {'output_format': 'plain_text'} has to be uploaded. Its content type must be 'application/json', and the name of the form field must be 'json'.

In Python, a client implementation looks like this:

import requests
import json

files = {
    'file': ('image.png', open(r'\path\to\image.png', 'rb'),
             'application/octet-stream'),
    'json': (None, json.dumps({'output_format': 'plain_text'}),
             'application/json')
}
response = requests.post('https://backend-ocr.cib.de/ocr/v4',
                         files=files,
                         auth=('<<username>>', '<<password>>'))
json_response = response.json()


In this case, the response is a JSON object with three keys:

  • 'image': the filename of the original image (string)
  • 'text': the recognized text as one string. Line breaks are encoded as '\n'. Backslashes that appear in the original text are escaped with a second backslash.
  • 'versions': information about the OCR engine used.


An example OCR Result with plain text output looks like this:

{
  "image": "schaefer.png",
  "text": "Software Entwicklung\nMünchen",
  "versions": [
    [
      "CIB deepER",
      "2.9.0"
    ]
  ]
}
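Given such a response, the individual lines can be recovered by splitting the 'text' value at the encoded line breaks (the dictionary below simply repeats the example response):

```python
json_response = {
    "image": "schaefer.png",
    "text": "Software Entwicklung\nMünchen",
    "versions": [["CIB deepER", "2.9.0"]],
}

lines = json_response["text"].split("\n")
# lines == ['Software Entwicklung', 'München']
```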

Character Recognition

The following characters can be recognized. In particular, the languages German, English and Spanish are supported.

  • ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜÁÉÍÑÓÚ
  • abcdefghijklmnopqrstuvwxyzßäöüáéíñóú
  • 0123456789
  • ¿?¡!,-_.:;„“»›‹«”’‚‘
  • []{}()÷<>#%&*+/|\~•=
  • $£§€@©