CIB ocr technical manual (EN)

6. General properties

6.11. Preprocess

General
Preprocessor details
     ▸ Thresholding Methods
     ▸ Median filters
     ▸ BilateralFilter
     ▸ Thinning Algorithms
     ▸ DeSkew
     ▸ Invert / AutoInvert
     ▸ AutoRotate
Composite Algorithms

General

Property name    Data type    Kind
Preprocess       String       Set

 

This property specifies one or more methods for preprocessing the input file in order to get a better OCR result.

 

Syntax  

Preprocess = <preprocessnames>
<preprocessnames> = <preprocessor> | <preprocessor> "+" <preprocessnames>
<preprocessor> = NativeAdaptiveThresholding | PureMedianBlur | PureAdaptiveThresholding |
PureAdaptiveGaussianThresholding | MedianBlurAGT | MedianBlurAT | MedianBlurGAT |
SauvolaThresholding | NiblackThresholding | WolfJolionThresholding | NickThresholding |
FengThresholding | OtsuThresholding | BilateralFilter | ThinningZhangSuen |
ThinningGuoHall | DeSkew | Invert | AutoInvert | Composite

No default

 

Example 

Preprocess = NativeAdaptiveThresholding 
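
The syntax above can be illustrated with a small helper that assembles a Preprocess value from individual names. This is only a sketch: build_preprocess_value is a hypothetical helper and not part of CIB ocr, the name check is assumed to be case-sensitive, and AutoRotate is included as an assumption because it is used in the AutoRotate section although the syntax listing does not contain it.

```python
# Names copied from the <preprocessor> list in the Syntax section.
# Assumption: AutoRotate is added because the AutoRotate section uses it,
# although the syntax listing does not mention it.
KNOWN_PREPROCESSORS = {
    "NativeAdaptiveThresholding", "PureMedianBlur", "PureAdaptiveThresholding",
    "PureAdaptiveGaussianThresholding", "MedianBlurAGT", "MedianBlurAT",
    "MedianBlurGAT", "SauvolaThresholding", "NiblackThresholding",
    "WolfJolionThresholding", "NickThresholding", "FengThresholding",
    "OtsuThresholding", "BilateralFilter", "ThinningZhangSuen",
    "ThinningGuoHall", "DeSkew", "Invert", "AutoInvert", "Composite",
    "AutoRotate",
}


def build_preprocess_value(names):
    """Join preprocessor names with "+" after checking each one.

    Hypothetical helper for illustration; CIB ocr only sees the final string.
    """
    names = list(names)
    if not names:
        raise ValueError("at least one preprocessor name is required")
    unknown = [name for name in names if name not in KNOWN_PREPROCESSORS]
    if unknown:
        raise ValueError("unknown preprocessor name(s): " + ", ".join(unknown))
    return "+".join(names)


# Example: a combination suggested for invoices in the DeSkew section.
invoice_value = build_preprocess_value(["DeSkew", "SauvolaThresholding"])
```

The resulting string would then be passed as the value of the Preprocess property.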


Preprocessor details
Thresholding Methods

Thresholding is the simplest method of image segmentation. It can be used to create a binary image from a grayscale image.
The simplest thresholding methods replace each pixel in an image with a black pixel if its intensity is less than some fixed constant T, or with a white pixel if its intensity is greater than that constant.
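
A minimal sketch of fixed thresholding on a grayscale image stored as a list of rows. The function name and the choice to treat a pixel equal to T as white are assumptions made for illustration; the text above does not define that case.

```python
def fixed_threshold(gray, t):
    """Return a binary image: 0 (black) where intensity < t, else 255 (white).

    Assumption: a pixel exactly equal to t is treated as white.
    """
    return [[0 if px < t else 255 for px in row] for row in gray]


gray = [
    [12, 200, 90],
    [130, 40, 255],
]
binary = fixed_threshold(gray, 128)
```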

Adaptive Thresholding

A single global threshold value may not work well when different areas of an image have different lighting conditions. In that case, adaptive thresholding is used: the algorithm calculates the threshold for small regions of the image. Different regions of the same image therefore get different thresholds, which gives better results for images with varying illumination.

Adaptive thresholding has three additional input parameters and a single output, the thresholded image.

Adaptive Method - Decides how the threshold value is calculated.

  • cv2.ADAPTIVE_THRESH_MEAN_C : the threshold value is the mean of the neighborhood area.
  • cv2.ADAPTIVE_THRESH_GAUSSIAN_C : the threshold value is the weighted sum of the neighborhood values, where the weights are a Gaussian window.

Block Size - Decides the size of the neighborhood area.

C - A constant that is subtracted from the calculated mean or weighted mean.
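
The role of the three parameters can be sketched in plain Python for the mean method. This is an illustration only, not the OpenCV implementation: the function name is hypothetical, and the neighborhood is simply clipped at the image border, which is a simplifying assumption.

```python
def adaptive_mean_threshold(gray, block_size=3, c=2):
    """Binarize with a per-pixel threshold: mean of the neighborhood minus c."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Neighborhood clipped at the border (simplifying assumption).
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = sum(window) / len(window) - c
            out[y][x] = 255 if gray[y][x] > threshold else 0
    return out


# Dark region (background 50, stroke 20) next to a bright region
# (background 200, stroke 150).
row = [50, 50, 20, 50, 200, 200, 150, 200]
gray = [row[:], row[:], row[:]]
binary = adaptive_mean_threshold(gray, block_size=3, c=2)
```

Both strokes end up black here, whereas a single global threshold of 128 would turn the 150-valued stroke in the bright region white.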

 

NativeAdaptiveThresholding 

This is a composite filter that applies the following OpenCV functions in sequence:

  • cv::medianBlur()
  • cv::adaptiveThreshold() with CV_ADAPTIVE_THRESH_MEAN_C as the adaptive method
  • cv::bilateralFilter()

The result is a grayscale image.

PureAdaptiveThresholding  

While the conventional thresholding operator uses a global threshold for all pixels, adaptive thresholding changes the threshold dynamically over the image. This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.

PureAdaptiveThresholding consists of only one step:

  • cv::adaptiveThreshold() with CV_ADAPTIVE_THRESH_MEAN_C as the adaptive method

The result is a binary image.

Alternative: PureAdaptiveGaussianThresholding

Thresholding based on standard deviation

The methods described in the following sections (FengThresholding, SauvolaThresholding, NiblackThresholding, WolfJolionThresholding, NickThresholding) differ only in the final formula for the threshold value of a particular pixel. They all use the same matrices of local standard deviation.


SauvolaThresholding

The basic idea behind Sauvola is that if there is a lot of local contrast, the threshold should be chosen close to the mean value, whereas if there is very little contrast, the threshold should be chosen below the mean, by an amount proportional to the normalized local standard deviation.
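
The behavior described above follows from the commonly published Sauvola formula T = m * (1 + k * (s / R - 1)), where m is the local mean and s the local standard deviation. The sketch below uses k = 0.5 and R = 128, the values usually quoted for 8-bit images; which values CIB ocr uses internally is not stated here, so they are assumptions.

```python
import math


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola threshold for one pixel neighborhood (list of intensities).

    k and r are the commonly quoted defaults, assumed for illustration.
    """
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in window) / n)
    return mean * (1.0 + k * (std / r - 1.0))


# High local contrast: the threshold stays close to the mean (127.5).
t_high_contrast = sauvola_threshold([0, 255, 0, 255, 0, 255, 0, 255])

# No local contrast: the threshold drops well below the mean (200).
t_flat = sauvola_threshold([200] * 9)
```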

 

NiblackThresholding

Niblack's method can be considered the first local thresholding method. It has the advantage of detecting the text, but it introduces a lot of background noise. Sauvola and Pietikäinen modified the Niblack threshold to decrease the background noise, but the text detection rate also decreases, and bleed-through still remains in most cases.
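
The background noise mentioned above can be seen directly in the commonly published Niblack formula T = m + k * s (local mean m, local standard deviation s). The sketch below assumes k = -0.2, a value often quoted in the literature; the value used by CIB ocr is not stated here.

```python
import math


def niblack_threshold(window, k=-0.2):
    """Niblack threshold for one pixel neighborhood (list of intensities).

    k = -0.2 is an assumed, commonly quoted default.
    """
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in window) / n)
    return mean + k * std


# An almost uniform piece of bright background without any text.
background = [200, 202, 198, 201, 199, 200, 203, 197, 200]
t = niblack_threshold(background)

# Pixels only slightly darker than the mean already fall below the threshold.
noise_pixels = [p for p in background if p < t]
```

Because the threshold stays just below the local mean in flat regions, three of the nine background pixels are classified as black although the window contains no text.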

 

WolfJolionThresholding

For most colored images, as well as for images with background noise or anti-aliased fonts, the WolfJolion preprocessor achieves the best recognition quality.

 

NickThresholding

Nick's binarization derives its thresholding formula from the basic Niblack algorithm, the parent of many local image thresholding methods. The major advantage of Nick's method over Niblack is that it considerably improves binarization for "white" and light page images by shifting down the binarization threshold.

 

FengThresholding

The Feng thresholding method is interesting because it can qualitatively outperform the Sauvola thresholding method. However, the Feng method contains many parameters which have to be set. Hence this method was never widely accepted.

 

OtsuThresholding

For a bimodal image (an image whose histogram has two peaks), a value approximately in the middle between those peaks can be taken as the threshold value. That is what Otsu binarization does: it automatically calculates a threshold value from the histogram of a bimodal image. (For images which are not bimodal, binarization won't be accurate.)
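
A minimal sketch of how such a threshold can be derived from the histogram. It uses the usual formulation of Otsu's method, which picks the threshold that maximizes the between-class variance; the function name and the sample data are assumptions for illustration.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 such that splitting into <= t and > t
    maximizes the between-class variance of 8-bit intensities."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels with intensity <= t
    sum0 = 0    # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Bimodal sample: a dark peak around 20-30 and a bright peak around 200-210.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(pixels)
```

For this sample the returned threshold lies between the two peaks, so the dark and the bright pixels end up in different classes.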


Median filters

A median filter is an example of a non-linear filter and, if properly designed, is very good at preserving image detail. A median filter:

  1. considers each pixel in the image
  2. sorts the neighboring pixels into order based upon their intensities,
  3. replaces the original value of the pixel by the median value from the list.
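
The three steps above can be sketched as follows. The function name is hypothetical, and the handling of border pixels (the window is clipped to the image, and the upper median is taken when the window has an even number of pixels) is a simplifying assumption.

```python
def median_filter(gray, size=3):
    """Replace every pixel by the median of its size x size neighborhood."""
    height, width = len(gray), len(gray[0])
    radius = size // 2
    out = [row[:] for row in gray]
    for y in range(height):                      # 1. consider each pixel
        for x in range(width):
            window = sorted(                     # 2. sort the neighbors
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]  # 3. take the median
    return out


# Uniform gray image with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
noisy[0][0] = 0
clean = median_filter(noisy)
```

Both outliers are replaced by the surrounding value of 100, while the rest of the image is unchanged.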

A median filter is a rank-conditioned rank-selection (RCRS) filter. A milder member of that family, for example one that selects the closest of the neighboring values when a pixel's value is an extreme in its neighborhood and leaves it unchanged otherwise, is sometimes preferred, especially in photographic applications.

Median and other RCRS filters are good at removing salt-and-pepper noise from an image and cause relatively little blurring of edges. They are therefore often used in computer vision applications.

Disadvantage: the rest of the image becomes blurred. This impairs the borders of characters and consequently the recognition accuracy.

At the same time (and rather unexpectedly), the MedianBlurGAT preprocessor is the best choice for "recipes" and for images with curved or otherwise complex text.

Available filters:

  • PureMedianBlur

Filters that apply thresholding in addition:

  • MedianBlurAGT
  • MedianBlurAT
  • MedianBlurGAT


BilateralFilter

A bilateral filter is a non-linear, edge-preserving and noise-reducing smoothing filter for images. The intensity value at each pixel in an image is replaced by a weighted average of intensity values from nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on  the Euclidean distance of pixels, but also on the radiometric differences (e.g. range differences, such as color intensity, depth distance, etc.). This preserves sharp edges by systematically looping through each pixel and adjusting weights to the adjacent pixels accordingly.  

It is normally used for non-text images or after thresholding. 
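
The combination of a spatial weight and a range weight can be sketched on a one-dimensional signal, which keeps the example short. The function name, the parameter names and the parameter values are assumptions chosen for illustration and do not reflect the settings used by CIB ocr.

```python
import math


def bilateral_1d(signal, radius=2, sigma_space=2.0, sigma_range=20.0):
    """Bilateral filter for a 1-D signal using Gaussian weights."""
    out = []
    n = len(signal)
    for i in range(n):
        weight_sum = 0.0
        acc = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: decreases with the distance between positions.
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # Range weight: decreases with the difference in intensity.
            diff = signal[i] - signal[j]
            w_range = math.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            weight = w_space * w_range
            weight_sum += weight
            acc += weight * signal[j]
        out.append(acc / weight_sum)
    return out


# A noisy step edge: dark values around 10, bright values around 200.
noisy_step = [10, 12, 8, 11, 9, 200, 202, 198, 201, 199]
smoothed = bilateral_1d(noisy_step)
```

Values on the other side of the edge receive a range weight close to zero, so the noise within each plateau is averaged out while the step between positions 4 and 5 stays sharp.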


Thinning Algorithms

Thinning is applied to binary images in order to reduce a black-and-white area to a skeleton, e.g. one pixel wide.

A fast parallel thinning algorithm consists of two subiterations:
one is aimed at deleting the south-east boundary points and the north-west corner points, while the other one is aimed at deleting the north-west boundary points and the south-east corner points. End points and pixel connectivity are preserved. Each pattern is thinned down to a "skeleton" of unitary thickness. Experimental results show that this method is very effective.

Available algorithms:

  • ThinningZhangSuen
  • ThinningGuoHall
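
The two-subiteration scheme described in this section corresponds to the published Zhang-Suen algorithm, which can be sketched as follows. This is an illustrative plain-Python version, not the implementation used by CIB ocr. It assumes a binary image with 1 for foreground and 0 for background, surrounded by a one-pixel background border.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (1 = foreground, 0 = background) to a skeleton."""
    img = [row[:] for row in image]
    height, width = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9: clockwise, starting with the pixel above (north).
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(y, x)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    b = sum(n)  # number of foreground neighbors
                    # a: number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8)
                            if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        # south-east boundary and north-west corner points
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        # north-west boundary and south-east corner points
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            # Delete only after the whole image was examined (parallel step).
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img


# A horizontal bar, three pixels thick, inside a 7 x 10 image.
bar = [[0] * 10 for _ in range(7)]
for y in range(2, 5):
    for x in range(1, 9):
        bar[y][x] = 1
skeleton = zhang_suen_thinning(bar)
foreground = [(y, x) for y in range(7) for x in range(10) if skeleton[y][x]]
```

For this bar, only pixels of the middle row remain, so the skeleton is one pixel thick.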

DeSkew

Deskewing an image can help a lot with barcode detection, or simply improve the readability of scanned images. In photos of goods with a barcode, for example, the skew angle is often too high for the barcode to be detected. After deskewing, the barcode can be read.

If the image is a logo, a good choice is DeSkew+AutoInvert together with any of the preprocessors Feng, Nick, Sauvola or WolfJolion.
For invoices, the suggestion is DeSkew together with Sauvola or WolfJolion.


Invert / AutoInvert

Both filters are suitable for images that contain more black than white.

Invert changes black to white and vice versa.

AutoInvert first checks whether the page really contains more black than white, and inverts the image only in that case.

Good results are obtained if Invert (or AutoInvert) is used together with BilateralFilter and DeSkew.
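
The difference between the two filters can be sketched as follows for a binary image with the values 0 (black) and 255 (white). The function names and the simple pixel count used as the decision rule are assumptions for illustration; the exact criterion used by CIB ocr is not documented here.

```python
def invert(binary):
    """Swap black (0) and white (255)."""
    return [[255 - px for px in row] for row in binary]


def auto_invert(binary):
    """Invert only if the image contains more black than white pixels."""
    pixels = [px for row in binary for px in row]
    black = sum(1 for px in pixels if px == 0)
    white = len(pixels) - black
    return invert(binary) if black > white else binary


# White text on a black background: mostly black, so it gets inverted.
dark_page = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
# Black text on a white background: left unchanged.
light_page = invert(dark_page)

fixed_dark = auto_invert(dark_page)
fixed_light = auto_invert(light_page)
```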


AutoRotate

This preprocessor detects an image rotation of 90, 180 or 270 degrees using an artificial-intelligence algorithm and rotates the image accordingly before the text recognition process. The following preprocessor setting detects and corrects the image rotation and then deskews the resulting image before text recognition:

Example

Preprocess = AutoRotate+DeSkew

To use this algorithm, the additional property AutoRotateModel must be set. This property must point to a TensorFlow-based model file that was trained to detect image rotation.


Composite Algorithms

(From CIB ocr version 2.3.0) 

CIB ocr can use complex algorithms for image preprocessing. To use them, set the preprocessor "Composite". This feature is based on the functionality of the CIB image toolbox. Each preprocessing algorithm must be described in XML format (details are available in the CIB image toolbox documentation).

Example CIB runshell: 

cibrsh.exe –oc Preprocess=Composite AlgorithmsSetName=AlgorithmsSet_sample.xml
AlgorithmName=SepaTextExtraction AlgorithmProfile=processing_profile.xml
IPLTraceFilename=OCR_IPL.log


Example CIB Job/CIB DocumentServer 

<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
  <Comod>
    <defaults>
      <properties command="job">
        <property name="OutputMode">XML</property>
        <property name="UseInMemoryProcessing">1</property>
      </properties>
    </defaults>
    <jobs>
      <job name="tesseract" expected-result-code="404">
        <steps>
          <step name="LoadStep" command="load">
            <properties>
              <property name="InputFilename">./input/input.png</property>
            </properties>
          </step>
          <step name="OcrStep" expected-result-code="1000" command="ocr">
            <properties>
              <property name="LicenseCompany">CustomerLicensee</property>
              <property name="LicenseKey">4444-cccc-88888888</property>
              <property name="OCRLibraryName">Tesseract</property>
              <property name="DataFolder">.</property>
              <property name="OutputFormat">FormatHocr</property>
              <property name="TraceFilename">OCR_trace.log</property>
              <property name="OCRLanguage">deu</property>
              <property name="TracePreprocessOutput">1</property>
              <property name="Preprocess">Composite</property>
              <property name="AlgorithmsSetName">AlgorithmsSet_sample.xml</property>
              <property name="AlgorithmName">SepaTextExtraction</property>
              <property name="AlgorithmProfile">processing_profile.xml</property>
              <property name="IPLTraceFilename">OCR_IPL.log</property>
            </properties>
          </step>
          <step name="SaveStep" expected-result-code="0" command="save">
            <properties>
              <property name="OutputFilename">./ocr_out.html</property>
            </properties>
          </step>
        </steps>
      </job>
    </jobs>
  </Comod>
</root>