CIB ocr technical manual (EN)
6. General properties
6.11. Preprocess
General
Preprocessor details
▸ Thresholding Methods
▸ Median filters
▸ BilateralFilter
▸ Thinning Algorithms
▸ DeSkew
▸ Invert / AutoInvert
▸ AutoRotate
Composite Algorithms
General
Property name | Data type | Kind |
Preprocess | String | Set |
This property specifies one or more preprocessing methods that are applied to the input file to improve the OCR result.
Syntax
Preprocess: <preprocessnames>
<preprocessnames>= <preprocessor> | <preprocessor> “+” <preprocessnames>
<preprocessor>: NativeAdaptiveThresholding | PureMedianBlur | PureAdaptiveThresholding |
PureAdaptiveGaussianThresholding | MedianBlurAGT | MedianBlurAT | MedianBlurGAT |
SauvolaThresholding | NiblackThresholding | WolfJolionThresholding | NickThresholding |
FengThresholding | OtsuThresholding | BilateralFilter | ThinningZhangSuen |
ThinningGuoHall | DeSkew | Invert | AutoInvert | Composite
No default
Example
Preprocess = NativeAdaptiveThresholding
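The syntax above can be illustrated with a small validator. This is only a sketch and not part of CIB ocr: the function name parse_preprocess is hypothetical, the name list is copied from the syntax definition (plus AutoRotate, which is described later in this section), and the case-insensitive matching is an assumption.

```python
# Names copied from the syntax definition of this section, plus AutoRotate,
# which is described later in the same section.
KNOWN_PREPROCESSORS = [
    "NativeAdaptiveThresholding", "PureMedianBlur", "PureAdaptiveThresholding",
    "PureAdaptiveGaussianThresholding", "MedianBlurAGT", "MedianBlurAT",
    "MedianBlurGAT", "SauvolaThresholding", "NiblackThresholding",
    "WolfJolionThresholding", "NickThresholding", "FengThresholding",
    "OtsuThresholding", "BilateralFilter", "ThinningZhangSuen",
    "ThinningGuoHall", "DeSkew", "Invert", "AutoInvert", "AutoRotate",
    "Composite",
]

# Assumption: names are matched case-insensitively and mapped to the
# spelling used in the syntax definition.
_CANONICAL = {name.lower(): name for name in KNOWN_PREPROCESSORS}


def parse_preprocess(value):
    """Split a value such as 'DeSkew+AutoInvert' into a list of names."""
    names = []
    for part in value.split("+"):
        key = part.strip().lower()
        if key not in _CANONICAL:
            raise ValueError(f"unknown preprocessor: {part.strip()!r}")
        names.append(_CANONICAL[key])
    return names


print(parse_preprocess("DeSkew+AutoInvert+SauvolaThresholding"))
```

An empty value or an unknown name raises ValueError, which matches the grammar requiring at least one preprocessor.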
Preprocessor details
Thresholding Methods
Thresholding is the simplest method of image segmentation. It can be used to create a binary image from a grayscale image. The simplest thresholding methods replace each pixel with a black pixel if its intensity is less than a fixed constant T, or with a white pixel if its intensity is greater than that constant.
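A minimal sketch of fixed (global) thresholding, assuming the image is a list of rows with 8-bit grayscale values. The function name is hypothetical, and treating a pixel equal to T as white is an arbitrary choice of this sketch.

```python
def global_threshold(image, t):
    """Return a binary image: 0 (black) where intensity < t, else 255 (white)."""
    return [[0 if px < t else 255 for px in row] for row in image]


gray = [
    [10, 200],
    [128, 127],
]
print(global_threshold(gray, 128))
```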
Adaptive Thresholding
Using a global value as threshold value may not be good in all conditions where an image has different lighting conditions in different areas. In that case, we go for adaptive thresholding. Adaptive thresholding means that the algorithm calculates the threshold for small regions of the image. Thus we get different thresholds for different regions of the same image and this gives us better results for images with varying illumination.
The OpenCV adaptive thresholding function (cv2.adaptiveThreshold in Python) takes three method-specific input parameters and returns a single output image.
Adaptive Method - It decides how the thresholding value is calculated.
- cv2.ADAPTIVE_THRESH_MEAN_C : threshold value is the mean of the neighborhood area.
- cv2.ADAPTIVE_THRESH_GAUSSIAN_C : threshold value is the weighted sum of neighborhood values where weights are a Gaussian window.
Block Size - It decides the size of the neighborhood area.
C - It is just a constant which is subtracted from the mean or weighted mean calculated.
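The roles of block size and C can be sketched in plain Python for the mean-based method. This is an illustration only, not the OpenCV implementation: the function name is hypothetical, borders are handled by repeating the edge pixels, and results are not guaranteed to be bit-identical to cv2.adaptiveThreshold.

```python
def adaptive_mean_threshold(image, block_size=3, c=2):
    """Binarize using, per pixel, the neighborhood mean minus c as threshold."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    h, w = len(image), len(image[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so that border pixels are repeated.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            threshold = total / (block_size * block_size) - c
            out[y][x] = 255 if image[y][x] > threshold else 0
    return out


# Bright left half and dark right half, each with one darker "ink" pixel.
gray = [
    [200, 200, 200, 100, 100, 100],
    [200, 150, 200, 100, 50, 100],
    [200, 200, 200, 100, 100, 100],
]
for row in adaptive_mean_threshold(gray):
    print(row)
```

Both ink pixels come out black although one of them (150) is brighter than the dark background (100), which a single global threshold cannot achieve. Pixels directly at the brightness step may still be misclassified, because their neighborhood mixes both regions.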
NativeAdaptiveThresholding
This is a composite filter that applies the following OpenCV functions in sequence:
- cv::medianBlur()
- cv::adaptiveThreshold() using the CV_ADAPTIVE_THRESH_MEAN_C adaptive method
- cv::bilateralFilter()
The result is a grayscale image.
PureAdaptiveThresholding
While the conventional thresholding operator uses a global threshold for all pixels, adaptive thresholding changes the threshold dynamically over the image. This more sophisticated version of thresholding can accommodate changing lighting conditions in the image , e.g. those occurring as a result of a strong illumination gradient or shadows.
PureAdaptiveThresholding consists of a single step:
- cv::adaptiveThreshold() using the CV_ADAPTIVE_THRESH_MEAN_C adaptive method
The result is a binary image.
Alternative: PureAdaptiveGaussianThresholding
Thresholding based on standard deviation
The methods described in the following sections (FengThresholding, SauvolaThresholding, NiblackThresholding, WolfJolionThresholding, NickThresholding) differ only in the final formula for the threshold value of a particular pixel; they all use the same standard deviation matrices.
SauvolaThresholding
NiblackThresholding
Niblack's method can be considered the first local thresholding method. It has the advantage of detecting the text, but it introduces a lot of background noise. Sauvola and Pietikäinen modified the Niblack threshold to decrease the background noise, but the text detection rate also decreases, and bleed-through still remains in most cases.
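The difference between the two formulas can be sketched as follows, using the commonly published forms T = m + k*s (Niblack) and T = m * (1 + k * (s / R - 1)) (Sauvola), where m and s are the local mean and standard deviation. The parameter values (k = -0.2, k = 0.5, R = 128) and all function names are assumptions for illustration; the values CIB ocr uses internally are not stated in this manual.

```python
import math


def local_mean_std(image, y, x, radius):
    """Mean and standard deviation of the window around (y, x), clipped to the image."""
    h, w = len(image), len(image[0])
    values = [
        image[yy][xx]
        for yy in range(max(0, y - radius), min(h, y + radius + 1))
        for xx in range(max(0, x - radius), min(w, x + radius + 1))
    ]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def niblack_threshold(mean, std, k=-0.2):
    return mean + k * std


def sauvola_threshold(mean, std, k=0.5, r=128.0):
    return mean * (1.0 + k * (std / r - 1.0))


def binarize(image, threshold_fn, radius=1):
    """White (255) where the pixel exceeds its local threshold, else black (0)."""
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            mean, std = local_mean_std(image, y, x, radius)
            row.append(255 if image[y][x] > threshold_fn(mean, std) else 0)
        out.append(row)
    return out


# A perfectly flat, light background without any text.
flat = [[200] * 3 for _ in range(3)]
print(binarize(flat, niblack_threshold))
print(binarize(flat, sauvola_threshold))
```

On the flat background the Niblack threshold equals the pixel value, so the background is not reliably classified as white, which illustrates the background noise mentioned above. The Sauvola threshold drops to half the local mean in such regions, so the background stays white.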
WolfJolionThresholding
For most color images, as well as for images with background noise or anti-aliased fonts, the WolfJolion preprocessor achieves the best recognition quality.
NickThresholding
Nick's binarization derives its thresholding formula from the basic Niblack algorithm, the parent of many local image thresholding methods. The major advantage of Nick's method over Niblack is that it considerably improves binarization for "white" and light page images by shifting down the binarization threshold.
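The downward shift of the threshold can be sketched with the NICK formula as it is commonly stated in the literature: T = m + k * sqrt((sum(p_i^2) - m^2) / NP), where m is the local mean, p_i are the pixel values of the window and NP is the number of pixels. The function name and the value k = -0.1 are assumptions for illustration, not documented CIB ocr settings.

```python
import math


def nick_threshold(values, k=-0.1):
    """NICK threshold for one window, given its pixel values as a flat list."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum(v * v for v in values)
    return mean + k * math.sqrt((sum_sq - mean * mean) / n)


# A 3x3 window from an almost white page region.
window = [250] * 9
print(nick_threshold(window))
```

For this window the Niblack threshold would be 250 (the standard deviation is zero), whereas the NICK threshold is clearly lower, so the light background is classified as white.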
FengThresholding
The Feng thresholding method is interesting because it can qualitatively outperform the Sauvola thresholding method. However, the Feng method contains many parameters which have to be set. Hence this method was never widely accepted.
OtsuThresholding
Considering a bimodal image (a bimodal image is an image whose histogram has two peaks) we can approximately take a value in the middle of those peaks as threshold value. That is what Otsu binarization does. So it automatically calculates a threshold value from an image’s histogram for a bimodal image. (For images which are not bimodal, binarization won’t be accurate.)
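A minimal sketch of Otsu's method in plain Python, assuming 8-bit integer pixel values. It selects the threshold that maximizes the between-class variance of the histogram; the function name is hypothetical.

```python
def otsu_threshold(image):
    """Return the threshold t (pixels <= t form one class) chosen by Otsu's method."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break  # all pixels are in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Bimodal image: dark text pixels around 50, light background around 200.
gray = [
    [50, 52, 200, 198],
    [48, 50, 202, 200],
]
t = otsu_threshold(gray)
binary = [[255 if px > t else 0 for px in row] for row in gray]
print(t, binary)
```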
Median filters
A median filter is an example of a non-linear filter and, if properly designed, is very good at preserving image detail. Running a median filter:
- considers each pixel in the image
- sorts the neighboring pixels into order based upon their intensities,
- replaces the original value of the pixel by the median value from the list.
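The three steps above can be sketched as follows. The function name is hypothetical; at the image border the window is simply clipped, which is an assumption of this sketch and differs from the border handling of OpenCV's cv::medianBlur().

```python
def median_filter(image, radius=1):
    """Replace every pixel by the median of its (2*radius+1)^2 neighborhood."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Sort the neighboring pixels by intensity.
            window = sorted(
                image[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            # Take the middle element (upper median for even-sized windows).
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# Flat gray image with a single "salt" pixel in the center.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
print(median_filter(noisy))
```

The isolated bright pixel is removed completely, because it never reaches the middle position of any sorted window.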
A median filter is a rank-selection (RS) filter, a particularly harsh member of the family of rank-conditioned rank-selection (RCRS) filters. A milder member of that family, for example one that replaces a pixel with the closest neighboring value only when the pixel's value is extreme in its neighborhood and leaves it unchanged otherwise, is sometimes preferred, especially in photographic applications.
Median and other RCRS filters are good at removing salt and pepper noise from an image, and also cause relatively little blurring of edges, and hence are often used in computer vision applications.
Disadvantage: the rest of the image is blurred as well, which impairs the borders of characters and consequently reduces recognition accuracy.
At the same time (and rather unexpectedly), the best choice for “recipes” and images with “curved” or “complex in general” text is the MedianBlurGAT preprocessor.
Available filters:
- PureMedianBlur
Filters that additionally apply thresholding:
- MedianBlurAGT
- MedianBlurAT
- MedianBlurGAT
BilateralFilter
A bilateral filter is a non-linear, edge-preserving and noise-reducing smoothing filter for images. The intensity value at each pixel in an image is replaced by a weighted average of intensity values from nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on the Euclidean distance of pixels, but also on the radiometric differences (e.g. range differences, such as color intensity, depth distance, etc.). This preserves sharp edges by systematically looping through each pixel and adjusting weights to the adjacent pixels accordingly.
It is normally used for non-text images or after thresholding.
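The weighting described above can be sketched in plain Python for grayscale images. This is an illustration only, not the code behind cv::bilateralFilter(): the function name, the Gaussian form of both weights and the default sigma values are assumptions, and the window is clipped at the image border.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Weighted average where weights fall off with distance AND intensity difference."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            center = image[y][x]
            weight_sum = 0.0
            value_sum = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    dist_sq = (yy - y) ** 2 + (xx - x) ** 2
                    diff_sq = (image[yy][xx] - center) ** 2
                    weight = math.exp(-dist_sq / (2 * sigma_space ** 2)) * math.exp(
                        -diff_sq / (2 * sigma_range ** 2)
                    )
                    weight_sum += weight
                    value_sum += weight * image[yy][xx]
            row.append(round(value_sum / weight_sum))
        out.append(row)
    return out


# A sharp edge between a dark and a light region stays sharp.
edge = [[50, 50, 50, 200, 200, 200] for _ in range(3)]
print(bilateral_filter(edge))
```

Pixels on the other side of the edge differ strongly in intensity, so their weight is close to zero and they hardly contribute to the average. Small intensity differences, such as mild noise, still receive a high weight and are smoothed.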
Thinning Algorithms
Thinning is used on binary images to reduce a black-and-white region to a skeleton, e.g. one that is one pixel wide.
A fast parallel thinning algorithm consists of two iteration loops: one aimed at deleting the south-east boundary points and the north-west corner points, while the other one is aimed at deleting the north-west boundary points and the south-east corner points. End points and pixel connectivity are preserved. Each pattern is thinned down to a "skeleton" of unitary thickness. Experimental results show that this method is very effective.
Used algorithms:
- ThinningZhangSuen
- ThinningGuoHall
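The two alternating passes of the Zhang-Suen algorithm can be sketched in plain Python, following the published description of the algorithm. This is not the CIB ocr implementation: the function name is hypothetical, foreground pixels are represented as 1 and background as 0, and the outermost rows and columns are assumed to be background.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (1 = foreground, 0 = background); returns a new image."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbors(y, x):
        # P2..P9, clockwise, starting with the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    def transitions(n):
        # Number of 0 -> 1 changes in the circular sequence P2, ..., P9, P2.
        return sum(1 for a, b in zip(n, n[1:] + n[:1]) if a == 0 and b == 1)

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbors(y, x)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if not 2 <= sum(n) <= 6 or transitions(n) != 1:
                        continue
                    if step == 0:
                        # South-east boundary points and north-west corner points.
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        # North-west boundary points and south-east corner points.
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((y, x))
            # Delete all marked pixels at once (parallel update).
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img


# A horizontal bar that is three pixels thick.
bar = [[0] * 9 for _ in range(7)]
for y in (2, 3, 4):
    for x in range(1, 8):
        bar[y][x] = 1
for row in zhang_suen_thinning(bar):
    print("".join("#" if px else "." for px in row))
```

Pixels are only marked during a pass and deleted afterwards, so every decision within a pass is based on the same image state. The bar is reduced to a line that is one pixel thick and somewhat shorter than the original.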
DeSkew
Deskewing an image can help a lot if you want to detect barcodes or simply improve the readability of scanned images. In photos of goods with a barcode, for example, the skew angle is often too high for the barcode to be detected. After deskewing, the barcode can be read.
If an image is a logo, a good choice is DeSkew+AutoInvert and any of the preprocessors Feng, Nick, Sauvola or WolfJolion.
For invoices a suggestion is DeSkew and Sauvola or WolfJolion.
Invert / AutoInvert
Both filters are suitable for images that contain more black than white.
“Invert” changes black to white and vice versa.
“AutoInvert” first checks whether the page really contains more black than white.
Good results are obtained when “Invert” or “AutoInvert” is used together with “BilateralFilter” and “DeSkew”.
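The difference between the two filters can be sketched as follows for 8-bit grayscale images. The function names, the black_level parameter and the simple pixel-count criterion are assumptions for illustration; the exact check performed by CIB ocr is not described in this manual.

```python
def invert(image):
    """Swap dark and light: every intensity v becomes 255 - v."""
    return [[255 - px for px in row] for row in image]


def auto_invert(image, black_level=128):
    """Invert only if the image contains more dark than light pixels."""
    pixels = [px for row in image for px in row]
    dark = sum(1 for px in pixels if px < black_level)
    light = len(pixels) - dark
    return invert(image) if dark > light else image


white_on_black = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
print(auto_invert(white_on_black))
```

An image that is already mostly white is returned unchanged by auto_invert, whereas invert always swaps the colors.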
AutoRotate
This preprocessor detects image rotation by 90, 180 or 270 degrees using an artificial intelligence algorithm and rotates the image before the text recognition process. The following preprocessor setting detects the image rotation, rotates the image, and then deskews the resulting image before text recognition:
Example
Preprocess = AutoRotate+DeSkew
To use this algorithm, an additional property must be set: AutoRotateModel. This property must point to a TensorFlow-based model file trained to detect image rotation.
Composite Algorithms
(From CIB ocr version 2.3.0)
CIB OCR can use complex algorithms for image preprocessing. To use them, select the "Composite" preprocessor. This feature is based on the functionality of the CIB image toolbox. Each preprocessing algorithm must be described in XML format (details are available in the CIB image toolbox documentation).
Example CIB runshell:
cibrsh.exe –oc Preprocess=Composite AlgorithmsSetName=AlgorithmsSet_sample.xml
AlgorithmName=SepaTextExtraction AlgorithmProfile=processing_profile.xml
IPLTraceFilename=OCR_IPL.log
Example CIB Job/CIB DocumentServer
<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>
  <Comod>
    <defaults>
      <properties command="job">
        <property name="OutputMode">XML</property>
        <property name="UseInMemoryProcessing">1</property>
      </properties>
    </defaults>
    <jobs>
      <job name="tesseract" expected-result-code="404">
        <steps>
          <step name="LoadStep" command="load">
            <properties>
              <property name="InputFilename">./input/input.png</property>
            </properties>
          </step>
          <step name="OcrStep" expected-result-code="1000" command="ocr">
            <properties>
              <property name="LicenseCompany">CustomerLicensee</property>
              <property name="LicenseKey">4444-cccc-88888888</property>
              <property name="OCRLibraryName">Tesseract</property>
              <property name="DataFolder">.</property>
              <property name="OutputFormat">FormatHocr</property>
              <property name="TraceFilename">OCR_trace.log</property>
              <property name="OCRLanguage">deu</property>
              <property name="TracePreprocessOutput">1</property>
              <property name="Preprocess">Composite</property>
              <property name="AlgorithmsSetName">AlgorithmsSet_sample.xml</property>
              <property name="AlgorithmName">SepaTextExtraction</property>
              <property name="AlgorithmProfile">processing_profile.xml</property>
              <property name="IPLTraceFilename">OCR_IPL.log</property>
            </properties>
          </step>
          <step name="SaveStep" expected-result-code="0" command="save">
            <properties>
              <property name="OutputFilename">./ocr_out.html</property>
            </properties>
          </step>
        </steps>
      </job>
    </jobs>
  </Comod>
</root>