SKEW ANGLE ESTIMATION IN DOCUMENT PROCESSING
USING COHEN’S CLASS DISTRIBUTIONS
E.Kavallieratou, N.Fakotakis, and G.Kokkinakis
Wire Communications Laboratory,
University of Patras, 26500 Patras, Greece.
tel. ++30-61-991722, fax ++30-61-991855
ergina@wcl.ee.upatras.gr
Abstract: A skew angle estimation approach is proposed for document processing, based on the application of
several time-frequency distributions of Cohen's class to the horizontal projection profile of the page. Our
results indicate that the Wigner-Ville distribution offers the best trade-off between accuracy and
computational cost.
Keywords: skew estimation, Cohen distributions, document page, projection profile
1 INTRODUCTION
The skew angle estimation of a document page is an important task for both document analysis and optical character
recognition (OCR) applications. A wide variety of methods have been proposed in the literature.
O’Gorman (1993) classified them on the basis of the techniques used. The projection profile technique has been
applied by Pavlidis (1991) and Ciardiello (1988), while its combination with the Fourier transform was
presented by Postl (1986) and Peake (1997). Shihari (1989) and Le (1994) applied the Hough transform, as did
Yu (1996), who presented a rather accurate and fast approach that combines the Hough transform with
connected components techniques. O’Gorman himself also makes use of connected components.
However, most of the proposed approaches can deal only with small skew angles (within ±15 degrees) and fail on
documents that exceed this limit. Moreover, some of them entail high computational cost, especially when the
Hough transform is used. Also, certain approaches are font, column, graphics or border dependent.
Chin (1997) analyses the skewing problem in handwritten pages and points out that methods which handle printed
pages very well either do not handle handwritten pages equally well or do so at a greater computational cost.
In this paper we propose a skew angle estimation approach based on the application of the time-frequency
distributions of Cohen's class to the horizontal projection profile of the page. The estimation algorithm has been
developed in the framework of the LE-1 1802 project ACCeSS, which combines spoken and written language in call
center applications. We used seven well-known distributions of Cohen’s class, e.g. the Wigner-Ville, the
Margenau-Hill and the Rihaczek distributions, and some of their smoothed versions. The skew angles range from
-89 to +89 degrees and the hit ratio lies between 42% and 100%, depending on the Cohen distribution used.
Figure 1.
The proposed approach is independent of both font size and font type. It requires only a few lines of text in
order to accomplish its task, and the presence of graphics, columns or borders does not affect its performance
(Fig.1). Hence, it is appropriate for forms scanned or received by fax with poor resolution and a large skew
angle. Moreover, handwritten pages are successfully handled even if the handwritten text lines are not exactly
parallel to each other (Fig.2).
In section 2 we give more information about the distributions of Cohen’s class. The proposed algorithm is described
in section 3. Finally, in sections 4 and 5, experimental results and conclusions are given, respectively.
Figure 2.
2 TIME-FREQUENCY DISTRIBUTIONS OF COHEN'S CLASS
The representation of signals whose characteristics vary with time, called “time-varying” or “non-stationary”
signals, is an active area of signal processing that attracts researchers from a wide variety of scientific
domains such as communications, medicine and seismic surveying. For such signals the concept of time-frequency
distributions has been introduced.
The spectrogram was the first method and is still a widely used and powerful tool for the analysis of non-stationary
signals. However, the spectrogram as a decomposition distribution has the disadvantages of windowing. Thus, we
cannot achieve a good time resolution and a good frequency resolution simultaneously. On the other hand, the
distributions of Cohen's class, as energy distributions, overcome the windowing problem and are gaining ground.
In order to devise a joint function of time and frequency, many distributions have been proposed since 1932 (by
Wigner, Ville, Page, Choi-Williams), but it was only in 1966 that Cohen showed that an infinite number of
distributions can be generated by the unified formulation:
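\[
\rho_z(t, f) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty}
e^{j 2\pi\vartheta (u - t)} \, g(\vartheta, \tau) \,
z\!\left(u + \frac{\tau}{2}\right) z^{*}\!\left(u - \frac{\tau}{2}\right)
e^{-j 2\pi f \tau} \, d\vartheta \, du \, d\tau
\]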
where g(ϑ, τ) is an arbitrary function called the kernel function. The kernel function characterizes the
observation mode chosen by the analyst. It determines how the signal energy is distributed in the time and
frequency domain and it corresponds to the windows that are used in the atomic decomposition. Table 1 shows the
kernel functions for some distributions of Cohen’s class.
Distribution          Kernel function
Wigner-Ville (WV)     1
Rihaczek              exp(jπϑτ)
Margenau-Hill         cos(πϑτ)
Page                  exp(-jπϑ|τ|)
Choi-Williams         exp(-(πϑτ)^2 / (2σ^2))
Born-Jordan           sin(πϑτ) / (πϑτ)
Table 1.
However, as Cohen (1989) himself clarified, the only way to find the appropriate distribution for a certain problem is
to try it on the problem.
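As a rough illustration, the kernels of Table 1 can be written as plain functions of the two lag variables. This is only a sketch: the function names, the `KERNELS` dictionary and the default `sigma=1.0` are assumptions made for the example, and the Page kernel is left out.

```python
import cmath
import math

# Kernel functions g(theta, tau) of some Cohen's class distributions.
# theta and tau are the two integration variables of the unified formulation.

def wigner_ville_kernel(theta, tau):
    # The Wigner-Ville kernel is identically 1.
    return 1.0 + 0.0j

def rihaczek_kernel(theta, tau):
    return cmath.exp(1j * math.pi * theta * tau)

def margenau_hill_kernel(theta, tau):
    return complex(math.cos(math.pi * theta * tau))

def choi_williams_kernel(theta, tau, sigma=1.0):
    # sigma is a free smoothing parameter; 1.0 is an arbitrary default.
    return complex(math.exp(-((math.pi * theta * tau) ** 2) / (2.0 * sigma ** 2)))

def born_jordan_kernel(theta, tau):
    x = math.pi * theta * tau
    # sin(x)/x tends to 1 as x tends to 0.
    return complex(1.0 if x == 0.0 else math.sin(x) / x)

KERNELS = {
    "Wigner-Ville": wigner_ville_kernel,
    "Rihaczek": rihaczek_kernel,
    "Margenau-Hill": margenau_hill_kernel,
    "Choi-Williams": choi_williams_kernel,
    "Born-Jordan": born_jordan_kernel,
}
```

Written this way, it is easy to see that the Margenau-Hill kernel is the real part of the Rihaczek kernel, and that every kernel listed equals 1 when either lag is zero.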
3 SKEW ESTIMATION PROCEDURE
Our approach employs the projection profile technique and the Cohen distributions. The whole idea is based on the
fact that the histogram (horizontal projection profile) of the non-skewed page presents more pronounced peaks and
dips than any histogram of the same page taken at a non-zero skew angle. A Cohen distribution of a histogram
represents its time-frequency distribution, where in this case time runs along the height of the page.
Consequently, the Cohen distribution presents maximum intensity for the histograms at 0 and 180 degrees, which
show the strongest alternation of peaks and dips. The closer the skew angle is to 0° or 180°, the larger the
values of the maximum intensity. This fact guarantees the success of our algorithm, provided that the skew angle
ranges between -89 and +89 degrees with respect to the upright page position. Otherwise the page would end up
upside down. In Fig.4 the Wigner-Ville distributions (WVDs) of histograms for several skew angles of the page of
Fig.3 are shown.
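The effect of skew on the histogram can be seen on a toy example. The sketch below builds a synthetic binary page with evenly spaced text lines and compares its horizontal projection profile with that of a vertically sheared copy. The shear is only a crude stand-in for a small rotation, and all names (`synthetic_page`, `shear_vertically`) and sizes are assumptions made for the illustration. NumPy is assumed to be available.

```python
import numpy as np

def horizontal_projection_profile(page):
    """Number of ink pixels in each row of a binary page (1 = ink)."""
    return np.asarray(page).sum(axis=1)

def synthetic_page(height=120, width=200, line_height=4, pitch=12):
    """Toy page: solid horizontal bars standing in for text lines."""
    page = np.zeros((height, width), dtype=int)
    for top in range(10, height - line_height, pitch):
        page[top:top + line_height, 20:width - 20] = 1
    return page

def shear_vertically(page, slope):
    """Shift column x down by round(slope * x) rows (with wrap-around).

    This approximates a small skew; it is not a true rotation.
    """
    out = np.zeros_like(page)
    for x in range(page.shape[1]):
        out[:, x] = np.roll(page[:, x], int(round(slope * x)))
    return out

page = synthetic_page()
straight = horizontal_projection_profile(page)
skewed = horizontal_projection_profile(shear_vertically(page, 0.1))

# The straight page alternates between full rows and empty rows,
# whereas the skewed page smears the ink over many rows.
print("variance, straight page:", straight.var())
print("variance, skewed page:  ", skewed.var())
```

The variance is used here only as a simple measure of how pronounced the peaks and dips are; the method described in this section uses a Cohen distribution for that purpose.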
Figure 3.
Initially, the page in question is rotated through several skew angles, e.g. in steps of between 1 and 10
degrees, depending on the application, and the corresponding histograms are calculated. The appropriate Cohen
distribution is then applied to each histogram and the curve of maximum intensity is calculated. Finally, the
angle that presents the maximum intensity is selected.
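A minimal sketch of this step for the Wigner-Ville case is given below. It uses a common discrete form of the distribution, which is an assumption, since the text does not specify the discretization. Subtracting the mean from the profile and taking the maximum over frequency at each position along the page are likewise one possible reading of "curve of maximum intensity". NumPy is assumed to be available.

```python
import numpy as np

def wigner_ville(signal):
    """Discrete Wigner-Ville distribution of a 1-D signal.

    Returns an array of shape (N, N): rows index position (time),
    columns index frequency bins (bin k corresponds to k / (2N) cycles
    per sample in this discretization).
    """
    z = np.asarray(signal, dtype=complex)
    n_samples = z.size
    wvd = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        # Largest lag that keeps both n + lag and n - lag inside the signal.
        max_lag = min(n, n_samples - 1 - n)
        lags = np.arange(-max_lag, max_lag + 1)
        acf = np.zeros(n_samples, dtype=complex)
        acf[lags % n_samples] = z[n + lags] * np.conj(z[n - lags])
        # acf is Hermitian in the lag, so its FFT is real.
        wvd[n] = np.fft.fft(acf).real
    return wvd

def max_intensity_curve(profile):
    """Maximum of the distribution over frequency at each page height."""
    p = np.asarray(profile, dtype=float)
    return wigner_ville(p - p.mean()).max(axis=1)

# A profile with sharp peaks and dips versus a completely flat one.
sharp = np.tile([0, 0, 0, 0, 160, 160, 160, 160], 8)
flat = np.full(64, 80.0)
print(max_intensity_curve(sharp).max() > max_intensity_curve(flat).max())
```

In a full implementation one such curve would be computed for the histogram at each candidate angle.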
Figure 4. (Panels: 0, -1, 1 and 2 degrees; axes: height of page and frequency.)
Thus, a curve of maximum intensity corresponding to each angle is obtained. In Fig.5 several curves of maximum
intensity that correspond to the page of Fig.3 are shown.
Figure 5. (Curves for 0°, 1° and -1°; axes: height of page and frequency.)
In order to define the curve that corresponds to the overall maximum intensity we tried several techniques. First,
we experimented by choosing the curve with the maximum peak. However, this method failed on pages such as that of
Fig.2, where the scanning introduces noise in the upper part of the page. In such cases the wrong curve may be
selected, leading to a wrong estimation. The same problem arose when the integral of the curves, that is, the
area under each curve, was used as the criterion.
In both cases, this problem could be overcome by examining the curves after applying a threshold, which would
trim the upper part of the page where noise from scanning is usually introduced. By doing so, the curves of Fig.6a
are transformed into those of Fig.6b and we can go on with our procedure. However, the calculation of the
threshold may present a problem, as it is difficult to know when and where this kind of noise will show up.
Figure 6. (Panels (a) and (b): curves for 0°, -1° and -4°; axes: height of page and intensity.)
Instead, we resorted to another technique: the selection of the curve that presents the most maxima throughout the
time domain. Moreover, in order to facilitate this procedure we check only the parts of each curve whose values
exceed a threshold (a tenth of the mean curve’s maximum peak proved a good threshold in our experiments).
Fig.7 shows the number of maxima measured on several curves of the page of Fig.1, with respect to the
corresponding skew angle of this page.
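The selection rule can be sketched as follows. The helper names are hypothetical, and the threshold used here (one tenth of the mean of the curves' maximum peaks) is only one possible reading of the threshold described in the text.

```python
def count_maxima(curve, threshold):
    """Count local maxima of a curve whose value exceeds the threshold."""
    count = 0
    for i in range(1, len(curve) - 1):
        is_peak = curve[i - 1] < curve[i] >= curve[i + 1]
        if is_peak and curve[i] > threshold:
            count += 1
    return count

def tenth_of_mean_peak(curves):
    """Assumed threshold: one tenth of the mean of the curves' maximum peaks."""
    peaks = [max(curve) for curve in curves]
    return 0.1 * sum(peaks) / len(peaks)

def select_angle(curves_by_angle):
    """Return the angle whose curve has the most maxima above the threshold."""
    threshold = tenth_of_mean_peak(curves_by_angle.values())
    return max(curves_by_angle,
               key=lambda angle: count_maxima(curves_by_angle[angle], threshold))

# Toy curves: the 0-degree curve has many moderate maxima, while the
# 1-degree curve has the single highest peak (e.g. caused by scanning noise).
curves = {
    0: [0, 5, 0, 6, 0, 5, 0, 6, 0],
    1: [0, 1, 0, 9, 0, 0.2, 0, 0, 0],
    -1: [0, 0, 8, 8, 0, 0, 0, 0, 0],
}
print(select_angle(curves))
```

In this toy case the maximum-peak criterion would pick the 1-degree curve, whereas counting maxima picks the 0-degree curve.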
Figure 7. (Horizontal axis: angle in degrees.)
The calculation of the histograms for every skew angle and the subsequent application of the distribution is not
necessary and would also increase computational complexity and cost. Since the page rotation is the most
computationally expensive part of our algorithm, it is desirable to minimize the number of required rotations.
However, the whole range between +89 and -89 degrees should be covered and only that. If the covered range were
larger than ±89 degrees, it would be possible to end up with the page reversed. On the other hand, if the covered
range were smaller than ±89 degrees and the skew angle larger than that, the method would also fail.
In order to satisfy both requirements, a first estimation is made by initially rotating the page in big steps in the
range of ±89 degrees. Then, the exact skew angle is found by rotating the page in steps of one degree within a
smaller range around the first detected angle.
In order to define the appropriate big step we calculated the number of necessary rotations for various steps in
the range between +89 and -89 degrees. The results are shown in Fig.8. We note that the number of rotations is
minimized for a big step of 12 degrees. Consequently, this step was used in the experiments described below.
Figure 8. (Axes: big step and required rotations.)
Thus, the page is rotated from -84 to +84 degrees in steps of 12 degrees. After a first skew estimation and
correction of the document’s position, the same procedure is repeated, this time within the range of -6° to +6°,
in steps of 1°.
Briefly, the steps of the proposed procedure can be summarized as follows:
1. The document is rotated to the right and left in steps of 12° within the range ±90°. For each angle the
   horizontal projection profile is extracted.
2. The Cohen distribution is calculated for each projection, as well as the maximum intensity of the distribution.
3. The angle whose histogram presents the maximum intensity is selected and the document is rotated by this
   angle (angle1).
4. The procedure is repeated from step 1 to step 3 but this time the document is rotated within a smaller range
   (around angle1) by one degree at a time and a more exact angle is calculated (angle2).
5. The estimated skew angle is angle1+angle2.
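The two-stage search summarized above can be sketched as follows. The callable `score` is hypothetical: it stands for "rotate the page by the given angle, extract the horizontal projection profile, apply the Cohen distribution and measure the intensity", none of which is implemented here. The coarse range of -84 to +84 degrees and the fine range of -6 to +6 degrees are taken from the text.

```python
def candidate_angles(half_range, step):
    """Angles -half_range, ..., 0, ..., +half_range in multiples of step."""
    n = half_range // step
    return [k * step for k in range(-n, n + 1)]

def estimate_skew(score, big_step=12, coarse_limit=84, fine_limit=6):
    """Coarse-to-fine search for the angle that maximizes score(angle).

    Returns the estimated angle and the number of rotations evaluated.
    """
    # Stage 1: big steps over the whole range give a first estimate (angle1).
    coarse = candidate_angles(coarse_limit, big_step)
    angle1 = max(coarse, key=score)
    # Stage 2: one-degree steps around angle1 give the correction (angle2).
    fine = candidate_angles(fine_limit, 1)
    angle2 = max(fine, key=lambda delta: score(angle1 + delta))
    return angle1 + angle2, len(coarse) + len(fine)

# Toy score with a single peak at 37 degrees, in place of a real page.
angle, rotations = estimate_skew(lambda a: -abs(a - 37))
print(angle, rotations)
```

With a big step of 12 degrees the sketch evaluates 15 coarse and 13 fine angles, i.e. 28 rotations in total.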
Although the computational time is not extremely long (Table 2), a further improvement can be achieved if part
of the page is used instead of the whole page. The maximum peak of the histogram is used as the criterion for
which part of the page should be used. Here, we assume that the intensity of the text is higher than that of
graphics. If this assumption does not hold, the use of the whole page is recommended, as it is for handwritten
pages which include text lines of different angles.
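One simple way to pick such a part of the page is to take a band of rows around the highest peak of the histogram. The sketch below does this. Centring the window on the peak and clamping it to the page are assumptions, since the text only states that the maximum peak is the criterion. The function name is hypothetical and only the row range is selected.

```python
def window_around_peak(profile, window_height):
    """Return (top, bottom) row indices of a window centred on the highest peak.

    The window is shifted, if necessary, so that it stays inside the page.
    """
    window_height = min(window_height, len(profile))
    peak_row = max(range(len(profile)), key=lambda row: profile[row])
    top = peak_row - window_height // 2
    top = max(0, min(top, len(profile) - window_height))
    return top, top + window_height

# Toy histogram with its highest peak at row 4.
top, bottom = window_around_peak([0, 1, 0, 0, 9, 0, 0, 0, 2, 0], 4)
print(top, bottom)
```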
4 EXPERIMENTAL RESULTS
We applied seven of Cohen’s class distributions and several of their smoothed versions to a set of 100 documents
consisting of 30 handwritten pages, 20 different application forms, 20 two-column pages, and 30 other pages with
text and graphics, with skew angles up to approximately ±85 degrees. Table 2 presents the results of this test.
The rates correspond to an accuracy of ±0.5° and the given mean times are the times per page when the whole page
and a window of 400x400 pixels are used, respectively. The CPU time refers to a Pentium II PC at 233 MHz.
These results indicate that most of the Cohen’s class distributions are appropriate to our problem but that the
Wigner-Ville distribution is the best in both accuracy and computational cost. Further testing with the
Wigner-Ville distribution on 250 application forms verified the complete success of the distribution, even when
only part of the page was used.
Distribution            Success rate   Mean time (whole page)   Mean time (400x400 pixel window)
Wigner-Ville (WV)       100%           30s                      4.6s
Pseudo WV               87%            42s                      7.6s
Smoothed Pseudo WV      91%            49s                      7.8s
Rihaczek                44%            56s                      8.1s
Margenau-Hill           98%            50s                      7.8s
Pseudo Margenau-Hill    81%            58s                      8.2s
Page                    92%            57s                      8.2s
Pseudo Page             23%            65s                      9.1s
Choi-Williams           100%           58s                      8.1s
Born-Jordan             86%            56s                      8.1s
Zhao-Atlas-Marks        89%            64s                      9.0s
Table 2.
5 CONCLUSION
An approach to skew angle estimation based on the Cohen’s class distributions was presented. It is suitable for
most types of document pages: printed and handwritten pages, with graphics or borders, poor resolution, and
various types and sizes of fonts. Skew angles of up to ±89 degrees are handled with an accuracy of ±0.5 degrees.
We demonstrated that, although most of the applied distributions deal with the problem successfully, the
Wigner-Ville distribution offers the best trade-off between success rate and time. We continue to work on
reducing the computational cost, as well as on the rotation algorithm, so that better accuracy is achieved.
REFERENCES
O'Gorman, L., 1993. The document spectrum for page layout analysis. IEEE trans. On Pattern Analysis and
Machine Intelligence 15, 1162-1173.
Pavlidis, T., J.Zhou. 1991. Page segmentation by white streams. Proc. 1st Int. Conf. Document Analysis and
Recognition (ICDAR), Int. Assoc. Pattern Recognition, 945-953.
Ciardiello, G., G.Scafuro, M.T.Degrandi, M.R. Spada, M.P.Roccotelli. 1988. An experimental system for office
document handling and text recognition. Proc 9th Int. Conf. on Pattern Recognition, 739-743.
Postl, W., 1986. Detection of linear oblique structures and skew scan in digitized documents. Proc. 8th Int. Conf.
Pattern Recognition, IEEE CS Press. Los Alamitos, Calif. 687-689.
Peake, G.S, T.N.Tan. 1997. A General Algorithm for Document Skew Angle Estimation. IEEE International
Conference on Image Processing 2, 230-233.
Shihari, S.N, Govindaraju. 1989. Analysis of textual images using the Hough transform. Machine Vision and
Applications 2, 141-153.
Le, D.S., G.R. Thoma, H.Wechsler. 1994. Automated page orientation and skew angle detection for binary
document image. Pattern Recognition 27, 1325-1344.
Yu, B., A.K.Jain. 1996. A robust and fast skew detection algorithm for generic documents. Pattern Recognition
29, 1599-1629.
Chin, W., A.Harvey, A.Jennings. 1997. Skew detection in handwritten scripts. Proc. IEEE on Speech and Image
Technologies for Computing and Telecommunications, 319-322.
Cohen, L., 1966. Generalized phase-space distribution functions. J.Math. Phys. 7, 781-786.
Cohen, L., 1989. Time-Frequency distributions-a review. Proceedings of the IEEE 77, 941-980.
Fig. 1: Example of mixed printed and handwritten page with various size of fonts.
Fig. 2: Example of handwritten page with text lines that are not parallel to each other.
Fig. 3: A right-oriented document page.
Fig.4: The WV distributions for several histograms of the document page of Fig. 3.
Fig. 5: Curves of maximum intensity extracted from Fig. 3.
Fig. 6: Several curves of maximum intensity before (a) and after (b) the subtraction of the noise.
Fig. 7: Number of maxima of several curves of Fig. 1 with respect to the corresponding skew angle.
Fig.8: The relation between big step and rotations. The rotations are minimized for a big step equal to 12.
Table 1: The Kernel functions for some Cohen distributions.
Table 2: Experimental results from the application of the specified distributions.