SKEW ANGLE ESTIMATION IN DOCUMENT PROCESSING USING COHEN’S CLASS DISTRIBUTIONS

E. Kavallieratou, N. Fakotakis, and G. Kokkinakis
Wire Communications Laboratory, University of Patras, 26500 Patras, Greece.
tel. ++30-61-991722, fax ++30-61-991855
ergina@wcl.ee.upatras.gr

Abstract: A skew angle estimation approach for document processing is proposed, based on the application of several time-frequency distributions of Cohen's class to the horizontal projection profile of the page. Our results indicate that the Wigner-Ville distribution offers the best trade-off between accuracy and computational cost.

Keywords: Skew estimation, Cohen distributions, Document page, Projection profile

1 INTRODUCTION

The skew angle estimation of a document page is an important task for both document analysis and optical character recognition (OCR) applications. A wide variety of methods have been proposed in the literature. O’Gorman (1993) classified them on the basis of the techniques used. The Projection Profile technique has been applied by Pavlidis (1991) and Ciardiello (1988), while its combination with the Fourier transform was presented by Postl (1986) and Peake (1997). Srihari (1989) and Le (1994) applied the Hough Transform, as did Yu (1996), who presented a rather accurate and fast approach combining the Hough Transform and Connected Components techniques. O’Gorman himself also makes use of Connected Components.

However, most of the proposed approaches can deal only with small skew angles (within ±15 degrees) and fail on documents that exceed this limit. Moreover, some of them entail high computational cost, especially when the Hough transform is used. Also, certain approaches are font, column, graphics or border dependent.
Chin (1997) analyses the skewing problem in handwritten pages and points out that methods which handle printed pages very well do not manage handwritten pages equally well or, if they do, incur a greater computational cost.

In this paper we propose a skew angle estimation approach based on the application of the time-frequency distributions of Cohen's class to the horizontal projection profile of the page. The estimation algorithm has been developed in the framework of the LE-1 1802 project ACCeSS, which combines spoken and written language in call center applications. We used seven well-known distributions of Cohen’s class, including the Wigner-Ville, the Margenau-Hill and the Rihaczek distributions, and some of their smoothed versions. The skew angles range from -89 to +89 degrees and the hit ratio lies between 42% and 100%, depending on the Cohen distribution used.

Figure 1.

The proposed approach is independent of both font size and font type. It requires only a few lines of text to accomplish its task, and the presence of graphics, columns or borders does not affect its performance (Fig. 1). Hence, it is appropriate for forms scanned or received by fax with poor resolution and a large skew angle. Moreover, handwritten pages are successfully handled even if the handwritten text lines are not exactly parallel to each other (Fig. 2).

In section 2 we give more information about the distributions of Cohen’s class. The proposed algorithm is described in section 3. Finally, in sections 4 and 5, experimental results and conclusions are given, respectively.

Figure 2.

2 TIME-FREQUENCY DISTRIBUTIONS OF COHEN'S CLASS

The representation of signals whose characteristics vary with time, called “time-varying” or “non-stationary” signals, is an exciting chapter of signal processing. This is an active area for researchers from a wide variety of scientific domains such as communications, medicine, seismic surveying, etc.
For such signals the concept of time-frequency distributions has been introduced. The spectrogram was the first method and is still a widely used and powerful tool for the analysis of non-stationary signals. However, the spectrogram, as a decomposition distribution, suffers from the disadvantages of windowing: good time resolution and good frequency resolution cannot be achieved simultaneously. On the other hand, the distributions of Cohen's class, as energy distributions, overcome the windowing problem and are gaining ground.

In order to devise a joint function of time and frequency, many distributions have been proposed since 1932 (by Wigner, Ville, Page, Choi-Williams), but only in 1966 did Cohen prove that an infinite number of distributions can be generated by the unified formulation:

\[
\rho_z(t,f) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{j2\pi\vartheta(u-t)}\, g(\vartheta,\tau)\, z\!\left(u+\frac{\tau}{2}\right) z^{*}\!\left(u-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\vartheta\, du\, d\tau
\]

where g(ϑ, τ) is an arbitrary function called the Kernel function. The Kernel function characterizes the observation mode chosen by the analyst. It determines how the signal energy is distributed in the time and frequency domain, and it corresponds to the windows that are used in the atomic decomposition. In table 1 the Kernel functions for some distributions of Cohen’s class are shown.

Distribution          Kernel function g(ϑ, τ)
Wigner-Ville (WV)     1
Rihaczek              e^{jπϑτ}
Margenau-Hill         cos(πϑτ)
Page                  e^{-jπϑ|τ|}
Choi-Williams         exp(-(πϑτ)^2 / (2σ^2))
Born-Jordan           sin(πϑτ) / (πϑτ)

Table 1.

However, as Cohen (1989) himself clarified, the only way to find the appropriate distribution for a certain problem is to try it on the problem.

3 SKEW ESTIMATION PROCEDURE

Our approach employs the projection profile technique and the Cohen distributions. The whole idea is based on the fact that the histogram (horizontal projection profile) of the non-skewed page presents more pronounced peaks and dips than any histogram of the same page at a non-zero skew angle.
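As a rough illustration of the two ingredients named so far, the following Python sketch computes a horizontal projection profile and a discrete Wigner-Ville distribution (the Cohen-class member whose kernel equals 1). It is not the authors' implementation: the function names, the use of NumPy, the analytic-signal step and the particular discretization are all assumptions made for this example.

```python
import numpy as np

def horizontal_projection_profile(page):
    """Sum the pixels of each row of a binary page (assumed 1 = ink, 0 = background)."""
    return np.asarray(page, dtype=float).sum(axis=1)

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies, zero negative ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def wigner_ville(signal):
    """Discrete Wigner-Ville distribution; rows are frequency bins, columns are positions.

    Assumed discretization: for each position t the lag product
    z[t+m] * conj(z[t-m]) is Fourier transformed over the lag m,
    so frequency bin k corresponds to k / (2n) cycles per sample.
    """
    z = analytic_signal(signal)
    n = len(z)
    wvd = np.zeros((n, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t)          # lags that stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        # The lag product is conjugate-symmetric, so its FFT is real.
        wvd[:, t] = np.fft.fft(kernel).real
    return wvd
```

For a profile with n rows the result is an n-by-n array in which column t holds the distribution at row t of the page. Taking the maximum of each column is one possible reading of the "curve of maximum intensity" used in the procedure; that reading is our interpretation rather than something the text states explicitly.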
A Cohen distribution of a histogram represents its time-frequency distribution, where in this case time increases along the height of the page. Consequently, the Cohen distribution presents maximum intensity for the histograms at 0 and 180 degrees, which show the most pronounced alternations of peaks and dips. The closer the skew angle is to 0° or 180°, the larger the values of the maximum intensity. This fact guarantees the success of our algorithm, provided that the skew angle ranges between -89 and +89 degrees with respect to the upright page position. Otherwise the page would end up upside down. In Fig. 4 the Wigner-Ville distributions (WVDs) of histograms for several skew angles of the page of Fig. 3 are shown.

Figure 3.

Initially, the page in question is rotated through several skew angles, e.g., in steps of between 1 and 10 degrees, depending on the application, and the corresponding histograms are calculated. The appropriate Cohen distribution is then applied to each histogram and the curve of maximum intensity is calculated. Finally, the angle that presents the maximum intensity is selected.

Figure 4. (Panels: 0, -1, 2 and 1 degrees; axes: height of page, frequency.)

Thus, a curve of maximum intensity corresponding to each angle is obtained. In Fig. 5 several curves of maximum intensity, corresponding to the page of Fig. 3, are shown.

Figure 5. (Curves for 0°, 1° and -1°; axes: height of page, frequency.)

In order to define the curve that corresponds to the overall maximum intensity we tried several techniques. First, we experimented with choosing the curve with the maximum peak.
However, this method failed on pages such as that of Fig. 2, where scanning introduces noise in the upper part of the page. Thus, the wrong curve may be selected, leading to a wrong estimate. The same problem arose when the integral of the curves, that is, the area under each curve, was used as the criterion. In both cases the problem could be overcome by examining the curves after applying a threshold that trims the upper part of the page, where scanning noise is usually introduced. By doing so, the curves of Fig. 6a are transformed into those of Fig. 6b and the procedure can continue. However, calculating the threshold may present a problem, as it is difficult to know when and where this kind of noise will show up.

Figure 6. (Panels (a) and (b): curves for 0°, -1° and -4°; axes: height of page, intensity.)

Instead, we resorted to another technique: selecting the curve that presents the most maxima throughout the time domain. Moreover, in order to facilitate this procedure, we check only the parts of each curve whose values exceed a threshold (a tenth of the mean curve’s maximum peak proved a good threshold in our experiments). In Fig. 7, the number of maxima measured on several curves of the page shown in Fig. 1 is presented with respect to the corresponding skew angle of this page.

Figure 7. (Horizontal axis: angle in degrees.)

Calculating the histogram for every skew angle and subsequently applying the distribution is not necessary, and would also increase computational complexity and cost. Since page rotation is the most computationally expensive part of our algorithm, it is desirable to minimize the number of required rotations. However, the whole range between -89 and +89 degrees should be covered, and only that range.
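The maxima-count criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a "maximum" is taken to be a strict local maximum, and the threshold is read as one tenth of the highest peak of the curve obtained by averaging all candidate curves, which is one possible interpretation of the wording in the text.

```python
import numpy as np

def count_maxima(curve, threshold):
    """Count strict local maxima of `curve` whose value exceeds `threshold`."""
    c = np.asarray(curve, dtype=float)
    interior = c[1:-1]
    is_peak = (interior > c[:-2]) & (interior > c[2:]) & (interior > threshold)
    return int(np.count_nonzero(is_peak))

def select_angle(curves_by_angle):
    """Pick the angle whose maximum-intensity curve has the most maxima.

    `curves_by_angle` maps a candidate angle to its curve; all curves are
    assumed to have the same length.
    """
    curves = np.array(list(curves_by_angle.values()), dtype=float)
    # Assumption: threshold = one tenth of the peak of the mean curve.
    threshold = 0.1 * curves.mean(axis=0).max()
    counts = {angle: count_maxima(curve, threshold)
              for angle, curve in curves_by_angle.items()}
    best = max(counts, key=counts.get)
    return best, counts
```

Because only the number of peaks matters, a single large spike caused by scanning noise adds one count at most, which is consistent with the motivation given for preferring this criterion over the maximum peak or the area under the curve.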
If the covered range were larger than ±89 degrees, it would be possible to end up with the page reversed. On the other hand, if the covered range were smaller than ±89 degrees and the skew angle larger than that range, the method would also fail. In order to satisfy both requirements, a first estimate is made by rotating the page in big steps over the range of ±89 degrees. Then, the exact skew angle is found by rotating the page in steps of one degree within a smaller range around the first detected angle.

In order to define the appropriate big step we calculated the number of necessary rotations for various steps in the range between -89 and +89 degrees. The results are shown in Fig. 8. We note that for a big step of 12 degrees the number of rotations is minimized. Consequently, 12 degrees is the required step. This step was used in our experiments described below.

Figure 8. (Axes: big step, required rotations.)

Thus, the page is rotated from -84 to +84 degrees in steps of 12 degrees and, after a first skew estimation and correction of the document’s position, the same procedure is repeated, this time within the range of -6° to +6°, in steps of 1°. Briefly, the steps of the proposed procedure can be summarized as follows:

1. The document is rotated to the right and left in steps of 12° within the range ±90° (in practice from -84° to +84°). For each angle the horizontal projection profile is extracted.
2. The Cohen distribution is calculated for each projection, as well as the maximum intensity of the distribution.
3. The angle whose histogram presents the maximum intensity is selected and the document is rotated by this angle (angle1).
4. Steps 1 to 3 are repeated, but this time the document is rotated within a smaller range (around angle1) by one degree at a time, and a more exact angle is calculated (angle2).
5. The estimated skew angle is angle1+angle2.
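The five steps above can be sketched as a coarse-to-fine search. The Python fragment below is a minimal sketch, not the authors' code: `estimate_skew`, its parameter names and the two callables `rotate` and `score` are hypothetical, and the rotation routine and the intensity criterion are deliberately left to the caller. The toy model at the end, in which a "page" is just its residual skew in degrees, exists only to make the example runnable.

```python
def estimate_skew(page, rotate, score, big_step=12, coarse_limit=84, fine_limit=6):
    """Two-pass search for the rotation that maximizes `score`.

    rotate(page, angle) -> rotated page (caller-supplied, hypothetical)
    score(page)         -> number to maximize, e.g. a criterion computed
                           from the Cohen distribution of the profile
    Returns angle1 + angle2, the total rotation applied to the page.
    """
    # Steps 1-3: coarse pass from -84 to +84 degrees in steps of 12 degrees.
    coarse_angles = range(-coarse_limit, coarse_limit + 1, big_step)
    angle1 = max(coarse_angles, key=lambda a: score(rotate(page, a)))
    corrected = rotate(page, angle1)

    # Step 4: fine pass from -6 to +6 degrees in steps of 1 degree.
    fine_angles = range(-fine_limit, fine_limit + 1)
    angle2 = max(fine_angles, key=lambda a: score(rotate(corrected, a)))

    # Step 5: combine both estimates.
    return angle1 + angle2


# Toy model (assumption for illustration only): a page is represented by its
# residual skew, rotation adds to it, and the score peaks at zero skew.
toy_rotate = lambda skew, angle: skew + angle
toy_score = lambda skew: -abs(skew)

print(estimate_skew(31, toy_rotate, toy_score))  # -31
```

With the default parameters the sketch performs 15 coarse and 13 fine evaluations. For real images, `rotate` could wrap an existing routine such as `scipy.ndimage.rotate`, and `score` could combine a projection profile with the chosen distribution; both choices are left open here.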
Although the computational time is not extremely long (table 2), a further improvement can be achieved if part of the page is used instead of the whole page. The maximum peak of the histogram is used as the criterion for choosing which part of the page to use. Here, we assume that the intensity of the text is higher than that of graphics. If this assumption does not hold, using the whole page is recommended, as it is for handwritten pages that include text lines at different angles.

4 EXPERIMENTAL RESULTS

We applied seven of Cohen’s class distributions and several of their smoothed versions to a set of 100 documents consisting of 30 handwritten pages, 20 different application forms, 20 two-column pages, and 30 other pages with text and graphics, with skew angles of up to approximately ±85 degrees. In table 2, the results of this test are presented. The rates correspond to an accuracy of ±0.5° and the given mean times are times per page, when the whole page and a window of 400x400 pixels are used, respectively. The CPU time refers to a PC Pentium II at 233 MHz. These results indicate that most of the Cohen’s class distributions are appropriate to our problem, but Wigner-Ville is the best in both accuracy and computational cost. Further testing with the Wigner-Ville distribution, on 250 application forms, verified the complete success of the distribution, even when only part of the page was used.

Distribution            Success rate   Mean time (whole page)   Mean time (window 400x400 pixels)
Wigner-Ville (WV)       100%           30s                      4.6s
Pseudo WV               87%            42s                      7.6s
Smoothed Pseudo WV      91%            49s                      7.8s
Rihaczek                44%            56s                      8.1s
Margenau-Hill           98%            50s                      7.8s
Pseudo Margenau-Hill    81%            58s                      8.2s
Page                    92%            57s                      8.2s
Pseudo Page             23%            65s                      9.1s
Choi-Williams           100%           58s                      8.1s
Born-Jordan             86%            56s                      8.1s
Zhao-Atlas-Marks        89%            64s                      9.0s

Table 2.

5 CONCLUSION

An approach to skew angle estimation based on the Cohen’s class distributions was presented.
It is suitable for most types of document pages: printed and handwritten pages, with graphics or borders, poor resolution, and various types and sizes of fonts. Skew angles of up to ±89 degrees are handled with an accuracy of ±0.5 degrees. We demonstrated that, although most of the applied distributions deal with the problem successfully, the Wigner-Ville distribution is the best trade-off between success rate and time. We continue to work on reducing the computational cost, as well as on the rotation algorithm, so that better accuracy is achieved.

REFERENCES

O'Gorman, L., 1993. The document spectrum for page layout analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence 15, 1162-1173.
Pavlidis, T., J. Zhou, 1991. Page segmentation by white streams. Proc. 1st Int. Conf. Document Analysis and Recognition (ICDAR), Int. Assoc. Pattern Recognition, 945-953.
Ciardiello, G., G. Scafuro, M.T. Degrandi, M.R. Spada, M.P. Roccotelli, 1988. An experimental system for office document handling and text recognition. Proc. 9th Int. Conf. on Pattern Recognition, 739-743.
Postl, W., 1986. Detection of linear oblique structures and skew scan in digitized documents. Proc. 8th Int. Conf. Pattern Recognition, IEEE CS Press, Los Alamitos, Calif., 687-689.
Peake, G.S., T.N. Tan, 1997. A General Algorithm for Document Skew Angle Estimation. IEEE International Conference on Image Processing 2, 230-233.
Srihari, S.N., Govindaraju, 1989. Analysis of textual images using the Hough transform. Machine Vision and Applications 2, 141-153.
Le, D.S., G.R. Thoma, H. Wechsler, 1994. Automated page orientation and skew angle detection for binary document image. Pattern Recognition 27, 1325-1344.
Yu, B., A.K. Jain, 1996. A robust and fast skew detection algorithm for generic documents. Pattern Recognition 29, 1599-1629.
Chin, W., A. Harvey, A. Jennings, 1997. Skew detection in handwritten scripts. Proc. IEEE on Speech and Image Technologies for Computing and Telecommunications, 319-322.
Cohen, L., 1966. Generalized phase-space distribution functions. J. Math. Phys. 7, 781-786.
Cohen, L., 1989. Time-Frequency distributions-a review. Proceedings of the IEEE 77, 941-980.

Fig. 1: Example of mixed printed and handwritten page with various sizes of fonts.
Fig. 2: Example of handwritten page with text lines that are not parallel to each other.
Fig. 3: A right-oriented document page.
Fig. 4: The WV distributions for several histograms of the document page of Fig. 3.
Fig. 5: Curves of maximum intensity extracted from Fig. 3.
Fig. 6: Several curves of maximum intensity before (a) and after (b) the subtraction of the noise.
Fig. 7: Number of maximum values of several curves of Fig. 1 with respect to the corresponding skew angle.
Fig. 8: The relation between big step and rotations. The rotations are minimized for a big step equal to 12.
Table 1: The Kernel functions for some Cohen distributions.
Table 2: Experimental results from the application of the specified distributions.