Identification of different script lines from multi-script documents

Author: Pal, U; Chaudhuri, BB

Article Title: Identification of different script lines from multi-script documents

Abstract:
For wider readership, some documents may be printed in several scripts and languages. For optical character recognition (OCR) of such a document page, a software module is needed to identify the script of each text line before the lines are fed to their respective script-specific OCR systems. This paper presents an automatic technique for identifying printed Roman, Chinese, Arabic, Devnagari and Bangla text lines within a single document. For this purpose, script characteristics, shape-based features, statistical features and some features derived from the concept of water overflowing from a reservoir are employed. The scheme achieves an accuracy of about 97.33%. (C) 2002 Published by Elsevier Science B.V.
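The head-line (the horizontal bar that joins the characters of a word in Devnagari and Bangla) is one of the script characteristics named in the keywords. As a rough illustration only, and not the authors' published feature set, a head-line test can be sketched from the horizontal projection profile of a binarized text line. The helper names (`row_profile`, `ink_width`, `has_headline`), the 0.6 width fraction and the "densest row lies in the upper half" rule are all hypothetical choices made for this sketch.

```python
from typing import List

# Hypothetical representation: a binarized text line, 1 = ink, 0 = background.
Bitmap = List[List[int]]


def row_profile(line: Bitmap) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in line]


def ink_width(line: Bitmap) -> int:
    """Pixel span between the leftmost and rightmost ink columns (0 if blank)."""
    cols = [c for row in line for c, v in enumerate(row) if v]
    return (max(cols) - min(cols) + 1) if cols else 0


def has_headline(line: Bitmap, min_fraction: float = 0.6) -> bool:
    """Heuristic head-line test with hypothetical thresholds.

    Flags the line when its densest row lies in the upper half of the line
    and that row's ink covers at least `min_fraction` of the ink width.
    """
    width = ink_width(line)
    if width == 0:
        return False  # blank line: nothing to classify
    profile = row_profile(line)
    peak_row = max(range(len(profile)), key=profile.__getitem__)
    in_upper_half = peak_row < len(line) / 2
    return in_upper_half and profile[peak_row] >= min_fraction * width


# Toy bitmaps (invented for illustration, not data from the paper).
headline_like = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # continuous bar near the top
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
roman_like = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(has_headline(headline_like), has_headline(roman_like))
```

On these toy inputs the first line is flagged and the second is not. A real system would also need binarization, line segmentation and further features to separate the remaining scripts; a single profile peak cannot distinguish, for example, Devnagari from Bangla.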

Keywords: optical character recognition; script lines; head-line

DOI: 10.1016/S0262-8856(02)00101-4

Source: Image and Vision Computing
