Article Title: Identification of different script lines from multi-script documents
Abstract:
To reach a wider readership, some documents are printed in several scripts and languages. For optical character recognition (OCR) of such a document page, a software module must first identify the scripts so that each can be passed to its own OCR system. This paper presents an automatic technique for identifying printed Roman, Chinese, Arabic, Devnagari and Bangla text lines in a single document. The technique uses script characteristics, shape-based features, statistical features and features derived from the concept of water overflow from a reservoir. The scheme achieves an accuracy of about 97.33%. (C) 2002 Published by Elsevier Science B.V.
Keywords: optical character recognition; script lines; head-line
DOI: 10.1016/S0262-8856(02)00101-4
Source: IMAGE AND VISION COMPUTING