Abstract
A novel algorithm for extracting text from cluttered color document images is developed and tested. The algorithm consists of a color segmentation stage followed by rule-based filtering of non-text regions. Text segments are extracted using measurements of geometrical properties and characterness properties, together with a set of heuristic rules. The algorithm includes a fusion cycle that combines three different segmentation maps and a restitution cycle that restores any deleted characters and/or their diacritical marks. The proposed method, which proved successful in extracting text from many color document images, has applications in color image indexing and retrieval.
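The rule-based filtering stage can be illustrated with a minimal sketch. The abstract does not state the actual rules or thresholds, so everything here is an assumption made for illustration: the `Region` type, the function names, and the height, aspect-ratio, and fill-ratio limits are all hypothetical. The sketch only shows the general idea of discarding segmented regions whose geometry is unlikely for a character.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A candidate region from a segmentation map (hypothetical type)."""
    x: int
    y: int
    width: int        # bounding-box width in pixels
    height: int       # bounding-box height in pixels
    pixel_count: int  # foreground pixels inside the bounding box


def is_text_candidate(
    r: Region,
    min_height: int = 8,       # assumed threshold, not from the paper
    max_height: int = 200,     # assumed threshold, not from the paper
    max_aspect: float = 10.0,  # assumed threshold, not from the paper
    min_fill: float = 0.1,     # assumed threshold, not from the paper
    max_fill: float = 0.9,     # assumed threshold, not from the paper
) -> bool:
    """Apply simple geometric rules; return False for unlikely text regions."""
    if r.width <= 0 or r.height <= 0:
        return False
    # Rule 1: reject regions that are too short or too tall to be characters.
    if not (min_height <= r.height <= max_height):
        return False
    # Rule 2: reject extremely elongated regions (lines, borders).
    aspect = max(r.width / r.height, r.height / r.width)
    if aspect > max_aspect:
        return False
    # Rule 3: reject nearly empty or nearly solid boxes.
    fill = r.pixel_count / (r.width * r.height)
    return min_fill <= fill <= max_fill


def filter_regions(regions: List[Region]) -> List[Region]:
    """Keep only the regions that pass every rule."""
    return [r for r in regions if is_text_candidate(r)]


if __name__ == "__main__":
    candidates = [
        Region(0, 0, 20, 30, 300),    # character-like box
        Region(0, 0, 500, 5, 2500),   # thin horizontal rule
        Region(0, 0, 40, 40, 1600),   # solid block
    ]
    for region in filter_regions(candidates):
        print(region)
```

In a fuller pipeline, such geometric rules would presumably be combined with the characterness measurements and followed by the restitution cycle, which this sketch does not attempt to model.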
Original language | English (US) |
---|---|
Journal | European Signal Processing Conference |
Volume | 1998-January |
State | Published - 1998 |
Event | 9th European Signal Processing Conference, EUSIPCO 1998 - Island of Rhodes, Greece. Duration: Sep 8 1998 → Sep 11 1998 |
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering