A segmentation-based optical character recognition system for Bangla printed text

Mahir Mahbub; Bangabandhu Sheikh Mujibur Rahman Digital University

Ahmedul Kabir; University of Dhaka

Telecommunication Computing Electronics and Control

Abstract

Bangla ranks as the fifth most spoken language globally, which has driven significant interest in the development of Bangla optical character recognition (OCR) systems. The intricate structure of the Bangla script, including compound characters, modifiers, and headlines, complicates the formation of words. This research introduces a complete OCR system pipeline for printed Bangla text. It employs a thinning-based segmentation approach combined with a convolutional neural network (CNN) to recognize Bangla fonts. Additionally, a part-of-speech (POS)-aware spell checker is proposed that automatically corrects misspelled words while considering their context within the sentence. We introduce semi-generalized filters that address conjunct formation challenges in Bangla OCR; their flexible design allows adaptation to new fonts. The ResNet50 model is used to recognize segmented characters and modifiers. We achieve a character segmentation error of 3.354% and an overall segmentation error of 2.332%. The ResNet50 recognition model achieves an accuracy of 98.345%.

DOI

10.12928/telkomnika.v24i3.26961

ISSN Information

1693-6930

Pages

945-956

Volume 24

Issue 3

Published 2026-06-01
