A segmentation-based optical character recognition system for Bangla printed text
Telecommunication Computing Electronics and Control
Abstract
Bangla ranks as the fifth most spoken language globally, catalyzing significant interest in the development of Bangla optical character recognition (OCR) systems. The intricate structure of the Bangla script, including compound characters, modifiers, and headlines, complicates the formation of words. This research introduces a complete OCR system pipeline for printed Bangla text. It employs a thinning-based segmentation approach combined with a convolutional neural network (CNN) to recognize Bangla fonts. Additionally, a part-of-speech (POS)-aware spell checker is proposed that automatically corrects misspelled words while considering their context within the sentence. We introduce semi-generalized filters whose flexible design adapts to new fonts, addressing conjunct formation challenges in Bangla OCR. The ResNet50 model is utilized to accurately recognize segmented characters and modifiers. We achieve a character segmentation error of 3.354% and an overall segmentation error of 2.332%. The ResNet50 recognition model achieves an accuracy of 98.345%.





