Deep Convolutional Sequence Approach Towards Real-Time Intelligent Optical Scanning

Mansi Mahendru, Sanjay Kumar Dubey, Divya Gaur
Copyright: © 2021 |Pages: 14
DOI: 10.4018/IJCVIP.2021100105

Visual text recognition is among the most active computer vision applications, driven by rising demand in areas such as crime scene detection, assistance for blind people, digitization, and book scanning. Most earlier research, however, was carried out on static visuals containing organized text or on individually captured video frames. The key objective of this study is to develop a real-time intelligent optical scanner that extracts every sequence of text from high-speed video, noisy visual input, and offline handwritten script. The work combines multiple deep learning approaches, namely EAST, CNN, and Bi-LSTM with CTC. The system is trained and tested on four public datasets (ICDAR 2015, SVT, Synth-Text, and IAM-3.0) and evaluated on recall, precision, and f-measure. To reflect these challenges, performance is examined under three different categories, and the outcomes are encouraging for future advancement.

1. Introduction

Scene text identification and recognition is one of the most active machine vision applications, driven by growing demand from industrial automation, assistive devices for visually impaired people, and similar uses. In today's digitized world, tasks such as tracking a person or a vehicle and verifying documents are all carried out digitally, and reading sequences of textual information has become important to many industries. In past years, many computer vision approaches were applied successfully to text retrieval from clean documents, with improved accuracy. As a result, the scientific community began to believe that the problem of recognizing textual content in visuals had largely been solved. Traditional optical scanners, however, fail on complex scenes because of the diverse text sequences found in the real world and the difficult conditions under which these scenes are captured (Baek et al., 2019). Text in video sequences remains a major challenge for such scanners: it has been addressed only by treating video frames as still images, not on running video where the scene changes every second. Text inside a running video annotates details about the site where an incident occurs, and signage text serves as a strong indicator for navigation and notification in scenes. Another major challenge for any text recognizer is extracting text from offline handwritten manuscripts in real time, where the text can appear in any font and any style. Text sequences in handwritten script are also used for organizational purposes, such as scanning hand-marked documents, and they help humanities scholars digitize handwritten documents easily. This study considers the most interesting phase of visual character recognition, namely a real-time intelligent optical scanner that extracts all sequences of information from running video and handwritten documents, covering not only organized text but also text in the wild.
Many issues can arise when extracting text from running video and offline handwritten documents. The speed of the video is a major difficulty: because the visual content changes rapidly, key text sequences that are not clearly visible can be missed by the machine. Typography also raises issues, since characters and words can be written in various ways and in numerous fonts, which makes the text harder for the system to interpret. Further challenges include blur and degradation in the video and handwritten manuscripts; uneven lighting, which can degrade the model's results because information is barely visible in dark regions; and scene complexity, where background noise can mislead the model into making wrong predictions (Sourvanos & Tsatiris, 2018). This study examines various attempts to solve the issues faced by a real-time scene text recognizer without loss of accuracy. With these challenges in mind, the system is trained on four datasets, and different criteria are set to investigate its performance. An output matrix is produced for each criterion so that the model's performance can be examined individually in each case. The remainder of the paper is organized as follows: Section 2 reviews previous work, Section 3 describes the datasets used, Section 4 presents the methodology, Section 5 covers the experimental work and analysis, and Section 6 gives the conclusion and future scope.
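
The abstract names recall, precision, and f-measure as the evaluation measures. As a rough illustration of how such per-criterion scores might be computed, the sketch below scores recognized words against reference words for a single image or frame. The function name `word_level_scores`, the `Scores` container, and the case-insensitive exact-match rule are assumptions made for this example; the preview does not state the matching protocol the authors actually used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f_measure: float


def word_level_scores(predicted: list[str], reference: list[str]) -> Scores:
    """Score predicted words against reference words for one image or frame.

    Assumption (not from the paper): a predicted word is correct when it
    equals a not-yet-matched reference word, ignoring case.
    """
    remaining = [word.lower() for word in reference]
    correct = 0
    for word in predicted:
        key = word.lower()
        if key in remaining:
            remaining.remove(key)  # each reference word can be matched once
            correct += 1

    # precision: share of predictions that are correct
    precision = correct / len(predicted) if predicted else 0.0
    # recall: share of reference words that were recovered
    recall = correct / len(reference) if reference else 0.0
    # f-measure: harmonic mean of precision and recall
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f_measure)


if __name__ == "__main__":
    scores = word_level_scores(
        predicted=["EXIT", "stop", "shop"],
        reference=["exit", "stop", "open", "here"],
    )
    print(scores)
```

Running the same function separately on the samples belonging to each evaluation category would give one set of scores per category, which is one plausible way to fill a per-criterion results table.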
