Full Length Article
Fusion: Practice and Applications
Volume 4, Issue 2, PP: 56-61, 2021

Title

An efficient extraction of information from Indian Government issued documents Aadhar and Pan Card

Authors Names :  Rachna Tewani 1*, Arun K. Dubey 2, Achin Jain 3, Eshika Agarwal 4, Disha Mittal 5

1  Affiliation :  Data Scientist, Great Learning, India

    Email :  rachnatewani09@gmail.com


2  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  arudubey@gmail.com


3  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  achin.mails@gmail.com


4  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  eshika2812@gmail.com


5  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  dishamittal.it2@bvp.edu.in



Doi   :   https://doi.org/10.54216/FPA.040201

Received: April 12, 2021 Accepted: August 01, 2021

Abstract :

In today's world, documents are increasingly digitized, and data scanning tools and photography are in widespread use. When a large amount of image data is available, it becomes important to gather it in a form that is useful to the company or organization. Doing this manually is tedious and time-consuming. To simplify the job, we have developed a Flask API that takes an image folder as input and returns an Excel sheet of relevant data extracted from the images. We use optical character recognition, via software such as pytesseract, to extract text from the images. We then apply natural language processing and, finally, locate the relevant data using the glob and regex modules. This model is helpful for collecting data from registration certificates, where it makes it easy to store fields such as chassis number, owner name, and car number, and it can also be applied to Aadhaar cards and PAN cards.
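
The final regex step can be illustrated with a minimal sketch. It is not the authors' code: the function name `extract_fields` and both patterns are assumptions made for illustration, based on the standard PAN layout (five letters, four digits, one letter) and the 12-digit Aadhaar number usually printed in groups of four. The input is assumed to be text already produced by the OCR stage.

```python
import re

# Assumed patterns (not taken from the paper): a PAN is 5 letters, 4 digits,
# 1 letter; an Aadhaar number is 12 digits, usually printed in groups of 4.
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_RE = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")


def extract_fields(ocr_text):
    """Return the first PAN and Aadhaar number found in OCR text, or None."""
    pan = PAN_RE.search(ocr_text)
    aadhaar = AADHAAR_RE.search(ocr_text)
    return {
        "pan": pan.group(0) if pan else None,
        # Strip whitespace so the number is stored as 12 contiguous digits.
        "aadhaar": re.sub(r"\s", "", aadhaar.group(0)) if aadhaar else None,
    }
```

Each returned dictionary could then become one row of the output sheet. Real OCR output is noisy, so a production version would likely need extra cleanup before matching.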

Keywords :

Optical character recognition; Aadhar; Pan Card; NLP

Cite this Article as :
Rachna Tewani, Arun K. Dubey, Achin Jain, Eshika Agarwal, Disha Mittal, An efficient extraction of information from Indian Government issued documents Aadhar and Pan Card, Fusion: Practice and Applications, Vol. 4, No. 2, (2021): 56-61 (Doi: https://doi.org/10.54216/FPA.040201)