Full Length Article
Fusion: Practice and Applications
Volume 4, Issue 1, PP: 5-14, 2021


Exploratory Data Analysis on Username-Password Dataset

Authors Names :  Vanita Jain 1*, Mahima Swami 2, Rishab Bansal 3

1  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  vanita.jain@bharatividyapeeth.edu

2  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  mahima.it1@bvp.edu.in

3  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  rishabbansal.it1@bvp.edu.in

DOI :  https://doi.org/10.54216/FPA.040101

Received: January 07, 2021; Accepted: May 10, 2021

Abstract :

Passwords are the first line of defense against malicious or unauthorized access to personal information. As more of daily life is digitized, choosing strong passwords has become even more important. In this paper, the authors perform Exploratory Data Analysis on a database of 100 million email-password pairs. The analysis reports the most common passwords and email domains; the character sets, average length, and strength of passwords; the frequencies of letters, numbers, and symbols (special characters), including the most common of each; and the ratio of letters, numbers, and symbols within passwords. Together, these statistics highlight the general trends that users follow when creating passwords. Using these results, users can make informed decisions when creating their own passwords, namely by avoiding the most common features, which helps them create more robust and less vulnerable passwords.
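
The kinds of statistics listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the toy records, the `email:password` line format, and the helper names `split_record`, `char_class`, and `summarize` are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy records in an assumed "email:password" format;
# none of these values come from the dataset analyzed in the paper.
records = [
    "alice@example.com:123456",
    "bob@example.org:password",
    "carol@example.com:123456",
    "dave@example.net:Qw3rty!",
]


def split_record(line):
    """Split on the first colon so that passwords containing colons stay intact."""
    email, _, password = line.partition(":")
    return email, password


def char_class(ch):
    """Classify a character as a letter, a number, or a symbol."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "number"
    return "symbol"


def summarize(lines):
    """Compute a few summary statistics over email-password records."""
    pairs = [split_record(line) for line in lines]
    passwords = [password for _, password in pairs]

    # Domain is everything after the last "@" in the email address.
    domains = Counter(email.rpartition("@")[2].lower() for email, _ in pairs)
    common_passwords = Counter(passwords)

    # Count every character of every password by its class.
    classes = Counter(char_class(ch) for pw in passwords for ch in pw)
    total_chars = sum(classes.values())

    return {
        "top_passwords": common_passwords.most_common(3),
        "top_domains": domains.most_common(3),
        "average_length": sum(len(pw) for pw in passwords) / len(passwords),
        "class_ratio": {name: n / total_chars for name, n in classes.items()},
    }


summary = summarize(records)
print(summary)
```

On a real 100-million-record file, the same counters would more plausibly be updated line by line while streaming the file rather than by building in-memory lists as this sketch does.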

Keywords :

Data Analysis; Username-Password Dataset; Data Security 



Cite this Article as :
Vanita Jain, Mahima Swami, Rishab Bansal, Exploratory Data Analysis on Username-Password Dataset, Fusion: Practice and Applications, Vol. 4, No. 1, (2021): 5-14 (DOI: https://doi.org/10.54216/FPA.040101)