Lossless Compression without Coding and Decoding using Arabic Ligature Characters Unicode

DOI: https://doi.org/10.54216/FPA.200102

Full Length Article

Volume 20 • Issue 1 • PP: 12-23 • 2025

Huda Ragheb Kadhim ¹*, Rand Abdulwahid Albeer ¹, Dhamyaa A. Nasrawi ¹, Huda Hallawi ¹, Muthanna Medin Nasser ¹, Ibrahim Haider Jabbar ¹, Burhan Karar Abbas ¹

¹College of Computer Science and Information Technology, University of Kerbala, Iraq

* Corresponding Author.

Received: December 14, 2024 Revised: February 01, 2025 Accepted: April 02, 2025

Abstract

Data compression technologies play a major role in areas where efficient data storage and transmission are essential. Data compression is the science of reducing redundant data to a compact form, which is used to store files or information safely. Unicode, on the other hand, is a global standard for representing text and symbols in computers. The basic elements of the Unicode standard are code points, each of which represents a specific symbol. Unicode provides a unified way to map and manage these code points, ensuring consistent representation and interpretation of text data across different systems, platforms, and languages. This paper proposes a method for compressing Arabic text based on Unicode ligatures, which join characters together. The method replaces sequences of two or more Arabic characters with a single Unicode Arabic ligature character, based on their appearance in the Arabic text file, eliminating the need for coding or decoding. The sizes of the original and output text files were compared to obtain the compression percentage. The selected datasets are Modern Standard Arabic text consisting of Arabic news and Classical Arabic text consisting of Arabic Holy and Honorific texts, collected from Kaggle. The compression percentage depends on the frequency of ligature characters in Arabic documents. Unfortunately, the results were not promising: the method reduced file size by only a small percentage (6.71% and 12.82% for the Arabic news and Arabic Holy texts, respectively). We believe the proposed method can be improved in future work by using a hybrid text-compression technique and by considering other properties of Arabic Unicode. Programming can express competency concepts in a well-defined mathematical model for a particular.
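The abstract describes replacing multi-character Arabic sequences with single Unicode ligature code points and then comparing file sizes. The following is a minimal Python sketch of that idea, not the authors' implementation. The ligature table (five Arabic presentation-form ligatures), the UTF-8 encoding, the longest-match-first replacement order, the reverse mapping used to restore the exact original text, the size-reduction formula, and all function and constant names are illustrative assumptions.

```python
import unicodedata

# Hypothetical ligature table (NOT the paper's actual table). Each entry is
# an Arabic presentation-form ligature; its NFKC normalization yields the
# ordinary character sequence that the ligature stands for.
LIGATURE_CODE_POINTS = [
    0xFDF2,  # ARABIC LIGATURE ALLAH ISOLATED FORM -> "الله"
    0xFEFB,  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM -> "لا"
    0xFEF7,  # LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM -> "لأ"
    0xFEF9,  # LAM WITH ALEF WITH HAMZA BELOW ISOLATED FORM -> "لإ"
    0xFEF5,  # LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM -> "لآ"
]

# Sequence -> ligature (for compression) and ligature -> sequence (for exact recovery).
SEQ_TO_LIG = {unicodedata.normalize("NFKC", chr(cp)): chr(cp) for cp in LIGATURE_CODE_POINTS}
LIG_TO_SEQ = {lig: seq for seq, lig in SEQ_TO_LIG.items()}


def compress(text: str) -> str:
    """Replace each known character sequence with its single ligature code point."""
    if any(lig in text for lig in LIG_TO_SEQ):
        # A ligature already present in the input could not be told apart
        # from one inserted here, so exact recovery would be impossible.
        raise ValueError("input already contains ligature code points")
    # Try longer sequences first so that, if patterns overlap, the longer ligature wins.
    for seq in sorted(SEQ_TO_LIG, key=len, reverse=True):
        text = text.replace(seq, SEQ_TO_LIG[seq])
    return text


def decompress(text: str) -> str:
    """Expand each ligature code point back to its original character sequence."""
    for lig, seq in LIG_TO_SEQ.items():
        text = text.replace(lig, seq)
    return text


def compression_percentage(original: str, compressed: str, encoding: str = "utf-8") -> float:
    """Percentage reduction in encoded size (assumed formula)."""
    before = len(original.encode(encoding))
    after = len(compressed.encode(encoding))
    return (before - after) / before * 100
```

In UTF-8, each two-letter lam-alef sequence (4 bytes) becomes one 3-byte ligature, and "الله" (8 bytes) becomes a single 3-byte character. Savings therefore depend on how often such sequences occur in the text, which is consistent with the abstract's observation that the compression percentage depends on ligature frequency. Because the ligatures render like the joined letters, the compressed text remains readable without a separate decoding step; the `decompress` function is only needed for byte-exact recovery.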

Keywords

Arabic ligature characters Unicode; Compression; Decompression; Redundant data; Text compression

Cite This Article

Kadhim, Huda Ragheb, Albeer, Rand Abdulwahid, Nasrawi, Dhamyaa A., Hallawi, Huda, Nasser, Muthanna Medin, Jabbar, Ibrahim Haider, Abbas, Burhan Karar. "Lossless Compression without Coding and Decoding using Arabic Ligature Characters Unicode." Fusion: Practice and Applications, vol. 20, no. 1, 2025, pp. 12-23. DOI: https://doi.org/10.54216/FPA.200102