DNA Sequence Identification via Biologically Guided Feature Engineering and Hybrid ML–LSTM Networks

Marwa Mawfaq Mohamedsheet Al-Hatab1, Maysaloon Abed Qasim2, Sinan S. Mohammed Sheet1

 

1Technical Engineering College, Northern Technical University, Mosul, Iraq

 

2Technical Engineering College for Computer and Artificial Intelligence, Northern Technical University, Mosul, Iraq

 

Emails: marwa.alhatab@ntu.edu.iq; maysloon.alhashim@ntu.edu.iq; sinan_sm76@ntu.edu.iq

Abstract

The promoter is the region of DNA responsible for initiating transcription of a gene by RNA polymerase. It is located upstream of the transcription start site. Prior research indicates that promoters play a major role in many human diseases, such as cancer, diabetes, and Huntington’s disease. Promoter detection is therefore a crucial task. This study proposes a hybrid detection system that integrates biologically guided feature extraction with traditional machine learning (ML) algorithms and uses a Long Short-Term Memory (LSTM) network as the deep learning approach. The dataset comprises 106 nucleotide sequences. The results show that the Naive Bayes classifier achieved perfect performance across all metrics (accuracy, sensitivity, specificity, precision, and F1-score), reaching 100% with AUC = 1. The confusion matrix analysis and ROC curve confirm that the LSTM model achieved 100% training accuracy and 84.38% test accuracy. The architecture and performance of the proposed model make it applicable to IoT-based intelligent genomic and healthcare systems, enabling real-time and remote promoter detection.

Received: March 14, 2025; Revised: June 02, 2025; Accepted: July 10, 2025

Keywords: Promoter detection; Machine learning; LSTM