Full Length Article
DOI: https://doi.org/10.54216/JISIoT.180216
Assessing Quality Attributes of Microservices in Hadoop and Spark Clusters: A Performance Benchmarking Approach in Dockerized and Non-Dockerized Architectures
The rapid expansion of big data has accelerated the adoption of distributed computing frameworks such as Apache Hadoop and Apache Spark, enabling efficient large-scale data processing. While Spark's in-memory computation model significantly improves performance compared to Hadoop's traditional MapReduce, the deployment architecture, whether Dockerized or non-Dockerized, strongly affects performance, scalability, and resource management. This study evaluates the impact of containerized and non-containerized multi-node cluster architectures on the performance of Hadoop and Spark, using standardized workloads such as WordCount and TeraSort. Key performance metrics, including execution time, throughput, and resource utilization, are analyzed across various configurations with parameter tuning. Beyond pure performance benchmarking, the study also assesses the quality attributes of microservices in big data environments, focusing on scalability, maintainability, fault tolerance, and resource efficiency. The comparative analysis of monolithic and microservice-based architectures highlights the advantages of modularity and independent scaling inherent to microservices. Experimental findings indicate that Spark outperforms Hadoop on small- to medium-scale workloads, while Hadoop is more robust when processing extremely large datasets. Dockerized deployments offer better resource isolation and management flexibility, whereas non-Dockerized setups incur less overhead under certain configurations. These insights contribute to optimizing deployment strategies and architectural decisions for microservices-based big data processing frameworks.
Saad Hussein Abed Hamed, Mondher Frikha, Heni Bouhamed
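The abstract compares Hadoop and Spark runs by execution time and throughput. The sketch below shows one common way to derive throughput and relative speedup from measured wall-clock times. It assumes throughput is defined as input size divided by elapsed time (in decimal megabytes per second) and that speedup is the ratio of baseline time to candidate time; the study's exact metric definitions may differ. The function names and all timings are hypothetical placeholders for illustration, not results reported by the authors.

```python
def throughput_mb_per_s(input_bytes: int, elapsed_s: float) -> float:
    """Throughput in MB/s, assuming 1 MB = 10**6 bytes (an assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return input_bytes / 1e6 / elapsed_s


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate run is than the baseline run."""
    if candidate_s <= 0:
        raise ValueError("elapsed time must be positive")
    return baseline_s / candidate_s


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measurements from this study.
    size_bytes = 10 * 10**9  # e.g. a 10 GB TeraSort-style input
    hadoop_s = 500.0         # hypothetical Hadoop MapReduce wall-clock time
    spark_s = 200.0          # hypothetical Spark wall-clock time

    print(f"Hadoop: {throughput_mb_per_s(size_bytes, hadoop_s):.1f} MB/s")  # 20.0 MB/s
    print(f"Spark:  {throughput_mb_per_s(size_bytes, spark_s):.1f} MB/s")   # 50.0 MB/s
    print(f"Speedup: {speedup(hadoop_s, spark_s):.2f}x")                    # 2.50x
```

Normalizing by input size lets runs of different dataset sizes, or the same job in Dockerized and non-Dockerized clusters, be compared on one scale, while the speedup ratio isolates the relative effect of the framework or deployment choice.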