Customer retention is strategically important for small and medium-sized enterprises (SMEs) because their resources are limited and indiscriminate customer acquisition and retention campaigns are economically inefficient. However, many SMEs rely on descriptive reporting, which lacks the ability of transaction-driven analytics to differentiate between high-value and low-yield customer relationships. This paper develops a replicable customer-analytics pipeline for SME-type retail environments using publicly available transactional data. In contrast to macro-level forecasting research, the paper integrates customer value segmentation with future-oriented repeat-purchase prediction and explicitly translates the results into retention actions. Customer-level features were derived from the public Online Retail dataset and were based on invoices, quantities, prices, product variety, and return behavior. Monthly observation windows were used to frame a 90-day repeat-purchase prediction problem. Customers were segmented by recency, frequency, and monetary behavior, and three predictive models (logistic regression, random forest, and gradient boosting) were then compared. The random forest model achieved the highest discrimination (ROC-AUC = 0.750; PR-AUC = 0.821), followed closely by logistic regression, which performed only slightly worse while being more interpretable. Segment analysis also showed a highly concentrated revenue base: Champions accounted for 27.5 percent of customers but 67.2 percent of recent revenue, with a repeat-purchase rate of 81.0 percent. The paper provides a submission-ready, transparently reproducible, and managerially understandable design that is particularly applicable to SMEs seeking low-cost retention analytics, customer ranking, and allocation of marketing resources.
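As a rough illustration of the feature and label construction summarized above, the following sketch derives recency, frequency, and monetary values from an observation window and attaches a 90-day repeat-purchase label. It is not the paper's code: the toy transactions, the function name `build_customer_table`, the field names, and the cutoff date are all hypothetical, and product variety and return handling are omitted for brevity.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy transactions (not from the Online Retail dataset):
# (customer_id, invoice_id, invoice_date, quantity, unit_price)
TRANSACTIONS = [
    ("C1", "A100", date(2011, 1, 5), 2, 10.0),
    ("C1", "A101", date(2011, 2, 20), 1, 5.0),
    ("C1", "A102", date(2011, 4, 10), 3, 4.0),
    ("C2", "A103", date(2011, 1, 15), 1, 20.0),
    ("C3", "A104", date(2011, 3, 1), 5, 2.0),
    ("C3", "A105", date(2011, 7, 15), 1, 8.0),
]


def build_customer_table(transactions, cutoff, horizon_days=90):
    """Return {customer_id: RFM features + label} for customers seen on or before cutoff."""
    horizon_end = cutoff + timedelta(days=horizon_days)
    invoices = defaultdict(set)   # distinct invoices in the observation window
    revenue = defaultdict(float)  # total spend in the observation window
    last_seen = {}                # most recent purchase date in the window
    repeaters = set()             # customers who buy again within the horizon

    for cust, invoice, day, qty, price in transactions:
        if day <= cutoff:
            invoices[cust].add(invoice)
            revenue[cust] += qty * price
            last_seen[cust] = max(last_seen.get(cust, day), day)
        elif day <= horizon_end:
            repeaters.add(cust)

    # Only customers with at least one purchase in the window get a row.
    return {
        cust: {
            "recency_days": (cutoff - last_seen[cust]).days,
            "frequency": len(invoices[cust]),
            "monetary": revenue[cust],
            "repeat_90d": int(cust in repeaters),
        }
        for cust in last_seen
    }


table = build_customer_table(TRANSACTIONS, cutoff=date(2011, 3, 31))
for customer_id, row in sorted(table.items()):
    print(customer_id, row)
```

Repeating the same construction for several monthly cutoffs would yield one labeled row per customer per window, which is one plausible reading of the monthly observation windows described in the abstract.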
https://doi.org/10.54216/AJBOR.140201