Handling missing values and clustering industrial liquid waste using K-medoids
Indonesian Journal of Electrical Engineering and Computer Science

Abstract
The textile industry is a significant contributor to environmental pollution because its wastewater contains hazardous substances such as dyes, heavy metals, and chemicals that can severely harm aquatic ecosystems. Effective management of this wastewater is crucial to mitigating its environmental impact. This study groups industrial liquid waste data using the K-medoids clustering method, chosen because it is more robust to noise and outliers than K-means. To address common challenges in wastewater data processing, namely missing values and differing data scales, two missing-value strategies are compared (replacing missing values with zero, and K-nearest neighbors (KNN) imputation), and Z-score normalization is applied to put all features on a common scale. Clustering quality is evaluated with the Davies-Bouldin index (DBI), where lower values indicate better clustering, for k=2, 3, 4, and 5. The best clustering quality is achieved at k=2, with the lowest DBI values obtained using KNN imputation (0.139) and zero replacement (0.149). The lower DBI obtained with KNN imputation indicates that it handles missing data more effectively than zero replacement. These findings provide insight into the characteristics of textile industry wastewater pollution and offer a framework for wastewater management. The study concludes with practical recommendations for policymakers and industry stakeholders to adopt data-driven approaches for sustainable wastewater treatment strategies.
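The pipeline described in the abstract (impute, Z-score normalize, cluster with K-medoids, score with DBI for several k) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: missing values are assumed to be encoded as `None`; the KNN distance rule (RMS difference over co-observed features), the PAM-style swap search initialized from the first k rows, and the centroid-based DBI are choices made for this sketch; all helper names and the toy data are hypothetical and are not the paper's dataset.

```python
import math
from statistics import mean, pstdev

def euclid(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def zero_impute(rows):
    """Strategy 1 (hypothetical helper): replace every missing value (None) with 0.0."""
    return [[0.0 if v is None else float(v) for v in r] for r in rows]

def knn_impute(rows, k=2):
    """Strategy 2 (hypothetical helper): fill each missing feature with the mean of
    that feature over the k nearest rows that observe it. Distance is an RMS
    difference over features both rows observe (an assumed, simplified rule)."""
    out = []
    for i, r in enumerate(rows):
        dists = []
        for j, other in enumerate(rows):
            shared = [(a, b) for a, b in zip(r, other) if a is not None and b is not None]
            if j != i and shared:
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                dists.append((d, j))
        dists.sort()  # nearest rows first
        filled = []
        for f, v in enumerate(r):
            if v is not None:
                filled.append(float(v))
                continue
            # Donors come from the ORIGINAL rows, so imputed values are never reused.
            donors = [rows[j][f] for _, j in dists if rows[j][f] is not None][:k]
            filled.append(mean(donors) if donors else 0.0)  # fallback: no donor found
        out.append(filled)
    return out

def zscore(rows):
    """Z-score each column: (x - mean) / population std; constant columns become 0."""
    stats = [(mean(c), pstdev(c)) for c in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(r, stats)] for r in rows]

def total_cost(X, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(euclid(x, X[m]) for m in medoids) for x in X)

def k_medoids(X, k, max_iter=100):
    """PAM-style swap search: start from the first k rows (assumed init) and
    accept any medoid/non-medoid swap that lowers total cost until none does."""
    medoids = list(range(k))
    best = total_cost(X, medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):
            for o in range(len(X)):
                if o in medoids:
                    continue
                cand = medoids[:mi] + [o] + medoids[mi + 1:]
                c = total_cost(X, cand)
                if c < best - 1e-12:
                    medoids, best, improved = cand, c, True
        if not improved:
            break
    labels = [min(range(k), key=lambda c: euclid(x, X[medoids[c]])) for x in X]
    return medoids, labels

def davies_bouldin(X, labels):
    """Centroid-based DBI: mean over clusters of max_j (S_i + S_j) / d(c_i, c_j)."""
    ks = sorted(set(labels))
    cents, scat = {}, {}
    for c in ks:
        pts = [x for x, lab in zip(X, labels) if lab == c]
        cents[c] = [mean(col) for col in zip(*pts)]
        scat[c] = mean(euclid(p, cents[c]) for p in pts)
    return mean(max((scat[i] + scat[j]) / euclid(cents[i], cents[j])
                    for j in ks if j != i) for i in ks)

# Hypothetical toy measurements (three features per sample), NOT the study's data.
data = [
    [1.0, 2.0, 0.5], [1.2, None, 0.4], [0.9, 2.1, None],
    [8.0, 9.0, 5.0], [8.2, 9.1, None], [7.9, None, 5.1],
]

if __name__ == "__main__":
    for name, imputer in [("zero", zero_impute), ("knn", knn_impute)]:
        Z = zscore(imputer(data))
        for k in (2, 3, 4, 5):
            _, labels = k_medoids(Z, k)
            print(name, k, round(davies_bouldin(Z, labels), 3))
```

On this toy data, zero replacement injects artificial 0.0 readings into the high-valued group, which distorts the normalized features; KNN imputation instead borrows values from similar samples. The exhaustive swap search is quadratic per pass and is only meant for small illustrative inputs.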
