Indonesian Journal of Electrical Engineering and Computer Science
Vol. 38, No. 2, May 2025, pp. 1073-1085
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v38.i2.pp1073-1085

Enhancing SDN security using ensemble-based machine learning approach for DDoS attack detection

Abdinasir Hirsi¹, Lukman Audah¹,², Adeb Salh³, Mohammed A. Alhartomi⁴, Salman Ahmed⁵
¹ Advanced Telecommunication Research Center (ATRC), Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
² Faculty of Electrical Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
³ Faculty of Information and Communication Technology, University Tunku Abdul Rahman (UTAR), Kampar, Malaysia
⁴ Department of Electrical Engineering, University of Tabuk, Tabuk, Saudi Arabia
⁵ VLSI and Embedded Technology (VEST) Focus Group, Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia

Article Info
Article history: Received Jun 14, 2024; Revised Nov 5, 2024; Accepted Nov 11, 2024
Keywords: Dataset of SDN DDoS attacks; Ensemble machine learning; Principal component analysis; SDN security

ABSTRACT
Software-defined networking (SDN) is a groundbreaking technology that transforms traditional network frameworks by separating the control plane from the data plane, thereby enabling flexible and efficient network management. Despite its advantages, SDN introduces vulnerabilities, particularly distributed denial of service (DDoS) attacks. Existing studies have used single, hybrid, and ensemble machine learning (ML) techniques to address attacks, often relying on generated datasets that cannot be tested because of accessibility issues. A major contribution of this study is the creation of a novel, publicly accessible dataset, and benchmarking the proposed approach against existing public datasets to demonstrate its effectiveness.
This paper proposes a novel approach that combines ensemble learning models with principal component analysis (PCA) for feature selection. The integration of ensemble learning models enhances predictive performance by leveraging multiple algorithms to improve accuracy and robustness. The results showed that the ensemble of random forests (ENRF) model achieved the highest performance across all metrics with 100% accuracy, precision, recall, and F1-score. This study provides a comprehensive solution to the limitations of existing models by offering superior performance, as evidenced by the comparative analysis, establishing this approach as the best among the evaluated models.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Lukman Audah
Faculty of Engineering Technology, Universiti Tun Hussein Onn Malaysia
Parit Raja, 86400, Johor, Malaysia
Email: hanif@uthm.edu.my
Journal homepage: http://ijeecs.iaescore.com

1. INTRODUCTION
Distributed denial-of-service (DDoS) attacks are web attacks that are designed to disrupt services and deny legitimate user access [1]. These attacks overwhelm the targets with excessive traffic, causing service outages [2]. They can target various layers of the OSI model, making them versatile and challenging to mitigate [3]. DDoS methods have evolved and become increasingly sophisticated over time [4]. High-rate DDoS attacks generate massive traffic volumes to overwhelm targets quickly, whereas low-rate DDoS attacks use minimal traffic to evade detection and gradually degrade performance [5], [6]. High-rate attacks are easier
to detect, but cause immediate disruption, whereas low-rate attacks are stealthier and persist longer [7], [8]. Despite advancements in security, DDoS attacks remain a significant threat to software-defined networking (SDN) owing to its centralized control and programmability [9]. Attackers use handlers to control compromised systems and install malware within SDNs [10]. These compromised systems, or "zombies", form botnets that launch coordinated DDoS attacks [11]. Various solutions have been proposed, including traditional security measures, moving target defense strategies, and AI-based methods, such as machine learning (ML) and deep learning (DL) [12]. Although many studies have focused on single or hybrid ML models for DDoS detection, developing ensemble learning methods is crucial for improving accuracy [13]. In addition, the studies in [14]-[18] used generated datasets that are not publicly available, thereby limiting their reproducibility. We used a novel dataset that is publicly accessible in the Mendeley Data repository, allowing for broader testing and validation [19].
The core idea of this study is to improve the detection of DDoS attacks by developing an ensemble ML framework that integrates multiple classifiers and leverages their combined strengths to improve accuracy. The study evaluates the effectiveness of traditional ML methods and incorporates principal component analysis (PCA) for optimized feature selection. The proposed approach was compared with existing DDoS detection techniques using the novel and CIC-DDoS2019 datasets. Furthermore, this study provides a robust solution for mitigating DDoS threats and contributes valuable insights and resources to the cybersecurity field.
To the best of our knowledge, this study uniquely merges the assessment of various ML methods, the development of an ensemble framework, and a performance comparison using PCA within a single study. The main contributions of this study are as follows.
- Effectiveness assessment: evaluate the effectiveness of various machine learning methods in detecting DDoS attacks.
- Ensemble framework development: develop an ensemble-based machine learning framework that integrates multiple classifiers to enhance detection precision.
- Feature selection with PCA: employ PCA for feature selection to improve model performance by reducing dimensionality and retaining essential features.
- Novel dataset: a major contribution of this study is the creation of a novel, publicly accessible dataset that addresses reproducibility issues found in previous studies.
- Performance comparison: the performance of the proposed ensemble approach was compared with existing DDoS detection techniques using our novel publicly accessible dataset and the CIC-DDoS2019 dataset.
The remainder of this paper is structured as follows: section 2 covers related work; section 3 discusses the proposed model development framework for SDN security; section 4 details the experimental setup and performance evaluation; and section 5 concludes the study with future work.

2. RELATED WORKS
The research community has produced pioneering work on ML models that proactively and reactively defend against DDoS attacks in SDN environments. These mechanisms enhance network security by identifying and preventing DDoS attacks on diverse infrastructures, including wired, wireless, mobile, and sensor networks. This research has advanced not only theory but also practical solutions for combating these prevalent security threats. Kumar and Selvakumar [20] proposed adaptive learning mechanisms to detect DDoS attacks.
The ensemble approach combines multiple classifiers to reduce errors and improve detection capabilities. It achieved a detection accuracy of 98.2% on the KDD dataset, 98.8% on the mixed traffic dataset, and 99.2% on the SSENET2011 dataset. In addition, the NFBoost algorithm achieved a significantly lower false positive rate than the other methods, with an improvement of up to 78.26%.
Some studies have focused on enhancing the accuracy of intrusion detection systems (IDS) in classifying traffic as normal or malicious. For example, Jabbar et al. [21] reported that the random forest (RF) average one-dependence estimator (RFAODE) ensemble classifier significantly improves the accuracy and reduces the error rate of IDS compared to individual classifiers such as AODE, Naïve Bayes (NB), and RF. RFAODE achieved an accuracy of 90.51% and a false alarm rate (FAR) of 0.14% using the Kyoto dataset. The analysis used 15 of the 24 available features. Shirmarz et al. [22] introduced a new ensemble approach combining decision tree (DT), K-nearest neighbor (KNN), and support vector machine (SVM) techniques. This method aims to improve threat detection at the SDN controller. The ensemble achieved an accuracy of 99.4%, despite the weaker results of the individual classifiers. Additionally, the system maintained a low false-positive rate, making it practical for real-world applications. PCA was employed to reduce the feature set from 76 to 24, thereby enhancing classifier performance.
Firdaus et al. [23] introduced an ensemble technique that integrates K-means clustering and RF classification to improve the detection accuracy of service disruption attacks in SDN environments. This study achieved a higher detection accuracy and a lower false positive rate (FPR) than traditional methods. Experiments were conducted using specified hardware and software setups to ensure the validity of the results. Alashhab et al. [24] reported the mitigation of overloading attacks using an online ensemble method in SDN networks. The prototype addresses the limitations of traditional static mechanisms by incorporating online learning approaches to adapt to evolving attack patterns in real time. The system attained an accuracy of 99.2% for any type of denial-of-service attack. Overall, their work contributes to handling zero-day, low-rate, and evolving disruptive traffic. Finally, Christila and Sivakumar [25] proposed multilayer ensemble learning to improve the detection of denial-of-service attacks on an SDN controller. Multiple ensemble methods provided improved stability.

3. PROPOSED MODEL DEVELOPMENT FRAMEWORK FOR SDN SECURITY
In this section, we present the proposed model development pipeline to enhance SDN security, as shown in Figure 1. The pipeline consists of eight phases, each designed to ensure a robust and efficient model for detecting and mitigating threats in SDN environments.
[Figure 1: flowchart of the eight phases. (1) Dataset: novel dataset and other public dataset; (2) Preprocessing: eliminating duplicates, normalizing numerical columns; (3) Feature selection using PCA; (4) Ensemble method selection: various ensemble learning models; (5) Hyperparameter tuning: optimizing ensemble methods; (6) Training and testing; (7) Model evaluation: ACC, PRC, RCL, F1, classification as attack or normal; (8) Comparison with existing models.]
Figure 1. The proposed model development framework for SDN security illustrates the phases from dataset compilation, preprocessing, and feature selection through ensemble method selection, hyperparameter tuning, training and testing, model evaluation, and comparison with existing models

3.1. Dataset
In the first phase of the project, we compiled data from two sources: our own generated dataset and the CIC-DDoS2019 dataset. This provides a thorough overview of possible network threats and a strong basis for the next steps.

3.1.1. Generated dataset
We created a new dataset using Mininet, resulting in 1,048,757 rows and 21 columns. Our setup includes 12 switches, an RYU controller, and 24 host devices. The process involves designing a realistic network topology in Mininet, configuring it, and using an RYU controller to manage traffic. We used the MGEN and hping3 tools to generate various types of network traffic, including DDoS attacks. Flow statistics were recorded every 30 s and saved in a CSV file called "SDN-DDoS Traffic Dataset.csv", which is available on Mendeley Data. The data were then cleaned and normalized to prepare for analysis. Table 1 outlines the DDoS attacks and features included in this dataset. In addition, Table 2 compares various generated datasets, highlighting the features, controllers, attack tools, and environments used in each study.
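The cleaning and normalization step can be sketched as follows. This is a minimal illustration, not the authors' script: the paper does not state which normalization was applied, so min-max scaling is an assumption, and the column names (`pkt_count`, `byte_count`, `protocol`, `label`) and the helper `preprocess` are hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Drop duplicate rows and min-max scale the numeric feature columns.

    Min-max scaling is an assumed choice; the label column is left untouched.
    """
    out = df.drop_duplicates().reset_index(drop=True)
    # Numeric feature columns, excluding the class label if it is numeric.
    numeric_cols = out.select_dtypes(include="number").columns.drop(
        label_col, errors="ignore"
    )
    col_min = out[numeric_cols].min()
    col_range = out[numeric_cols].max() - col_min
    # Constant columns have zero range; divide by 1 so they map to 0.
    col_range = col_range.replace(0, 1)
    out[numeric_cols] = (out[numeric_cols] - col_min) / col_range
    return out


# Toy flow records with hypothetical column names.
raw = pd.DataFrame(
    {
        "pkt_count": [10, 10, 30, 50],
        "byte_count": [100, 100, 300, 500],
        "protocol": ["TCP", "TCP", "UDP", "ICMP"],
        "label": [0, 0, 1, 1],
    }
)
clean = preprocess(raw)
```

Here the duplicated first record is removed and the two numeric feature columns are rescaled to the range [0, 1], while the categorical `protocol` column is left for a separate encoding step.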
Table 1. Comparison of different datasets and attacks

| Dataset | Attack | Instances | No. of features |
|---|---|---|---|
| Novel dataset | TCP | 350358 | 16 |
| Novel dataset | UDP | 348790 | 16 |
| Novel dataset | ICMP | 349727 | 16 |
| CIC-DDoS2019 | UDP flood | 3125400 | 21 |
| CIC-DDoS2019 | SYN flood | 1851263 | 21 |
| CIC-DDoS2019 | UDPlag | 625243 | 21 |

3.1.2. CIC-DDoS2019
Researchers often use different datasets to test DDoS attack detection models; however, some of these datasets are outdated. In contrast, the CIC-DDoS2019 dataset is a recent and widely accepted resource for network security [26]-[29]. It includes both normal and malicious traffic and offers a comprehensive tool for evaluating DDoS detection methods. This dataset was created using CICFlowMeter v3, which extracts features such as flow duration, total forward packets, total backward packets, and packet length distribution. These features facilitate a thorough traffic analysis and enhance the effectiveness of DDoS detection models.

Table 2. Comparison of our novel dataset with other existing datasets

| Ref. | Dataset | Features | Controller | Attack tools | SDN environment |
|---|---|---|---|---|---|
| [23] | InSDN | 15 | RYU | Hping3 | Mininet using 4 OvS switches |
| [24] | Custom dataset | 22 | RYU | Scapy, Iperf, and Hping3 | Mininet using 80 hosts |
| [25] | Custom dataset | Not mentioned | RYU | Hping3 | Mininet emulator |
| [29] | InSDN | 77 | ONOS | Tcpdump, hping3, and LOIC | Mininet with 1 OvS |
| This paper | SDN-DDoS dataset | 21 | RYU | MGEN and Hping3 | Mininet with 12 OvS switches |

3.2. Preprocessing
During preprocessing, we cleaned and transformed the raw data. This involved handling missing values, normalizing the data, and encoding categorical variables to prepare the dataset for analysis and feature selection.

3.3. Feature selection
A critical aspect of our methodology is the selection of features used for training ML models. Given the vast amount of data generated, we encountered the challenge of limited feature space and computational complexity.
The concept of a limited feature space refers to the restriction on the number of features that can be feasibly processed and analyzed owing to computational constraints and the risk of overfitting. To address this issue, we used PCA from the "sklearn.decomposition" module to select important features and reduce the dataset's complexity [30], [31]. PCA removes redundant and irrelevant features, thereby improving model performance [32]. In our study, we configured PCA to maintain 20 key components. This is represented by (1).

$$\text{PCA} = \text{PCA}(\text{No. of components} = 20) \tag{1}$$

This configuration reduced the data to 20 components, which encapsulated the most significant variance in the data. To identify the most influential features from the original dataset, we applied the method in (2).

$$\text{Selected Features} = \texttt{X.columns}[\texttt{pca.components\_.argmax(axis=1)}] \tag{2}$$

This technique identifies the original features with the highest contribution to each of the 20 principal components. The "pca.components_" attribute represents the principal axes in the feature space, and "argmax(axis=1)" locates the feature with the maximum weight for each component. As a result, "selected_features" lists the most critical original features, allowing for a more focused and effective analysis. Overall, PCA offers significant benefits, but it assumes that the principal components capture the linear relationships among features. In cases where the underlying relationships are nonlinear, PCA may not effectively capture complex interactions, potentially leading to suboptimal feature representation. Table 3 lists the features extracted in the experiments. These features were selected to provide a comprehensive representation of the network traffic, enabling effective DDoS detection.
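A minimal sketch of (1) and (2) with scikit-learn is given below. The random 21-column DataFrame and the column names `f0` to `f20` are placeholders rather than the paper's data, and the final de-duplication step is an addition for illustration, since several components can point to the same original feature.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Placeholder data: 500 synthetic flow records with 21 numeric columns.
rng = np.random.default_rng(42)
feature_names = [f"f{i}" for i in range(21)]
X = pd.DataFrame(rng.normal(size=(500, 21)), columns=feature_names)

# Equation (1): keep 20 principal components.
pca = PCA(n_components=20)
pca.fit(X)

# Equation (2): components_ has shape (n_components, n_features), so
# argmax(axis=1) returns, for each component, the index of the original
# feature with the largest weight. Component signs are arbitrary, so
# np.abs(pca.components_) is a common alternative if the largest
# magnitude is wanted instead.
top_idx = pca.components_.argmax(axis=1)
selected_features = X.columns[top_idx]

# Two components may select the same feature; keep the first occurrence.
unique_selected = list(dict.fromkeys(selected_features))
```

Note that this procedure returns at most 20 original feature names and usually fewer distinct ones, so the number of columns passed to the classifier depends on the data.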
Table 3. Recorded features of the datasets
Extracted features: packet count per flow, flow duration (minutes), source IP address, port bandwidth usage, aggregate duration, destination IP address, packet transmission rate, flow count, Packet-in message count, bytes per flow, port number, flow duration (seconds), total packet count, transmitted byte volume, byte accumulation, received byte volume

3.4. Ensemble method selection
The fourth phase involved selecting an appropriate ensemble method. Ensemble methods, which combine the predictions of multiple models, were chosen for their ability to improve accuracy and robustness over single models [33]. Various ensemble techniques were evaluated to identify the most effective approach for the dataset. RF emerged as the best-performing model for our purposes. RF operates by constructing a multitude of DTs during training and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [34]. The ensemble of random forests (ENRF) leverages this mechanism to effectively detect DDoS attacks. Each tree is trained on a random subset of the dataset to ensure diversity among the trees. During detection, an incoming packet is passed through all decision trees, and each tree independently classifies it as either normal or abnormal. The detection process in ENRF is as follows:
- Packet evaluation: each packet is evaluated by all decision trees in the forest.
- Majority voting: each tree provides a vote on whether a packet is normal (benign) or abnormal (malicious). The final classification is determined based on the majority vote of the trees.
- Anomaly detection: by combining the outputs of multiple trees, ENRF enhances the robustness and accuracy of DDoS detection, reducing the likelihood of false positives and negatives.
Algorithm 1 identifies normal and abnormal packets by learning the patterns and characteristics of benign and malicious traffic from a dataset. Specifically, the RF DDoS detection algorithm was applied to our dataset to distinguish between benign and malicious traffic, thereby demonstrating its efficacy in identifying DDoS threats. This ensemble approach is intended to ensure that the model generalizes well to unseen data and maintains high performance in real-world scenarios. Furthermore, Figure 2 illustrates the workflow for each received packet, detailing the steps from packet arrival to packet handling, using a Python script in the RYU controller.

Algorithm 1. Ensemble of decision trees for DDoS detection
Initialize the ensemble: initialize an empty ensemble that will hold T decision trees.
Build the decision trees:
for t = 1 to T do
    Feature selection: randomly sample m features from the input features.
    Tree construction: construct a new decision tree D_t by recursively partitioning the dataset based on the selected features.
    At each node:
        Select the feature that maximizes the information gain.
        Continue splitting until the maximum tree depth is reached or all instances belong to the same class.
    Add tree to ensemble: add D_t to the ensemble.
end for
Classify instances:
for each instance x_i in the training set do
    Feature extraction: generate a feature vector z_i by extracting relevant features using PCA.
    Prediction with trees: for each decision tree D_t, determine the class prediction y_{i,t} by following the decision path of x_i.
    Aggregate predictions: combine predictions to derive y_i:
    if the majority of the trees predict y_i = 1 then
        classify x_i as a DDoS instance.
    else
        classify x_i as a normal instance.
    end if
end for
Output the ensemble: provide the ensemble of decision trees as the final output.
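The steps of Algorithm 1 can be sketched in Python as shown below. This is an illustrative reading of the listing, not the authors' implementation: the helper names `fit_tree_ensemble` and `predict_majority`, the values of `n_trees` and `m_features`, and the synthetic data are assumptions, the bootstrap sampling of rows follows the prose description of ENRF rather than the listing, and the PCA step is omitted for brevity. Note also that scikit-learn's own `RandomForestClassifier` samples features at each split rather than once per tree as done here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_tree_ensemble(X, y, n_trees=10, m_features=4, max_depth=None, seed=42):
    """Train n_trees decision trees, each on a bootstrap sample of the rows
    and a random subset of m_features columns."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_trees):
        # Feature selection: sample m features without replacement.
        cols = rng.choice(X.shape[1], size=m_features, replace=False)
        # Random subset of the dataset: bootstrap sample of the rows.
        rows = rng.integers(0, X.shape[0], size=X.shape[0])
        # criterion="entropy" makes each split maximize information gain.
        tree = DecisionTreeClassifier(
            criterion="entropy",
            max_depth=max_depth,
            random_state=int(rng.integers(1_000_000)),
        )
        tree.fit(X[rows][:, cols], y[rows])
        ensemble.append((tree, cols))
    return ensemble


def predict_majority(ensemble, X):
    """Label an instance 1 (DDoS) when more than half of the trees vote 1,
    otherwise 0 (normal); ties therefore fall to the normal class."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in ensemble])
    return (votes.sum(axis=0) * 2 > len(ensemble)).astype(int)


# Synthetic stand-in for the flow features (not the paper's dataset).
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=8, random_state=0
)
ensemble = fit_tree_ensemble(X[:480], y[:480])
y_pred = predict_majority(ensemble, X[480:])
accuracy = float((y_pred == y[480:]).mean())
```

Keeping the column indices alongside each tree is what allows every tree to be queried with the same feature subset it was trained on.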
[Figure 2: flowchart. Packet arrival: the OpenFlow switch receives a packet. Flow table check: match found, forward; no match, forward to controller. SDN controller processing: capture packet in the traffic collector module and preprocess packet data. Ensemble ML-based IDS: mirror packet to controller; the IDS classifies the packet; not DDoS, forward to destination; otherwise alert the SDN controller. Packet handling: block packet based on a new flow rule.]
Figure 2. Workflow process for each received packet

3.5. Hyperparameter tuning
Hyperparameter tuning is critical for optimizing the performance of a selected ensemble method [35]. This phase involves systematically adjusting the hyperparameters to determine the best configuration that maximizes the predictive power of the model while avoiding overfitting. The RF classifier in this study was configured with specific hyperparameters to enhance the model performance. The model was constructed with 10 estimators, and bootstrap sampling was utilized with the Gini impurity criterion to evaluate split quality. The number of features considered at each split was set to the square root of the total number of features, with no constraints on the maximum depth of the trees. The minimum number of samples required to split a node was set to 2, and the minimum number of samples for a leaf node was 1. No minimum decrease in impurity was mandated for a split to occur. The random state was fixed at 42 to ensure reproducibility, and the model was run on a single processor. The model did not employ out-of-bag scoring or warm starts, and the default settings were used for the minimum weight fraction of leaves, maximum number of leaf nodes, class weights, and verbosity level.

3.6. Training and testing
The dataset was divided into two parts, 80% for training and 20% for testing. This ensures that the model is well trained while maintaining sufficient data for an objective evaluation.

3.7. Model evaluation
The scheme's performance was measured using metrics such as accuracy, precision, recall, and F1-score to evaluate how well it detects and prevents security threats in an SDN environment.

3.8. Comparison with existing models
We compared our new system with existing ML models from recent studies. This comparison highlights the improvements and effectiveness of the proposed approach in enhancing SDN security.

4. EXPERIMENTAL SETUP AND PERFORMANCE EVALUATION
We used the scikit-learn library for machine learning algorithms and performance evaluations because of its extensive range of efficient tools for data analysis. Scikit-learn, built on NumPy, SciPy, and matplotlib, offers a wide variety of advanced ML models [36]. Its well-documented API makes it easy to integrate into data processing workflows. In this study, we employed ensemble models such as RF, gradient boosting (GB), and bagging (BA) to enhance the performance by combining multiple algorithms.

4.1. Performance metrics and evaluation
We evaluated the model using various metrics, including accuracy (ACC), precision (PRC), recall (RCL), F1-score (F1), area under the curve (AUC), FPR, and true positive rate (TPR). These metrics provide a comprehensive evaluation of the performance of the model in various aspects of classification. Accuracy measures the proportion of correctly classified instances among all instances. It was calculated using (3).

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

Here, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. ACC was used to provide a measure of general
model correctness. PRC, also known as the positive predictive value, indicates the proportion of TP predictions among all positive predictions. It is defined as (4).

$$PRC = \frac{TP}{TP + FP} \tag{4}$$

Precision measures the ability to identify actual attacks without falsely flagging benign traffic. RCL, or sensitivity, measures the proportion of actual positives correctly identified by the model. The equation for recall is given in (5).

$$RCL = \frac{TP}{TP + FN} \tag{5}$$

Recall reflects the effectiveness of the model in identifying denial-of-service attacks. The F1-score is the harmonic mean of the precision and recall, providing a balance between the two metrics. It is calculated as (6).

$$F1 = \frac{2 \times PRC \times RCL}{PRC + RCL} \tag{6}$$

The F1-score is valuable in scenarios where precision and recall must be balanced, which is essential in denial-of-service attack detection to ensure both true attack detection and the minimization of false alarms. AUC represents the degree or measure of separability, showing how well the model can distinguish between classes. It was derived from the receiver operating characteristic (ROC) curve. A higher AUC indicates a better performance of the model in differentiating between the positive and negative classes. The FPR is calculated as (7).

$$FPR = \frac{FP}{FP + TN} \tag{7}$$

The TPR, or recall, is calculated as (8).

$$TPR = \frac{TP}{TP + FN} \tag{8}$$

The confusion matrix, detailed in Table 4, is a crucial component for evaluating the performance of our classification system. It delineates the results of the classification process and categorizes the outcomes into four distinct types: TP, TN, FP, and FN.

Table 4. Confusion matrix outcomes

| Category | Explanation | Outcome |
|---|---|---|
| TP | Instances where the model correctly identifies a DDoS attack. | Successful identification of an actual DDoS attack, ensuring appropriate countermeasures are activated. |
| TN | Instances where the model accurately recognizes legitimate, non-attack traffic. | Accurate recognition of non-attack traffic, allowing normal operations to proceed without disruption. |
| FP | Instances where the model incorrectly flags normal traffic as a DDoS attack, leading to false alerts. | Incorrect identification of normal traffic as an attack, which could lead to unnecessary interventions and alert fatigue. |
| FN | Instances where the model fails to detect an actual DDoS attack, posing a potential security risk. | Failure to detect an attack, which can result in undetected malicious activities and potential network breaches. |

4.2. Performance analysis and results
Figure 3 and Table 5 show the performance metrics (ACC, PRC, RCL, and F1-score) of the various ML models for DDoS attack detection: ENRF, fuzzy neural network (FNN), SVM, generalized linear model (GLM), NB, and XGBoost. Notable performance improvements across these models were partly due to the feature selection process using PCA. The ENRF model achieved a perfect score across all metrics (100.0%), indicating its exceptional effectiveness in distinguishing between DDoS attacks and legitimate traffic without any false positives or false negatives, making it ideal for critical security applications. The FNN model achieved an accuracy of 99.84%, precision of 96.61%, recall of 96.74%, and F1-score of 96.36%, indicating that it is suitable for environments where slight misclassifications are tolerable. The SVM model performed exceptionally well, achieving 99.92% accuracy and 99.84% precision, recall, and F1-score, making it highly effective in detecting attacks. In contrast, the
GLM model achieved 85.87% accuracy and 84.34% precision, recall, and F1-score (Table 5), indicating the challenges in distinguishing between attack and non-attack traffic owing to its linear nature. The NB model had an accuracy of 96.85%, with a precision of 85.33%, recall of 82.14%, and F1-score of 80.76%, suggesting moderate performance with a higher rate of false positives. The XGBoost model also performed impressively, with 99.74% accuracy, 99.95% precision, 99.84% recall, and an F1-score of 90.15%. Despite a lower F1-score than ENRF and SVM, its high precision and recall, along with computational efficiency, make it compatible with large-scale SDN environments. Overall, the use of PCA for feature selection played a critical role in enhancing the performance of these models.

[Figure 3: four bar-chart panels (Accuracy, Precision, Recall, F1-Score); x-axis: ML models (ENRF, FNN, SVM, GLM, NB, XGBoost); y-axis: percentage, 0 to 100.]
Figure 3. Performance of ENRF and other ML models for DDoS attack detection

Table 5. Performance metrics of different models

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| ENRF | 100.0% | 100.0% | 100.0% | 100.0% |
| FNN | 99.84% | 96.61% | 96.74% | 96.36% |
| SVM | 99.92% | 99.84% | 99.84% | 99.84% |
| GLM | 85.87% | 84.34% | 84.34% | 84.34% |
| NB | 96.85% | 85.33% | 82.14% | 80.76% |
| XGBoost | 99.74% | 99.95% | 99.84% | 90.15% |

Table 6 presents the performance metrics of ensemble-based ML classifiers. The RF classifier demonstrated exceptional performance, with a recall and F1-score of 1.0, indicating flawless detection of DDoS attacks and no FN. An FPR of 0.0000 confirmed its precision, as there were no FP. Furthermore, the low testing time of 0.25364 s underscores RF's suitability for real-time DDoS detection, owing to its high accuracy and efficiency.
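The quantities reported for each classifier (recall, FPR, F1-score, and testing time) can be obtained along the following lines. This is a hedged sketch rather than the authors' evaluation script: the RF hyperparameters and the 80/20 split follow Sections 3.5 and 3.6, whereas the synthetic data, the stratified split, the split seed, and the use of `time.perf_counter` around `predict` as the "testing time" are assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flow features (not the paper's dataset).
X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=8, random_state=0
)

# 80% training, 20% testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# RF configured with the hyperparameters listed in Section 3.5.
clf = RandomForestClassifier(
    n_estimators=10,
    criterion="gini",
    max_features="sqrt",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    bootstrap=True,
    oob_score=False,
    n_jobs=1,
    random_state=42,
)
clf.fit(X_train, y_train)

# Testing time: wall-clock duration of prediction on the held-out set.
start = time.perf_counter()
y_pred = clf.predict(X_test)
testing_time = time.perf_counter() - start

# Confusion-matrix counts, with label 1 treated as the attack class.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
recall = recall_score(y_test, y_pred)  # TP / (TP + FN), as in (5)
fpr = fp / (fp + tn)                   # FP / (FP + TN), as in (7)
f1 = f1_score(y_test, y_pred)          # harmonic mean of PRC and RCL, as in (6)
```

Swapping `RandomForestClassifier` for `GradientBoostingClassifier` or `BaggingClassifier` from the same `sklearn.ensemble` module would give the corresponding GB and BA measurements under the same protocol.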
The GB classifier also performed commendably, with a recall and F1-score of 0.99, reflecting high accuracy in detecting attacks. Its FPR of 0.0045 was minimal, indicating a very low rate of false alarms. Although the testing time for GB was 0.53461 s, it remained acceptable for practical applications. The slight increase in testing time was offset by its near-perfect classification performance, making GB a strong candidate for DDoS detection. BA exhibited a recall of 0.98 and an F1-score of 0.97, which are marginally lower than those of RF and GB. An FPR of 0.0085 indicates a higher false-positive rate, leading to more false alarms. The most significant drawback of BA is its testing time, which is substantially higher at 10.23563 s. This
extended duration could be a limitation in scenarios that require rapid detection. Despite its high accuracy, the increased computational cost and potential for more false positives may limit BA's practical use in real-time DDoS detection. Overall, the analysis revealed that RF offers the best balance between accuracy, precision, and computational efficiency, making it the most suitable for real-time applications. GB provides nearly equivalent performance with a slight increase in testing time, making it a viable alternative when precision is critical and minor delays are acceptable. Conversely, bootstrap aggregation (BA), while effective, incurs a significant computational overhead, hindering its applicability in time-sensitive environments.

Table 6. Performance metrics of ensemble-based ML classifiers

| Ensemble-based ML classifier | Recall | FPR | F1-score | Testing time (s) |
|---|---|---|---|---|
| RF | 1.0 | 0.0000 | 1.0 | 0.25364 |
| GB | 0.99 | 0.0045 | 0.99 | 0.53461 |
| BA | 0.98 | 0.0085 | 0.97 | 10.23563 |

Figure 4 presents the ROC curves for various ML models evaluated for their effectiveness in detecting DDoS attacks. The models included RF (AUC = 1.000), GB (AUC = 0.987), BA (AUC = 0.983), GLM (AUC = 0.879), SVM (AUC = 0.953), FNN (AUC = 0.929), NB (AUC = 0.970), and XGBoost (AUC = 0.930). Each curve illustrates the trade-off between the TPR and FPR for different threshold settings. The RF model achieved perfect discrimination with an AUC of 1.000, indicating that RF can differentiate benign and malicious packets without any false positives or negatives. This performance level is optimal for critical security applications that require precision. The performance of the GB and BA models is exemplary, as evidenced by their AUC values of 0.987 and 0.983, respectively. These models are acceptable for real-time DDoS detection because they balance a high TPR with a low FPR.
In contrast, the GLM model, with an AUC of 0.879, showed relatively lower performance. This may be due to the linear nature of GLM, which could struggle to capture the nonlinear patterns inherent in the DDoS attack data. The SVM and FNN models, with AUC values of 0.953 and 0.929, respectively, demonstrated strong performance but still fell short of the ensemble methods. Notably, the NB model (AUC = 0.970) and XGBoost (AUC = 0.930) also showed high effectiveness, although their slightly lower AUC values suggest a trade-off between design simplicity and computational efficiency. Overall, our results highlight the critical role of advanced ML techniques in enhancing network security and mitigating risks associated with DDoS attacks.

Figure 5 depicts the performance metrics of the RF model on the novel and CIC-DDoS2019 datasets. The model achieved perfect scores across all metrics for both datasets, with values of 1.0 for ACC, PRC, RCL, F1-score, and AUC. This indicates that the RF accurately identified DDoS attacks and normal traffic without errors. Moreover, the model performed equally well on both datasets, thereby demonstrating its reliability and adaptability. Such performance is essential for real-time DDoS detection systems to maintain accuracy and avoid false alarms, thereby ensuring timely threat mitigation.

4.3. Comparative analysis of DDoS detection techniques
Table 7 presents a summary of several schemes, showing key performance metrics, such as ACC, PRC, RCL, and F1-score. NFBoost, as referenced in [20], achieved an accuracy of 98.2%. The REAODE model in [21] has an accuracy of 90.51%. According to [22], the boosting ensemble classifier achieved an accuracy of 99.4%. The authors in [23] did not provide their custom dataset. Not all of the studies in [20]-[23] reported the precision, recall, and F1-score of their models.
They indicated strong performance in terms of accuracy, but the lack of information on other metrics leaves a gap in fully evaluating the models' efficiency in distinguishing between attack and non-attack scenarios. In Alashhab et al. [24], the ensemble online model achieved an accuracy of 99.2%, with precision, recall, and F1-score of 98.78%, 98.81%, and 98.78%, respectively. In Christila and Sivakumar [25], an accuracy of 99.42% was achieved. Our ENRF method achieved the highest scores among the compared models: 100% ACC, PRC, RCL, and F1-score. This indicates that the ENRF model is highly reliable and effective for detecting DDoS attacks, making it the strongest solution among those compared.

The critical analysis shows that the reason behind achieving a perfect score is that ENRF utilizes PCA for feature selection, which effectively reduces the dimensionality of the dataset while retaining the most significant features. This minimizes noise and improves the focus of the model on relevant data. In addition, the ensemble combines multiple RF classifiers and aggregates the predictions of numerous decision trees, which enhances accuracy and reduces the likelihood of overfitting to any particular dataset. Systematic tuning of hyperparameters, such as the number of estimators, maximum depth, and splitting criterion, ensures that the RF classifiers are optimized for the best performance. In future work, the focus will be on ensuring that the novel dataset comprehensively covers all the possible DDoS attack scenarios.

[Figure 4 plot: "ROC Curve of the Model with Different Classifiers"; x-axis: False Positive Rate (FPR); y-axis: True Positive Rate (TPR)]
Figure 4. ROC curves for various machine learning models on a novel DDoS detection dataset

[Figure 5 plot: "Performance Metrics for Different Datasets"; accuracy, precision, recall, F1-score, and AUC for the novel dataset and the CIC-DDoS2019 dataset]
Figure 5. Performance metrics for the RF model on two datasets (novel dataset and CIC-DDoS2019)

Table 7. Comparison of DDoS attack detection models (* stands for not specified)
Model with reference                Accuracy      Precision   Recall   F1-score
NFBoost [20]                        0.982-0.992   *           *        *
RFAODE [21]                         0.9051        *           *        *
Boosting ensemble classifier [22]   0.993         0.993       *        0.996
Ensemble K-means and RF [23]        1.0           1.0         1.0      1.0
Ensemble online [24]                0.992         0.9878      0.9881   0.9878
MEDR-DDoSAD [25]                    0.9942        0.9938      0.9942   0.9940
Proposed model-ENRF                 1.0           1.0         1.0      1.0
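The metrics compared in Table 7 (and the FPR reported in Table 6) all derive from the four confusion-matrix counts. The following is a minimal sketch under the standard definitions, treating "attack" as the positive class. The names `ConfusionCounts` and `detection_metrics` are hypothetical, and the counts in the usage example are invented for illustration rather than taken from any of the compared studies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attack traffic correctly flagged as attack
    fp: int  # benign traffic wrongly flagged as attack (false alarm)
    tn: int  # benign traffic correctly passed
    fn: int  # attack traffic missed


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1-score, and FPR."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Illustrative counts only; not results from the paper.
    example = ConfusionCounts(tp=90, fp=10, tn=890, fn=10)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because accuracy alone can look strong while precision or recall lags, reporting all four values, as the last three rows of Table 7 do, gives a fuller picture of how well a model separates attack from benign traffic.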