International Journal of Power Electronics and Drive System (IJPEDS)
Vol. 17, No. 1, March 2026, pp. 752-764
ISSN: 2088-8694, DOI: 10.11591/ijpeds.v17.i1.pp752-764

Machine learning based models for solar energy

Dalila Cheri¹, Abdeldjalil Dahbi²,³, Mohamed Lamine Sebbane¹, Bassem Baali¹, Ahmed Yassine Kadri³, Messaouda Chaib³

¹ Institute of Electrical and Electronic Engineering, University of Boumerdes, Boumerdes, Algeria
² Unité de Recherche en Energies Renouvelables en Milieu Saharien (URERMS), Centre de Développement des Energies Renouvelables (CDER), Adrar, Algeria
³ Laboratory of Sustainable Development and Computing (L.D.D.I), University of Adrar, Adrar, Algeria

Article Info
Article history: Received Aug 24, 2024; Revised Dec 25, 2025; Accepted Jan 22, 2026
Keywords: Machine learning; Photovoltaics; Power forecasting; Solar generation; Weather conditions

ABSTRACT
Photovoltaic (PV) technology is one of the most promising forms of renewable energy. However, power generation from PV technologies is highly dependent on variable weather conditions. These conditions are neither constant nor controllable, and they can affect grid stability. Accurate forecasting of PV power production is therefore essential for reliable power system operation. The primary challenge addressed in this study is to accurately predict photovoltaic energy production, given that weather conditions such as irradiance, temperature, and wind speed are random variables. The key contribution of this article is a machine learning model that predicts the energy production of a real PV power plant in Algeria. The model uses real measurements sourced from the Center of Renewable Energy Development (CDER) in Adrar, Algeria, in 2021. The data come from two PV power plants located in harsh desert climate conditions.
The results presented in this study compare several predictive methods applied to real-world data from a PV power plant situated in the Saharan region. Our findings reveal that the artificial neural network (ANN) model yields the most accurate predictions, with an accuracy of 94.96% and the smallest prediction errors: the root mean square error (RMSE) and mean absolute error (MAE) are 7.78% and 3.80%, respectively.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Dalila Cheri
Institute of Electrical and Electronic Engineering, University of Boumerdes
Boumerdes, Algeria
Email: da.cheri@univ-boumerdes.dz

1. INTRODUCTION
Solar energy is one of the most promising sources of power for residential, commercial, and industrial applications. This is particularly true given that the cost of solar modules continues to decrease, while the cost of generating energy from fossil fuels and other polluting sources keeps rising. It is therefore becoming more practical to use renewable energy resources such as solar energy, which converts solar irradiance into electric energy through the photovoltaic effect [1], [2]. Energy generated by photovoltaic (PV) systems is directly influenced by geographical and weather conditions such as solar irradiance, temperature, and site-specific factors [3], [4]. However, the variability of PV output power poses significant challenges to power grid operation, including issues related to system stability, reliability, and electric power balance. Solar PV power forecasting has emerged as a crucial solution to these issues, supporting effective decision-making and grid stability. Accurate forecasting of PV power reduces the impact of output uncertainty on the grid, making the system more reliable and efficient while maintaining power quality.
Journal homepage: http://ijpeds.iaescore.com
Previous research has advanced PV power forecasting significantly, and various approaches have been proposed. Antonanzas et al. [5] provide a comprehensive review of photovoltaic forecasting methods covering physical, statistical, and machine learning approaches. They also underline the importance of accurate forecasts for reliable grid operation. Al Amin and Hoque [6] applied ARIMA models to short-term prediction, obtaining moderate accuracy but struggling with non-linear weather effects. Machine learning (ML) techniques have contributed significantly to overcoming these challenges and can improve accuracy and reliability compared with traditional methods. The objective of this article is to develop machine learning models and mathematical relationships that can forecast energy generation, because solar photovoltaic output fluctuates and depends on the weather. Despite the advances above, challenges remain in generalizing models across different climates and in optimizing efficiency, and this study aims to address them. Our study is based on PV power generation data collected over one year at 30-minute intervals from two locations in Algeria: Kaberetene (Adrar) and El Hadjira (Ouargla). Using this dataset, we developed four machine learning models: linear regression, polynomial regression, support vector regression (SVR), and artificial neural networks (ANN). We analyze and preprocess the data to optimize model performance, and then we compare the models to identify the most effective approach for power prediction.
The article is organized as follows:
i) The first section introduces PV systems and the various factors that can affect their performance. It emphasizes the importance of accurate PV power forecasting in the energy industry.
ii) The following sections review commonly used techniques for PV power prediction. They give an overview of the machine learning models used in this study, together with theoretical background and evaluation metrics.
iii) The final section presents the datasets used in our research and describes the preprocessing and feature engineering steps taken to make them suitable for analysis. It then presents the study's results and findings, followed by a comprehensive discussion.

2. RELATED WORK
Many research efforts have focused on providing more accurate forecasts of solar power generation. ML and artificial intelligence (AI) forecasting models can predict PV power directly, without the intermediate step of forecasting solar irradiance. This approach also gives flexibility in the choice of forecasting horizon. Most state-of-the-art forecasting models use ANN, regression models, and support vector machines (SVM). These data-driven techniques train models on historical observations, which lets them compute predictions from past values of the input variables [7], [8]. Consequently, power output can be predicted directly from the input variables. Table 1 summarizes the studies in the literature that are closest to the method proposed in our work for predicting solar power generation. These studies used datasets and locations different from ours, along with varying preprocessing and algorithmic techniques. As Table 1 shows, previous research indicates that PV power generation depends primarily on meteorological factors such as irradiance, temperature, wind speed, and relative humidity.
The current o w through solar cells increases signicantly with higher irradiance, leading to a rise in po wer output [9]. Higher temperatures can reduce panel ef cienc y by decreasing po wer output as the v oltage drops with increasing temperatures [10]. Higher wind speeds can lo wer air a n d solar cell operating temperatures, enhancing the ef cienc y of a solar PV system [10]. Increases in relati v e humidity can signicantly decrease PV v oltage; lo w relati v e humidity impro v es ef cienc y , while high relati v e humidity reduces it [11]. Due to the intrinsic nature of these f actors, the output po wer is v ariable and uncertain, resulting in unstable uctuations [12]. Pre vious w ork focused primarily on the de v elopment of solar PV po wer output forecasting models using traditional statistical and ph ysical approaches, as well as machine learning techniques such as linear re gression, polynomial re gression, SVR, ANN, long short-term memory (LSTM), and con v olutional neural netw orks (CNN)-LSTM. Although these studies attempted to achie v e higher accurac y by applying v arious input parameters (e.g. temperature, solar radiation, wind speed) at multiple locations, the y were lik ely to miss the nonlinear nature of weather -dependent PV po wer generation, especially in f ast-changing en vironments. Furthermore, most studies focused on a single method or did not include a comparati v e study of multiple machine learning models under similar conditions, and little attention w as gi v en to localized case studies in re gions such as Algeria, where en vironmental conditions can directly af fect PV performance. In this study , we close these g aps by suggesting and contras ting four ML models, that is: linear re gression, polynomial re gression, SVR, and ANN, using real data for tw o areas in Algeria. 
This study provides a deeper understanding of how the models behave under different climatic conditions and suggests better forecasting methods for localized PV systems.
Table 1. An overview of methods employed in PV power prediction

| Authors | Location | Data parameters | Method | Accuracy | Error (MAE) | Error (RMSE) |
|---|---|---|---|---|---|---|
| Verma et al. [13] | India | Temperature, cloud cover, wind speed, humidity, rainfall | Linear regression | 74.4% | 6% | / |
| | | | Logarithmic regression | 47.4% | 15% | / |
| | | | Polynomial regression | 75.1% | 6.1% | / |
| | | | ANN | 92% | 3% | / |
| Kuriakose et al. [14] | India | Solar radiance, temperature, wind speed, relative humidity | ANN | 80.97% | 6.53% | / |
| | | | Linear regression | 83.21% | 6.66% | / |
| | | | SVR | 83.88% | 6.74% | / |
| Abuella and Chowdhury [15] | USA | Temperature, cloud cover, pressure, humidity, wind component, solar radiation, thermal radiation, net solar radiation, liquid water, ice water | ANN | 97.09% | / | 5.54% |
| | | | MLR | 96.98% | / | 5.71% |
| Aslam et al. [16] | Germany | Day, temperature, wind, sky cover, humidity, precipitation | LSTM | 86.8% | 3.57% | 7.07% |
| | | | LSTM-attention | 86.44% | 3.67% | 7.2% |
| | | | CNN-LSTM | 85.25% | 3.78% | 7.38% |
| | | | Ensemble method | 87.4% | 3.69% | 6.85% |
| Uddin et al. [17] | Indonesia | Radiation, air temperature, wind speed, sunshine (minutes), air humidity, air pressure | K-NN | 64.9% | / | / |

3. METHODOLOGY
This section focuses on the machine learning models used for PV power forecasting [18]–[20]. We examine both linear and non-linear models and evaluate their performance and complexity. The models are organized hierarchically, from the simplest to the most complex, to identify the most suitable approach for accurate and reliable PV power forecasting. Specifically, we employed four models:
a) Linear regression: assumes a linear relationship between the input weather variables and the dependent variable (PV power) [21].
b) Polynomial regression: models non-linear relationships between variables as nth-degree polynomials [22]. We tested polynomial degrees from n = 0 to n = 10 to find the degree that fits the data best. The goal was to capture the underlying patterns effectively while avoiding overfitting.
c) SVR: finds an optimal hyperplane that fits the data while tolerating a deviation of at most ε, i.e., keeping the error within a threshold [23]. SVR excels at modeling intricate, non-linear relationships by using kernel functions that map the input space into higher-dimensional feature spaces. In this work, we studied three SVR kernel functions: a linear kernel, a polynomial kernel, and a radial basis function (RBF) kernel. The linear and polynomial kernels performed poorly compared with the RBF kernel, so we focused on testing SVR with the RBF kernel on our two datasets.
d) ANN: mimics biological neurons and excels at learning patterns from training data to predict output variables. It consists of layers of interconnected nodes: an input layer, one or more hidden layers for processing, and an output layer [24]. The input layer contains as many nodes as the dataset has features, and the output layer has a single node. An activation function introduces non-linearity into the model, which allows it to learn complex patterns. In our experiments, we first used the linear activation function to test the performance of the model. We then adopted the rectified linear unit (ReLU) activation function to capture the non-linearity in the data.
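The degree search in item b) can be sketched as follows. Experiment 2 uses validation MAE as the selection criterion. The sketch is minimal and dependency-free, and it rests on several assumptions. It fits a single input variable rather than the multi-input polynomial features with interaction terms used in the study. It solves the least-squares normal equations directly. It breaks near-ties in favor of the lower degree. The function names and the tolerance `tol` are hypothetical.

```python
def solve(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares coefficients c_0..c_degree from the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def predict_poly(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_degree(x_tr, y_tr, x_val, y_val, max_degree, tol=1e-9):
    """Return the lowest degree whose validation MAE is within tol of the best, plus all scores."""
    scores = {}
    for d in range(max_degree + 1):
        coef = fit_poly(x_tr, y_tr, d)
        scores[d] = mae(y_val, [predict_poly(coef, x) for x in x_val])
    best = min(scores.values())
    chosen = min(d for d, s in scores.items() if s <= best + tol)
    return chosen, scores


# Synthetic check: the data follow an exact quadratic, so degree 2 should be selected.
x_train = [i / 19 for i in range(20)]
y_train = [1 + 2 * x - 3 * x ** 2 for x in x_train]
x_val = [0.03 + 0.1 * i for i in range(10)]
y_val = [1 + 2 * x - 3 * x ** 2 for x in x_val]
degree, scores = select_degree(x_train, y_train, x_val, y_val, max_degree=4)
print(degree)  # 2
```

The raw-power normal equations become badly conditioned at the high degrees (up to n = 10) explored in the study. At that range, a QR- or SVD-based solver such as `numpy.linalg.lstsq` on scaled inputs would be the safer choice.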
4. EXPERIMENTS AND RESULTS
This section describes the datasets used to predict the power output of two solar power plants and analyzes the relationships between various environmental factors and power output. It also details the experiments conducted, including parameter selection and the evaluation of predictive algorithms, to identify the most suitable approach for accurate and reliable PV power forecasting.

4.1. Data description and analysis
The methodology begins with the collection of solar energy data from the Renewable Energies Research Unit in Saharan Environment (URERMS), Center of Renewable Energy Development (CDER), covering two PV power plants in Algeria. The raw data went through a cleaning process in which negative and missing values were handled to maintain data integrity. Exploratory data analysis was then used to identify correlations and patterns, which guided feature selection. Key features such as solar irradiance, temperature, wind speed, and relative humidity were selected based on their relevance to PV performance. Regression and machine learning models, namely linear regression, polynomial regression, SVR, and ANN, were trained using a 70/15/15 split for training, validation, and testing. Model performance was evaluated using the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (R²). Figure 1 illustrates the overall workflow adopted in this study for solar energy prediction using machine learning models. This structure ensured that each model was evaluated on consistent and reliable data, providing a fair comparison of prediction accuracy.

Figure 1. Workflow for proposed solar energy prediction using machine learning models

4.1.1. Data collection
The data used to train the models, including the ANN, were collected from a meteorological weather station installed at the PV power plant site. The data were carefully processed to ensure their suitability for the proposed application in the Algerian energy market. The datasets are based on real measurements from an actual PV power plant operating in desert climate conditions. The results are therefore highly relevant and can serve as a valuable reference for other PV power plants in Saharan regions. The meteorological data were sourced from the Renewable Energies Research Unit in Saharan Environment (URERMS), CDER Adrar, Adrar, Algeria, for the year 2020. The dataset includes half-hourly measurements gathered by multiple sensors connected to PV systems at two stations located in Ouargla, Algeria, and Adrar, Algeria.

The first dataset is sourced from the Kaberetene photovoltaic power plant in Adrar, which spans 6 hectares and has a capacity of 3 MWp. The facility is located near Ksar Kabertene, about 60 km from the wilaya of Adrar, Algeria (31° 50’ N, 78’ E). It comprises three sub-fields, each with a 1 MWp capacity. It uses 93 matrices, each containing 44 panels organized into 2 strings of 22 panels connected in series.

The second dataset comes from the El Hadjira PV power plant in Ouargla, which covers 60 hectares and has a capacity of 30 MW. The plant is situated near El Hadjira, about 99 km from the wilaya of Ouargla, Algeria (32.6016° N, 5.8339° E). It consists of 30 sub-fields, each equipped with polycrystalline silicon modules. Each sub-field generates 1 MWp and houses 4004 modules organized into 91 strings of 44 modules each. Each module is rated at 250 W with an efficiency of 15%. The photovoltaic field data from both plants contain time series collected by several sensors linked to the PV systems. They were measured in 2020 at 30-minute intervals from 6:00 AM to 8:00 PM.
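The El Hadjira plant configuration above can be checked for consistency with a little arithmetic. The module counts and ratings reproduce the stated capacities. The inputs are the values given in the text: 91 strings of 44 modules per sub-field, 250 W modules, and 30 sub-fields. The helper name is hypothetical. The text does not give the Kaberetene module rating, so the same check is not attempted for that plant.

```python
def array_capacity(subfields, strings_per_subfield, modules_per_string, module_rating_w):
    """Return (module count, nameplate DC capacity in watts) for a PV array."""
    modules = subfields * strings_per_subfield * modules_per_string
    return modules, modules * module_rating_w


# One El Hadjira sub-field: 91 strings of 44 modules rated at 250 W each.
sub_modules, sub_watts = array_capacity(1, 91, 44, 250)
# The full plant: 30 identical sub-fields.
plant_modules, plant_watts = array_capacity(30, 91, 44, 250)

print(sub_modules, sub_watts / 1e6)      # prints: 4004 1.001
print(plant_modules, plant_watts / 1e6)  # prints: 120120 30.03
```

The result, about 1.001 MWp per sub-field and about 30.03 MWp in total, matches the stated 1 MWp per sub-field and 30 MW plant capacity.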
The Kaberetene dataset contains 10,364 entries, while the El Hadjira dataset contains 9,570 entries. Both datasets include seven columns (features): total power (kW), TSA, R Globale (W/m²), temperature (°C), wind speed (m/s), humidity (%), and pressure (hPa). The dataset is highly relevant to the Algerian energy market because it covers the southern region of Algeria, which is known for its strong solar irradiance. It also represents a variety of weather conditions in desert environments, where solar energy can vary significantly. These data are crucial for evaluating solar energy potential, improving forecasting models, and optimizing the integration of renewable energy into Algeria's grid, which helps reduce reliance on fossil fuels. The half-hourly resolution allows a detailed analysis of energy generation, which is crucial for integrating solar power into the national grid.

4.1.2. Data exploration and preprocessing
We began with thorough data exploration and preprocessing to ensure data quality and suitability for regression modeling. This exploratory data analysis (EDA) examined distributions and relationships through statistical analyses and visualizations to determine the most appropriate forecasting models for our dataset. A scatter plot matrix and a correlation matrix were created to identify the most relevant input variables for modeling power output. Irradiance showed a strong correlation with power output: when irradiance is high, power is likely to be high as well. However, power values varied more at low irradiance levels. This observation is consistent with the working principle of PV cells and suggests that a linear regression model would be appropriate for predicting power from irradiance. Temperature and wind speed showed moderate correlations with power, indicating more complex (non-linear) relationships. Relative humidity had a strong negative correlation with temperature, as increased humidity can lead to precipitation and subsequently lower ambient temperatures.
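A minimal sketch of the Section 4.1.2 preprocessing follows. It combines the correlation screening described above with the cleaning rules described in the next paragraph: negative readings set to null, early/late-hour gaps set to zero, and mid-day gaps dropped. It implements one reading of those rules, in which negative readings that fall in the early/late window end up zero-filled. The |r| threshold of 0.1, the 08:00 and 18:00 window boundaries, and the record layout are hypothetical choices, not values from the study.

```python
import math

MIN_ABS_CORR = 0.1      # hypothetical threshold for "nearly zero" correlation
EARLY_END_HOUR = 8.0    # hypothetical end of the early-hours window
LATE_START_HOUR = 18.0  # hypothetical start of the late-hours window


def pearson(xs, ys):
    """Pearson correlation coefficient (both inputs must have non-zero variance)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_features(columns, target):
    """Keep the features whose |r| with the target reaches MIN_ABS_CORR."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= MIN_ABS_CORR]


def clean(rows, fields=("irradiance", "power")):
    """Apply the null / zero-fill / drop rules to half-hourly records."""
    kept = []
    for row in rows:
        row = dict(row)
        for f in fields:
            if row[f] is not None and row[f] < 0:
                row[f] = None  # negative (night-time) readings are set to null
        if any(row[f] is None for f in fields):
            if row["hour"] < EARLY_END_HOUR or row["hour"] >= LATE_START_HOUR:
                for f in fields:
                    if row[f] is None:
                        row[f] = 0.0  # early/late gaps are set to zero
            else:
                continue  # mid-day gaps: drop the whole record
        kept.append(row)
    return kept


power = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
features = {
    "irradiance": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pressure": [5.0, 3.0, 1.0, 1.0, 3.0, 5.0],  # uncorrelated with power by construction
}
print(screen_features(features, power))  # ['irradiance']

records = [
    {"hour": 6.0, "irradiance": -3.0, "power": None},
    {"hour": 12.0, "irradiance": 800.0, "power": None},
    {"hour": 12.5, "irradiance": 820.0, "power": 2100.0},
]
print([r["hour"] for r in clean(records)])  # [6.0, 12.5]
```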
Pressure showed nearly zero correlation with power output and was therefore excluded from further analysis. Based on this analysis, irradiance, temperature, wind speed, and relative humidity were chosen as input variables because of their direct or indirect effects on PV cell performance and power output. The dataset contained negative values for power and solar irradiance. These were measured during the night, when there is no solar irradiance and power is drawn from the battery or grid, and they were set to null to sanitize the data. Missing values were found in the solar radiation and power data during the early and late hours of the day, likely due to sensor offsets and inverter failures; these were set to zero. Data points with missing values during mid-day periods, likely due to sensor or inverter breakdowns, were excluded from processing to ensure accurate analysis. Finally, the dataset was split into 70% training, 15% validation, and 15% testing sets, and techniques such as cross-validation were employed to ensure robust model evaluation.

4.2. Model evaluation: performance metrics
Performance metrics are statistical measures that evaluate the effectiveness of a model by comparing its forecasts with actual outcomes. The effectiveness of a method is determined by the error between the actual and predicted output power values, and the most accurate method is the one that produces the smallest error. We analyzed and compared machine learning-based forecasting methods for PV power generation. The evaluation criteria we defined are the error measures MAE and RMSE and the R² score. Together, these metrics offer a comprehensive assessment of the methods' effectiveness and applicability [25].
MAE gives the average magnitude of the error in the units of the target variable, which makes it simple to interpret. The RMSE is the square root of the average of the squared differences between the actual and predicted values. The R² score indicates goodness of fit. It measures how well unseen samples are likely to be predicted by the model, through the proportion of explained variance.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{3}$$
where y_i is the actual value, ŷ_i is the predicted value, ȳ is the mean of the actual values, and n is the total number of data points. For R², values closer to 1.00 indicate a better model. The score can also be negative, because a model can be arbitrarily worse than simply predicting the mean.

4.3. Experiments
The objective of our study is to predict the power output of a solar power plant from various weather factors. Regression analysis is well suited to this task because it quantitatively captures the relationships between the input variables (irradiance, temperature, wind speed) and the output variable (power). Power output increases proportionally with irradiance, so we expected a linear regression model to be appropriate for predicting power. Temperature and wind speed have a nonlinear relationship with power, but irradiance is considered the dominant feature. Note that each model was applied to both datasets.

4.3.1. Experiment 1: Power modeling using linear regression
Linear regression was employed to model and predict the amount of solar power generated from various weather-related features. The process began with data scaling. The input datasets X_train, X_val, and X_test, along with their corresponding target vectors Y_train, Y_val, and Y_test, were preprocessed to ensure consistent feature magnitudes. A linear regression model was initialized and trained on the scaled training data. After training, the model was used to generate predictions for the training, validation, and testing sets.
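The Experiment 1 pipeline can be sketched without external libraries. It consists of a 70/15/15 split, scaling fitted on the training inputs only, an ordinary least-squares fit, and the metrics of equations (1)–(3). The sketch makes three assumptions that the study does not state: the split is a seeded random shuffle, the scaler is a z-score standardization, and the fit solves the normal equations. All function names are hypothetical, and the data are synthetic.

```python
import math
import random


def split_70_15_15(rows, seed=0):
    """Shuffle with a fixed seed, then cut into 70% / 15% / 15% partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train, n_val = len(rows) * 70 // 100, len(rows) * 15 // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_scaler(X):
    """Column means and standard deviations computed on the training inputs only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y):
    """Ordinary least squares with an intercept: returns [b0, b1, ..., bk]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def mae(y, p):   # equation (1)
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):  # equation (2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def r2(y, p):    # equation (3)
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Synthetic (irradiance-like, temperature-like) inputs with an exact linear target.
samples = []
for i in range(40):
    irr = (i % 10) / 10
    temp = 20 + (i * 7) % 15
    samples.append(((irr, temp), 0.1 + 2.5 * irr - 0.3 * temp))

train, val, test = split_70_15_15(samples)
X_tr, y_tr = [s[0] for s in train], [s[1] for s in train]
X_te, y_te = [s[0] for s in test], [s[1] for s in test]
means, stds = fit_scaler(X_tr)
coef = fit_linear(transform(X_tr, means, stds), y_tr)
pred = predict(coef, transform(X_te, means, stds))
print(len(train), len(val), len(test))                       # 28 6 6
print(round(mae(y_te, pred), 6), round(r2(y_te, pred), 6))   # 0.0 1.0
```

OLS predictions with an intercept do not change under feature standardization. Here the scaler mainly mirrors the study's preprocessing; scaling matters far more for the SVR and ANN models.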
To evaluate the model's performance and its ability to generalize to new data, we calculated several metrics, including MAE, RMSE, and the coefficient of determination (R² score). Modeling power output as a linear function of irradiance and temperature gave us initial insight into the data and the relationships between variables. The linear function with two inputs was learned as (4).

$$\mathrm{Power} = 136.84 + 2518.19 \cdot \mathrm{irradiance} - 336.52 \cdot \mathrm{temperature} \tag{4}$$

We added wind speed as a third input variable to handle outliers and improve predictive accuracy. We expected this additional variable to improve the model's performance and reduce the differences between actual and predicted values by capturing complex interactions affecting power output. The linear function with three inputs was learned as (5).

$$\mathrm{Power} = 122.66 + 2511.72 \cdot \mathrm{irradiance} - 335.43 \cdot \mathrm{temperature} + 49.45 \cdot \mathrm{windspeed} \tag{5}$$

The performance metrics for the linear regression models with two and three inputs are summarized in Table 2. The three-input model achieves slightly higher accuracy and lower error than the two-input model. However, linear regression failed to capture the non-linear relationship of power with temperature and wind speed. Given the complexity observed in the relationships between the input variables and power output, we therefore selected polynomial regression to improve accuracy.

Table 2.
Performance metrics for linear regression models with two and three inputs

| Metrics | Kaberetene: Training set | Kaberetene: Validation set | Kaberetene: Testing set | El Hadjira: Training set | El Hadjira: Validation set | El Hadjira: Testing set |
|---|---|---|---|---|---|---|
| RMSE (2 inputs) | 8.98% | 8.40% | 8.61% | 8.63% | 7.35% | 8.67% |
| RMSE (3 inputs) | 8.98% | 8.40% | 8.59% | 8.62% | 7.33% | 8.66% |
| MAE (2 inputs) | 5.42% | 6.02% | 5.04% | 5.41% | 4.77% | 5.76% |
| MAE (3 inputs) | 5.42% | 6.01% | 5.03% | 5.41% | 4.75% | 5.76% |
| R² (2 inputs) | 91.95% | 93.65% | 92.29% | 92.81% | 93.99% | 93.27% |
| R² (3 inputs) | 91.97% | 93.66% | 92.31% | 92.82% | 94.02% | 93.30% |

4.3.2. Experiment 2: Power modeling using polynomial regression
Polynomial regression captures non-linear relationships and interactions effectively. It models non-linear patterns as nth-degree polynomials by incorporating higher-order and interaction terms, which gives a more accurate fit to the data. Input features were transformed into polynomial
features of degree n. The validation MAE was computed for each degree, and the best-performing degree was selected for the final model. The model was then retrained on the training data using the best degree. Its performance was evaluated on the training, validation, and test sets using RMSE, MAE, and the R² score. Figure 2 plots the validation MAE for each polynomial degree tested. The optimal degree is 7 for the Kaberetene dataset and 9 for the El Hadjira dataset, because these degrees provided the best balance between model complexity and prediction accuracy. The performance of the selected polynomial regression models is summarized in Table 3. Polynomial regression is more accurate than linear regression on both datasets, owing to its flexibility in handling the complex effects of temperature and wind speed. To improve accuracy further, we applied support vector regression to see whether it captures the complex relationships better than polynomial regression.

Figure 2. Validation MAE vs polynomial degree for (a) Kaberetene dataset and (b) El Hadjira dataset

Table 3. Performance metrics for polynomial regression models with two and three inputs

| Metrics | Kaberetene: Training set | Kaberetene: Validation set | Kaberetene: Testing set | El Hadjira: Training set | El Hadjira: Validation set | El Hadjira: Testing set |
|---|---|---|---|---|---|---|
| RMSE (2 inputs) | 8.13% | 7.42% | 7.94% | 7.67% | 6.49% | 7.61% |
| RMSE (3 inputs) | 8.06% | 7.43% | 8.35% | 7.69% | 6.49% | 7.62% |
| MAE (2 inputs) | 3.83% | 4.59% | 3.59% | 3.76% | 3.44% | 4.11% |
| MAE (3 inputs) | 3.81% | 4.56% | 3.77% | 3.81% | 3.47% | 4.21% |
| R² (2 inputs) | 93.41% | 95.04% | 93.43% | 94.31% | 95.31% | 94.81% |
| R² (3 inputs) | 93.52% | 95.03% | 92.75% | 94.28% | 95.32% | 94.81% |

4.3.3.
Experiment 3: Power modeling using SVR
Like polynomial regression, SVR handles non-linearity and complex relationships. To reduce training time, a subset of 1000 samples was randomly selected from the scaled training set using resample() with a fixed random seed. A randomized search was performed over the following parameter grid:
- kernel ∈ {'linear', 'rbf'}
- regularization parameter C ∈ {1, 10, 100}
- kernel coefficient γ ∈ {'scale', 0.01, 0.1}
- epsilon ε ∈ {0.1, 0.5}

The search tested 10 random combinations using 3-fold cross-validation, optimizing the negative mean squared error. The best estimator was selected based on the lowest average validation error across folds. The resulting optimal configuration uses the RBF kernel with C = 10, γ = 0.01, and ε = 0.1. Table 4 summarizes the predictive performance of the SVR model with these parameters on both datasets. Table 4 shows a low validation accuracy compared with the training accuracy, which indicates overfitting: the model learned the training data very well but failed to capture the underlying
patterns effectively, which led to poor predictive performance. To address the overfitting and low performance of support vector regression (SVR), we applied an artificial neural network (ANN) model to obtain better performance and accuracy.

Table 4. Performance metrics for SVR models with two and three inputs

| Metrics | Kaberetene: Training set | Kaberetene: Validation set | Kaberetene: Testing set | El Hadjira: Training set | El Hadjira: Validation set | El Hadjira: Testing set |
|---|---|---|---|---|---|---|
| RMSE (2 inputs) | 9.02% | 34.91% | 8.44% | 8.60% | 33.28% | 8.69% |
| RMSE (3 inputs) | 9.03% | 34.91% | 8.4% | 8.62% | 33.28% | 8.73% |
| MAE (2 inputs) | 5.89% | 30.93% | 5.30% | 5.76% | 28.86% | 6.22% |
| MAE (3 inputs) | 5.95% | 30.93% | 5.34% | 5.83% | 28.86% | 6.31% |
| R² (2 inputs) | 91.89% | 36.97% | 92.58% | 92.86% | 34.02% | 93.24% |
| R² (3 inputs) | 91.87% | 39.97% | 92.64% | 92.82% | 37.21% | 93.18% |

4.3.4. Experiment 4: Power modeling using ANNs
ANNs model intricate relationships between variables through multiple layers of neurons, and with this architecture we aim to achieve better predictive performance and generalization. The network contains an input layer with 2 or 3 input variables, two hidden layers with 64 and 32 neurons respectively, and an output layer. The ReLU activation function is used to capture the complex relationships in the data. The model was trained on the full scaled training dataset with the following hyperparameters:
- epochs: 100
- batch size: 32
- validation: a separate validation split, used during training to monitor generalization

The model was evaluated on the training, validation, and test datasets using RMSE, MAE, and R². Table 5 summarizes the resulting performance metrics, which give a detailed overview of the model's accuracy and generalization across the training, validation, and testing sets. The metrics indicate that the ANN model successfully captures complex relationships.
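The network described above can be sketched as a plain-Python forward pass. It has 2 or 3 inputs, hidden layers of 64 and 32 ReLU units, and one linear output, and the sketch makes the layer shapes and parameter counts concrete. It covers inference only; the training loop (100 epochs, batch size 32) is omitted. The He-style Gaussian initialization, the zero biases, and the function names are assumptions, not details reported in the study.

```python
import math
import random


def relu(v):
    return v if v > 0.0 else 0.0


def init_mlp(sizes, seed=0):
    """Create (weights, biases) per layer; weights[j][i] connects input i to unit j."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        std = math.sqrt(2.0 / fan_in)  # He initialization, a common choice for ReLU
        w = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((w, b))
    return layers


def forward(layers, x):
    """ReLU on the hidden layers, identity on the single output unit."""
    h = list(x)
    for idx, (w, b) in enumerate(layers):
        z = [sum(wi * hi for wi, hi in zip(row, h)) + bj for row, bj in zip(w, b)]
        is_output = idx == len(layers) - 1
        h = z if is_output else [relu(v) for v in z]
    return h[0]


def count_params(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


net3 = init_mlp([3, 64, 32, 1])  # irradiance, temperature, wind speed
net2 = init_mlp([2, 64, 32, 1])  # irradiance, temperature
print(count_params(net3), count_params(net2))  # 2369 2305
print(forward(net3, [0.5, -0.2, 0.1]))         # an untrained, arbitrary prediction
```

The small parameter counts (about 2.3k weights and biases) show that the architecture is lightweight relative to the roughly 10,000 samples per plant.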
The ANN architecture learns intricate patterns, which improves predictive accuracy and reduces error across the datasets compared with the other models.

Table 5. Performance metrics for ANN models with two and three inputs

| Metrics | Kaberetene: Training set | Kaberetene: Validation set | Kaberetene: Testing set | El Hadjira: Training set | El Hadjira: Validation set | El Hadjira: Testing set |
|---|---|---|---|---|---|---|
| RMSE (2 inputs) | 8.41% | 8.27% | 7.78% | 7.89% | 3.12% | 8.14% |
| RMSE (3 inputs) | 8.16% | 7.09% | 8.22% | 7.63% | 6.49% | 7.50% |
| MAE (2 inputs) | 4.51% | 5.55% | 3.80% | 4.25% | 6.12% | 4.80% |
| MAE (3 inputs) | 3.58% | 4.18% | 3.69% | 3.57% | 3.38% | 3.91% |
| R² (2 inputs) | 92.94% | 93.85% | 93.69% | 93.98% | 95.83% | 94.08% |
| R² (3 inputs) | 93.36% | 95.47% | 92.97% | 94.37% | 95.31% | 94.96% |

4.3.5. Experiment 5: Power testing using different models
In this study, we explore the performance of various regression models, including linear regression, polynomial regression, SVR, and ANN, in predicting the power output of a solar power plant. Each model was evaluated with standard performance metrics (RMSE, MAE, and R² score) on the training, testing, and validation sets. To further assess the generalizability and robustness of our models, we used them to predict power output on a new Kaberetene dataset that was not included in the original dataset. This dataset contains data collected over four days in 2021: January 15, April 15, July 15, and October 15, with each day representing a different season of the year. Figures 3–6 compare the actual power generated with the power predicted by the regression models. The models were developed with two input variables (irradiance and temperature) and with three input variables (irradiance, temperature, and wind speed). In each graph, the blue line represents the actual power generated and the orange line represents the predicted power.
Figure 6 compares the predicted output over the four days with the real power; the ANN provides the most accurate power prediction and the lowest error compared with the models presented in Figures 3–5. These results support the hypothesis that ANN is the most suitable approach for the Algerian data.
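The claim that the ANN gives the lowest error can be checked directly against the test-set columns of Tables 4 and 5. Tables 2 and 3 (linear and polynomial regression) fall outside this section, so this sketch compares only SVR and ANN; the dictionary layout and the `best_model` helper are illustrative names, not part of the paper.

```python
# Test-set values (%) copied from Table 4 (SVR) and Table 5 (ANN).
# Key: (site, number of inputs) -> (RMSE, MAE, R2)
TEST_SET = {
    "SVR": {
        ("Kaberetene", 2): (8.44, 5.30, 92.58),
        ("Kaberetene", 3): (8.40, 5.34, 92.64),
        ("El Hadjira", 2): (8.69, 6.22, 93.24),
        ("El Hadjira", 3): (8.73, 6.31, 93.18),
    },
    "ANN": {
        ("Kaberetene", 2): (7.78, 3.80, 93.69),
        ("Kaberetene", 3): (8.22, 3.69, 92.97),
        ("El Hadjira", 2): (8.14, 4.80, 94.08),
        ("El Hadjira", 3): (7.50, 3.91, 94.96),
    },
}
METRIC_INDEX = {"RMSE": 0, "MAE": 1, "R2": 2}


def best_model(case, metric):
    """Name of the model with the best test-set score for one (site, inputs) case."""
    scores = {model: table[case][METRIC_INDEX[metric]] for model, table in TEST_SET.items()}
    pick = max if metric == "R2" else min  # errors are minimised, R2 is maximised
    return pick(scores, key=scores.get)


if __name__ == "__main__":
    for case in TEST_SET["ANN"]:
        print(case, {m: best_model(case, m) for m in METRIC_INDEX})
```

For every site and input count in the two tables, the ANN has the lower RMSE and MAE and the higher R² on the test set.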
Figure 3. Actual vs predicted values, linear regression: (a) 2 inputs and (b) 3 inputs
Figure 4. Actual vs predicted values, polynomial regression: (a) 2 inputs and (b) 3 inputs
Figure 5. Actual vs predicted values, SVR: (a) 2 inputs and (b) 3 inputs
Figure 6. Actual vs predicted values, ANNs: (a) 2 inputs and (b) 3 inputs
4.4. Discussion
This study presents the development and performance of several regression models for predicting solar power from weather conditions (irradiance, temperature, and wind speed). Solar power prediction studies in the literature use several performance criteria, such as prediction accuracy and prediction error. In this context, the performance of the prediction models was evaluated in terms of prediction accuracy (R²) and prediction error (MAE, RMSE). The linear regression models provided limited performance. Increasing the number of input features improves their performance, as shown in Table 2, but this improvement is limited by the models' inability to capture non-linear relationships in the data. Polynomial regression enhanced the models' ability to account for non-linear relationships (Table 3). However, the increased complexity of higher-degree polynomials introduced overfitting, reducing their generalizability. SVR provided a more flexible approach, handling non-linear relationships better than linear models. Despite this, the SVR model achieved much lower validation accuracy than training accuracy: its R² on the validation sets (34.02% to 39.97%) was far below its R² on the training sets (above 91%), as shown in Table 4. Such a large gap between training and validation accuracy is a strong sign that the SVR model overfitted the training data and failed to generalize to new, unseen data.
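The overfitting diagnosis rests on the drop in R² from the training set to the validation set. A short sketch makes that gap explicit using the values of Tables 4 and 5; the 10-point `GAP_THRESHOLD` is a hypothetical cut-off chosen for illustration, since the paper does not define one, and the helper names are ours.

```python
# (training R2, validation R2) in %, copied from Table 4 (SVR) and Table 5 (ANN).
R2_TRAIN_VAL = {
    ("SVR", "Kaberetene", 2): (91.89, 36.97),
    ("SVR", "Kaberetene", 3): (91.87, 39.97),
    ("SVR", "El Hadjira", 2): (92.86, 34.02),
    ("SVR", "El Hadjira", 3): (92.82, 37.21),
    ("ANN", "Kaberetene", 2): (92.94, 93.85),
    ("ANN", "Kaberetene", 3): (93.36, 95.47),
    ("ANN", "El Hadjira", 2): (93.98, 95.83),
    ("ANN", "El Hadjira", 3): (94.37, 95.31),
}

# Hypothetical cut-off in percentage points; not taken from the paper.
GAP_THRESHOLD = 10.0


def r2_gap(train_r2, val_r2):
    """Drop in R2 (percentage points) from the training set to the validation set."""
    return train_r2 - val_r2


gaps = {key: r2_gap(*scores) for key, scores in R2_TRAIN_VAL.items()}
flagged = sorted(key for key, gap in gaps.items() if gap > GAP_THRESHOLD)

if __name__ == "__main__":
    for key in flagged:
        print(key, f"gap = {gaps[key]:.2f} points")
```

All four SVR configurations show gaps of roughly 52 to 59 points, whereas every ANN configuration has a slightly higher validation R² than training R², so no ANN case is flagged.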
The SVR model may have memorized the training data, learning noise and irrelevant patterns rather than the underlying relationships. This could be due to the high flexibility of the SVR model, which may have fitted the specific details of the training data too closely. Another possible reason is data complexity: the model might be too simple to capture the more complex relationships between the variables. Even with the radial basis function (RBF) kernel, which is typically effective for non-linear relationships, the model may lack the capacity to exploit the particular non-linear relationships in this dataset, for which an ANN may be better suited.

In contrast, the ANN captured the more complex non-linear relationships effectively. The test results in Table 5 and Figure 6 show that the ANN model provides competitive accuracy compared with the other models, with the lowest error and the highest agreement between the actual and predicted power. Across the different seasonal weather conditions, the ANN model closely approximated the real power values with minimal error, making it a reliable tool for solar power forecasting.

This study developed a machine learning-based model to estimate the solar power generated from meteorological data such as solar radiation, temperature, and wind speed. The ANN model achieved estimation accuracies (R²) of 93.69% and 94.96% on the test sets of the two datasets, respectively. These results are comparable to those of similar studies: the studies in [13] and [14] used different datasets from various locations and applied multiple machine learning algorithms to develop solar power forecasting models, achieving accuracies ranging from 64.9% to 97.097%.
Our results show that the proposed model performs competitively with the existing approaches reported in the literature [13], [14]. Because the proposed model forecasts efficiently, it can help photovoltaic systems optimize energy generation. It can also be applied in multi-horizon forecasting applications, such as grid or microgrid demand management, enabling reduced reliance on gas-fired generation and improved power balance. The model can also enhance the integration and reliability of photovoltaic systems in the Algerian energy market. This is particularly significant because the Algerian government has launched a program to install 15,000 MWc of PV capacity by 2035. Accurate solar power generation forecasting using ANN models is essential for optimizing energy production, distribution, and storage. In Algeria, where solar energy has significant potential but grid stability is a challenge, precise forecasting can improve grid management, reduce energy waste, and prevent power outages, especially during peak demand periods. Future work could improve accuracy by adding more input features, such as humidity, and assessing their impact on model performance, especially for the Algerian dataset. By integrating an ANN-based model that learns from historical data and environmental factors, Algeria can increase its reliance on renewable energy while maintaining grid stability. This reduces the need for expensive fossil fuel backup plants, lowering operational costs and offering environmental benefits.