An evaluation of variable selection methods using Southern Africa solar irradiation data
DOI: https://doi.org/10.17159/2413-3051/2024/v35i1a16336
Abstract
High dimensionality poses a challenge in developing quality predictive models. Many covariates are often considered when modelling solar irradiance (SI), and training models on such data has several disadvantages. This study sought to identify the best embedded variable selection method for different combinations of location and time horizon in Southern Africa solar irradiance data. It introduced new variable selection methods into solar irradiation studies, namely penalised quantile regression (PQR), regularised random forests (RRF), and quantile regression forests (QRF). Stability analysis and evaluations of performance and accuracy metrics were used to compare them with the commonly used lasso, elastic net and ridge regression methods. On hourly data, the QRF model performed best in all locations, followed by the shrinkage methods. However, QRF was found to be insensitive to associations through correlations, thereby ignoring the relevance of variables while focusing on their importance. Among the shrinkage methods, the lasso performed best in only one location. On the 24-hour horizon, elastic net dominated among the shrinkage methods, but QRF was best in three of the six locations considered. The results confirmed that variable selection methods perform differently on different situational data sets. Results were combined according to the strengths of the methods to identify the most important variables. Day, total rainfall, and wind direction were superfluous features in all situations. The study concluded that shrinkage methods are best in cases of extreme multicollinearity, while QRF is best on data sets with outliers and/or heavy tails.
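To illustrate how the shrinkage methods compared above differ as variable selectors, the following is a minimal sketch of elastic net coordinate descent in plain Python. It is not the study's implementation: the function names, the toy orthogonal design, and the penalty values are assumptions chosen so the result can be checked by hand. The objective assumed here is (1/2n)·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||²), so alpha = 1 gives the lasso and alpha = 0 gives ridge regression.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=50):
    """Cyclic coordinate descent for the elastic net (hypothetical helper).

    alpha=1.0 is the lasso, alpha=0.0 is ridge, values in between mix both.
    No intercept and no standardisation, to keep the sketch short.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return beta


# Toy data (assumed): three mutually orthogonal features.
# y = 2.0 * x1 + 0.0 * x2 + 0.1 * x3, so x1 is strong, x2 irrelevant, x3 weak.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [2.1, 1.9, -2.1, -1.9]

lasso = elastic_net_cd(X, y, lam=0.5, alpha=1.0)
enet = elastic_net_cd(X, y, lam=0.5, alpha=0.5)
ridge = elastic_net_cd(X, y, lam=0.5, alpha=0.0)

for name, coefs in (("lasso", lasso), ("elastic net", enet), ("ridge", ridge)):
    print(name, [round(b, 3) for b in coefs])
```

On this toy design the lasso and elastic net set the weak third coefficient exactly to zero, which is what makes them usable as variable selectors, whereas ridge only shrinks it (to roughly 0.067) and therefore keeps every variable in the model.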
License
Copyright (c) 2024 Daniel Maposa, Amon Masache, Precious Mdlongwa, Caston Sigauke
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.