Crop yield prediction in Ethiopia based on machine learning under future climate scenarios
Abstract
Crop yield and agricultural development are the foundation of human survival. In Ethiopia, where agriculture is the economic backbone, food supply and security are crucial for national security and people’s livelihoods. Crop yield is strongly influenced by climatic conditions, but the coupling between the two has not been clearly explained, which makes it difficult to analyze crop yields quantitatively under climate change. Machine learning techniques offer a way to predict changes in such complex systems. This study predicts changes in the yield of five major staple crops in Ethiopia from 2021 to 2050 by combining machine learning methods with climate projections from Global Climate Models (GCMs) under different future scenarios of the Sixth Coupled Model Intercomparison Project (CMIP6). Data on 9 climate variables from 37 GCMs were obtained under four CMIP6 scenarios (historical, SSP1-2.6, SSP2-4.5 and SSP5-8.5). The best-performing GCMs were selected with a Taylor diagram, and their weighted averages were calculated. These averages were combined with five soil indicators to form the database of independent variables. After highly correlated variables were removed using Spearman’s correlation coefficient, machine learning models were trained with 10 yield variables as dependent variables: the yields of teff, maize, wheat, barley and sorghum in each of Ethiopia’s two major growing seasons from 1995 to 2020. Six machine learning models were employed: histogram gradient boosting (HGB), extreme gradient boosting random forest (XGBRF), light gradient boosting machine (LGBM), random forest (RF), extra trees (ET) and K-neighbors. After model evaluation, the three top-performing models were stacked using linear regression. The independent variables were then input into the final model to predict the yields of the five main staple crops in Ethiopia from 2021 to 2050. The main conclusions are as follows.
1) CMCC-CM2-SR5, MPI-ESM1-2-LR, EC-Earth3-Veg-LR, EC-Earth3-Veg and MPI-ESM1-2-HR obtained higher overall scores in the Taylor diagram analysis, indicating that they simulate the climate of Ethiopia better than the other GCMs. 2) XGBRF, RF and ET performed better than HGB, LGBM and K-neighbors in terms of the coefficient of determination (R²), mean absolute error (MAE) and explained variance score (EVS). Stacking improved performance further: the ensemble model outperformed the individual models. 3) Over the next 30 years, changes in crop yield during the Meher season (the longer growing season in Ethiopia, generally from April to December) were mainly within 2 t·hm⁻². In the Belg season (the shorter growing season, generally from February to September), yield decreased more strongly under the SSP1-2.6 scenario, whereas it increased under the other two scenarios, possibly because mitigation of the greenhouse effect reduces the CO₂ fertilization effect. 4) With the intensification of social conflicts and of environmental degradation caused by human activities, there is a growing need in the study area to change the agricultural structure and redistribute productivity, which leads to the transfer of agricultural productivity to newly suitable areas. Under the SSP1-2.6 and SSP5-8.5 scenarios, the study area will achieve higher crop productivity owing to the alleviation of drought conditions and the intensification of the greenhouse effect, respectively. The results of this study show how crop yield in the study area changes under different future climate change scenarios, providing a reference for determining agricultural production potential and formulating agricultural policies in the study area.
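The variable-screening step described above (removing highly correlated predictors with Spearman's correlation coefficient) can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not state the cutoff or the order in which variables were dropped, so the 0.9 threshold, the greedy keep-first rule, the variable names and the numbers below are all hypothetical and purely illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(columns, threshold=0.9):
    """Greedy filter (assumed rule): keep a variable only if its |rho|
    with every variable already kept does not exceed the threshold."""
    kept = []
    for name in columns:
        if all(abs(spearman(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Synthetic, made-up predictor values; names are hypothetical.
predictors = {
    "tas": [18.2, 19.1, 20.4, 21.0, 22.3, 23.5],
    "tasmax": [24.0, 25.2, 26.9, 27.1, 28.8, 30.0],  # same ordering as tas
    "pr": [120.0, 80.0, 150.0, 60.0, 110.0, 90.0],
    "soil_ph": [6.1, 5.8, 6.4, 6.0, 5.9, 6.3],
}
selected = drop_correlated(predictors, threshold=0.9)
print(selected)
```

In this toy example `tasmax` increases monotonically with `tas`, so the pair has a rank correlation of 1 and only the first of the two is retained. A greedy rule like this depends on the order of the variables, so a real workflow would need an explicit criterion for which member of a correlated pair to keep.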