Abstract:
Understanding the iron oxide content of soil is of great significance for revealing the soil environment and guiding crop production. Hyperspectral technology is widely used to invert soil physical and chemical properties owing to its high spectral resolution and strong wavelength continuity. However, hyperspectral data suffer from wavelength redundancy; therefore, selecting characteristic wavelengths is a necessary step in soil hyperspectral modeling and analysis. The traditional approach to selecting characteristic wavelengths for soil iron oxide relies on a single method, the correlation coefficient (CC) method. As a result, the model has too many input variables, which reduces its prediction accuracy. The purpose of this study was to explore how well models built by combining correlation analysis with various characteristic wavelength selection algorithms predict soil iron oxide content, and to compare their accuracy with that of models built with the correlation coefficient method alone. A total of 135 surface soil samples were collected from the study area on the southern margin of Lufeng Dinosaur Valley in Yunnan Province. The spectral reflectance and iron oxide content of the samples were measured in the laboratory. The Kennard-Stone algorithm was used to divide the soil samples into a calibration set of 95 samples and a validation set of 45 samples. The soil spectral curves were smoothed with a Savitzky-Golay filter and taken as the original spectral reflectance (OR), which was then transformed into first-order differential reflectance (FD) and reciprocal logarithm reflectance (RL).
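The Kennard-Stone split mentioned above can be sketched as follows, assuming Euclidean distance between spectra; the function name `kennard_stone` and its arguments are hypothetical and not taken from the study. The algorithm seeds the calibration set with the two most distant samples, then repeatedly adds the remaining sample that lies farthest from its nearest already-selected sample, so that the calibration set spans the spectral space.

```python
import numpy as np


def kennard_stone(X: np.ndarray, n_calib: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (calibration, validation) row indices chosen by Kennard-Stone."""
    n = X.shape[0]
    if not 2 <= n_calib <= n:
        raise ValueError("n_calib must be between 2 and the number of samples")
    # Pairwise Euclidean distances via the Gram matrix (avoids an n x n x bands array).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    dist = np.sqrt(np.clip(d2, 0.0, None))
    # Seed the calibration set with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_calib:
        # For each remaining sample, distance to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining, dtype=int)
```

Computing distances from the Gram matrix keeps memory modest when each spectrum has thousands of bands; rows of `X` are samples and columns are bands.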
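The preprocessing chain described above (Savitzky-Golay smoothing to obtain OR, then the FD and RL transforms) might look like the NumPy-only sketch below. The window length, polynomial order, reflection padding at the edges, the central-difference derivative from `np.gradient`, and the base-10 logarithm in RL are all assumptions, since the study's exact settings are not stated here; the function names are hypothetical. In practice `scipy.signal.savgol_filter` offers comparable smoothing.

```python
import numpy as np


def savgol_smooth(y: np.ndarray, window: int = 11, order: int = 2) -> np.ndarray:
    """Savitzky-Golay smoothing of a 1-D spectrum; edges are padded by reflection."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: row 0 of the pseudo-inverse
    # holds the weights that return the fitted value at the window centre.
    vander = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(y, half, mode="reflect")
    return np.correlate(padded, weights, mode="valid")


def first_derivative(reflectance: np.ndarray, wavelengths: np.ndarray) -> np.ndarray:
    """FD: dR/d(lambda) by central differences (one-sided at the two ends)."""
    return np.gradient(reflectance, wavelengths, axis=-1)


def reciprocal_log(reflectance: np.ndarray) -> np.ndarray:
    """RL: log10(1 / R); reflectance must be strictly positive."""
    return np.log10(1.0 / reflectance)
```

For a raw spectrum `raw` sampled at `wavelengths`, `smoothed = savgol_smooth(raw)` plays the role of OR, and `first_derivative(smoothed, wavelengths)` and `reciprocal_log(smoothed)` give FD and RL.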
Based on correlation analysis between iron oxide content and the original and transformed spectra, the iteratively retaining informative variables (IRIV), competitive adaptive reweighted sampling (CARS), and successive projections algorithm (SPA) methods were used to extract characteristic wavelengths. With the extracted characteristic wavelengths as independent variables and iron oxide content as the dependent variable, inversion models were constructed using random forest regression (RF) and partial least squares regression (PLSR). The prediction accuracy of these models was evaluated by comparing the coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to interquartile distance (RPIQ). The results showed that further applying the IRIV, CARS, and SPA algorithms to extract characteristic wavelengths after correlation analysis effectively reduced the number of modeling variables. Each of the four characteristic wavelength selection methods needs to be matched with a suitable spectral transformation and modeling method to improve prediction accuracy. A comprehensive comparison of all the characteristic wavelength selection methods showed that the PLSR model built with the RL-CC-CARS method achieved the best performance, with validation-set R², RMSE, and RPIQ of 0.826, 5.600 g·kg⁻¹, and 3.618, respectively. In the 1∶1 scatter plot, the measured soil iron oxide contents and those predicted by RL-CC-CARS-PLSR lay close to the 1∶1 line, indicating good prediction results. Appropriate variable selection can improve model performance, simplify the regression model, and increase the accuracy of iron oxide estimation. This study provides a reference for the inversion of soil iron oxide content using hyperspectral data.
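The correlation screening and the projection step of SPA can be sketched as follows. The fixed |r| cut-off (`threshold`), the starting band `k0`, and the function names are hypothetical; the study may instead have screened bands by a significance level. A complete SPA typically also tries every starting band and chain length and keeps the subset with the lowest validation RMSE of a multiple linear regression, which this sketch omits.

```python
import numpy as np


def correlation_screen(X, y, threshold):
    """Pearson r between each band (column of X) and y; keep bands with |r| >= threshold."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return r, np.flatnonzero(np.abs(r) >= threshold)


def spa_chain(X, k0, n_select):
    """Successive projections: grow a chain of weakly collinear columns from column k0.

    Assumes n_select <= min(X.shape) so that no projected column vanishes entirely.
    """
    Xw = np.asarray(X, dtype=float).copy()
    selected = [int(k0)]
    for _ in range(n_select - 1):
        last = Xw[:, selected[-1]].copy()
        # Project every column onto the orthogonal complement of the last pick.
        Xw -= np.outer(last, last @ Xw) / (last @ last)
        norms = np.linalg.norm(Xw, axis=0)
        norms[selected] = -1.0  # never re-select a band
        selected.append(int(np.argmax(norms)))
    return selected
```

To chain the two steps in the CC-then-SPA spirit, run `spa_chain` on `X[:, idx]` (the screened bands) and map the returned positions back through `idx`.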
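For the PLSR step, a minimal single-response PLS (PLS1) fitted with the NIPALS algorithm is sketched below. The function names are hypothetical, and the number of latent components, which would normally be chosen by cross-validation, is left as a parameter; in practice a library implementation such as scikit-learn's `PLSRegression` would typically be used instead.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, B) for (X - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean  # X residuals, deflated after each component
    f = y - y_mean  # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)  # weight vector
        t = E @ w               # score vector
        tt = t @ t
        p = E.T @ t / tt        # X loading
        q = (f @ t) / tt        # y loading
        E = E - np.outer(t, p)
        f = f - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Regression coefficients expressed in terms of the centred original bands.
    B = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    """Predict y for new spectra X with a model returned by pls1_fit."""
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean
```

Here the columns of `X` would be the selected characteristic wavelengths and `y` the measured iron oxide content of the calibration set.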
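The three accuracy measures can be computed as below. This sketch assumes R² is defined as 1 − SSE/SST (some studies instead report the squared Pearson correlation) and that RPIQ is the interquartile range of the measured values (Q3 − Q1, using NumPy's default linear percentile interpolation) divided by RMSE; the function names are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SSE/SST."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (e.g. g·kg⁻¹)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpiq(y_true, y_pred):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    q1, q3 = np.percentile(np.asarray(y_true, float), [25, 75])
    return float((q3 - q1) / rmse(y_true, y_pred))
```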