Arsenic in topsoils

Arsenic (As) is a versatile heavy metalloid trace element used extensively in industrial applications. As is a carcinogen that poses health risks through both inhalation and ingestion, and it is associated with an increased risk of liver, kidney, lung, and bladder tumors. In agriculture, the repeated application of arsenical products leads to elevated soil concentrations, which are also affected by environmental and management variables. Since exposure to As poses health risks, effective assessment tools are needed to support environmental and health policies.

The LUCAS 2009/2012 database contains 21,682 samples, of which 329 have no As data available. Of the remaining 21,353 observations, 9,784 (i.e., 45.82 %) are below the limit of quantification (LOQ) of 2.84 mg kg−1. The censored nature of these As observations has several implications for the exploratory analysis and modeling procedures. For instance, the commonly used distribution moments, such as the mean and variance, cannot be calculated to characterize the data, and quantiles have to be used instead. Even then, the only quantiles that can be obtained for a given subset of the data are those whose probability level exceeds the fraction of observations below the LOQ in that subset. To deal with this restriction, the exploratory analysis in this work reports the empirical cumulative distribution function of the As concentration.
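
The restriction on quantiles described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the encoding of censored observations as `None`, and the toy `sample` values are all assumptions made for illustration. Only the LOQ of 2.84 mg kg−1 and the counts 9,784 and 21,353 come from the text.

```python
import math

LOQ = 2.84  # mg/kg, limit of quantification quoted in the text

# Hypothetical toy subset: None marks an observation reported as "< LOQ".
sample = [None, None, None, 3.0, 4.5, 6.1, 8.0, 12.3]


def censored_fraction(values):
    """Share of observations that are only known to lie below the LOQ."""
    return sum(1 for v in values if v is None) / len(values)


def identifiable_quantile(values, p):
    """Empirical p-quantile (inverse ECDF), or None when it cannot be determined.

    Censored values are all smaller than every quantified value, so they
    occupy the lowest ranks. A quantile whose rank falls among them is only
    known to be below the LOQ, and None is returned in that case.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = len(values)
    observed = sorted(v for v in values if v is not None)
    n_censored = n - len(observed)
    rank = math.ceil(p * n)  # 1-based order statistic
    if rank <= n_censored:
        return None
    return observed[rank - n_censored - 1]


# Share of censored observations in the full data set, from the counts in the text.
lucas_censored_pct = round(100 * 9784 / 21353, 2)

q25 = identifiable_quantile(sample, 0.25)  # None: 25 % is below the censored share (37.5 %)
q50 = identifiable_quantile(sample, 0.50)  # 3.0, the smallest quantified value
q75 = identifiable_quantile(sample, 0.75)  # 6.1
```

In this toy subset 37.5 % of the values are censored, so the first quartile is undefined while the median and the third quartile can still be read from the quantified values. Applied to the full data set, the same logic implies that only quantiles above roughly the 46th percentile are available.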

Another implication of having a high proportion of censored As data is that the common simplifications found in the literature for similar cases, which include removing censored observations or replacing them with a fixed value within the interval they represent, are not appropriate, because such shortcuts can bias the results. To address these issues, the proposed GAMLSS-RF model couples a Random Forest (RF) model (Breiman, 2001) to the semiparametric regression framework GAMLSS (Stasinopoulos et al., 2018; Rigby and Stasinopoulos, 2005). In GAMLSS, the response variable can be assumed to follow any parametric distribution, and all distribution parameters (i.e., location [e.g., mean], scale, and shape) can vary according to parametric or nonparametric functions of the explanatory variables. Because GAMLSS does not have the same distributional limitations as other statistical frameworks, e.g., Linear Models or Generalized Linear Models, standard distributions can be modified to capture relevant properties of the data, such as skewness, heavy tails, bimodality, truncation, and (left-, right- or interval-) censoring. Parameter estimation in GAMLSS is achieved through iterative procedures that maximize the (penalized) log-likelihood. These procedures contain a backfitting component, which allows the incorporation of several nonparametric techniques, such as neural networks, Multivariate Adaptive Regression Splines, and RFs.
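
To make the role of censoring in the likelihood concrete, the sketch below shows how a left-censored observation can enter a log-likelihood: a quantified value contributes the log-density at that value, whereas a value below the LOQ contributes the log of the cumulative probability at the LOQ. This is a simplified illustration, not the GAMLSS-RF model itself. The log-normal distribution, the constant parameters `mu` and `sigma`, and the function names are assumptions. In GAMLSS the parameters would vary with the explanatory variables, and the distribution actually used in the study is not stated here.

```python
import math

LOQ = 2.84  # mg/kg, limit of quantification quoted in the text


def lognormal_logpdf(y, mu, sigma):
    """Log-density of a log-normal distribution (assumed here for illustration)."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def lognormal_cdf(y, mu, sigma):
    """Cumulative probability P(Y <= y) of the same log-normal distribution."""
    z = (math.log(y) - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def censored_loglik(values, mu, sigma, loq=LOQ):
    """Log-likelihood of a sample in which None marks a value below the LOQ."""
    total = 0.0
    for v in values:
        if v is None:
            # Left-censored: all that is known is that Y < LOQ.
            total += math.log(lognormal_cdf(loq, mu, sigma))
        else:
            # Quantified: the exact value is known.
            total += lognormal_logpdf(v, mu, sigma)
    return total


# Hypothetical toy sample and parameter values.
toy = [None, None, 3.0, 4.5, 8.0]
ll = censored_loglik(toy, mu=1.2, sigma=0.6)
```

Maximizing such a function over the parameters uses every observation, including the censored ones, without inventing a replacement value for them.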

The figure below shows the median As concentrations calculated with the fitted LUCAS model at 250 m spatial resolution for Europe. The values range from 1.1 to 64.6 mg kg−1, with arithmetic and geometric means of 4.1 and 3.5 mg kg−1, respectively. The country averages show that Latvia, Estonia, Lithuania, Finland and Poland have the lowest values, equal to 2.03, 2.10, 2.30, 2.41 and 2.68 mg kg−1, respectively. Among the countries with the highest values, Luxembourg, Portugal, Slovenia, France and Austria have averages of 9.00, 9.00, 9.21, 9.71 and 9.74 mg kg−1, respectively. The country-averaged As concentrations point to three groups of countries: lower (< 4 mg kg−1), medium (4 to 7 mg kg−1), and higher (> 7 mg kg−1). The low-value group is geographically clustered, with the spatial distribution in Fig. 6 displaying a clear difference between the As concentrations in Northern Europe and those in the other regions. While the comparison against the background concentration (Fig. 9) indicates that most of the As found may come from human contamination, the exceedance probabilities indicate that most of Europe has a relatively small risk of exceeding 45 mg kg−1.
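
The three-group classification can be reproduced from the country averages quoted above with a short Python sketch. The function name and the treatment of values exactly equal to 4 or 7 mg kg−1 (assigned to the medium group) are assumptions. Only the ten countries listed in the text are included, so the medium group stays empty in this example.

```python
# Country-averaged As concentrations (mg/kg) quoted in the text.
country_means = {
    "Latvia": 2.03, "Estonia": 2.10, "Lithuania": 2.30,
    "Finland": 2.41, "Poland": 2.68,
    "Luxembourg": 9.00, "Portugal": 9.00, "Slovenia": 9.21,
    "France": 9.71, "Austria": 9.74,
}


def as_group(mean_mg_kg):
    """Assign a country average to the lower / medium / higher group.

    Thresholds follow the text (< 4, 4 to 7, > 7 mg/kg). Placing the exact
    boundary values in the medium group is an assumption of this sketch.
    """
    if mean_mg_kg < 4:
        return "lower"
    if mean_mg_kg <= 7:
        return "medium"
    return "higher"


groups = {"lower": [], "medium": [], "higher": []}
for country, mean in country_means.items():
    groups[as_group(mean)].append(country)
```

With these inputs the `lower` list holds the five Northern and Eastern European countries and the `higher` list holds the other five, matching the geographical clustering described above.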

 

Reference: Fendrich, A.N., Van Eynde, E., Stasinopoulos, D.M., Rigby, R.A., Mezquita, F.Y., Panagos, P., 2024. Modeling arsenic in European topsoils with a coupled semiparametric (GAMLSS-RF) model for censored data. Environment International, 185: 108544. https://doi.org/10.1016/j.envint.2024.108544

Download the data on arsenic distribution in the EU (plus probability maps)

 
