METADATA:
| Input attributes | Input classes |
|---|---|
| FAO Soil name (SN) | cf. (FAO, 1975) and (CEC, 1985) |
| Topsoil texture class (TEXT) | 1 Coarse 2 Medium 3 Medium fine 4 Fine 5 Very Fine |
| Slope (SL) | a Level (0-8%) b Sloping (8-15%) c Moderately steep (15-25%) d Steep (> 25%) |
| Parent Material (PM) | cf. (CEC, 1985), (INRA-JRC, 1993) |
| Phase (PHASE) | cf. (CEC, 1985) |
| Land Use (U1) | cf. (INRA-JRC, 1993) |
| Elevation (ZMIN, ZMAX) | in meters |
| Surface percentage of STU within SMU (PC) | % STU/SMU |
| Regrouped accumulated mean annual temperature class (ATC) (source: JRC-MARS) | H: High (> 3000°C) M: Medium (1800-3000°C) L: Low (< 1800°C) |
Some input attributes required by the rules could (elevation, slope, land use, ...) or must (temperature) come from external data sources. This implies combining (overlaying) the Soil Database with the geographical database of such external attributes.
Output attributes were selected on the basis of the environmental parameters needed for the problems faced, e.g., hydrology of soil types for predicting catchment response to rainfall and standard percentage of run-off; location and sensitivity of wetlands; soil buffering capacity for predicting soil susceptibility; ecosystem and surface water deposition; vulnerability of ground and surface water to pollution by agrochemicals and farm waste; soil erosion potential, etc.
The output attributes selected for this work are listed in table 2. They are grouped into four classes that respectively correspond to attributes of biological, chemical, mechanical and hydrological nature. Some of them can be derived directly from the Soil Database via pedotransfer rules, others need previously derived attributes as input.
For each output attribute, we have indicated the input attributes needed to make the estimates. We also indicate the values of the classes adopted at the output. They were fixed in a rather broad manner, in view of the low level of precision of the input attributes. The thresholds selected for class intervals result from a compromise between currently established values in soil science and the level of precision possible at this scale. The adopted values may not correspond to the thresholds necessary for environmental problems. However, multiplying the number of classes would certainly have reduced the reliability of the pedotransfer rules, and the system would thus have become unusable.
In our work, we limited ourselves to estimating the soil parameters necessary for environmental problems. We did not draw risk (or vulnerability) maps; such work would require the combination of soil attributes with physical (climate, relief), agronomic (agricultural exploitation structure) and industrial (type and place of polluting emissions) variables. Each case would also require a fine analysis of the problem, modelling of the processes, selection of the tolerance threshold, and validation through experimental field work. The development of pedotransfer rules is a preliminary work for such investigations; it should facilitate a general application to such studies for the whole of Europe, providing a first estimate of the soil parameters needed for environmental models.
| Output attributes | Input attributes | Output classes |
|---|---|---|
| BIOLOGICAL ATTRIBUTES | | |
| Topsoil organic carbon content (OC_TOP) (0 - 25 cm) | SN - FAO soil name; TEXT - Topsoil textural class; USE - Regrouped land use class; ATC - Accumulated mean temp. | H(igh): > 6.0%; M(edium): 2.1-6.0%; L(ow): 1.1-2.0%; V(ery) L(ow): < 1.0% |
| Presence of a raw peaty topsoil horizon (PEAT) | SN - FAO soil name | Y(es); N(o) |
| CHEMICAL ATTRIBUTES | | |
| Soil profile differentiation (DIFF) | SN - FAO soil name | H(igh) differentiation; L(ow) differentiation; O: No differentiation |
| Profile Mineralogy (MIN) | SN - FAO soil name | (C)hemical or Geochemical; (M)echanical or Physical; MC: Chemical and Mechanical; ND: No Differentiation |
| Topsoil Mineralogy (MIN_TOP) | PM - Parent material; MIN - Profile Mineralogy | KQ: 1/1 minerals + quartz; KX: 1/1 minerals + oxides & hydroxides; MK: 2/1 and 1/1 minerals; M: 2/1 and 2/1/1 non swelling minerals; MS: Swelling and non swelling 2/1 minerals; S: Swelling 2/1 minerals; TV: Vitric materials; TO: Andic materials |
| Subsoil Mineralogy (MIN_SUB) | PM - Parent material; MIN - Profile Mineralogy | KQ: 1/1 minerals + quartz; KX: 1/1 minerals + oxides & hydroxides; MK: 2/1 and 1/1 minerals; M: 2/1 and 2/1/1 non swelling minerals; MS: Swelling and non swelling 2/1 minerals; S: Swelling 2/1 minerals; TV: Vitric materials; TO: Andic materials |
| Topsoil Cation Exchange Capacity (CEC_TOP) | DIFF - Soil profile differentiation; MIN - Profile Mineralogy; OC_TOP - Topsoil organic carbon content; TEXT - Topsoil textural class | L(ow): < 15 cmol(+) kg-1 soil; M(edium): 15-40; H(igh): > 40 |
| Subsoil Cation Exchange Capacity (CEC_SUB) | MIN_SUB - Subsoil mineralogy; TD - Subsoil textural class | L(ow): < 15 cmol(+) kg-1 soil; M(edium): 15-40; H(igh): > 40 |
| Topsoil Base saturation (BS_TOP) | SN - FAO soil name; USE - Regrouped land use class | L(ow): < 50%; M(edium): 50-75%; H(igh): > 75% |
| Subsoil Base saturation (BS_SUB) | SN - FAO soil name; MIN_SUB - Subsoil mineralogy | L(ow): < 50%; H(igh): > 50% |
| MECHANICAL ATTRIBUTES | | |
| Depth to rock (DR) | SN - FAO soil name; PM - Parent material; PHASE - Phase | S(hallow): 0-40 cm; M(oderate): 40-80 cm; D(eep): 80-120 cm; V(ery) D(eep): > 120 cm |
| Volume of stones (VS) | PHASE - Phase; PM - Parent material | 0% stones - 10% stones 15% stones - 20% stones |
| Subsoil textural class (TD) | SN - FAO soil name; TEXT - Topsoil textural class; DR - Depth to rock | 1 Coarse; 2 Medium; 3 Medium fine; 4 Fine; 5 Very Fine |
| Topsoil structure (STR_TOP) | USE - Regrouped land use class; SN - FAO soil name | G(ood); N(ormal); P(oor) |
| Subsoil structure (STR_SUB) | SN - FAO soil name | H(umic) or Peaty soil; O: Peaty subsoil |
| Topsoil Packing Density (PD_TOP) | STR_TOP - Topsoil structure class; TEXT - Topsoil textural class; USE - Regrouped land use class | L(ow): < 1.4 g/cm3; M(edium): 1.4 – 1.75 g/cm3; H(igh): > 1.75 g/cm3 |
| Subsoil Packing Density (PD_SUB) | STR_SUB - Subsoil structure class; TD - Subsoil textural class; SN - FAO soil name | L(ow): < 1.4 g/cm3; M(edium): 1.4 – 1.75 g/cm3; H(igh): > 1.75 g/cm3 |
| HYDROLOGICAL ATTRIBUTES | | |
| Parent material hydrogeological type (PMH) | PM - Parent material | R, C, S, L, H, M (INRA et al., 1993) |
| Depth to a gleyed horizon (DGH) | SN - FAO soil name | S(hallow): 0-40 cm; M(oderate): 40-80 cm; D(eep): 80-120 cm; V(ery deep): > 120 cm |
| Depth to impermeable layer (DIMP) | TEXT - Topsoil textural class; PD_SUB - Subsoil packing density; SN - FAO soil name | S(hallow): < 80 cm; D(eep): > 80 cm |
| Hydrological class (HG) | ATC - Accumulated mean temp.; PMH - Parent material hydrogeological type; SN - FAO soil name; ALT - Elevation; DIMP - Depth to impermeable layer | HG1: soil with permeable substratum, remote from groundwater: seldom wet; HG2: lowland soil affected by groundwater, seasonally or permanently wet, or artificially drained; HG3: soil with impermeable layers within 80 cm depth, seasonally or permanently wet; HG4: soils of the uplands and mountains |
| Topsoil Available Water Capacity (AWC_TOP) | TEXT - Topsoil textural class; PD_TOP - Topsoil packing density | V(ery) H(igh): > 190 mm; H(igh): 140-189 mm; M(edium): 100-139 mm; L(ow): < 99 mm |
| Topsoil Easily Available Water Capacity (EAWC_TOP) | TEXT - Topsoil textural class; PD_TOP - Topsoil packing density | V(ery) H(igh): > 190 mm; H(igh): 140-189 mm; M(edium): 100-139 mm; L(ow): < 99 mm |
| Subsoil Available Water Capacity (AWC_SUB) | TD - Subsoil textural class; PD_SUB - Subsoil packing density; DR - Depth to rock | V(ery) H(igh): > 190 mm; H(igh): 140-189 mm; M(edium): 100-139 mm; L(ow): < 99 mm |
| Subsoil Easily Available Water Capacity (EAWC_SUB) | TD - Subsoil textural class; PD_SUB - Subsoil packing density; DR - Depth to rock | V(ery) H(igh): > 190 mm; H(igh): 140-189 mm; M(edium): 100-139 mm; L(ow): < 99 mm |

This section describes the structure that was adopted for implementation of the system, and defines the retained options.
The system is implemented within the Arc/Info Geographical Information System (GIS) software package, using its macro-programming language AML (Arc Macro Language). The reasons for this choice are: 1) the database of available information (soil descriptions) is stored and managed within Arc/Info; 2) the resulting data (environmental parameters) have to be stored and managed within Arc/Info for mapping and display purposes; and 3) the implementation had to be completed within limits of time and means that did not allow for acquiring specialized software and training staff in its use.
The implementation is tailored to the general problem of deriving new information from existing information via expert knowledge, and could be used in any field of interest. In our case, however, it was primarily meant to provide the European Environment Agency with spatialized environmental indicators that could be derived from the Soil Database.
All the information available in the field of interest is stored in a so-called "dataset", e.g. the Soil Typological Units (STU) dataset. The dataset is physically stored as a dataset Info file, and holds information on a number of "objects", e.g. a number of soil types such as Luvisols, Cambisols, etc. Each object is physically stored as a line or record in the dataset Info file.
The objects in the dataset have a number of characteristics called "attributes", e.g. soil types have a soil name, a texture, etc. Each attribute is physically stored as a column in the dataset Info file. Each object in the dataset has a particular "value" for each of its attributes, e.g. a given soil has the soil name Luvisol, a coarse texture, etc. Each value is physically stored at the intersection of the object's record and the corresponding column in the dataset Info file.
Values generally follow a coding scheme before being physically stored in the dataset, e.g. the soil name Luvisol is encoded and stored as "Lo", coarse texture is stored as "1", etc. Some objects might not be fully described when some of their attributes are unknown, e.g. unknown texture of a soil. An unknown value for an attribute is called a "NODATA" value. As there is no pre-defined way of coding and physically storing NODATA values in Info files, each attribute coding scheme has to make provision for a NODATA value code, e.g. # means unknown texture.
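The vocabulary above (dataset, object, attribute, value, NODATA code) can be sketched in Python. This is an illustration only: the actual system stores these structures in Arc/Info Info files, and the STU identifiers, the second record and the helper `is_known` are hypothetical.

```python
# Illustrative stand-in for a dataset Info file (the real system uses Arc/Info).
NODATA_TEXT = "#"  # NODATA code for texture, as in the example in the text

# Each dict is one object (record); each key is one attribute (column).
# The STU identifiers and the second record are invented for illustration.
stu_dataset = [
    {"STU": 1, "SN": "Lo", "TEXT": "1"},           # Luvisol, coarse texture
    {"STU": 2, "SN": "Bge", "TEXT": NODATA_TEXT},  # texture unknown
]

# Each attribute's coding scheme declares its own NODATA code.
nodata_codes = {"TEXT": NODATA_TEXT}


def is_known(record, attribute, nodata_codes):
    """True if the record holds a real value (not NODATA) for the attribute."""
    return record[attribute] != nodata_codes.get(attribute)
```

Keeping the NODATA code per attribute, rather than one global marker, mirrors the statement that each coding scheme must make its own provision for unknown values.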
Soil Science experts of the working group provide the system with pedotransfer rules. Using expert knowledge, these rules make it possible to derive the new information needed from the existing factual information, or "fact", describing an object of the dataset; e.g. the soil depth of a particular soil type can be inferred from both its known soil name and its parent material. A rule is physically stored as a rule Info file. Together, the rules make up a rule set, which is physically stored as a rules Info database.
A rule can be seen as a statement of the form:

    IF <available information is ...> THEN <new information is ...>
    ELSE IF <available information is ...> THEN <new information is ...>
    ...
    ELSE IF <available information is ...> THEN <new information is ...>

Each line in this statement is called an "occurrence" of the rule. An occurrence is physically stored as a line or record in the rule Info file.
An occurrence can be seen as a statement of the form:

    IF (or ELSE IF) <factual value for attribute i is w and factual value for attribute j is x ... and factual value for attribute n is y> THEN <inform the object with value z for a new attribute m>

where attributes i to n provide the factual information (values w to y of an object), and attribute m provides the new -inferred- information (with value z). Attributes providing the factual information are the "input attributes" to the rule. The attribute providing the new -inferred- information is called the "output attribute" from the rule. Input and output attributes are physically stored as columns in the rule Info file.
Example:

    IF <soil name is "eutric Cambisol" and parent material is "450"> THEN <soil depth is "Medium">
    ELSE IF <soil name is "eutric Cambisol" and parent material is "700"> THEN <soil depth is "Medium">
    ELSE IF <soil name is "dystric Cambisol" and parent material is "500"> THEN <soil depth is "Deep">

As with the dataset, "values" are physically stored at the intersection of each record and the input and output attributes in the rule Info file.
Pedotransfer rule tables therefore describe the link, established through expert knowledge, between input attributes from the Soil Database and output attributes. The structure of a typical table is given in Table 3. The columns on the left correspond to values taken by the input attributes; the central columns provide estimated values and their confidence level (see section 3.6); the right-hand columns contain management attributes and the references of rule occurrences (see section 3.9). The lines list the possible occurrences of the rule, based on the values (or combinations of values) of the input attributes in the Soil Database.
[TABLE]
Input attributes in a rule must have the same definition (name, type, size, etc.) and coding scheme as their corresponding attribute in the dataset.
An "inference" is the action of deriving new information for an object according to: a) the available information the object provides, and b) the rule that is activated. It proceeds in 5 steps:
When a rule is activated on a dataset, inference will occur for each object of the dataset, one after the other. The result will be a new attribute in the dataset, one for the whole dataset, to hold the new inferred values, one for each object. An attribute of the dataset that has been previously inferred using a rule is further considered as storing available information. It can thus be used as an input attribute to other rules.
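As a minimal sketch of this behaviour, the following Python fragment activates the soil depth example rule on a small dataset using exact matching only (wild cards are left out here). The dictionary layout, the function name `infer` and the three sample objects are assumptions made for illustration; the real system works on Info files through AML.

```python
# Hypothetical in-memory form of the soil depth example rule.
rule_depth = {
    "inputs": ("SN", "PM"),   # input attributes
    "output": "DR",           # output attribute
    "occurrences": [
        (("eutric Cambisol", "450"), "Medium"),
        (("eutric Cambisol", "700"), "Medium"),
        (("dystric Cambisol", "500"), "Deep"),
    ],
}


def infer(dataset, rule):
    """Add the rule's output attribute to every object of the dataset."""
    for obj in dataset:
        fact = tuple(obj[name] for name in rule["inputs"])
        value = ""  # left blank when no occurrence matches
        for pattern, output in rule["occurrences"]:
            if pattern == fact:
                value = output
        obj[rule["output"]] = value
    return dataset


# Invented sample objects; the third has no matching occurrence.
stus = infer(
    [
        {"SN": "eutric Cambisol", "PM": "450"},
        {"SN": "dystric Cambisol", "PM": "500"},
        {"SN": "eutric Cambisol", "PM": "999"},
    ],
    rule_depth,
)
```

After the call, every object carries a `DR` attribute, which a later rule could list among its own inputs.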
It is difficult, if not impossible, for an expert to foresee all cases that can possibly occur in a set of available data. Furthermore, in some cases many different values of a fact will lead to the same conclusion, e.g. [IF <texture is sandy or loamy or ...> THEN ...]. A "wild card" mechanism allows the expert to define occurrences of rules that will match different facts.
For example:

    IF <soil name is "eutric Cambisol" and parent material is "450"> THEN <soil depth is "Medium">
    ELSE IF <soil name is "eutric Cambisol" and parent material is "any other parent material"> THEN <soil depth is "Deep">

The "any other" wild card will, by convention, be denoted as a star character (*).
A fact for which an exact matching occurrence can be found will receive this occurrence's output attribute value. A fact for which no exact matching occurrence can be found will receive the output attribute value of the last occurrence of the rule that matches it under the wild card convention, if there is one. This assumes that an expert will construct a rule by refining its occurrences, listing the most general cases before the most particular ones.
When no matching occurrence at all can be found for a fact, no value is provided to the output attribute, thus leaving it "blank" (or "0" (zero), depending on the output attribute's type). This can lead to confusion if blank (or 0) is a possible normal output value. Therefore, placing a fully "wild carded" occurrence at the head of a rule will "pick up" all facts for which no other valid occurrence can be found, and force the output value to, say, the NODATA value.
Using these specifications, the above example will become:

    IF <soil name is "any soil name" and parent material is "any parent material"> THEN <soil depth is "unknown">
    ELSE IF <soil name is "eutric Cambisol" and parent material is "any parent material"> THEN <soil depth is "Deep">
    ELSE IF <soil name is "eutric Cambisol" and parent material is "450"> THEN <soil depth is "Medium">

By convention, the last matching occurrence examined in the rule is the one retained. As the occurrences are scanned sequentially in the order of the lines of the table, i.e. from top to bottom, rules are constructed by listing the occurrences from the most general to the most detailed expert evaluations. For instance, if the input variable is "FAO Soil Name", the STU noted "Bge" will accept all of the following occurrences: "B**", "Bg*", "*g*", etc. The order of occurrences would be "B**", "Bg*", "Bge". If the STU soil name only contains code "B", the first occurrence will be applied; if it contains detailed information of the type "Bge", the third occurrence will be applied.
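The wild card convention and the "last matching occurrence is retained" convention can be sketched in Python as follows. This is a sketch under stated assumptions, not the AML implementation: we assume that values are compared character by character after padding to a common width, that a star accepts any character including a blank, and that a lone star accepts any value. The function names are hypothetical.

```python
def value_matches(pattern, value):
    """Compare character by character; "*" accepts any character."""
    if pattern == "*":  # assumption: a lone star accepts any value
        return True
    width = max(len(pattern), len(value))
    padded = zip(pattern.ljust(width), value.ljust(width))
    return all(p == "*" or p == v for p, v in padded)


def apply_rule(occurrences, fact):
    """Scan occurrences top to bottom and keep the last one that matches."""
    result = None  # stays None when nothing matches (a "blank" output)
    for patterns, output in occurrences:
        if all(value_matches(p, f) for p, f in zip(patterns, fact)):
            result = output
    return result


# The soil depth example, ordered from the most general to the most specific.
depth_rule = [
    (("*", "*"), "unknown"),
    (("eutric Cambisol", "*"), "Deep"),
    (("eutric Cambisol", "450"), "Medium"),
]
```

With this ordering, a eutric Cambisol on parent material "450" matches all three occurrences and receives "Medium", while the same soil on any other parent material stops at the second occurrence and receives "Deep".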
Expert knowledge is subject to evolution. Furthermore, the available data, and the inferences that can be made using that information and the expert knowledge, have a certain level of reliability. It is thus necessary to have a mechanism that allows all available information (or factual values) held in the dataset, and each inferred information (or output value) held in the rule database, to be complemented with an evaluation of its reliability.
The reliability of information is called its "confidence level". Confidence levels are held by confidence level attributes, one for each attribute of the dataset, and one for the output attribute of each rule. Each object in the dataset thus has a confidence level value for each of its attributes, and each occurrence of each rule has a confidence level value for its output attribute.
Four classes are proposed, ranging from "high", via "medium" and "low" to "very low". When the definition of input attributes enables the direct evaluation of an output attribute, the level is "high". On the other hand, if it is known that a very strong variation exists in the values of an output attribute, the "low" level is retained. "Very low" is used in the case of missing input attribute values.
To warn users against over-reliance on pedotransfer rules, it was decided that the confidence level of an output value should be the minimum of the confidence levels of all the input attributes and of the corresponding occurrence.
When an inference takes place, the following 4 steps complement those listed above in section 3.4:
We have seen that an attribute of the dataset that previously was inferred using a rule, can be used as an input attribute to other rules. Its confidence level will be used in the same way as for any other input attribute.
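The minimum rule for confidence levels can be illustrated with a short Python sketch. The numeric ranking and the function name are assumptions introduced for the example; only the four class names and the minimum rule come from the text.

```python
# Rank the four confidence classes so that they can be compared.
RANK = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def output_confidence(input_levels, occurrence_level):
    """Return the lowest level among the inputs and the matched occurrence."""
    return min([*input_levels, occurrence_level], key=RANK.__getitem__)


# A first rule produces an attribute with its own confidence level ...
first = output_confidence(["high", "medium"], "high")
# ... which then acts as an ordinary input level for a second rule.
second = output_confidence([first, "high"], "low")
```

Because the minimum is propagated from rule to rule, a chain of inferences can never end with a higher confidence level than its weakest link.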
In many cases, data are missing from the dataset because some input values are unknown for some objects. Two options are then open: the first consists in not evaluating the output attribute, which then itself becomes a missing attribute. The second is to output the best value found using the wild card convention, but with an imposed "Very Low" confidence level.
Using wild cards in the case of missing input data carries the risk of generating information that has never existed. The two options proposed above make it possible to retrace, for each mapping unit, the origin of its estimates. Checks are possible in particular by mapping the output confidence level.
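The two options can be sketched in Python as a switch on the inference function. Several points are assumptions made for the example: a single NODATA code "#" shared by all attributes, the per-occurrence confidence values, the matching convention (character by character, a star accepting any character), and the choice to report a "very low" level under the first option as well. For brevity the sketch ignores the confidence levels of the input attributes.

```python
NODATA = "#"  # hypothetical NODATA code shared by all attributes here


def matches(pattern, value):
    """Compare character by character; "*" accepts any character."""
    if pattern == "*":
        return True
    width = max(len(pattern), len(value))
    padded = zip(pattern.ljust(width), value.ljust(width))
    return all(p in ("*", v) for p, v in padded)


def infer(occurrences, fact, use_wildcards_when_missing):
    """Return (output value, confidence level) for one fact."""
    missing = NODATA in fact
    if missing and not use_wildcards_when_missing:
        return NODATA, "very low"  # option 1: output becomes missing too
    value, confidence = NODATA, "very low"
    for patterns, output, level in occurrences:
        if all(matches(p, f) for p, f in zip(patterns, fact)):
            value, confidence = output, level  # last match is retained
    if missing:
        confidence = "very low"  # option 2: imposed confidence level
    return value, confidence


# Soil depth example; the confidence values are invented.
rule = [
    (("*", "*"), NODATA, "very low"),
    (("eutric Cambisol", "*"), "Deep", "low"),
    (("eutric Cambisol", "450"), "Medium", "medium"),
]
```

For a eutric Cambisol with unknown parent material, the first option returns the NODATA value, whereas the second returns "Deep" flagged as "very low".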
In general the rules are drawn up for all of the mapped European territory. For making estimates, no attributes were used that might cause a strong regional bias. To avoid any drift that, locally, might become dominant, a systematic input attribute called "Region" is planned. The selected geographical level is that of the European administrative regions, called NUTS II, but the stacked coding for administrative regions (NUTS 0 = country, NUTS I and NUTS II) enables the easy writing of a rule at the scale of a country. For instance, a rule that is specific for Italy will be noted "32*" in the "Region" column.
The rules can thus be completed by specific occurrences for countries, without modification of the initial general structure. As the occurrences are scanned sequentially, evaluation always proceeds from the most general to the most specific case.
Although not used at present, this option will enable revision or refinement of any rule with the help of regional experts. Its use will require the geographic combination of soil and administrative boundaries.
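A regional refinement of this kind might look as follows in Python. Only the "32*" notation for Italy comes from the text; the region codes "321" and "110", the output classes, and the matching function are hypothetical and serve only to show how a country-specific occurrence placed after the general one takes precedence.

```python
def matches(pattern, value):
    """Compare character by character; "*" accepts any character."""
    width = max(len(pattern), len(value))
    padded = zip(pattern.ljust(width), value.ljust(width))
    return all(p in ("*", v) for p, v in padded)


def infer(occurrences, fact):
    """Keep the output of the last occurrence that matches the fact."""
    result = None
    for patterns, output in occurrences:
        if all(matches(p, f) for p, f in zip(patterns, fact)):
            result = output
    return result


# Input attributes: (Region, SN). Output classes are invented.
rule = [
    (("***", "Bg*"), "M"),  # general occurrence, valid for every region
    (("32*", "Bg*"), "S"),  # refinement for all regions of Italy
]

italian = infer(rule, ("321", "Bge"))    # hypothetical Italian NUTS II code
elsewhere = infer(rule, ("110", "Bge"))  # hypothetical non-Italian code
```

The general occurrence is left untouched; the Italian refinement simply follows it in the table.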
Three management attributes were added to the structure of the table describing a rule. The first gives a pointer to the author(s) of each occurrence. An authors' references table is kept up to date. The second attribute defines the date of establishing the occurrence. The third attribute gives a pointer to explanatory notes, defining the reasons for selecting a certain estimate (not used up to now).
Such management attributes give insight into the origin of the proposed estimate. Moreover, when an occurrence is updated, the old occurrence does not have to be deleted and replaced with the new one. Instead, the new occurrence is placed sequentially after the old one. During application of a rule, the last occurrence accepted is the one retained, so that a trace of successive updates is kept.
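The update mechanism can be sketched in Python with a small record type carrying the three management attributes. The field names, the author codes and the dates are invented for illustration, and exact matching is used for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    inputs: tuple
    output: str
    author: str                 # pointer into the authors' references table
    established: date           # date the occurrence was established
    note: Optional[str] = None  # pointer to explanatory notes


def last_match(occurrences, fact):
    """Return the last occurrence whose inputs equal the fact, if any."""
    found = None
    for occ in occurrences:
        if occ.inputs == fact:
            found = occ
    return found


rule = [Occurrence(("eutric Cambisol", "450"), "Medium", "A1", date(1992, 6, 1))]
# An update is appended after the old occurrence instead of replacing it.
rule.append(Occurrence(("eutric Cambisol", "450"), "Deep", "A2", date(1993, 3, 15)))

current = last_match(rule, ("eutric Cambisol", "450"))
```

The superseded occurrence stays in the table, so the history of an estimate can be read directly from the rule.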
The rules described above are called "expert type rules" as opposed to "class type rules". The latter are simple reclassification or recoding rules. They are used in any of the following cases:
Class type rules accept only one input attribute and produce one output attribute. The input attribute has no limitation as to its Info data type. The output attribute follows the same limitations as those applicable to expert type rules.
Class type rules do not follow the wild card convention. Wild cards may not be used there.
Class type rules do not make use of the confidence level of the input attribute if it exists in the dataset, whereas expert type rules use all available confidence levels to compute an output confidence level.
Class type rules may or may not produce a confidence level attribute together with the output attribute, but expert type rules always produce a confidence level attribute.
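As an example of a class type rule, the regrouping of accumulated mean annual temperature into the ATC classes of Table 1 can be written as a simple reclassification. The Python sketch below is illustrative: the attribute name `TSUM` and the sample values are hypothetical, and we assume that the boundary values 1800 and 3000 fall in the Medium class.

```python
def atc_class(temperature_sum):
    """Reclassify accumulated mean annual temperature (in degrees C)."""
    if temperature_sum > 3000:
        return "H"  # High
    if temperature_sum >= 1800:
        return "M"  # Medium
    return "L"      # Low


def apply_class_rule(dataset, input_attr, output_attr, classify):
    """One input attribute, one output attribute, no wild cards."""
    for obj in dataset:
        obj[output_attr] = classify(obj[input_attr])
    return dataset


units = apply_class_rule(
    [{"TSUM": 3200.0}, {"TSUM": 2500.0}, {"TSUM": 900.0}],
    "TSUM", "ATC", atc_class,
)
```

No confidence level is read or produced here, which is one way a class type rule differs from an expert type rule.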
A toolbox was developed on the basis of these specifications for the creation, deletion, editing, management, description, reporting and inference of rules. The tools also maintain a dictionary for the rules database, legends for input and output attributes, and a history file of the last rule edits.
A tracing mechanism allows the detection of forward and backward dependencies. This means that when a rule is inferred, the tree of rules that depend on its results can be traced forward so that they are fired in the correct sequence. Conversely, backward tracing finds all the rules on whose results a given rule depends.
Other utilities run compatibility controls between rules and the dataset, i.e. they check the input attributes in the rules against their corresponding attributes in the dataset. This includes historical compatibility, i.e. the date of the last inference must be checked against the date of the last edit of a rule.
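Dependency tracing can be sketched in Python using a few of the dependencies listed in Table 2. The functions below are hypothetical and are not the AML tools themselves; the firing order is obtained with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later), which is our choice for the illustration.

```python
from graphlib import TopologicalSorter

# Output attribute of each rule -> its input attributes (subset of Table 2).
RULE_INPUTS = {
    "STR_TOP": {"USE", "SN"},
    "PD_TOP": {"STR_TOP", "TEXT", "USE"},
    "AWC_TOP": {"TEXT", "PD_TOP"},
    "OC_TOP": {"SN", "TEXT", "USE", "ATC"},
}


def backward(rule):
    """All rules on whose results `rule` depends, directly or indirectly."""
    found, stack = set(), [rule]
    while stack:
        for attr in RULE_INPUTS.get(stack.pop(), ()):
            if attr in RULE_INPUTS and attr not in found:
                found.add(attr)
                stack.append(attr)
    return found


def forward(rule):
    """All rules that depend on the results of `rule`."""
    return {r for r in RULE_INPUTS if rule in backward(r)}


def firing_order(rules):
    """Order the given rules so that each fires after its prerequisites."""
    graph = {r: RULE_INPUTS[r] & set(rules) for r in rules}
    return list(TopologicalSorter(graph).static_order())


order = firing_order(forward("STR_TOP") | {"STR_TOP"})
```

Editing the STR_TOP rule would thus call for re-running PD_TOP and then AWC_TOP, while OC_TOP is unaffected.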
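Both kinds of control can be illustrated with a short Python sketch. The representation of an attribute definition as a (type, size) pair, the function names and the dates are assumptions made for the example.

```python
from datetime import date


def incompatible_inputs(rule_inputs, dataset_attributes):
    """List rule input attributes whose definition differs in the dataset."""
    return [
        name
        for name, definition in rule_inputs.items()
        if dataset_attributes.get(name) != definition
    ]


def inference_is_stale(last_edit, last_inference):
    """True if the rule was edited after it was last inferred (or never run)."""
    return last_inference is None or last_inference < last_edit


# Hypothetical definitions: attribute name -> (Info type, size).
mismatches = incompatible_inputs(
    {"SN": ("C", 3), "TEXT": ("C", 1)},
    {"SN": ("C", 3), "TEXT": ("I", 1)},
)
stale = inference_is_stale(date(1994, 5, 2), date(1994, 3, 1))
```

A stale inference signals that the stored output attribute no longer reflects the current state of the rule and should be recomputed before mapping.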
Plotting tools make use of the dictionary of the database, its legends, its controls for historical compatibility, and the rules' output confidence levels. They also provide a mechanism for the proper generalization of the attributes describing STUs (the level of the Soil Database at which the rules are run) to the SMUs (the level that can be plotted on a map) (see 2.1). Therefore, each map of the results of a rule inference represents the dominant value of the output attribute over the polygons, and can be provided together with both its corresponding confidence level and purity maps.
Finally, a "WHY" tool is provided to allow the user to point interactively to a location on the map and ask why a rule has provided such a result. It will then give a full explanation of the inference that led to the result.
These tools are provided as a command line language. They should be considered as a prototype that could be fully implemented at a future stage using appropriate expert system development software and an ad hoc interface to Arc/Info.
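The generalization from STUs to an SMU can be sketched in Python as follows. We assume here that the dominant value is the one covering the largest summed surface percentage (attribute PC) and that purity is that summed percentage; these definitions, the function name and the sample figures are assumptions for illustration.

```python
from collections import defaultdict


def dominant_and_purity(stus, attribute):
    """Return (dominant value, its summed surface percentage) for one SMU."""
    share = defaultdict(float)
    for stu in stus:
        share[stu[attribute]] += stu["PC"]  # PC: % of the SMU covered
    value = max(share, key=share.get)
    return value, share[value]


# Invented SMU made of three STUs with an inferred depth to rock (DR).
smu = [
    {"PC": 50, "DR": "D"},
    {"PC": 30, "DR": "M"},
    {"PC": 20, "DR": "D"},
]
dominant, purity = dominant_and_purity(smu, "DR")
```

In this example the class "D" is mapped for the polygon with a purity of 70%, which tells the map reader that 30% of the unit carries a different value.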
The Soil Geographical Database of Eurasia represents a knowledge potential that is based on many years of map-data collection and compilation in Europe. Such data have already been used in applications related to agriculture and environment, thus showing the interest and importance of this knowledge as well as its limits. The main limitation is the difficulty in obtaining accurate data on soil parameters needed for environmental studies, when based only on synthetic attributes such as the soil name according to the FAO classification used.
The objective of our work is to propose an automatic interpretation of the data present in the Soil Database, leading to estimates for environmental use that are as reliable as possible. This means that it is necessary to formalize the interpretations made empirically by a well-versed reader when faced with a soil map.
This is done by means of so-called pedotransfer rules that link the standard soil characteristics to more complex properties, such as hydrodynamic properties. The rules appear in a standardized format which facilitates their use and management. They are created by expert judgement based on a general knowledge in Soil Science and can be associated to a region.
The results provided by the application of these rules are only qualitative estimates. At the 1:1,000,000 scale it is difficult to provide accurate information from the few data contained in the Soil Database, and care is taken to point out the methodological limitations of our approach. This is done by attaching a confidence level to each output value, which can highlight those areas for which the results are not so reliable. Moreover rules are applied to Soil Typological Units (STU) and when their results have to be displayed as maps, purity of the Soil Mapping Units (SMU) has to be accounted for. This can be computed from an indicator of the surface percentage of STUs within each SMU which is provided in the database.
The improved version of the Soil Database (version 3) is now available. It provides some means of validating the methodology, because some of the new attributes in version 3 were derived by pedotransfer rules from version 2 of the database at the time the system was developed. Soil profile databases and larger scale regional soil geographical databases are other means by which the rules can be tuned or validated.
We thank all the members of the working group for their contributions: J. Hollis and R.J.A. Jones (SSLRC, UK), M. Jamagne (INRA, F), A. Thomasson (LRP, UK), L. Vanmechelen and E. Van Ranst (Ghent University, B). This work was supported by the CORINE project (DGXI): M.H. Cornaert and A. Teller of the CEC in Brussels.
BATJES, N.H., 1990 - Macro-scale land evaluation using the 1:1 M world soils and terrain digital database. Working paper and preprint 90/9. ISRIC. Wageningen, Netherlands, 35p.
BOUMA, J. and VAN LANEN H.A.J., 1986 - Transfer functions and threshold values: from soil characteristics to land qualities. In Proceedings of the international workshop on quantified land evaluation procedures. 27/04-2/05/1986, Washington D.C., USA, p. 106-110.
CEC, 1985 - Soil map of the European Communities at 1:1,000,000. CEC-DGVI. Brussels, Belgium, 124p.
FAO, 1975 - Soil map of the world at 1:5,000,000. Volume I. Europe. UNESCO, Paris, France, 62p.
INRA, 1991 - Elaboration d'une base de données géographiques des pédopaysages des Communautés Européennes. Final report of the CEC contract 3934-90-03 ED ISP F. Orleans, France, 63p.
INRA-JRC, 1993 - Users' guide for the elaboration of the EC soil database 3.1 version. INRA Orléans, France, 10p.
INRA, University of Ghent, SSLRC and LRP, 1993 - A geographical knowledge database on soil properties for environmental studies based on 1:1,000,000 scale EC soil map data. Final report of EC contract N° 3392004. CEC-DGXI, Orléans, France, 49 p + annexes.
KING, D., LE BISSONNAIS Y. and HARDY R., 1993 - Regional assessment of runoff and erosion risks. Example of the North/Pas de Calais region in France. In : Farm lands erosion in temperate plains environments and hills. Wichereck (Ed), Elsevier, 191-205.
KING, D., DAROUSSIN J., TAVERNIER R., 1994 - Development of a Soil Geographical Database from the Soil Map of the European Communities, CATENA, 21, p 37 - 56.
PLATOU, S.W., NOOR A.M. and MADSEN H.B., 1989 - Digitizing of the EC soil map. In: Computerization of land use data. R.J.A. Jones and B. Biagi (eds). CEC, Brussels, Belgium, 12-24.
JONES, R.J.A. and HOLLIS J.M. (1996). Pedotransfer rules for environmental interpretations of the EU Soil Map. In: Le Bas and M. Jamagne Soil databases to support sustainable development. EUR 16371 EN, p.125-133. Office for Official Publications of the European Communities, Luxembourg.
VAN RANST, E., THOMASSON, A.J., DAROUSSIN, J., HOLLIS, J.M., JONES, R.J.A., JAMAGNE, M., KING, D. and VANMECHELEN, L. (1995). Elaboration of an extended knowledge database to interpret the 1:1,000,000 EU Soil Map for environmental purposes. In: European Land Information Systems for Agro-environmental Monitoring. D. King, R.J.A. Jones and A.J. Thomasson (eds.). EUR 16232 EN, p.71-84. Office for Official Publications of the European Communities, Luxembourg.