As we saw in the previous article on crop management, assessing management practices is multidimensional, and the evaluation depends on nested spatial and temporal scales. Here, for simplicity, we consider yield as the only criterion for evaluating production. A field is a complex and partially stochastic system. We will see how risk can be taken into account when measuring yield.
Yield as a Random Variable
Let us imagine that we have defined a fixed series of management practices. This series specifies in detail every operation to be performed in the field, from soil preparation to harvest, each with a fixed calendar date. For this fictitious illustration, we also assume that the field characteristics at the beginning of the crop season are exactly the same from one year to the next. Every year we grow the crop on the same field, start on the same date, and follow these instructions strictly.
At the end of each season, we observe the yield, say the dry weight of maize grain per hectare, and log it. After a great number of years, we draw a yield histogram. We group the observations into categories: 0 to 500 kg/ha, 500 to 1000 kg/ha, and so on. Then we count the observations in each category and compute the frequencies: the number of observations in a category divided by the total number of observations. We get the histogram below.
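The grouping and counting steps above can be sketched in a few lines of Python. Since we do not have the real field log here, the yields are simulated: the Beta(2, 4) shape, the 12000 kg/ha ceiling and the number of seasons are assumptions chosen only for illustration, so the printed frequencies will not match the figures of this article.

```python
import random

# Hypothetical stand-in for many seasons of logged yields (kg/ha).
# The Beta(2, 4) shape and the 12000 kg/ha ceiling are illustrative assumptions.
rng = random.Random(42)
MAX_YIELD = 12000
yields = [rng.betavariate(2, 4) * MAX_YIELD for _ in range(10000)]

BIN_WIDTH = 500  # category width in kg/ha: 0 to 500, 500 to 1000, ...
n_bins = MAX_YIELD // BIN_WIDTH

# Count the observations falling in each category.
counts = [0] * n_bins
for y in yields:
    index = min(int(y // BIN_WIDTH), n_bins - 1)  # keep the upper bound in the last bin
    counts[index] += 1

# Frequency = observations in the category / total number of observations.
frequencies = [c / len(yields) for c in counts]

for i, f in enumerate(frequencies):
    print(f"{i * BIN_WIDTH:>5} to {(i + 1) * BIN_WIDTH:>5} kg/ha: {f:.3f}")
```

By construction the frequencies add up to 1, which is what allows us to read each bar as the share of seasons ending in that category.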
Why do we get a wide range of observations rather than a single value? Because yield depends on phenomena that occur during the crop season and are more or less predictable. For instance, pest and disease attacks and drought severity vary from one year to the next. Very low yields (say under 1000 kg/ha) correspond to years with very bad conditions. Conversely, very high yields (say 8000 kg/ha) come from very favorable years. Such years are seldom seen. In other words, we illustrate here that yield is a random variable. It is a continuous and bounded random variable, because it can take any value between, say, 0 and 12000 kg/ha.
This graph shows the statistical distribution of yield. When we group observations into categories, we get a histogram like the one above. If the categories become infinitely narrow, i.e. shrink to points, and the bar heights are rescaled so that the total area stays equal to 1, we get the probability density function (PDF). We can see the PDF as an infinitely precise histogram. Below, we superimpose the two graphs.
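To make the link between histogram and PDF more concrete, here is a minimal sketch in which the bar heights are expressed as densities (frequency divided by category width). The helper `histogram_density` and the simulated yields are hypothetical, introduced only for this illustration.

```python
import random

def histogram_density(values, bin_width, upper):
    """Return per-bin densities: frequency divided by bin width."""
    n_bins = int(upper // bin_width)
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return [c / (len(values) * bin_width) for c in counts]

# Hypothetical yields (kg/ha); the Beta(2, 4) shape is an assumption, not real data.
rng = random.Random(0)
yields = [rng.betavariate(2, 4) * 12000 for _ in range(50000)]

# Narrower and narrower categories: more bars, but the total area stays 1.
for width in (2000, 500, 100):
    density = histogram_density(yields, width, 12000)
    area = sum(d * width for d in density)
    print(f"bin width {width:>4} kg/ha: {len(density):>3} bins, total area = {area:.3f}")
```

Whatever the category width, the area under the bars remains 1. This is the property the PDF keeps when the categories shrink to points.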
In the following, we will work with the yield PDF (the continuous curve). It is a theoretical object, of which the histogram is the empirical approximation. Considering a yield distribution implies that different values are observed with different probabilities (unless the distribution is flat). It is therefore natural to look for a metric that takes risk into account.
Mean Yield and Standard Deviation
First, we can measure the mean yield. Here the mean value is slightly less than 4 t/ha. It corresponds to the long-dashed line above. This measure is not informative enough on its own, because it does not reflect how much yield can vary. In the PDF graph above, we can see that yields lower than 1 t/ha occur in a non-negligible proportion of years.
Another tool to characterize the distribution is its standard deviation (SD). The SD measures how far, typically, we can expect an observation to lie from the mean. A common interval used in statistics is the mean ± 1 SD. This interval is represented by the dotted lines on either side of the vertical mean line. It is used because we know that a value sampled from the distribution has a high probability of falling within it (about 68% for a normal distribution). In other words, this interval describes the most frequent values of the distribution. But although it shows some of the variability, it gives us no information about the worst cases.
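As a rough illustration of these two measures, the sketch below computes the mean, the SD and the share of seasons falling inside the mean +/- 1 SD interval. The yields are again simulated under an assumed Beta(2, 4) shape scaled to 12000 kg/ha, so the numbers are not those of the article's graphs.

```python
import random
import statistics

# Hypothetical yields (kg/ha): Beta(2, 4) scaled to 12000 is an assumption, not field data.
rng = random.Random(1)
yields = [rng.betavariate(2, 4) * 12000 for _ in range(10000)]

mean_yield = statistics.mean(yields)
sd_yield = statistics.pstdev(yields)  # population standard deviation

# Bounds of the mean +/- 1 SD interval.
low, high = mean_yield - sd_yield, mean_yield + sd_yield

# Share of seasons inside the interval, and share of very bad seasons.
share_inside = sum(low <= y <= high for y in yields) / len(yields)
share_below_1t = sum(y < 1000 for y in yields) / len(yields)

print(f"mean = {mean_yield:.0f} kg/ha, SD = {sd_yield:.0f} kg/ha")
print(f"share of seasons within mean +/- 1 SD: {share_inside:.2f}")
print(f"share of seasons below 1000 kg/ha:     {share_below_1t:.2f}")
```

The interval captures the bulk of the seasons, but the last line shows what it hides: the seasons that fall far below it.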
Yield Expected Shortfall
We now come to our final yield metric. We saw that the usual metrics are not informative about the worst cases. Let’s imagine that we are considering a food security case: we would like to guard against terrible harvests. We now introduce the notion of Expected Shortfall (ES), also called conditional Value at Risk (CVaR). This metric, popularized in finance through portfolio optimization, is especially interesting here.
The idea is quite simple. We look at the worst values up to a threshold. This threshold is called a quantile, fractile or value at risk. It corresponds to the limit yield value that separates a given fraction of the worst results from the rest. In our example, it separates the 10% worst observable yields from the others. Here this value is about 900 kg/ha. It means that the lowest 10% of yield values are below 900 kg/ha.
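The 10% threshold described above can be estimated from logged yields with Python's standard `statistics.quantiles` function. This is only a sketch on simulated data (an assumed Beta(2, 4) shape scaled to 12000 kg/ha), so the threshold it prints differs from the 900 kg/ha read on the article's graph.

```python
import random
import statistics

# Hypothetical yields (kg/ha); the distribution is an illustrative assumption.
rng = random.Random(2)
yields = [rng.betavariate(2, 4) * 12000 for _ in range(10000)]

# quantiles(n=10) returns the 9 cut points between deciles;
# the first one is the 10% quantile (our value at risk).
var_10 = statistics.quantiles(yields, n=10)[0]

# Check: about 10% of the seasons should lie at or below the threshold.
share_below = sum(y <= var_10 for y in yields) / len(yields)

print(f"10% quantile (value at risk): {var_10:.0f} kg/ha")
print(f"share of seasons at or below it: {share_below:.3f}")
```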
Once this quantile is determined, we truncate the whole distribution at that threshold. That is to say, we keep only the 10% worst observable yields. Once we have those points, we compute their average value: this is the Expected Shortfall. Here this value is about 800 kg/ha. A simple formulation of this result is “in the 10% worst observable yield outcomes, the average value is 800 kg/ha”.
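The truncate-then-average procedure can be written as a short function. The version below is a simple empirical estimator, not the only possible one: it sorts the observations, keeps the lowest 10% and averages them. The function name `expected_shortfall` and the simulated yields (assumed Beta(2, 4) shape scaled to 12000 kg/ha) are hypothetical and serve only as an illustration.

```python
import random
import statistics

def expected_shortfall(values, level=0.10):
    """Mean of the lowest `level` fraction of values (simple empirical estimator)."""
    ordered = sorted(values)
    k = max(1, round(level * len(ordered)))  # number of worst outcomes kept
    return statistics.mean(ordered[:k])

# Hypothetical yields (kg/ha); the distribution is an assumption, not field data.
rng = random.Random(3)
yields = [rng.betavariate(2, 4) * 12000 for _ in range(10000)]

es_10 = expected_shortfall(yields, 0.10)
print(f"mean yield:                 {statistics.mean(yields):.0f} kg/ha")
print(f"expected shortfall at 10%:  {es_10:.0f} kg/ha")
```

Because it averages only outcomes at or below the quantile, the Expected Shortfall is never higher than the quantile itself, and it reacts to how bad the worst seasons are rather than only to how often they occur.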
We have seen that yield depends, to some extent, on unexpected events (climatic, biotic…). Yield is thus a random variable. Measuring the mean of a random variable may not be sufficient, and this applies to food security: the mean can be high even though bad outcomes may happen with non-negligible probability. An interesting metric that takes risk into account is the Expected Shortfall. It deals with the subset of worst cases from the whole distribution, and can be interpreted as the mean of the values lower than a given fractile.