Most numerical computations proceed by following strictly deterministic algorithms. That is, if such a program were to be executed more than once on the same task, there would be no difference in the steps taken and, therefore, the same result would emerge. This may appear to be consistent with the character of practical computation. In fact, unless inherently statistical processes are being studied, the incorporation of randomness may, at first sight, seem undesirable. Our intuition can fail us, however, when we are in unfamiliar territory.
It turns out that there is a class of effective algorithms that solve non-statistical problems by employing randomness in some way in the computation. These come under the general heading of "Monte Carlo algorithms". One reason for introducing randomness in an algorithm relates to the process of sampling in a high-dimensional space a process that is at the heart of the task of global optimization. It is important to appreciate that our intuition is often a poor guide in more than two or (at a stretch) three dimensions; twenty or thirty dimensions is surely foreign territory.
As a simple example, consider a spherical region of volume V in an n-dimensional space. (You may prefer to think of a rectangular prism, since the same result applies.) If this region is contracted by a constant factor in each dimension, say f where 0 < f < 1, the result is a sphere of volume ( f n)V. If the contraction is toward the center, it follows that the volume of the sub-region of the original sphere that is radially within a fraction 1 f of the surface is just V ( f n)V and this represents a fraction 1 ( f n) of the original volume. This means that, in seven dimensions, more than half of the volume lies within 10% of the radial distance from the surface to the center. Further, in a 20-dimensional space, almost 90% of the volume lies within 10% of the radial distance from the surface. When first encountered, this is a striking result.
It is similarly surprising to consider the fraction of the volume of an n-dimensional cube that lies inside the largest sphere that fits within that cube. If the cube has side length 2R, and therefore volume (2R)n, the sphere has radius R and its volume is also proportional to Rn where the constant of proportionality turns out to be given by {p(n/2) / G[(n+2)/2]}. For between one and ten dimensions, the fraction of the cube that is occupied by the sphere, call it S, is tabulated in Table 1. (The results for n less than four are familiar from elementary geometry.) Again it is striking that in just 10 dimensions all but ¼ % of the cube's volume lies in the corners. For n=20, the fraction drops below 25 parts per billion! (This is not entirely due to the "all skin and no heart" effect described above, but in n-dimensions a cube has 2 n corners so a ten-dimensional cube has over a thousand corners!) As discussed below, these are significant observations.
n |
S |
n |
S |
1 |
1 | 6 |
p3/384 » 0.0807 |
2 |
p/4 » 0.7854 | 7 |
p3/840 » 0.0369 |
3 |
p/6 » 0.5236 | 8 |
p4/6144 » 0.0159 |
4 |
p2/32 » 0.3084 | 9 |
p4/15120 » 0.0064 |
5 |
p2/60 » 0.1645 | 10 |
p5/22880 » 0.0025 |
Table 1. Sphere-to-enclosing-cube volume ratio in n dimensions.
Now consider a sampling process for measuring the volume of a sphere in n dimensions. The first idea proposed here is simple: consider sampling on a regular grid of spacing D in each dimension throughout the cube that encloses the sphere. For each point that lies in the sphere, a contribution of Dn is made to the estimate of the total spherical volume. For two, four, and six dimensions, the relative error in the resulting estimate of the fraction of the volume that is occupied by the sphere is plotted in Figure 1 as a function of the number of points on the grid. Clearly, for a million sample points in two dimensions, the grid has a thousand points along a side whereas in six dimensions there are only ten points on a side. It is therefore not too surprising that the error here is 0.001% for n=2 versus 5% for n=6.

Figure 1. The jagged lines correspond to the error found by sampling on a regular grid while the straight lines give the associated Monte Carlo results. The solid, dashed, and dotted lines are for the cases n=2, 4, and 6, respectively. |
Alternatively, the sample points can be chosen at random and uniformly distributed over the cube. The points are now picked simply by choosing the value of each coordinate to be distributed independently and uniformly over a fixed interval. With this, it is possible to parallel the earlier approach and estimate the sphere's volume by making a contribution of C/M for every point that falls within the sphere where C represents the volume of the cube and M is the total number of samples. This amounts to estimating the fraction of the cube that is occupied by the sphere by simply calculating the fraction of the random points that fall within it. If this fraction is written as p, it follows from elementary statistics that pM points can be expected to fall within the sphere and the statistical fluctuation in this number is just sqrt(Mp(1p)) Notice that the dimensionality of the space enters here only through the fraction p and that, regardless of n, the statistical error in the resulting estimate of p falls off as the inverse of sqrt(M) for large M. For comparison, the error in the associated volume estimate is also plotted in Figure 1.
In two dimensions, the regular grid gives better accuracy and the relative gain increases as more sample points are employed. For a four-dimensional space, there is little difference in the accuracy of the results from the two schemes. By contrast, in just six dimensions, the tables have now turned: the Monte Carlo results are consistently better and the relative difference increases with the number of sample points. This trend is more pronounced in higher dimensions. For example, in ten dimensions the Monte Carlo result can be expected to be 500 times more accurate than the result for a regular grid with ten points on a side.
These simple results give helpful insights that relate to global optimization. First, a tight specification of the region to be searched is crucial since a small change in the region of interest for each variable can significantly change the extent of the volume to be explored skin deep is all there is. Second, a carefully structured step-generation component of a global search (whether it is random or otherwise) is vital since, for example, a small change in the scale or shape of the step distribution can make a startling difference to performance in high-dimensional spaces. This is an important component of ASA. The final point is that it is possible for randomness to bring gains in efficiency and, in the case of simulated annealing, it also admits an appealing simplicity. That is, randomness need not connote a scatter gun shooting in the dark; it is simplicity and effectiveness that give Monte Carlo its appeal.