Saturday, April 28, 2012

25. Laplace's Demon

Reductionism is the philosophy that all phenomena can be explained by reducing them to their simplest components. The basic premise is that everything can ultimately be explained in terms of the bottom-level laws of physics. The spirit behind this approach is that the universe is governed by natural laws which are fixed and comprehensible at the fundamental level.

Reductionism has excellent validity where it is applicable, namely for simple or simplifiable (rather than complex) systems. Reductionistic science has had remarkable successes in predicting, for example, the occurrence of solar eclipses with a very high degree of precision in both space and time.

An approach related to reductionism is constructionism, which says that we can start from the laws of physics and predict all that we see in the universe. However, both reductionism and constructionism assume the availability of data of infinite precision and accuracy, as well as unlimited time and computing power at our disposal. Moreover, random events at critical junctures in the evolution of complex systems mean that we cannot always make meaningful predictions.

The philosophy of scientific determinism flourished before the advent of quantum physics, and was first articulated in print by Pierre-Simon Laplace in 1814. Building on the work of Gottfried Leibniz, he said:
We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.
Imagine such a superintelligent and superhuman creature (the 'Laplace demon'), who knows at one instant of time the position and momentum of every particle in the universe, as also the forces acting on the particles, and all the initial conditions. Assuming the availability of a good enough supercomputer, is it possible for the Laplace demon to predict the future in every detail? The answer may be 'yes' (for some classical systems) if unlimited computational power and time are available. But in reality, there are limits on the speed of computation, as well as on the extent of computation one can do. These limits are set by the laws of physics, and by the limited nature of the resources available in the universe. Here are some of the reasons for this:
  • The bit is the basic unit of information, and the bit-flip the basic operation of information processing. It costs energy to process information. Energy and time uncertainties are related through the Heisenberg principle of quantum mechanics. For a given amount of available energy, this principle puts a lower limit on the time needed to perform an elementary operation such as a bit-flip, and hence an upper limit on the number of operations per second.
  • The finite speed of light puts an upper limit on the speed at which information can be exchanged among the constituents of a processor.
  • A third limit is imposed by entropy which, being 'missing information' (cf. Part 22),  is the opposite of available information: One cannot store more bits of information in a system than permitted by its entropy.
  • Most natural phenomena or interactions are nonlinear, rather than linear. This can make the dynamics of even deterministic systems unpredictable.
  • In the time-evolution of many systems, there are events which are necessarily random, and therefore cannot be predicted, except in probabilistic terms.
Thus there are limits on available computational power. Predictions that are based on the known laws of physics, but that require more computational power than the limits stated above allow, are not possible. In any case, predictions (which are always based on data of finite precision) cannot have unlimited precision, not even for otherwise deterministic situations.
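The first of these limits can be given a rough number. The sketch below is my own illustration, not part of the argument above: it assumes the Margolus-Levitin theorem (a sharpened form of the energy-time relation, which caps the rate of elementary operations at 2E/(πħ) for a system with energy E) and, following Seth Lloyd's 'ultimate laptop' thought experiment, a hypothetical 1 kg computer whose entire rest-mass energy is available for computing. The function name max_ops_per_second is hypothetical.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 2.99792458e8        # speed of light in vacuum, m/s (exact in the SI)


def max_ops_per_second(energy_joules: float) -> float:
    """Margolus-Levitin bound: a system with average energy E (above its ground
    state) can perform at most 2E / (pi * hbar) elementary operations per second."""
    return 2.0 * energy_joules / (math.pi * HBAR)


# Hypothetical 1 kg computer whose whole rest-mass energy E = m c^2 is usable.
mass_kg = 1.0
energy = mass_kg * C ** 2
rate = max_ops_per_second(energy)
print(f"E = {energy:.3e} J -> at most {rate:.2e} operations per second")
```

Even this enormous rate, roughly 5 × 10⁵⁰ operations per second, is finite; a prediction that needs more operations than the available time allows simply cannot be carried out.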

The implication is that, beyond a certain level of 'computational complexity', new, unexpected, organizing principles may arise: If the known fundamental physical laws cannot completely determine the future states of a complex system, then higher-level laws of emergence may come into operation.

A striking example of this type of 'strong emergence' is the origin and evolution of life. Contrary to what David Chalmers says, it is a computationally intractable problem. Therefore new, higher-level laws, different from the bottom-level laws of physics and chemistry, may have played a role in giving to the genes and the proteins the functionality they possess.

Complex systems usually have a hierarchical structure. The new principles and features observed at a given level of complexity may sometimes be deducible from those operating at the previous lower level of complexity (local constructionism). Similarly, starting from a given observed level of complexity, one can sometimes work backwards and infer the previous lower level of complexity (local reductionism). Such reductionism and constructionism have only limited, local ranges of applicability for complex systems. 'Chaotic systems' provide a particularly striking example of this. The Laplace demon cannot predict, on a long-term basis, the weather of a chosen region, for example; nor can he start from the observed weather pattern at a given instant of time and work out the positions and momenta of all the molecules.
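Sensitive dependence on initial conditions, which is what defeats the demon in the weather example, can be seen in a few lines of code. This is a minimal sketch using the logistic map x → rx(1 - x) with r = 4, a standard toy model of chaos chosen here purely for illustration (it is not a weather model); the function name logistic_orbit and the starting values are hypothetical. Two starting values differing by 10⁻¹⁰, like two measurements of finite precision, soon give trajectories that have nothing to do with each other.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 50) -> list[float]:
    """Iterate the logistic map x -> r*x*(1-x) and return the whole orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial conditions that differ only in the 10th decimal place,
# mimicking two measurements of finite precision.
a = logistic_orbit(0.3)
b = logistic_orbit(0.3 + 1e-10)

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: |difference| = {abs(a[n] - b[n]):.3e}")
```

For r = 4 the separation roughly doubles at each step on average, so about 33 steps suffice to amplify an error of 10⁻¹⁰ to order one; making the initial measurement ten times more precise buys only about three more steps of reliable prediction.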

A common thread running through the behaviour of all complex systems is the breakdown of the principle of linear superposition: Because of the nonlinearities involved, a linear superposition of two solutions of an equation describing a complex system is not necessarily a solution. This fact lies at the heart of the failure of the reductionistic approach when it comes to understanding complex systems.
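The breakdown of superposition can be checked directly. In this sketch (my own choice of equations, for illustration only) the linear equation dx/dt = -kx is compared with the nonlinear logistic equation dx/dt = x(1 - x), whose solutions are x(t) = 1/(1 + C e^(-t)). The residual of each equation is evaluated on the sum of two known solutions; it vanishes only in the linear case. The helper names are hypothetical.

```python
import math


def residual_linear(x, dxdt, k=1.0):
    # Residual of the linear ODE dx/dt = -k x; zero means x is a solution.
    return dxdt + k * x


def residual_logistic(x, dxdt):
    # Residual of the nonlinear (logistic) ODE dx/dt = x (1 - x).
    return dxdt - x * (1.0 - x)


t = 0.7  # an arbitrary instant

# Linear case (k = 1): x1 = e^-t and x2 = 3 e^-t are solutions, with derivatives -x1, -x2.
x1, x2 = math.exp(-t), 3 * math.exp(-t)
lin = residual_linear(x1 + x2, -x1 - x2)


def logistic(C):
    # Logistic solution x = 1 / (1 + C e^-t); its derivative is x (1 - x).
    x = 1.0 / (1.0 + C * math.exp(-t))
    return x, x * (1.0 - x)


y1, dy1 = logistic(1.0)
y2, dy2 = logistic(5.0)
nonlin = residual_logistic(y1 + y2, dy1 + dy2)

print(f"linear:    residual of the sum = {lin:.2e}")
print(f"nonlinear: residual of the sum = {nonlin:.2e}")
```

A little algebra shows that the logistic residual for the sum is exactly 2·y1·y2, which is zero only if one of the two solutions is identically zero.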


Saturday, April 21, 2012

24. Attractors in Phase Space

The concept of 'phase space' is a very powerful way of depicting the time-evolution of dynamical systems. Imagine a system of N particles. At any instant of time, any particle is at a particular point in space, so we can specify its location in terms of three coordinates, say (x, y, z). At that time the particle also has some momentum. The momentum, being a vector, can be specified in terms of its three components, say (px, py, pz). Thus six parameters (x, y, z, px, py, pz) are needed to specify the position and momentum of a particle at any instant of time. Therefore, for N particles, we need to specify 6N parameters for a complete description of the system. For real systems like molecules in a gas, the number N can be very large, being typically of the order of the Avogadro number (~10²³).

So this is a very messy, in fact impossible, way of depicting such a system graphically. The concept of phase space solves this problem. Imagine a 6-dimensional 'hyperspace' in which three of the axes are for specifying the position coordinates of a particle, and the other three are for specifying the momentum components of the same particle. In this space the position and momentum of a particle at any instant of time can be represented by a single point. Similarly, for representing simultaneously the configurations of N particles, we can imagine a 6N-dimensional hyperspace (called phase space or state space). A point in this space represents the state of the entire system of N particles at an instant of time. As time progresses, this 'representative point' traces a trajectory, called the phase-space trajectory. Such a trajectory records the time-evolution of the dynamical system (in classical mechanics).

The figure below illustrates this. In it I have introduced the simplification that, for depiction purposes, all the position coordinates (3N in number) are given the generic symbol q, and only one axis is drawn to denote all the 3N axes. Similarly, all the 3N momentum components are given a representative symbol p, and only one axis is taken to represent all of them. In reality there are a total of 6N axes, 3N for the position components, and 3N for the momentum components.

Some variations of the concept of such an imaginary phase space or state space are: representation space; search space; configuration space; solution space; etc. The basic idea is the same. One imagines an appropriate number of axes, one for each 'degree of freedom'.

Next, let us consider a simple pendulum (a vertical string fixed at the top, and having a weight attached to its lower end). Suppose I pull the weight horizontally by a small distance x0 along the x-axis, and then release it. The weight starts performing an oscillatory motion around the point x = 0. At the moment I released the weight it was at rest, so its momentum was zero, and it had only potential energy. On releasing it the potential energy starts decreasing as the weight moves towards the point x = 0, and its momentum starts increasing. This goes on till the point x = 0 is reached. At this moment the potential energy is zero (it got fully converted to kinetic energy corresponding to the momentum -px).

Because of this momentum, the weight now overshoots the point x = 0 and moves in the opposite direction. When it reaches the point x = -x0 it momentarily stops, having converted all its kinetic energy back into an equivalent amount of potential energy.

Then it starts moving back towards the point x = 0. When it reaches this point it again has the maximum momentum, but now oppositely directed, i.e. +px. And so on.

What is the phase-space trajectory for this system? It is a closed loop in the plane defined by the x-axis and the px-axis: an ellipse, which becomes a circle if the momentum axis is suitably scaled (Figure (a) below). The weight successively and repeatedly passes through a whole continuum of points in phase space, including the points (x0, 0), (0, -px), (-x0, 0) and (0, px).

If there is no dissipation of energy, the phase-space trajectory in this experiment is a closed loop, because the particle passes again and again through all the allowed (i.e. energy-conserving) position-momentum combinations.

But in reality, dissipative forces like friction are always present, and in due course all the energy I expended in displacing the weight from its initial equilibrium position will be dissipated as heat. As the total energy decreases, the maximum value of the x-coordinate during the trajectory cycle, as also the maximum value of px, would decrease, implying that the area enclosed by the trajectory in phase space will progressively decrease, till the particle finally comes to a state of rest or zero momentum.

This final configuration corresponds to an ATTRACTOR in phase space: It is as if the dissipative dynamics of the system is 'attracted' by the point (0, 0, 0, 0, 0, 0) as the energy gets dissipated. Thus, because of the gradual dissipation of energy, the phase-space trajectory spirals towards a state of zero area (Figure (b) above).

This is like a particle set rolling in a bowl, spiralling towards the bottom of the bowl; the bowl thus acts as a basin of attraction. The phase-space region around the attractor (0, 0, 0, 0, 0, 0) is the basin of attraction for the oscillator problem I have considered here.
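The spiral into the attractor can be reproduced numerically. The sketch below treats the pendulum, for small displacements, as a mass on a spring (restoring force -kx), and assumes that friction acts as a force proportional to velocity, -γv; the parameter values (m = k = 1, γ = 0.2, x0 = 0.1) and the function names are hypothetical. It uses the semi-implicit (symplectic) Euler method, which keeps the undamped loop closed to within a small, bounded error instead of letting it drift.

```python
def simulate(x0, gamma, m=1.0, k=1.0, dt=0.01, steps=5000):
    """Phase-space trajectory [(x, p), ...] of a mass on a spring with
    force F = -k x - gamma * v, integrated by semi-implicit Euler."""
    x, p = x0, 0.0  # released from rest at x0
    traj = [(x, p)]
    for _ in range(steps):
        p += (-k * x - gamma * p / m) * dt  # update momentum from the force
        x += (p / m) * dt                   # then position from the new momentum
        traj.append((x, p))
    return traj


def energy(x, p, m=1.0, k=1.0):
    # Kinetic plus potential energy.
    return p * p / (2 * m) + k * x * x / 2


undamped = simulate(x0=0.1, gamma=0.0)
damped = simulate(x0=0.1, gamma=0.2)

for name, traj in (("undamped", undamped), ("damped", damped)):
    x_end, p_end = traj[-1]
    print(f"{name:9s}: E start = {energy(*traj[0]):.2e}, "
          f"E end = {energy(x_end, p_end):.2e}, final point = ({x_end:+.4f}, {p_end:+.4f})")
```

With γ = 0 the energy, and hence the area enclosed by the loop, stays essentially constant; with γ > 0 the energy decays roughly as e^(-γt/m) and the representative point ends up very close to the attractor at the origin.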

In the above experiment, if I displace the weight by only a small amount, the restoring force is linearly proportional to the displacement. If we plot this force fx as a function of x, we get a straight line (a linear relationship).

But if the displacement is large, the restoring force is no longer linearly proportional to the displacement x, and we are then dealing with a NONLINEAR DYNAMICAL SYSTEM. All complex systems are governed by nonlinear dynamics, and this makes their detailed analytical investigation very difficult, if not impossible.
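How quickly linearity fails can be seen for the pendulum itself, taking the angle θ of the string from the vertical as the measure of displacement. The exact tangential restoring force on the weight is -mg sin θ, while the linear approximation is -mg θ. This sketch (the function names and the values m = 1 kg, g = 9.81 m/s² are assumptions for illustration) prints how much the linear law overestimates the true force.

```python
import math


def restoring_force(theta, m=1.0, g=9.81):
    """Exact tangential restoring force on a pendulum bob at angle theta (radians)."""
    return -m * g * math.sin(theta)


def linear_force(theta, m=1.0, g=9.81):
    """Small-angle (linear) approximation: force proportional to displacement."""
    return -m * g * theta


for degrees in (2, 5, 10, 20, 45, 90):
    th = math.radians(degrees)
    excess = 100 * (linear_force(th) / restoring_force(th) - 1)
    print(f"{degrees:3d} deg: linear law overestimates the force by {excess:6.2f}%")
```

The error is only about 0.1% at 5°, but about 57% at 90°, where sin θ = 1 while θ = π/2.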

Saturday, April 14, 2012

23. Natural Phenomena are just Computations

How much information (in terms of number of bits) is needed for specifying the set of all non-negative integers: 0, 1, 2, 3, . . . ? The sequence runs all the way to infinity, so does it have an infinite information content? Something is wrong with that assertion. We can generate the entire sequence by starting from 0 and adding 1 to get the next member of the set, then adding 1 again to obtain the member after that, and so on. So, because of the order or structure in this sequence of numbers, an algorithm can be set up for generating the entire set. And the number of bits needed to write the corresponding computer program is rather small.
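The 'start from 0 and keep adding 1' rule really is a complete description of the infinite sequence. A minimal Python sketch (the generator name naturals is hypothetical):

```python
from itertools import islice


def naturals():
    """Generate 0, 1, 2, 3, ... without end, from the rule
    'start at 0, and get each next member by adding 1'."""
    n = 0
    while True:
        yield n
        n += 1


print(list(islice(naturals(), 10)))  # the first ten members
```

The program is a few lines long, yet it can produce any member of the sequence on demand; it is the length of this program, not the length of the sequence, that measures the information content.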

The number of bits needed to write the computer program for generating a given set of numbers or data is called the 'algorithmic information content' (AIC) of that set of data. Algorithmic Information Theory (AIT) is the modern discipline which is a great improvement over classical information theory. But such ideas about COMPRESSION OF INFORMATION have a long history.

Leibniz (1675) was amongst the earliest known investigators of compressibility (or otherwise) of information. He argued that a worthwhile algorithm or theory of anything has to be ‘simpler than’ the data it explains. Otherwise, either the theory is useless, or the data are ‘lawless’. The idea of ‘simpler than’ is best expressed in terms of AIC defined above.

The information in a set of data can be compressed into an algorithm only if there is something nonrandom or ordered about the data. There must be some structure or regularity, and we must be able to recognize that regularity or 'rule' or 'law'. Only then can we construct the algorithm that generates the entire set of data.

In fact, this is how we discover and formulate the laws of Nature. And the statements of the laws are nothing but a case of compression of information, using a smaller number of bits than the number of bits needed for describing an entire set of observations about Nature.
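Algorithmic information content itself cannot be computed in general, but any real compressor gives an upper bound on it. As a rough, hedged illustration (zlib is my choice of stand-in, not something the text relies on), the sketch below compares 100,000 bytes generated by a simple rule with 100,000 bytes drawn from the operating system's random source. The function name compressed_size is hypothetical.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level. This is only a
    crude, computable upper bound on algorithmic information content; the true
    AIC (the length of the shortest program) cannot be computed in general."""
    return len(zlib.compress(data, 9))


n = 100_000
ordered = bytes(i % 256 for i in range(n))  # a simple rule: 0, 1, ..., 255, 0, 1, ...
random_data = os.urandom(n)                 # no rule: bytes from the OS random source

for label, data in (("ordered", ordered), ("random", random_data)):
    print(f"{label:8s}: {len(data)} bytes -> {compressed_size(data)} bytes after compression")
```

The rule-generated bytes shrink to a small fraction of their size, while the random bytes do not shrink at all (the output is even slightly larger, because of format overhead).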

Consider two numbers, both requiring, say, a million bits for specifying them to the desired accuracy. Let one of them be an arbitrary random number, which means that there is no defining pattern or order or structure for specifying it. Let the other number be the familiar π (= 3.14159…..). The second number has very small AIC because a small computer program can be written for outputting it to a desired level of precision (π is the ratio of the perimeter of a circle to the diameter of the circle). By contrast, a random number (say 1.47373..59) has a much higher AIC: The shortest program for outputting it has information content (in terms of number of bits) not very different from that of the number itself, and the computer program for generating it can be only this:

         print("1.47373..59")

No significantly smaller program can generate this sequence of digits. The digit stream in this case has no redundancy or regularity, and is said to be incompressible. Such digit streams are called irreducible or algorithmically random.
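The contrast with π can be made concrete. The sketch below is Jeremy Gibbons' 'unbounded spigot' algorithm (a technique I have chosen for illustration; many others exist), which emits the decimal digits of π one after another for as long as it is allowed to run. The program has a fixed size of a few hundred bytes, so the first million digits of π have an AIC of roughly that size plus the few bits needed to say 'a million'; a random million-digit number admits no such shortcut. The generator name pi_digits is hypothetical.

```python
from itertools import islice


def pi_digits():
    """Jeremy Gibbons' unbounded spigot algorithm: yields the decimal digits of
    pi (3, 1, 4, 1, 5, ...) one at a time, using only exact integer arithmetic."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the digit n is now certain, so emit it
            # Shift the state one decimal place (right-hand sides use the old values).
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not yet certain: consume one more term of the underlying series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


# Usage: the same few lines give 50 digits, or a million, if left to run.
print("".join(str(d) for d in islice(pi_digits(), 50)))
```

The integers inside the generator grow as it runs, so it becomes slow for very many digits, but that affects only the running time, not the length of the program.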

Such considerations have led to the conclusion that there are limits to the powers of logic and reason. Gregory Chaitin has shown that certain facts are not just computationally irreducible; they are logically irreducible as well. The 'proof' of their ‘truth’ must be in the form of additional axioms, without any reasoning.

In science we use mathematical equations for describing natural phenomena. This approach has played a crucial role in the advancement of science for several centuries. However, the advent of computers has led to a paradigm shift:


I quote Seth Lloyd (2006):
The natural dynamics of a physical system can be thought of as a computation in which a bit not only registers a 0 or 1 but acts as an instruction: 0 means ‘do this’ and 1 means ‘do that’. The significance of a bit depends not just on its value but on how that value affects other bits over time, as part of the continued information processing that makes up the dynamical evolution of the universe.
Remember, the laws of Nature are quantum-mechanical. THE UNIVERSE IS ONE BIG QUANTUM COMPUTER.

As I shall explain in a later post, the computational approach to basic science and mathematics has also made us realize that there are limits to how far we can carry out our reasoning processes for understanding or describing natural phenomena. Beyond a limit, it is like in Alice in Wonderland:
‘Do cats eat bats? Do cats eat bats?’ Sometimes she asked, ‘Do bats eat cats?’ For you see, as she couldn’t answer either question, it didn’t much matter which way she put it.
More on this later, when I introduce you to Laplace's demon.

Saturday, April 7, 2012

22. Entropy Means Unavailable or Missing Information

In Part 6 I introduced the notion of entropy in the context of heat engines, defining it through dS = dQ / T, where dQ is the heat exchanged reversibly at absolute temperature T. The great Ludwig Boltzmann gave us an equivalent, statistical notion of entropy, establishing it as a measure of disorder.

Imagine a gas in a chamber (labelled A in the left part of the figure below), separated by a partition from another chamber (B) of the same volume. This second chamber is empty to start with. If the partition disappears, the molecules of the gas start moving into the right half of the enlarged chamber, and soon the gas occupies the entire (doubled) volume uniformly.

Let us say that there are n molecules of the gas. Before the partition is removed, all the molecules are in the left half of the enlarged chamber. So the probability of finding any of the molecules in the left half is 100%, and it is zero for finding that molecule in the right half. After the partition has been removed, there is only a 50% chance of finding that molecule in the left half, and 50% chance for finding it in the right half. It is like tossing a coin, and saying that we associate ‘heads’ with finding the molecule in the left half, and ‘tails’ with finding it in the right half. In both cases the chance is 50% or ½.

Next let us ask the question: What is the probability that all the n molecules of the gas will ever occupy the left half of the chamber again? This probability is the same as that of flipping a coin n times, and finding ‘heads’ in each case, namely (½)ⁿ.

Considering the fact that usually n is a very large number (typically of the order of the Avogadro number, i.e. ~10²³), the answer is very close to zero. In other words, the free expansion of the gas is practically an irreversible process. On removal of the partition, the gas has spontaneously gone into a state of greater disorder, and it cannot spontaneously go back to the initial state of order (or rather less disorder).
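The size of (½)ⁿ is easiest to appreciate through its logarithm, since the number itself underflows to zero in ordinary floating-point arithmetic. In this sketch, n = 6.022 × 10²³ (one mole) is an assumed example value and the function name log10_prob_all_left is hypothetical.

```python
import math


def log10_prob_all_left(n: float) -> float:
    """log10 of (1/2)**n, the chance that all n molecules are found in the left half.
    Working with logarithms avoids underflow: 0.5 ** 1e23 is simply 0.0 as a float."""
    return -n * math.log10(2)


for n in (10, 100, 1000, 6.022e23):
    print(f"n = {n:.3g}: probability = 10^({log10_prob_all_left(n):.4g})")
```

For a single mole the exponent is about -1.8 × 10²³, so writing the probability out in decimal would need about 1.8 × 10²³ zeros after the decimal point.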

Why do we say 'greater disorder'? Because the probability of finding any specified molecule at any location in the left half of the chamber is now only half its earlier value. Here 'order' means that there is a 100% chance that a thing is where we expect it to be, so 50% chance means a state of less order, or greater disorder.

So, intuitively we have no trouble agreeing that, left to themselves (with no inputs from the outside), things are more likely to tend towards a state of greater disorder. This is all that the second law of thermodynamics says. It says that if we have an ISOLATED system, then, with the passage of time, it can only go towards a state of greater disorder on its own, and not a state of lesser disorder. This happens because, as illustrated by the free-expansion-of-gas example above, a more disordered state is more probable.

How much has the disorder of the gas increased on free expansion to twice the volume? Consider any molecule of the gas. After the expansion, there are twice as many positions at which the molecule may be found. And for each such position, there are twice as many positions at which a second molecule may be found, so that the total number of possibilities for the two molecules is now 2² times as large as before. Thus, for n molecules there are 2ⁿ times as many ways in which the gas can fill the chamber after the free expansion. We say that, in the double-sized chamber, the gas has 2ⁿ times as many 'accessible states', or 'microstates'. [This is identical to the missing-information idea I explained in Part 21.]
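The counting argument can be verified by brute force for a handful of molecules. This sketch assumes, for illustration only, that each half of the chamber is divided into 3 equally accessible position cells (any number would do) and treats the molecules as distinguishable; it enumerates every arrangement before and after the volume is doubled. The function name count_microstates is hypothetical.

```python
from itertools import product


def count_microstates(n: int, cells: int) -> int:
    """Count the ways of placing n distinguishable molecules into `cells`
    equally accessible position cells, by explicit enumeration."""
    return sum(1 for _ in product(range(cells), repeat=n))


cells_before = 3  # hypothetical: the original (left) half holds 3 position cells
for n in range(1, 6):
    w_before = count_microstates(n, cells_before)
    w_after = count_microstates(n, 2 * cells_before)  # doubled volume, doubled cells
    print(f"n = {n}: W grows by a factor of {w_after // w_before} (2^n = {2 ** n})")
```

The ratio comes out as 2ⁿ whatever cell size is chosen, because each of the n molecules independently gains twice as many places to be.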

The symbol W is normally used for the number of microstates accessible to a system under consideration. This number increased by a factor of 2ⁿ when the gas expanded to twice the volume. So, one way of quantifying the degree of disorder is to say that entropy S, a measure of disorder, is proportional to W; i.e., S ~ W.

Boltzmann did something even better than that for quantifying disorder. He defined entropy S as proportional to the logarithm of W; i.e. S ~ log W.

In his honour, the constant of proportionality (kB) is now called the Boltzmann constant. Thus entropy is defined by the famous equation S = kB ln W (often written simply as S = k log W, with the logarithm taken to the natural base).

To see the merit of introducing the logarithm in the definition of entropy, let us apply it to calculate the increase of entropy when the gas expands to twice the volume. Since W increases by a factor of 2ⁿ, the entropy increases by an amount proportional to log 2ⁿ = n log 2, i.e. proportional to n. This makes sense. Introduction of the logarithm in the definition of entropy makes it, like energy or mass, a property proportional to the number of molecules in the system. Such properties are described as additive (or extensive). If the number of molecules in the gas is, say, doubled from n to 2n, it makes sense that the defined entropy also doubles in a linearly proportionate fashion.
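Additivity is easy to check numerically. Since W grows by a factor of 2ⁿ when the volume doubles, ΔS = kB ln 2ⁿ = n kB ln 2. The sketch below (the function name entropy_increase is hypothetical; the natural logarithm is used, as in Boltzmann's formula) evaluates this for one and two moles, using the exact SI values of kB and the Avogadro constant, and also expresses the increase in bits, where it comes out as one bit per molecule.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact in the 2019 SI)
N_A = 6.02214076e23  # Avogadro constant, 1/mol (exact in the 2019 SI)


def entropy_increase(n_molecules: float, volume_ratio: float = 2.0) -> float:
    """Delta S = k_B ln(W_after / W_before) = n k_B ln(volume_ratio), because W
    is multiplied by volume_ratio**n. The huge number W itself is never formed."""
    return n_molecules * K_B * math.log(volume_ratio)


one_mole = entropy_increase(N_A)
two_moles = entropy_increase(2 * N_A)
missing_bits = one_mole / (K_B * math.log(2))  # the same increase measured in bits

print(f"1 mol, volume doubled: dS = {one_mole:.3f} J/K")
print(f"2 mol, volume doubled: dS = {two_moles:.3f} J/K")
print(f"in bits: {missing_bits:.4e}, i.e. one bit per molecule")
```

For one mole the increase is about 5.76 J/K, which is R ln 2 with R the gas constant, and doubling the amount of gas doubles it.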

The introduction of log W, instead of W, in the definition of entropy was done for the same reasons as those for defining missing information I (cf. Part 21). In fact,

I = S, once both are expressed in the same units (for example, bits).

Also, the original (thermodynamic) and the later (statistical mechanics) formulations of entropy are equivalent. In the former, entropy of an isolated system increases because a system can increase its stability by obliterating thermal gradients. In the latter, entropy increases (and information is lost) because spontaneous obliteration of concentration gradients (and the ensuing more stable state) is the most likely thing to happen. Concentration gradients get obliterated spontaneously because that takes the system towards a state of equilibrium and stability.

For an isolated system, maximum stability, maximum entropy, and maximum probability all go together.