Aldrich Mamon: Markov Chain

A Markov chain, named for Andrey Markov, is a mathematical system that transits from one state to another (out of a finite or countable number of possible states) in a chainlike manner. It is a random process endowed with the Markov property: that the next state depends only on the current state and not on the past. Markov chains have many applications as statistical models of real-world processes.

A simple two-state Markov chain.

Formally, a Markov chain is a discrete (discrete-time) random process with the Markov property. Often, the term "Markov chain" is used to mean a Markov process which has a discrete (finite or countable) state-space. Usually a Markov chain would be defined for a discrete set of times (i.e. a discrete-time Markov chain)^[1] although some authors use the same terminology where "time" can take continuous values.^[2]^[3] Also see continuous-time Markov process. The use of the term in Markov chain Monte Carlo methodology covers cases where the process is in discrete-time (discrete algorithm steps) with a continuous state space. The following concentrates on the discrete-time discrete-state-space case.

A "discrete-time" random process means a system which is in a certain state at each "step", with the state changing randomly between steps. The steps are often thought of as time, but they can equally well refer to physical distance or any other discrete measurement; formally, the steps are just the integers or natural numbers, and the random process is a mapping of these to states. The Markov property states that the conditional probability distribution for the system at the next step (and in fact at all future steps) given its current state depends only on the current state of the system, and not additionally on the state of the system at previous steps.

Since the system changes randomly, it is generally impossible to predict the exact state of the system in the future. However, the statistical properties of the system's future can be predicted. In many applications it is these statistical properties that are important.

The changes of state of the system are called transitions, and the probabilities associated with various state-changes are called transition probabilities. The set of all states and transition probabilities completely characterizes a Markov chain. By convention, we assume all possible states and transitions have been included in the definition of the processes, so there is always a next-state and the process goes on forever.

A famous Markov chain is the so-called "drunkard's walk", a random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability. From any position there are two possible transitions, to the next or previous integer. The transition probabilities depend only on the current position, not on the way the position was reached. For example, the transition probabilities from 5 to 4 and 5 to 6 are both 0.5, and all other transition probabilities from 5 are 0. These probabilities are independent of whether the system was previously in 4 or 6.

Another example is the dietary habits of a creature who eats only grapes, cheese or lettuce, and whose dietary habits conform to the following (artificial) rules: it eats exactly once a day; if it ate cheese yesterday, it will not today, and it will eat lettuce or grapes with equal probability; if it ate grapes yesterday, it will eat grapes today with probability 1/10, cheese with probability 4/10 and lettuce with probability 5/10; finally, if it ate lettuce yesterday, it won't eat lettuce again today but will eat grapes with probability 4/10 or cheese with probability 6/10. This creature's eating habits can be modeled with a Markov chain since its choice depends on what it ate yesterday, not additionally on what it ate 2 or 3 (or 4, etc.) days ago. One statistical property one could calculate is the expected percentage of the time the creature will eat grapes over a long period.

A series of independent events—for example, a series of coin flips—does satisfy the formal definition of a Markov chain. However, the theory is usually applied only when the probability distribution of the next step depends non-trivially on the current state.

Many other examples of Markov chains exist.

A simple example is shown in the figure on the right, using a directed graph to picture the state transitions. The states represent whether the economy is in a bull market, a bear market, or a recession, during a given week. According to the figure, a bull week is followed by another bull week 90% of the time, a bear market 7.5% of the time, and a recession the other 2.5%. From this figure it is possible to calculate, for example, the long-term fraction of time during which the economy is in a recession, or on average how long it will take to go from a recession to a bull market.

A thorough development and many examples can be found in the on-line monograph Meyn & Tweedie 2005.^[4] The appendix of Meyn 2007,^[5] also available on-line, contains an abridged Meyn & Tweedie.

A finite state machine can be used as a representation of a Markov chain. Assuming a sequence of independent and identically distributed input signals (for example, symbols from a binary alphabet chosen by coin tosses), if the machine is in state y at time n, then the probability that it moves to state x at time n + 1 depends only on the current state.A Bernoulli scheme is a special case of a Markov chain where the transition probability matrix has identical rows, which means that the next state is even independent of the current state (in addition to being independent of the past states). A Bernoulli scheme with only two possible states is known as a Bernoulli process.Markovian systems appear extensively in thermodynamics and statistical mechanics, whenever probabilities are used to represent unknown or unmodelled details of the system, if it can be assumed that the dynamics are time-invariant, and that no relevant history need be considered which is not already included in the state description.

Michaelis-Menten kinetics. The enzyme (E) binds a substrate (S) and produces a product (P). Each reaction is a state transition in a Markov chain.

Chemistry is often a place where Markov chains and continuous-time Markov processes are especially useful because these simple physical systems tend to satisfy the Markov property quite well. The classical model of enzyme activity, Michaelis-Menten kinetics, can be viewed as a Markov chain, where at each time step the reaction proceeds in some direction. While Michaelis-Menten is fairly straightforward, far more complicated reaction networks can also be modeled with Markov chains.

An algorithm based on a Markov chain was also used to focus the fragment-based growth of chemicals in silico towards a desired class of compounds such as drugs or natural products.^[6] As a molecule is grown, a fragment is selected from the nascent molecule as the "current" state. It is not aware of its past (i.e., it is not aware of what is already bonded to it). It then transitions to the next state when a fragment is attached to it. The transition probabilities are trained on databases of authentic classes of compounds.

Also, the growth (and composition) of copolymers may be modeled using Markov chains. Based on the reactivity ratios of the monomers that make up the growing polymer chain, the chain's composition may be calculated (e.g., whether monomers tend to add in alternating fashion or in long runs of the same monomer). Due to steric effects, second-order Markov effects may also play a role in the growth of some polymer chains.

Markov chains are used throughout information processing. Claude Shannon's famous 1948 paper A mathematical theory of communication, which in a single step created the field of information theory, opens by introducing the concept of entropy through Markov modeling of the English language. Such idealized models can capture many of the statistical regularities of systems. Even without describing the full structure of the system perfectly, such signal models can make possible very effective data compression through entropy encoding techniques such as arithmetic coding. They also allow effective state estimation and pattern recognition.

Markov chains are also the basis for Hidden Markov Models, which are an important tool in such diverse fields as telephone networks (for error correction), speech recognition and bioinformatics. The world's mobile telephone systems depend on the Viterbi algorithm for error-correction, while hidden Markov models are extensively used in speech recognition and also in bioinformatics, for instance for coding region/gene prediction. Markov chains also play an important role in reinforcement learning.Markov chain methods have also become very important for generating sequences of random numbers to accurately reflect very complicated desired probability distributions, via a process called Markov chain Monte Carlo (MCMC). In recent years this has revolutionized the practicability of Bayesian inference methods, allowing a wide range of posterior distributions to be simulated and their parameters found numerically.Markov chains are generally used in describing path-dependent arguments, where current structural configurations condition future outcomes. An example is the commonly argued link between economic development and the rise of capitalism. Once a country reaches a specific level of economic development, the configuration of structural factors, such as size of the commercial bourgeoisie, the ratio of urban to rural residence, the rate of political mobilization, etc., will generate a higher probability of transitioning from authoritarian to capitalist.

Andrey Markov produced the first results (1906) for these processes, purely theoretically. A generalization to countably infinite state spaces was given by Kolmogorov (1936). Markov chains are related to Brownian motion and the ergodic hypothesis, two topics in physics which were important in the early years of the twentieth century, but Markov appears to have pursued this out of a mathematical motivation, namely the extension of the law of large numbers to dependent events. In 1913, he applied his findings for the first time to the first 20,000 letters of Pushkin's Eugene Onegin

Seneta^[21] provides an account of Markov's motivations and the theory's early development. The term "chain" was used by Markov (1906).^[22]

Aldrich Mamon

Lunes, Abril 4, 2011

Markov Chain

Walang komento:

Mag-post ng isang Komento