xaktly | Probability

Markov chains

The basic flow of the probability pages goes like this:

One thing leads to another

The probabilities of some events are functions only of the state of the system one step before. For example, while a baseball player might have a .365 batting average (gets a hit 36.5% of at-bats — a good average, by the way), we could also have a different kind of information about his at-bats:

  • The player gets a hit 40% of the time after having got a hit on the previous at-bat.

  • The player does not get a hit 50% of the time after not having hit on the previous at-bat.

In this example one "step" is one at-bat. Steps in these problems often denote the passage of time. Given this information, we see that the next at-bat is a function solely of the last one. We could diagram this situation like this:

Notice that the only data we have are the conditional probabilities $P(H|H) = 0.4$ and $P(!H|!H) = 0.5.$ We can infer, however, that because the choice is binary – that is, the player either gets a hit or does not – that the other probabilities are $P(!H|H) = 0.6$ and $P(H|!H) = 0.5.$ That fills out our diagram. This is an example of a Markov chain. One of our main goals will be to ask what happens, if we begin in some initial "state," if we let several steps unfold in time. Does the system reach a steady or constant state, or does it continue to fluctuate between hit and no hit.

A simple example

Now let's work through a simple example and get into the mathematics of predicting future states of our system given these rules for future events. Let's imagine we live in a place where there are only two kinds of weather: rain and sunshine. Now let's further suppose that 30% of the time, a rainy day is followed by another rainy day, and 40% of the time a sunny day is followed by another sunny day. We can diagram the situation like this:

Now on a rainy day there are only two possibilities for the next day, that it will rain, 30%, and or that it will be sunny, 70%. Likewise, the probability that it will rain after a sunny day is 1 - 0.4, or 60%. We can then complete our diagram like this:

In such a scenario, the probability of what happens next (tomorrow) is a function of (and only of) what happens immediately before – today in this case.

We can use our probabilities to draw a tree diagram to predict the probability of future weather. Let's predict what will happen on the third day. The tree looks like this:

Notice that the probability, say, of rain on the first and second days and sun on the third is $P(RRS) = (0.3)(0.7) = 0.21.$ The first day is a given.

We can, of course, keep on extending these trees. Here's the same tree extended for a fourth day:

The total probabilities, on the fourth day, for rain and sun are just the sums of the numbers on the right:

$$ \begin{align} P(R) = 0.027+0.126+0.126+0.168 &= 0.4470 \\[5pt] P(S) = 0.063+0.084+0.294+0.112 &= 0.5530 \\[5pt] \text{sum} &= 1.0 \end{align}$$

This process is fine while the tree is small, but it's clearly going to be a little cumbersome to write as we continue to predict future events. We'll solve this problem by using some simple matrix methods, including

You might want to use those links to refresh your knowledge of matrix-vector multiplication.

Matrix-vector method

We'll begin by forming a transition matrix, $P$, like this:

$$ \begin{matrix} \phantom{0} & R & S \\ R & 0.3 & 0.6 \\ S & 0.7 & 0.4 \end{matrix}$$

where, reading columns first then rows, we have the probability of moving from rain to rain (RR) = 0.3, the probability of moving from rain to sun (RS) = 0.7, and so on for the second row. We'll put these elements into a $2 \times 2$ matrix called $P$ like this:

$$P = \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix}$$

Now let's begin on a rainy day, represented by a column vector:

$$ S_o = \begin{bmatrix} R \\ S \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

If we project forward one day we just multiply the matrix of probabilities by our starting vector $P$ to get:

$$ \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.3 + 0 \\ 0.7 + 0 \end{bmatrix} = \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix}$$

We can project another day forward to get

$$ \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix} \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix} = \begin{bmatrix} 0.09 + 0.42 \\ 0.21 + 0.28 \end{bmatrix} = \begin{bmatrix} 0.51 \\ 0.49 \end{bmatrix}$$

Another day ahead gives us:

$$ \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix} \begin{bmatrix} 0.51 \\ 0.49 \end{bmatrix} = \begin{bmatrix} 0.153 + 0.294 \\ 0.357 + 0.196 \end{bmatrix} = \begin{bmatrix} 0.447 \\ 0.553 \end{bmatrix}$$

which matches our tree result above.

Now if we think about this, the first step was the matrix-vector multiplication

$$P \vec S_o = S_1,$$

where $S_1$ is the day-two "state vector." It says that there is a 0.3 chance of rain and a 0.7 chance of sun.

The second multiplication is

$$P \vec S_1 = P (P \vec S_o) = P^2 \vec S_o$$

Projecting forward a third day gives

$$P \vec S_2 = P (P \vec S_1) = P (P (P \vec S_o)) = P^3 \vec S_o$$

It's not difficult, now, to catch the pattern. The probability distribution (the elements of the state vector) on the nth day is

$$S_n = P^n \vec S_o$$

So to calculate the state vector components on, say, the 10th day, we can just find

$$P^{10} \vec S_o = \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix}^{10} \begin{bmatrix} 1 \\0 \end{bmatrix}$$

Now that seems pretty straightforward, except for the part about finding the 10th power of a matrix. Fortunately there are computer algorithms for doing that. They are likely beyond the scope of what you know right now, but suffice to say that matrix powers can be calculated quickly using a calculator, a spreadsheet or one of the math library functions from a coding language like Python. The result of the calculation above is

$$ \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix}^{10} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.4615 \\ 0.5385 \end{bmatrix}$$

If we calculate our state vector for all of the powers of $P$ from 1 to 10 and plot the elements of $\vec S_n$, we get the graph below, which shows a convergence to a steady state over time. The blue markers are the rainy days and the yellow ones the sunny days.

The plot above also includes the 100th power of or $P$ matrix (in the gray area) so you can see that the probabilities of rain and sun indeed converge to stable values over long times (numbers of days in this case). We can conclude that the average probabilities of rain and sun in the region are about 54% and 46%, respectively.

A note on my notation

In many texts and other writings, the transition matrix is written differently than I've done it in the previous examples. My multiplication of the transition matrix $P$ by the state vector $\vec S_o$ was

$$ \begin{bmatrix} 0.3 & 0.6 \\ 0.7 & 0.4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.3 + 0 \\ 0.7 + 0 \end{bmatrix} = \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix}$$

where the $P$ matrix was constructed by writing column vectors, and the multiplication order was $P \vec S_o$. Others would write the transpose (swap rows for columns) of my matrix $P$ to make $P^T$, and the state vector would be $\vec S_o = (1, 0)$. The multiplication order would then be $\vec S_o P^T.$ It looks like this.

$$ \begin{align} \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 0.3 & 0.7 \\ 0.6 & 0.4 \end{bmatrix} &= \begin{bmatrix} 0.3 + 0 & 0.7 + 0 \end{bmatrix} \\[5pt] &= \begin{bmatrix} 0.3 & 0.7 \end{bmatrix} \end{align}$$

The difference in the two methods is trivial, but I prefer the "operator" approach. That is, we can consider the probability transition matrix $P$ to be operating on the state vector to produce a new state vector if we write it in the order $P \vec S_o$. This kind of ordering would be typical in other fields, like quantum mechanics, and I've chosen to use it while working Markov problems. You can make your own choice.

More complicated, but not more difficult

Here's a hypothetical scenario with three outcomes, A, B and C, and the probabilities of transition from each. Notice that the sum of probabilities of a transition from each node to the others (and to "self") is one.

The transition matrix for this situation is set up like this:

$$ \begin{matrix} \phantom{0} & A & B & C \\ A & 0.35 & 0.30 & 0.20 \\ B & 0.50 & 0.50 & 0.70 \\ C & 0.15 & 0.20 & 0.10 \end{matrix}$$

Now let's begin in state A, represented by the vector $S_0 = (1, 0, 0)$ — but a column vector, of course.

The first transition – one time step – is given the the multiplication:

$$ \begin{bmatrix} 0.35 & 0.30 & 0.20 \\ 0.50 & 0.50 & 0.70 \\ 0.15 & 0.20 & 0.10 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.35 \\ 0.50 \\ 0.15 \end{bmatrix}$$

If we continue to run this scenario forward as we did in the Sun/Rain example above, here is a graph of the trends in the three probabilities. The steady states of each are reached in just a few steps.

Does it matter where we start?

Would it matter if we began in one of these two states:

$$ \vec S_o^{(1)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \phantom{00} \text{or} \phantom{00} \vec S_o^{(2)} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

The answer is no. The equilibrium probabilities would turn out the same. Here's why. This is the hundredth power of the transition matrix $P$ for this situation:

$$ P^{100} = \begin{bmatrix} 0.2981 & 0.2981 & 0.2981 \\ 0.5337 & 0.5337 & 0.5337 \\ 0.1683 & 0.1683 & 0.1683 \end{bmatrix}$$

Notice that all three columns converge to the same column vector. That means whether we multiply this matrix by vector $\vec S_o^{(1)}$ or $\vec S_o^{(2)}$, we arrive at the same equilibrium probability vector. You can try this for yourself. The equilibrium is encoded in the transition matrix, not the starting conditions.

Matrix multiplication & powers on a TI-84

Here's how to enter matrices (the plural of matrix) on a TI-84 calculator and how to do some simple multiplication operations. The calculator makes these things very easy.

To access the matrix functions, hit [2nd][x-1]. You'll see a screen like this. Matrices are labeled using square brackets, like [A]. Arrow to the right to access [EDIT], and we'll edit matrix [A].

To edit a matrix, first set the dimensions. We'll use a 2×2 matrix from above. Remember that with matrices, it's always (number of rows) × (number of columns).

I put the 2×1 column vector (1, 0) into matrix [B] on the calculator, too.

Now to do the multiplication [A][B], just enter [A]*[B] and hit enter. You'll get this result, as we did in the Sun/Rain example above

To raise a matrix to a power is really just like raising a number to a power. For example, to raise our 3×3 matrix to the 10th power, just enter [A]^10 and hit [ENTER] to get the resulting 3×3 matrix:

You can use the TI-84 to do a number of other matrix operations, too, including adding and subtracting matrices of the same dimensions and finding the inverse of a matrix. For example, the inverse of matrix [A] is obtained by entering [A]-1 and hitting [ENTER].

Creative Commons License   optimized for firefox
xaktly.com by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2023, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.