It’s like that poster Mulder had in his office on the X-Files: I want to believe in the MaxEnt approach to statistical mechanics.
Problem is, I don’t. I used to, certainly. It’s so elegant and simple! Taking the Shannon entropy of a probability distribution as a measure of uncertainty, a natural scheme for coming up with a prior probability distribution would seem to be to pick the distribution which has the maximum entropy given all the relevant constraints that are known. This way, the distribution is as unbiased as possible, since, apart from the constraints, it has maximal uncertainty.
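In symbols, for a distribution $p$ over outcomes $i$, the Shannon entropy is

```latex
H(p) = -\sum_i p_i \log p_i ,
```

and the MaxEnt prescription is to maximize $H(p)$ over all distributions consistent with the known constraints.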
The recipe in statistical mechanics calls for a constraint on the average energy, and MaxEnt obliges by producing the canonical ensemble. (The microcanonical ensemble follows from a constraint on the total energy, but it would sort of be overkill to call this MaxEnt, since it’s just the principle of indifference in this case.) Similarly, the grand canonical ensemble comes from finding the distribution maximizing the entropy given fixed average energy and average particle number. A good deal of the rest of stat mech flows more or less naturally from this derivation and interpretation, making it quite attractive, if for no other reason than to save on what you need to remember for exams.
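Concretely, the canonical-ensemble calculation alluded to here is the standard Lagrange-multiplier exercise (a sketch, with $\lambda$ and $\beta$ the multipliers enforcing normalization and the average-energy constraint):

```latex
% maximize  S = -\sum_i p_i \ln p_i
% subject to  \sum_i p_i = 1  and  \sum_i p_i E_i = \langle E \rangle
\frac{\partial}{\partial p_i}
\Big[ -\sum_j p_j \ln p_j
      - \lambda \Big( \sum_j p_j - 1 \Big)
      - \beta \Big( \sum_j p_j E_j - \langle E \rangle \Big) \Big] = 0
\;\Longrightarrow\;
p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i} .
```

Adding a multiplier $\mu$ for the average particle number gives $p_i \propto e^{-\beta(E_i - \mu N_i)}$, i.e. the grand canonical ensemble.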
It’s all well and good (at least if you’re a Bayesian, and interpret probability as a degree of belief, not as a frequency of any shape or kind), until you start to wonder where the constraint information comes from. Then you realize that an uncomfortable possibility presents itself: MaxEnt could be in conflict with Bayes’ rule, which, being a Bayesian, you of course believe in above all worldly things. This can happen when the constraint information comes from measured data, for example if the average energy constraint for the system at hand is actually the statement that you cooked up many copies of this system according to the same recipe every time, measured the energy of each, and took the mean.
According to the Bayesian approach, the correct way to incorporate this information is to update your original prior probability distribution with the measured data by using, well, Bayes’ rule. This just amounts to finding the probability of whatever you’re interested in (call it x) given the observed data (energy measurements) by taking the joint probability of x and the energy measurements and dividing by the probability of the observed data. Very simple, though in this case we have to forget the actual sequence of energy measurement results and pretend that we only remember the mean, so that we end up with the conditional distribution $p(x \mid \bar{E})$, where $\bar{E}$ is the observed mean. (This we do by adding up all the probabilities for x given sequences having the same mean.) Using MaxEnt, on the other hand, we would forgo the original prior and just find the distribution maximizing the entropy under the constraint of the observed mean energy.
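Spelled out (a sketch, with hypothetical notation: $n$ energy measurements $E_1,\dots,E_n$ whose mean is $\bar{E}$), the update is

```latex
p(x \mid \bar{E})
= \frac{p(x, \bar{E})}{p(\bar{E})}
= \frac{\sum_{E_1,\dots,E_n \,:\, \frac{1}{n}\sum_k E_k = \bar{E}} p(x, E_1,\dots,E_n)}
       {\sum_{E_1,\dots,E_n \,:\, \frac{1}{n}\sum_k E_k = \bar{E}} p(E_1,\dots,E_n)} ,
```

where the sums implement the coarse-graining from the full measurement record down to its mean.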
Do we get the same answer? Given that I said at the beginning of the post that I went from a state of MaxEnt belief to disbelief, and have written the intervening paragraphs as I have, you can reasonably infer that the answer is no. (A word of congratulations to frequentist readers: you have just successfully completed a Bayesian inference!) Indeed, the answer is no, at least in the sense that we do not always get the same answer: in quite reasonable cases the two approaches disagree.
In a nice paper detailing aspects of the constraint rule of MaxEnt, Uffink examines the two approaches for the case of rolling a die (in this context called the Brandeis dice problem). Suppose we roll the die many times and observe a mean number of 3.5. This is what one would expect for an unbiased die, i.e. one with probability 1/6 for each of the outcomes 1 to 6, and moreover this distribution has the largest entropy, so the MaxEnt probability is the uniform distribution. On the other hand, following the usual Bayesian prescription, if our prior distribution is that the die could be biased in any way, each bias equally likely, then after going through a calculation similar to that of the rule of succession (after witnessing the outcome $s$ a total of $n_s$ times in $n$ rolls, the probability of $s$ on the next toss is $(n_s + 1)/(n + 6)$) and adding up the probabilities for sequences of results which all have an average of 3.5, Uffink obtains the posterior distribution for the next roll. It’s not uniform; rather it gives more weight to 3 and 4 than to 1 and 6. (2 and 5 are somewhere in between.)
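As a sanity check, here is a small numerical version of this calculation (my own sketch, not Uffink’s) for the toy case of n = 4 rolls with mean 3.5, assuming a uniform (Dirichlet) prior over the die’s bias: each sequence is weighted by the standard marginal likelihood $5! \prod_s n_s! / (n+5)!$, and the predictive probability of face k given a sequence is $(n_k + 1)/(n + 6)$ as above.

```python
from collections import Counter
from itertools import product
from math import factorial

n, target = 4, 14  # 4 rolls with mean 3.5, i.e. total 14

num = [0.0] * 6  # unnormalized predictive probability for each face
den = 0.0        # total weight of sequences with the observed mean

for seq in product(range(1, 7), repeat=n):
    if sum(seq) != target:
        continue  # condition on the observed mean
    counts = Counter(seq)
    # marginal likelihood of this sequence under a uniform Dirichlet prior:
    # 5! * prod(n_s!) / (n + 5)!
    w = factorial(5) / factorial(n + 5)
    for c in counts.values():
        w *= factorial(c)
    den += w
    for k in range(1, 7):
        # generalized rule of succession: P(next = k) = (n_k + 1) / (n + 6)
        num[k - 1] += w * (counts[k] + 1) / (n + 6)

posterior = [x / den for x in num]
print([round(p, 4) for p in posterior])
# → [0.15, 0.1667, 0.1833, 0.1833, 0.1667, 0.15]
```

Even at n = 4 the effect is visible: faces 3 and 4 come out more probable than 1 and 6, with 2 and 5 in between, in line with Uffink’s result.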
To me this dashes any hope of applying MaxEnt all over the place as Jaynes is wont to do. But my immediate concern was with MaxEnt as a means to justify the various ensembles of statistical mechanics. Does it still work in this context? The answer here is yes, but only insofar as it just reduces to the principle of insufficient reason. And in this case MaxEnt and Bayes’ rule give the same result (again, for a “reasonable” prior). This is straightforward for the microcanonical ensemble, as mentioned above. But the usual textbook justification of the canonical ensemble, for instance, is that it results when studying a system which, together with a much larger reservoir, is described by the microcanonical ensemble. So we didn’t need anything else in the first place! One of the original appeals of MaxEnt for stat mech was that it seems to dispose of the need for a reservoir system, and implies that the canonical ensemble is appropriate in a considerably more general setting. Alas, this cannot be justified.
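For reference, the textbook reservoir argument runs schematically as follows: if the system plus a large reservoir $R$ is microcanonical at total energy $E$, the probability of finding the system in a state $s$ of energy $E_s \ll E$ is proportional to the number of compatible reservoir states,

```latex
p(s) \propto \Omega_R(E - E_s)
= e^{S_R(E - E_s)/k}
\approx e^{S_R(E)/k} \, e^{-E_s \, (\partial S_R/\partial E)/k}
\propto e^{-E_s/kT} ,
```

using $\partial S_R/\partial E = 1/T$ and expanding the reservoir entropy to first order in $E_s$.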
I should mention how this works out in the context of the Brandeis dice problem, which will happily lead into the subject of applying large deviation principles to stat mech, but this will have to be left for another post.