# Information Causality

In a recent preprint, Pawlowski et al. propose a natural physical principle called information causality, which classical and quantum mechanics both obey but which is violated by any theory that exhibits CHSH correlations beyond Tsirelson’s bound of $2\sqrt{2}$. Roughly speaking, information causality is the principle that if you send $m$ classical bits about a random variable $X$ to someone, their uncertainty about $X$ should not decrease by more than $m$ bits. For more on the background, see this excellent post by his whole-iness, the Quantum Pontiff.

More precisely, it is defined in the following context: separated parties Alice and Bob play a game in which Alice receives a random $n$-bit string $A_1,A_2,\dots,A_n$ while Bob receives an index $j\in\{1,\dots,n\}$; Bob’s task is to guess the value of $A_j$. Additionally, Alice and Bob share a joint system $A'B$ (primed because $A$ already denotes Alice’s string), described by whatever theory you want, and they can use it however they like (measure it in some way, etc.). If Alice sends Bob an $m$-bit string $X$ (presumably depending on $A$ and the outcome of some measurement on her half of the joint system), then information causality is the statement that no matter how Bob generates his guess $\widehat{A}_j$ from $X$ and a measurement on his part of the system $A'B$, $I=\sum_{j=1}^n I(A_j:\widehat{A}_j)\leq m$, where $I(A_j:\widehat{A}_j)$ is the mutual information between Bob’s guess $\widehat{A}_j$ and the actual bit $A_j$.
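To make the quantity $I$ concrete, here is a minimal sketch that evaluates it exactly for one particular classical strategy. The strategy is my own illustrative choice, not something taken from the paper: Alice sends her first $m$ bits as the message $X$, no shared system is used, and Bob outputs the matching message bit when $j\leq m$ and a fixed $0$ otherwise. The function names are hypothetical.

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """Mutual information in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def total_information(n, m):
    """I = sum_j I(A_j : guess_j) for the assumed 'send the first m bits' strategy."""
    total = 0.0
    for j in range(n):  # j is 0-based here
        joint = {}
        for bits in product((0, 1), repeat=n):   # Alice's uniformly random string
            message = bits[:m]                   # the m-bit message X
            guess = message[j] if j < m else 0   # Bob's guess for bit j
            key = (bits[j], guess)
            joint[key] = joint.get(key, 0.0) + 2.0 ** -n
        total += mutual_information(joint)
    return total

if __name__ == "__main__":
    for m in range(5):
        print(m, total_information(4, m))
```

For this strategy each transmitted bit contributes one full bit of mutual information and every other bit contributes none, so $I=m$ and the bound is met with equality.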

What does this mean, in more direct terms? Well, $I$ limits the probability of Bob correctly guessing Alice’s bit, a fact which I imagine the authors had in mind, judging by the discussion surrounding equation A5 in the paper. Let me elaborate on the connection here. The Fano inequality links the conditional entropy of a random variable $X$ given $Y$, and hence their mutual information $I(X:Y)=H(X)-H(X|Y)$, to the probability of error $p_e$ in guessing $X$ from $Y$: $p_e\geq(H(X|Y)-1)/\log|X|$, where $|X|$ is the number of values that $X$ can take.
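As a quick sanity check on the Fano inequality, the sketch below compares the bound with the true error probability in a toy example of my own choosing (not from the paper): $X$ is uniform on $k$ symbols and $Y$ equals $X$ with probability $1-\epsilon$, otherwise one of the other $k-1$ symbols uniformly at random. Guessing $X=Y$ then errs with probability exactly $\epsilon$, and $H(X|Y)=h(\epsilon)+\epsilon\log(k-1)$ with $h$ the binary entropy. Function names are hypothetical and logarithms are base 2.

```python
from math import log2

def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def symmetric_cond_entropy(eps, k):
    """H(X|Y) for the assumed k-ary symmetric example (k >= 2)."""
    return binary_entropy(eps) + eps * log2(k - 1)

def fano_lower_bound(h_cond, k):
    """Lower bound on the error probability: (H(X|Y) - 1) / log|X|."""
    return (h_cond - 1) / log2(k)

if __name__ == "__main__":
    k = 16
    for eps in (0.1, 0.3, 0.5, 0.7):
        bound = fano_lower_bound(symmetric_cond_entropy(eps, k), k)
        print(f"eps={eps:.1f}  bound={bound:.3f}  actual error={eps:.3f}")
```

The bound sits below the actual error in every row, and it only becomes informative once $H(X|Y)$ exceeds one bit, which is worth keeping in mind when the alphabet is small.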

For a single bit ($|X|=2$) the form above is vacuous, so use the sharper binary version of the Fano inequality, $h(p_e)\geq H(X|Y)$, where $h(p)=-p\log p-(1-p)\log(1-p)$ is the binary entropy. Applied to the probability of error when Bob guesses Alice’s $j$th bit, we end up with $h(p_e(j)) \geq H(A_j|\widehat{A}_j) = 1-I(A_j:\widehat{A}_j)$, where $A_j$ is the random variable corresponding to Alice’s $j$th bit ($H(A_j)=1$), and $\widehat{A}_j$ is Bob’s guess. Averaging over the bits $j$ and using the concavity of $h$, the average probability of error $p_e=\frac{1}{n}\sum_{j=1}^n p_e(j)$ satisfies $h(p_e)\geq 1-\frac{1}{n}\sum_{j=1}^nI(A_j:\widehat{A}_j)$. Since $\widehat{A}_j$ is generated from $X$ (the message) and $B$ (the extra system that Bob has, be it for the moment classical or quantum), $I(A_j:\widehat{A}_j)\leq I(A_j:XB)$ (data processing inequality/Holevo bound). In the course of showing information causality holds for classical and quantum mechanics, they prove that $\sum_{j=1}^n I(A_j:XB) \leq I(A:XB) \leq m$, so $h(p_e)\geq (n-m)/n$. Thus, information causality says what we would have guessed intuitively: in order for Bob to be able to predict any of Alice’s bits with high probability, she’ll have to send him nearly $n$ bits.

I would argue this formulation is more operational than a claim about $I$ itself, but of course they are very closely related. Phrasing things in terms of the number of bits required to complete the task is operational in the sense usually used in (quantum) information theory, e.g. the number of extractable random bits, number of distillable EPR pairs, smallest code that allows for reliable message recovery, etc. It also opens up the possibility of demonstrating that a particular theory obeys/disobeys information causality without making entropic arguments. On the other hand, it’s not operational in the sense that Alice and Bob can simply make a specified measurement on the joint system and compare with a bound like Bell/CHSH or Tsirelson. They first have to determine which protocol to use, and this may be difficult.