EE411 Information Theory and Coding

Outline

  • Lecture 1: Introduction
  • Lecture 2: Entropy

Lecture 2: Entropy

Desired properties of a measure of information:

  • Monotonic: less probable $\rightarrow$ more information
  • Additive: total info of independent events = $\sum$ of their individual info

For a discrete random variable $X$:

  • Realizations: $\mathcal X=\{x_1,\dots,x_q\}$
  • Probabilities: $\{p_1,\dots,p_q\}$
  • Self-info of $X=x_i$: $-\log p_i$
  • Avg. information: $I(X)=-\sum\limits_{i=1}^q p_i \log p_i$

Entropy:

A measure of the amount of information required to describe a random variable, i.e., the average number of bits needed to describe it: $H(X)=-\sum\limits_{x\in\mathcal X}p(x)\log p(x)$ (the average information $I(X)$ above).

Lemma 1. $H(X)\ge 0$

Lemma 2. $H_b(X)=(\log_ba)H_a(X)$

Denote the binary entropy function $H(p) = -p\log p - (1-p)\log(1-p)$
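
As a quick numerical check, here is a minimal Python sketch (the helper names `entropy` and `binary_entropy` are illustrative, not from the course) computing these quantities:

```python
import numpy as np

def entropy(pmf, base=2):
    """H(X) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]                          # drop zero-probability outcomes
    return -np.sum(p * np.log(p)) / np.log(base)

def binary_entropy(p, base=2):
    """Binary entropy function H(p)."""
    return entropy([p, 1 - p], base=base)

print(entropy([0.5, 0.5]))                # 1.0 bit  (fair coin)
print(binary_entropy(0.1))                # ~0.469 bits (biased coin)
print(entropy([0.25] * 4))                # 2.0 bits (uniform over 4 outcomes)
```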

Joint Entropy:

$H(X,Y)=-\sum\limits_{x\in\mathcal X}\sum\limits_{y\in\mathcal Y}p(x,y)\log p(x,y)$. If $X$ and $Y$ are independent, then $H(X,Y) = H(X) + H(Y)$.

Conditional Entropy:

$H(Y|X)=\sum\limits_{x\in\mathcal X}p(x)\,H(Y|X=x)=-\sum\limits_{x\in\mathcal X}\sum\limits_{y\in\mathcal Y}p(x,y)\log p(y|x)$

Chain Rule of Entropy:

$H(X,Y)=H(X)+H(Y|X)$; in general, $H(X_1,\dots,X_n)=\sum\limits_{i=1}^n H(X_i|X_{i-1},\dots,X_1)$.

Proof by induction.
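
A minimal numerical sketch (the joint pmf $p(x,y)$ below is an arbitrary illustrative example) verifying $H(X,Y)=H(X)+H(Y|X)$:

```python
import numpy as np

log2 = lambda a: np.log(a) / np.log(2)

# Illustrative joint pmf p(x, y): rows index x, columns index y
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)                       # marginal p(x)
H_xy = -np.sum(p_xy * log2(p_xy))            # H(X,Y)
H_x = -np.sum(p_x * log2(p_x))               # H(X)
H_y_given_x = -np.sum(p_xy * log2(p_xy / p_x[:, None]))  # H(Y|X)

print(np.isclose(H_xy, H_x + H_y_given_x))   # True: chain rule holds
```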

Relative Entropy (KL Divergence):

$D(p\,\|\,q)=\sum\limits_{x\in\mathcal X}p(x)\log\frac{p(x)}{q(x)}$: a measure of the distance between two distributions, i.e., the expected number of extra bits needed to describe samples from $p$ when using a code designed for $q$. Note that $D(p\,\|\,q)\ne D(q\,\|\,p)$ in general, so it is not a true distance metric.
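
A small sketch (the pmfs $p$ and $q$ are arbitrary examples) showing the computation and the asymmetry of the divergence:

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """D(p || q) = sum_x p(x) log(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                              # terms with p(x) = 0 contribute 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))                    # ~0.737 bits
print(kl_divergence(q, p))                    # ~0.531 bits, so D(p||q) != D(q||p)
```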

Mutual Information:

The relative entropy between the joint distribution and the product of the marginal distributions: $I(X;Y)=\sum\limits_{x,y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}=D\big(p(x,y)\,\|\,p(x)\,p(y)\big)=H(X)-H(X|Y)$.
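
A minimal sketch (reusing an illustrative $2\times 2$ joint pmf) computing $I(X;Y)$ both as a relative entropy and as an entropy difference:

```python
import numpy as np

log2 = lambda a: np.log(a) / np.log(2)

p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])              # illustrative joint pmf p(x, y)
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

I_kl = np.sum(p_xy * log2(p_xy / np.outer(p_x, p_y)))    # D(p(x,y) || p(x)p(y))
H_x = -np.sum(p_x * log2(p_x))                            # H(X)
H_x_given_y = -np.sum(p_xy * log2(p_xy / p_y[None, :]))   # H(X|Y)
print(np.isclose(I_kl, H_x - H_x_given_y))                # True
```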

Conditional Mutual Information:

$I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=E_{p(x,y,z)}\left[\log\frac{p(x,y|z)}{p(x|z)\,p(y|z)}\right]$

Chain Rule for Mutual Information:

$I(X_1,\dots,X_n;Y)=\sum\limits_{i=1}^n I(X_i;Y|X_{i-1},\dots,X_1)$

Conditional Relative Entropy:

Given joint probability mass functions $p(x, y)$ and $q(x, y)$, the conditional relative entropy is:

$D\big(p(y|x)\,\|\,q(y|x)\big)=\sum\limits_x p(x)\sum\limits_y p(y|x)\log\frac{p(y|x)}{q(y|x)}$

Chain Rule for Relative Entropy

$D\big(p(x,y)\,\|\,q(x,y)\big)=D\big(p(x)\,\|\,q(x)\big)+D\big(p(y|x)\,\|\,q(y|x)\big)$
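
For completeness, the standard one-step derivation uses $p(x,y)=p(x)\,p(y|x)$ and $q(x,y)=q(x)\,q(y|x)$:

$$
D\big(p(x,y)\,\|\,q(x,y)\big)=\sum_{x,y}p(x,y)\log\frac{p(x)\,p(y|x)}{q(x)\,q(y|x)}
=\underbrace{\sum_{x,y}p(x,y)\log\frac{p(x)}{q(x)}}_{D(p(x)\|q(x))}+\underbrace{\sum_{x,y}p(x,y)\log\frac{p(y|x)}{q(y|x)}}_{D(p(y|x)\|q(y|x))}
$$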

Another way to explain mutual information

For r.v. $X,Y \sim p(x,y) $, $\tilde X, \tilde Y \sim q(x,y)$:

If $q(x,y) = p(x)\cdot p(y)$, then $q(x) = p(x)$, $q(y) = p(y)$

Therefore $q(x,y) = q(x) \cdot q(y)$, i.e., $\tilde X \bot \tilde Y$, and $I(X;Y)=D\big(p(x,y)\,\|\,q(x,y)\big)$ measures how far $(X,Y)$ is from this independent pair with the same marginals.
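
Spelled out with the chain rule for relative entropy above:

$$
D\big(p(x,y)\,\|\,p(x)p(y)\big)=D\big(p(x)\,\|\,p(x)\big)+D\big(p(y|x)\,\|\,p(y)\big)
=0+\sum_{x}p(x)\sum_{y}p(y|x)\log\frac{p(y|x)}{p(y)}=I(X;Y)
$$

so $I(X;Y)$ is exactly the extra cost of (wrongly) assuming $X$ and $Y$ are independent while keeping the correct marginals.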

Lecture 3: Inequality

$f(x)$ is convex over $(a,b)$ IFF $\forall x_1,x_2 \in (a,b)$ and $0\le \lambda \le 1$:

$f(\lambda x_1+(1-\lambda)x_2)\le\lambda f(x_1)+(1-\lambda)f(x_2)$

$f(x)$ is strictly convex if, whenever $x_1\ne x_2$, equality holds only for $\lambda = 0$ or $\lambda = 1$.

$f(x)$ is concave if $-f(x)$ is convex.

Theorem: Jensen's Inequality

For a convex function $f(x)$ and r.v. $X$, $E[f(X)] \ge f(E[X])$. If $f$ is strictly convex, equality holds only if $X$ is constant (with probability 1).

Proof by induction.
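
A quick numerical sketch using the convex function $f(x)=x^2$ (an arbitrary illustrative choice):

```python
import numpy as np

# Discrete r.v. X with an illustrative pmf
x = np.array([-1.0, 0.0, 2.0, 5.0])
p = np.array([0.2, 0.3, 0.4, 0.1])

f = lambda t: t ** 2                     # a convex function

E_fX = np.sum(p * f(x))                  # E[f(X)]
f_EX = f(np.sum(p * x))                  # f(E[X])
print(E_fX, f_EX, E_fX >= f_EX)          # 4.3  1.21  True (Jensen)
```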

Theorem: Information Inequality

For any probability mass functions $p(x)$ and $q(x)$, $D\big(p(x)\,\|\,q(x)\big) \ge 0$.

Equality IFF $\forall x, p(x)=q(x)$.
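
Proof sketch (the standard argument via Jensen's inequality, filled in here): since $\log$ is concave,

$$
-D(p\|q)=\sum_{x:\,p(x)>0}p(x)\log\frac{q(x)}{p(x)}\le\log\sum_{x:\,p(x)>0}p(x)\,\frac{q(x)}{p(x)}=\log\sum_{x:\,p(x)>0}q(x)\le\log 1=0,
$$

with equality throughout IFF $q(x)/p(x)$ is constant on the support of $p$ and that constant is 1, i.e., $p(x)=q(x)$ for all $x$.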

Theorem: Nonnegativity of mutual information

For any random variables $X$, $Y$, $I(X;Y)=D\big(p(x,y)\,\|\,p(x)p(y)\big)\ge 0$.

Equality IFF $X \bot Y$.

Corollaries

  • $D\big(p(y|x)\,\|\,q(y|x)\big)\ge0$, with equality IFF $p(y|x)=q(y|x)$ for all $(x,y)$ with $p(x)>0$.

  • $I(X;Y|Z)\ge0$, with equality IFF $X$ and $Y$ are conditionally independent given $Z$.

Theorem: The maximum entropy distribution

For a discrete r.v. $X$, $H(X)\le \log | \mathcal X |$, with equality IFF $X$ has a uniform distribution over $\mathcal X$.
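
Proof sketch (the standard one-line argument): let $u(x)=1/|\mathcal X|$ be the uniform pmf over $\mathcal X$; then

$$
0\le D(p\,\|\,u)=\sum_{x}p(x)\log\frac{p(x)}{1/|\mathcal X|}=\log|\mathcal X|-H(X),
$$

with equality IFF $p=u$.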

Theorem: Conditioning reduces entropy

$H(X|Y)\le H(X)$ with equality IFF $X\bot Y$.
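
This follows directly from the nonnegativity of mutual information:

$$
H(X)-H(X|Y)=I(X;Y)\ge 0,\qquad\text{with equality IFF } X\bot Y.
$$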

Theorem: Independence bound on entropy

$H(X_1,\dots,X_n)\le\sum\limits_{i=1}^n H(X_i)$ with equality IFF the $X_i$ are mutually independent.

Markov Chain

Random variables $\{X_n\}$ form a Markov chain $X_1\rightarrow\cdots\rightarrow X_n$ if the p.m.f. satisfies $p(x_1,x_2,\dots,x_n)=p(x_1)\,p(x_2|x_1)\cdots p(x_n|x_{n-1})$.

Properties

  • $X\rightarrow Y\rightarrow Z\implies p(x,z|y)=p(x|y)p(z|y)$
  • $X\rightarrow Y\rightarrow Z\implies Z\rightarrow Y\rightarrow X$
  • If $Z=f(Y)$, then $X\rightarrow Y\rightarrow Z$.

Theorem: Data-processing inequality

If $X\rightarrow Y \rightarrow Z$, then $I(X;Y)\ge I(X;Z)$.

Proof: expanding $I(X;Y,Z)$ with the chain rule in two ways gives $I(X;Y,Z)=I(X;Z)+I(X;Y|Z)=I(X;Y)+I(X;Z|Y)$. Since $X\rightarrow Y \rightarrow Z$, we have $I(X;Z|Y)=0$, so $I(X;Y)=I(X;Z)+I(X;Y|Z)\ge I(X;Z)$ because $I(X;Y|Z)\ge 0$.
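
A numerical sketch (the chain $X\rightarrow Y\rightarrow Z$ below is built from two illustrative binary symmetric channels) confirming $I(X;Z)\le I(X;Y)$:

```python
import numpy as np

log2 = lambda a: np.log(a) / np.log(2)

def mutual_info(p_joint):
    """I between the two coordinates of a 2-D joint pmf (in bits)."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * log2(p_joint[mask] / (px * py)[mask]))

p_x = np.array([0.5, 0.5])                     # X ~ Bernoulli(1/2)
W1 = np.array([[0.9, 0.1], [0.1, 0.9]])        # p(y|x): BSC with crossover 0.1
W2 = np.array([[0.8, 0.2], [0.2, 0.8]])        # p(z|y): BSC with crossover 0.2

p_xy = p_x[:, None] * W1                       # joint p(x, y)
p_xz = p_xy @ W2                               # p(x, z) = sum_y p(x, y) p(z|y)

print(mutual_info(p_xy), mutual_info(p_xz))    # ~0.531 vs ~0.173 bits
print(mutual_info(p_xy) >= mutual_info(p_xz))  # True: data-processing inequality
```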