, where Z is an arbitrary set. We assume that there is z∗ ∈ Z with f(z∗) ≥ f(z), for all z ∈ Z. We also assume that there is a nonnegative function g(w, z), with g(z, z) = 0 for all z ∈ Z, such that

H(w, z) = f(z) − g(w, z).
Having found z^k, we maximize the function

H(z^k, z) = f(z) − g(z^k, z)

to get z^{k+1}. Adopting such an iterative approach presupposes that maximizing H(z^k, z) is simpler than maximizing f(z) itself. This is the case with the EM algorithm.
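As a concrete illustration (not taken from the text), the scheme can be sketched in Python with a toy objective f(z) = log z − z, maximized at z = 1, and the hypothetical penalty g(w, z) = (z − w)². With these choices the surrogate H(z^k, z) = f(z) − (z − z^k)² has a closed-form maximizer, so each step is indeed simpler than maximizing f directly:

```python
import math

def f(z):
    # toy objective, maximized at z = 1
    return math.log(z) - z

def next_iterate(zk):
    # maximize H(zk, z) = f(z) - (z - zk)**2 in closed form:
    # setting the derivative 1/z - 1 - 2(z - zk) to zero and
    # multiplying by z gives 2z**2 + (1 - 2*zk)z - 1 = 0;
    # take the positive root of this quadratic
    c = 1 - 2 * zk
    return (-c + math.sqrt(c * c + 8)) / 4

z = 5.0
values = [f(z)]
for _ in range(200):
    z = next_iterate(z)
    values.append(f(z))
# values is nondecreasing, and z approaches the maximizer z = 1
```

Since g ≥ 0 and g(z, z) = 0, each step satisfies f(z^{k+1}) ≥ H(z^k, z^{k+1}) ≥ H(z^k, z^k) = f(z^k), which is why the sequence of objective values is monotone.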
The cross-entropy or Kullback-Leibler distance is a useful tool for analyzing the EM algorithm. For positive numbers u and v, the Kullback-Leibler distance from u to v is

KL(u, v) = u log(u/v) + v − u.
We also define KL(0, 0) = 0, KL(0, v) = v, and KL(u, 0) = +∞. The KL distance is extended to nonnegative vectors component-wise, so that for nonnegative vectors a and b, we have

KL(a, b) = Σ_n KL(a_n, b_n).
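These definitions translate directly into a short Python sketch (the function names are mine, not the text's): a scalar KL distance implementing the conventions for zero arguments, and its component-wise extension to vectors.

```python
import math

def kl_scalar(u, v):
    """KL(u, v) = u log(u/v) + v - u for u, v > 0,
    with KL(0, 0) = 0, KL(0, v) = v, and KL(u, 0) = +inf."""
    if u == 0:
        return v          # covers both KL(0, 0) = 0 and KL(0, v) = v
    if v == 0:
        return math.inf   # KL(u, 0) = +inf for u > 0
    return u * math.log(u / v) + v - u

def kl(a, b):
    """Component-wise extension to nonnegative vectors a and b."""
    return sum(kl_scalar(u, v) for u, v in zip(a, b))
```

For example, kl([1, 2, 3], [1, 2, 3]) is 0, and the distance is strictly positive whenever the vectors differ, since each scalar term u log(u/v) + v − u is nonnegative and vanishes only when u = v.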
One of the most useful and easily proved facts about the KL distance is contained in the following lemma; we simplify the notation by setting b(z) = b(x, z).
For nonnegative vectors a and b, with