Analysis for Imaging

We pay special attention to the case d = 2, which is the most important case for image processing and image analysis applications.

The chapter is organized as follows. Section 2 presents central tools from functional analysis in Hilbert spaces, e.g., the pseudo-inverse of a bounded operator and the central facts from frame theory. In Sect. 3, we introduce several operators that play important roles in Gabor analysis. Gabor frames on ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ are introduced in Sect. 4, and their discrete counterparts are treated in Sect. 5. Finally, the application of Gabor expansions to image representation is considered in Sect. 6.

2 Tools from Functional Analysis

In this section, we recall basic facts from functional analysis. Unless another reference is given, a proof can be found in [17]. In the entire section, $\mathcal{H}$ denotes a separable Hilbert space with inner product $\langle \cdot,\cdot \rangle$ .

The Pseudo-inverse Operator

It is well known that an arbitrary matrix has a pseudo-inverse, which can be used to find the minimal-norm least squares solution of a linear system. In the case of an operator on infinite-dimensional Hilbert spaces, one has to restrict attention to linear operators with closed range in order to obtain a pseudo-inverse. Observe that a bounded operator U (we always assume linearity) on a Hilbert space $\mathcal{H}$ is invertible if and only if it is injective and surjective, while injectivity combined with a dense range is not sufficient in the infinite-dimensional case. However, if the range of U is closed, there exists a “right-inverse operator” $U^{\dag }$ in the following sense:

Lemma 1.

Let $\mathcal{H},\mathcal{K}$ be Hilbert spaces, and suppose that $U: \mathcal{K}\rightarrow \mathcal{H}$ is a bounded operator with closed range $\mathcal{R}_{U}$ . Then there exists a bounded operator $U^{\dag }: \mathcal{H}\rightarrow \mathcal{K}$ for which

$\displaystyle\begin{array}{rcl} UU^{\dag }x = x,\ \forall x \in \mathcal{R}_{ U}.& &{}\end{array}$

(1)

Proof.

Consider the operator obtained by taking the restriction of U to the orthogonal complement of the kernel of U, i.e., let

$\displaystyle{\tilde{U}:= U_{\mid \mathcal{N}_{U}^{\perp }}: \mathcal{N}_{U}^{\perp }\rightarrow \mathcal{H}.}$

Obviously, $\tilde{U}$ is linear and bounded. $\tilde{U}$ is also injective: if $\tilde{U}x = 0$ , it follows that $x \in \mathcal{N}_{U}^{\perp }\cap \mathcal{N}_{U} =\{ 0\}.$ We prove next that the range of $\tilde{U}$ equals the range of U. Given $y \in \mathcal{R}_{U}$ , there exists $x \in \mathcal{K}$ such that $Ux = y$. By writing $x = x_{1} + x_{2}$, where $x_{1} \in \mathcal{N}_{U}^{\perp },\ x_{2} \in \mathcal{N}_{U}$ , we obtain that

$\displaystyle{\tilde{U}x_{1} = Ux_{1} = U(x_{1} + x_{2}) = Ux = y.}$

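Since $\mathcal{R}_{U}$ is closed, it is itself a Hilbert space, and $\tilde{U}$ is a bounded bijection of $\mathcal{N}_{U}^{\perp }$ onto $\mathcal{R}_{U}$ . It follows from Banach’s theorem that $\tilde{U}$ has a bounded inverse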

$\displaystyle\begin{array}{rcl} \tilde{U}^{-1}: \mathcal{R}_{ U} \rightarrow \mathcal{N}_{U}^{\perp }.& & {}\\ \end{array}$

Extending $\tilde{U}^{-1}$ by zero on the orthogonal complement of $\mathcal{R}_{U}$ , we obtain a bounded operator $U^{\dag }: \mathcal{H}\rightarrow \mathcal{K}$ for which $UU^{\dag }x = x$ for all $x \in \mathcal{R}_{U}$ . ■

The operator $U^{\dag }$ constructed in the proof of Lemma 1 is called the pseudo-inverse of U. In the literature, one will often see the pseudo-inverse of an operator U defined as the unique operator $U^{\dag }$ satisfying that

$\displaystyle\begin{array}{rcl} \mathcal{N}_{U^{\dag }} = \mathcal{R}_{U}^{\perp },\ \ \mathcal{R}_{ U^{\dag }} = \mathcal{N}_{U}^{\perp },\ \mbox{ and}\ UU^{\dag }x = x,x \in \mathcal{R}_{ U};& &{}\end{array}$

(2)

this definition is equivalent to the above construction. We collect some properties of $U^{\dag }$ and its relationship to U.

Lemma 2.

Let $U: \mathcal{K}\rightarrow \mathcal{H}$ be a bounded operator with closed range. Then the following holds:

(i)

The orthogonal projection of $\mathcal{H}$ onto $\mathcal{R}_{U}$ is given by $UU^{\dag }$.

(ii)

The orthogonal projection of $\mathcal{K}$ onto $\mathcal{R}_{U^{\dag }}$ is given by $U^{\dag }U$.

(iii)

$U^{{\ast}}$ has closed range, and $(U^{{\ast}})^{\dag } = (U^{\dag })^{{\ast}}$.

(iv)

On $\mathcal{R}_{U}$ , the operator $U^{\dag }$ is given explicitly by

$\displaystyle\begin{array}{rcl} U^{\dag } = U^{{\ast}}(UU^{{\ast}})^{-1}.& & {}\end{array}$

(3)
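In finite dimensions every range is closed, so Lemmas 1 and 2 can be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical random test matrices `U` and `V` that are not part of the text; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse of a matrix. The explicit formula (3) is checked only in the surjective case, where $UU^{{\ast}}$ is invertible on the whole space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-2 matrix U: R^5 -> R^3, neither injective nor surjective.
U = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))
U_dag = np.linalg.pinv(U, rcond=1e-10)   # Moore-Penrose pseudo-inverse

def is_orthogonal_projection(P):
    """True if P is idempotent and self-adjoint (up to rounding)."""
    return np.allclose(P @ P, P) and np.allclose(P, P.conj().T)

y = U @ rng.standard_normal(5)           # some element of the range R_U

print(np.allclose(U @ U_dag @ y, y))                            # property (1)
print(is_orthogonal_projection(U @ U_dag))                      # Lemma 2(i)
print(is_orthogonal_projection(U_dag @ U))                      # Lemma 2(ii)
print(np.allclose(np.linalg.pinv(U.T, rcond=1e-10), U_dag.T))   # Lemma 2(iii)

# Lemma 2(iv) for a surjective example: a random 3x5 Gaussian matrix has
# full row rank with probability one, so V V^* is invertible.
V = rng.standard_normal((3, 5))
V_dag_explicit = V.T @ np.linalg.inv(V @ V.T)
print(np.allclose(np.linalg.pinv(V), V_dag_explicit))
```

The explicit cutoff `rcond=1e-10` is a design choice: it makes sure that the rounding-level third singular value of the rank-2 matrix is treated as zero.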

Bessel Sequences in Hilbert Spaces

When we deal with infinite-dimensional vector spaces, we need to consider expansions in terms of infinite series. The purpose of this section is to introduce a condition that ensures that the relevant infinite series actually converge. When speaking about a sequence $\{f_{k}\}_{k=1}^{\infty }$ in $\mathcal{H}$ , we mean an ordered set, i.e., $\{f_{k}\}_{k=1}^{\infty } =\{ f_{1},f_{2},\ldots \}.$ That we have chosen to index the sequence by the natural numbers is just for convenience.

Definition 1.

A sequence $\{f_{k}\}_{k=1}^{\infty }$ in $\mathcal{H}$ is called a Bessel sequence if there exists a constant B > 0 such that

$\displaystyle\begin{array}{rcl} \sum _{k=1}^{\infty }\vert \langle f,f_{ k}\rangle \vert ^{2} \leq B\ \vert \vert f\vert \vert ^{2},\ \forall f \in \mathcal{H}.& &{}\end{array}$

(4)

Any number B satisfying (4) is called a Bessel bound for $\{f_{k}\}_{k=1}^{\infty }$ . The optimal bound for a given Bessel sequence $\{f_{k}\}_{k=1}^{\infty }$ is the smallest possible value of B > 0 satisfying (4). Except for the case $f_{k} = 0,\ \forall k \in \mathbb{N}$ , the optimal bound always exists.

Theorem 1.

Let $\{f_{k}\}_{k=1}^{\infty }$ be a sequence in $\mathcal{H}$ and B > 0 be given. Then $\{f_{k}\}_{k=1}^{\infty }$ is a Bessel sequence with Bessel bound B if and only if

$\displaystyle{T:\{ c_{k}\}_{k=1}^{\infty }\rightarrow \sum _{ k=1}^{\infty }c_{ k}f_{k}}$

defines a bounded operator from $\ell^{2}(\mathbb{N})$ into $\mathcal{H}$ and $\vert \vert T\vert \vert \leq \sqrt{B}$ .

The operator T is called the synthesis operator. The adjoint $T^{{\ast}}$ is called the analysis operator and is given by

$\displaystyle\begin{array}{rcl} T^{{\ast}}: \mathcal{H}\rightarrow \ell^{2}(\mathbb{N}),\ \ T^{{\ast}}f =\{\langle f,f_{ k}\rangle \}_{k=1}^{\infty }.& & {}\\ \end{array}$

These operators play key roles in the theory of frames, to be considered in section “Frames and Their Properties.”
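For a finite family of vectors in $\mathbb{C}^{n}$, the synthesis operator is simply the matrix whose columns are the vectors $f_{k}$, and the analysis operator is its conjugate transpose. The sketch below assumes NumPy and a hypothetical random family `F`; it illustrates that the squared operator norm of T is a Bessel bound (in fact the optimal one in this finite setting) and checks the adjoint relation between T and $T^{{\ast}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 7
# Columns of F are hypothetical vectors f_1, ..., f_m in C^n.
F = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

def synthesis(c):
    """T c = sum_k c_k f_k."""
    return F @ c

def analysis(f):
    """T^* f = (<f, f_k>)_k with <u, v> = sum_i u_i conj(v_i)."""
    return F.conj().T @ f

B_opt = np.linalg.norm(F, 2) ** 2     # squared spectral norm ||T||^2

f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
c = rng.standard_normal(m) + 1j * rng.standard_normal(m)

bessel_sum = np.sum(np.abs(analysis(f)) ** 2)
print(bessel_sum <= B_opt * np.linalg.norm(f) ** 2 + 1e-9)   # inequality (4)

# Adjoint relation <T c, f> = <c, T^* f>; np.vdot conjugates its first argument.
print(np.isclose(np.vdot(f, synthesis(c)), np.vdot(analysis(f), c)))
```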

The Bessel condition (4) remains the same, regardless of how the elements $\{f_{k}\}_{k=1}^{\infty }$ are numbered. This leads to a very important consequence of Theorem 1:

Corollary 1.

If $\{f_{k}\}_{k=1}^{\infty }$ is a Bessel sequence in $\mathcal{H}$ , then $\sum _{k=1}^{\infty }c_{k}f_{k}$ converges unconditionally for all $\{c_{k}\}_{k=1}^{\infty }\in \ell^{2}(\mathbb{N})$ , i.e., the series is convergent, irrespective of how and in which order the summation is realized.

Thus, a reordering of the elements in $\{f_{k}\}_{k=1}^{\infty }$ will not affect the series $\sum _{k=1}^{\infty }c_{k}f_{k}$ when $\{c_{k}\}_{k=1}^{\infty }$ is reordered the same way: the series will converge toward the same element as before. For this reason, we can choose an arbitrary indexing of the elements in the Bessel sequence; in particular, it is not a restriction that we present all results with the natural numbers as index set. As we will see in the sequel, all orthonormal bases and frames are Bessel sequences.

General Bases and Orthonormal Bases

We will now briefly consider bases in Hilbert spaces. In particular, we will discuss orthonormal bases, which are the infinite-dimensional counterparts of the canonical bases in $\mathbb{C}^{n}$ . Orthonormal bases are widely used in mathematics as well as physics, signal processing, and many other areas where one needs to represent functions in terms of “elementary building blocks.”

Definition 2.

Consider a sequence $\{e_{k}\}_{k=1}^{\infty }$ of vectors in $\mathcal{H}$ .

(i)

The sequence $\{e_{k}\}_{k=1}^{\infty }$ is a (Schauder) basis for $\mathcal{H}$ if for each $f \in \mathcal{H}$ , there exist unique scalar coefficients $\{c_{k}(f)\}_{k=1}^{\infty }$ such that

$\displaystyle\begin{array}{rcl} f =\sum _{ k=1}^{\infty }c_{ k}(f)e_{k}.& & {}\end{array}$

(5)

(ii)

A basis $\{e_{k}\}_{k=1}^{\infty }$ is an unconditional basis if the series (5) converges unconditionally for each $f \in \mathcal{H}$ .

(iii)

A basis $\{e_{k}\}_{k=1}^{\infty }$ is an orthonormal basis if $\{e_{k}\}_{k=1}^{\infty }$ is an orthonormal system, i.e., if

$\displaystyle\begin{array}{rcl} \langle e_{k},e_{j}\rangle =\delta _{k,j} = \left \{\begin{array}{lll} 1\quad \mbox{ if}\quad k = j,\\ 0\quad \mbox{ if} \quad k\neq j.\end{array} \right.& & {}\\ \end{array}$

An orthonormal basis leads to an expansion of the type (5) with an explicit expression for the coefficients $c_{k}(f)$:

Theorem 2.

If $\{e_{k}\}_{k=1}^{\infty }$ is an orthonormal basis, then each $f \in \mathcal{H}$ has an unconditionally convergent expansion

$\displaystyle\begin{array}{rcl} f =\sum _{ k=1}^{\infty }\langle f,e_{ k}\rangle e_{k}.& &{}\end{array}$

(6)

In practice, orthonormal bases are certainly the most convenient bases to use: for other types of bases, the representation (6) has to be replaced by a more complicated expression. Unfortunately, the conditions for $\{e_{k}\}_{k=1}^{\infty }$ being an orthonormal basis are strong, and often it is impossible to construct orthonormal bases satisfying extra conditions. We discuss this in more detail later. Note also that it is not always a good idea to use the Gram–Schmidt orthonormalization procedure to construct an orthonormal basis from a given basis: it might destroy special properties of the basis at hand. For example, the special structure of a Gabor basis (to be discussed later) will be lost.

Frames and Their Properties

We are now ready to introduce one of the central subjects:

Definition 3.

A sequence $\{f_{k}\}_{k=1}^{\infty }$ of elements in $\mathcal{H}$ is a frame for $\mathcal{H}$ if there exist constants A, B > 0 such that

$\displaystyle\begin{array}{rcl} A\ \vert \vert f\vert \vert ^{2} \leq \sum _{ k=1}^{\infty }\vert \langle f,f_{ k}\rangle \vert ^{2} \leq B\ \vert \vert f\vert \vert ^{2},\quad \forall f \in \mathcal{H}.& &{}\end{array}$

(7)

The numbers A and B are called frame bounds. A special role is played by frames for which the optimal frame bounds coincide:

Definition 4.

A sequence $\{f_{k}\}_{k=1}^{\infty }$ in $\mathcal{H}$ is a tight frame if there exists a number A > 0 such that

$\displaystyle\begin{array}{rcl} \sum _{k=1}^{\infty }\vert \langle f,f_{ k}\rangle \vert ^{2} = A\,\vert \vert f\vert \vert ^{2},\quad \forall f \in \mathcal{H}.& & {}\\ \end{array}$

The number A is called the frame bound.

Since a frame $\{f_{k}\}_{k=1}^{\infty }$ is a Bessel sequence, the operator

$\displaystyle\begin{array}{rcl} T: \ell^{2}(\mathbb{N}) \rightarrow \mathcal{H},\ T\{c_{ k}\}_{k=1}^{\infty } =\sum _{ k=1}^{\infty }c_{ k}f_{k}& &{}\end{array}$

(8)

is bounded by Theorem 1. Composing T and $T^{{\ast}}$, we obtain the frame operator

$\displaystyle\begin{array}{rcl} S: \mathcal{H}\rightarrow \mathcal{H},\ \ Sf = TT^{{\ast}}f =\sum _{ k=1}^{\infty }\langle f,f_{ k}\rangle f_{k}.& &{}\end{array}$

(9)

The frame decomposition, stated in (10) below, is the most important frame result. It shows that if $\{f_{k}\}_{k=1}^{\infty }$ is a frame for $\mathcal{H}$ , then every element in $\mathcal{H}$ has a representation as an infinite linear combination of the frame elements. Thus, it is natural to view a frame as a “generalized basis.”

Theorem 3.

Let $\{f_{k}\}_{k=1}^{\infty }$ be a frame with frame operator S. Then

$\displaystyle\begin{array}{rcl} f =\sum _{ k=1}^{\infty }\langle f,S^{-1}f_{ k}\rangle f_{k},\quad \forall f \in \mathcal{H},& &{}\end{array}$

(10)

and

$\displaystyle\begin{array}{rcl} f =\sum _{ k=1}^{\infty }\langle f,f_{ k}\rangle S^{-1}f_{ k},\quad \forall f \in \mathcal{H}.& &{}\end{array}$

(11)

Both series converge unconditionally for all $f \in \mathcal{H}$ .

Theorem 3 shows that all information about a given vector $f \in \mathcal{H}$ is contained in the sequence $\{\langle f,S^{-1}f_{k}\rangle \}_{k=1}^{\infty }$ . The numbers $\langle f,S^{-1}f_{k}\rangle$ are called frame coefficients. The sequence $\{S^{-1}f_{k}\}_{k=1}^{\infty }$ is also a frame; it is called the canonical dual frame of $\{f_{k}\}_{k=1}^{\infty }$ .
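In $\mathbb{C}^{n}$, any finite spanning family is a frame, and Theorem 3 reduces to linear algebra. The following sketch assumes NumPy and a hypothetical random family `F` (columns $f_{k}$); it computes the frame operator, the optimal frame bounds as extreme eigenvalues of S, the canonical dual frame, and both expansions (10) and (11).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 6
# Columns f_k: six random vectors in C^3, spanning C^3 with probability one.
F = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

S = F @ F.conj().T                    # frame operator S = T T^*
eigs = np.linalg.eigvalsh(S)          # real eigenvalues in ascending order
A, B = eigs[0], eigs[-1]              # optimal frame bounds

F_dual = np.linalg.solve(S, F)        # columns S^{-1} f_k: canonical dual frame

f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
frame_coeffs = F_dual.conj().T @ f    # frame coefficients <f, S^{-1} f_k>
coeffs = F.conj().T @ f               # <f, f_k>

f_from_10 = F @ frame_coeffs          # expansion (10)
f_from_11 = F_dual @ coeffs           # expansion (11)

print(A, B)
print(np.allclose(f_from_10, f), np.allclose(f_from_11, f))
```

Solving the linear system with `np.linalg.solve` instead of forming $S^{-1}$ explicitly is a numerical design choice; it does not change the mathematics.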

Theorem 3 also immediately reveals one of the main difficulties in frame theory. In fact, in order for the expansions (10) and (11) to be applicable in practice, we need to be able to find the operator $S^{-1}$ or at least to calculate its action on all $f_{k},\ k \in \mathbb{N}$ . In general, this is a major problem. One way of circumventing the problem is to consider only tight frames:

Corollary 2.

If $\{f_{k}\}_{k=1}^{\infty }$ is a tight frame with frame bound A, then the canonical dual frame is $\{A^{-1}f_{k}\}_{k=1}^{\infty }$ , and

$\displaystyle\begin{array}{rcl} f = \frac{1} {A}\sum _{k=1}^{\infty }\langle f,f_{ k}\rangle f_{k},\quad \forall f \in \mathcal{H}.& &{}\end{array}$

(12)

By a suitable scaling of the vectors $\{f_{k}\}_{k=1}^{\infty }$ in a tight frame, we can always obtain that A = 1; in that case, (12) has exactly the same form as the representation via an orthonormal basis; see (6). Thus, such frames can be used without any additional computational effort compared with the use of orthonormal bases; however, the family does not have to be linearly independent now.
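A standard finite-dimensional illustration (not taken from the text) is the family of three unit vectors in $\mathbb{R}^{2}$ at mutual angles of 120 degrees, often called the Mercedes-Benz frame. The sketch below, assuming NumPy, shows that it is a tight frame with A = 3/2, that (12) reconstructs a vector without inverting any operator, and that the rescaled family has frame bound 1 although its vectors are linearly dependent.

```python
import numpy as np

# Three unit vectors in R^2 at angles 90, 210, and 330 degrees (columns of F).
angles = np.deg2rad([90.0, 210.0, 330.0])
F = np.vstack([np.cos(angles), np.sin(angles)])

S = F @ F.T                      # frame operator; equals A * identity for a tight frame
A = 1.5

f = np.array([0.3, -1.2])
coeffs = F.T @ f                 # <f, f_k>
f_rec = (F @ coeffs) / A         # reconstruction formula (12)

F_scaled = F / np.sqrt(A)        # rescaled vectors: tight frame with bound 1

print(S)
print(np.allclose(f_rec, f))
print(np.allclose(F.sum(axis=1), 0.0))   # f_1 + f_2 + f_3 = 0: linearly dependent
```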

Tight frames have other advantages. For the design of frames with prescribed properties, it is essential to control the behavior of the canonical dual frame, but the complicated structure of the frame operator and its inverse makes this difficult. If, e.g., we consider a frame $\{f_{k}\}_{k=1}^{\infty }$ for $L^{2}(\mathbb{R})$ consisting of functions with exponential decay, nothing guarantees that the functions in the canonical dual frame $\{S^{-1}f_{k}\}_{k=1}^{\infty }$ have exponential decay. However, for tight frames, questions of this type trivially have satisfactory answers, because the canonical dual frame equals the original one up to the constant factor $A^{-1}$. Also, for a tight frame, the canonical dual frame automatically has the same structure as the frame itself: if the frame has Gabor structure (to be described in Sect. 4), the same is the case for the canonical dual frame.

There is another way to avoid the problem of inverting the frame operator S. A frame that is not a basis is said to be overcomplete; in the literature, the term redundant frame is also used. For frames $\{f_{k}\}_{k=1}^{\infty }$ that are not bases, one can replace the canonical dual $\{S^{-1}f_{k}\}_{k=1}^{\infty }$ by other frames:

Theorem 4.

Assume that $\{f_{k}\}_{k=1}^{\infty }$ is an overcomplete frame. Then there exist frames $\left \{g_{k}\right \}_{k=1}^{\infty }\neq \{S^{-1}f_{k}\}_{k=1}^{\infty }$ for which

$\displaystyle\begin{array}{rcl} f =\sum _{ k=1}^{\infty }\langle f,g_{ k}\rangle f_{k},\quad \forall f \in \mathcal{H}.& &{}\end{array}$

(13)

A frame $\left \{g_{k}\right \}_{k=1}^{\infty }$ satisfying (13) is called a dual frame of $\{f_{k}\}_{k=1}^{\infty }$ . The hope is to find dual frames that are easier to calculate or have better properties than the canonical dual. Examples of this type can be found in [17].
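In finite dimensions, alternative dual frames are easy to produce. The sketch below assumes NumPy, a hypothetical random overcomplete frame `F` for $\mathbb{R}^{2}$, and an arbitrary matrix `H`; it uses the parametrization $g_{k} = S^{-1}f_{k} + h_{k} -\sum _{j}\langle S^{-1}f_{k},f_{j}\rangle h_{j}$, which in matrix form reads `G = S^{-1} F + H (I - F^T S^{-1} F)`. That this family satisfies (13) follows from a one-line computation, and the code confirms it numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 4
F = rng.standard_normal((n, m))          # columns f_k: overcomplete frame for R^2
S = F @ F.T                              # frame operator
G_canonical = np.linalg.solve(S, F)      # columns S^{-1} f_k

H = rng.standard_normal((n, m))          # arbitrary vectors h_k (hypothetical choice)
G_alt = G_canonical + H @ (np.eye(m) - F.T @ G_canonical)

f = np.array([1.0, -2.0])
f_rec = F @ (G_alt.T @ f)                # sum_k <f, g_k> f_k, i.e., formula (13)

print(np.allclose(f_rec, f))
print(np.allclose(G_alt, G_canonical))   # the two duals differ
```

Each choice of `H` yields a dual frame, so one may search within this family for a dual with better properties than the canonical one.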

3 Operators

In this section, we introduce several operators that play key roles in Gabor analysis. In particular, we will need the basic properties of the localized Fourier transform, which is called the STFT (short-time Fourier transform). It is natural for us to start with the Fourier transform, which is defined as an integral transform on the space of all (Lebesgue) integrable functions, denoted by ${\mathrm{L}}^{1}(\mathbb{R}^{d})$ .

The Fourier Transform

Definition 5.

For $f \in {\mathrm{L}}^{1}(\mathbb{R}^{d})$ , the Fourier transform is defined as

$\displaystyle{ \hat{f}(\omega ):= (\mathcal{F}f)(\omega ):=\int _{\mathbb{R}^{d}}f(x)\,e^{-2\pi {\mathrm{i}}x\cdot \omega }\,dx, }$

(14)

where $x\cdot \omega =\sum _{ k=1}^{d}x_{k}\omega _{k}$ is the usual scalar product of vectors in $\mathbb{R}^{d}$ .

Lemma 3 (Riemann–Lebesgue).

If $f \in {\mathrm{L}}^{1}(\mathbb{R}^{d})$ , then $\hat{f}$ is uniformly continuous and $\lim _{\vert \omega \vert \rightarrow \infty }\vert \hat{f}(\omega )\vert = 0$ .

The Fourier transform yields a continuous bijection from the Schwartz space $\mathcal{S}(\mathbb{R}^{d})$ to $\mathcal{S}(\mathbb{R}^{d})$ . This follows from the fact that it turns analytic operations (differentiation) into multiplication with polynomials and vice versa:

$\displaystyle{ \mathcal{F}(D^{\alpha }f) = (2\pi {\mathrm{i}})^{\vert \alpha \vert }X^{\alpha }(\mathcal{F}f) }$

(15)

and

$\displaystyle{ D^{\alpha }(\mathcal{F}f) = (-2\pi {\mathrm{i}})^{\vert \alpha \vert }\mathcal{F}(X^{\alpha }f), }$

(16)

with a multi-index $\alpha = (\alpha _{1},\ldots,\alpha _{d}) \in \mathbb{N}_{0}^{d}$ , $\vert \alpha \vert:=\sum _{ i=1}^{d}\alpha _{i}$ , $D^{\alpha }$ as differential operator

$\displaystyle{D^{\alpha }f(x):= \frac{\partial ^{\alpha _{1}}\cdots \partial ^{\alpha _{d}}} {\partial x_{1}^{\alpha _{1}}\cdots \partial x_{d}^{\alpha _{d}}}f(x_{1},\ldots,x_{d})}$

and $X^{\alpha }$ as multiplication operator $(X^{\alpha }f)(x):= x_{1}^{\alpha _{1}}\cdots x_{d}^{\alpha _{d}}f(x_{1},\ldots,x_{d})$ . It follows from the definition that $\mathcal{S}(\mathbb{R}^{d})$ is invariant under these operations, i.e.,

$\displaystyle{X^{\alpha }f \in \mathcal{S}(\mathbb{R}^{d})\quad {\mathrm{and}}\quad D^{\alpha }f \in \mathcal{S}(\mathbb{R}^{d})\quad \forall \alpha \in \mathbb{N}_{ 0}^{d}\quad \forall f \in \mathcal{S}(\mathbb{R}^{d}).}$

Using the reflection operator $(\mathcal{I}f)(x):= f(-x)$ , one can show that $\mathcal{F}^{2} = \mathcal{I}$ and so $\mathcal{F}^{4} =\mathop{ {\mathrm{Id}}}\nolimits _{\mathcal{S}(\mathbb{R}^{d})}$ . This yields

$\displaystyle{ \mathcal{F}^{-1} = \mathcal{I}\mathcal{F} }$

(17)

and we can give an inversion formula explicitly:

Theorem 5 (Inversion Formula).

The Fourier transform is a bijection from $\mathcal{S}(\mathbb{R}^{d})$ to $\mathcal{S}(\mathbb{R}^{d})$ , and the inverse operator is given by

$\displaystyle{ (\mathcal{F}^{-1}f)(x) =\int _{ \mathbb{R}^{d}}f(\omega )\,e^{2\pi {\mathrm{i}}x\cdot \omega }\,d\omega \quad \forall x \in \mathbb{R}^{d}. }$

(18)

Furthermore,

$\displaystyle{\langle \mathcal{F}f,\mathcal{F}g\rangle _{{\mathrm{L}}^{2}} =\langle f,g\rangle _{{\mathrm{L}}^{2}}\quad \forall f,g \in \mathcal{S}(\mathbb{R}^{d}).}$

We can extend the Fourier transform to an isometric operator on all of ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ . We will use the same symbol $\mathcal{F}$ although the Fourier transform on ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ is not defined by a Lebesgue integral (14) anymore if $f \in {\mathrm{L}}^{2}\setminus {\mathrm{L}}^{1}(\mathbb{R}^{d})$ , but rather by means of summability methods. Moreover, $\mathcal{F}f$ should be viewed as an equivalence class of functions, rather than a pointwise given function.

Theorem 6 (Plancherel).

If $f \in {\mathrm{L}}^{1} \cap {\mathrm{L}}^{2}(\mathbb{R}^{d})$ , then

$\displaystyle{ \Vert f\Vert _{{\mathrm{L}}^{2}} = \Vert \mathcal{F}f\Vert _{{\mathrm{L}}^{2}}. }$

(19)

As a consequence, $\mathcal{F}$ extends in a unique way to a unitary operator on ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ that satisfies Parseval’s formula

$\displaystyle{ \langle f,g\rangle _{{\mathrm{L}}^{2}} =\langle \mathcal{F}f,\mathcal{F}g\rangle _{{\mathrm{L}}^{2}}\quad \forall f,g \in {\mathrm{L}}^{2}(\mathbb{R}^{d}). }$

(20)

In signal analysis, the isometry of the Fourier transform has the interpretation that it preserves the energy of a signal. For more details on the role of the Schwartz class for the Fourier transform, see [78, V].
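The unitarily normalized discrete Fourier transform is a finite analog of $\mathcal{F}$ and shares the properties above. The following sketch assumes NumPy and hypothetical random 2D arrays standing in for images (the case d = 2); it is a discrete illustration, not the continuous transform itself. It checks energy preservation (19), Parseval’s formula (20), the identity $\mathcal{F}^{2} = \mathcal{I}$, and the inversion rule (17).

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.standard_normal((8, 6)) + 1j * rng.standard_normal((8, 6))
g = rng.standard_normal((8, 6)) + 1j * rng.standard_normal((8, 6))

def F2d(u):
    """Unitary 2D DFT, a discrete stand-in for the Fourier transform."""
    return np.fft.fft2(u, norm="ortho")

def inner(u, v):
    """<u, v> = sum of u * conj(v); np.vdot conjugates its first argument."""
    return np.vdot(v, u)

def reflect(u):
    """(I u)[j1, j2] = u[-j1 mod N1, -j2 mod N2]."""
    return np.roll(u[::-1, ::-1], shift=(1, 1), axis=(0, 1))

print(np.isclose(np.linalg.norm(f), np.linalg.norm(F2d(f))))       # (19)
print(np.isclose(inner(f, g), inner(F2d(f), F2d(g))))              # (20)
print(np.allclose(F2d(F2d(f)), reflect(f)))                        # F^2 = I
print(np.allclose(reflect(F2d(f)), np.fft.ifft2(f, norm="ortho"))) # (17)
```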

Translation and Modulation

Definition 6.

For $x,\omega \in \mathbb{R}^{d}$ , we define the translation operator $T_{x}$ by

$\displaystyle{ (T_{x}f)(t):= f(t - x) }$

(21)

and the modulation operator $M_{\omega }$ by

$\displaystyle{ (M_{\omega }f)(t):= e^{2\pi {\mathrm{i}}\omega \cdot t}f(t). }$

(22)

One has $T_{x}^{-1} = T_{-x}$ and $M_{\omega }^{-1} = M_{-\omega }$. The operator $T_{x}$ is called a time shift and $M_{\omega }$ a frequency shift. Operators of the form $T_{x}M_{\omega }$ or $M_{\omega }T_{x}$ are called time–frequency shifts (TF-shifts). They satisfy the commutation relations

$\displaystyle{ T_{x}M_{\omega } = e^{-2\pi {\mathrm{i}}x\cdot \omega }M_{\omega }T_{ x}. }$

(23)

Time–frequency shifts are isometries on ${\mathrm{L}}^{p}$ for all $1 \leq p \leq \infty$, i.e.,

$\displaystyle{\Vert T_{x}M_{\omega }f\Vert _{{\mathrm{L}}^{p}} = \Vert f\Vert _{{\mathrm{L}}^{p}}.}$

The interplay of TF-shifts with the Fourier transform is as follows:

$\displaystyle{ \widehat{T_{x}f} = M_{-x}\hat{f}\quad {\mathrm{or}}\quad \mathcal{F}T_{x} = M_{-x}\mathcal{F} }$

(24)

and

$\displaystyle{ \widehat{M_{\omega }f} = T_{\omega }\hat{f}\quad {\mathrm{or}}\quad \mathcal{F}M_{\omega } = T_{\omega }\mathcal{F}. }$

(25)

Equation (25) explains why modulations are also called frequency shifts: modulations become translations on the Fourier transform side. Altogether, we have

$\displaystyle{\widehat{T_{x}M_{\omega }f} = M_{-x}T_{\omega }\hat{f} = e^{-2\pi {\mathrm{i}}x\cdot \omega }T_{\omega }M_{ -x}\hat{f}.}$
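The relations (23) to (25) have exact counterparts for vectors of length N with cyclic indexing. The sketch below assumes NumPy; the helper names `translate` and `modulate` are hypothetical, and the unnormalized DFT `np.fft.fft` plays the role of $\mathcal{F}$.

```python
import numpy as np

N = 12
j = np.arange(N)
rng = np.random.default_rng(5)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def translate(u, k):
    """(T_k u)[j] = u[(j - k) mod N]."""
    return np.roll(u, k)

def modulate(u, l):
    """(M_l u)[j] = exp(2 pi i l j / N) u[j]."""
    return np.exp(2j * np.pi * l * j / N) * u

k, l = 3, 5
phase = np.exp(-2j * np.pi * k * l / N)
f_hat = np.fft.fft(f)

# Commutation relation (23)
print(np.allclose(translate(modulate(f, l), k), phase * modulate(translate(f, k), l)))
# Relations (24) and (25)
print(np.allclose(np.fft.fft(translate(f, k)), modulate(f_hat, -k)))
print(np.allclose(np.fft.fft(modulate(f, l)), translate(f_hat, l)))
# Combined formula for the transform of a TF-shift
print(np.allclose(np.fft.fft(translate(modulate(f, l), k)),
                  phase * translate(modulate(f_hat, -k), l)))
```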

Convolution, Involution, and Reflection

Definition 7.

The convolution of two functions $f,g \in {\mathrm{L}}^{1}(\mathbb{R}^{d})$ is the function f ∗ g defined by

$\displaystyle{ (f {\ast} g)(x):=\int _{\mathbb{R}^{d}}f(y)\,g(x - y)\,dy. }$

(26)

It satisfies

$\displaystyle{\Vert f {\ast} g\Vert _{{\mathrm{L}}^{1}} \leq \Vert f\Vert _{{\mathrm{L}}^{1}}\Vert g\Vert _{{\mathrm{L}}^{1}}\quad \mbox{ and}\quad \widehat{f {\ast} g} =\hat{ f} \cdot \hat{ g}.}$

One may view f ∗ g as f being “smeared” by g and vice versa. One can thus smoothen a function by convolving it with a narrow bump function.

Definition 8.

The involution of a function is defined by

$\displaystyle{ f^{{\ast}}(x):= \overline{f(-x)}. }$

(27)

It follows that

$\displaystyle{\widehat{f^{{\ast}}} =\bar{\hat{ f}}\quad {\mathrm{and}}\quad \widehat{\mathcal{I}f} = \mathcal{I}\hat{f}.}$

Finally, let us mention that convolution corresponds to pointwise multiplication (and conversely), i.e., the so-called convolution theorem is valid:

$\displaystyle{ \widehat{g {\ast} f} =\hat{ f} \cdot \hat{ g}. }$

(28)
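The convolution theorem and the rule for the involution carry over to cyclic convolution of finite sequences. A minimal sketch, assuming NumPy and hypothetical random test vectors: the convolution is evaluated directly from the discrete version of (26) and then compared with the product of the DFTs.

```python
import numpy as np

N = 16
rng = np.random.default_rng(6)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def circular_convolve(u, v):
    """(u * v)[x] = sum_y u[y] v[(x - y) mod N], the cyclic analog of (26)."""
    n = len(u)
    return np.array([sum(u[y] * v[(x - y) % n] for y in range(n)) for x in range(n)])

def involution(u):
    """u^*[x] = conj(u[-x mod N]), the cyclic analog of (27)."""
    return np.conj(np.roll(u[::-1], 1))

conv = circular_convolve(f, g)

print(np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g)))    # (28)
print(np.allclose(np.fft.fft(involution(f)), np.conj(np.fft.fft(f))))  # transform of f^*
print(np.sum(np.abs(conv)) <= np.sum(np.abs(f)) * np.sum(np.abs(g)))   # L^1 estimate
```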

The Short-Time Fourier Transform

The Fourier transform as described in section “The Fourier Transform” provides only global frequency information about a signal f. This is useful for signals that do not vary over time, e.g., for analyzing the spectrum of a violin tone. However, dynamic signals such as a melody have to be split into short time intervals over which they can be well approximated by a linear combination of a few pure frequencies. Since sharp cutoffs would introduce discontinuities in the localized signal and therefore leakage in the frequency spectrum, a smooth window function g is usually used in the definition of the short-time Fourier transform.

In image processing, one has plane waves instead of pure frequencies; thus, the global Fourier transform is only well suited to stripe-like patterns. Again, a localized version of the Fourier transform allows one to determine dominant plane waves locally, and one can reconstruct an image from such a redundant transform. Gabor analysis deals with the question of how one can reconstruct an image from only somewhat overlapping local pieces, which are stored only in the form of a sampled (local) 2D Fourier transform (Fig. 1).

Fig. 1

Two signals and their (short-time) Fourier transforms. (a) Signal 1: Concurrent frequencies. (b) Signal 2: Consecutive frequencies. (c) Fourier power spectrum 1. (d) Fourier power spectrum 2. (e) STFT 1 with wide window. (f) STFT 2 with wide window. (g) STFT 1 with narrow window. (h) STFT 2 with narrow window

Definition 9.

Fix a window function $g \in {\mathrm{L}}^{2}(\mathbb{R}^{d})\setminus \{0\}$ . The short-time Fourier transform (STFT), also called (continuous) Gabor transform of a function $f \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ with respect to g, is defined as

$\displaystyle{ (\mathcal{V}_{g}f)(x,\omega ):=\int _{\mathbb{R}^{d}}f(t)\,\overline{g(t - x)}\,e^{-2\pi {\mathrm{i}}t\cdot \omega }\,dt\quad {\mathrm{for}}\ x,\omega \in \mathbb{R}^{d}. }$

(29)

For $f,g \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ , the STFT $\mathcal{V}_{g}f$ is uniformly continuous (by Riemann–Lebesgue) on $\mathbb{R}^{2d}$ and can be written as

$\displaystyle\begin{array}{rcl} (\mathcal{V}_{g}f)(x,\omega )& =& \widehat{f \cdot T_{x}\bar{g}}(\omega ){}\end{array}$

(30)

$\displaystyle\begin{array}{rcl} & =& \langle f,M_{\omega }T_{x}g\rangle _{{\mathrm{L}}^{2}}{}\end{array}$

(31)

$\displaystyle\begin{array}{rcl} & =& e^{-2\pi {\mathrm{i}}x\cdot \omega }(f {\ast} M_{\omega }g^{{\ast}})(x).{}\end{array}$

(32)
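The three expressions (30), (31), and (32) can be compared in a cyclic, finite model of the STFT. The sketch below assumes NumPy and hypothetical random vectors `f` and `g` of length N; translation, modulation, involution, and convolution are all taken with indices modulo N, so this is a discrete analog rather than the continuous transform.

```python
import numpy as np

N = 16
j = np.arange(N)
rng = np.random.default_rng(7)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def translate(u, k):
    return np.roll(u, k)                              # u[(j - k) mod N]

def modulate(u, l):
    return np.exp(2j * np.pi * l * j / N) * u         # exp(2 pi i l j / N) u[j]

def involution(u):
    return np.conj(np.roll(u[::-1], 1))               # conj(u[-j mod N])

def circular_convolve(u, v):
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)) # cyclic convolution via the DFT

def stft_30(k, l):
    """Form (30): Fourier transform of f times the shifted conjugate window."""
    return np.fft.fft(f * translate(np.conj(g), k))[l]

def stft_31(k, l):
    """Form (31): inner product <f, M_l T_k g>."""
    return np.vdot(modulate(translate(g, k), l), f)

def stft_32(k, l):
    """Form (32): phase factor times (f * M_l g^*)(k)."""
    conv = circular_convolve(f, modulate(involution(g), l))
    return np.exp(-2j * np.pi * k * l / N) * conv[k]

print(stft_30(3, 5), stft_31(3, 5), stft_32(3, 5))
```

Form (30) is the one usually preferred in computations, since a single FFT delivers all frequencies for a fixed window position.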

The STFT as a function in x and ω seems to provide the possibility to obtain information about the occurrence of arbitrary frequencies ω at arbitrary locations x as desired. However, the uncertainty principle (cf. [51]) implies that there is a limitation concerning the joint resolution. In fact, the STFT has limitations in its time–frequency resolution capability: Low frequencies can hardly be located with narrow windows, and similarly, short pulses remain invisible for wide windows. The choice of the analyzing window is therefore crucial.

Just like the Fourier transform, the STFT is a kind of time–frequency representation of a signal. This again raises the question of how to reconstruct the signal from its time–frequency representation. To approach this, we need the orthogonality relations of the STFT, which correspond to Parseval’s formula (20) for the Fourier transform:

Theorem 7 (Orthogonality relations for STFT).

Let $f_{1},f_{2},g_{1},g_{2} \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ . Then $\mathcal{V}_{g_{j}}f_{j} \in {\mathrm{L}}^{2}(\mathbb{R}^{2d})$ for j ∈{ 1,2}, and

$\displaystyle{\langle \mathcal{V}_{g_{1}}f_{1},\mathcal{V}_{g_{2}}f_{2}\rangle _{{\mathrm{L}}^{2}(\mathbb{R}^{2d})} =\langle f_{1},f_{2}\rangle _{{\mathrm{L}}^{2}}\overline{\langle g_{1},g_{2}\rangle _{{\mathrm{L}}^{2}}}.}$

Corollary 3.

If $f,g \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ , then

$\displaystyle{\Vert \mathcal{V}_{g}f\Vert _{{\mathrm{L}}^{2}(\mathbb{R}^{2d})} = \Vert f\Vert _{{\mathrm{L}}^{2}}\Vert g\Vert _{{\mathrm{L}}^{2}}.}$

In the case of $\Vert g\Vert _{{\mathrm{L}}^{2}} = 1$ , we have

$\displaystyle{ \Vert f\Vert _{{\mathrm{L}}^{2}} = \Vert \mathcal{V}_{g}f\Vert _{{\mathrm{L}}^{2}(\mathbb{R}^{2d})}\quad \forall f \in {\mathrm{L}}^{2}(\mathbb{R}^{d}), }$

(33)

i.e., the STFT is an isometry from ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ into ${\mathrm{L}}^{2}(\mathbb{R}^{2d})$ .

Formula (33) shows that the STFT preserves the energy of a signal; it corresponds to (19) which shows the same property for the Fourier transform. Therefore, f is completely determined by $\mathcal{V}_{g}f$ , and the inversion is given by a vector-valued integral (for good functions valid in the pointwise sense):

Corollary 4 (Inversion formula for the STFT).

Let $g,\gamma \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ and $\langle g,\gamma \rangle \neq 0$ . Then

$\displaystyle{ f = \frac{1} {\langle \gamma,g\rangle _{{\mathrm{L}}^{2}}} \iint _{\mathbb{R}^{2d}}\mathcal{V}_{g}f(x,\omega )\;M_{\omega }T_{x}\gamma \;d\omega \,dx\quad \forall f \in {\mathrm{L}}^{2}(\mathbb{R}^{d}). }$

(34)

Obviously, γ = g is a natural choice here. The time–frequency analysis of signals is usually done by three subsequent steps:

(i)

Analysis: Using the STFT, the signal is transformed into a joint time–frequency representation.

(ii)

Processing: The obtained signal representation is then manipulated in a certain way, e.g., by restriction to a part of the signal yielding the relevant information.

(iii)

Synthesis: The inverse STFT is applied to the processed representation, thus creating a new signal.
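The three steps can be sketched in a finite, cyclic setting. The code below assumes NumPy; the periodized Gaussian window, the test signal (two consecutive pure frequencies, in the spirit of Fig. 1b), and the frequency mask are all hypothetical choices. The function `istft` implements the discrete analog of the inversion formula (34), in which an extra factor N appears because `np.fft.fft` is not normalized.

```python
import numpy as np

N = 32
j = np.arange(N)
dist = np.minimum(j, N - j)                  # cyclic distance to index 0
g = np.exp(-np.pi * dist**2 / N)             # periodized Gaussian window (assumption)

# Test signal: frequency 3 on the first half, frequency 10 on the second half.
f = np.where(j < N // 2, np.exp(2j * np.pi * 3 * j / N), np.exp(2j * np.pi * 10 * j / N))

def stft(u, window):
    """V[k, l] = sum_j u[j] conj(window[j - k]) exp(-2 pi i j l / N)."""
    return np.array([np.fft.fft(u * np.conj(np.roll(window, k))) for k in range(N)])

def istft(V, gamma, window):
    """Discrete analog of (34): sum_{k,l} V[k,l] M_l T_k gamma / (N <gamma, window>)."""
    out = np.zeros(N, dtype=complex)
    for k in range(N):
        # N * ifft(V[k]) evaluates sum_l V[k, l] exp(2 pi i l j / N)
        out += N * np.fft.ifft(V[k]) * np.roll(gamma, k)
    return out / (N * np.vdot(window, gamma))

V = stft(f, g)                               # (i) analysis
mask = np.zeros((N, N))
mask[:, 1:6] = 1.0                           # (ii) processing: keep frequency bins 1..5
f_processed = istft(mask * V, g, g)          # (iii) synthesis of the processed data
f_roundtrip = istft(V, g, g)                 # without processing, f is recovered

print(np.allclose(f_roundtrip, f))
print(np.abs(f_processed).round(2))
```

With the mask above, mainly the low-frequency part of the signal is retained; other masks, e.g., restrictions in time and frequency simultaneously, fit into the same scheme.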

A function is completely represented by its STFT but in a highly redundant way. To minimize the influence of the uncertainty principle, the analyzing window g should be chosen such that g and its Fourier transform $\hat{g}$ both decay rapidly, e.g., as Schwartz functions. A computational implementation can only be obtained by a discretization of both the functions and the STFT. Therefore, only sampled versions of the STFT are possible, and only certain locations and frequencies are used for analyzing a given signal. The challenge is to find the appropriate lattice constants in time and frequency and to obtain good time–frequency resolution.

4 Gabor Frames in ${\mathrm{L}}^{2}(\mathbb{R}^{d})$

By formula (31), the STFT analyzes a function $f\,\in \,{\mathrm{L}}^{2}(\mathbb{R}^{d})$ into coefficients $\langle f,M_{\omega }T_{x}g\rangle _{{\mathrm{L}}^{2}}$ using modulations and translations of a single window function $g\,\in \,{\mathrm{L}}^{2}(\mathbb{R}^{d})\setminus \{0\}$ . One problem we noticed was that these TF-shifts vary continuously and overlap heavily, making the STFT a highly redundant time–frequency representation. An idea to overcome this is to restrict to discrete choices of time positions x and frequencies ω such that this redundancy is decreased while leaving enough information in the coefficients about the time–frequency behavior of f. This is the very essence of Gabor analysis: the aim is to expand functions in ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ into an absolutely convergent series of modulations and translations of a window function g. Therefore, it is interesting to find necessary and sufficient conditions on g and a discrete set $\Lambda \subseteq \mathbb{R}^{d} \times \mathbb{R}^{d}$ such that

$\displaystyle{\{g_{x,\omega }\}_{(x,\omega )\in \Lambda }:=\{ M_{\omega }T_{x}g\}_{(x,\omega )\in \Lambda }}$

forms a frame for ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ . The question arises of how the sampling set $\Lambda$ should be structured. It turns out to be very convenient to have this set closed under addition, which suggests requiring $\Lambda$ to be a subgroup of the time–frequency plane, i.e., $\Lambda \unlhd \mathbb{R}^{d} \times \mathbb{R}^{d}$ . In his Theory of Communication [45] from 1946, Dennis Gabor (actually Dénes Gábor) suggested using fixed step sizes α, β > 0 for time and frequency, with the set $\{\alpha k\}_{k\in \mathbb{Z}^{d}}$ for the time positions and $\{\beta n\}_{n\in \mathbb{Z}^{d}}$ for the frequencies, yielding the functions

$\displaystyle{g_{k,n}(x):= M_{\beta n}T_{\alpha k}g(x) = e^{2\pi {\mathrm{i}}\beta n\cdot x}g(x -\alpha k)}$

as analyzing elements. This is the approach that is usually presented in the literature, although a more general group-theoretical setting is also possible, where $\Lambda$ is an arbitrary (discrete) subgroup. This subgroup is also called a time–frequency lattice, although it does not have to be of such a “rectangular” shape in general.

Definition 10.

A lattice $\Lambda \subseteq \mathbb{R}^{d}$ is a (discrete) subgroup of $\mathbb{R}^{d}$ of the form $\Lambda = \mathfrak{A}\mathbb{Z}^{d}$ , where $\mathfrak{A}$ is an invertible $d \times d$ -matrix over $\mathbb{R}$ . Lattices in $\mathbb{R}^{2d}$ can be described as

$\displaystyle{\Lambda = \left \{(x,y) \in \mathbb{R}^{2d}\,\big\vert \,(x,y) = (Ak + B\ell,Ck + D\ell),\,(k,\ell) \in \mathbb{Z}^{2d}\right \}}$

with $A,B,C,D \in \mathbb{R}^{d\times d}$ and

$\displaystyle{\mathfrak{A} = \left (\begin{array}{*{10}c} A&B\\ C &D \end{array} \right ).}$

A lattice $\Lambda =\alpha \mathbb{Z}^{d} \times \beta \mathbb{Z}^{d}\unlhd \mathbb{R}^{2d}$ for α, β > 0 is called a separable lattice, a product lattice, or a grid.

In the following, our lattice will be of the separable type for fixed lattice parameters α, β > 0.

Definition 11.

For a nonzero window function $g \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ and lattice parameters α, β > 0, the set of time–frequency shifts

$\displaystyle{\mathcal{G}(g,\alpha,\beta ):=\{ M_{\beta n}T_{\alpha k}g\}_{k,n\in \mathbb{Z}^{d}}}$

is called a Gabor system. If $\mathcal{G}(g,\alpha,\beta )$ is a frame for ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ , it is called a Gabor frame or Weyl–Heisenberg frame. The associated frame operator is the Gabor frame operator and takes the form

$\begin{array}{cc} Sf & = \mathop {\sum \sum } \limits_{k,n\in \mathbb{Z}^{d}}\langle f,M_{\beta n}T_{\alpha k}g\rangle _{{\mathrm{L}}^{2}}\,M_{\beta n}T_{\alpha k}g \\ & = \mathop{\sum \sum } \limits_{k,n\in \mathbb{Z}^{d}}\mathcal{V}_{g}f(\alpha k,\beta n)\,M_{\beta n}T_{\alpha k}g{}\end{array}$

(35)

for all $f \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ . The window g is also called the Gabor atom.

According to the general frame theory, $\{S^{-1}g_{k,n}\}_{k,n\in \mathbb{Z}^{d}}$ yields the canonical dual frame. So we would have to compute $S^{-1}$ and apply it to all modulated and translated versions of the Gabor atom g. A direct computation shows that for arbitrary fixed indices $\ell,m \in \mathbb{Z}^{d}$ ,

$\displaystyle{ SM_{\beta m}T_{\alpha \ell} = M_{\beta m}T_{\alpha \ell}S. }$

(36)
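Relation (36) can be observed in a finite, cyclic model of a Gabor system. The sketch below assumes NumPy; the signal length N = 24, the step sizes a = 4 and b = 3 (playing the roles of α and β), and the periodized Gaussian atom are hypothetical choices. The frame operator (35) becomes an N × N matrix, its extreme eigenvalues are the optimal frame bounds, and the commutation with a lattice TF-shift is checked on a random vector.

```python
import numpy as np

N, a, b = 24, 4, 3                       # length, time step, frequency step (assumed)
j = np.arange(N)
dist = np.minimum(j, N - j)              # cyclic distance to index 0
g = np.exp(-np.pi * dist**2 / N)         # periodized Gaussian atom (assumption)

def tf_shift(u, x, w):
    """(M_w T_x u)[j] = exp(2 pi i w j / N) u[(j - x) mod N]."""
    return np.exp(2j * np.pi * w * j / N) * np.roll(u, x)

# Columns of G are the atoms M_{bn} T_{ak} g for k = 0..N/a-1, n = 0..N/b-1.
G = np.column_stack([tf_shift(g, a * k, b * n)
                     for k in range(N // a) for n in range(N // b)])
S = G @ G.conj().T                       # Gabor frame operator, cf. (35)

eigs = np.linalg.eigvalsh(S)
A, B = eigs[0], eigs[-1]                 # optimal frame bounds; A > 0 means a frame

rng = np.random.default_rng(8)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
l, m = 2, 5                              # arbitrary lattice indices
lhs = S @ tf_shift(f, a * l, b * m)      # S M_{bm} T_{al} f
rhs = tf_shift(S @ f, a * l, b * m)      # M_{bm} T_{al} S f

print(A, B)
print(np.allclose(lhs, rhs))             # relation (36)
```

Here the system contains (N/a)(N/b) = 48 atoms in a 24-dimensional space, i.e., it has redundancy 2.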

Consequently, $S^{-1}$ also commutes with time–frequency shifts, which gives the following fundamental result for (regular) Gabor analysis:

Theorem 8.

If the given Gabor system $\mathcal{G}(g,\alpha,\beta )$ is a frame for ${\mathrm{L}}^{2}(\mathbb{R}^{d})$ , then all of the following hold:

(a)

There exists a dual window $\gamma \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ such that the canonical dual frame is given by the Gabor frame $\mathcal{G}(\gamma,\alpha,\beta )$ .

(b)

Every $f \in {\mathrm{L}}^{2}(\mathbb{R}^{d})$ has an expansion of the form

$\displaystyle\begin{array}{rcl} f& =& \mathop{\sum \sum }\limits_{k,n\in \mathbb{Z}^{d}}\langle f,M_{\beta n}T_{\alpha k}g\rangle _{{\mathrm{L}}^{2}}\,M_{\beta n}T_{\alpha k}\gamma \\ & =& \mathop{\sum \sum }\limits_{k,n\in \mathbb{Z}^{d}}\langle f,M_{\beta n}T_{\alpha k}\gamma \rangle _{{\mathrm{L}}^{2}}\,M_{\beta n}T_{\alpha k}g {}\end{array}$