False Discovery Rate in Signal Space for Transformation-Invariant Thresholding of Statistical Maps

where X is an m-by-n matrix, $\beta$ is a column vector of length n, and the $\varepsilon _{i}$ ( $i=1,\cdots ,m$ ) are independent and identically distributed (i.i.d.) variables following a Gaussian distribution $N(0,\sigma ^{2})$ . In research applications, Y may be an fMRI time series at a voxel location and X the design matrix of functional tasks; or, in a tensor-based morphometry (TBM) study, Y may be the subjects’ Jacobian determinants at a voxel location and X a factor matrix of age, gender, disease state, etc. Please note:

We assume that X is full column-rank so that the inverse of $X^{\intercal }X$ exists.
We assume that .

The maximum-likelihood and unbiased estimate of $\beta$ is

$\begin{aligned} \hat{\beta }\equiv & {} \left( X^{\intercal }X\right) ^{-1}X^{\intercal }Y=\beta +\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\varepsilon . \end{aligned}$

The residuals and the unbiased estimate of $\sigma ^{2}$ are

$\begin{aligned} {\left\{ \begin{array}{ll} \hat{\varepsilon } &{} \equiv Y-X\hat{\beta }=\left[ I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\right] \varepsilon ,\\ \hat{\sigma }^{2} &{} \equiv \frac{\hat{\varepsilon }^{\intercal }\hat{\varepsilon }}{df},\text { where }df=m-n. \end{array}\right. } \end{aligned}$
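The estimates above can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name `ols_fit` and the simulated data are hypothetical and serve only to show how $\hat{\beta }$ , $\hat{\varepsilon }$ and $\hat{\sigma }^{2}$ relate to X and Y.

```python
import numpy as np

def ols_fit(X, Y):
    """Return (beta_hat, residuals, sigma2_hat) for the model Y = X beta + eps.

    Hypothetical helper for illustration; assumes X has full column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = X.shape
    # Least-squares solution; equals (X^T X)^{-1} X^T Y when X has full column rank.
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat            # residual vector eps_hat
    df = m - n                          # degrees of freedom
    sigma2_hat = float(resid @ resid) / df
    return beta_hat, resid, sigma2_hat

# Simulated example (all values are assumptions made for this sketch).
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(scale=0.3, size=m)

beta_hat, resid, sigma2_hat = ols_fit(X, Y)
```

In this sketch the residual vector is orthogonal to every column of X, which is the geometric fact the properties discussed next rely on.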

Let us focus on the probabilistic and geometric properties of $\hat{\varepsilon }$ .

1. It follows an isotropic multivariate Gaussian distribution with variance $\sigma ^{2}$ , embedded in the null space of X’s columns. More specifically, there exists an m-by-df matrix Z which satisfies $X^{\intercal }Z=0$ and $Z^{\intercal }Z=I$ , such that $\tilde{\varepsilon }\equiv Z^{\intercal }\hat{\varepsilon }\sim N(0,\sigma ^{2}I_{df\times df})$ and $\hat{\varepsilon }=Z\tilde{\varepsilon }$ .

2. It is independent of $\hat{\beta }$ .

3. Its normalized vector $\hat{u}=\hat{\varepsilon }/|\hat{\varepsilon }|$ is uniformly distributed on the unit hyper-sphere in the null space of X’s columns, independently of $\sigma ^{2}$ and $\hat{\sigma }^{2}$ .

Property 1 is the most insightful, and properties 2 and 3 follow easily from it. We outline its proof as follows:

Because X is a full column-rank m-by-n matrix, the null space of its columns has $df=m-n$ dimensions.

Define Z as an m-by-df matrix whose columns form an orthonormal basis of the null space of X’s columns. By definition, Z satisfies $X^{\intercal }Z=0$ and $Z^{\intercal }Z=I$ .

Define $\tilde{\varepsilon }\equiv Z^{\intercal }\hat{\varepsilon }$ . This df-element random vector follows a Gaussian distribution $N(0,\sigma ^{2}I_{df\times df})$ , because:

$\tilde{\varepsilon }\equiv Z^{\intercal }\hat{\varepsilon }=Z^{\intercal }\left[ I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\right] \varepsilon =Z^{\intercal }\varepsilon$ (the last step uses $Z^{\intercal }X=0$ ), so $\tilde{\varepsilon }$ , as a linear combination of $\varepsilon$ , follows a multivariate Gaussian distribution;
The expected value of $\tilde{\varepsilon }$ is $E\tilde{\varepsilon }=Z^{\intercal }E\varepsilon =0$ ;
The covariance matrix of $\tilde{\varepsilon }$ is $E\tilde{\varepsilon }\tilde{\varepsilon }^{\intercal }=E\left[ Z^{\intercal }\varepsilon \varepsilon ^{\intercal }Z\right] =Z^{\intercal }E\left[ \varepsilon \varepsilon ^{\intercal }\right] Z=\sigma ^{2}Z^{\intercal }Z=\sigma ^{2}I$ .

Z also satisfies $ZZ^{\intercal }=I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }$ because:

Both $ZZ^{\intercal }\left[ \begin{array}{cc} X&Z\end{array}\right]$ and $\left[ I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\right] \left[ \begin{array}{cc} X&Z\end{array}\right]$ equal $\left[ \begin{array}{cc} 0&Z\end{array}\right]$ ;
$\left[ \begin{array}{cc} X&Z\end{array}\right]$ is a full-rank m-by-m matrix so its inverse exists;
Both $ZZ^{\intercal }$ and $\left[ I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\right]$ equal $\left[ \begin{array}{cc} 0&Z\end{array}\right] \left[ \begin{array}{cc} X&Z\end{array}\right] ^{-1}$ .

$\hat{\varepsilon }$ equals $Z\tilde{\varepsilon }$ because

$\begin{aligned} \hat{\varepsilon }&=\left[ I-X\left( X^{\intercal }X\right) ^{-1}X^{\intercal }\right] \varepsilon =ZZ^{\intercal }\varepsilon =Z\tilde{\varepsilon }. \end{aligned}$
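The identities used in this proof can be checked numerically. The sketch below is one possible construction, not necessarily the one the authors had in mind: it assumes NumPy and obtains Z from the trailing columns of a complete QR decomposition of X. The variable names and the random test matrix are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
X = rng.normal(size=(m, n))   # random X, full column rank with probability one

# Complete QR: the first n columns of Q span X's columns,
# the remaining m - n columns are orthonormal and orthogonal to them.
Q, _ = np.linalg.qr(X, mode="complete")
Z = Q[:, n:]                  # m-by-df matrix, df = m - n

# Residual-forming matrix I - X (X^T X)^{-1} X^T.
P_resid = np.eye(m) - X @ np.linalg.inv(X.T @ X) @ X.T

eps = rng.normal(size=m)      # one draw of the noise vector
eps_hat = P_resid @ eps       # residual vector
eps_tilde = Z.T @ eps_hat     # its df coordinates in the basis Z

checks = {
    "XtZ_zero": np.allclose(X.T @ Z, 0),
    "ZtZ_identity": np.allclose(Z.T @ Z, np.eye(m - n)),
    "ZZt_equals_projector": np.allclose(Z @ Z.T, P_resid),
    "eps_hat_equals_Z_eps_tilde": np.allclose(eps_hat, Z @ eps_tilde),
}
```

Any other orthonormal basis of the same subspace (for instance one obtained from a singular value decomposition) would pass the same checks, since $ZZ^{\intercal }$ does not depend on the choice of basis.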

2.2 Weighted FDR in Volume

Let $R_{pos}$ denote the detected region, $R_{tru}$ the underlying truth, and $\left| \bullet \right|$ the volume of a region. The volume-based FDR is defined as follows

$\begin{aligned} FDR\equiv E\left[ \frac{\left| R_{pos}\setminus R_{tru}\right| }{\left| R_{pos}\right| }\right] \text { where }\frac{\left| R_{pos}\setminus R_{tru}\right| }{\left| R_{pos}\right| }\equiv 0\text { if }\left| R_{pos}\right| =0. \end{aligned}$

(1)

Genovese, Lazar, and Nichols [9] defined the volumetric measure in the image space, and consequently it can be translated as the number of voxels in voxel-based analysis. Benjamini and Hochberg’s step-up procedure [2] was applied to control the FDR. The step-up procedure finds

$\begin{aligned} k^{*}=\max \{k|\frac{p_{(k)}N}{k}\leqslant q\}, \end{aligned}$

where q is the user-specified FDR level, $p_{(k)}$ is the k-th smallest voxel p-value, and N is the number of voxels; the voxels with the $k^{*}$ smallest p-values are then declared positive. This step-up procedure is able to handle positive dependence among tests, as Benjamini and Yekutieli discussed in [4]. For more general dependence among tests, please refer to [4].
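The step-up rule above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `bh_step_up` and the example p-values are hypothetical, and the sketch covers only the unweighted procedure.

```python
import numpy as np

def bh_step_up(pvals, q):
    """Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns (k_star, reject), where k_star is the largest k such that
    p_(k) * N / k <= q (0 if there is none) and reject is a boolean mask
    marking the k_star smallest p-values.
    """
    p = np.asarray(pvals, dtype=float)
    N = p.size
    order = np.argsort(p)                  # indices sorting p in ascending order
    ranks = np.arange(1, N + 1)
    ok = p[order] * N / ranks <= q         # condition evaluated at each rank k
    reject = np.zeros(N, dtype=bool)
    k_star = 0
    if ok.any():
        k_star = int(np.nonzero(ok)[0].max()) + 1
        reject[order[:k_star]] = True      # reject everything up to rank k_star
    return k_star, reject

# Ranks 2 and 3 fail the condition, yet rank 4 passes, so k* = 4:
k_star, reject = bh_step_up([0.6, 0.03, 0.001, 0.039, 0.035], q=0.05)
```

The example shows why the procedure is called step-up: p-values at intermediate ranks are still rejected when a larger rank satisfies the condition.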

Benjamini and Hochberg (1997) [3] extended it to a weighted version, whose FDR definition and control procedure are

$\begin{aligned} FDR\equiv E\left[ \frac{\sum _{i\in R_{pos}\setminus R_{tru}}w_{i}}{\sum _{i\in R_{pos}}w_{i}}\right] ,\quad k^{*}=\max \{k|\frac{p_{(k)}\sum _{i=1}^{N}w_{(i)}}{\sum _{i=1}^{k}w_{(i)}}\leqslant q\}, \end{aligned}$