Projection matrix (statistics)

In statistics, a projection matrix is a symmetric and idempotent matrix.^[1] All eigenvalues of a projection matrix are either 0 or 1, and its rank equals its trace.^[2] The only nonsingular projection matrix is the identity matrix; all other projection matrices are singular. The most important projection matrices in statistics are the prediction matrix ${\boldsymbol {P}}$ and the residual-maker matrix (residual matrix) ${\boldsymbol {Q}}={\boldsymbol {I}}-{\boldsymbol {P}}$. They are examples of an orthogonal projection in the sense of linear algebra, where, for a given projection matrix ${\boldsymbol {P}}$, every vector $y$ of an inner product space can be decomposed uniquely as $y={\boldsymbol {P}}y+({\boldsymbol {I}}-{\boldsymbol {P}})y$. Another projection matrix that is important in statistics is the centering matrix.
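As a small numerical illustration of these defining properties, the following sketch builds the centering matrix for $n=5$ and checks symmetry, idempotence, and the equality of rank and trace. It assumes the usual definition $C=\mathbf {I} -{\tfrac {1}{n}}\mathbf {1} \mathbf {1} ^{\top }$, which is not spelled out in the text; the variable names and the sample vector are illustrative only.

```python
import numpy as np

n = 5
ones = np.ones((n, 1))

# Centering matrix C = I - (1/n) * 1 1^T (assumed standard definition).
C = np.eye(n) - (ones @ ones.T) / n

# Arbitrary sample vector, chosen only for illustration.
y = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

is_symmetric = np.allclose(C, C.T)
is_idempotent = np.allclose(C @ C, C)

# An explicit tolerance keeps round-off from inflating the numerical rank.
rank_C = np.linalg.matrix_rank(C, tol=1e-10)
trace_C = np.trace(C)

# Applying C subtracts the sample mean from every entry of y.
y_centered = C @ y

# Orthogonal decomposition y = P y + (I - P) y, here with P = C.
y_rebuilt = C @ y + (np.eye(n) - C) @ y
```

Since the centering matrix is not the identity, it is singular, in line with the statement that the identity is the only nonsingular projection matrix.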

Setup

As a starting point, consider a typical multiple linear regression model with given data $\{y_{i},x_{ik}\}_{i=1,\dots ,n,k=1,\dots ,K}$ for $n$ statistical units and $K$ regressors. The relationship between the dependent variable and the independent variables can be written as

$$y_{i}=\beta _{0}+x_{i1}\beta _{1}+x_{i2}\beta _{2}+\ldots +x_{iK}\beta _{K}+\varepsilon _{i}=\mathbf {x} _{i}^{\top }{\boldsymbol {\beta }}+\varepsilon _{i},\quad i=1,2,\dotsc ,n.$$

In matrix notation,

$${\begin{pmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{pmatrix}}_{(n\times 1)}\quad =\quad {\begin{pmatrix}1&x_{11}&x_{12}&\cdots &x_{1K}\\1&x_{21}&x_{22}&\cdots &x_{2K}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n1}&x_{n2}&\cdots &x_{nK}\end{pmatrix}}_{(n\times p)}\quad \cdot \quad {\begin{pmatrix}\beta _{0}\\\beta _{1}\\\vdots \\\beta _{K}\end{pmatrix}}_{(p\times 1)}\quad +\quad {\begin{pmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\vdots \\\varepsilon _{n}\end{pmatrix}}_{(n\times 1)}$$

with $p=K+1$. In compact form,

$$\mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }}.$$

Here ${\boldsymbol {\beta }}$ is a vector of unknown parameters (known as regression coefficients) that must be estimated from the data. It is further assumed that the error terms have mean zero, $\mathbb {E} [{\boldsymbol {\varepsilon }}]=\mathbf {0}$, which means that the model is assumed to be correct on average.
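To make the notation concrete, the following sketch simulates data from such a model and assembles the $n\times p$ data matrix with a leading column of ones for the intercept. The sample size, the coefficient values, and the normal distribution of the regressors and errors are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

n, K = 50, 2   # hypothetical number of units and regressors
p = K + 1      # number of coefficients, including the intercept

regressors = rng.normal(size=(n, K))

# Data matrix X (n x p): the first column of ones carries the intercept beta_0.
X = np.column_stack([np.ones(n), regressors])

beta = np.array([1.0, 2.0, -0.5])     # hypothetical true coefficients
eps = rng.normal(scale=1.0, size=n)   # error terms drawn with mean zero

# Compact form of the model: y = X beta + eps.
y = X @ beta + eps
```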

Prediction matrix

One of the most important projection matrices in statistics is the prediction matrix (also known as the hat matrix). It is defined as

$${\boldsymbol {P}}\equiv \mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\quad {\text{with}}\quad {\boldsymbol {P}}\in \mathbb {R} ^{n\times n},$$

where $\mathbf {X}$ is the data matrix, assumed here to have full column rank so that $\mathbf {X} ^{\top }\mathbf {X}$ is invertible. The diagonal elements of the prediction matrix ${\boldsymbol {P}}$ are denoted $p_{ii}$ and can be interpreted as leverage values.
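The definition translates almost literally into code. The sketch below is one possible way to compute the prediction matrix and its leverage values with NumPy; the helper name `prediction_matrix` and the random data are assumptions made for illustration, and the data matrix is taken to have full column rank.

```python
import numpy as np

def prediction_matrix(X):
    """Return P = X (X^T X)^{-1} X^T for a data matrix X of full column rank."""
    # Solving (X^T X) Z = X^T avoids forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(0)
n, K = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # hypothetical data

P = prediction_matrix(X)

# The diagonal elements p_ii are the leverage values.
leverages = np.diag(P)
```

In this sketch the leverage values lie between 0 and 1 and sum to $p=K+1$, the trace of ${\boldsymbol {P}}$.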

Residual-maker matrix

The residual-maker matrix^[3], also called the residual-generating matrix or residual matrix, is defined as

$${\boldsymbol {Q}}=\left(\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)=\left(\mathbf {I} -{\boldsymbol {P}}\right),$$

where ${\boldsymbol {P}}$ is the prediction matrix. The name residual-maker matrix comes from the fact that multiplying this projection matrix by the vector $\mathbf {y}$ yields the residual vector ${\hat {\boldsymbol {\varepsilon }}}$. The residual vector can be expressed compactly in terms of the prediction matrix as

$${\hat {\boldsymbol {\varepsilon }}}=\mathbf {y} -\mathbf {\hat {y}} =\mathbf {y} -{\boldsymbol {P}}\mathbf {y} =\left(\mathbf {I} -{\boldsymbol {P}}\right)\mathbf {y} ={\boldsymbol {Q}}\mathbf {y}.$$
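The identity can be checked numerically. The following sketch compares the residuals of an ordinary least-squares fit, obtained with `np.linalg.lstsq`, against the product of the residual-maker matrix and the response vector. The data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # hypothetical data
y = rng.normal(size=n)                                      # arbitrary response

P = X @ np.linalg.solve(X.T @ X, X.T)  # prediction matrix
Q = np.eye(n) - P                      # residual-maker matrix

# Residuals from an explicit least-squares fit ...
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_ls = y - X @ beta_hat

# ... and residuals obtained by applying Q directly to y.
resid_Q = Q @ y
```

Forming the full $n\times n$ matrix is convenient for illustration; for large $n$ the residuals would usually be computed from the fitted coefficients instead.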

The rank and the trace of a projection matrix coincide. Provided $\mathbf {X}$ has full column rank $p$, the rank of the residual-maker matrix is therefore

$${\begin{aligned}\operatorname {rank} ({\boldsymbol {Q}})&=\operatorname {tr} ({\boldsymbol {Q}})\\&=\operatorname {tr} (\mathbf {I} -{\boldsymbol {P}})\\&=\sum \nolimits _{i=1}^{n}(1-p_{ii})\\&=n-\sum \nolimits _{i=1}^{n}p_{ii}\\&=n-\operatorname {tr} ({\boldsymbol {P}})\\&=n-\operatorname {rank} ({\boldsymbol {P}})\\&=n-p\\&=n-(K+1).\end{aligned}}$$
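A quick numerical check of this count, sketched with a hypothetical helper `rank_and_trace` and random data of full column rank:

```python
import numpy as np

def rank_and_trace(X):
    """Return the numerical rank and the trace of Q = I - X (X^T X)^{-1} X^T."""
    n = X.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)
    Q = np.eye(n) - P
    # The explicit tolerance treats round-off-sized singular values as zero.
    return np.linalg.matrix_rank(Q, tol=1e-8), np.trace(Q)

rng = np.random.default_rng(2)
n, K = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # hypothetical data

rank_Q, trace_Q = rank_and_trace(X)
```

With $n=12$ and $K=3$, both quantities should come out as $n-(K+1)=8$, up to floating-point error in the trace.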

Idempotence

The idempotence of the residual-maker matrix can be shown as follows:

$${\begin{aligned}{\boldsymbol {Q}}^{2}&={\boldsymbol {Q}}\cdot {\boldsymbol {Q}}\\&=\left(\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)\left(\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)\\&=\mathbf {I} \mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {I} +\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\\&=\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }-\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }+\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\\&=\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\\&=\left(\mathbf {I} -{\boldsymbol {P}}\right)\\&={\boldsymbol {Q}}\qquad \Box \end{aligned}}$$
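The same result can be reached more compactly by first establishing the idempotence of ${\boldsymbol {P}}$ and then expanding the square. This is only an alternative arrangement of the steps above, not a different argument:

$${\begin{aligned}{\boldsymbol {P}}^{2}&=\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }=\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }={\boldsymbol {P}},\\{\boldsymbol {Q}}^{2}&=\left(\mathbf {I} -{\boldsymbol {P}}\right)\left(\mathbf {I} -{\boldsymbol {P}}\right)=\mathbf {I} -2{\boldsymbol {P}}+{\boldsymbol {P}}^{2}=\mathbf {I} -2{\boldsymbol {P}}+{\boldsymbol {P}}=\mathbf {I} -{\boldsymbol {P}}={\boldsymbol {Q}}.\end{aligned}}$$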

Symmetry

The symmetry of the residual-maker matrix follows directly from the symmetry of the prediction matrix and can be shown as follows:

$${\begin{aligned}{\boldsymbol {Q}}^{\top }&=\left(\mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)^{\top }\\&=\ \mathbf {I} ^{\top }-\left(\left(\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)\left(\mathbf {X} ^{\top }\right)\right)^{\top }\\&=\ \mathbf {I} -\left(\mathbf {X} ^{\top }\right)^{\top }\left(\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)^{\top }\\&=\ \mathbf {I} -\mathbf {X} \left(\left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)^{\top }\mathbf {X} ^{\top }\\&=\ \mathbf {I} -\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\\&=\left(\mathbf {I} -{\boldsymbol {P}}\right)\\&={\boldsymbol {Q}}\qquad \Box \end{aligned}}$$

Further properties

The projection matrix has a wealth of useful algebraic properties.^[4]^[5] In the language of linear algebra, the prediction matrix ${\boldsymbol {P}}$ is an orthogonal projection onto the column space of the data matrix $\mathbf {X}$. Further properties of projection matrices are summarized below:

The residual vector $\mathbf {u} =(\mathbf {I} -\mathbf {P} )\mathbf {y}$ is orthogonal to the columns of $\mathbf {X}$: $\mathbf {u} =\mathbf {y} -\mathbf {P} \mathbf {y} \perp \mathbf {X}$
$\mathbf {X}$ is invariant under $\mathbf {P}$: $\mathbf {PX} =\mathbf {X}$, and hence $\left(\mathbf {I} -\mathbf {P} \right)\mathbf {X} =\mathbf {0}$
$\left(\mathbf {I} -\mathbf {P} \right)\mathbf {P} =\mathbf {P} \left(\mathbf {I} -\mathbf {P} \right)=\mathbf {0}$ ("applying the regression to the residuals yields ${\hat {y}}=0$")
$\mathbf {P}$ is unique for a given subspace
All eigenvalues of a projection matrix are either 0 or 1
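The properties listed above lend themselves to a direct numerical check. The following sketch verifies them for one randomly generated data matrix; it illustrates the statements rather than proving them, and all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 10, 2
p = K + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # hypothetical data
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
I = np.eye(n)
u = (I - P) @ y  # residual vector

checks = {
    "residuals orthogonal to X": np.allclose(X.T @ u, 0.0),
    "P X = X": np.allclose(P @ X, X),
    "(I - P) X = 0": np.allclose((I - P) @ X, 0.0),
    "(I - P) P = 0": np.allclose((I - P) @ P, 0.0),
    "P (I - P) = 0": np.allclose(P @ (I - P), 0.0),
}

# eigvalsh expects a symmetric matrix; symmetrising guards against round-off.
eigvals = np.linalg.eigvalsh((P + P.T) / 2)
num_ones = int(np.sum(np.isclose(eigvals, 1.0)))
num_zeros = int(np.sum(np.isclose(eigvals, 0.0)))
```

The number of eigenvalues equal to 1 matches the rank $p$ of the prediction matrix, and the remaining $n-p$ eigenvalues are 0.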

Applications

Estimating the variance parameter after least-squares estimation

The residual sum of squares, abbreviated SQR (from the German Summe der Quadrate der Restabweichungen; in English sum of squared residuals, SSR), is given in matrix notation by

$$SQR:={\hat {\boldsymbol {\varepsilon }}}^{\top }{\hat {\boldsymbol {\varepsilon }}}=\mathbf {y} ^{\top }(\mathbf {I} -\mathbf {P} )^{\top }(\mathbf {I} -\mathbf {P} )\mathbf {y} =\mathbf {y} ^{\top }{\boldsymbol {Q}}{\boldsymbol {Q}}\mathbf {y} =\mathbf {y} ^{\top }{\boldsymbol {Q}}\mathbf {y}.$$

This can also be written as

$$SQR={\hat {\boldsymbol {\varepsilon }}}^{\top }{\hat {\boldsymbol {\varepsilon }}}=\|\mathbf {y} -\mathbf {\hat {y}} \|_{2}^{2}=\sum \limits _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}.$$

An unbiased estimate of the variance of the disturbances is the "mean squared residual":

$${\hat {\sigma }}^{2}={\frac {SQR}{n-p}}={\frac {\sum \nolimits _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}}{n-p}}.$$
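The divisor $n-p$ can be motivated by a short derivation. The sketch below treats $\mathbf {X}$ as fixed and additionally assumes homoscedastic, uncorrelated errors, $\operatorname {Cov} ({\boldsymbol {\varepsilon }})=\sigma ^{2}\mathbf {I}$, which goes beyond the mean-zero assumption stated in the setup. It uses ${\boldsymbol {Q}}\mathbf {X} =\mathbf {0}$ together with $\operatorname {tr} ({\boldsymbol {Q}})=n-p$:

$${\begin{aligned}SQR&=\mathbf {y} ^{\top }{\boldsymbol {Q}}\mathbf {y} =(\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }})^{\top }{\boldsymbol {Q}}(\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }})={\boldsymbol {\varepsilon }}^{\top }{\boldsymbol {Q}}{\boldsymbol {\varepsilon }},\\\mathbb {E} [SQR]&=\mathbb {E} \left[\operatorname {tr} \left({\boldsymbol {\varepsilon }}^{\top }{\boldsymbol {Q}}{\boldsymbol {\varepsilon }}\right)\right]=\mathbb {E} \left[\operatorname {tr} \left({\boldsymbol {Q}}{\boldsymbol {\varepsilon }}{\boldsymbol {\varepsilon }}^{\top }\right)\right]=\operatorname {tr} \left({\boldsymbol {Q}}\,\mathbb {E} \left[{\boldsymbol {\varepsilon }}{\boldsymbol {\varepsilon }}^{\top }\right]\right)=\sigma ^{2}\operatorname {tr} ({\boldsymbol {Q}})=\sigma ^{2}(n-p).\end{aligned}}$$

Under these assumptions, dividing by $n-p$ gives $\mathbb {E} [{\hat {\sigma }}^{2}]=\sigma ^{2}$.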

Using the residual-maker matrix, the estimated variance of the error terms can also be written as

$${\hat {\sigma }}^{2}={\frac {\mathbf {y} ^{\top }{\boldsymbol {Q}}\mathbf {y} }{n-p}}={\frac {\mathbf {y} ^{\top }{\boldsymbol {Q}}\mathbf {y} }{\operatorname {rank} ({\boldsymbol {Q}})}}.$$
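As an illustration, the following sketch computes the variance estimate once through the quadratic form in the residual-maker matrix and once from the residuals of a least-squares fit. The function name `sigma2_hat`, the coefficients, and the error standard deviation are hypothetical choices for this example.

```python
import numpy as np

def sigma2_hat(X, y):
    """Estimate the error variance as y^T Q y / (n - p)."""
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    Q = np.eye(n) - P
    return float(y @ Q @ y) / (n - p)

rng = np.random.default_rng(4)
n, K = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
beta = np.array([1.0, 2.0, -0.5])             # hypothetical true coefficients
y = X @ beta + rng.normal(scale=1.5, size=n)  # hypothetical error sd of 1.5

# Estimate via the quadratic form in the residual-maker matrix.
est_Q = sigma2_hat(X, y)

# Same estimate via the sum of squared residuals of a least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
est_resid = float(resid @ resid) / (n - X.shape[1])
```

Both routes give the same number up to floating-point error; the second avoids building an $n\times n$ matrix and is usually preferable for large samples.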

References

[1] Alexander Basilevsky: Applied Matrix Algebra in the Statistical Sciences. Dover, 2005, ISBN 0-486-44538-0, pp. 160–176 (google.com).
[2] Wilhelm Caspary: Fehlertolerante Auswertung von Messdaten, p. 124.
[3] Peter Hackl: Einführung in die Ökonometrie. 2nd updated edition, Pearson Deutschland GmbH, 2008, ISBN 978-3-86894-156-2, p. 75.
[4] P. Gans: Data Fitting in the Chemical Sciences. Wiley, 1992, ISBN 0-471-93412-7.
[5] N. R. Draper, H. Smith: Applied Regression Analysis. Wiley, 1998, ISBN 0-471-17082-8.
