Hello, and welcome to the Paper of the Day (Po'D): From Signal Processing to Cognition Edition. Today's interesting paper comes from 2005: L. K. Hansen, P. Ahrendt, and J. Larsen, "Towards cognitive component analysis," in Proc. Int. Interdisciplinary Conf. Adaptive Knowledge Representation Reasoning, (Espoo, Finland), pp. 148-153, June 2005.

The discipline of linear algebra, one of the great accomplishments of mathematics in the last century, offers exceptionally powerful tools for working with and understanding data. Suppose we have a real matrix \(\MX \in \MR^{N\times K}\) of \(K\) independent observations (each with zero mean) of \(N\) data types (e.g., sample values in time). It could be, for instance, \(N\) features derived from \(K\) people saying the word "hello." One of the principal results of linear algebra, the singular value decomposition (SVD), provides a way to transform these "hellos" so that we can describe each one as a linear combination of orthonormal (uncorrelated) sources ordered by "importance," i.e., $$\MX = \MU(\Sigma\MA)$$ where \(\Sigma \in \MR^{N\times K}\) is a diagonal matrix of nonnegative, decreasing weights (the singular values), \(\MA \in \MR^{K\times K}\) is orthonormal, and \(\MU \in \MR^{N\times N}\) is a dictionary of "hello" features satisfying \(\MU^T\MU = \MU\MU^T = \MI\). Each entry in the \(k\)th column of \(\Sigma\MA\) shows how much the corresponding column of \(\MU\) contributes to the data of the \(k\)th "hello," starting with the first column of \(\MU\): the feature that varies the most across all the "hello" data in \(\MX\).
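As a concrete sketch, NumPy's `numpy.linalg.svd` computes exactly this factorization; the data matrix and its dimensions below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: K = 5 observations ("hellos") of N = 3 features each,
# stored as columns of X, with each feature type centered to zero mean.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
X -= X.mean(axis=1, keepdims=True)

# SVD: X = U @ Sigma @ A, with U (N x N) and A (K x K) orthonormal.
U, s, A = np.linalg.svd(X, full_matrices=True)
Sigma = np.zeros((3, 5))
np.fill_diagonal(Sigma, s)      # nonnegative singular values, decreasing

assert np.allclose(X, U @ Sigma @ A)     # exact reconstruction
assert np.allclose(U.T @ U, np.eye(3))   # U is orthonormal
```

Reading off the \(k\)th column of `Sigma @ A` gives the weights on the columns of `U` (the "hello" dictionary) for the \(k\)th observation.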

Principal component analysis (PCA) uses the SVD to find an orthonormal matrix \(\MP \in \MR^{N\times N}\) such that the transformed data \(\MP\MX\) has a diagonal covariance, i.e.,

$$\MP\MX ( \MP\MX )^T = \MP\MX \MX^T\MP^T = \textrm{diagonal}.$$ This means we have a linear transformation that decorrelates the data types of all of our observations. The matrix we are looking for is exactly \(\MP = \MU^T\), given by the SVD of \(\MX\). The rows of \(\MP\) are known as the principal components of \(\MX\); again, they describe the directions of variability of the data, from greatest to least. This decorrelation in essence strips away the second-order dependencies between the data types in each observation, and assumes that there are no higher-order statistical dependencies between them. The implication is that the data is distributed as a zero-mean multivariate Gaussian with some specific covariance matrix, since only then do the second-order statistics completely specify the distribution of our data.
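A short NumPy sketch (again with made-up data) verifies that \(\MP = \MU^T\) indeed diagonalizes the covariance:

```python
import numpy as np

# Made-up zero-mean data: N = 3 feature types, K = 200 observations.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 200))
X -= X.mean(axis=1, keepdims=True)

U, s, _ = np.linalg.svd(X, full_matrices=False)
P = U.T        # rows of P are the principal components of X
Y = P @ X      # transformed (decorrelated) data
C = Y @ Y.T    # unnormalized covariance of the transformed data

# Off-diagonal entries vanish: the data types are decorrelated,
# and the diagonal holds the squared singular values, in decreasing order.
assert np.allclose(C, np.diag(np.diag(C)))
assert np.allclose(np.diag(C), s**2)
```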

While PCA works well (in a least-squares sense) as long as the underlying data can be modeled by a multivariate Gaussian random vector, data with higher-order statistical dependencies will not be well represented (in a least-squares sense) by the principal components found through PCA. In such cases one turns to independent component analysis (ICA). In much the same way as PCA, ICA finds for our set of "hello" measurements \(\MX\) a set of "independent components" \(\MS \in \MR^{N\times K}\), as well as a mixing matrix \(\MM \in \MR^{N\times N}\), such that $$\MX = \MM\MS$$ assuming only that the components are statistically independent and that at most one of them is Gaussian; no specific distributions need be assumed. If our "hello" data is not distributed as a multivariate Gaussian, then its principal components will not signify much. In that case we would instead attempt to learn the independent components of the data, to find the distinguishing features of "hello."
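As a sketch (not the paper's setup), scikit-learn's `FastICA` can recover independent non-Gaussian sources from a linear mixture; the sources and mixing matrix here are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian (uniform) sources -- a toy stand-in for
# "hello" features -- mixed by an arbitrary invertible matrix M.
rng = np.random.default_rng(2)
S = rng.uniform(-1.0, 1.0, size=(2, 1000))   # sources as rows
M = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = M @ S                                    # observed mixtures

# FastICA expects samples as rows, hence the transposes.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X.T).T             # recovered up to scale and order

# Each true source should correlate strongly with some recovered component.
corr = np.abs(np.corrcoef(np.vstack([S, S_est]))[:2, 2:])
assert np.all(corr.max(axis=1) > 0.9)
```

ICA recovers the sources only up to permutation and scaling, which is why the check uses correlations rather than a direct comparison.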

Together, PCA and ICA provide excellent methods for digging into data and discovering its significant features, which can then be used for data analysis. An interesting notion, however, is that our brains could be performing the same tricks in order to make sense of the massive amount of information reaching our senses given the limited throughput of the nervous system. When coupled with the idea of sparsity, or minimal energy, the results appear to have implications for how our brains work. This article offers an interesting perspective on data analysis and on the processes in the black box that is the trained brain. The authors discuss the observation that when data (e.g., text, social networks, music) is clustered in an unsupervised way using an analysis based on independent components, the clusters resemble those assembled by a trained human. This is what the authors call "cognitive component analysis," which is not defined here in a mathematical sense. Do these principal components and independent components signify something of higher order, say at a cognitive level?

Your work in linear algebra uses the same matrices that appear in my work! An important question: are these matrices invariant under elementary algebraic operations (for example, multiplication)?

Thanks for this article, which made for one beautiful day!

With respect, Peter Lovasz, teacher

Thank you Peter, though I am not quite sure what matrix you are talking about. The principal component matrix \(\MP\)? In that case, what aspect of this unitary matrix are you talking about? Its trace, norm, determinant?