To our delight, the book Matrix Methods in Data Mining and Pattern Recognition, written by Lars Elden and published by SIAM in 2007, dedicates whole section 9.2.1  to our NNDSVD method.

Abstract

We describe Nonnegative Double Singular Value Decomposition (NNDSVD), a new method designed to enhance the initialization stage of nonnegative matrix factorization (NMF). NNDSVD can readily be combined with existing NMF algorithms. The basic algorithm contains no randomization and is based on two SVD processes, one approximating the data matrix, the other approximating positive sections of the resulting partial SVD factors utilizing an algebraic property of unit rank matrices. Simple practical variants for NMF with dense factors are described. NNDSVD is also well suited to initialize NMF algorithms with sparse factors. Many numerical examples suggest that NNDSVD leads to rapid reduction of the approximation error of many NMF algorithms.


The Problem

Given an m x n nonnegative matrix A, an integer k, and an iterative NMF algorithm that computes a nonnegative matrix factorization to A, i.e A~WH, for an m x k matrix W and an k x n matrix H, we seek a pair of matrices (W,H) to initialize the iterative algorithm.


The Algorithm

The basic algorithmic method of this paper is the NNDSVD algorithm. It is based on the Perron–Frobenius Theorem which guarantees nonnegativity for the first left and right singular vectors of a nonnegative matrix. Given a nonnegative m x n matrix A and an integer k (k < min(m,n)), NNDSVD produces two nonnegative matrices, say W, with dimensions m x k, and H, with dimensions k x n. The algorithm works as follows (matlab notation):

Input: A, k
- [U,S,V] = svds(A,k)                   % 1st SVD
- W(:,1) = sqrt(S(1,1)) * U(:,1)  ;  % The first left singular vector is nonnegative
- H(1,:) = sqrt(S(1,1)) * V(:,1)'  ;  % The first right singular vector is nonnegative
- for j=2:k
   Tmp = U(:,j) * S(j,j) * V(:,j)' ;    % form the rank one factor
   Tmp(find(Tmp<0)) = 0 ;           % zero out the negative elements, the rank increases at most by one.
   [u,s,v] = svds(Tmp,1) ;            % 2nd SVD
   W(:,j) = sqrt(s) * u ;                 % The first left singular vector is nonnegative
   H(j,:) = sqrt(s) * v' ;                 %  The first right singular vector is nonnegative
- end
Output: W, H


Theoretical contributions

Our main theoretical contributions are summarized in Lemma 1 and Theorem 2 of the paper. The first implies that if we zero out the negative elements of an arbitrary rank-one matrix, the resulting matrix will have rank at most two. The second is an extension of the Perron–Frobenius theorem and gives a class of nonnegative matrices whose singular value decomposition consists of entirely nonnegative factors . Here we first fix the notation and then provide the formal versions of the above statements:






Practical contributions

From a practical perspective, we present a deterministic method to initialize the existing NMF algorithms. Most of the algorithms that produce nonnegative matrix factorizations start with a pair of matrices (say W and H) and iterate, updated these matrices, until  convergence. Not surprisingly, this initial pair plays a crucial role for the convergence speed of the iterative algorithm; a typical scenario for  optimization algorithms of this type. Our numerical experiments indicate that some of the existing NMF algorithms can be improved if the initial pair is selected with our NNDSVD method.


Software

We provide a Matlab implementation of our algorithm. The algorithm provided above in this page is only a high-level description of our method. The actual computations utilize theorem 2 and avoid the explicit svd computations of the for-loop. The nndsvd.m file implements the algorithm as described in Table 1 of the paper. 


Data

Table 4 of our paper describes the datasets we utilized in our experiments. We make available these matrices as .mat files; the datasets USGS and SHUTTLE aren't available. Note that the references of this table here correspond to the reference list of the paper.


Experiments

Please check the numerical experiments section (section 3) of the paper for a series of experiments of our method to a combination of NMF algorithms and datasets.