Recurrent neural networks (RNNs) have proven effective for audio processing tasks such as speech enhancement, source separation, automatic speech recognition, and speech emotion recognition. Another model that has been effective for these audio tasks is sparse nonnegative matrix factorization (SNMF), which decomposes a nonnegative feature matrix into sparse coefficients using a dictionary of feature templates. We use a recently proposed neural network that combines the advantages of RNNs and SNMF: the deep recurrent NMF (DR-NMF) network, an RNN whose forward pass corresponds to the iterations of an inference algorithm for SNMF. Since DR-NMF is based on a principled statistical model, it can be initialized with the maximum-likelihood parameters of that model and then fine-tuned for any task. This principled initialization is especially beneficial when only a small amount of labeled training data is available. In this paper, we apply DR-NMF networks to the specific problem of detecting anger from speech, for which only a small amount of training data is available. We compare the performance of DR-NMF, both randomly initialized and pretrained with SNMF, to conventional state-of-the-art LSTMs and convolutional neural networks (CNNs) across a variety of realistic acted, elicited, and natural emotional speech datasets.
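As a rough illustration of the SNMF decomposition described above (not the paper's implementation), the following NumPy sketch factorizes a nonnegative feature matrix into a dictionary of templates and sparse coefficients using standard multiplicative updates; the L1 sparsity penalty, function name, and hyperparameters are our own illustrative choices:

```python
import numpy as np

def snmf(V, rank, sparsity=0.1, n_iter=200, seed=0):
    """Illustrative sparse NMF: V (features x frames, nonnegative) ~= W @ H,
    with an L1 penalty on the coefficients H to encourage sparsity.
    Multiplicative updates for the objective ||V - WH||^2 + sparsity * sum(H)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3   # dictionary of feature templates
    H = rng.random((rank, T)) + 1e-3   # sparse activation coefficients
    eps = 1e-9
    for _ in range(n_iter):
        # coefficient update; the sparsity penalty enters the denominator,
        # shrinking activations toward zero
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        # standard multiplicative dictionary update
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # normalize dictionary columns so the sparsity penalty acts on H alone
        norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
        W /= norms
        H *= norms.T
    return W, H
```

A DR-NMF network, as used in the paper, unrolls the iterations of an inference algorithm like this inner loop (with W fixed, solving for H) into the layers of an RNN, so the SNMF solution provides a natural initialization for the network's weights.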