Investigating Maxout Activation Functions in Speech and Sound Recognition  
Author Gabriel Castaneda


Co-Author(s) Paul Morris; Taghi M. Khoshgoftaar


Abstract Frequency-domain convolutional neural networks have been shown to outperform fully connected deep neural networks in speech recognition tasks. Most convolutional network results published to date use sigmoid or rectified linear neurons. Maxout is an alternative activation function that has proven successful in several domains. The maxout activation function selects the maximum of multiple linear neurons to form a piecewise linear function. This nonlinearity is a generalization of the rectified nonlinearity and can approximate any form of activation function. Maxout networks have achieved great success in many computer vision tasks, but there is limited work on other recognition tasks. We explore the performance of multiple maxout activation variants on speech and sound recognition using fast Fourier transform convolutional neural networks. Our experiments compare the rectified linear unit, leaky rectified linear unit, scaled exponential linear unit, and hyperbolic tangent activation functions to four maxout variants. We found that, on average across all datasets, maxout outperforms the traditional functions in classification performance. Our experiments suggest that adding more filters enhances the classification accuracy of rectified linear unit networks without adversely affecting their advantage over maxout activations in training speed.
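
The maxout definition in the abstract (the maximum over several linear neurons) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `maxout` and `relu`, the weight layout `(k, n_out, n_in)`, and the random test values are assumptions chosen for illustration only. The final check shows one sense in which maxout generalizes the rectified linear unit: with two pieces, one of them fixed at zero, maxout reduces to ReLU.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout over k linear pieces (illustrative layout, not from the paper).

    x: input vector, shape (n_in,)
    W: weights, shape (k, n_out, n_in), one linear map per piece
    b: biases, shape (k, n_out)
    Returns a vector of shape (n_out,).
    """
    z = np.einsum("koi,i->ko", W, x) + b  # k linear responses per output unit
    return z.max(axis=0)                  # keep the largest response per unit

def relu(z):
    return np.maximum(z, 0.0)

# ReLU as a special case: k = 2 pieces, the second piece identically zero.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=(3, 4))
c = rng.normal(size=3)
W = np.stack([w, np.zeros_like(w)])
b = np.stack([c, np.zeros_like(c)])
assert np.allclose(maxout(x, W, b), relu(w @ x + c))
```

Each additional piece adds another set of weights per output unit, so a maxout layer with k pieces carries roughly k times the parameters of a ReLU layer of the same width, which is consistent with the training-speed trade-off the abstract mentions.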


Keywords Maxout networks, deep learning, sound recognition, speech recognition
Article #: DSIS19-28
Proceedings of ISSAT International Conference on Data Science & Intelligent Systems
August 1-3, 2019 - Las Vegas, NV, U.S.A.