Home
Welcome to the Generalized-Pooling-Functions-Mixed-and-Gated-in-Convolutional-Neural-Networks wiki!
In modern visual recognition systems, pooling operations produce "downstream" representations that are more robust to variations in the data while still preserving important motifs [1]. In the current deep learning literature, popular pooling functions include average, max, and stochastic pooling, with max pooling being the most widely used. This operation mimics the spatial selective attention mechanism of humans, which attends to the important and discriminative areas of the input image [3], and it retains the most valuable features in a local patch by picking the maximum value in the pooling region. It also drastically reduces the spatial dimensions (height and width) of the input volume, for example by half with the commonly used 2x2 max pooling, which cuts the number of parameters in subsequent layers and improves computational efficiency. Thus, most recent work uses max pooling as the default choice when building convolutional neural networks.
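For concreteness, here is a minimal PyTorch sketch of the 2x2 max pooling described above (my own illustration; the 32x32, 16-channel input is an arbitrary assumption):

```python
import torch
import torch.nn as nn

# 2x2 max pooling with stride 2 halves the spatial dimensions,
# e.g. a 32x32 feature map becomes 16x16.
x = torch.randn(1, 16, 32, 32)               # (batch, channels, height, width)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)                          # torch.Size([1, 16, 16, 16])
```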
However, realizing that pooling operations have been little revised beyond these few options, other approaches have been investigated and proposed. In [1], C.-Y. Lee, P. W. Gallagher, and Z. Tu combine the typical max and average pooling functions in two different strategies, one "nonresponsive" and one "responsive" to the region being pooled, producing a natural generalization of pooling operations that they name mixed max-average pooling and gated max-average pooling (discussed in detail later). By responsive, we mean that the mixing proportion between max and average is learned and depends on the characteristics of the pooling region, rather than being a fixed value, i.e. unresponsive to the pooling region. The algorithms are simple and easy to implement, and they incur only a light increase in computational overhead (roughly 5% to 15% in the authors' timing experiments) and a very modest increase in the number of model parameters. The authors found that replacing the conventional pooling operations with the proposed generalized pooling methods boosts performance on MNIST, CIFAR-10, and SVHN. They also propose a tree pooling approach, which learns and combines pooling filters in a binary tree structure as another natural generalization; I will not discuss it in this paper.
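To make the two operations concrete, below is a minimal PyTorch sketch of mixed and gated max-average pooling as I understand them from [1]. This is my own illustration rather than the authors' reference code; the module names and the per-channel depthwise gating are assumptions on my part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPool2d(nn.Module):
    """Mixed max-average pooling: out = a * max(x) + (1 - a) * avg(x),
    with a single learnable mixing proportion a kept in [0, 1] via a sigmoid.
    'Nonresponsive': a does not depend on the region being pooled."""
    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.a_logit = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 -> 50/50 mix

    def forward(self, x):
        a = torch.sigmoid(self.a_logit)
        return (a * F.max_pool2d(x, self.kernel_size, self.stride)
                + (1 - a) * F.avg_pool2d(x, self.kernel_size, self.stride))


class GatedPool2d(nn.Module):
    """Gated max-average pooling: the mixing proportion is 'responsive',
    a = sigmoid(w . x_region), computed from each pooling region through a
    learned gating mask w (here one mask per channel via a depthwise conv)."""
    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.gate = nn.Conv2d(channels, channels, kernel_size, stride,
                              groups=channels, bias=False)

    def forward(self, x):
        a = torch.sigmoid(self.gate(x))               # one a per region and channel
        return (a * F.max_pool2d(x, self.kernel_size, self.stride)
                + (1 - a) * F.avg_pool2d(x, self.kernel_size, self.stride))
```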
My goal is to follow and further study these two proposed pooling operations: mixed and gated max-average pooling. Experiments are performed to more thoroughly explore their potential by training across a wider range of hyperparameters (e.g. fixed 50/50, 20/80, and 40/60 mixes) and different learning options (learning a mixing proportion (a) per network, (b) per layer, (c) per layer/region being pooled, (d) per layer/region/channel, etc.; see the sketch below), so as to evaluate the sensitivity of my networks to these parameters and their influence on overall training performance in object recognition. Also, building on the published observations in [1], I will again compare mixed and gated pooling with the conventional pooling operations (max and average pooling in this paper), with and without data augmentation. In this way, I aim to further discuss the benefits and potential of these two generalized pooling functions under CNN-like architectures for object recognition.
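As a rough sketch of how the learning options (a)-(d) could be parameterized, the mixing proportion can simply be given a different shape. The shapes below are my assumptions; option (a), per network, would mean sharing one such parameter across every pooling layer, and the fixed 50/50, 20/80, 40/60 mixes correspond to freezing the parameter.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPoolConfigurable(nn.Module):
    """Mixed pooling with a configurable granularity for the mixing proportion a.
    Assumed shapes for a layer whose pooled output is (N, C, H_out, W_out):
      per layer                -> a_shape = (1,)
      per layer/region         -> a_shape = (1, 1, H_out, W_out)
      per layer/region/channel -> a_shape = (1, C, H_out, W_out)
    A fixed mix (e.g. 20/80) is obtained with init_a=0.2 and learnable=False."""
    def __init__(self, a_shape=(1,), init_a=0.5, learnable=True,
                 kernel_size=2, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        logit = math.log(init_a / (1.0 - init_a))      # inverse sigmoid of init_a
        self.a_logit = nn.Parameter(torch.full(a_shape, logit),
                                    requires_grad=learnable)

    def forward(self, x):
        a = torch.sigmoid(self.a_logit)                # broadcasts over pooled maps
        return (a * F.max_pool2d(x, self.kernel_size, self.stride)
                + (1 - a) * F.avg_pool2d(x, self.kernel_size, self.stride))
```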