Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled, data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select the neurons/channels to prune, together with new weights for the next layer, so as to minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, and thus it can be provably approximated using an efficient greedy algorithm. Under reasonable assumptions, our method is guaranteed to have an error between the original and pruned model outputs that decreases exponentially with respect to the pruned size. It is also one of the few methods in the literature that uses only a limited amount of training data and no labels. Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
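To make the layer-wise selection concrete, below is a minimal sketch of one possible greedy implementation of the idea described in the abstract; it is an illustration, not the authors' released algorithm. Given the activations A of the layer being pruned (collected on a small unlabeled calibration set) and the next layer's weights W_next, it greedily keeps the neuron that, together with a least-squares refit of the next layer's weights, best reconstructs the next layer's original input. The function and variable names are hypothetical.

```python
import numpy as np

def greedy_prune_layer(A, W_next, k):
    """Illustrative sketch of greedy layer-wise pruning (not the authors' code).

    A      : (n_samples, n_neurons) activations of the layer to prune,
             gathered on a small unlabeled calibration set.
    W_next : (n_neurons, n_out) weights of the next layer.
    k      : number of neurons/channels to keep.

    Returns the indices of the kept neurons and refitted weights W_new,
    chosen so that A[:, kept] @ W_new approximates A @ W_next.
    """
    Z = A @ W_next                      # original input to the next layer
    kept = []
    remaining = list(range(A.shape[1]))

    for _ in range(k):
        best_err, best_j = np.inf, None
        for j in remaining:
            S = kept + [j]
            # Refit the next layer's weights on the candidate kept set
            W_S, *_ = np.linalg.lstsq(A[:, S], Z, rcond=None)
            err = np.linalg.norm(Z - A[:, S] @ W_S) ** 2
            if err < best_err:
                best_err, best_j = err, j
        kept.append(best_j)
        remaining.remove(best_j)

    # Final least-squares refit of the next layer's weights on the kept neurons
    W_new, *_ = np.linalg.lstsq(A[:, kept], Z, rcond=None)
    return kept, W_new
```

In this sketch the objective being greedily maximized is the reduction in reconstruction error of the next layer's input, which is the kind of set function the abstract characterizes as weakly submodular; the greedy loop then inherits the usual approximation guarantee for such problems. Note that this naive version refits the weights for every candidate neuron, so it is far less efficient than a practical implementation would be.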
Marwa El Halabi is a research scientist at the Samsung SAIT AI Lab Montreal, within Mila. Before that, she was a postdoc in the Machine Learning Group at MIT, working with Stefanie Jegelka. She completed her PhD in June 2018 in the Computer and Communication Sciences department at EPFL, where she was advised by Volkan Cevher. Previously, she received her B.Eng. in Computer and Communications Engineering from the American University of Beirut. She is interested in algorithmic and theoretical aspects of machine learning; in particular, her research has focused on submodular optimization, convex optimization, and structured sparsity.