Background: The Challenge of Over-Parameterization in Deep Learning

Deep learning models in practical applications are often over-parameterized: the number of parameters exceeds the size of the training set. Notable examples include Transformer models for language tasks and wide residual networks for computer vision. Although such models can fit the training data with ease, they raise challenges for both training time and generalization. The crux of the problem is that the optimization landscape of over-parameterized models is typically non-convex, which hampers straightforward analysis and optimization. This brings two key theoretical quantities to the fore: the convergence gap and the generalization gap, both pivotal for optimizing the model and understanding how well it generalizes.

Method: Introducing PL Regularization for Model Optimization

In a recent study, Chen et al. present a novel approach that builds the Polyak-Łojasiewicz (PL) condition into the training objective of over-parameterized models. The approach is grounded in a theoretical analysis showing that a small condition number (the ratio of the Lipschitz constant to the PL constant) implies both faster convergence and better generalization.

PL Regularized Optimization: The method adds the condition number to the training error and minimizes the resulting regularized risk. This involves both the PL constant \(\mu\) of the network and the Lipschitz constant \(L_f\); a sketch of the objective is given below.

The Polyak-Łojasiewicz (PL) condition is a concept borrowed from optimization theory with significant implications for training over-parameterized models, particularly in deep learning. Let's break down its application and implementation in detail.

Understanding the PL Condition

What is the PL Condition? The PL condition is a mathematical property that…
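For reference, the standard form of the PL inequality says that the squared gradient norm dominates the suboptimality of the loss at every point. Writing \(f\) for the training objective, \(w\) for the parameters, and \(f^*\) for the minimum value (symbols chosen here for illustration, not taken from Chen et al.), the condition with constant \(\mu > 0\) reads

\[
\frac{1}{2}\,\bigl\lVert \nabla f(w) \bigr\rVert^{2} \;\ge\; \mu \bigl( f(w) - f^{*} \bigr) \qquad \text{for all } w .
\]

When \(\nabla f\) is additionally \(L_f\)-Lipschitz, gradient descent with step size \(1/L_f\) converges linearly at a rate governed by \(1 - \mu / L_f\), so the condition number \(\kappa = L_f / \mu\) is exactly the quantity that controls how fast training can proceed. This is the standard optimization-theory fact the analysis above leans on.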
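The text does not spell out the exact regularized objective, but "adding the condition number to the training error" suggests the following minimal sketch, where \(\widehat{R}(w)\) denotes the empirical training risk and \(\lambda\) is an assumed weighting hyperparameter not specified above:

\[
\min_{w}\;\; \widehat{R}(w) \;+\; \lambda \,\frac{L_f}{\mu} .
\]

Minimizing the second term pushes the network toward a small condition number, which the theoretical analysis links to faster convergence and a smaller generalization gap; how \(\mu\) and \(L_f\) are estimated or bounded during training is a detail of the method not covered in this excerpt.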