The Negative Effect of Pruning Large Language Models on Finetuning

May 31, 2024


Large language models (LLMs) such as Llama and Falcon often serve as foundation models, the bedrock upon which more specialized AI applications are built through finetuning. However, the computational demands of these models pose a significant challenge for both inference and finetuning. Techniques for optimizing neural networks, particularly pruning, offer a partial solution: they reduce the number of model parameters, making inference more efficient without a large loss in accuracy. Yet while pruning lowers the computational requirements of inference, it also reduces a model's ability to be finetuned, as this article explains. This poses a challenge for organizations that want to adapt language models to specialized uses.

Pruning aims to make neural networks more efficient by removing unnecessary connections within the model. Common pruning methods include:

- Magnitude pruning, which removes the weights with the smallest absolute values.
- Structured pruning, which removes whole units such as neurons, attention heads, or layers.
- Unstructured pruning, which removes individual weights anywhere in the network, leaving sparse weight matrices.
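As a concrete illustration of the first of these, here is a minimal sketch of magnitude-based unstructured pruning using PyTorch's torch.nn.utils.prune utilities. The layer size and the 30% sparsity level are arbitrary choices for illustration, not recommendations.

```python
# Minimal sketch: magnitude-based (L1) unstructured pruning of one layer
# with PyTorch's built-in utilities. Layer size and sparsity are illustrative.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(4096, 4096)

# Zero out the 30% of weights with the smallest absolute values.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent by folding the mask into the weight tensor.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Fraction of zeroed weights: {sparsity:.2%}")
```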

Pruning has proven effective at reducing the size of many kinds of models, but given the recent gold rush to deploy language models, it is worth examining its effect on this class of models specifically.

For large language models like GPT, Llama, and Falcon, pruning delivers real inference efficiency gains, but it can compromise adaptability during finetuning. Redundant neurons are often what a model uses to encode new information during finetuning, yet because they look unnecessary to the pruning criterion, they are often exactly what gets removed, hindering the model's capacity to learn new information.

To illustrate, consider a language model pretrained on a diverse corpus of text. The redundancies in the model allow it to store the same information across different layers and weights. During finetuning for a sentiment analysis task, the model may learn to differentiate words or phrases with the same denotation (explicit meaning) but different connotation (culturally specific nuance) based on the provided training data, and it can use the network's redundancies to store these differences. If those redundancies have been pruned away, however, the model loses the flexibility to encode these more nuanced relationships, leading to suboptimal performance.
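A small illustrative sketch (not an experiment from this article) of why this matters: if the pruning mask is left in place during finetuning, the pruned connections stay at zero throughout training, so only the surviving weights are available to encode new, task-specific distinctions. The toy layer, random data, and 50% sparsity below are assumptions purely for illustration.

```python
# Toy sketch: a pruned layer "finetuned" on random data. The mask is reapplied
# on every forward pass, so the pruned half of the connections never leaves
# zero and cannot absorb anything the finetuning data would teach.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(8, 8)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # illustrative 50% sparsity

optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
inputs, targets = torch.randn(16, 8), torch.randn(16, 8)

for _ in range(10):  # stand-in for a finetuning loop
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(inputs), targets)
    loss.backward()
    optimizer.step()

# Half of the effective weights are still exactly zero after training.
print((layer.weight == 0).float().mean().item())  # ~0.5
```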

Organizations therefore face a strategic decision regarding pruning:

- Prune before finetuning to cut compute costs up front, accepting that the pruned model will adapt less well to the target task.
- Finetune the full model first, accepting higher compute costs during training, and prune only afterwards.

Pruning large language models presents a trade-off, optimizing efficiency at the expense of adaptability later on. Striking the right balance is paramount as organizations seek to leverage AI in an evolving business environment.

My advice is to use the largest language model an organization's hardware permits as the foundation model for finetuning, and to start pruning only after finetuning is finished.
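A hedged sketch of that finetune-first, prune-afterwards workflow with PyTorch and Hugging Face Transformers is shown below; the model checkpoint, task head, and 40% sparsity target are placeholders, and the finetuning step itself is only indicated in comments.

```python
# Hedged sketch of "finetune first, prune afterwards". The checkpoint name,
# task head, and 40% sparsity target are placeholder assumptions.
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "tiiuae/falcon-7b",  # placeholder base model
    num_labels=2,        # e.g. a binary sentiment-analysis head
)

# 1. Finetune the dense model on the specialized task first, while the
#    redundant connections are still available to encode new information.
#    (Training loop or transformers.Trainer call omitted here.)

# 2. Only then prune, e.g. global magnitude pruning over every linear layer.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.4)

# Fold the masks into the weights so the sparsified model can be saved and deployed.
for module, name in to_prune:
    prune.remove(module, name)
```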

By understanding the nuances of pruning and its implications for finetuning, businesses can make informed decisions, harnessing the power of language models while achieving the outcomes they need in specialized domains.
