Background
Large Language Models (LLMs), such as GPT-175B, are revolutionizing a wide range of natural language processing tasks. However, their substantial size and computational demands pose significant challenges, particularly for deployment in resource-constrained environments. To address these challenges, model compression has emerged as a critical area of research, aiming to transform resource-intensive models into compact yet capable versions.

Method

Experiment Results
The effectiveness of model compression techniques is evaluated with metrics such as the number of parameters, model size, compression ratio, inference time, and FLOPs (the size-related metrics are illustrated in the sketch below). Standard benchmarks and datasets are used to compare compressed LLMs against their uncompressed counterparts. While significant progress has been made, a performance gap remains between compressed and uncompressed LLMs.

Conclusion
This survey presents a detailed exploration of model compression techniques for LLMs, covering methods, evaluation metrics, and benchmarks. It emphasizes the need for further research in this area to unlock the full potential of LLMs across diverse applications, and offers insights intended to guide ongoing work.
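
To make the size-related metrics concrete, below is a minimal, illustrative Python sketch (not taken from the survey) that estimates a Transformer's parameter count, its storage footprint at a given numeric precision, and the resulting compression ratio. The helper names (count_parameters, model_size_bytes, compression_ratio) and the GPT-3-scale configuration in the example are assumptions made for illustration only.

```python
def count_parameters(num_layers: int, hidden_size: int, vocab_size: int) -> int:
    """Rough Transformer parameter count: embeddings plus ~12 * hidden_size^2 per layer
    (attention and MLP weights; biases and layer norms are ignored for simplicity)."""
    embedding = vocab_size * hidden_size
    per_layer = 12 * hidden_size ** 2
    return embedding + num_layers * per_layer


def model_size_bytes(num_params: int, bytes_per_param: float) -> float:
    """Model size for a given weight precision (e.g. 2 bytes for FP16, 0.5 for 4-bit)."""
    return num_params * bytes_per_param


def compression_ratio(original_bytes: float, compressed_bytes: float) -> float:
    """Ratio of uncompressed to compressed storage; higher means a smaller model."""
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    # Illustrative GPT-3-scale configuration; these are assumed, not official, numbers.
    params = count_parameters(num_layers=96, hidden_size=12288, vocab_size=50257)
    fp16 = model_size_bytes(params, 2.0)   # FP16 baseline weights
    int4 = model_size_bytes(params, 0.5)   # 4-bit quantized weights
    print(f"parameters ~ {params / 1e9:.1f}B")
    print(f"FP16 size ~ {fp16 / 2**30:.0f} GiB, INT4 size ~ {int4 / 2**30:.0f} GiB")
    print(f"compression ratio ~ {compression_ratio(fp16, int4):.1f}x")
```

Under these assumptions, moving from FP16 to 4-bit weights yields roughly a 4x storage compression ratio; the remaining metrics in the abstract, such as inference time and FLOPs, must be measured empirically on the target hardware.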