Post by @rss3_ • Hey

Sparse Pre-training and Dense Fine-tuning for LLMs -- a 2.5x reduction in pre-training FLOPs without significant loss of accuracy on downstream tasks
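The idea behind the title: keep most weights frozen at zero during pre-training (which cuts FLOPs roughly in proportion to the sparsity level), then drop the sparsity mask and update all weights when fine-tuning on the downstream task. Below is a minimal PyTorch sketch of that two-phase recipe; the toy model, random static masking, 75% sparsity level, and placeholder training loops are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn as nn

def apply_static_sparsity(model: nn.Module, sparsity: float = 0.75) -> dict:
    """Zero out a random fraction of each Linear layer's weights and
    return the binary masks so they can be re-applied after each step."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            mask = (torch.rand_like(module.weight) > sparsity).float()
            module.weight.data.mul_(mask)
            masks[name] = mask
    return masks

def reapply_masks(model: nn.Module, masks: dict) -> None:
    """Keep pruned weights at zero throughout sparse pre-training."""
    for name, module in model.named_modules():
        if name in masks:
            module.weight.data.mul_(masks[name])

# Toy stand-in for an LLM; hyperparameters are placeholders.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Phase 1: sparse pre-training -- most weights stay at zero, so on
# sparsity-aware hardware the effective FLOPs per step drop accordingly.
masks = apply_static_sparsity(model, sparsity=0.75)
for _ in range(100):  # placeholder pre-training loop
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()  # placeholder objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    reapply_masks(model, masks)

# Phase 2: dense fine-tuning -- the masks are dropped, and all weights
# (including previously pruned ones) are updated on the downstream task.
for _ in range(10):  # placeholder fine-tuning loop
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()  # placeholder objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that the FLOP savings from a mask like this are only realized on hardware and kernels that exploit unstructured sparsity; on dense accelerators the masked weights are still multiplied as zeros.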
