The 8088
arXiv cs.LG AI Research Apr 23

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

★★★☆☆ significance 3/5

Researchers propose 'expert upcycling' to expand the capacity of Mixture-of-Experts (MoE) models during continued pre-training. The method duplicates existing experts and extends the router, increasing the model's parameter count without increasing per-token inference cost. Because the enlarged model starts from a warm initialization rather than from scratch, the approach improves training efficiency and shifts the compute-efficient frontier.
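A minimal sketch of what expert duplication plus router extension could look like for one MoE layer is shown below. The layer structure, the round-robin duplication, and the small symmetry-breaking noise (noise_std) are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: upcycling one MoE layer by duplicating experts and extending the router.
# MoELayer, its fields, and the noise scale are hypothetical, for illustration only.
import copy
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Toy MoE layer: a linear router over a list of expert MLPs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )


@torch.no_grad()
def upcycle(layer: MoELayer, extra_experts: int, noise_std: float = 1e-3) -> None:
    """Warm-start a larger MoE layer from a smaller one."""
    old_n = len(layer.experts)
    # 1) Expert duplication: copy existing experts round-robin as warm starts.
    for i in range(extra_experts):
        clone = copy.deepcopy(layer.experts[i % old_n])
        for p in clone.parameters():
            p.add_(noise_std * torch.randn_like(p))  # small noise to break symmetry
        layer.experts.append(clone)
    # 2) Router extension: new logits rows copy the source expert's row (plus noise),
    #    so routing behaviour is nearly unchanged at initialization.
    new_router = nn.Linear(layer.router.in_features, old_n + extra_experts, bias=False)
    new_router.weight[:old_n] = layer.router.weight
    for i in range(extra_experts):
        src = layer.router.weight[i % old_n]
        new_router.weight[old_n + i] = src + noise_std * torch.randn_like(src)
    layer.router = new_router


layer = MoELayer(d_model=64, d_hidden=256, num_experts=4)
upcycle(layer, extra_experts=4)
print(len(layer.experts), layer.router.out_features)  # 8 8
```

Per-token inference cost stays flat because top-k routing still activates the same number of experts per token; only the pool the router selects from grows.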

Why it matters: Expanding model capacity through continued pre-training offers a path to higher performance without increasing the inference-time computational burden.
Read the original at arXiv cs.LG

Tags

#mixture-of-experts #scaling-laws #model-training #efficiency
