Posted by z3d in ArtInt

AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs) trained from scratch on AMD Instinct™ MI300X GPUs. Instella models outperform existing fully open models of similar sizes and achieve competitive performance compared to state-of-the-art open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B, including their instruction-tuned counterparts.

Our journey with Instella builds upon the foundation laid by our previous 1-billion-parameter LM, AMD OLMo, which helped showcase the feasibility of training LMs end-to-end on AMD GPUs. With Instella, we have scaled our efforts by transitioning from a 1-billion-parameter model trained on 64 AMD Instinct MI250 GPUs using 1.3T tokens to a 3-billion-parameter model trained on 128 Instinct MI300X GPUs using 4.15T tokens. While we compared our previous model only with similarly sized fully open models, Instella not only surpasses existing fully open models but also achieves competitive overall performance compared to state-of-the-art open-weight models, marking a significant step in bridging this gap.
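
To put the scale-up in perspective, a quick back-of-the-envelope calculation using only the figures quoted above (the GPU models and token counts are from the announcement; everything else is plain arithmetic):

```python
# Figures quoted in the announcement.
olmo_tokens, olmo_gpus = 1.3e12, 64           # AMD OLMo: 1.3T tokens on 64 MI250 GPUs
instella_tokens, instella_gpus = 4.15e12, 128  # Instella: 4.15T tokens on 128 MI300X GPUs

token_scaleup = instella_tokens / olmo_tokens          # total training-token increase
gpu_scaleup = instella_gpus / olmo_gpus                # cluster-size increase
tokens_per_gpu_ratio = (instella_tokens / instella_gpus) / (olmo_tokens / olmo_gpus)

print(f"{token_scaleup:.2f}x more training tokens")    # ~3.19x
print(f"{gpu_scaleup:.0f}x more GPUs")                 # 2x
print(f"{tokens_per_gpu_ratio:.2f}x more tokens per GPU")  # ~1.60x
```

So beyond the 3x model size, each GPU in the new cluster processed roughly 1.6x more tokens than in the OLMo run, reflecting both the larger dataset and the higher per-device throughput of MI300X.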
