Technical Insights

October 31, 2025

Yan 2.0 Preview Iterative Upgrade: More Stable Training, Faster Inference


To enhance the model’s training stability and inference speed, RockAI has been continuously refining and optimizing its Yan architecture large model, achieving new progress based on Yan 2.0 Preview.

During training, RockAI researchers observed anomalies in the distribution of activation values and instability in training, prompting an iterative technical upgrade. Ablation experiments showed that convergence improves when the model's memory module adopts an "update first, then use" strategy.
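To make the ordering concrete, here is a minimal sketch of the two read/write orderings. The memory here is a simple exponential moving average standing in for Yan's (unpublished) memory module; the function names, the update rule, and the value of `alpha` are all illustrative assumptions, not RockAI's implementation.

```python
def step_use_then_update(memory, x, alpha=0.5):
    """Read the stale memory first, then fold the new input in."""
    out = memory + x                         # output sees last step's memory
    memory = (1 - alpha) * memory + alpha * x
    return memory, out

def step_update_then_use(memory, x, alpha=0.5):
    """Fold the new input into memory first, then read it ("update first, then use")."""
    memory = (1 - alpha) * memory + alpha * x
    out = memory + x                         # output sees memory that already includes x
    return memory, out

def run(step_fn, xs):
    """Process a token sequence with a fresh memory and one step function."""
    memory, outs = 0.0, []
    for x in xs:
        memory, out = step_fn(memory, x)
        outs.append(out)
    return outs
```

On the constant sequence `[1, 1, 1]`, the update-first variant's output reflects the current input from the very first step, while the use-first variant lags one step behind; this illustrates why the ordering can change optimization behavior even though both variants maintain the same memory.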


You can also find this video on YouTube: Yan 2.0 Preview Iteration Upgrade

1) More Stable Training

Based on these findings about memory, RockAI researchers dropped inter-sample memory updates during the pre-training stage (while retaining intra-sample updates) and moved memory updates to the instruction-alignment and inference stages, using an improved structure to increase the model's training speed and stability.
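The pre-training policy above can be sketched as follows: memory is updated within a sample, but reset at every sample boundary, so no state carries across samples. The EMA memory and all names are illustrative assumptions, reusing the toy "update first, then use" step rather than RockAI's actual module.

```python
def encode_sample(tokens, alpha=0.5):
    """Process one sample with a fresh memory: intra-sample updates only."""
    memory, outs = 0.0, []
    for x in tokens:
        memory = (1 - alpha) * memory + alpha * x  # update first ...
        outs.append(memory + x)                    # ... then use
    return outs

def pretrain_pass(batch):
    """Each sample starts from memory = 0.0, i.e. inter-sample updates are dropped."""
    return [encode_sample(sample) for sample in batch]
```

With this policy, two identical samples in a batch produce identical outputs regardless of their position, since no memory leaks from one sample into the next; under inter-sample updates, the second sample's outputs would depend on the first.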

[Figure: training stability comparison]

Before Optimization (Blue) and After Optimization (Orange) 

2) Faster Inference

Improved training stability not only speeds up data processing during inference but also improves other key metrics, providing a more adaptable technical foundation for edge deployment and lowering the barrier to deploying the model.

[Figure: inference speed comparison]

Before Optimization (Blue) and After Optimization (Orange) 


More technical details and experimental data will be shared soon.

