Technical Insights

August 25, 2025

Letting Large Models Learn Without External Intervention! This Non-Transformer AI Dark Horse Shines at WAIC


    Native memory and offline intelligence: has the watershed moment of large-model evolution truly arrived?


    Author | Cheng Xi

Editor | Moying


    Zhidx reported on July 26 that, as the World Artificial Intelligence Conference (WAIC) opened in Shanghai that day, a Chinese AI startup was mounting a direct challenge to the mainstream Transformer architecture from its booth.

    At its booth, a robotic dog learned a visitor's greeting gesture in under 30 seconds and reproduced it precisely, imitating not only the motion itself but even the visitor's use of the right hand. Remarkably, it did all of this entirely offline, with no reliance on the cloud.


    This was the scene at RockAI's booth. After releasing Yan 1.0, the first domestically developed large model built on a non-Transformer architecture, in January 2024, and following it with the swarm-intelligence-oriented Yan 1.3 in September 2024, RockAI has now unveiled its latest model: Yan 2.0 Preview. The key to the robotic dog's impressive performance lies in the memory and self-learning capabilities introduced in Yan 2.0 Preview.

    Despite having only 3B parameters, Yan 2.0 Preview already outperforms larger-scale models like Llama 3, Qwen 3, and Gemma 3 on benchmarks such as ARC-C, ARC-E, and WinoGrande.

[Figure: benchmark comparison of Yan 2.0 Preview with Llama 3, Qwen 3, and Gemma 3]

    RockAI CEO Liu Fanping stated that while the Transformer paradigm follows pre-training → fine-tuning → application, the Yan architecture enables models to learn and interact in the physical world without cloud dependence, breaking free from today’s conventional learning paradigm. Yan 2.0 Preview carries the mission of endowing models with autonomous learning capabilities.

 

01. Learning and Reproducing Actions Offline Within 30 Seconds — Enabling Native Memory

    At WAIC, Yan 2.0 Preview, built upon the Yan non-Transformer architecture, demonstrated text, vision, and audio multimodal understanding as well as end-to-end audio and video generation.

    Its performance spoke for itself.

    For example, a RockAI dexterous robotic hand played a box-pushing game autonomously—analyzing, evaluating, and deciding each step until the box was placed in the correct position.

[Animation: the dexterous hand's box-pushing demo]

    In January 2024, Yan 1.0 was released, offering higher training and inference efficiency, throughput, and memory capability than Transformer models of the same scale. It also exhibited fewer hallucinations, could run on CPUs, and was fully deployable in private environments.

    Eight months later, Yan 1.3 was released, evolving into a multimodal swarm-intelligence large model capable of running inference on a Raspberry Pi single-board computer.

    Now, Yan 2.0 Preview demonstrates self-learning that proceeds in sync with training and inference, though it remains an intermediate step in RockAI's exploration. Its differentiable memory module enables information to be stored, retrieved, and forgotten. According to RockAI CTO Yang Hua, endowing models with autonomous learning may become a critical technical moat, and it represents a key step toward AGI.

    Currently, Transformer-based large models excel in short-term dialogue but lack true native memory. The mainstream industry approach relies on external mechanisms like RAG (Retrieval-Augmented Generation), long-context windows, or external databases to mimic memory. However, these are essentially one-off retrievals, lacking continuity, growth, and correction—making it difficult to achieve human-like, long-term memory.
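
    To make the contrast concrete, here is a minimal sketch of the one-off retrieval pattern in Python. Everything in it is illustrative (the store, the word-overlap scoring, the names); it is not any vendor's API, only a caricature of how RAG-style lookups behave.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str

class StaticStore:
    """A frozen document store: read at query time, never written back to."""
    def __init__(self, docs):
        self.docs = [Document(d) for d in docs]

    def retrieve(self, query: str, k: int = 2):
        # Toy relevance score: how many words the query and document share.
        words = set(query.lower().split())
        ranked = sorted(self.docs,
                        key=lambda d: len(words & set(d.text.lower().split())),
                        reverse=True)
        return ranked[:k]

def answer(store: StaticStore, question: str) -> str:
    # A real RAG system would feed `context` to an LLM; here we just echo it.
    context = " ".join(d.text for d in store.retrieve(question))
    return f"[context: {context}] -> answer to: {question}"

store = StaticStore(["The user prefers tea.", "WAIC is held in Shanghai."])
print(answer(store, "What does the user prefer?"))
# Nothing from this exchange is written back into `store`; the next question
# starts from the same frozen snapshot. That is the missing continuity,
# growth, and correction the paragraph above describes.
```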

    In contrast, native memory is a cornerstone for large models on the path toward AGI. Its significance lies in enabling a model not only to remember who the user is, what they said, and what they like, but also to update knowledge, personalize over time, and understand context during long-term interaction. Only then can models evolve from tools into true personal assistants, capable of supporting applications such as content creation, education, and business decision-making with coherent, deep intelligence.

    As mentioned earlier, the Yan 2.0 Preview-powered robotic dog can learn and replicate designated actions within 30 seconds while continuously remembering each visitor’s preferences and interaction style. This capability requires no cloud computing, functioning entirely in an offline deployment environment, while retaining native memory, autonomous understanding, and adaptability.

    Once Yan 2.0 Preview is deployed offline, the robotic dog can become a lifelike bionic companion. When large models possess native memory, end devices gain true intelligence.


    This aligns seamlessly with RockAI's mission: "Make Every Device Its Own Intelligence." Starting from non-Transformer architectures, RockAI is equipping its models with multimodality, real-time human-machine interaction, and autonomous learning capabilities, pushing the boundaries of AI evolution.

02. Neural Network Memory Units Introduced — PC Deployment Achieved

    The enhancements in self-learning and multimodal understanding capabilities have placed higher demands on the underlying architecture of Yan 2.0 Preview.

    At its core, Yan 2.0 Preview introduces differentiable memory modules to achieve information storage, retrieval, and forgetting. Its forward process consists of two phases: memory updating and memory retrieval.

    The memory updating phase allows the model to retain long-term dependencies through gated updates while flexibly integrating new knowledge based on input distribution characteristics.

    The memory retrieval phase expands the model’s memory capacity while enhancing its retrieval capability.
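
    RockAI has not published the exact equations of this module, so the PyTorch sketch below is only one plausible reading of the two-phase description: a gated write that blends old slot contents with new input (covering both retention and forgetting), and an attention-style read over all slots. The slot count, dimensions, and layer names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableMemory(nn.Module):
    """Toy two-phase memory: gated updating, then attention-based retrieval."""
    def __init__(self, slots: int = 64, dim: int = 128):
        super().__init__()
        # The memory persists across calls as a buffer, not a trained weight.
        self.register_buffer("memory", torch.zeros(slots, dim))
        self.gate = nn.Linear(dim, slots)   # per-slot write strength
        self.write = nn.Linear(dim, dim)    # content to be written
        self.query = nn.Linear(dim, dim)    # read query

    def update(self, x: torch.Tensor) -> None:
        """Memory-updating phase: gated blend of old contents and new input."""
        g = torch.sigmoid(self.gate(x)).unsqueeze(-1)   # (slots, 1)
        w = self.write(x)                               # (dim,)
        # g near 0 keeps a slot (long-term retention); g near 1 overwrites it
        # with new knowledge, which doubles as a forgetting mechanism.
        self.memory = (1 - g) * self.memory + g * w

    def retrieve(self, x: torch.Tensor) -> torch.Tensor:
        """Memory-retrieval phase: scaled dot-product read over all slots."""
        q = self.query(x)                                           # (dim,)
        scores = F.softmax(self.memory @ q / q.numel() ** 0.5, dim=0)
        return scores @ self.memory                                 # (dim,)

mem = DifferentiableMemory()
x = torch.randn(128)          # a single input feature vector
mem.update(x)                 # write while interacting
print(mem.retrieve(x).shape)  # torch.Size([128])
```

    Because both phases are built from differentiable operations, gradients can flow through reads and writes, which is what allows such a memory to be trained end to end rather than bolted on like an external database.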

    Beyond autonomous learning, the model also offers understanding and generation capabilities across multiple modalities. Its core components include a language model based on the Yan 2.0 Preview architecture, a vision encoder, a video token compression module, a vision connection layer, an audio discretization module, and an audio decoder.

    The audio discretization module improves modeling efficiency by quantizing continuous speech signals into a finite set of discrete values, enabling unified modeling of both semantic and acoustic information at low bitrates.
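
    A common way to realize such discretization is vector quantization: each continuous feature frame is snapped to its nearest entry in a finite codebook, so an utterance becomes a short sequence of integer token IDs. The sketch below shows only this idea; it is not RockAI's actual module, and the codebook size and dimensions are arbitrary assumptions.

```python
import torch

def quantize(frames: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map continuous frames (T, dim) to discrete token IDs (T,)."""
    dists = torch.cdist(frames, codebook)   # Euclidean distance to every entry
    return dists.argmin(dim=1)              # index of the nearest codebook entry

def dequantize(tokens: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Inverse lookup: token IDs (T,) back to codebook vectors (T, dim)."""
    return codebook[tokens]

codebook = torch.randn(1024, 64)     # the finite set of discrete values
frames = torch.randn(50, 64)         # e.g. 50 feature frames of one utterance
tokens = quantize(frames, codebook)  # 50 integers in [0, 1024)
recon = dequantize(tokens, codebook)
# With 1,024 entries, each frame costs log2(1024) = 10 bits, so at 50 frames
# per second the stream is only 500 bit/s: this is where the low bitrates
# mentioned above come from (real codecs use learned, multi-stage codebooks).
```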

    For audio modality expansion, the Yan multimodal architecture can effectively learn audio sequences and capture fine-grained acoustic features. Training included about 1 million hours of audio data for modality extension and alignment, and 8 million pairs of speech Q&A data for supervised fine-tuning of audio question-answering tasks.

    Finally, in the audio decoding phase, the audio decoder reconstructs discrete audio tokens generated by the Yan multimodal model into final audio waveforms, achieving high-quality end-to-end speech synthesis.
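
    The decoder itself is not described in detail, but components of this kind in published neural audio codecs share a general shape: embed the discrete tokens, then upsample to the waveform's sample rate. The schematic below shows only that shape; every hyperparameter in it is an assumption.

```python
import torch
import torch.nn as nn

class ToyAudioDecoder(nn.Module):
    """Schematic token-to-waveform decoder: embed, then upsample 64x."""
    def __init__(self, vocab: int = 1024, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # Two stages of 8x upsampling: each token frame becomes 64 samples.
        self.upsample = nn.Sequential(
            nn.ConvTranspose1d(dim, dim // 2, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.ConvTranspose1d(dim // 2, 1, kernel_size=16, stride=8, padding=4),
            nn.Tanh(),  # squash waveform samples into [-1, 1]
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens).transpose(0, 1).unsqueeze(0)  # (1, dim, T)
        return self.upsample(x).squeeze()                    # (T * 64,) samples

decoder = ToyAudioDecoder()
tokens = torch.randint(0, 1024, (50,))  # 50 discrete audio tokens
wave = decoder(tokens)
print(wave.shape)                        # torch.Size([3200])
```

    In a production codec, a decoder like this would be trained jointly with the quantizer so that reconstruction quality is optimized end to end.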

    It is clear that Yan 2.0 Preview continues to push toward lower computational resource requirements and stronger model performance through foundational innovation in its multimodal architecture.

    These advances reflect RockAI's persistence on the non-Transformer path, which aligns closely with the core needs of on-device model deployment. Today, RockAI's Yan series models have already shipped on brand-name PCs, powering features such as an integrated large-model meeting assistant.

 

03. Offline Intelligence: Redefining Hardware for Collective Intelligence

    Choosing a path of non-mainstream foundational innovation, RockAI's journey has been challenging from the start. Since its establishment in June 2023, RockAI has been committed to creating non-Transformer architectures.

    Drawing on its technology roadmap and its reading of where the large-model industry is headed, RockAI announced its mission in July of the same year: "Make Every Device Its Own Intelligence." That mission has since been continuously internalized into the company's business development.


    On one hand, its Yan series models have gradually converged on the parameter scale and performance that edge devices require. On the other, RockAI has deployed its models fully offline on edge hardware such as smartphones, computers, drones, and robots, both as embedded components and as external modules, including on devices like DJI drones and Raspberry Pi single-board computers.

    As the first company to focus on non-Transformer architectures, RockAI initially faced industry skepticism alongside hard technical problems: how much tooling built for existing architectures could be reused, how to build a framework from the ground up, and how to give machines self-learning capabilities from scratch.

    Looking at RockAI's models, we can see intelligence redefining hardware: the hardware lifecycle shifts from a one-time delivery to a product with long-term memory that grows alongside its user.

    Traditional hardware's value peaks at the moment of sale, then depreciates due to wear and obsolescence. True intelligent hardware, however, is dynamic, with its core value increasing over time through algorithmic iterations and model self-learning. Users are no longer purchasing a fixed-function product, but rather a service and evolving platform that grows with them. To achieve this sustained growth of higher-order intelligence, disruptive innovation at the foundational level is essential.

    With the release of the Yan series models and deeper collaborations with manufacturers like PC companies, RockAI’s persistence on this challenging yet correct path has started to bear fruit. Behind this success lies the visionary insights of the founding team and their robust technical expertise. While the industry was still immersed in the technological dividends of Transformer architectures, RockAI recognized their limitations in computational efficiency and scene adaptability, making a decisive move into the exploration of non-Transformer architectures.

    Looking ahead, RockAI is firmly committed to the concept of collective intelligence. As Yang Hua explains, their vision is not the evolution of a single intelligent entity but the creation of a machine society composed of multiple models and terminals, where collaboration and collective intelligence emerge like in human society. In this system, each intelligent terminal not only possesses environmental perception capabilities but can also interact in real-time with the physical world, self-learn, and evolve, forming an organic and collaborative intelligent group that continuously grows together.

    Collective intelligence is not only a technological leap but also what RockAI believes to be the key pathway toward achieving general artificial intelligence. With the release of Yan 2.0 Preview and its implementation on terminal devices, we may soon see the first signs of this vision coming to life.

Conclusion: Staying the Non-Transformer Course to Give Every Device Its Own Intelligence

    Faced with the dominance of the Transformer architecture, RockAI has not blindly followed the trend. Instead, it has pursued independent innovation, exploring a technical roadmap closer to the nature of true intelligence and testing, through real products, the innovative potential of non-Transformer architectures. That willingness is among the most valuable traits in today's AI industry.

    Beyond the milestones themselves, RockAI's keen reading of industry pain points has injected vitality into the diversification of AI technology. By attacking real-world problems at the level of foundational logic, it is pushing the AI industry away from reliance on technological dividends and toward genuine breakthroughs.

    Yan 2.0 Preview represents not just a breakthrough in technical paradigms but also a reflection on the future of the human-machine relationship: not a distant, unattainable super-model, but a new era of intelligence in which every device can think, collaborate, and grow.

    "Make Every Device Its Own Intelligence" is RockAI's mission. That vision was embedded in the company's earliest technical choices and innovation strategy, and it is now producing tangible results.

 

 

 

