Technical Insights
August 20, 2025
Offline + Memory: A Watershed Moment in the Evolution of Large Models
Shanghai, August 11, 2025 (RockAI)
After eight years in which Transformer-based AI led the field, Google has begun exploring new directions, while Chinese startup RockAI is pushing innovation further with Yan 2.0 Preview: the world's first large model with native memory. It runs offline on a Raspberry Pi and accumulates its own memories over time, much like a human being, breaking free of the Transformer's heavy reliance on computational power and its data silos.
The highly anticipated World Artificial Intelligence Conference (WAIC 2025) recently concluded in Shanghai. During the event, RockAI's booth was constantly surrounded by visitors. A robotic dog could remember learned actions and preferences without pre-programmed instructions. A robotic hand could autonomously play games on a keyboard in offline mode using visual recognition and dynamic decision-making. These astonishing demonstrations all stem from RockAI's newly released Yan 2.0 Preview large model.
Unlike current mainstream large models, Yan 2.0 Preview is the world's first large model with "native memory": its neural network-based memory units allow the model to learn autonomously. It also adopts a non-Transformer architecture and extends its multimodal capabilities into the video domain.
Native memory, offline intelligence, non-Transformer architecture—these are RockAI's most prominent features, and the reason for its standout performance at this year's WAIC.
1. Yan 2.0 Preview: World's First Large Model with "Native Memory"
At RockAI's booth at WAIC 2025, the robotic dog demonstration stole the show. A staff member first demonstrated a greeting gesture to the robotic dog, which learned it autonomously and immediately performed the action without any program modifications. The staff member also told the robotic dog, "This is my favorite drink," and it promptly located the drink. This ability to learn and apply knowledge on the spot, and to become smarter through use, significantly changed traditional perceptions of AI, and it is a direct manifestation of the "native memory" feature in Yan 2.0 Preview.
From Google's Titans architecture to Meta's Chief AI Scientist Yann LeCun, leading voices agree that memory modules must be built into the model, since AI's ability to learn is closely tied to its ability to remember. In traditional large models, memory typically relies on external plugin calls (such as RAG). Yan 2.0 Preview's breakthrough lies in its mechanism of synchronized training and inference: the model is no longer a static product but a continuously evolving AI agent. Every interaction with its environment and every new task scenario can serve as nourishment for the model's autonomous learning and evolution. Any discussion of this autonomous learning must start with Yan 2.0 Preview's memory module.
Yan 2.0 Preview innovatively introduces neural network-based memory units. Specifically, it implements information storage, retrieval, and forgetting through differentiable memory modules. Unlike approaches such as "context engineering," which store memory information explicitly, Yan 2.0 Preview stores acquired information implicitly in the weights of a multilayer neural network. Leveraging the multi-level abstraction and non-linear modeling of neural networks, along with data-dependent adaptive forgetting factors and adaptive learning rates, it dynamically adjusts memory strength: gated updates retain long-term dependencies, while new knowledge is flexibly integrated according to the characteristics of the input distribution. To increase memory capacity, a sparse memory module was further designed.
Yan 2.0 Preview "Native Memory" Module
Yan 2.0 Preview's forward propagation can be divided into two phases: memory update and memory retrieval.
During the memory update phase, the model determines which old knowledge can be forgotten and extracts valuable information from the current task to write into the memory module. This process relies on no external plug-ins or caches; instead, a dedicated neural network simulates memory behavior, achieving adaptive forgetting and incremental learning that integrates new knowledge flexibly while preserving important historical information.
During the memory retrieval phase, Yan 2.0 Preview introduces a sparse memory mechanism: the model selects the Top-K activated memories from multiple memory slots, merges them with long-term shared memory, and generates new outputs. As a result, the model not only possesses memory but can also "reason with memory." Together, these mechanisms provide initial validation of the memory network's effectiveness: the model is no longer a static brain but is beginning to evolve into a growing AI agent.
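A matching sketch of the retrieval phase, under the same caveat that all names and shapes are assumptions, selects only the Top-K activated slots and merges the readout with a long-term shared memory:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMemoryRead(nn.Module):
    """Illustrative Top-K sparse retrieval: only the K most strongly
    activated memory slots are read, then merged with a shared long-term
    memory vector. Hypothetical sketch only."""

    def __init__(self, dim: int, n_slots: int, k: int = 4):
        super().__init__()
        self.k = k
        self.slots = nn.Parameter(torch.randn(n_slots, dim) * 0.02)  # sparse memory slots
        self.shared = nn.Parameter(torch.randn(dim) * 0.02)          # long-term shared memory
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query_proj(x)                        # (dim,)
        scores = self.slots @ q                       # activation of every slot
        top = torch.topk(scores, self.k)              # keep only the K most active slots
        weights = F.softmax(top.values, dim=-1)
        recalled = weights @ self.slots[top.indices]  # weighted readout, (dim,)
        return x + recalled + self.shared             # "reason with memory"
```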
As shown above, the autonomous learning mechanism based on synchronized training and inference stores relevant information implicitly in the weights of a multilayer neural network. Compared to explicit context engineering, this approach is closer to the working principles of the human brain.
Unlike traditional external memory, native memory integrates key information from the learning process deeply into the model parameters, enabling true "long-term memory" and personalized evolution. This kind of memory is not simply about storing information; it is an evolutionary process of accumulating intelligence. Just as humans form cognitive systems through memory and experience, only when large models possess this capability can they evolve from passive executors of commands into partners that understand user intent. More importantly, edge deployment allows all of this to happen locally on the device, an approach RockAI calls offline intelligence, which protects privacy while also improving response speed and data security.
2. Offline Intelligence: From Cloud to Edge with Self-Learning
Future devices will no longer be machines that passively execute commands, but rather “digital brains” with the ability to perceive, remember, and learn. The profound significance lies in the fact that native memory gives large models “continuity of thought.” When each device can accumulate experience through memory, they will no longer be isolated electronic components, but rather partners that can grow together with users. This is an essential step on the path toward Artificial General Intelligence (AGI).
2.1 The Fundamental Difference Between Offline Intelligence and Edge AI
When AI capabilities on local devices come up, many people think of "edge AI." However, RockAI's "offline intelligence" differs fundamentally from traditional edge AI.
Traditional edge AI is essentially a "simplified version of a cloud-based large model": cloud models are compressed into small models using techniques such as pruning and distillation, then deployed to local devices. Although this approach reduces computational requirements, it comes at the cost of greatly reduced capabilities, restricting the model to simple tasks such as voice activation. Complex multimodal interactions and logical reasoning still rely on the cloud.
However, RockAI believes that on-device large models should not be merely compressed versions of cloud-based large models. Instead, they should be built on an innovative model architecture, capable of local deployment and customization on devices. Their core capability lies in enabling autonomous learning and memory through multimodal perception, to provide personalized services while ensuring data privacy and operational security, just like giving a real brain to each device.
In other words, offline intelligence does not merely mean "offline operation"; it means the model can complete the entire process of understanding, inference, and even learning locally, in a closed loop. It has three core features:
(1) Fully local inference: The entire inference process runs locally on the device and does not rely on cloud-based computational power. The model is deployed on the device and is usable offline.
(2) Multimodal understanding: Its multimodal understanding can process complex inputs such as voice, images, and videos, with strong local perception and interaction capabilities. Currently, Yan 2.0 Preview can perform multimodal question-answering at 5 tokens/s on a Raspberry Pi.
(3) Learning while using: With synchronized training and inference, new information from user interactions can be written into the local model's memory, so the device grows gradually with use (see the sketch below).
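As a rough illustration of what "learning while using" could look like in code, the following hypothetical loop answers the user and then immediately takes a small gradient step on the memory parameters, so the interaction itself becomes training data. The function names and loss are assumptions for illustration, not RockAI's published training procedure:

```python
import torch
import torch.nn.functional as F

def interact_and_learn(model, memory_params, user_input, feedback_target, lr=1e-4):
    """Hypothetical train-while-infer step: inference and learning happen
    in the same pass, updating only the memory parameters on-device."""
    optimizer = torch.optim.SGD(memory_params, lr=lr)

    # Inference: respond to the user as usual.
    output = model(user_input)

    # Learning: if the interaction yields a supervision signal (e.g. a
    # demonstrated action or a correction), write it into the memory weights.
    loss = F.mse_loss(output, feedback_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the device grows slightly "smarter" with each use

    return output.detach()
```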
At the RockAI booth at WAIC 2025, the AIPC on display performed real-time transcription, translation, and meeting summarization even without an internet connection. This ability to "function intelligently even offline" addresses a major pain point in global markets, particularly in regions with unstable networks such as Africa and the Middle East, as well as in Europe and the Americas, where data security requirements are stringent.
Traditional edge AI is like fitting a square peg into a round hole, forcing compromises that sacrifice capability to fit the device. RockAI's offline intelligence, by contrast, is tailor-made: the architecture is designed for end devices from the outset. This is why it can deliver offline intelligent experiences on budget smartphones that even traditional flagship devices cannot provide.
2.2 Core Values of Offline Intelligence
In real-world global applications, the value of offline intelligence has been continuously validated, particularly in three key dimensions:
(1) Data Security: From “Passive Protection” to “Active Defense”
As data security regulations tighten, "data staying on the device" has become a core user requirement. Under Europe's General Data Protection Regulation (GDPR), uploading corporate data to the cloud for processing requires explicit user authorization; otherwise, companies may face fines of up to 4% of global annual revenue. In sensitive sectors such as finance and healthcare, cross-border data transfer is restricted even more tightly.
RockAI's offline intelligence provides a fundamental solution to protect data security. Taking the AI meeting feature as an example, the entire process—including speech-to-text conversion, translation, and summary generation—is completed locally on the device. The original audio and text data never leave the device, preventing the risk of leakage from the source.
(2) Response Speed: From Waiting for the Cloud to Instant Interaction
Network latency is a common pain point for users worldwide. In Nigeria, the average 4G network latency exceeds 200 ms, and connections frequently drop; in the desert regions of the Middle East, the high latency of satellite networks makes cloud-based AI practically unusable. Traditional cloud-dependent AI services often lead to a frustrating scenario where, after a command is issued, the system spins for 10 seconds before ultimately returning a network error.
Offline intelligence reduces response latency to the level of real-time on-device processing. At MWC 2025 in Barcelona, an AIPC equipped with a Yan architecture large model was demonstrated on-site, achieving millisecond-level recognition and transcription offline, zero-latency real-time translation, and over 95% accuracy. This is also why distributors from Europe, the Americas, and Africa praised it highly at the exhibition.
(3) Hardware Inclusivity: From Flagship-Only to Available to All
The high computational demands of traditional large models have made AI capabilities exclusive to flagship devices. For example, one brand's AIPC offers full AI functionality only in flagship models priced above 10,000 RMB, and even then requires a combination of edge and cloud computing. Through architectural optimization, RockAI's offline intelligence lowers the deployment threshold of large models to entry-level devices: the AIPC developed by RockAI and its partners costs just 3,000 RMB and runs entirely on a local CPU. This inclusivity is reshaping the hardware market, with AI glasses and AI toy manufacturers actively approaching RockAI.
2.3 From Offline Intelligence to Collective Intelligence
RockAI aims to do more than just add intelligence to individual devices. In the Yan 2.0 Preview, the combination of native memory and offline intelligence has begun to reveal the early stages of collective intelligence—which RockAI views as a crucial step toward achieving Artificial General Intelligence (AGI).
RockAI defines collective intelligence in the AI era as multiple AI units with autonomous learning capabilities that, through environmental perception, self-organization, and interaction and cooperation, solve complex problems and achieve overall intelligence enhancement in dynamically changing environments. In simpler terms, it enables multiple devices to collaborate via a network, forming collective intelligence that far surpasses individual capabilities.
Currently, leading tech companies' AGI explorations often follow a "single super-model" approach: concentrating resources on training one super-large model in an attempt to reach general intelligence through ever-greater parameter scale and computational investment. This approach demands massive computational resources (a single training run of a 100-billion-parameter model can consume as much electricity as tens of thousands of people use in a year) and centralizes control of the model, posing risks of technological monopoly.
RockAI has chosen a “distributed” path, where, rather than pursuing the perfection of a single super-model, it focuses on the collaborative evolution of countless edge AI agents with native memory and autonomous learning capabilities, ultimately giving rise to collective intelligence. The advantages of this approach include:
1) Computing inclusivity: distributing intelligence across a vast number of end devices to avoid the inefficient centralization of computing resources;
2) Secure and controllable: the absence of a centralized super-model reduces the risk of technological abuse;
3) Continuous evolution: each AI agent can learn in real-world scenarios, with overall intelligence improving as the scale of applications expands.
As RockAI states in its future vision: “AGI should not be a ‘digital god’ controlled by a few giants, but rather an ‘intelligent ecosystem’ formed by billions of collaborating devices. For artificial intelligence to become infrastructure, it must be closely linked with devices. When every device can remember, learn, and collaborate, society's overall intelligence will experience a qualitative leap.”
3. Yan Architecture Commercial Deployment
In today's landscape, where the Transformer architecture dominates the large model domain, RockAI has chosen an alternative path: the Yan architecture. Since its introduction in 2017, the Transformer has become the preferred choice of leading tech companies like OpenAI, Meta, and Google thanks to its powerful parallelism and long-text processing. However, RockAI's team recognized early that the Transformer's heavy reliance on computing power and cloud centralization fundamentally conflicts with the future trend toward offline intelligence and device inclusivity.
3.1 Computational Shackles of the Transformer Architecture
The core of the Transformer architecture is the self-attention mechanism, which allows the model to focus on the relationships between different words in the input text, enhancing its understanding capabilities. However, this mechanism is computationally expensive: the computational load grows quadratically with the length of the input sequence. For example, processing a 1,000-token text requires roughly one million operations just for the self-attention score matrix; at 10,000 tokens, that grows to roughly 100 million operations.
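A few lines of Python are enough to mirror the arithmetic above; the helper below simply counts the entries of the attention score matrix:

```python
def attention_score_entries(seq_len: int) -> int:
    # The self-attention score matrix QK^T has seq_len x seq_len entries,
    # so the work to fill it grows quadratically with input length.
    return seq_len * seq_len

print(f"{attention_score_entries(1_000):,}")   # 1,000,000 for 1,000 tokens
print(f"{attention_score_entries(10_000):,}")  # 100,000,000 for 10,000 tokens
```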
This characteristic leads to two serious issues:
1) High computational power demand: Training a 100-billion-parameter Transformer large model requires tens of thousands of GPUs running for months; during inference, a single complex conversation consumes as much energy as an ordinary smartphone uses in a day.
2) Cloud dependency: Due to the limited computational power of end devices, complex tasks for Transformer large models must rely on cloud data centers for processing, which conflicts with the requirements of offline intelligence.
If we continue down the Transformer path, AI will forever remain "exclusive to the cloud," controlled by a few tech giants, and unable to truly enter every household. To give every device its own intelligence, we must break free of these architectural constraints.
3.2 Disruptive Innovation of the Yan Architecture
Compared to the Transformer, the Yan architecture abandons the attention mechanism and instead adopts memory units and a brain-inspired activation mechanism. The memory units use a feature-state-driven memory mechanism, in which information storage, retrieval, and forgetting are achieved through differentiable memory modules. The brain-inspired activation mechanism, also called the "dynamic neuron selection-driven algorithm," dynamically builds a neural network when a user submits a query: the network is constructed on demand rather than pre-defined, and computational resources are allocated intelligently.
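Since RockAI has not published the algorithm's details, the following is only a speculative sketch of the general idea of dynamic neuron selection: a router scores the hidden neurons for each input, and only the top fraction participate in the computation. All names and the routing scheme are assumptions:

```python
import torch
import torch.nn as nn

class DynamicNeuronLayer(nn.Module):
    """Speculative sketch of brain-inspired activation: a per-input router
    activates only a small subset of neurons, so the effective network is
    assembled on the fly rather than fixed in advance."""

    def __init__(self, dim: int, hidden: int, active_ratio: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.router = nn.Linear(dim, hidden)          # scores each hidden neuron per input
        self.k = max(1, int(hidden * active_ratio))   # neurons allowed to fire

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                       # relevance of each neuron to this input
        top = torch.topk(scores, self.k)
        mask = torch.zeros_like(scores).scatter(-1, top.indices, 1.0)
        h = torch.relu(self.fc1(x)) * mask            # only the selected neurons fire
        # (A real implementation would skip masked neurons to save compute.)
        return self.fc2(h)
```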
This advantage was vividly demonstrated by the robotic hand playing games: the model must simultaneously process three modalities—visual input from the game screen (identifying button positions), voice input from user commands (activating skills), and sensor input from hand movements (current finger positions). The Yan architecture fuses this information in real time, makes decisions within seconds, and controls the fingers to carry out precise operations.
The Yan architecture achieves both reduced computational complexity and improved model performance, which is also the main reason why Yan architecture large models can be deployed on-device.
3.3 Commercial Deployment: From the Lab to the Global Market
The Yan architecture's series of large models require no pruning or quantization and have been successfully deployed on the Raspberry Pi, Snapdragon 6-series mobile chips, and AMD and Intel PC processors. Moreover, in 2025, as the "hundred-model showdown" cooled and many high-profile projects faced transformation anxiety, RockAI achieved notable commercial success: its model technology has been sold into global markets including Africa, the Middle East, Europe, the Americas, Southeast Asia, and Russia, and shipments of devices featuring the Yan architecture are beginning to take shape.
(1) Breaking into Overseas Markets with Strong Demand
The demand for offline intelligence in overseas markets far exceeds expectations, particularly in regions with unstable network infrastructure and strict data security requirements. RockAI's AIPC, developed in collaboration with a leading overseas brand manufacturer, will enter mass production and be officially released in August for overseas markets. This is the world's first device capable of offline AI operation, offering users features such as AI Meeting, AI Gallery, Local Knowledge Base, and Voice Assistant. Meanwhile, devices co-developed with other brands are also being gradually deployed.
(2) Strategic Partnerships in the Supply Chain
During WAIC 2025, RockAI's signing ceremony with global chip manufacturer AMD attracted industry attention. According to the Memorandum of Understanding (MOU), the two parties will engage in in-depth technical collaboration in the AIPC field, jointly optimizing the Yan architecture's performance and compatibility with AMD chips. AMD stated: “In addition to their outstanding technical capabilities, we also highly appreciate RockAI's innovative spirit of daring to break new ground.”
Beyond AMD, shipments of devices equipped with RockAI's models are taking shape. Its partners include not only the consumer electronics brands, ODM manufacturers, smartphone makers, robot makers, and automotive chip manufacturers mentioned above, but also manufacturers with even stricter power and performance requirements, such as home appliance and XR glasses makers, who are seeking collaboration. This full-chain collaboration across upstream and downstream partners allows the Yan architecture to be deployed rapidly across diverse devices, creating a positive feedback loop among technology, product, and market.
A Watershed Moment for Large Models
The large model industry in 2025 stands at a watershed moment. On one hand, the hype of the “hundred-model showdown” has faded, and many projects that were heavily driven by investment and had limited technological defensibility are stagnating. On the other hand, the genuine demand for AI from device manufacturers, brands, and users has never been stronger—what they seek is not laboratory “benchmark-chasing models”, but AI services that are deployable, practical, and capable of safeguarding privacy.
RockAI's counter-cyclical growth precisely confirms the industry's shift: the core of technological innovation is no longer a race for parameter scale, but rather breakthroughs in architecture, adaptation to scenarios, and the creation of user value. Native memory capabilities address the pain points of large models, such as the inability to remember or learn; offline intelligence breaks the shackles of “cloud dependency and computing power monopolization”; and non-Transformer architectures provide the technical feasibility for wider adoption of large models.
At WAIC 2025, RockAI demonstrated through tangible results that true AI innovation lies not in following trends, but in daring to blaze new trails. As the world's first large model featuring native memory runs smoothly on offline devices, and the slogan “even entry-level devices can run large models” becomes a reality, the evolution of large models is being rewritten.
The significance of this revolution lies not only in the technical breakthrough itself, but also in how it redefines the relationship between AI and humans—intelligence is no longer an unattainable technological myth but a companion in daily life; it is no longer a scarce resource monopolized by giants, but a foundational capability available to all. From offline intelligence to collective intelligence, from single-device evolution to ecosystem collaboration, the path RockAI is exploring is leading the large model industry toward a future that is smarter, more autonomous, and more inclusive.

