Event Reports

December 22, 2025

Paying for Tokens Is Unwise, Users Should Pay for Intelligence Instead | RockAI’s Liu Fanping at MEET2026


Original article by 量子位 QbitAI


“For artificial intelligence to reach the next level, two major barriers must be overcome: the Transformer and the backpropagation algorithm.”

As model sizes continue to grow and computing power and data are pushed to their limits, RockAI founder Liu Fanping has put forward a view that runs counter to the mainstream consensus.

The next stage of intelligence will not lie in being “bigger,” but in becoming “alive.”

At its core, this view frees models from the constraints of static functions, enabling edge devices to have native built-in memory, autonomous learning, and the ability to continuously evolve.

This means that the direction of AI must shift from centralized, cloud-based computing power competition to a new paradigm in which every device and every unit can participate in learning and generating knowledge.

At QbitAI's MEET2026 Intelligent Future Conference, Liu Fanping referred to this turning point as “Hardware Awakening”:

when models on edge devices can activate sparsely like the human brain, form memories in real time, and continuously update themselves in the physical world, devices are no longer tools, but “living” AI agents.

As a large number of such AI agents learn and collaborate in the real world, they will nurture collective intelligence truly capable of generating knowledge.


This is not only a direct breakthrough in overcoming the "two major barriers," namely the Transformer and the backpropagation algorithm, but also a new pathway toward artificial general intelligence.

To accurately present Liu Fanping’s complete line of thought, the following content has been compiled and edited based on the speech transcript, with the hope of offering new perspectives and insights.

The MEET2026 Intelligent Future Conference is an industry summit hosted by QbitAI. Nearly 30 industry representatives attended the discussion, with approximately 1,500 attendees on site and over 3.5 million live stream viewers. The conference received widespread attention and coverage from mainstream media.


Key Points Summary

■ Paying for tokens is unwise. Users should pay for intelligence instead.

■ Edge models are not small-parameter versions of cloud-based large models. The key for edge models lies in autonomous learning and native built-in memory, which Transformer architecture models cannot achieve on edge devices.

■ For artificial intelligence to advance to the next level, it must overcome two major barriers: the Transformer and the backpropagation algorithm.

■ The changes brought by native built-in memory and autonomous learning go beyond eliminating token charges. They also redefine the value of hardware.

■ When each device possesses its own intelligence and is able to learn from the physical world, collective intelligence emerges, similar to how individuals collaborate to generate knowledge. Existing large models (especially Transformer architecture models) do not generate knowledge themselves; they only transmit it. Collective intelligence is thus the optimal pathway toward artificial general intelligence.

■ ……

Below is the full text of Liu Fanping's speech:


Can Hardware Awaken?

I’m very glad to have the opportunity to share RockAI’s thoughts from a model-level perspective with everyone today. What I will discuss may differ somewhat from what people usually understand, as we believe that artificial general intelligence has its own development path.

Today’s topic is "Hardware Awakening."

We know that hardware is not alive, so how could it possibly awaken? Exactly. That's why we need to rethink everything when building large models, because the Transformer constrains us.

Let me ask everyone a question. What kind of intelligent hardware do you expect in the future? Is it your smartphone or tablet, or perhaps the Doubao phone released just a few days ago?

Many speakers at this conference mentioned agents, and they consistently highlighted one point: agents are tools and, importantly, more efficient ones.

Nowadays, many people still treat large models as tools. Like a calculator, you take it out when you need it and put it away when you don’t.

From the perspective of AI development, if the Doubao phone can open apps and perform tasks according to instructions, what comes next? If it can open WeChat and send messages, will WeChat still look and function the same in the future? The same goes for Amap: will it still look and function the same ten years from now?

It becomes clear that what we have today is an intermediate state, not a final state.


Paying for Tokens Is Unwise

Just now, many speakers mentioned that token consumption has increased tenfold, especially when using agents.

This is basically paying for tokens.

But have you ever stopped to think that paying for tokens is unwise?

Why do we build large models? It is for intelligence. If anything should be paid for, it should be for intelligence, so why pay for tokens?

To put it another way, some people can get their point across in just a few words, while others are long-winded. So, do I have to pay for their wordiness? Of course not.

If you think about it carefully, paying for tokens is a mistake. In the future (two years from now), when you look back, I’m sure everyone will wonder why we ever paid for tokens or topped them up.


Edge Models Are Not Simply Small-Parameter Versions of Cloud-Based Large Models

Hardware has already changed significantly, and current cloud-based large models are gradually moving toward edge devices.


Why is that? We do not deny the advantages of cloud-based large models, especially as tools. They are outstanding.

But in the future, AI belongs to everyone. To bring AI into everyone’s world, the most important thing is on-device intelligence.

On one hand, edge devices are closer to you. On the other, they offer the advantage of "data being everywhere."

I have always been opposed to collecting data in the cloud, training the model there, and then delivering it to users.

The data is right next to you, so why can’t it just stay where it is, instead of being in the cloud? Cloud-based large models have too many parameters, and there are not enough devices to collect the data around you.

If edge models can collect data that is entirely personal locally on your device, and connect with your other devices, people will no longer treat the model merely as a tool.

Many people think that edge devices are limited by hardware and computing power, so they build “large” models with tens of billions of parameters in the cloud and “small” models with a few billion parameters on the edge, and then call the latter edge models.

But edge models are not small-parameter versions of cloud-based large models.

RockAI defines two critical characteristics for edge models, the ones we consider most important: autonomous learning and native built-in memory.

Models with a Transformer architecture cannot achieve autonomous learning and native built-in memory on edge devices.


Looking Beyond the Transformer Architecture

The Transformer is outstanding. 

I myself was one of the earliest researchers on Transformer in China, and I have great respect for its early success.

However, it has now entered a kind of death spiral, creating a problem: to make the model’s capabilities stand out, we need to increase computing power and data, which significantly raises costs. Everyone, including competitors, is doing the same thing.


You’ll notice that no one focuses on architecture. Everyone is busy with data and compute, because they think, “As long as I have enough data and compute, I will do better.”

We think the belief that the Scaling Law guarantees success appears to be a mistake. And it’s not just me; many people now share similar views.

At its core, the problem is not that the model isn’t large enough, but that the way of thinking is limited.

The model itself is a static function, which makes it unlikely to possess true intelligence. By contrast, the human brain is a dynamic function, constantly forming new connections with dynamic structures. It is for this reason that the human brain possesses memory.
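To make the contrast concrete, here is a minimal Python sketch (all names and mechanisms are illustrative, not RockAI’s implementation): the static model computes the same function forever once deployed, while the “dynamic” model lets every interaction nudge its parameters.

```python
import numpy as np

class StaticModel:
    """Deployed model in the conventional sense: parameters frozen after training."""
    def __init__(self, dim):
        self.w = np.random.randn(dim) * 0.01   # fixed at deployment time

    def __call__(self, x):
        return float(self.w @ x)               # the same function, forever


class DynamicModel:
    """Brain-like 'dynamic function': every interaction can nudge the parameters."""
    def __init__(self, dim, lr=1e-3):
        self.w = np.random.randn(dim) * 0.01
        self.lr = lr

    def __call__(self, x, feedback=None):
        y = float(self.w @ x)
        if feedback is not None:               # new experience reshapes the weights
            self.w += self.lr * (feedback - y) * x
        return y


x = np.ones(8)
static, dynamic = StaticModel(8), DynamicModel(8)
print(static(x) == static(x))                  # True: a static function never changes
dynamic(x, feedback=1.0)                       # one interaction with feedback
print(dynamic(x))                              # subsequent outputs already differ
```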


Another misconception is “more parameters mean more intelligence.”

Within the Transformer architecture, this way of thinking makes sense, but if we step outside the Transformer framework, it no longer holds.

Take a simple example from the biological world: does a snake or a small rabbit lack intelligence? Surely no one would deny their intelligence. 

Compared to the human brain, their brains contain far fewer "parameters."

Besides that, there is also the matter of long context. 

In 2024, there were many breakthroughs in long-context capabilities. However, we have never considered long context to be a form of memory. True memory should function like the hippocampus in the human brain, where all information is processed, compressed, and stored, with some parts removed as needed.

This kind of memory is parametric, which is not achieved through context alone. If memory relies on context, it will be very short.
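As a rough illustration of the distinction, the sketch below contrasts a context window that simply accumulates raw tokens until it overflows with a parametric memory that compresses each new experience into a fixed-size parameter state, consolidating it and letting old traces decay. The update rule here is an assumption chosen for brevity, not the Yan architecture’s actual memory design.

```python
import numpy as np

class ContextMemory:
    """Context-window 'memory': raw tokens pile up until the window overflows."""
    def __init__(self, max_tokens):
        self.buffer, self.max_tokens = [], max_tokens

    def write(self, token_embedding):
        self.buffer.append(token_embedding)
        if len(self.buffer) > self.max_tokens:
            self.buffer.pop(0)                 # the oldest information simply falls off


class ParametricMemory:
    """Hippocampus-style memory: experience is compressed into fixed-size parameters."""
    def __init__(self, dim, decay=0.99):
        self.state = np.zeros((dim, dim))      # storage does not grow with history
        self.decay = decay

    def write(self, token_embedding):
        # consolidate: let old traces decay slightly, superimpose the new association
        self.state = self.decay * self.state + np.outer(token_embedding, token_embedding)

    def read(self, query):
        return self.state @ query              # recall by association, not by lookup
```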


Now, why is everyone focused on long context again? The reason lies in agents. Once deployed, the Transformer architecture models behind agents are static functions, so their capabilities can only be modified through context.

At this point, it becomes clear that long context is actually a second-best solution, not a true solution for intelligence.

Whether the context window exceeds 1 million, 2 million, or 10 million tokens, the number of tokens generated every moment far exceeds these amounts. Take today's conference as an example: the content shared alone already exceeds 10,000 tokens.

Memory enables humans to form long-term cognition, which is a process. Our values are built up through the accumulation of memory over time. If memory relies solely on long context, values cannot form, nor can knowledge truly accumulate.

Human intelligence emerges from long-term accumulation.


Training and Inference Synchronization Enables Autonomous Evolution

Returning to the earlier point, the most important aspects of future intelligent hardware should be native built-in memory and autonomous learning.

We have already discussed native built-in memory. Now let us turn to autonomous learning, which must extend into the physical world.

One major benefit of autonomous learning is that the model will not “die” upon deployment. 

Many people may not realize that once its parameters are fixed, a model is dead the moment it is deployed. Any change requires uploading it to cloud servers for retraining and then redistributing it to users some time later.

Once autonomous learning becomes possible, the resulting autonomous evolution will bring completely new changes. We will no longer view the model as a fixed tool, but as something that can continuously learn, which we refer to as training and inference synchronization.

To illustrate training and inference synchronization, take me as an example. I am standing here producing output (analogous to a large model's inference process) while simultaneously acquiring new information. My inference and training occur at the same time, and the brain not only performs inference; its parameters are constantly changing. That is what makes it “alive.”
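A minimal sketch of what training and inference synchronization could look like, written against a toy PyTorch model with hypothetical names: each request is answered immediately, and whenever a supervision signal accompanies it, the same call also takes a learning step, so the deployed weights keep changing. Note that this sketch still relies on backpropagation, which Liu identifies as a barrier in its own right; it illustrates only the “learn while serving” loop, not RockAI’s actual mechanism.

```python
import torch
import torch.nn as nn

class LiveModel(nn.Module):
    """Toy model whose parameters keep updating while it serves requests."""
    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Linear(dim, dim)
        self.opt = torch.optim.SGD(self.parameters(), lr=1e-3)

    def respond(self, x, target=None):
        y = self.layer(x)                      # inference: answer the request now
        if target is not None:                 # training: learn from the same moment
            loss = nn.functional.mse_loss(y, target)
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()
        return y.detach()


model = LiveModel(16)
x = torch.randn(1, 16)
before = model.respond(x)                      # plain inference
model.respond(x, target=torch.zeros(1, 16))    # infer and update in a single call
after = model.respond(x)
print(torch.allclose(before, after))           # False: the deployed model has changed
```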

If a model is released today and three months later you ask it what happened during those three months, it knows nothing. The only way to compensate is through external knowledge sources or plug-ins such as RAG. Isn't this just a temporary solution?

As researchers, we must face the reality that many current solutions for large models are temporary fixes, not true ultimate solutions. Ultimate solutions require changes at the architecture level.

In my opinion, for artificial intelligence to advance to a higher stage, it must overcome two major barriers, the Transformer architecture and the backpropagation algorithm (the latter limits the development of many current devices, including the development of computing power).


The Model Architecture Must Be Changed

In order for a model to no longer “die” and be able to evolve, the model's architecture must be changed.

Take our large model with Yan architecture as an example. The entire model is extremely sparse, with an activation mechanism even sparser than MoE.

It imitates the operating mechanism of the human brain. The human brain has roughly 86 billion neurons, its biological "parameters," yet needs just over 20 watts of power to operate.
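The following is a hedged sketch of sparse activation in that spirit: for each input, only a small fraction of units is selected and actually computed, in rough analogy to sparse neural firing. The top-k selection rule and the 2% fraction are assumptions for illustration, not the Yan architecture's real routing mechanism.

```python
import numpy as np

def sparse_forward(x, weights, active_fraction=0.02):
    """Activate only the most relevant units for this input (brain-like sparsity).

    weights: (num_units, dim) array, one row per unit; only a small
    fraction of units actually fires for any given input x.
    """
    scores = weights @ x                              # cheap relevance score per unit
    k = max(1, int(len(weights) * active_fraction))
    active = np.argsort(scores)[-k:]                  # keep only the top-k units
    out = np.zeros(len(weights))
    out[active] = np.maximum(scores[active], 0.0)     # everything else stays silent
    return out, active


weights = np.random.randn(1000, 64)
x = np.random.randn(64)
out, active = sparse_forward(x, weights)
print(f"{len(active)} of {len(weights)} units active")  # e.g. 20 of 1000
```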


Additionally, we have incorporated a memory module into the model. This means that during inference, as you interact with it, changes occur in the memory module. This is where true memory and true personalization begin.

If a device possesses autonomous learning, new possibilities emerge.

At this year’s World Artificial Intelligence Conference, we released a robot dog on which our model is deployed. At first, it had no abilities at all, but it could learn in real time. Our model does not necessarily require a cloud GPU; it can run directly on a mobile phone or CPU.


And this is just a simple robot dog. If we expand the scope a bit, what about embodied intelligence?

Embodied intelligence cannot enter thousands of households yet. The main reason is that, at the time of manufacture, it cannot adapt to each household and provide services tailored to each one. It needs to learn.

Just like a person arriving at a hotel, they still need to check the layout to know where the study is and where the bathroom is. The same goes for future devices. They need to develop a specific understanding and go through a learning process, rather than being able to use all appliances in their factory state. This learning process is something that the Transformer architecture currently finds very difficult to achieve.


Intelligence Will Redefine the Value of Hardware

The changes brought by native built-in memory and autonomous learning are not just about tokens no longer being charged. More importantly, they show how intelligence can redefine the value of hardware.

For example, suppose you spend 20,000 Yuan on a pet dog. It accompanies you for two years, and you develop an emotional bond with it. Two years later, would you still sell it for 20,000 Yuan? At that point, you probably wouldn’t be thinking about the money, you would care more about the depth of your emotional bond with the dog.

Thus, in the future, hardware will need to allow users to co-create value, rather than simply paying for its features.

It’s like buying a smartphone. In the future, you won’t pay for its RAM, but for the value you co-created with it. When you first buy it, its value is at its lowest.

So, we believe intelligence will redefine the value of hardware, and it will no longer be just a tool.

Our model can run flexibly on devices such as smartphones, embodied intelligence, and others. For example, a 3B offline model deployed on a phone ensures user privacy and security, while also providing a smooth user experience.

It is important to emphasize that, in offline scenarios, if multimodal perception can possess memory and autonomous learning capabilities, the value of hardware will inevitably change significantly. This is an entirely new possibility brought by a completely new architecture.

Transformer architectures are almost incapable of reaching this level, because running them on a phone consumes a large portion of RAM.


When Each Device Possesses Its Own Capabilities and Is Able to Learn from the Physical World, Collective Intelligence Will Emerge

What further changes will occur when hardware is equipped with native built-in memory and autonomous learning?

Different from the approaches of OpenAI and DeepSeek, we believe this path leads to collective intelligence. 

When each device possesses its own intelligence and can also learn from the physical world, collective intelligence will emerge.

Collective intelligence is somewhat like human society. No one is capable of everything, and we don't need to create someone who is, nor do we need everyone to be capable of everything. Everyone just needs to have their own area of expertise.

More intelligence emerges from collaboration, and it is through collaboration that real knowledge is created.


There are two parts of knowledge: generation and dissemination.

Today's large models—especially those with Transformer architectures—have a major problem. They do not generate knowledge themselves.

True intelligence lies in the ability to generate knowledge. Humans continuously generate knowledge through interaction, precisely because each individual is different and therefore produces different solutions. 

The emergence of true intelligence comes from individuals. After each individual generates information, it is then shared with others, which is how human civilization is gradually formed and developed, rather than relying on a sufficiently smart, cloud-based general-purpose model to create a "single super-model."

The strength of cloud-based general-purpose models lies mainly in the data they collect, which ultimately comes from human experience. If a model does not possess native built-in memory and autonomous learning, it cannot generate true intelligence.

RockAI has always believed that collective intelligence is the best path toward artificial general intelligence, in contrast to the “single super-model” approach advocated by OpenAI.

This concludes my presentation. Thank you!




