DeepMind's Breakthrough: AI Learns Culture and Achieves AGI Milestone
DeepMind's breakthrough in AI involves training an agent in a 3D environment without human data, showcasing significant progress toward Artificial General Intelligence.
Until now, AI's ability to learn about the human world has been confined largely to the level of language.
The common practice has been to feed large models with data, initially text from sources like Wikipedia and Reddit, and later audio, images, and even radar and thermal imagery. Entrepreneurs in the generative AI space see an extremely intelligent large language model as the potential path to AGI.
Our imagination about unknown life forms, including silicon-based life, is limited. When we discuss extraterrestrial life, the first thing that comes to mind is an alien language, as in "The Three-Body Problem." Language is the operating system of human civilization, and the fear, as expressed by Yuval Noah Harari, the author of "Sapiens," is that an AI mastering human language could infiltrate the entirety of human civilization.
Yet AI's occupation of human language resources also marks the current limit of human imagination about the threat of AI. In other words, whatever cannot be abstracted into language, expressed, and recorded is something AI cannot learn. The ability to gather life experience directly from the surrounding environment thus becomes humanity's last bastion when facing the existential question posed by AI.
That was until DeepMind released a new paper suggesting that this last bastion may also be in jeopardy.
DeepMind's Latest Research
Avishkar Bhoopchand, a senior research engineer at DeepMind who is also involved with various African AI technology communities, and Bethanie Brownfield, a team lead at DeepMind with five years of prior experience at gaming companies, recently published a new research paper in the journal Nature Communications.
In simple terms, they trained an intelligent agent in a 3D simulated environment using neural networks combined with reinforcement learning. Without using any pre-collected human data, the agent learns from scratch within the simulated environment and acquires human-like behavior.
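To make "learning from scratch, with no human data" concrete: the paper's actual agent is a memory-equipped neural network trained with large-scale deep reinforcement learning, but the core loop can be sketched in a few lines. The toy environment, the single-parameter policy, and the crude update rule below are all illustrative assumptions, not DeepMind's implementation:

```python
import random

class ToyEnv:
    """Stand-in environment (an assumption): action 1 pays +1, action 0 pays 0."""
    def step(self, action):
        return 1.0 if action == 1 else 0.0

def train_from_scratch(episodes=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()
    p = 0.5  # probability of picking action 1; the policy starts with no knowledge
    for _ in range(episodes):
        action = 1 if rng.random() < p else 0
        reward = env.step(action)
        # Crude policy-gradient-flavored update: make rewarded actions likelier.
        nudge = (1 - p) if action == 1 else -p
        p = min(max(p + lr * reward * nudge, 0.01), 0.99)
    return p

print(f"learned preference for the rewarding action: {train_from_scratch():.2f}")
```

The only signal the policy ever receives is the environment's reward, which is the sense in which the agent in the paper also starts from zero.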
This experiment links AI with the concept of "culture," seemingly for the first time.
Broadly speaking, when discussing human "intelligence," it can be understood as the ability to effectively acquire new knowledge, skills, and behaviors. More practically, it is the ability to achieve goals through a series of actions in appropriate contexts. For example:
How to use formulas and auxiliary lines to solve a geometry problem.
How to turn a recipe found on Xiaohongshu into a dish on the dinner table.
How to start a profitable company.
All these demonstrate intelligence.
The examples mentioned in this paper are simpler—how to follow a guide during a tour or how to explain to a colleague how to use a printer.
In fact, many of the skills we possess are not learned by rote. Human intelligence relies heavily on our ability to efficiently acquire knowledge from others. This knowledge is collectively referred to as culture, and the process of passing knowledge from one individual to another is known as cultural transmission.
Cultural transmission is a social behavior that relies on the whole group acquiring and using information from one another in real time and with high fidelity. It ultimately leads to the accumulation and refinement of skills, tools, and knowledge, producing civilization and the highly stable transfer of knowledge between individuals and even across generations. None of this process starts from a set of purpose-designed textbooks or video courses.
When AI researchers worry that the data available to feed large models will be exhausted within five years, the worry rests on an assumed blind spot: AI cannot abstractly capture and use diverse information directly from the environment.
Intelligence and Cultural Transmission
In training the intelligent agent, DeepMind introduced GoalCycle3D, a 3D physics-simulation task space built in Unity. As the figure shows, the space features rugged terrain and various obstacles. Colored spherical goals sit among the obstacles and complex terrain, and passing through these goals in a specific cyclic order yields positive rewards.
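Since the article only describes the environment in prose, here is a deliberately tiny stand-in for its reward logic. The class below is an illustrative assumption (the real GoalCycle3D is a procedurally generated 3D Unity world): goals must be entered in a hidden cyclic order, the correct next goal pays a positive reward, and the penalty-and-reset on a wrong goal is likewise assumed for simplicity:

```python
class ToyGoalCycle:
    """1-D toy stand-in for GoalCycle3D's cyclic-goal reward (an assumption)."""
    def __init__(self, cycle):
        self.cycle = cycle  # hidden correct order, e.g. ["red", "blue", "green"]
        self.pos = 0        # index of the next goal the agent must enter

    def enter(self, goal):
        """The agent touches a goal sphere; return the reward."""
        if goal == self.cycle[self.pos]:
            self.pos = (self.pos + 1) % len(self.cycle)  # advance around the cycle
            return 1.0
        self.pos = 0        # assumption: a mistake resets cycle progress
        return -1.0

env = ToyGoalCycle(["red", "blue", "green"])
print([env.enter(g) for g in ["red", "blue", "green", "red", "green"]])
# [1.0, 1.0, 1.0, 1.0, -1.0]
```

An agent that does not know the hidden order has no way to score except by watching someone who does, which is exactly the situation the experiment sets up.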
DeepMind set up an "omniscient view" in this space to understand how the red intelligent agent, which knows how to act to receive rewards, interacts with the blue intelligent agent, which has no game experience and is the "learner."
Knowing how to achieve a high reward score is treated as a form of "culture." A completely game-naive intelligent agent has a cultural transmission (CT) value of 0, an agent entirely dependent on the expert has a CT value of 0.75, and an agent that follows the red agent perfectly while it is present and keeps scoring highly after the red agent leaves has a CT value of 1.
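The article gives these three anchor points but not the paper's actual formula, so the one-liner below is just one plausible way to encode the anchors; the weights are an assumption, not DeepMind's definition:

```python
def ct_score(follow, retain):
    """Illustrative CT metric (an assumption, not the paper's formula).

    follow: normalized score in [0, 1] while the red expert is present.
    retain: normalized score in [0, 1] after the expert is removed.
    Weights are chosen so the article's anchor points hold:
      naive agent          -> ct_score(0, 0) == 0
      pure follower        -> ct_score(1, 0) == 0.75
      follower + retention -> ct_score(1, 1) == 1
    """
    return 0.75 * follow + 0.25 * retain

print(ct_score(0, 0), ct_score(1, 0), ct_score(1, 1))  # 0.0 0.75 1.0
```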
The results of the experiment show that, in a randomly generated fictional world, the blue intelligent agent learned and surpassed the "high-scoring" culture through reinforcement learning. This process went through four different training stages.
- In the first stage, the blue agent familiarized itself with the task, learning representation, movement, and exploration, but it did not show significant improvement in scoring.
- In the second stage, after enough experience and failed attempts, the blue agent learned its first skill: following the red agent. Its CT value eventually reached 0.75, indicating pure following behavior.
- In the third stage, the blue agent remembered the rewarding cycle when the red agent was present and could continue to solve the task when the red agent was absent.
- In the fourth and final stage, the blue agent could achieve higher scores independently, without following the red agent. This shows up as the training cultural transmission measure falling to 0 (the blue agent no longer followed the red agent) while its score continued to rise. More precisely, the blue agent exhibited "experimental" behavior at this stage, even using hypothesis testing to infer the correct cycle rather than relying on the expert's guidance; a toy version of this search is sketched after the list. As a result, the blue agent surpassed the red agent, collecting the cyclic reward more efficiently.
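That hypothesis-testing behavior can be pictured as a search over candidate cycles. The sketch below reuses the toy environment from earlier (repeated so the snippet runs on its own) and probes every ordering of the goals, keeping the one that pays best. The brute-force strategy is an illustrative assumption: the trained agent infers the cycle implicitly rather than enumerating permutations.

```python
from itertools import permutations

class ToyGoalCycle:
    """Same toy environment as above, repeated so this snippet is standalone."""
    def __init__(self, cycle):
        self.cycle, self.pos = cycle, 0

    def enter(self, goal):
        if goal == self.cycle[self.pos]:
            self.pos = (self.pos + 1) % len(self.cycle)
            return 1.0
        self.pos = 0
        return -1.0

def infer_cycle(env, goals, laps=2):
    """Probe every candidate ordering and keep the most rewarding one."""
    best, best_reward = None, float("-inf")
    for hypothesis in permutations(goals):
        env.pos = 0  # restart cycle progress before each probe
        reward = sum(env.enter(g) for g in hypothesis * laps)
        if reward > best_reward:
            best, best_reward = hypothesis, reward
    return best

env = ToyGoalCycle(("red", "blue", "green"))
print(infer_cycle(env, ("red", "blue", "green")))  # ('red', 'blue', 'green')
```

Only the true ordering earns the maximum reward on every entry, so the probe recovers the hidden cycle without any demonstrator present.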
The experiment starts with imitation learning and then uses deep reinforcement learning to keep self-optimizing, eventually finding solutions that surpass the demonstrator being imitated. It demonstrates that AI agents can learn and mimic behaviors by observing other intelligent agents, and this ability to acquire and use information in real time and with high fidelity, starting from zero samples, is very close to how humans accumulate and refine knowledge across generations.
This research is considered a significant step towards Artificial General Intelligence (AGI), and such an important step was achieved by DeepMind in a game.
Historical Background in AGI Research
DeepMind had previously accomplished a similar disruption in another game, but that time it disrupted itself. The game in question was Go.
On March 12, 2016, Lee Sedol resigned. Humanity had suffered a complete defeat at Go, a game humans themselves created. Notably, AlphaGo, which never sat physically across from its opponent, had completed training on some 160,000 human game records in a matter of months. Then it, too, was defeated.
The entity that defeated AlphaGo was AlphaGo Zero, an AI player that had never seen a single game record and learned only from the basic rules of Go. The version that beat Lee Sedol was named AlphaGo Lee, and AlphaGo Zero, having trained for only three days at that point, defeated AlphaGo Lee with a perfect record of 100:0.
Much like the blue agent in GoalCycle3D, AlphaGo Zero did no supervised learning on human data; relying purely on self-play reinforcement learning, it ultimately caught up with and defeated its predecessor.
Richard Everett, who joined DeepMind as an intern in 2016, is one of the 18 authors of this paper. The interplay between human players and seemingly intelligent computer-controlled characters in video games fascinated him and eventually led him into the field of artificial intelligence. This work on AI learning cultural transmission is one of his favorite projects at DeepMind.
Describing his work at DeepMind, Richard Everett said it feels like being a child in the world's largest candy store. The research in this paper is credited to more than two years of close collaboration among artists, designers, ethicists, project managers, and QA testers, as well as scientists, software engineers, and research engineers.
Conclusion
The success of AlphaGo Zero prompted DeepMind to keep following the deep reinforcement learning route in its AGI research, leading to everything presented in GoalCycle3D. This large-scale game experiment on the road to AGI is still ongoing. On X, the most recent post on Google DeepMind's account reads:
"Welcome Gemini."