CO/AI

Toward General Computer Control

General Computer Control:

  • A landmark study from the Beijing Academy of Artificial Intelligence (BAAI) explored the capabilities of AI agents in General Computer Control (GCC) settings, such as video games.

  • The researchers unveiled the Cradle framework, which takes video frames of the screen as input, reasons about them internally, and outputs keyboard and mouse commands.

“In this work, we propose…building foundation agents that can master any computer task by taking only screen images (and possibly audio) of the computer as input, and producing keyboard and mouse operations”

Beijing Academy of Artificial Intelligence

The Framework:

Cradle Framework (Simplified)
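The post describes Cradle only at the level of its inputs and outputs: screen images (and possibly audio) go in, and keyboard and mouse operations come out. The Python sketch below illustrates that observe, reason, act contract under stated assumptions. It is not Cradle's code. Every name in it (`Observation`, `KeyPress`, `MouseClick`, `ComputerControlAgent`, `toy_reasoner`, `run_episode`) is hypothetical, and the reasoning step, which Cradle hands to a large multimodal model (LMM), is replaced by a trivial placeholder policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


# Hypothetical observation: one screen frame plus optional audio,
# mirroring "screen images (and possibly audio)" from the quote.
@dataclass
class Observation:
    frame: bytes                   # raw screenshot bytes (format is an assumption)
    audio: Optional[bytes] = None  # optional audio chunk


# Hypothetical low-level actions: the only outputs the agent may emit.
@dataclass
class KeyPress:
    key: str


@dataclass
class MouseClick:
    x: int
    y: int
    button: str = "left"


Action = Union[KeyPress, MouseClick]

# A "reasoner" maps the current observation plus memory to a list of actions.
# In a real agent this would be a call to an LMM; here it is a plain function.
Reasoner = Callable[[Observation, List[str]], List[Action]]


@dataclass
class ComputerControlAgent:
    reason: Reasoner
    memory: List[str] = field(default_factory=list)

    def step(self, obs: Observation) -> List[Action]:
        actions = self.reason(obs, self.memory)
        # Record a short note so later steps can take past behavior into account.
        self.memory.append(f"took {len(actions)} action(s)")
        return actions


def run_episode(agent: ComputerControlAgent, observations: List[Observation]) -> List[Action]:
    executed: List[Action] = []
    for obs in observations:
        # A real agent would send these actions to the OS; here we just collect them.
        executed.extend(agent.step(obs))
    return executed


def toy_reasoner(obs: Observation, memory: List[str]) -> List[Action]:
    # Placeholder policy standing in for an LMM:
    # press "w" if a frame is present, otherwise click the screen center.
    if obs.frame:
        return [KeyPress("w")]
    return [MouseClick(640, 360)]


agent = ComputerControlAgent(reason=toy_reasoner)
actions = run_episode(agent, [Observation(b"\x00"), Observation(b"")])
print(actions)  # prints [KeyPress(key='w'), MouseClick(x=640, y=360, button='left')]
```

Restricting the output to keyboard and mouse primitives is what makes this kind of interface general: in principle, any program a person can operate through a screen, keyboard, and mouse can be driven the same way, with only the reasoner changing.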

The Experiment:

  • The framework above was applied to a case study using Red Dead Redemption II (a popular, complex video game from Rockstar Games, the makers of Grand Theft Auto).

  • Using this framework, an AI agent played the game in real time.

The Framework in Action

Early Results:

“CRADLE exhibits strong performance in learning new skills, following the game storyline, and accomplishing real missions in the game. To the best of our knowledge, this is the first LMM-based agent that has managed to complete concrete missions from scratch in AAA games.”

Why it Matters:

  • Developing generalized AI capabilities is pivotal to the pursuit of Artificial General Intelligence (AGI).

  • Training AI on a universal platform, the computer itself, aims to eliminate the need for a tailored AI solution for each task.

    • A single model that can be applied to any computer task could make it possible to fully automate complex procedures.
