Comment on My jaw hit the floor when I watched an AI master one of the world's toughest physical games in just six hours

just_another_person@lemmy.world 9 months ago

They don’t discuss it here, but it’s most likely a reinforcement learning model that compares successive generations of learned behavior to decide whether it’s improving.

It would know that the ball going in the hole is “bad”, and then try to avoid that happening. Each move that is “good” is then kept in a list of moves it should perform in the next generation of its plan to avoid the “bad” things. Loop -> fail -> logic build -> retry. After 6 hours, it has mapped a complete list of “good” moves to reach its final outcome.
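A rough sketch of that loop in Python, purely as a toy and not whatever the researchers actually ran: the track, the `holes` layout, and the function name `learn_by_retrying` are all made up for illustration, and a real reinforcement learning system would normally learn a policy or value function rather than a literal list of moves. It also assumes every step has at least one safe move.

```python
import random

def learn_by_retrying(holes, n_moves=4, seed=0):
    """Toy trial-and-error learner (hypothetical setup).

    holes[i] is the set of moves that drop the ball into a hole at step i.
    Assumes each step has at least one move that is not in holes[i].
    """
    rng = random.Random(seed)
    good_moves = []                      # "good" moves kept from earlier attempts
    bad_moves = [set() for _ in holes]   # "bad" moves remembered per step
    attempts = 0
    while len(good_moves) < len(holes):
        attempts += 1
        # Replay the kept moves, then explore from the first unsolved step.
        for step in range(len(good_moves), len(holes)):
            candidates = [m for m in range(n_moves) if m not in bad_moves[step]]
            move = rng.choice(candidates)
            if move in holes[step]:
                bad_moves[step].add(move)  # fail: remember it, then retry
                break
            good_moves.append(move)        # success: keep it for the next attempt
    return good_moves, attempts

moves, attempts = learn_by_retrying([{0, 1}, {2}, set(), {0, 1, 3}])
print(moves, attempts)
```

Every failed attempt rules out one move, so the loop always finishes, and the returned attempt count shows how many retries the finished list of moves cost.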

The question with these models is less whether they will work (it is assumed they eventually will, through trial and error) than how efficiently they will work. The number of generational cycles and retries is usually the benchmark when dealing with reinforcement learning, but they don’t discuss that data here either.
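To make the efficiency point concrete, here is a small sketch that counts retries for two toy learners on the same made-up track: one that remembers which moves failed and one that doesn't. Everything in it (the track of 10 steps with 3 bad moves out of 4, and the names `attempts_to_solve` and `average_attempts`) is an assumption for illustration, not data from the article.

```python
import random
from statistics import mean

def attempts_to_solve(holes, n_moves, remember_failures, rng):
    """Count attempts until every step of a hypothetical track is solved."""
    known_bad = [set() for _ in holes]
    progress = 0   # number of steps already solved and kept
    attempts = 0
    while progress < len(holes):
        attempts += 1
        for step in range(progress, len(holes)):
            options = [m for m in range(n_moves) if m not in known_bad[step]]
            move = rng.choice(options)
            if move in holes[step]:
                if remember_failures:
                    known_bad[step].add(move)  # never try this move here again
                break                          # ball fell in the hole; retry
            progress += 1
    return attempts

def average_attempts(remember_failures, trials=200):
    # Assumed track: 10 steps, moves 0-2 fail and move 3 is safe at each one.
    holes = [{0, 1, 2}] * 10
    return mean(
        attempts_to_solve(holes, 4, remember_failures, random.Random(seed))
        for seed in range(trials)
    )

print("with memory:   ", average_attempts(True))
print("without memory:", average_attempts(False))
```

Both learners get there eventually, but the one that remembers its failures needs noticeably fewer retries on average, which is the kind of number you would want reported.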
