Comment on My jaw hit the floor when I watched an AI master one of the world's toughest physical games in just six hours

INeedMana@lemmy.world 1 year ago

It’s cool, but my question is (I did not see this addressed in the article or the video, but I might have missed it): did it learn to win the game in general terms, or only this one example? I mean, if the layout of the board were changed, would it still solve it?

just_another_person@lemmy.world 1 year ago
They don’t discuss it here, but it’s most likely a reinforcement learning model that compares successive generations of learned behavior to decide whether it’s improving or not.

It would know that the ball going in the hole is “bad”, and then try to avoid that happening. Each move that is “good” is then kept in a list of moves it should perform in the next generation of its plan to avoid the “bad” things. Loop -> fail -> logic build -> retry. After 6 hours, it has mapped a complete list of “good” moves to affect its final outcome.

The question with these models is less whether they will work (it is assumed they eventually will through trial and error) than how efficiently they will work. The number of generational cycles and retries is usually the benchmark when dealing with reinforcement learning, but they don’t discuss that data here either.
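
For anyone who wants to poke at the loop -> fail -> retry idea, here is a minimal sketch. It assumes plain tabular Q-learning on a made-up 4x4 grid; the boards, rewards, and hyperparameters are all invented for illustration, and nothing here is taken from the article or reflects how the actual system was built. It also touches the layout question from the top of the thread: this toy learner stores values per board position, so what it learns on one board does not carry over to another.

```python
import random

# Hypothetical toy boards: S start, G goal, O hole, . free cell.
BOARD_A = ["S..O", "OO..", "...O", "O..G"]
BOARD_B = ["S.O.", "....", "...O", "O..G"]  # same size, different holes
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(board, pos, action):
    """Move one cell; the board edge acts as a wall and blocks the move."""
    rows, cols = len(board), len(board[0])
    r = min(max(pos[0] + action[0], 0), rows - 1)
    c = min(max(pos[1] + action[1], 0), cols - 1)
    cell = board[r][c]
    if cell == "O":
        return (r, c), -1.0, True   # ball fell in a hole: "bad"
    if cell == "G":
        return (r, c), 1.0, True    # reached the goal: "good"
    return (r, c), -0.01, False     # small cost for every move


def greedy_action(q, pos):
    """Index of the best-known action at this position (unseen = 0.0)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((pos, a), 0.0))


def train(board, episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Trial and error: each episode is one attempt at the board."""
    rng = random.Random(seed)
    q = {}  # (position, action index) -> estimated value
    for _ in range(episodes):
        pos = (0, 0)
        for _ in range(100):  # cap on moves per attempt
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, pos)        # exploit
            nxt, reward, done = step(board, pos, ACTIONS[a])
            best_next = 0.0
            if not done:
                best_next = max(q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
            if done:
                break
    return q


def run_greedy(board, q, max_moves=50):
    """Follow the learned values with no exploration; True if the goal is reached."""
    pos = (0, 0)
    for _ in range(max_moves):
        pos, reward, done = step(board, pos, ACTIONS[greedy_action(q, pos)])
        if done:
            return reward > 0
    return False
```

Calling `run_greedy(BOARD_A, train(BOARD_A))` plays the board it was trained on, while `run_greedy(BOARD_B, train(BOARD_A))` replays the same learned values on the changed layout. In this sketch the second case walks straight into a hole that was a free cell on the first board, and only retraining on `BOARD_B` fixes it. That is a property of this position-keyed table, not evidence about the real system either way.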

- INeedMana@lemmy.world 1 year ago
  Yes, but that’s kind of my point
  
  We see it learn something with insane precision, but most often that is almost an effect of over-training. It would probably need less time to learn another layout, but it’s not learning the general rules (can’t go through walls, holes are bad, we want to get to X); it learns the specific layout. Each time the layout changes, it would have to re-learn it.
  
  It is impressive and enables automation in a lot of areas, but in the end it is still only machine learning, adapting weights to a specific scenario.
  
indomara@lemmy.world 1 year ago
It did learn to use shortcuts to skip parts of the maze, and had to be told not to. Super interesting!

- INeedMana@lemmy.world 1 year ago
  Yes, but that’s only because a generation found some random, specific motion that scored better, not because it reasoned that a skip should be possible.