Claude has taken control of my computer...
Submitted 6 days ago by FlorianSimon@sh.itjust.works to technology@lemmy.world
https://www.youtube.com/watch?v=DVRg0daTads
Hackworth@lemmy.world 6 days ago
I was watching users test this out, and am generally impressed. At one point, Claude tried to open Firefox, but it was not responding. So it killed the process from the console and restarted it. A small thing, but not something I would have expected it to overcome this early. It’s clearly not ready for prime time (by their repeated warnings), but I’m happy to see these capabilities finally making it to a foundation model’s API. It’ll be interesting to see how much remains of GUIs (or high-level programming languages for that matter) if/when AI can reliably translate common language to hardware behavior.
FierySpectre@lemmy.world 6 days ago
That’s the crazy thing here, it is interacting with programs in a way that is wildly inefficient. At some point stuff like this will be properly integrated, and that both scares and excites me.
Hackworth@lemmy.world 6 days ago
Yeah, using image recognition on a screenshot of the desktop and directing a mouse around the screen with coordinates is definitely an intermediate implementation. Open Interpreter, Shell-GPT, LLM-Shell, and DemandGen make a little more sense to me for anything that can currently be done from a CLI, but I’ve never actually tested ’em.
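For anyone wondering what that screenshot-and-coordinates loop roughly looks like, here is a minimal sketch in Python. It is not Anthropic's implementation or API: `FakeDesktop`, `Action`, `run_agent`, and `scripted_model` are hypothetical names made up for illustration, the "screenshot" is just a text description, and the model is replaced by a fixed script so the thing runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape, not a real API)."""
    kind: str                           # "click", "type", or "done"
    xy: Optional[Tuple[int, int]] = None
    text: str = ""


class FakeDesktop:
    """Stand-in for a real screen; it only records what was done to it."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> str:
        # A real agent would capture image bytes here.
        return f"screen after {len(self.log)} actions"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def run_agent(desktop: FakeDesktop,
              choose_action: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, ask for an action, apply it; returns the number of actions applied."""
    for step in range(max_steps):
        shot = desktop.screenshot()      # 1. look at the screen
        action = choose_action(shot)     # 2. the model picks the next step
        if action.kind == "done":
            return step
        if action.kind == "click":       # 3. carry it out via coordinates
            desktop.click(*action.xy)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return max_steps


def scripted_model() -> Callable[[str], Action]:
    """Assumption: a fixed script instead of a real model call."""
    script = iter([
        Action("click", xy=(120, 48)),
        Action("type", text="lemmy.world"),
        Action("done"),
    ])
    return lambda shot: next(script)


if __name__ == "__main__":
    desktop = FakeDesktop()
    run_agent(desktop, scripted_model())
    print(desktop.log)
```

Every step costs a fresh screenshot plus a model round trip, which is why this feels like a stopgap next to tools that just issue CLI commands directly.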