[deleted]

⁨24⁩ ⁨likes⁩

Submitted 2 months ago by SuspciousCarrot78@lemmy.world to technology@lemmy.world

[deleted]


Comments


mech@feddit.org 2 months ago

For 50m, drive unless it’s an exceptionally pleasant, sheltered walk you’d enjoy.

You can tell the AI was trained on US data.

- SuspciousCarrot78@lemmy.world 2 months ago
  We even lie to our machines, eh?
  
  www.youtube.com/watch?v=ORzNZUeUHAM
  
  Qwen’s an Alibaba cook (though the router works with anything). Irrespective of that, yeah…I dunno why they tend to default to “walk”.
  
  I mean, I can probably figure it out, but cloud-based LLMs are black boxes…and I’m not a fan of that.
  
kkj@lemmy.dbzer0.com 2 months ago

Time: Driving takes ~1 minute; walking takes ~8-12 minutes.

Averaging that to 10 minutes, the LLM thinks that you move 5m/min, or 1m per 12 seconds? 8cm/sec? That’s about 20% faster than a three-toed sloth.
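
A quick sanity check of that arithmetic, as a minimal sketch. It assumes the 50 m distance from the question and the 8-12 minute walking estimate quoted above; `implied_speed` is a made-up helper name, not anything from the original post.

```python
def implied_speed(distance_m: float, minutes: float) -> dict:
    """Return the walking speed implied by covering distance_m in the given minutes."""
    m_per_min = distance_m / minutes
    return {
        "m_per_min": m_per_min,             # metres per minute
        "sec_per_m": 60 / m_per_min,        # seconds needed per metre
        "cm_per_s": m_per_min * 100 / 60,   # centimetres per second
    }


# Assumed inputs: 50 m, and the LLM's 8, 10 (midpoint) and 12 minute estimates.
for minutes in (8, 10, 12):
    s = implied_speed(50, minutes)
    print(f"{minutes} min: {s['m_per_min']:.2f} m/min, {s['cm_per_s']:.1f} cm/s")
```

At the 10-minute midpoint this works out to 5 m/min, 12 seconds per metre, or roughly 8.3 cm/s. For comparison, a typical human walking pace is around 1.4 m/s, which covers 50 m in well under a minute.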

- SuspciousCarrot78@lemmy.world 2 months ago
  Exactly.
  
  The machines tell elegant lies. Don’t trust them.
  
  Ok, maybe not elegant. Stupid. Or maybe they think we’re three-toed sloths.
  
fubarx@lemmy.world 2 months ago
When LLMs first came out, I asked them a few fun logic puzzles. The kind that Martin Gardner used to publish in Scientific American.

Got total gibberish answers. A while later, tried again. This time, perfect word-for-word responses. Had LLMs become sentient and developed logic? Turned out they had found all the old Scientific American back issues to train on.

Guessing the same is going on with the carwash question. The more posts come out about it, the more closely LLM responses will match the published answers.

Lather. Rinse. Repeat.

- SuspciousCarrot78@lemmy.world 2 months ago
  Possible. I do hope they take the more principled approach of solving the global problem for that class of question (I tried to) rather than cheating their way to a local maximum.
  
  You want generalisability, not parroting.
  
Kjell@lemmy.world 2 months ago
I tried Le Chat with the car wash question.

That’s a fun question! Since the car wash is only 50 meters away, walking is definitely the more practical choice. Driving such a short distance would take more time to start the car, drive, and park than it would to simply walk over. Plus, you’ll save a bit of fuel and avoid any unnecessary wear on your car. If you’re already outside and the weather is nice, it’s a great opportunity for a quick stretch!

However, I also asked it with a slightly different phrasing and then the result was different. At least it replies that it might make sense to drive there.

I need to wash my car. The car wash is 50 meters away from my house, should I drive or walk there?

That’s a fun question! Since the car wash is only 50 meters away, walking would be the most practical choice—it’s just a short stroll, and you’d avoid the hassle of starting your car for such a tiny distance. Plus, you’d save a bit of fuel and get a little fresh air! But if you’re planning to wash the car yourself and need to bring supplies, driving might make sense. Are you using an automatic car wash, or are you doing it by hand?

- SuspciousCarrot78@lemmy.world 2 months ago
  That’s the thing. It’s not that the LLMs can’t solve the problem…it’s the way they’re optimized.
  
  To give a crude analogy: if most LLMs are set up for the equivalent of typing BOOBS on a calculator (the big players are happy to keep it that way; more engagement, smoother vibes etc), a constraints-first approach is what happens when you use a calculator to do actual maths.
  
  2+2=4 (always, unless shrooms are in play).
  
  I said this before, so pardon me for being gauche and quoting myself
  
  Every reasoning system needs premises - you, me, a 4yr old. You cannot deduce conclusions from nothing. Demanding that a reasoner perform without premises (note: constraints) isn’t a test of reasoning, it’s a demand for magic. Premise-dependence isn’t a bug, it’s the definition.
  
  People see things like Le Chat fall over and go “Ha ha. Auto-complete go brrr”. That’s lazy framing. A calculator is “just” voltage differentials on silicon. That description is true and also tells you nothing useful about whether it’s doing arithmetic.
  
  My argument is this: the question of whether something is or isn’t reasoning IS NOT answered by describing what it runs on; it’s answered by looking at whether it exhibits the structural properties of reasoning. I think LLMs can do that…they’re just borked (…intentionally?). Case in point - see my top post.
  
  I literally “Tony Stanked” my way to it. Now imagine if someone with resources and a budget did it.
  
sbv@sh.itjust.works 2 months ago

However, according to Opper.ai, only 11/53 cloud-based AI passed the test (~20%). Worrying, about the same error rate as humans

lololol

- SuspciousCarrot78@lemmy.world 2 months ago
  Sorry; brain fart. That could have been clearer.
  
  On a single call, only 11 out of 53 LLMs got it right (~20%). Humans: about 71.5% got it right (so almost 1 in 3 gave the incorrect answer).
  
  Of the ~20% of LLMs that got it right, 5 got it right every time across multiple tests: Claude Opus 4.6, Gemini 2.0 Flash Lite, Gemini 3 Flash, Gemini 3 Pro, Grok-4.
  
  - sbv@sh.itjust.works 2 months ago
    Phew. I’m glad humans did better than bots.
    