
AGI achieved 🤖

⁨0⁊ ⁨likes⁊

Submitted ⁨⁨10⁊ ⁨months⁊ ago⁊ by ⁨cyrano@lemmy.dbzer0.com⁊ to ⁨[deleted]⁊

https://lemmy.dbzer0.com/pictrs/image/7efced45-504a-4177-a992-a5a2ce0e8b6f.webp

source

Comments

  • ZILtoid1991@lemmy.world ⁨9⁊ ⁨months⁊ ago

    Reality:

    The AI was trained to answer 3 to this question correctly.

    Wait until the AI gets burned on a different question. Skeptics will rightfully use it to criticize LLMs for just being stochastic parrots, until LLM developers teach their models to answer it correctly, then the AI bros will use it as proof of it becoming “more and more human like”.

    source
    • outhouseperilous@lemmy.dbzer0.com ⁨9⁊ ⁨months⁊ ago

      No but see they’re not skeptics, they’re just haters, and there is no valid criticism of this tech. Sorry.

      And also you’ve just been banned from like twenty places for being A FANATIC “anti ai shill”. Genuinely check the mod log, these fuckers are cultists.

      source
  • jsomae@lemmy.ml ⁨9⁊ ⁨months⁊ ago

    When we see LLMs struggling to demonstrate an understanding of what letters are in each of the tokens that it emits or understand a word when there are spaces between each letter, we should compare it to a human struggling to understand a word written in IPA format (/sʌtʃ əz ðɪs/) even though we can understand the word spoken aloud perfectly fine.

    source
    • GandalftheBlack@feddit.org ⁨9⁊ ⁨months⁊ ago

      But if you’ve learned IPA you can read it just fine

      source
      • jsomae@lemmy.ml ⁨9⁊ ⁨months⁊ ago

        I know IPA but I can’t read English text written in pure IPA as fast as I can read English text written normally. I think this is the case for almost anyone who has learned the IPA and knows English.

        source
  • sheetzoos@lemmy.world ⁨10⁊ ⁨months⁊ ago

    Honey, AI just did something new. It’s time to move the goalposts again.

    source
  • Echo5@lemmy.world ⁨10⁊ ⁨months⁊ ago

    Maybe OP was low on the priority list for computing power? Idk how this stuff works

    source
  • bitjunkie@lemmy.world ⁨10⁊ ⁨months⁊ ago

    Deep reasoning is not needed to count to 3.

    source
    • sheetzoos@lemmy.world ⁨10⁊ ⁨months⁊ ago

      It is if you’re creating ragebait.

      source
  • RizzoTheSmall@lemm.ee ⁨10⁊ ⁨months⁊ ago

    o3-pro? Damn, that’s an expensive goof

    source
  • UrPartnerInCrime@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

    Worked well for me

    Image

    source
    • lordbritishbusiness@lemmy.world ⁨9⁊ ⁨months⁊ ago

      One of the interesting things I notice about the ‘reasoning’ models is their responses to questions occasionally include what my monkey brain perceives as ‘sass’.

      I wonder sometimes if they recognise the trivialness of some of the prompts they answer, and subtly throw shade.

      One’s going to respond to this with ‘clever monkey! 🐒 Have a banana 🍌.’

      source
    • cashsky@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

      What is that font bro…

      source
      • UrPartnerInCrime@sh.itjust.works ⁨9⁊ ⁨months⁊ ago

        It’s called sweetpea and my sweetpea picked it out for me. How dare I stick with something my girl picked out for me.

        But the fact that you actually care what font someone else uses is sad

        source
        • -> View More Comments
    • ynthrepic@lemmy.world ⁨10⁊ ⁨months⁊ ago

      Nice Rs.

      source
    • nyamlae@lemmy.world ⁨10⁊ ⁨months⁊ ago

      Is this ChatGPT o3-pro?

      source
      • UrPartnerInCrime@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

        ChatGPT 4o

        source
  • Korhaka@sopuli.xyz ⁨10⁊ ⁨months⁊ ago

    I asked it how many Ts are in names of presidents since 2000. It said 4 and stated that “Obama” contains 1 T.

    source
    • TheOakTree@lemm.ee ⁨10⁊ ⁨months⁊ ago

      Toebama

      source
  • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

    People who think that LLMs having trouble with these questions is evidence one way or another about how good or bad LLMs are just don’t understand tokenization. This is not a big-picture problem that indicates LLMs are deeply incapable. You may hate AI but that doesn’t excuse being ignorant about how it works.

    source
    • moseschrute@lemmy.world ⁨10⁊ ⁨months⁊ ago

      Also just checked and every OpenAI model bigger than 4.1-mini can answer this. I think the joke should emphasize how we developed a super power-inefficient way to solve some problems that can be accurately and efficiently answered with a single algorithm. Another example is using ChatGPT to do simple calculator math. LLMs are good at specific tasks and really bad at others, but people kinda throw everything at them.

      source
    • __dev@lemmy.world ⁨10⁊ ⁨months⁊ ago

      And yet they can seemingly spell and count (small numbers) just fine.

      source
      • buddascrayon@lemmy.world ⁨10⁊ ⁨months⁊ ago

        The problem is that it’s not actually counting anything. It’s simply looking for some text somewhere in its training data that relates to that word and the number of R’s in that word. There’s no mechanism within the LLM to actually count things. It is not designed with that function. This is not general AI, this is a generative language model that’s using its vast vast store of text to put words together that sound like they answer the question that was asked.

        source
      • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

        what do you mean by spell fine? They’re just emitting the tokens for the words. Like, it’s not writing “strawberry,” it’s writing tokens <302, 1618, 19772>, which correspond to st, raw, and berry respectively. If you ask it to put a space between each letter, that will disrupt the tokenization mechanism, and it’s going to be quite liable to making mistakes.

        I don’t think it’s really fair to say that the lookup 19772 -> berry counts as the LLM being able to spell, since the LLM isn’t operating at that layer. It doesn’t really emit letters directly. I would argue its inability to reliably spell words when you force it to go letter-by-letter or answer queries about how words are spelled is indicative of its poor ability to spell.
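A rough sketch of the point above, assuming the token IDs and pieces quoted in this comment (302, 1618, 19772 for st, raw, berry). They are used here as a toy vocabulary and have not been checked against any real tokenizer; `TOY_VOCAB`, `decode` and `count_letter` are made-up names for illustration only.

```python
# Toy vocabulary built from the IDs quoted in the comment above.
# Assumption: these are illustrative, not verified against a real tokenizer.
TOY_VOCAB = {302: "st", 1618: "raw", 19772: "berry"}


def decode(token_ids):
    """Look up each token ID and join the pieces back into text."""
    return "".join(TOY_VOCAB[token_id] for token_id in token_ids)


def count_letter(text, letter):
    """Count one letter in already-decoded text, ignoring case."""
    return text.lower().count(letter.lower())


# The model only ever handles this list of integers.
tokens = [302, 1618, 19772]

# Letters only become visible after the lookup, outside the model.
word = decode(tokens)

print(tokens)
print(word)
print(count_letter(word, "r"))
```

Counting the letter is trivial once the text is decoded, but that step happens on characters, which is the layer the comment argues the model does not operate at.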

        source
        • -> View More Comments
    • untorquer@lemmy.world ⁨10⁊ ⁨months⁊ ago

      These sorts of artifacts wouldn’t be a huge issue except that AI is being pushed to the general public as an alternative means of learning basic information. The meme example is obvious to someone with a strong understanding of English but learners and children might get an artifact and stamp it in their memory, working for years off bad information. Not a problem for a few false things every now and then, that’s unavoidable in learning. Thousands accumulated over long term use, however, and your understanding of the world will be coarser, like the Swiss cheese with voids so large it can’t hold itself up.

      source
      • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

        You’re talking about hallucinations. That’s different from tokenization reflection errors. I’m specifically talking about its inability to know how many of a certain type of letter are in a word that it can spell correctly. This is not a hallucination per se – at least, it’s a completely different mechanism that causes it than whatever causes other factual errors. This specific problem is due to tokenization, and that’s why I say it has little bearing on other shortcomings of LLMs.

        source
        • -> View More Comments
  • MrLLM@ani.social ⁨10⁊ ⁨months⁊ ago

    We gotta raise the bar, so they keep struggling to make it “better”

    Image

    My attempt

    0000000000000000
    0000011111000000
    0000111111111000
    0000111111100000
    0001111111111000
    0001111111111100
    0001111111111000
    0000011111110000
    0000111111000000
    0001111111100000
    0001111111100000
    0001111111100000
    0001111111100000
    0000111111000000
    0000011110000000
    0000011110000000

    Btw, I refuse to give my money to AI bros, so I don’t have the “latest and greatest”

    source
    • ipitco@lemmy.super.ynh.fr ⁨10⁊ ⁨months⁊ ago

      Tested on ChatGPT o4-mini-high

      0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
      0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0
      0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0
      0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
      0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
      0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0
      0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0
      0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0
      0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
      1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
      0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0
      0 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0
      1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
      

      It sent me this

      source
      • xavier666@lemm.ee ⁨9⁊ ⁨months⁊ ago

        I just murdered a bunch of trees and killed a random dude with the water it used, but it looks good

        Tech bros: “Worth it!”

        source
        • -> View More Comments
  • qx128@lemmy.world ⁨10⁊ ⁨months⁊ ago

    I really like checking these myself to make sure it’s true. I WAS NOT DISAPPOINTED!

    (Total Rs is 8. But the LOGIC ChatGPT pulls out is ……. remarkable!)

    Image

    source
    • AnUnusualRelic@lemmy.world ⁨10⁊ ⁨months⁊ ago

      What is this devilry?

      source
    • scholar@lemmy.world ⁨10⁊ ⁨months⁊ ago

      Image

      source
      • jsomae@lemmy.ml ⁨9⁊ ⁨months⁊ ago

        This is a DeepSeek model, right? OP was posting about GPT o3

        source
        • -> View More Comments
    • ipitco@lemmy.super.ynh.fr ⁨10⁊ ⁨months⁊ ago

      Try with o4-mini-high. It’s made to think like a human by checking its answer and doing step by step, rather than just kinda guessing one like here

      source
    • Zacryon@feddit.org ⁨10⁊ ⁨months⁊ ago

      “Let me know if you’d like help counting letters in any other fun words!”

      Oh well, these newish calls for engagement sure go to ridiculous extents sometimes.

      source
      • filcuk@lemmy.zip ⁨10⁊ ⁨months⁊ ago

        I want an option to select Marvin the paranoid android mood: “there’s your answer, now if you could leave me to wallow in self-pity”

        source
        • -> View More Comments
  • LMurch@thelemmy.club ⁨10⁊ ⁨months⁊ ago

    AI is amazing, we’re so fucked.

    /s

    source
    • Korhaka@sopuli.xyz ⁨10⁊ ⁨months⁊ ago

      Unironically, we are fucked when management think AI can replace us. Not when AI can actually replace us.

      source
  • slaacaa@lemmy.world ⁨10⁊ ⁨months⁊ ago

    Singularity is here

    source
  • ICastFist@programming.dev ⁨10⁊ ⁨months⁊ ago

    Now ask how many asses there are in assassinations

    source
    • Rin@lemm.ee ⁨10⁊ ⁨months⁊ ago

      It works if you use a reasoning model… but yeah, still ass

      Image

      source
    • notdoingshittoday@lemmy.zip ⁨10⁊ ⁨months⁊ ago

      Image

      source
      • rumba@lemmy.zip ⁨10⁊ ⁨months⁊ ago

        ohh god, I never thought to ask reasoning models,

        DeepSeek R1 7B was gold too

        Image

        source
        • -> View More Comments
      • LodeMike@lemmy.today ⁨10⁊ ⁨months⁊ ago

        Man AI is ass at this

        *laugh track*

        source
  • RedstoneValley@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

    It’s funny how people always quickly point out that an LLM wasn’t made for this, and then continue to shill it for use cases it wasn’t made for either (The “intelligence” part of AI, for starters)

    source
    • outhouseperilous@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

      I would say more “blackpilling”, i genuinely don’t believe most humans are people anymore.

      source
    • merc@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

      then continue to shill it for use cases it wasn’t made for either

      The only thing it was made for is “spicy autocomplete”.

      source
      • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

        Turns out spicy autocomplete can contribute to the bottom line. Capitalism :(

        source
        • -> View More Comments
    • SoftestSapphic@lemmy.world ⁨10⁊ ⁨months⁊ ago

      Maybe they should call it what it is

      Machine Learning algorithms from 1990 repackaged and sold to us by marketing teams.

      source
      • outhouseperilous@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

        Hey now, that’s unfair and queerphobic.

        These models are from 1950, with juiced up data sets. Alan Turing personally did a lot of work on them, before he cracked the math and figured out they were shit and would always be shit.

        source
        • -> View More Comments
      • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

        Machine learning algorithm from 2017, scaled up a few orders of magnitude so that it finally more or less works, then repackaged and sold by marketing teams.

        source
        • -> View More Comments
    • UnderpantsWeevil@lemmy.world ⁨10⁊ ⁨months⁊ ago

      LLM wasn’t made for this

      There’s a thought experiment that challenges the concept of cognition, called The Chinese Room. What it essentially postulates is a conversation between two people, one of whom is speaking Chinese and getting responses in Chinese. And the first speaker wonders “Does my conversation partner really understand what I’m saying or am I just getting elaborate stock answers from a big library of pre-defined replies?”

      The LLM is literally a Chinese Room. And one way we can know this is through these interactions. The machine isn’t analyzing the fundamental meaning of what I’m saying, it is simply mapping the words I’ve input onto a big catalog of responses and giving me a standard output. In this case, the problem the machine is running into is a legacy meme about people miscounting the number of "r"s in the word Strawberry. So “2” is the stock response it knows via the meme reference, even though a much simpler and dumber machine that was designed to handle this basic input question could have come up with the answer faster and more accurately.

      When you hear people complain about how the LLM “wasn’t made for this”, what they’re really complaining about is their own shitty methodology. They built a glorified card catalog. A device that can only take inputs, feed them through a massive library of responses, and sift out the highest probability answer without actually knowing what the inputs or outputs signify cognitively.

      Even if you want to argue that having a natural language search engine is useful (damn, wish we had a tool that did exactly this back in August of 1996, amirite?), the implementation of the current iteration of these tools is dogshit because the developers did a dogshit job of sanitizing and rationalizing their library of data.

      Imagine asking a librarian “What was happening in Los Angeles in the Summer of 1989?” and that person fetching you back a stack of history textbooks, a stack of Sci-Fi screenplays, a stack of regional newspapers, and a stack of Iron-Man comic books all given equal weight? Imagine hearing the plot of the Terminator and Escape from LA intercut with local elections and the Loma Prieta earthquake.

      That’s modern LLMs in a nutshell.

      source
      • Leet@lemmy.zip ⁨10⁊ ⁨months⁊ ago

        Can we say for certain that human brains aren’t sophisticated Chinese rooms…

        source
        • -> View More Comments
      • outhouseperilous@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

        Yes but have you considered that it agreed with me so now i need to defend it to the death against you horrible apes, no matter the allegation or terrain?

        source
      • Knock_Knock_Lemmy_In@lemmy.world ⁨10⁊ ⁨months⁊ ago

        a much simpler and dumber machine that was designed to handle this basic input question could have come up with the answer faster and more accurately

        The human approach could be to write a (python) program to count the number of characters precisely.

        When people refer to agents, is this what they are supposed to be doing? Is it done in a generic fashion or will it fall over with complexity?
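The “write a program” approach mentioned above might look something like this minimal sketch. `count_char` is a hypothetical helper name, and nothing here is meant to describe how any agent framework actually works.

```python
from collections import Counter


def count_char(text, char):
    """Return how many times a single character occurs in text, ignoring case."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    # Counter returns 0 for characters that never appear.
    return Counter(text.lower())[char.lower()]


# Words that come up elsewhere in this thread.
for word, letter in [("strawberry", "r"), ("Mississippi", "r"),
                     ("Lollapalooza", "r"), ("Obama", "t")]:
    print(word, letter, count_char(word, letter))
```

The count is exact because the program works on characters directly, rather than on tokens.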

        source
        • -> View More Comments
      • RedstoneValley@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

        That’s a very long answer to my snarky little comment :) I appreciate it though. Personally, I find LLMs interesting and I’ve spent quite a while playing with them. But after all they are like you described, an interconnected catalogue of random stuff, with some hallucinations to fill the gaps. They are NOT a reliable source of information or general knowledge or even safe to use as an “assistant”. The marketing of LLMs as being fit for such purposes is the problem. Humans tend to turn off their brains and to blindly trust technology, and the tech companies are encouraging them to do so by making false promises.

        source
      • jsomae@lemmy.ml ⁨10⁊ ⁨months⁊ ago

        You’ve missed something about the Chinese Room. The solution to the Chinese Room riddle is that it is not the person in the room but rather the room itself that is communicating with you. The fact that there’s a person there is irrelevant, and they could be replaced with a speaker or computer terminal.

        Put differently, it’s not an indictment of LLMs that they are merely Chinese Rooms, but rather one should be impressed that the Chinese Room is so capable despite being a completely deterministic machine.

        If one day we discover that the human brain works on much simpler principles than we once thought, would that make humans any less valuable? It should be deeply troubling to us that LLMs can do so much while the mathematics behind them are so simple. Arguments that because LLMs are just scaled-up autocomplete they surely can’t be very good at anything are not comforting to me at all.

        source
        • -> View More Comments
      • merc@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

        Imagine asking a librarian “What was happening in Los Angeles in the Summer of 1989?” and that person fetching you … That’s modern LLMs in a nutshell.

        I agree, but I think you’re still being too generous to LLMs. A librarian who fetched all those things would at least understand the question. An LLM is just trying to generate words that might logically follow the words you used.

        IMO, one of the key ideas with the Chinese Room is that there’s an assumption that the computer / book in the Chinese Room experiment has infinite capacity in some way. So, no matter what symbols are passed to it, it can come up with an appropriate response. But, obviously, while LLMs are incredibly huge, they can never be infinite. As a result, they can often be “fooled” when they’re given input that is semantically similar to a meme, joke or logic puzzle. The vast majority of the training data that matches the input is the meme, or joke, or logic puzzle. LLMs can’t reason so they can’t distinguish between “this is just a rephrasing of that meme” and “this is similar to that meme but distinct in an important way”.

        source
        • -> View More Comments
      • frostysauce@lemmy.world ⁨10⁊ ⁨months⁊ ago

        (damn, wish we had a tool that did exactly this back in August of 1996, amirite?)

        Wait, what was going on in August of '96?

        source
        • -> View More Comments
      • shalafi@lemmy.world ⁨10⁊ ⁨months⁊ ago

        You might just love Blindsight. Here, they’re trying to decide if an alien life form is sentient or a Chinese Room:

        “Tell me more about your cousins,” Rorschach sent.

        “Our cousins lie about the family tree,” Sascha replied, “with nieces and nephews and Neandertals. We do not like annoying cousins.”

        “We’d like to know about this tree.”

        Sascha muted the channel and gave us a look that said Could it be any more obvious? “It couldn’t have parsed that. There were three linguistic ambiguities in there. It just ignored them.”

        “Well, it asked for clarification,” Bates pointed out.

        “It asked a follow-up question. Different thing entirely.”

        Bates was still out of the loop. Szpindel was starting to get it, though… .

        source
        • -> View More Comments
    • REDACTED@infosec.pub ⁨10⁊ ⁨months⁊ ago

      There are different types of Artificial intelligences. Counter-Strike 1.6 bots, by definition, were AI. They even used deep learning to figure out new maps.

      source
      • ouRKaoS@lemmy.today ⁨10⁊ ⁨months⁊ ago

        If you want an even older example, the ghosts in Pac-Man could be considered AI as well.

        source
        • -> View More Comments
    • BarrelAgedBoredom@lemm.ee ⁨10⁊ ⁨months⁊ ago

      It’s marketed like it’s AGI, so we should treat it like AGI to show that it isn’t AGI. Lots of people buy the bullshit

      source
      • Knock_Knock_Lemmy_In@lemmy.world ⁨10⁊ ⁨months⁊ ago

        AGI is only a benchmark because it gets OpenAI out of a contract with Microsoft when it occurs.

        source
      • merc@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

        You can even drop the “a” and “g”. There isn’t even “intelligence” here. It’s not thinking, it’s just spicy autocomplete.

        source
        • -> View More Comments
    • Gladaed@feddit.org ⁨10⁊ ⁨months⁊ ago

      Fair point, but a big part of “intelligence” tasks is memorization.

      source
      • BussyCat@lemmy.world ⁨10⁊ ⁨months⁊ ago

        Computers for all intents and purposes have perfect recall, so since it was trained on a large data set it would have much better intelligence. But in reality what we consider intelligence is extrapolating from existing knowledge, which is what “AI” has shown to be pretty shit at

        source
        • -> View More Comments
  • abfarid@startrek.website ⁨10⁊ ⁨months⁊ ago

    I get the meme aspect of this. But just to be clear, it was never fair to judge LLMs for specifically this. The LLM doesn’t even see the letters in the words, as every word is broken down into tokens, which are numbers. I suppose with a big enough corpus of data it might eventually extrapolate which words have which letter from texts describing these words, but normally it shouldn’t be expected.

    source
    • Zacryon@feddit.org ⁨10⁊ ⁨months⁊ ago

      I know that words are tokenized in the vanilla transformer. But do GPT and similar LLMs still do that as well? I assumed they also tokenize on character/symbol level, possibly mixed up with additional abstraction down the chain.

      source
    • abfarid@startrek.website ⁨10⁊ ⁨months⁊ ago

      I don’t know what part of what I said prompted all those downvotes, but of course all the reasonable people understood that the “AGI in 2 years” was a stock price pump.

      source
    • kayzeekayzee@lemmy.blahaj.zone ⁨10⁊ ⁨months⁊ ago

      I’ve actually messed with this a bit. The problem is more that it can’t count to begin with. If you ask it to spell out each letter individually (ie each letter will be its own token), it still gets the count wrong.

      source
      • abfarid@startrek.website ⁨10⁊ ⁨months⁊ ago

        In my experience, when using reasoning models, it can count, but not very consistently. I’ve tried random assortments of letters and it can count them correctly sometimes. It seems to have much harder time when the same letter repeats many times, perhaps because those are tokenized irregularly.

        source
    • cyrano@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

      True and I agree with you, yet we are being told all jobs are going to disappear, AGI is coming tomorrow, etc. As usual the truth is more balanced

      source
  • hornyalt@lemmynsfw.com ⁨10⁊ ⁨months⁊ ago

    “A guy instead”

    source
  • VirgilMastercard@reddthat.com ⁨10⁊ ⁨months⁊ ago

    Biggest threat to humanity

    Image

    source
    • idiomaddict@lemmy.world ⁨10⁊ ⁨months⁊ ago

      I know there’s no logic, but it’s funny to imagine it’s because it’s pronounced Mrs. Sippy

      source
      • merc@sh.itjust.works ⁨10⁊ ⁨months⁊ ago

        How do you pronounce “Mrs” so that there’s an “r” sound in it?

        source
        • -> View More Comments
      • sp3ctr4l@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

        I was gonna say something similar, I have heard a LOT of people pronounce Mississippi as if it does have an R in it.

        source
      • jaybone@lemmy.zip ⁨10⁊ ⁨months⁊ ago

        And if it messed up on the other word, we could say because it’s pronounced Louisianer.

        source
  • cyrano@lemmy.dbzer0.com ⁨10⁊ ⁨months⁊ ago

    Next step: how many Rs in Lollapalooza

    Image

    source
  • DmMacniel@feddit.org ⁨10⁊ ⁨months⁊ ago

    We are fecking doomed!

    source
  • loomy@lemy.lol ⁨10⁊ ⁨months⁊ ago

    I don’t get it

    source
  • besselj@lemmy.ca ⁨10⁊ ⁨months⁊ ago

    It’s all about weamwork 🤝

    Image

    source