They are. They record the data, stealing it. They search it, and reprint it (in whole or in part) upon request.
They search the data-space of what they’re trained on (our content, the content of human beings), and reproduce statistically defined elements of it.
They’re search engines that have stolen what they’re trained on, and reproduce it as “results”.
Searching and reproducing content they’ve already recorded is absolutely part of what they are.
UnderpantsWeevil@lemmy.world 1 day ago
The basic graphing technology used by AI is the same as that pioneered by AltaVista and optimized by Google years later. We’ve added a layer of abstraction through user I/O, such that you get a formalized text response encapsulating results rather than a series of links containing related search terms. But the methodology used to harvest, hash, and sort results is still all rooted in graph theory.
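As a rough illustration of the graph-based ranking idea this comment is gesturing at, here is a minimal power-iteration PageRank sketch over a made-up three-page link graph. The page names and the 0.85 damping factor are illustrative placeholders, not AltaVista’s or Google’s actual implementation:

```python
# Toy PageRank by power iteration over a hypothetical link graph.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
damping = 0.85
rank = {node: 1.0 / len(links) for node in links}

for _ in range(50):  # iterate until ranks stabilize
    new_rank = {node: (1 - damping) / len(links) for node in links}
    for node, outgoing in links.items():
        share = damping * rank[node] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

# "c" is linked by both "a" and "b", so it ends up with the highest rank.
```

A real engine works at vastly larger scale and adds many other signals, but the core idea is the same: rank pages by the structure of the link graph rather than by content alone.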
MartianSands@sh.itjust.works 1 day ago
That simply isn’t true. There’s nothing in common between an LLM and a search engine, except insofar as the people developing the LLM had access to search engines, and may have used them while gathering training data.
DarkCloud@lemmy.world 1 day ago
“data gathering” and “training data” is just what they have you calling it.
It’s not data gathering, it’s stealing. It’s not training data, it’s our original work.
MartianSands@sh.itjust.works 1 day ago
You’re putting words in my mouth, and inventing arguments I never made.
I didn’t say anything about whether the training data is stolen or not. I also didn’t say a single word about intelligence, or originality.
I haven’t been tricked into using one piece of language over another, I’m a software engineer and know enough about how these systems actually work to reach my own conclusions.
There is not a database tucked away in the LLM anywhere which you could search through and find the phrases it was trained on; it simply doesn’t exist.
That isn’t to say it’s completely impossible for an LLM to spit out something which formed part of the training data, but it’s pretty rare. 99% of what it generates doesn’t come from anywhere in particular, and you wouldn’t find it in any of the sources which were fed to the model in training.
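To illustrate the point being made here: a toy bigram language model keeps only follower counts derived from its training text, then generates by weighted sampling, so there is no stored archive of the original documents to search through. The corpus and model below are deliberately tiny and hypothetical, nothing like a real LLM, but the distinction between storing statistics and storing text is the same:

```python
import random

# Hypothetical tiny training corpus.
corpus = "the cat sat on the mat the cat ran".split()

# "Training": count which word follows which. These counts are all the
# model retains; the original word order of the corpus is discarded.
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def generate(start, length, seed=0):
    """Sample a continuation from the follower-count statistics."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        out.append(word)
    return " ".join(out)

# Output may echo fragments of the corpus or recombine them in new orders.
print(generate("the", 5))
```

Even in this trivial case the model can occasionally reproduce a training phrase by chance, which mirrors the comment above: regurgitation is possible but is a statistical accident, not a database lookup.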