Random aside to rant about consumer OCR.
Recently for my work I had to do some OCR to get some numbers out of a document that the vendor, in their infinite wisdom, refused to provide in an editable/selectable form. I.e., they just slapped a .jpeg onto a page and saved it as a .pdf. (This is a separate thing that infuriates me.)
Anyway, what I’m actually here to complain about is the baffling phenomenon that every single piece of OCR software I tried (ranging from open source, to trials of commercial programs, to the thingy that came with one of our all-in-one printer/scanners, and everything in between) is somehow still exactly as crap as the lousy OCR programs we were all struggling with in the late '90s.
I have absolutely no idea how this particular facet of technology has utterly and categorically failed to make any forward progress whatsoever in literal decades. I’ve personally worked on machine-vision-driven pick-and-place machines capable of accurately determining the orientation of densely printed cosmetics tubes, among other items, and placing them all face up in a box several times per second. Yet somehow the latest and greatest OCR transcription algorithms still can’t tell a 5 from a 6, or ye gods forbid an S, or an L from a J, or an M from a collection of backslashes and forward slashes, all despite being handed crisp, high-contrast serifed text that’s at least 60 pixels high.
Given the incredibly low bar for performance here, and given that apparently every programmer involved just walked away circa 2001, I can’t imagine that the current slop-generation machines fare any better…
BastingChemina@slrpnk.net 6 hours ago
I remember Google Translate doing this live through the phone camera, translating the text in real time, 15 years ago.