Comment on Selfhosted & AI
midribbon_action@lemmy.blahaj.zone 20 hours ago
I don’t think you need hardly any hardware to do ocr. USPS started doing reliable ocr on 80s hardware. You really think an ai cluster is necessary for that?
Anyways, cool anecdote, not an actual financial study or report, and very long-winded honestly.
curbstickle@anarchist.nexus 19 hours ago
OCR <> data ingest
OCR wouldn’t work, as I mentioned, because of the varying structures of the forms.
I’m sorry my answer was too “long winded” for you, I was trying to be informative, but clearly you aren’t interested in that. Enjoy your day.
midribbon_action@lemmy.blahaj.zone 19 hours ago
Don’t think that’s true. You can run the whole form through and come out with an identical PDF with searchable/copyable text. Even a completely novel form uses the same alphabet. Add some regex to pull out the fields you need to enter, and on failure give it to a human. All of that can be done with Python on a Raspberry Pi. A decade ago.
github.com/ocrmypdf/OCRmyPDF
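A rough sketch of the regex-plus-human-fallback step, assuming the OCR pass (with OCRmyPDF or anything similar) has already turned the form into plain text. The field names, patterns, and the `process` helper are hypothetical and only illustrate the idea; a real form would need its own patterns.

```python
import re

# Hypothetical field patterns, for illustration only.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return (fields, missing): values found by regex, and field names not found."""
    fields, missing = {}, []
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
        else:
            missing.append(name)
    return fields, missing


def process(text):
    """Accept the form if every field matched; otherwise flag it for a human."""
    fields, missing = extract_fields(text)
    status = "needs_human" if missing else "ok"
    return {"status": status, "fields": fields, "missing": missing}


if __name__ == "__main__":
    print(process("Invoice No: A-1042\nTotal Due: $1,250.00"))
    print(process("Invoice No: A-1042\nAmount payable 1250"))
```

Anything the patterns miss comes back as `needs_human` with the missing field names, so a person only sees the forms the regex could not handle.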
curbstickle@anarchist.nexus 19 hours ago
You’d be wrong.
The fields aren’t all the same kinds of values, so the relationships between the data have to be evaluated before entry.
You’re assuming this is transposing contents, which was not the issue. Your example is what was initially planned and halted before transitioning to the approach I helped deploy.
midribbon_action@lemmy.blahaj.zone 19 hours ago
That’s how you sound.