Interesting; that would be much simpler. I’ll give that a shot in the morning, thanks!
Comment on Searching through a bulk of pdf files
tofu@lemmy.nocturnal.garden 6 days ago
The OCR thing is its own task, but for just searching for a string in PDFs, pdfgrep
is very good.
pdfgrep -ri CoolNumber69 /path/to/folder
Darkassassin07@lemmy.ca 6 days ago
hoppolito@mander.xyz 6 days ago
In case you are already using ripgrep (rg) instead of grep, there is also ripgrep-all (rga), which lets you quickly search through a whole bunch of file types, PDFs included. And it’s cached, so while the first indexing takes a moment, any further search is lightning fast.
It supports a whole truckload of file types (pdf, odt, xlsx, tar.gz, mp4, and so on), but I have mostly used it to quickly search through thousands of research papers. It takes around 5 minutes to index everything for my 4000 PDFs on the first run, then it’s smooth sailing for any further searches from there.
Darkassassin07@lemmy.ca 6 days ago
That works magnificently. I added -l so it spits out a list of files instead of listing each matching line in each file, then set it up with an alias. Now I can ssh in from my phone and search the whole collection for any string with a single command.
Thanks again!
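For anyone wanting to copy the setup described above, here is a rough sketch of what such a shortcut might look like. It is written as a shell function rather than a plain alias so the search string can sit in front of the folder argument. The name pdfsearch and the PDF_DIR variable are made up for illustration, and /path/to/folder is just the placeholder from the earlier command; the actual alias used in this thread was not posted.

```shell
# Hypothetical helper: the name "pdfsearch" and the PDF_DIR variable are
# illustrative assumptions, not taken from the thread.
# Placeholder location of the PDF collection; override by setting PDF_DIR.
PDF_DIR="${PDF_DIR:-/path/to/folder}"

pdfsearch() {
    # -r: recurse into subdirectories
    # -i: ignore case
    # -l: print only the names of files that contain a match
    pdfgrep -ril "$1" "$PDF_DIR"
}

# Example call (needs pdfgrep installed):
#   pdfsearch CoolNumber69
```

Dropped into ~/.bashrc or similar on the server, this makes the function available in any login shell, including one opened over ssh from a phone.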
tofu@lemmy.nocturnal.garden 5 days ago
Glad to hear that!