I’d be hesitant to trust it with “summarize this abstruse spec document” when half the time said documents are self-contradictory or downright wrong. Again, plausible bullshit isn’t suitable.
That’s why I have my doubts when people say it’s saving them a lot of time or effort. I suspect it’s planting bombs that they simply haven’t yet found. Like it generated code and the code seemed to work when they ran it, but it contains a subtle bug that will only be discovered later. And the process of tracking down that bug will completely wreck any gains they got from using the LLM in the first place.
Same with the people who are actually using it on human languages. Like, I heard a story of a government that was overwhelmed with public comments or something, so they were using an LLM to summarize those so they didn’t have to hire additional workers to read the comments and summarize them. Sure… and maybe it’s relatively close to what people are saying 95% of the time. But 5% of the time it’s going to completely miss a critical detail. So you go from not having time to read all the public comments, and therefore not being sure what people are saying, to having an LLM give you false confidence that you know what people are saying even though it screwed up its summary.
okwhateverdude@lemmy.world 5 months ago
I, too, work in fintech. I agree with this analysis. That said, we currently have a large mishmash of regexes doing classification and they aren’t bulletproof. It would be useful to look at using something like a fine-tuned BERT model to classify the transactions that passed through the regex net without getting classified. And the PoC would be just context stuffing some examples into a few-shot prompt for an LLM, plus a constrained grammar (just the classification, plz). Because our finance generalists basically have to do this same process, and it would be nice to augment their productivity with a hint: “The computer thinks it might be this kinda transaction”
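Roughly what that PoC could look like, as a sketch only. Everything here is made up for illustration: the labels, the regex rules, the few-shot examples, and the `model` callable (a stand-in for whatever LLM client you'd actually use). Also, a real constrained grammar restricts what the model can emit at decode time; this just validates the answer afterwards and falls back to "unknown", which is a cheaper approximation of the same idea.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical label set; a real one would come from the finance team.
LABELS = ("payroll", "refund", "subscription", "unknown")

# Stand-in for the existing "mishmash of regexes".
REGEX_RULES = [
    (re.compile(r"\bpayroll\b|\bsalary\b", re.IGNORECASE), "payroll"),
    (re.compile(r"\brefund\b|\bchargeback\b", re.IGNORECASE), "refund"),
]

# Hypothetical examples to stuff into the prompt context.
FEW_SHOT_EXAMPLES = [
    ("ACME CORP SALARY MAR", "payroll"),
    ("NETFLIX.COM MONTHLY", "subscription"),
    ("RETURN CREDIT ORDER 1182", "refund"),
]


def classify_with_regex(description: str) -> Optional[str]:
    """Return the first matching regex label, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(description):
            return label
    return None


def build_few_shot_prompt(description: str) -> str:
    """Build a prompt that asks for exactly one label and nothing else."""
    parts = ["Classify the transaction. Answer with exactly one of: "
             + ", ".join(LABELS) + "."]
    for text, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Transaction: {text}\nLabel: {label}")
    parts.append(f"Transaction: {description}\nLabel:")
    return "\n\n".join(parts)


def constrain_to_label(raw_output: str) -> str:
    """Accept the model output only if it is exactly a known label."""
    candidate = raw_output.strip().lower()
    return candidate if candidate in LABELS else "unknown"


def classify(description: str, model: Callable[[str], str]) -> Tuple[str, str]:
    """Regex first; only unclassified transactions go to the model.

    Returns (label, source) so the UI can show model output as a hint
    rather than as a settled classification.
    """
    label = classify_with_regex(description)
    if label is not None:
        return label, "regex"
    prompt = build_few_shot_prompt(description)
    return constrain_to_label(model(prompt)), "model_hint"
```

The `source` field is the important bit: anything tagged `model_hint` is only “the computer thinks it might be this kinda transaction”, and a human still makes the call.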