As someone who sometimes makes demos of our own AI products at work for internal use, you have no idea how much time I spend on finding demo cases where LLM output isn’t immediately recognizable as bad or wrong…
To be fair it’s pretty much only the LLM features that are like this. We have some more traditional AI features that work pretty well. I think they just tagged on LLM because that’s what’s popular right now.