Yes, this is exactly what I expected. What surprises me is the 30 USD per user per month price point. That’s very expensive. (I can guess why, but it ultimately doesn’t matter.)
Comment on Early Adopters of Microsoft’s AI Bot Wonder if It’s Worth the Money
JeeBaiChow@lemmy.world 9 months ago
Isn’t this pretty much the state of current-gen AI, the hype overtaking the reality and all that?
Kissaki@feddit.de 9 months ago
kromem@lemmy.world 9 months ago
Like many tools, there’s a gulf between a skilled user and an unskilled user.
What ML researchers are doing with these models is straight up insane. The kinds of things I didn’t think I’d see in my lifetime, or maybe only in an old age home (a ways off).
If you gave Avid to someone who had never used a non-linear editing (NLE) application to edit multi-track video and asked them to put together some family videos, they might not be that impressed with the software.
Similarly, the average person interacting with the models often hits their shortcomings and doesn’t know how to get past them and assumes the software tool is shitty.
As an example, try giving the following prompt to Bing, which runs on GPT-4:
Without searching, solve the following puzzle repeating the adjective for each noun: “A man has a vegetarian wolf, a carnivorous goat, and a cabbage. He needs to get them to the other side of a river but the boat which can cross can only take him and one object at a time. How can he cross without any of the objects eating another object?” Think carefully.
It will get it wrong, defaulting to the standard-form solution where the goat is taken first. When GPT-4 was first released, a number of people thought this was because it lacked the reasoning capability to solve a variation of the puzzle.
It turns out that the token similarity to the standard form is what trips it up. If you replace the wolf, goat, and cabbage in the prompt above with the emoji for each, it answers perfectly, having the vegetarian wolf go across first, and so on. This means the model was fully able to process the implicit relationships (the carnivorous goat would eat the wolf, and the vegetarian wolf would eat the cabbage) and adapt the classic answer accordingly. It just couldn’t do it when the tokens were too similar to the original.
So if you assume it’s stupid, see a stupid answer, and take it as confirmation instead of looking deeper, you walk away thinking the models suck and are dumb, when really, like most tools, there’s a learning curve to getting the most out of them.
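To make the “correct” answer concrete, here is a minimal breadth-first search sketch in Python over the variant’s rules. It assumes, as the prompt implies, that the vegetarian wolf eats the cabbage, the carnivorous goat eats the wolf, and the goat and cabbage are safe together. The names `solve`, `is_safe`, and `EATS` are hypothetical, written only for this illustration; it is not how the model arrives at its answer.

```python
from collections import deque
from typing import FrozenSet, List, Optional, Tuple

ITEMS: FrozenSet[str] = frozenset({"wolf", "goat", "cabbage"})
# (predator, prey) pairs under the variant's rules (an assumption read off the prompt).
EATS = {("wolf", "cabbage"), ("goat", "wolf")}

# A state is (items still on the left bank, the bank the man is on).
State = Tuple[FrozenSet[str], str]


def is_safe(bank: FrozenSet[str]) -> bool:
    """An unattended bank is safe if no predator is left alone with its prey."""
    return not any(p in bank and q in bank for p, q in EATS)


def solve() -> Optional[List[str]]:
    """Return the shortest list of crossings that gets everything across."""
    start: State = (ITEMS, "left")
    goal: State = (frozenset(), "right")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, man), path = queue.popleft()
        if (left, man) == goal:
            return path
        here = left if man == "left" else ITEMS - left
        # The man crosses alone (None) or with one item from his current bank.
        for cargo in [None, *sorted(here)]:
            moved = frozenset() if cargo is None else frozenset({cargo})
            if man == "left":
                new_left, new_man = left - moved, "right"
            else:
                new_left, new_man = left | moved, "left"
            # The bank he just left is now unattended and must be safe.
            unattended = new_left if new_man == "right" else ITEMS - new_left
            if not is_safe(unattended):
                continue
            state = (new_left, new_man)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [f"{cargo or 'nothing'} -> {new_man}"]))
    return None


if __name__ == "__main__":
    for step in solve() or []:
        print(step)
```

Under these rules the search finds a seven-crossing plan that starts and ends with the wolf, because the wolf is the only item that conflicts with both of the others.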
dee_dubs@lemmy.world 9 months ago
My problem with this is that your example relies on you already knowing the correct answer, so that you know it’s given you the wrong answer and can go back and try to trick it into giving a different one. If you’re asking it a question you don’t already know the answer to, how would you know if this has happened?
kromem@lemmy.world 9 months ago
Don’t use LLMs in production for accuracy critical implementations without human oversight.
Don’t use LLMs in production for accuracy critical implementations without human oversight.
I almost want to repeat that a third time even.
They weirdly ended up being good at information recall in many cases, and as a result have been used that way in cases where it doesn’t really matter if they’re wrong some of the time. But the infrastructure fundamentally cannot self-verify.
This is part of why I roll my eyes when I see LLMs vs. humans framed as an either/or choice in employment. These are tools to extend and support human labor, not to replace humans in most cases.
So LLMs can be amazing at a wide array of tasks. I literally just saved myself half an hour of copying and pasting minor changes across a codebase by having Copilot generate methods for a new object, using a parallel object as the template and filling in the new object’s fields. But I also have unit tests to verify behavior, plus my own review of what was generated, with over a decade of experience under my belt.
Someone who has never programmed using Copilot to spit out code for an idea is going to have a bad time. But they’d have a similar bad time if they outsourced a spec sheet to a code farm without having anyone to supervise deliverables.
Oh, and technically, my example doesn’t actually require you to know the correct answer before asking. It only requires you to recognize the correct answer when you see it. And the difference between those two use cases is massive.
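The gap between producing an answer and checking one is easy to see in code. A minimal sketch, assuming the same rules as the prompt (the vegetarian wolf eats the cabbage, the carnivorous goat eats the wolf): a few lines of Python can check any proposed plan, such as one a model hands back, without knowing the solution in advance. `check_plan` and its plan format are hypothetical, written just for this illustration.

```python
from typing import List, Optional

ITEMS = {"wolf", "goat", "cabbage"}
# (predator, prey) pairs; assumed from the wording of the prompt.
EATS = {("wolf", "cabbage"), ("goat", "wolf")}


def check_plan(plan: List[Optional[str]]) -> bool:
    """Return True if `plan` gets everything across without anything being eaten.

    Each entry is the item the man carries on that crossing (None = alone).
    Crossings alternate direction, starting from the left bank.
    """
    banks = {"left": set(ITEMS), "right": set()}
    man = "left"
    for cargo in plan:
        other = "right" if man == "left" else "left"
        if cargo is not None:
            if cargo not in banks[man]:
                return False  # he can't carry something that isn't on his bank
            banks[man].remove(cargo)
            banks[other].add(cargo)
        # The bank he just left is now unattended.
        if any(p in banks[man] and q in banks[man] for p, q in EATS):
            return False
        man = other
    return man == "right" and banks["right"] == ITEMS


if __name__ == "__main__":
    print(check_plan(["wolf", None, "goat", "wolf", "cabbage", None, "wolf"]))
    print(check_plan(["goat", None, "wolf", "goat", "cabbage", None, "goat"]))
```

The first plan passes; the classic goat-first answer fails on its very first crossing, because it leaves the vegetarian wolf alone with the cabbage.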
jherazob@kbin.social 9 months ago
But that's what the marketers are selling: "This will replace a lot of workers!" And it just can't.
Lmaydev@programming.dev 9 months ago
I use it all the time at work as a programmer. Not that often for generating code but for learning new languages and frameworks quickly.
I noticed our juniors are able to get up to speed incredibly fast by leaning on it when picking up new things as well.
It is genuinely a game changer when used correctly. The issue I see is people trying to push it everywhere.
JeeBaiChow@lemmy.world 9 months ago
As a reference, I’d use a search engine first, but it’s a matter of personal preference. Usually I’m only missing syntax and a particular language’s native functions. The only benefit I could foresee is avoiding the rude, condescending, snarky comments from experienced developers on Stack Exchange and the like, but I almost never register to post, so I avoid all that anyway. I did see a benefit in (real) language learning, when I asked it to translate something and then break down specific parts of the response for clarification, switching between my native language and the one I’m trying to learn. That was mind-blowing.
Lmaydev@programming.dev 9 months ago
I use it instead of a search engine now.
Rather than me skimming a few blog posts looking for the particular info I want, it pulls exactly what I need, summarizes it, provides sources, and allows follow-up questions.
TimeSquirrel@kbin.social 9 months ago
That's exactly it. I know HOW to program generically. I know what control flow is, how memory works, and what pointers and objects are. I just need some coaching on syntax because it's all too much to memorize in one lifetime. But once I see it written and used in front of me, I can easily tell whether it's any good.