Comment on Rabbit R1 AI box revealed to just be an Android app
De_Narm@lemmy.world 6 months ago
> 1% correct is never “fairly high” wtf
It’s all about context. If you ask a bunch of 4-year-olds questions about trigonometry, 1% of answers being correct would be fairly high. ‘Fairly high’ basically only means ‘as high as expected’ or ‘higher than expected’.
> Also if you want a computer that you don’t have to double-check, you literally are expecting software to embody the concept of God. This is fucking stupid.
Hence, it is useless. If I cannot expect it to be more or less always correct, I can skip using it and just look stuff up myself.
TrickDacy@lemmy.world 6 months ago
Obviously the only contexts that would apply here are ones where you expect a correct answer. Why would we be evaluating software that claims to be helpful against a 4-year-old asked to do calculus? I have to question your ability to reason for insinuating this.
So, confirmed: God or nothing. Why don’t you go back to quills? Computers cannot read your mind and write this message automatically, hence they are useless.
De_Narm@lemmy.world 6 months ago
That’s the whole point: I don’t expect correct answers, neither from a 4-year-old nor from a probabilistic language model.
TrickDacy@lemmy.world 6 months ago
And you don’t expect a correct answer because it isn’t correct 100% of the time. Some lemmings are basically just clones of Sheldon Cooper.
De_Narm@lemmy.world 6 months ago
I don’t expect a correct answer because I used these models quite a lot last year, and at least half the answers were hallucinated. It’s still a common complaint about this product as well, if you look at actual reviews (e.g., I’m pretty sure Marques Brownlee mentions it).
FlorianSimon@sh.itjust.works 6 months ago
Something seems to go over your head: quality is not optional, and it’s good engineering practice to seek reliable methods of doing our work. As a mature software person, you look for tools that leave less room for failure and leave as little as possible for humans to fuck up, because you know humans are not reliable, even though they are unavoidable. That’s the logic behind automated testing, Rust’s borrow checker, static typing…
If you’ve done code review, you know it’s not very efficient at catching bugs. It’s inefficient because you don’t pay as much attention to details when you’re not the one actually writing the code. With LLMs, because of the hallucinations, you have to review the output to make sure it meets quality standards, just as you have to test your work before committing it.
I understand the actual software engineers who care about delivering working code and would rather write it themselves in order to be more confident in the quality of the output.