Comment on "AI chatbots unable to accurately summarise news, BBC finds"

paraphrand@lemmy.world 1 week ago

I don't think giving the temperature knob to end users is the answer.

Turning it all the way down for maximum correctness and minimum creativity won't work in an intuitive way.

Sure, turning it up from the balanced middle value will make the output more "creative" and unexpected, and that is useful for idea generation, etc. But a knob that goes from "good" to "sort of off the rails, but in a good way" isn't a great user experience for most people.

Most people understand this stuff as intended to be intelligent. Correct. Etc. Or they at least understand that's the goal. Once you give them a knob to adjust the "intelligence level," you'll get more pushback when these things don't meet their goals: "I clearly had it in factual/correct/intelligent mode, not creativity mode. I don't understand why it left out these facts and invented a back story for this small thing it mentioned…"

Not everyone is an engineer. Temp is an obtuse thing.

Eheran@lemmy.world 1 week ago

This is really a non-issue, as the LLM itself should have no problem setting a reasonable value on its own. User wants a summary? Obviously maximum factual. They want gaming ideas? Turn the creativity up. Etc.
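The "let the system pick a sensible value itself" idea amounts to a small router in front of the model. A minimal sketch, assuming per-task temperatures chosen for illustration — the keyword matching here is a hypothetical stand-in for the model-based classifier the thread discusses, and none of the names come from a real product:

```python
# Illustrative task -> temperature table; the values are assumptions,
# not settings from any actual chatbot.
TASK_TEMPERATURES = {
    "summary": 0.1,      # "maximum factual"
    "coding": 0.2,
    "brainstorm": 1.0,   # gaming ideas etc. -> more creative
}

def pick_temperature(prompt: str) -> float:
    """Crude keyword router standing in for a real intent classifier."""
    text = prompt.lower()
    if "summarize" in text or "summary" in text:
        return TASK_TEMPERATURES["summary"]
    if "code" in text or "function" in text:
        return TASK_TEMPERATURES["coding"]
    if "ideas" in text or "brainstorm" in text:
        return TASK_TEMPERATURES["brainstorm"]
    return 0.7  # balanced default for anything unrecognized
```

In practice this classification step is exactly what the next comment points out nobody has built a tiny dedicated model for.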
brucethemoose@lemmy.world 1 week ago
For local LLMs, this is an issue because it breaks your prompt cache and slows things down, without a specific tiny model to “categorize” text… which no one has really worked on.
I don’t think the corporate APIs or UIs even do this.
You are not wrong, but it’s just not done for some reason.
brucethemoose@lemmy.world 1 week ago
Temperature isn’t even “creativity” per se; it’s more a band-aid to patch looping and dryness in long responses.
Lower temperature works much better with modern sampling algorithms, e.g., min-p, DRY, and maybe dynamic-temperature schemes like Mirostat. Ideally structured output, too. Unfortunately, corporate APIs usually don’t offer these.
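For reference, min-p sampling can be sketched in a few lines: keep only tokens whose probability is at least some fraction of the top token's probability, then sample from what survives. This is a simplified sketch (the exact order of temperature scaling versus filtering varies between backends):

```python
import math
import random

def min_p_sample(logits, min_p=0.05, temperature=0.7, rng=random):
    """Sketch of min-p sampling over a list of raw logits."""
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = max(probs)
    # min-p: discard tokens below min_p * (top token's probability)
    kept = [i for i, p in enumerate(probs) if p >= min_p * top]
    # temperature-scale the surviving logits and renormalize
    scaled = [math.exp((logits[i] - m) / temperature) for i in kept]
    z = sum(scaled)
    weights = [s / z for s in scaled]
    return rng.choices(kept, weights=weights)[0]
```

The point of the trick: because the cutoff scales with the top token's confidence, a confident distribution stays nearly deterministic even at higher temperatures, while a flat distribution keeps its variety.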
It can be mitigated by finetuning against looping/repetition/slop, but most models are the opposite: massively overtuned on their own output, which “inbreeds” the model.
And yes, domain-specific queries are best. Basically the user needs separate prompt boxes for coding, summaries, creative suggestions and such, each with its own tuned settings (and ideally tuned models). You are right, this is a much better idea than offering a temperature knob to the user, but… most UIs don’t even do this, for some reason.
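The “separate prompt boxes” idea reduces to a per-task preset table handed to whatever backend does generation. A hypothetical sketch — the parameter names follow llama.cpp-style conventions and the values are made up, not recommendations:

```python
# Hypothetical per-task sampler presets; values are illustrative only.
PRESETS = {
    "coding":   {"temperature": 0.2, "min_p": 0.10, "repeat_penalty": 1.00},
    "summary":  {"temperature": 0.3, "min_p": 0.05, "repeat_penalty": 1.10},
    "creative": {"temperature": 1.0, "min_p": 0.05, "repeat_penalty": 1.15},
}

def settings_for(task: str) -> dict:
    """Look up the sampler settings for a prompt box, with a balanced fallback."""
    return PRESETS.get(
        task,
        {"temperature": 0.7, "min_p": 0.05, "repeat_penalty": 1.10},
    )
```

Each “box” in the UI would just pin one of these presets (and, ideally, a task-tuned model) so the user never sees a raw knob.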
What I am getting at is that this is not a problem companies seem interested in solving.