Or a tool for the translator to save time?
brucethemoose@lemmy.world 1 day ago
I use local instances of Aya 32B (and sometimes Deepseek, Qwen, LG Exaone, Japanese finetunes, others depending on the language) to translate stuff, and it is quite different from Google Translate or any machine translation you find online. They get the “meaning” of text instead of transcribing it robotically like Google, and are actually pretty loose with interpretation.
It has soul… sometimes too much. That’s the problem: it’s great for personal use, where it can occasionally be wrong or flowery, but not good enough for publishing and selling, as the reader isn’t necessarily cognisant of errors.
JustTesting@lemmy.hogru.ch 1 day ago
Actually, as to your edit, it sounds like you’re fine-tuning the model on your data, not training it from scratch. So the LLM has already seen English and Chinese during its initial training. Also, they represent words as vectors, and what usually happens is that similar words’ vectors end up close together. So substituting e.g. Dad for Papa looks almost the same to an LLM. Same across languages. But that’s not understanding, that’s behavior that way simpler models also have.
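The “similar words’ vectors are close together” point can be sketched with cosine similarity. The embeddings below are made-up 4-dimensional toy vectors purely for illustration; real models learn vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), 1.0 means same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings -- not from any real model.
embeddings = {
    "dad":  [0.82, 0.10, 0.55, 0.05],
    "papa": [0.80, 0.12, 0.53, 0.07],
    "car":  [0.05, 0.90, 0.10, 0.60],
}

# "dad" and "papa" point in nearly the same direction; "car" does not.
print(cosine_similarity(embeddings["dad"], embeddings["papa"]))  # ~0.999
print(cosine_similarity(embeddings["dad"], embeddings["car"]))   # ~0.20
```

With vectors like these, swapping “Dad” for “Papa” barely changes the input the model actually sees, which is the behavior described above.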
brucethemoose@lemmy.world 1 day ago
True! Models not trained on a specific language are generally bad at that language.
However, there are some exceptions, like a Japanese tune of Qwen 32B which dramatically enhances its Japanese, but the training has to be pretty extensive.
And even that aside… the effect is still there. The point is to illustrate that LLMs are sort of “language independent” internally, like you said.
br3d@lemmy.world 1 day ago
These language models don’t get the meaning of anything. They predict the next cluster of letters based on the clusters of letters that have come before. Sorry, but if it feels to you like they’ve captured the meaning of something, you’re being bamboozled.
brucethemoose@lemmy.world 1 day ago
It’s a metaphor.
They’re translating the input tokens to intent in the model’s middle layers, which is a bit more precise.