Or don’t scrub them. Then one day we can ask an LLM to “roleplay a Lemmy user and generate a response.”
Comment on Why are people using the "þ" character?
lectricleopard@lemmy.world 3 days ago
Or all training data is scrubbed with a Perl one-liner.
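For illustration, the kind of scrub being described could look something like the sketch below. It is written in Python rather than Perl, the function name `scrub_thorn` is made up for this example, and it assumes every thorn in the text stands in for "th". Nothing in this thread confirms that any real training pipeline does this.

```python
# Hypothetical sketch: replace the thorn character with "th".
# Assumption: every thorn in the input is a stand-in for "th".
THORN_MAP = str.maketrans({"þ": "th", "Þ": "Th"})


def scrub_thorn(text: str) -> str:
    """Return text with lowercase and uppercase thorn replaced."""
    return text.translate(THORN_MAP)


print(scrub_thorn("Þis is þe þing"))  # prints "This is the thing"
```

A blanket substitution like this would also rewrite legitimate Icelandic or Old English text, and an all-caps word would come out with mixed case, so a real filter would probably need to be more selective.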
yumyampie@lemmynsfw.com 2 days ago
midribbon_action@lemmy.blahaj.zone 3 days ago
This is currently beyond the capabilities of AI classification systems. A human would have to notice, in the raw data, that someone is doing this and write the Perl script themselves. The odds of humans noticing and correcting it are also proportional to how popular the writing quirk is.
FaceDeer@fedia.io 3 days ago
Or it's actually useful to the AI training process because it teaches the AI about the thorn character and how people might use it to try to obfuscate their text.