That’s… extremely useful to know and highlights the issues I have with databases like MySQL.
IMO, a DB should always have a type defined for each field, and if that type is UTF-8 but really means just the mb3 subset, you should only be able to store mb3 data in it. Not enforcing the field type is what leads to data-dependent functional bugs and security issues. There should also be restrictions on how data is loaded from fields depending on their type: mb3 fields would allow MySQL transform operations, while binary fields would require a straight read/write, with some process outside the DB itself handling the resulting binary data stream.
/rant
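For what it is worth, the kind of enforcement I mean can be sketched on the application side. This is only a rough illustration, assuming that "mb3 data" means text whose code points all sit in the Basic Multilingual Plane (at most 3 bytes each in UTF-8); the function names `fits_utf8mb3` and `require_utf8mb3` are made up for this example and are not part of MySQL or any driver.

```python
def fits_utf8mb3(text: str) -> bool:
    """Return True if every code point is at or below U+FFFF.

    Assumption: such characters need at most 3 bytes in UTF-8,
    which is what a utf8mb3 column can hold.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


def require_utf8mb3(text: str) -> str:
    """Hypothetical guard: reject text that needs 4-byte UTF-8 sequences."""
    if not fits_utf8mb3(text):
        # Collect the offending characters so the error is actionable.
        bad = sorted({ch for ch in text if ord(ch) > 0xFFFF})
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"text contains characters outside utf8mb3: {codes}")
    return text


if __name__ == "__main__":
    print(fits_utf8mb3("héllo ✓"))   # all characters are in the BMP
    print(fits_utf8mb3("hi 😀"))      # the emoji needs 4 bytes in UTF-8
```

Calling `require_utf8mb3` before a write would fail loudly instead of letting the data be truncated or mangled, which is the behaviour I would want from the DB itself.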
limer@lemmy.dbzer0.com 2 weeks ago
Not only are there different character sets that all look like Unicode, but the character set in MySQL can also be set separately for the session, the client, the server, the database, the table, and the column. All six of them can have different encodings.
Just make sure they all use the same 4-byte Unicode character set (utf8mb4). A different collation is OK when backing up, because collation only matters when comparing or sorting strings.
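A quick way to sanity-check this is to collect the character set reported at each level and flag anything that is not utf8mb4. The sketch below is only that: the `character_set_*` names are MySQL system variables (as shown by `SHOW VARIABLES LIKE 'character_set_%'`), but the sample values, the table and column labels, and the function name `find_charset_mismatches` are made up for illustration. In practice the table and column values would come from `information_schema`.

```python
from typing import Dict

EXPECTED_CHARSET = "utf8mb4"


def find_charset_mismatches(
    settings: Dict[str, str], expected: str = EXPECTED_CHARSET
) -> Dict[str, str]:
    """Return only the settings whose character set differs from `expected`."""
    return {
        name: charset
        for name, charset in settings.items()
        if charset.lower() != expected.lower()
    }


# Made-up sample values; real ones would be read from the server.
sample_settings = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_results": "utf8mb4",
    "character_set_server": "latin1",
    "character_set_database": "utf8mb4",
    "table posts": "utf8mb4",        # hypothetical table
    "column posts.body": "utf8mb3",  # hypothetical column
}

if __name__ == "__main__":
    for name, charset in find_charset_mismatches(sample_settings).items():
        print(f"{name}: {charset} (expected {EXPECTED_CHARSET})")
```

Collation is deliberately left out of the check, since only the character set decides which bytes can be stored.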