Not only are there several character sets that look like Unicode, but the character set in MySQL can be set separately for the session, the client, the server, the database, the table and the column. All six of them can have different encodings.
Just make sure all of them use the same 4-byte Unicode character set (utf8mb4 in MySQL). A different collation is OK when backing up, because collation only matters when comparing strings.
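To see why the 4-byte part matters, here is a rough Python sketch (not MySQL code) that flags text needing 4 bytes per character in UTF-8. The function name `needs_4_byte_utf8` is made up for illustration. The underlying rule is that any code point above U+FFFF takes 4 bytes in UTF-8, which MySQL's older 3-byte `utf8` (`utf8mb3`) character set cannot store.

```python
def needs_4_byte_utf8(text):
    """Return True if any code point in text lies above U+FFFF.

    Such code points take 4 bytes in UTF-8, so they need a 4-byte
    character set (utf8mb4 in MySQL) rather than a 3-byte one.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# Hypothetical sample values, for illustration only.
plain = "h\u00e9llo"          # accented Latin text, at most 2 bytes per character
with_emoji = "hi \U0001F600"  # contains an emoji above U+FFFF

print(needs_4_byte_utf8(plain))
print(needs_4_byte_utf8(with_emoji))
```

Running a check like this over a data sample before a migration is one way to find out whether a 3-byte column is already silently losing data.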
modeler@lemmy.world 1 year ago
This is the right answer. I had the job of planning a schema update to fix this shitty design.
That said, Unicode and character encodings are incredibly complex things that are not easy to implement. For example, two UTF-8 strings can contain the same number of characters but be hugely different in size (up to 4x, since each character takes 1 to 4 bytes). It’s well worth reading through some articles to get a feel for the important points.
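A quick way to see the size difference for yourself, as a minimal Python sketch. The helper name `utf8_size` and the sample strings are just illustrative, and "characters" here means Unicode code points, which is what Python's `len` counts.

```python
def utf8_size(text):
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


ascii_text = "abcd"            # four ASCII characters, 1 byte each in UTF-8
emoji_text = "\U0001F600" * 4  # four emoji, 4 bytes each in UTF-8

print(utf8_size(ascii_text))   # (4, 4)
print(utf8_size(emoji_text))   # (4, 16)
```

Same character count, four times the bytes, which is why column sizes and index length limits need a second look when switching encodings.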