MySQL / MariaDB and utf8

Posted by Twisted Bytes on 4 March 2015

Who has not experienced it: your text does not end up in the database the way you put it in. An é suddenly becomes Ã© ...
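The garbling described above is typically what you get when text stored as UTF-8 is read back as if it were Latin-1. Here is a minimal sketch of that round trip. Python is used purely for illustration; it is not part of the original setup.

```python
# Illustration only: reproduce the classic garbling by decoding
# UTF-8 bytes with the wrong character set.
original = "é"

# In UTF-8, é is stored as two bytes: 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as Latin-1 yields two separate characters.
garbled = utf8_bytes.decode("latin-1")

# Reversing the mistake recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(utf8_bytes)  # b'\xc3\xa9'
print(garbled)     # Ã©
print(repaired)    # é
```

The same mismatch can occur at any hop between application, client connection, and server, which is why all of them need to agree on the character set.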

Anyone who has worked with MySQL and PHP knows the problem and has war stories. Mention the words "character set" and the survivors will tell you horror stories. And the culprit in this story is not PHP.

Anyone who has used MySQL will have come across this text somewhere: latin1_swedish_ci. MySQL was originally created by Michael Widenius, who did so in Sweden. Back then, nobody knew any better than that a letter fits in 8 bits. Obviously that is not true. That is why UTF-8 came along and became the de facto default. And everyone uses it, right...?

By default, we set up our database servers, databases, and database clients with utf8. And we thought that was fine. Until we read this blog post: https://mathiasbynens.be/notes/mysql-utf8mb4.

It shows that the utf8 character set in MySQL and MariaDB uses at most 3 bytes per character to store text in the database. But some characters need four bytes in UTF-8. And that is where things go wrong in MySQL.
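To make the byte counts concrete, here is a small Python sketch that reports how many bytes UTF-8 needs for a few characters and checks whether a string stays within the 3-byte limit described above. The helper names `utf8_length` and `fits_in_3_byte_utf8` are made up for this illustration.

```python
# Illustration only: how many bytes UTF-8 needs per character, and
# whether a string stays within a 3-bytes-per-character limit.

def utf8_length(char: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(char.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8."""
    return all(utf8_length(ch) <= 3 for ch in text)


for ch in ["a", "é", "€", "😀"]:
    print(ch, utf8_length(ch))  # 1, 2, 3 and 4 bytes respectively

print(fits_in_3_byte_utf8("café €5"))    # True
print(fits_in_3_byte_utf8("thanks 😀"))  # False
```

Emoji are the most visible four-byte characters, but any character outside the Basic Multilingual Plane (code points above U+FFFF) falls into the same category.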

Our default utf8 setting was not good enough; we had to use utf8mb4. We have applied this change on our servers wherever possible, and all new MariaDB servers are now set up with utf8mb4 by default.

This way you can be confident that Ã© will no longer turn up. You do have the UTF-8 meta tag in your HTML, don't you?

<meta charset="utf-8">

To make sure your development databases also use the right character set, you can apply these settings on your database servers:

[mysqld]
...
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
...

And always use utf8mb4 when creating tables and columns.
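As a sketch of what that could look like, the snippet below holds a CREATE TABLE statement in a Python string, ready to be passed to whichever database driver you use. The table name `articles` and its columns are made up for this example; the `CHARACTER SET` and `COLLATE` clauses are the part that matters.

```python
# Illustration only: the table and column names are hypothetical.
# The statement sets utf8mb4 both as the table default and explicitly
# on one column, to show where each clause goes.
CREATE_ARTICLES = """
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
    body  TEXT NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
"""

print(CREATE_ARTICLES)
```

With the table default in place, the `body` column inherits utf8mb4 without naming it; the explicit clause on `title` is only needed when a column should differ from, or must not depend on, the table default.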

 
