Archive for January, 2010

Upgrading from MySQL 4 – user stories

Thursday, January 7th, 2010

Earlier Peter at Percona posted an interesting article about MySQL upgrades. As a database consultant, it’s not uncommon for me to have enterprise-level customers that still run MySQL 4.0 or 4.1  - for a number of reasons that I will enumerate later, this kind of migration can be tricky. I  just finished such a major upgrade so I will share my opinion on the subject.

The Methodology

If you are in a replicated environment you may want to migrate one of your slaves first as a prototype. Dumping and reloading is the recommended way because of many storage file format changes between 4.0 and 5.0 versions. For heavy databases I usually choose the parallel dump approach to save some precious time, either maatkit’s mk-parallel-dump or domas mituzas’ mydumper tools will do the trick.

Note that binary upgrades are still possible between 5.0 and 5.1 since the storage definitions haven’t changed.

Replication issues

If you follow the MySQL manual to the letter, MySQL supports replication from one major version to the next higher major version. However in practice it’s quite different.

In the case I’ve treated, replication was working fine between 4.0 and 5.0. The major changes concern timezone handling so if your servers are in different TZ’s you might want to look into the code.

MySQL replication was working also between 4.0 and 5.1, but a recent commit broke it – and they’re not going to fix it :) You may still use 5.1.37 or domas’ patch if you like.

Also in practice, replication may work between previous major versions, provided that the binlog format is the same. I had 4.1 to 4.0 replication working – do that at your own risk since it’s unsupported and may not function in your specific case.

I’ve been looking into Tungsten Replicator also, it looks like a promising project but you can only start replicating from a fresh install, there’s nothing such as a CHANGE MASTER TO instruction right now so it’s kind of useless from already existing replication setups.

Data storage issues

The biggest issue that will hurt you when migrating from 4.x to 5.x is the change made to the DECIMAL field. Prior to 5.0, the values in a DECIMAL field were stored as a string, including the sign. That means that an unsigned DECIMAL field could actually allow one digit in excess, using the space reserved for the minus sign. Ex. DECIMAL(5,2) could actually contain ‘1000.00′. MySQL 5.0 reverts to strict SQL compliance and the extra digit is not allowed anymore.

If you migrate your schema and data to 5.0, all the values in excess in DECIMAL fields will be truncated to the lowest possible value (i.e., in our example to ‘999.99′). If your customer is storing financial data then you might be in trouble when the time of calculation is coming.

Also a minor problem which comes to mind is the issue of trailing spaces in VARCHAR fields. Before 5.x the trailing spaces would be stored. MySQL 5.0 and later again reverts to SQL compliance and the field would be trimmed before being stored; that means that if your master is in the lower version, a LENGTH(field) will return a different value on the slave.

Syntax issues, why your SELECTs might fail

MySQL 5.0 introduces JOIN syntax compliant to the SQL standard, so depending on the level of respect from the developers to the standard, a number of queries might fail. If your application is planning multi-table joins i.e. SELECT … FROM (t1, t2) INNER JOIN t3 ON … the tables to be joined must be inside brackets or the query will fail with a syntax error. Also the order of referenced tables is relevant in 5.0 – the parser will analyze the query from left to right, so if  your query is messed up that’s another syntax error.

You may want to log a sample of queries on your older setup and replay them on your new setup to catch the errors. Peter suggested using mk-upgrade in the maatkit tool which is a very sensible choice, also on my end I decided to catch live errors using mysql-proxy and the capture filter function.

Performance and optimizer issues

From my own experience, 4.x versions were always faster than 5.x on a simple workload. MySQL 5.0 introduced some overhead due to a heavier parser with the implementation of stored procedures and functions. From one version to the other you will almost always see regressions especially when the new version is fresh out of the box and regressions haven’t been fixed. This was a notable case with 5.1 release but since integration of the InnoDB plugin has solved many regressions.

In my case, upgrade tests (query log replaying) were 1.5x times faster using mysql 4. But introducing mysql 5 fixed a lot of pending problems, notably on slow queries – many optimizations have been done around DISTINCT or GROUP BY operations, for instance. So usually in the long run it’s a winning upgrade.

Regarding the optimizer I noticed some different behaviors that were easily fixed. My customer was using indexes on VARCHAR fields (that’s usually a bad practice, but due to the historical nature of the schema, difficult to fix). MySQL 5 is extending the length of VARCHAR, so will be the key length for the optimizer. It will behave differently in 5.x because the optimizer will probably look for the “shortest path” i.e. it may chose another index based on the total key length. I solved this by altering the VARCHAR fields to CHAR – actually the data stored inside the field was already fixed size. This is also a reason to remind not to use text fields as join keys when possible.

When you should upgrade

To conclude about upgrades and when you should do them, though it’s usually smart not to upgrade when a version is just out in General Availability, you should monitor potential MySQL developments and experience returns. Also lifecycle of a MySQL version is 2 to 3 years so you should take care not to lag behind versions. MySQL 5.1 is GA for one year already and it’s now very safe, both bugwise and performance-wise to migrate now. Keep in mind that you can also try upgrades on development or staging platforms and check if your app has any regressions on it.

Entries (RSS)