Archive for the ‘English’ Category

Upgrading from MySQL 4 – user stories

Thursday, January 7th, 2010

Earlier Peter at Percona posted an interesting article about MySQL upgrades. As a database consultant, it’s not uncommon for me to have enterprise-level customers that still run MySQL 4.0 or 4.1  - for a number of reasons that I will enumerate later, this kind of migration can be tricky. I  just finished such a major upgrade so I will share my opinion on the subject.

The Methodology

If you are in a replicated environment you may want to migrate one of your slaves first as a prototype. Dumping and reloading is the recommended way because of many storage file format changes between 4.0 and 5.0 versions. For heavy databases I usually choose the parallel dump approach to save some precious time, either maatkit’s mk-parallel-dump or domas mituzas’ mydumper tools will do the trick.

Note that binary upgrades are still possible between 5.0 and 5.1 since the storage definitions haven’t changed.

Replication issues

If you follow the MySQL manual to the letter, MySQL supports replication from one major version to the next higher major version. However in practice it’s quite different.

In the case I’ve treated, replication was working fine between 4.0 and 5.0. The major changes concern timezone handling so if your servers are in different TZ’s you might want to look into the code.

MySQL replication was working also between 4.0 and 5.1, but a recent commit broke it – and they’re not going to fix it :) You may still use 5.1.37 or domas’ patch if you like.

Also in practice, replication may work between previous major versions, provided that the binlog format is the same. I had 4.1 to 4.0 replication working – do that at your own risk since it’s unsupported and may not function in your specific case.

I’ve been looking into Tungsten Replicator also, it looks like a promising project but you can only start replicating from a fresh install, there’s nothing such as a CHANGE MASTER TO instruction right now so it’s kind of useless from already existing replication setups.

Data storage issues

The biggest issue that will hurt you when migrating from 4.x to 5.x is the change made to the DECIMAL field. Prior to 5.0, the values in a DECIMAL field were stored as a string, including the sign. That means that an unsigned DECIMAL field could actually allow one digit in excess, using the space reserved for the minus sign. Ex. DECIMAL(5,2) could actually contain ‘1000.00′. MySQL 5.0 reverts to strict SQL compliance and the extra digit is not allowed anymore.

If you migrate your schema and data to 5.0, all the values in excess in DECIMAL fields will be truncated to the lowest possible value (i.e., in our example to ‘999.99′). If your customer is storing financial data then you might be in trouble when the time of calculation is coming.

Also a minor problem which comes to mind is the issue of trailing spaces in VARCHAR fields. Before 5.x the trailing spaces would be stored. MySQL 5.0 and later again reverts to SQL compliance and the field would be trimmed before being stored; that means that if your master is in the lower version, a LENGTH(field) will return a different value on the slave.

Syntax issues, why your SELECTs might fail

MySQL 5.0 introduces JOIN syntax compliant to the SQL standard, so depending on the level of respect from the developers to the standard, a number of queries might fail. If your application is planning multi-table joins i.e. SELECT … FROM (t1, t2) INNER JOIN t3 ON … the tables to be joined must be inside brackets or the query will fail with a syntax error. Also the order of referenced tables is relevant in 5.0 – the parser will analyze the query from left to right, so if  your query is messed up that’s another syntax error.

You may want to log a sample of queries on your older setup and replay them on your new setup to catch the errors. Peter suggested using mk-upgrade in the maatkit tool which is a very sensible choice, also on my end I decided to catch live errors using mysql-proxy and the capture filter function.

Performance and optimizer issues

From my own experience, 4.x versions were always faster than 5.x on a simple workload. MySQL 5.0 introduced some overhead due to a heavier parser with the implementation of stored procedures and functions. From one version to the other you will almost always see regressions especially when the new version is fresh out of the box and regressions haven’t been fixed. This was a notable case with 5.1 release but since integration of the InnoDB plugin has solved many regressions.

In my case, upgrade tests (query log replaying) were 1.5x times faster using mysql 4. But introducing mysql 5 fixed a lot of pending problems, notably on slow queries – many optimizations have been done around DISTINCT or GROUP BY operations, for instance. So usually in the long run it’s a winning upgrade.

Regarding the optimizer I noticed some different behaviors that were easily fixed. My customer was using indexes on VARCHAR fields (that’s usually a bad practice, but due to the historical nature of the schema, difficult to fix). MySQL 5 is extending the length of VARCHAR, so will be the key length for the optimizer. It will behave differently in 5.x because the optimizer will probably look for the “shortest path” i.e. it may chose another index based on the total key length. I solved this by altering the VARCHAR fields to CHAR – actually the data stored inside the field was already fixed size. This is also a reason to remind not to use text fields as join keys when possible.

When you should upgrade

To conclude about upgrades and when you should do them, though it’s usually smart not to upgrade when a version is just out in General Availability, you should monitor potential MySQL developments and experience returns. Also lifecycle of a MySQL version is 2 to 3 years so you should take care not to lag behind versions. MySQL 5.1 is GA for one year already and it’s now very safe, both bugwise and performance-wise to migrate now. Keep in mind that you can also try upgrades on development or staging platforms and check if your app has any regressions on it.

Testing InnoDB and MySQL 5.1 performance in real workload conditions

Friday, May 2nd, 2008

Since Heikki Tuuri and Peter Zaitsev’s presentations at the MySQL User Conference 2008, I really wanted to try MySQL 5.1 with InnoDB plugin in production work to see how it compares to my current setups (MySQL 5.0.48 with integrated InnoDB).

First of all, the upgrade to MySQL 5.1.23-rc went without any particular quirks – I did not have to dump and reload tables, a simple mysql_upgrade script run was sufficient to put my DB to MySQL 5.1-readiness.

When starting a new MySQL instance you know that you have to do some warm-up work before getting any indicators – the buffer pool will be empty of indexes and data so most operations in the first hour of uptime will be IO-bound. You can also use clever queries to speed warm-up time as indicated by Peter Zaitsev on his blog.

Workload Characteristics

It’s always good to know that benchmarks and system or software upgrades can give different results depending of whether you are I/O-bound or CPU-bound and of your hardware platform characteristics.

In our particular test case, the workload is mostly I/O bound (available InnoDB buffer size is about 1/8 of exploitable dataset size) and the hardware platform is composed of 2 x dual-Core Xeon 5148 with 16G of DDR, and 8×15k SAS drives in RAID-0 mode (BBU controller, write-back enabled).

Average thread concurrency is always around the 32 threads mark which is good for our benchmark case because we want to see how MySQL 5.1 and InnoDB perform in superior workload conditions. The queries’ workload is mostly read-oriented : we’re in a case of a traditional web 2.0 database with very few writes and lots of reads.

Production test

The first indicator that really interests me in this case is how the database performs with current workload – QPS (Query per Second) is generally a good indicator and is available in many tools like innotop.

  • MySQL 5.0.40 + built-in InnoDB: 2.0k QPS avg.
  • MySQL 5.1.23-rc + InnoDB plugin 1.0: 1.5k QPS avg.

As you can see there is a serious performance drop with MySQL 5.1 (about 25% less QPS) which corroborates Peter Zaitsev’s slides from the MySQL UC, noting some potential regressions in 5.1 which lead to a noticeable decrease of performance.

Another indicator which was disappointing was the number of SELECT queries per second, which we graph using RRDTool.

For MySQL 5.0 the number of SELECT per second is usually steady around the 1.2k mark.

With this new installation of MySQL 5.1 and InnoDB, this number was no higher than 900k SELECTs, and the database showed signs of thread congestion at some times : performance dropped to half this number during a few minutes before attaining peak again.

Table compression test

After this rather disappointing start, I wanted to try a new feature of InnoDB plugin : table compression. I went for a 8k compressed page size according to Venu Anuganti’s tests and compressed a 6GB table which is one of the most used tables in our application. The table compression rate was interesting at 73% (I used the method recommended by Ken Jacobs at the MySQL UC: gzip a n.ibd file and see if the resulting gzip file is more than 50% compressed)

In production, the results were promising in CPU usage since we switched from an I/O-bound workload to a totally CPU-bound workload.

  • Before (no compression) : 30% user CPU, 30% iowait
  • After (using 8k compression on the main table) : 87% user CPU, 3% iowait

Impressive numbers! But there was an outstanding problem with compression: some requests greatly underperformed.

Actually, we have in our application, some requests which I will call “external sorts” which are used to build various rankings. i.e. we do this kind of joins:

mysql> SELECT id FROM main INNER JOIN category USING (id) INNER JOIN ranking ORDER BY ranking.total LIMIT XX;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: category
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 750465
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: ranking
type: eq_ref
possible_keys: id
key: id
key_len: 4
ref: id
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: main
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: id
rows: 1
Extra: Using where; Using index

The optimizer will usually reverse table execution order because of the ORDER BY clause and will fetch all of the ids in the category table, then do a temporary and filesort to build the final resultset. This will usually work fine with no compression (a few secs) and will be OK on the application side provided that you store the resultset in memcached.

With compression, I guess that the performance problem is derived from the fact that MySQL will not reading adjacent pages of data in the main table, but that it will be forced to fetch and decompress many non-adjacent pages due to the fact that the main table is not sorted by index, but rather by filesort algorithm. I know that filesort + temporary is not very good query design on large pages of data but for us it was a tradeoff which worked without using compression – looks like I’ll have to make more benchmarking tests on filesorts + compressed data to assess if the decrease of performance is at this particular spot.

Entries (RSS)