MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query

Permanent Link: MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query 17. Juni 2009 Comment No Comment

When using pagination for let's say a list of offers that where never clicked, you have to know the exact amount of offers (which were never clicked) in order to know how many pages you have. Now there are 2 possible ways of calculating the exact amount of offers: You can use either SQL_CALC_FOUND_ROWS or you can setup a second query with a COUNT() in it. I did the tests with SQL_NO_CACHE in order to get the best results possible. The clicks table has about 18.000.000 rows, the offer table about 800.000. Let's start with some time results.

Using SQL_CALC_FOUND_ROWS:

mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (30.84 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

Using a second query with COUNT():

mysql> SELECT SQL_NO_CACHE o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (0.03 sec)

mysql> SELECT SQL_NO_CACHE COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (30.97 sec)

At a first glance they look equally fast, both taking about 30 seconds. But: They are only equally fast, when query caching is turned off. Let's assume we're on a high-traffic website where performance matters, so we turn the query cache on. MySQL Query caching is like a key-value cache with the key being the EXACT query and the resultset being the value. Once we turn on the cache, the pagination is way faster with the second query using COUNT().

Why?

When using SQL_CALC_FOUND_ROWS the application has to calculate the found rows every single time we turn the page, because the query changes, while the COUNT()-Query always remains the same, meaning that its result comes from the query cache from the second time on. Let's emulate:

Using SQL_CALC_FOUND_ROWS:

mysql> SELECT SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (31.13 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

mysql> SELECT SQL_CALC_FOUND_ROWS o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 100,50;
50 rows in set (30.71 sec)

mysql> SELECT FOUND_ROWS();
1 row in set (0.00 sec)

Using a second query with COUNT():

mysql> SELECT o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 50,50;
50 rows in set (0.03 sec)

mysql> SELECT COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (31.11 sec)

mysql> SELECT o.offer_id
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL
-> LIMIT 100,50;
50 rows in set (0.04 sec)

mysql> SELECT COUNT(o.offer_id)
-> FROM offer AS o
-> LEFT JOIN clicks AS c
-> ON (o.offer_id = c.offer_id)
-> WHERE c.offer_id IS NULL;
1 row in set (0.00 sec)

Et Voilà! While the SQL_CALC_FOUND_ROWS Queries took more than one minute together, the queries with the second COUNT() query only took a bit more than 30 seconds together, meaning they are twice as fast.

I'm pretty sure there are situations where SQL_CALC_FOUND_ROWS is the way to go, but in cases like this one you definately wanna go for the COUNT()-Query.

Don't use NOW() in MySQL SELECT Queries

Permanent Link: Don't use NOW() in MySQL SELECT Queries 18. März 2009 Comment No Comment

You maybe want to show all the products that have been added to your database in the last 7 days on your high-traffic website. You are using MySQL and your query probably looks like this:

SELECT *
FROM product
WHERE created >= NOW() - INTERVAL 7 DAY

Although being correct, this query is a bad query, because it is using NOW(). NOW() returns the current date and time to the split second. In other words: This query will never land in the cache query!

The better way would be to read the current day in PHP and pass it on to the query:

$date = date('Y-m-d');
$sql = '
SELECT *
FROM product
WHERE created >= "%s" - INTERVAL 7 DAY
';
$sql = sprintf($sql, $date);

Now we have a SQL query that perfectly lands in the cache query.

Useful Firefox addons for web developers

Permanent Link: Useful Firefox addons for web developers 9. Februar 2009 Comment No Comment

Here's a short list of useful Firefox addons for web developers:

Firebug

Probably the most useful addon around. You cannot only change the whole page on the fly (HTML and CSS) but you also have the very useful Javascript console which can even be used by your application for debugging. Furthermore you can see all AJAX activities. Download at https://addons.mozilla.org/de/firefox/addon/1843

FirePHP

FirePHP allows you to send debug messages to Firebug through a PHP Script. Download at https://addons.mozilla.org/de/firefox/addon/6149

Greasemonkey

Greasemonkey allows you to create JavaScript scripts for specific or all websites (Use of wildcard * possible!). This can be very useful to either test how new scripts would integrate into your website or to change the look and behaviour of any other webseite. Download at https://addons.mozilla.org/de/firefox/addon/748

Stylish

Same as Greasemonkey only for CSS. This way you can easily test new styles on a website before integrating them. Download at https://addons.mozilla.org/de/firefox/addon/2108

Web Developer

The classic one. Allows you to easily change behaviour of your browser (caching, JavaScript, etc.) and has a whole lot of other useful tools like showing document size, the styles, submitting the page to the W3C Validator, Browserframe resize (to test your application for specific resolutions), viewing response headers, outlining specific elements and so on. Download at https://addons.mozilla.org/de/firefox/addon/60

Search Status

Although it also shows the page rank (not very reliable), I only use this addon to display all the nofollow links on a page, which is really extremely useful. Download at https://addons.mozilla.org/de/firefox/addon/321

DNS Cache

My own Firefox extension that allows you to disable the DNS caching of Firefox, which comes in quite handy when you have to check your webservers quickly. For a more detailed description see here. Download at https://addons.mozilla.org/de/firefox/addon/5914

JavaScript loops profiled

Permanent Link: JavaScript loops profiled 20. Januar 2009 Comment No Comment

Today I was curious and wanted to know which way of looping in JavaScript fastest. So far, I always use for var i in array, since someone once told me, it is the fastest way. For testing I created an Array with 10000 elements:

ids = [];
for (var i = 1; i <= 10000; i++) {
ids.push(i);
}

I used the JavaScript Profiler of Firebug for profiling. The testsystem was: Intel Dual Core T2500 @ 2.00 GHz, 2 GB RAM, Ubuntu 8.04, Firefox 3.0.5 (only installed addon is Firebug). I profiled each loop variation 5 times and took the average time for comparison. The loop variations did nothing but loop and were the following:

Loop 1:

for(var i = 0; i < ids.length; i++) {}

Loop 2:

for (var i in ids) {}

Loop 3:

function process(element, index, array) {}
ids.forEach(process);

When I started the test, I didn't think there would be such huge differences in the performance of those three:

JavaScript loops profiled

Here are the profiling results in detail, in case you're interested (all times in ms):

Loop 1:
1,428
0,842
0,987
0,833
0,831

Loop 2:
6,084
4,471
6,040
6,256
6,509

Loop 3 (number in brackets is the profiled runtime of the function process() - see declaration above)
33,055 (16,439)
33,262 (16,489)
33,792 (17,044)
34,682 (17,312)
35,875 (17,637)

Using count() in for()-loops

Permanent Link: Using count() in for()-loops 16. Januar 2009 Comment No Comment

Since I just stumbled across it again, here's a common "mistake" often done, which can be avoided very easily. Situation is an array, that is looped over using for(). It might look something like this:

$elements = array(
'Element 1',
'Element 2',
'Element 3',
'Element 4',
'Element 5',
);

for ($i = 1; $i <= count($elements); $i++) {
print('Processing Element #' .$i . PHP_EOL);
}

This is bad coding style because the count() is in the loop header. In this case everytime the loop moves on to the next element, the function count() is called (here 5 times), although the size of the array never changes. The right (and more performant) way to do this, would be:

$elementsCount = count($elements);
for ($i = 1; $i <= $elementsCount; $i++) {
print('Processing Element #' .$i . PHP_EOL);
}

Now count() is only called once and the code does exactly the same.