MySQL vs PHP: Sorting Data Using MySQL Variables

Background: One site offers its members to vote to elect the member of the month. Each month the counters are reset to zero.
On a dedicated page, the site displays the history of the top 5 members of each month over a period of one year.

We could have no problem extracting information from the database and finding the 5 best scores of each month with a PHP script. But what will happen to the generation time of the page when the volume of votes reaches a few million lines over a year.

As a reminder, databases are not only used to store data, but also to sort it. And on this last point they are often the best placed from a performance point of view.

Writing the SQL query that will allow us to calculate the score of each member each month is relatively simple:

SELECT id_membre, MONTH( date_vote ) AS month, AVG( note ) AS moyenne
FROM vote
WHERE date_vote BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY month, id_membre
ORDER BY month ASC , moyenne DESC;
 


The problem is to limit the result to the 5 best 'average' of each month.

Let's disregard the language and analyze the procedure to follow.

 I have to calculate the average of each profile for each month
 For each month I have to sort the profiles by their average (descending order on the average)
 I only keep the first 5 profiles per month.

Point 1 is ensured by the SQL query above.

Point 2 is the most delicate, I have to order my profiles. For this I am going to write a query that will use the return of the query from point 1 as a source of information. The aim of the game is to use an IF condition in my SQL query that will create for each profile its position. 1 for the best, 2 for the second, 3...
To do this I will need to increment a counter that will be reset to zero at each change of month. So I'm going to need 2 user variables, one for the meter and the other for the current month.

SET @month :=0, @rownum :=0;
SELECT IF( month = @month , @rownum := @rownum +1, @rownum :=1 ) AS rank, @month := temp.month, temp . *
FROM (
    SELECT id_membre, MONTH( date_vote ) AS month, AVG( note ) AS moyenne
    FROM vote
    WHERE date_vote BETWEEN '2011-01-01' AND '2011-12-31'
    GROUP BY month, id_membre
    ORDER BY month ASC , moyenne DESC
) temp;


Some explanations: The first line allows you to define 2 user variables @month and @rownum. Then I use an IF to create the rank column. If you are not familiar with IF in MySQL this one behaves like a ternary write. If the condition is true, it executes the first instruction, otherwise it executes the second. The ',' acts as a separator.

We get the same result as with the first query but embellished with a 'rank' field which is the order of the profiles.
For point 3 it is enough to redo an SQL query exploiting this result and limit the return to the 'rank' lower than or equal to 5. Which gives us:

 
SET @month :=0, @rownum :=0;
SELECT rank, id_membre, month, average
FROM (
 
    SELECT IF( month= @month, @rownum := @rownum +1, @rownum :=1 ) AS rank, @month := temp.month, temp . *
    FROM (
 
       SELECT id_membre, MONTH( date_vote ) AS month, AVG( note ) AS moyenne
       FROM vote
       WHERE date_vote BETWEEN '2011-01-01'AND '2011-12-31'
       GROUP BY month, id_membre
       ORDER BY month ASC , moyenne DESC
 
   ) temp
 
) temp2
WHERE temp2.rank <=5;


And here is the result. It's still prettier than a big PHP algo right?
Performance point of view on my test server (Asus eeebox), with a table of 2 million rows I am under the second, rather honorable.

Be careful, do not forget to set the variables otherwise the result will be unexpected.
You also need the variables to be set in the same transaction as the query or everything goes into the same mysql_query() if you're not using transactions.

Add a comment