Apply CI/CD principles to SQL migrations with Jenkins and some bash

Writing unit or functional tests for code has almost become standard practice. Having a server that automatically runs these tests and deploys the code is the goal of many development teams. Despite this, these good practices are rarely applied to database schema migrations.

I wanted to apply these best practices to SQL changes because I found the manual process very time consuming. The idea is to stop requesting MySQL migrations through Slack, mail, or Jira issues, and to replace that with a Git repository, pull requests, and a few bash scripts in Jenkins that test and deploy each change.

Git repository

The structure of the repository is simple: the top-level directory is the database name, and under it there is one directory per environment. The idea is to write each migration in a SQL file and store it in the directory matching the database and the environment.

|-db1
|   |-preprod
|   |   |-20181001_migrations1.sql
|   |   |-20181015_migrations2.sql
|   |-prod
|
|-db2
|   |-preprod
|   |-prod
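
A migration file contains plain SQL. As a purely hypothetical example, 20181001_migrations1.sql could look like this:

ALTER TABLE user
    ADD COLUMN active tinyint(1) NOT NULL DEFAULT 0;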
 

A migration file should exist in only one environment at a time. For example, once a migration file has been deployed on the preprod environment, it should be moved to the prod directory to be deployed there. Every addition or update of a migration file is done on a dedicated Git branch and merged through a pull request, as sketched below.
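
As a minimal sketch (branch and commit names are hypothetical), promoting a tested migration from preprod to prod looks like this:

#move the migration to the prod directory on a dedicated branch
git checkout -b promote-migrations1
git mv db1/preprod/20181001_migrations1.sql db1/prod/
git commit -m "Promote 20181001_migrations1.sql to prod"
git push origin promote-migrations1
#then open a pull request so the move is reviewed and merged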

Test SQL query

For a long time I searched for a SQL linter to detect errors early; unfortunately, there doesn't seem to be one. So the only way I found to test a SQL query is to run it. Of course, I want to run the query on a test environment and not directly on the target environment.

First constraint: time. If a query needs 3 hours to run because the table contains millions of rows, no one will use this process. Since I have never seen an ALTER TABLE fail because of the data, my test environment only needs the table structure, which makes every test fast.
My second constraint is to keep the test environment as close as possible to the target environment, so I decided to dump the target's structure (schema only, no data) for each test.

Jenkins runs the following script automatically as soon as a change is pushed to the repository.

#!/usr/bin/env bash
 
set -euo pipefail
 
DEV_HOST="preprod"
DEV_USER="root"
DEV_PASSWD="ulrich"
PROD_HOST="moncode.net"
PROD_USER="migration"
PROD_PASSWD="something"
PROD_DEST="/home/migration"
 
#list files changed by the last commit; with --no-renames a moved file
#shows up as a delete plus an add, and the -f test below skips the deleted path
GIT_DIFF=$(git diff @^1 @ --name-only --no-renames)
for FILE in $GIT_DIFF; do
    if [ -f "$FILE" ]
    then
        #path convention: <database>/<environment>/<migration file>
        DB=$(echo "$FILE" | awk -F "/" '{print $1}')
        ENV=$(echo "$FILE" | awk -F "/" '{print $2}')
        SANDBOX="jenkins_sandbox_${DB}"

        echo "Test for $ENV environment"

        if [ "$ENV" == "preprod" ]
        then
            #first delete sandbox
            echo "Reset sandbox"
            mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" -e "DROP DATABASE IF EXISTS $SANDBOX;"

            #then create new sandbox with actual schema (-d dumps the structure, no data)
            mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" -e "CREATE DATABASE $SANDBOX;"
            mysqldump -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" -d "$DB" > "$SANDBOX.sql"
            mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" "$SANDBOX" < "$SANDBOX.sql"

            #then try migration
            echo "Apply migration $FILE"
            mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" "$SANDBOX" < "$FILE"
        elif [ "$ENV" == "prod" ]
        then
            #first delete sandbox
            echo "Reset sandbox"
            ssh user@production "mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD -e 'DROP DATABASE IF EXISTS $SANDBOX;'"

            #then create new sandbox with actual schema
            ssh user@production "mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD -e 'CREATE DATABASE $SANDBOX;'"
            ssh user@production "mysqldump -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD -d $DB > $PROD_DEST/$SANDBOX.sql"
            ssh user@production "mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD $SANDBOX < $PROD_DEST/$SANDBOX.sql"
            ssh user@production "rm $PROD_DEST/$SANDBOX.sql"

            #then copy the migration and try it in the sandbox
            FILEONLY=$(echo "$FILE" | awk -F "/" '{print $3}')
            scp "$FILE" user@production:"$PROD_DEST"
            QUERY="mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD $SANDBOX < $PROD_DEST/$FILEONLY"
            ssh user@production "$QUERY"
            ssh user@production "rm $PROD_DEST/$FILEONLY"
        fi
    fi
done
At the top of the script are the credentials used to connect to the different environments; of course, you must adapt this section to your own security rules.
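
One possibility, as a sketch (the file name and location are hypothetical), is to move the credentials into a MySQL option file that only the Jenkins user can read:

#~/.my.preprod.cnf contains:
#[client]
#host=preprod
#user=root
#password=ulrich
chmod 600 ~/.my.preprod.cnf
#--defaults-extra-file must be the first option on the command line
mysql --defaults-extra-file="$HOME/.my.preprod.cnf" -e "SHOW DATABASES;"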

I use the git diff command to get the list of migration files to test. By splitting each file path on /, I get the name of the database and its environment. For this example, I consider that the preprod MySQL server is reachable from the Jenkins server. On the other hand, this is not the case for production, so I run those commands through SSH.

If a command fails, set -euo pipefail makes the script stop and return the exit code of the failed command, which puts the Jenkins build in error.

Percona Toolkit support

If your database is big, it's worth using pt-online-schema-change to run schema migrations. For those I use files with a .pt extension: they are bash scripts in which I declare 2 variables, one for the table name and the other for the schema change.

#!/usr/bin/env bash
 
PT_TABLE="user"
PT_ALTER="ADD COLUMN active tinyint(1) NOT NULL DEFAULT 0, ADD COLUMN avatar varchar(126) DEFAULT NULL"

I source these files at runtime to use the variables in the pt-online-schema-change command, which already contains a lot of options.

#!/usr/bin/env bash
 
set -euo pipefail
 
[...]
        
        if [ "$ENV" == "preprod" ]
        then
            [...]
 
            #then try migration
if [[ "$FILE" == *.sql ]]
            then
                echo "Apply migration $FILE"
                mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" "$SANDBOX" < "$FILE"
            elif [[ "$FILE" == *.pt ]]
            then
                #load PT_TABLE and PT_ALTER, then rehearse the change with --dry-run
                source "$FILE"
                pt-online-schema-change --alter "$PT_ALTER" D="$SANDBOX",t="$PT_TABLE" --host="$DEV_HOST" --user="$DEV_USER" --password="$DEV_PASSWD" --dry-run
            fi
        elif [ "$ENV" == "prod" ]
        then
            [...]
 
if [[ "$FILE" == *.sql ]]
            then
                FILEONLY=$(echo "$FILE" | awk -F "/" '{print $3}')
                scp "$FILE" user@production:"$PROD_DEST"
                QUERY="mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD $SANDBOX < $PROD_DEST/$FILEONLY"
                ssh user@production "$QUERY"
                ssh user@production "rm $PROD_DEST/$FILEONLY"
            elif [[ "$FILE" == *.pt ]]
            then
                source "$FILE"
                QUERY="pt-online-schema-change --alter \"$PT_ALTER\" D=$SANDBOX,t=$PT_TABLE --host=$PROD_HOST --user=$PROD_USER --password=$PROD_PASSWD --dry-run"
                ssh user@production "$QUERY"
            fi
        fi
[...]

Deploy SQL change

Once the pull request is reviewed and merged on master (or stable, depending on your naming convention), Jenkins runs another script that deploys the migration. It is almost the same as the previous one, without the test database and with --execute instead of --dry-run in the pt-online-schema-change command: --dry-run only rehearses the change, while --execute really alters the table.

#!/usr/bin/env bash
 
set -euo pipefail
 
DEV_HOST="preprod"
DEV_USER="root"
DEV_PASSWD="ulrich"
PROD_HOST="moncode.net"
PROD_USER="migration"
PROD_PASSWD="something"
PROD_DEST="/home/migration"
 
#list files changed by the merge commit; --no-renames makes a moved file
#appear as a delete plus an add, and the -f test below skips the deleted path
GIT_DIFF=$(git diff @^1 @ --name-only --no-renames)
for FILE in $GIT_DIFF; do
    if [ -f "$FILE" ]
    then
        #path convention: <database>/<environment>/<migration file>
        DB=$(echo "$FILE" | awk -F "/" '{print $1}')
        ENV=$(echo "$FILE" | awk -F "/" '{print $2}')

        echo "Run for $ENV environment"

        if [ "$ENV" == "preprod" ]
        then
            if [[ "$FILE" == *.sql ]]
            then
                echo "Apply migration $FILE"
                mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" "$DB" < "$FILE"
            elif [[ "$FILE" == *.pt ]]
            then
                #load PT_TABLE and PT_ALTER, then really run the change
                source "$FILE"
                pt-online-schema-change --alter "$PT_ALTER" D="$DB",t="$PT_TABLE" --host="$DEV_HOST" --user="$DEV_USER" --password="$DEV_PASSWD" --execute
            fi
        elif [ "$ENV" == "prod" ]
        then
            if [[ "$FILE" == *.sql ]]
            then
                FILEONLY=$(echo "$FILE" | awk -F "/" '{print $3}')
                scp "$FILE" user@production:"$PROD_DEST"
                QUERY="mysql -h $PROD_HOST -u$PROD_USER -p$PROD_PASSWD $DB < $PROD_DEST/$FILEONLY"
                ssh user@production "$QUERY"
                ssh user@production "rm $PROD_DEST/$FILEONLY"
            elif [[ "$FILE" == *.pt ]]
            then
                source "$FILE"
                QUERY="pt-online-schema-change --alter \"$PT_ALTER\" D=$DB,t=$PT_TABLE --host=$PROD_HOST --user=$PROD_USER --password=$PROD_PASSWD --execute"
                ssh user@production "$QUERY"
            fi
        fi
    fi
done

I have been using this procedure for a few months now on different projects, and I have found it has a lot of advantages. The round trip with the requester in case of error is simpler: they get feedback very quickly thanks to the test build in Jenkins and can fix their migration, and discussion through the pull request is easy. Moreover, it is easy to know in which environment a migration currently is.
Another interesting point: if your processes include regularly synchronizing databases from production, the Git repository can be used to automatically replay the migrations for each environment.
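
A minimal sketch of such a replay, relying on the date prefix of the file names so that shell glob order is chronological:

#hypothetical sketch: replay every preprod migration for db1 after a resync from production
for FILE in db1/preprod/*.sql; do
    echo "Replay $FILE"
    mysql -h "$DEV_HOST" -u"$DEV_USER" -p"$DEV_PASSWD" db1 < "$FILE"
done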

Be careful: to keep the scripts as understandable as possible, I handled only 2 environments and, above all, I reduced security to its simplest form. This is not necessarily ideal; depending on the Jenkins configuration, the permissions on the scripts, and many other parameters, the passwords can be exposed. Pay attention to this subject according to your needs, and I advise you to use a dedicated user in MySQL or MariaDB.
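
As a hypothetical sketch, such a dedicated user could be limited to the sandbox and target schemas; adjust the grants to what your migrations actually need (pt-online-schema-change also requires extra privileges such as TRIGGER):

#create a dedicated migration user instead of using root (names are examples)
mysql -h "$PROD_HOST" -uroot -p <<'SQL'
CREATE USER 'migration'@'%' IDENTIFIED BY 'something';
-- full rights on the throwaway sandbox database
GRANT ALL PRIVILEGES ON `jenkins_sandbox_db1`.* TO 'migration'@'%';
-- schema-change rights only on the real database
GRANT SELECT, CREATE, ALTER, INDEX, DROP ON `db1`.* TO 'migration'@'%';
SQL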
