Incremental Database Synchronization

In the world of heterogeneous information systems many applications and business processes require to synchronize databases. The most challenge of sync procedure is to update modified data in near to real time way since both source and destination databases may be in use constantly.

Classical approach to database synchronization is based on full scanning the original and target databases to find the data being changed. Then necessary actions are applied over this data:

Insert all rows from data source that are missing in the destination
Delete from destination database all rows missing in the source
Update existing rows

The problem of such method is long overall duration of synchronization process. For example, there are two databases - SQL Server and MySQL, both having over one million of rows. Average performance of classical synchronization is about 100 rows per second. It cannot be improved by using high speed database access components or techniques since this method does not allow bulk manipulations over the data. Consequently, each pass of classical synchronization takes more than 2,5 hours for considered databases that makes impossible to run it on hourly basis.

Another approach to the problem is to iterate only those rows that has been modified since the previous run of synchronization. It can be implemented according to these algorithm:

At the first run the program does classical database synchronization as it is described above
Also, the program creates triggers on insert, delete and update for every table being synchronized. Each of these triggers writes information about modified row into special service table
Starting from the second run, the program extracts information about modified rows from the service table and updates only those rows

This technique is known as incremental or trigger-based database synchronization. Let us consider how incremental method may improve timeframe of database synchronization on the example above. Even with record-by-record processing, incremental synchronization can be run in near to real time mode if it is scheduled properly. But it can also implement bulk rows processing technique since all modified data is collected in the single place grouped by type of operation: insert, delete or update. Bulk rows processing gives more than 10 times higher performance of the synchronization process.

Incremental or trigger-based database synchronization applies the following requirements:

Access to synchronized databases with sufficient privileges for creation triggers and tables
Each of synchronized tables must have primary key defined

In order to evaluate benefits of incremental synchronization MS SQL - MySQL Sync tool can be used.

Have questions? Contact us