Merging two identical databases with different data content altogether


Scenario:

You have two identical databases (structure wise) and need to merge (add) the data from one to the other.

This article shows a general principle on how to add all data, from one database to another (notice: even with colliding primary keys you can merge the databases together!).

It does not address how to not add data that already exists in the other database (as this is simple enough and would confuse the examples), but you will be able to find help for this in other sources or through our assistance.

To show the generic principles we need an example database:

Merge Add Data Database Schema Diagram

The "Visited" table is a relation between 2 things a "Person" and a "Location". To add more dependency to our example a "Person" is registered on a "Location" as the one person adding the Location (imagine this occurs in the app where this database is used).

To simplify the example, I will use "identity" on primary key id fields. (This also enforces the impression that it is very difficult to merge the data in the databases! And it is, but we can simply by following a standard procedure!)

The database you can create in SQL Server via the script below:


CREATE DATABASE [DemoSource]
GO
USE [DemoSource]
GO
CREATE TABLE [dbo].[Location](
	[LocationId] [int] IDENTITY(1,1) NOT NULL,
	[LocationName] [nvarchar](250) NOT NULL,
	[AddedByPersonId] [int] NOT NULL,
 CONSTRAINT [PK_Location] PRIMARY KEY CLUSTERED 
(
	[LocationId] ASC
)
)

GO

CREATE TABLE [dbo].[Person](
	[PersonId] [int] IDENTITY(1,1) NOT NULL,
	[PersonName] [nvarchar](250) NOT NULL,
 CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED 
(
	[PersonId] ASC
)
)
GO

CREATE TABLE [dbo].[Visited](
	[VisitedId] [int] IDENTITY(1,1) NOT NULL,
	[PersonId] [int] NOT NULL,
	[LocationId] [int] NOT NULL,
 CONSTRAINT [PK_Visited] PRIMARY KEY CLUSTERED 
(
	[VisitedId] ASC
)
)

GO



ALTER TABLE [dbo].[Location]  WITH CHECK ADD  CONSTRAINT [FK_Location_Person] 
    FOREIGN KEY([AddedByPersonId])
    REFERENCES [dbo].[Person] ([PersonId])
GO
ALTER TABLE [dbo].[Visited]  WITH CHECK ADD  CONSTRAINT [FK_Visited_Location] 
    FOREIGN KEY([LocationId])
    REFERENCES [dbo].[Location] ([LocationId])
GO
ALTER TABLE [dbo].[Visited]  WITH CHECK ADD  CONSTRAINT [FK_Visited_Person] 
    FOREIGN KEY([PersonId])
    REFERENCES [dbo].[Person] ([PersonId])
GO

Also create a target database by replacing the text "DemoSource" with "DemoTarget" and run the script again. Now you have source and target databases.


And next, to have some data to merge from the source database to the target database we add this to the source database:


USE [DemoSource]
GO

SET IDENTITY_INSERT dbo.Person ON;  
INSERT INTO dbo.Person (PersonId, PersonName) VALUES (1 /* PersonId */, 'Person 1' /* PersonName */);
INSERT INTO dbo.Person (PersonId, PersonName) VALUES (2 /* PersonId */, 'Person 2' /* PersonName */);
INSERT INTO dbo.Person (PersonId, PersonName) VALUES (3 /* PersonId */, 'Person 3' /* PersonName */);
SET IDENTITY_INSERT dbo.Person OFF;  

SET IDENTITY_INSERT dbo.Location ON;  
INSERT INTO dbo.Location (LocationId, LocationName, AddedByPersonId) VALUES (1 /* LocationId */, 'Location 1' /* LocationName */, 1 /* AddedByPersonId */);
INSERT INTO dbo.Location (LocationId, LocationName, AddedByPersonId) VALUES (2 /* LocationId */, 'Location 2' /* LocationName */, 1 /* AddedByPersonId */);
SET IDENTITY_INSERT dbo.Location OFF;  

SET IDENTITY_INSERT dbo.Visited ON;  
INSERT INTO dbo.Visited (VisitedId, PersonId, LocationId) VALUES (1 /* VisitedId */, 1 /* PersonId */, 1 /* LocationId */);
INSERT INTO dbo.Visited (VisitedId, PersonId, LocationId) VALUES (2 /* VisitedId */, 1 /* PersonId */, 1 /* LocationId */);
INSERT INTO dbo.Visited (VisitedId, PersonId, LocationId) VALUES (3 /* VisitedId */, 2 /* PersonId */, 1 /* LocationId */);
INSERT INTO dbo.Visited (VisitedId, PersonId, LocationId) VALUES (4 /* VisitedId */, 2 /* PersonId */, 2 /* LocationId */);
INSERT INTO dbo.Visited (VisitedId, PersonId, LocationId) VALUES (5 /* VisitedId */, 3 /* PersonId */, 2 /* LocationId */);
SET IDENTITY_INSERT dbo.Visited OFF;  
GO



And now, let's get started:

1.
Click "Connect Database"
Connect to a database

2.
Fill in values for the 2 databases you just created. To be sure all is ok you can hit the Test buttons.
Connect to a database

3.
When prompted to scan the database, allow it and see 3 items are created, one for each table initially. Close the connection dialog. If you say no to scan, you can also do it (scan for tables) after closing the dialog as well, via "Refresh Project".

4.
To see some data now click the [Open Source] and [Open target] buttons. Yayh. We have data to merge.

First look at data

Next on we wish that the ordering of Merge Items (tables) will reflect the ordering in which we need to insert data (to take dependencies in data into account).

If you have the Enterprise version you can click the [Arrange order by reference depth] and the ordering of the merge items will arrange to the dependency of the tables in the database automatically. If you do not have the Enterprise version you can do this manually via the “Move” in the top menu. For our project the ordering is: Person, Location, Visited.

Hereafter the ordering in the project will reflect how connected the tables are. Those that are most needed in respect to other tables pointing to them via foreign keys, well they will come first.



If we try to merge now and some of the keys in the source database also is used (for other items) in the target we will corrupt our data. So the following is necessary, this is the trick:

Decorate database

To help us we need to add a little some-extra-stuff to the database during the merge. These things can be removed again afterwards.

1.
In each table we need an alternative unique key. Maybe there is one already in your data, but what we like to have now is a uniqueidentifier (Guid) key. If you are not familiar with Guid fields, a quick explanation here: it is a random number so big that it is almost indefinitely impossible to ever make a duplicate (they ARE unique in the universe, but they aren’t, but they are!). So if we add a value to our database it is certain that this value will not exist in the other database. We need this 'facility'. Actually there can be reason in having primary keys in new database be Guids, but there is a (smallish) storage and speed penalty AND it is cumbersome to notate when testing/debugging/remembering id's, so in my opinion it is not a good idea. But it is a good supplement sometimes to have an additional id field in the form of a Guid (e.g. when logged SQL script must be usable across instances of databases. That was a nice tip by the way!).

Script to add Guid fields to our 3 tables:

ALTER TABLE dbo.Person ADD
	PersonGuid uniqueidentifier NOT NULL 
        CONSTRAINT DF_Person_LocationGuid DEFAULT NEWSEQUENTIALID()
GO
ALTER TABLE dbo.Location ADD
	LocationGuid uniqueidentifier NOT NULL 
        CONSTRAINT DF_Location_LocationGuid DEFAULT NEWSEQUENTIALID()
GO
ALTER TABLE dbo.Visited ADD
	VisitedGuid uniqueidentifier NOT NULL 
        CONSTRAINT DF_Visited_LocationGuid DEFAULT NEWSEQUENTIALID()
GO

Notice we automatically assign a new id if it is not explicitly provided in inserts... (via the “DEFAULT NEWSEQUENTIALID()” part.)

2.
Now, for the sake of merging, our source data should no longer include the integer id of foreign key fields, but instead the Guid version.
E.g. "select LocationId, LocationName, AddedByPersonGuid from Location"

For this purpose let’s add a simple function for each table used in a foreign key in our database. That’s “Person” and “Location”.

First let’s create a separate schema, to keeps things a bit separated.

CREATE SCHEMA sync;
GO

That just helps to see stuff we added for this and more easily remove it again, that's all.

create function [sync].[PersonIdToGuid](@PersonId int) returns uniqueidentifier
as
/*
Test:
	select sync.PersonIdToGuid(1)
*/
begin
	declare @Result uniqueidentifier;
	set @Result = (select T.PersonGuid from Person T where T.PersonId = @PersonId);
	return @Result;
end;
GO

create function [sync].[LocationIdToGuid](@LocationId int) returns uniqueidentifier
as
/*
Test:
	select sync.LocationIdToGuid(1)
*/
begin
	declare @Result uniqueidentifier;
	set @Result = (select T.LocationGuid from Location T where T.LocationId = @LocationId);
	return @Result;
end;
GO

create function [sync].[VisitedIdToGuid](@VisitedId int) returns uniqueidentifier
as
/*
Test:
	select sync.VisitedIdToGuid(1)
*/
begin
	declare @Result uniqueidentifier;
	set @Result = (select T.VisitedGuid from Visited T where T.VisitedId = @VisitedId);
	return @Result;
end;
GO

They just lookup the Guid based id...

Modified source selects

Now having these functions we can replace e.g. the PersonId in the "select from Location":

So go from (for "Source"):

select * from dbo.Location    

To:

select 
    LocationGuid
    , AddedByPersonGuid = sync.PersonIdToGuid(AddedByPersonId)
    , LocationName 
from dbo.Location

Example replaced source query

And for Visited:

select 
    VisitedGuid
    , PersonGuid = sync.PersonIdToGuid(PersonId)
    , LocationGuid = sync.LocationIdToGuid(LocationId) 
from dbo.Visited

For "Person" we just need this (to avoid getting the Primary key field selected):

select PersonName, PersonGuid from dbo.Person

3.
Finally we need to be able to control how the data is inserted and we do this by using "instead of" triggers.
For this we first need helper views (in the target database), both to represent the dataschema, and to attach the trigger to:

CREATE VIEW Location_View
AS
    SELECT 
        AddedByPersonGuid = cast(null as uniqueidentifier)
        , T.*
    FROM Location T
;
GO

CREATE VIEW Visited_View
AS
    SELECT 
        PersonGuid = cast(null as uniqueidentifier)
        , LocationGuid = cast(null as uniqueidentifier)
        , T.*
    FROM Visited T
;
GO

In a real-world scenario I would probably replace the T.* part with listing the actual fields I need.
(never use * in a view, it sucks ;-)


You need only 2, not one for Person as there are no foreign keys from that table.

Now use those view in the “target” sql for the 2 tables.

But wait, first we need functions to convert back (now based on the actual value in the target)

create function [sync].[PersonGuidToId](@PersonGuid uniqueidentifier) returns int
as
/*
Test:
	select sync.PersonGuidToId('')
*/
begin
	declare @Result int;
	set @Result = (select T.PersonId from Person T where T.PersonGuid = @PersonGuid);
	return @Result;
end;
GO

create function [sync].[LocationGuidToId](@LocationGuid uniqueidentifier) returns int
as
/*
Test:
	select sync.LocationGuidToId('')
*/
begin
	declare @Result int;
	set @Result = (select T.LocationId from Location T where T.LocationGuid = @LocationGuid);
	return @Result;
end;
GO

Having these function we can improve the views a bit, to be able to recompare after merging and get the correct result when done:

ALTER VIEW Location_View
AS
	SELECT
		LocationGuid 
		, AddedByPersonGuid = sync.PersonIdToGuid(T.AddedByPersonId)
		, LocationName
	FROM Location T
;
go

ALTER VIEW Visited_View
AS
	SELECT
		VisitedGuid 
		, PersonGuid = sync.PersonIdToGuid(T.PersonId)
		, LocationGuid = sync.LocationIdToGuid(T.LocationId)
	FROM Visited T
;
go

And NOW the "instead of" triggers, wrapping it all up:

CREATE TRIGGER Location_View_Insert 
    on Location_View
    INSTEAD OF INSERT
AS
BEGIN
	INSERT INTO Location (LocationGuid, LocationName, AddedByPersonId)
	SELECT 	
		I.LocationGuid
		, I.LocationName
		, AddedByPersonId = sync.PersonGuidToId(I.AddedByPersonGuid)
	FROM inserted I
END;
GO

CREATE TRIGGER Visited_View_Insert 
    on Visited_View
    INSTEAD OF INSERT
AS
BEGIN
	INSERT INTO Visited (VisitedGuid, PersonId, LocationId)
	SELECT 
		I.VisitedGuid
		, PersonId = sync.PersonGuidToId(I.PersonGuid)
		, LocationId = sync.LocationGuidToId(I.LocationGuid)
	FROM inserted I
END;
GO

Yes! there it was, please take in the revelation... have a close look.

Adjust project

Let's adjust our project in SQLMerger and get rolling.

1a. For the merge item “Location” replace the setting “Target Table Name” with “Location_View”.

1b. And change the “Target SQL Query” to use the view for selecting. Target settings finally


2a. For the merge item “Visited” replace the setting “Target Table Name” with “Visited _View” in the same manner as above.

2b. And change the “Target SQL Query” to use the view for selecting.


3. To be able to locate after things are merged you might as well for all the MergeItems replace the “Identify by Field Name” to use the new Guid fields! (do this for all 3 tables)


Now we are ready.. The preparation is almost the whole thing, like when you need to paint something, you know the preparation is the main thing. (Maybe you want to save your project before proceeding.)



Hit the [Create script automatically] button.
Target settings finally

That will bring up the "Create script automatically" window where the operation can be adjusted. Just hit the [Ok] button.
Create script window previewing the operation

And …drumroll… you get the script.
Screenshot with database change script

That's what's adds all the records to the target database. Even if there are records there already and those records uses some of the same Id as was used in the source database for other items!!

Now finally execute the script to have it all inserted in the target.
Execute sql script button

If you open the target afterwards you will see the data is equal to the source (as long as you don't include the original primary keys field).
Two tables equal after merge


Hope you enjoyed the walk through and is able to scale to your actual scenario.



By the way, here is the resulting SQL the program created, that actually added the content from the one database to the other :-) (an almost impossible task, right!):

/* Statements for Merge Item "Person". (Generated by SQLMerger.) */
INSERT INTO dbo.Person (PersonName, PersonGuid)
    VALUES ('Person 1' /* PersonName */, '{7627E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */)  
        /* Primary key database generated! */;
INSERT INTO dbo.Person (PersonName, PersonGuid) 
    VALUES ('Person 2' /* PersonName */, '{7727E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */)  
        /* Primary key database generated! */;
INSERT INTO dbo.Person (PersonName, PersonGuid) 
    VALUES ('Person 3' /* PersonName */, '{7827E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */)  
        /* Primary key database generated! */;

/* Statements for Merge Item "Location". (Generated by SQLMerger.) */
INSERT INTO dbo.Location_View (LocationGuid, AddedByPersonGuid, LocationName) 
    VALUES ('{7927E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */
        , '{7627E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* AddedByPersonGuid */
        , 'Location 1' /* LocationName */);
INSERT INTO dbo.Location_View (LocationGuid, AddedByPersonGuid, LocationName) 
    VALUES ('{7A27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */
        , '{7627E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* AddedByPersonGuid */
        , 'Location 2' /* LocationName */);

/* Statements for Merge Item "Visited". (Generated by SQLMerger.) */
INSERT INTO dbo.Visited_View (VisitedGuid, PersonGuid, LocationGuid) 
    VALUES ('{7B27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* VisitedGuid */
        , '{7627E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */
        , '{7927E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */);
INSERT INTO dbo.Visited_View (VisitedGuid, PersonGuid, LocationGuid) 
    VALUES ('{7C27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* VisitedGuid */
        , '{7627E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */
        , '{7927E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */);
INSERT INTO dbo.Visited_View (VisitedGuid, PersonGuid, LocationGuid) 
    VALUES ('{7D27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* VisitedGuid */
        , '{7727E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */
        , '{7927E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */);
INSERT INTO dbo.Visited_View (VisitedGuid, PersonGuid, LocationGuid) 
    VALUES ('{7E27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* VisitedGuid */
        , '{7727E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */
        , '{7A27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */);
INSERT INTO dbo.Visited_View (VisitedGuid, PersonGuid, LocationGuid) 
    VALUES ('{7F27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* VisitedGuid */
        , '{7827E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* PersonGuid */
        , '{7A27E31C-1CB7-E811-BB0C-7CB0C2E5A311}' /* LocationGuid */);

GO


This tutorial was using the database merging tool SQLMerger.
Read more about SQLMerger here, download free version etc..
If you have suggestions to this article please let us know.



You can hire us:

We offer assistance in several ways with your task, either by remote consultancy (which you pay up front by ordering Guru addons, please ask for details)

You can also get consultancy if you live in Europe and especially Denmark, hire us for a short while to get our expert assistance on your data merging/washing etc. project.

If not in Denmark normally minimum contract/stay is 1 week.