EXPLORING UNCERTAINTY !!!: December 2021

Purpose: To make sure data consistency in upgraded environment.

Thoughts : Dump data base before and after upgrade and check data diff for these 2 data bases.

Available Tools:

dbForge Data Compare for SQL Server
DBDiff
Pgdatadiff tool

Environment Setup:

1. dbForge Data Compare for SQL Server

Steps 1: Dump sql file before / after upgrade

docker exec -t vaidio pg_dump --format=c -U ainvr> dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql

docker exec -t your-db-container pg_dump --format=c -U ainvr > dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql

Steps 2: Create Empty Database

createdb -U postgres -W ainvr;
createdb -U postgres -W ainvr2;

Steps3 : Restore data to created empty database

psql -h localhost -p 5433 -U postgres -f D:\dump_18-10-2021_13_23_57_before.sql ainvr
psql -h localhost -p 5433 -U postgres -f D:\dump_18-10-2021_14_33_09_after.sql ainvr2

2. DBDiff Tool

DBDiff is an automated database schema and data diff tool.

○ It compares two databases, local or remote, and produces a migration file of the differences automatically.

○ Command-Line API

Source Code: https://dbdiff.github.io/DBDiff/

Comparison result

Executed Command:

dbdiff postgres://postgres:@localhost:5433/ainvr postgres://postgres:@localhost:5433/ainvr2

Result:

-- DROP TABLE "public"."scene_object_2021_07_26";

-- DROP TABLE "public"."scene_object_2021_07_27";

-- DROP TABLE "public"."scene_object";

-- DROP TABLE "public"."scene_object_2021_07_28";

3. Pgdatadiff Tool

Firstly, compares the row count in both tables.
If the row count is the same, it instructs postgres to create MD5 sums from data.

The MD5 sums are based on the data being cast to varchar.

If you have data types that don't cast to varchar properly then the behavior probably not reliable.

Source Code: https://github.com/dmarkey/pgdatadiff

Comparison result

Executed command:

pgdatadiff --firstdb=postgres://postgres:@localhost:5433/ainvr --seconddb=postgres://postgres:@localhost:5433/ainvr2

Future

Integrate with Jenkins through ssh set up environment
Execute db diff scripts from Jenkins

Recommdation

Pgdatadiff Tool

Firstly it compares the row count in both tables.

If the row count is the same, it instructs postgres to create MD5 sums of "chunks" of the table in both DBs and compares them. This way no data is actually read directly by pgdatadiff, it also means that pgdatadiff is relatively fast but is puts a moderate amount of pressure on the DB as it calculates the MD5 sums of large amounts of data.

EXPLORING UNCERTAINTY !!!

Search Here!

Wednesday, December 8, 2021

Upgrade data migration validation !

Followers

Evil Tech!