Purpose: To make sure data consistency in upgraded environment.
Thoughts : Dump data base before and after upgrade and check data diff for these 2 data bases.
Available Tools:
- dbForge Data Compare for SQL Server
- DBDiff
- Pgdatadiff tool
Steps 1: Dump sql file before / after upgrade
docker exec -t
vaidio pg_dump --format=c -U ainvr>
dump_`date +%d-%m-%Y"_"%H_%M_%S`.sql
docker exec -t
your-db-container pg_dump --format=c -U ainvr > dump_`date
+%d-%m-%Y"_"%H_%M_%S`.sql
Steps 2:
Create Empty Database
createdb -U postgres -W ainvr;
createdb -U postgres -W ainvr2;
Steps3 : Restore data to created empty database
psql -h localhost -p 5433 -U postgres -f
D:\dump_18-10-2021_13_23_57_before.sql ainvr
psql -h localhost -p 5433 -U postgres -f
D:\dump_18-10-2021_14_33_09_after.sql ainvr2
2. DBDiff Tool
DBDiff is an automated database schema and data diff tool.
○
It compares two databases,
local or remote, and produces a migration file of the differences
automatically.
○
Command-Line API
Source Code: https://dbdiff.github.io/DBDiff/
Comparison result
Executed Command:
dbdiff postgres://postgres:@localhost:5433/ainvr
postgres://postgres:@localhost:5433/ainvr2
Result:
-- DROP TABLE "public"."scene_object_2021_07_26";
-- DROP TABLE "public"."scene_object_2021_07_27";
-- DROP TABLE "public"."scene_object";
-- DROP TABLE "public"."scene_object_2021_07_28";
3. Pgdatadiff Tool
- Firstly, compares the row count in both tables.
- If the row count is the same, it instructs postgres to create MD5 sums from data.
The MD5 sums are based on the data being cast to varchar.
If you have data types that don't cast to varchar properly then the behavior
probably not reliable.
Source Code: https://github.com/dmarkey/pgdatadiff
Comparison result
Executed command:
pgdatadiff --firstdb=postgres://postgres:@localhost:5433/ainvr
--seconddb=postgres://postgres:@localhost:5433/ainvr2
Future
- Integrate with Jenkins through ssh set up environment
- Execute db diff scripts from Jenkins
Recommdation
- Pgdatadiff Tool
Firstly it compares the row count in both tables.
If the row
count is the same, it instructs postgres to create MD5 sums of
"chunks" of the table in both DBs and compares them. This way no data
is actually read directly by pgdatadiff
, it also means that pgdatadiff
is relatively fast but is puts a moderate amount of
pressure on the DB as it calculates the MD5 sums of large amounts of data.