I lately faced a problem: we are working on a complex ETL project that aims to migrate a database from DB2 to SQL Server.
DB2 stands for “it was made in the ’80s” when PK, FK and constraints were in the code and because we are stealing a customer from a concurrent we have no cooperation from those who have the code in their hands.
So how can we create an Entity Relationship Diagram without constraints and that isn’t there in the first place?
If you search on Google for “sql server reverse engineer diagram” you will stumble upon:
- Navicat (…perform reverse/forward engineering processes…)
- SQL Power Architect (…Forward/reverse engineers PostgreSQL, Oracle, MS SQL Server & more…)
- Microsoft Visio
- Toad Data Modeler
- ERBuilder (You can reverse engineer an existing database by connecting to it directly)
- DbDesigner (Reverse & Forward Engineer)
- Erwin Data Modeler (Forward- and reverse-engineering of database code and model exchange ensures efficiency,)
- Visual Paradigm ERD Tools (…Design, reverse and generate database trigger.)
- SQL Database Modeler
I have contacted each one of those and even if their website state that the do Reverse Engineering the reality is that “…they cannot if there are no constraints“.
I don’t know you but if there are constraints (PK, FK) already in place what sort of Reverse Engineering should we do then? To me Reverse Engineering means that we have no idea of how the software works… and we need to find out how the software works. Of course you can create an ERD if you have PK and FK.
I had to start working on the code by myself.
When you do hacking or reverse engineering you need to try to put yourself in the shoes of those who create the code. So I start thinking, maybe we can start by assuming that:
- the same column name in different tables might be a symptom of a relation
- If two column from different table have the same name and the same data type, they are probably a constraint
- hopefully who developed the database avoided implicit conversions
Don’t get me wrong, I’m the first who says that this reverse engineering is based on assumptions but the same is for blind SQL Injection, right? So I will call it Database Blind Reverse Engineering.
I posted a first proto-query on StackOverflow and Larnu came to help and gave me a query that could help me list not only the column name and the data type but also max length, precision, scale and number of duplicates:
WITH ColumnCount AS( SELECT s.name AS SchemaName, t.name AS TableName, c.name AS ColumnName, ct.[name] AS DataType, c.max_length, c.precision, c.scale, COUNT(c.column_id) OVER (PARTITION BY c.[name], ct.[name], c.max_length, c.precision, c.scale) AS Duplicates FROM sys.schemas s JOIN sys.tables t ON s.schema_id = t.schema_id JOIN sys.columns c ON t.object_id = c.object_id JOIN sys.types ct ON c.user_type_id = ct.user_type_id) SELECT * FROM ColumnCount CC WHERE CC.Duplicates > 1 ORDER BY CC.ColumnName, CC.SchemaName, CC.TableName;
That is a smoking hot query!
This are the results on AdventureWorks2019:
But I would like to see the big picture on Power BI.
One of the visual that first came to my mind is Chord that allows you to see at a glance the relations between objects.
In order to work Chord needs 3 things: From, To, Relation.
So let’s modify the query again and make it nicer:
;WITH ColumnCount AS ( SELECT s.name AS SchemaName ,t.name AS TableName ,c.name AS ColumnName ,ct.[name] AS DataType ,c.max_length ,c.precision ,c.scale ,COUNT(c.column_id) OVER ( PARTITION BY c.[name] ,ct.[name] ,c.max_length ,c.precision ,c.scale ) AS Duplicates FROM sys.schemas s JOIN sys.tables t ON s.schema_id = t.schema_id JOIN sys.columns c ON t.object_id = c.object_id JOIN sys.types ct ON c.user_type_id = ct.user_type_id ) SELECT cc.TableName + '.' + cc.ColumnName AS ForeignTableName ,cd.TableName + '.' + cd.ColumnName AS PrimaryTableName ,cd.TableName AS TargetTable ,cd.ColumnName AS TargetColumn ,1 Relationship --,cc.ColumnName, cc.DataType, cc.Duplicates FROM ColumnCount CC /*pk only joins - take ths out if you want all joins between fields, not just pk to fk */ --INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE Col ON col.COLUMN_NAME = cc.ColumnName -- AND col.TABLE_NAME = cc.TableName --INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS Tab ON Col.Constraint_Name = Tab.Constraint_Name -- AND Col.Table_Name = Tab.Table_Name -- AND Constraint_Type = 'PRIMARY KEY' /*end of pk only joins */ LEFT JOIN ColumnCount Cd ON cc.ColumnName = cd.ColumnName AND cc.DataType = cd.DataType AND cc.TableName != cd.TableName WHERE CC.Duplicates > 1 ORDER BY CC.ColumnName ,CC.SchemaName ,CC.TableName;
This way it can nicely be digested by Power BI.
You now need to download the Chord Visual from Power BI and then set PrimaryTableName, ForeignTableName and Relationship as in picture.
I have then added a filter to the right that helps me target the table and the column name I want:
Once again my career and the wellness of my family are based on the existence of the StackOverflow community.
There are a bunch of liar out there that sell software solutions as Reverse Engineering but they simply sell you an ERD out of the box when the constraints are provided. Without constraints you are left alone.
I hope this might help you.