How to blind Reverse Engineer SQL Server database to ERD

Problem

I lately faced a problem: we are working on a complex ETL project that aims to migrate a database from DB2 to SQL Server.

DB2 stands for “it was made in the ’80s” when PK, FK and constraints were in the code and because we are stealing a customer from a concurrent we have no cooperation from those who have the code in their hands.

So how can we create an Entity Relationship Diagram without constraints and that isn’t there in the first place?

If you search on Google for “sql server reverse engineer diagram” you will stumble upon:

I have contacted each one of those and even if their website state that the do Reverse Engineering the reality is that “…they cannot if there are no constraints“.

I don’t know you but  if there are constraints (PK, FK) already in place what sort of Reverse Engineering should we do then? To me Reverse Engineering means that we have no idea of how the software works… and we need to find out how the software works. Of course you can create an ERD if you have PK and FK.

Solution

I had to start working on the code by myself.

When you do hacking or reverse engineering you need to try to put yourself in the shoes of those who create the code. So I start thinking, maybe we can start by assuming that:

  • the same column name in different tables might be a symptom of a relation
  • If two column from different table have the same name and the same data type, they are probably a constraint
  • hopefully who developed the database avoided implicit conversions

Don’t get me wrong, I’m the first who says that this reverse engineering is based on assumptions but the same is for blind SQL Injection, right? So I will call it Database Blind Reverse Engineering.

I posted a first proto-query on StackOverflow and Larnu came to help and gave me a query that could help me list not only the column name and the data type but also max length, precision, scale and number of duplicates:

WITH ColumnCount AS(
    SELECT s.name AS SchemaName,
           t.name AS TableName,
           c.name AS ColumnName,
           ct.[name] AS DataType,
           c.max_length,
           c.precision,
           c.scale,
           COUNT(c.column_id) OVER (PARTITION BY c.[name], ct.[name], c.max_length, c.precision, c.scale) AS Duplicates
    FROM sys.schemas s
         JOIN sys.tables t ON s.schema_id = t.schema_id
         JOIN sys.columns c ON t.object_id = c.object_id
         JOIN sys.types ct ON c.user_type_id = ct.user_type_id)
SELECT *
FROM ColumnCount CC
WHERE CC.Duplicates > 1
ORDER BY CC.ColumnName,
         CC.SchemaName,
         CC.TableName;

That is a smoking hot query!

This are the results on AdventureWorks2019:

SQL Server reverse engineer ERD

But I would like to see the big picture on Power BI.

One of the visual that first came to my mind is Chord that allows you to see at a glance the relations between objects.

In order to work Chord needs 3 things: From, To, Relation.

So let’s modify the query again and make it nicer:

;WITH ColumnCount
AS (
	SELECT s.name AS SchemaName
		,t.name AS TableName
		,c.name AS ColumnName
		,ct.[name] AS DataType
		,c.max_length
		,c.precision
		,c.scale
		,COUNT(c.column_id) OVER (
			PARTITION BY c.[name]
			,ct.[name]
			,c.max_length
			,c.precision
			,c.scale
			) AS Duplicates
	FROM sys.schemas s
	JOIN sys.tables t ON s.schema_id = t.schema_id
	JOIN sys.columns c ON t.object_id = c.object_id
	JOIN sys.types ct ON c.user_type_id = ct.user_type_id
	)
SELECT cc.TableName + '.' + cc.ColumnName AS ForeignTableName
	,cd.TableName + '.' + cd.ColumnName AS PrimaryTableName
	,cd.TableName AS TargetTable
	,cd.ColumnName AS TargetColumn
	,1 Relationship --,cc.ColumnName, cc.DataType, cc.Duplicates 
FROM ColumnCount CC
/*pk only joins - take ths out if you want all joins between fields, not just pk to fk */
--INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE Col ON col.COLUMN_NAME = cc.ColumnName
--	AND col.TABLE_NAME = cc.TableName
--INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS Tab ON Col.Constraint_Name = Tab.Constraint_Name
--	AND Col.Table_Name = Tab.Table_Name
--	AND Constraint_Type = 'PRIMARY KEY' 
	/*end of pk only joins */
LEFT JOIN ColumnCount Cd ON cc.ColumnName = cd.ColumnName
	AND cc.DataType = cd.DataType
	AND cc.TableName != cd.TableName
WHERE CC.Duplicates > 1
ORDER BY CC.ColumnName
	,CC.SchemaName
	,CC.TableName;

This way it can nicely be digested by Power BI.

You now need to download the Chord Visual from Power BI and then set PrimaryTableName, ForeignTableName and Relationship as in picture.

SQL Server power BI ERD

I have then added a filter to the right that helps me target the table and the column name I want:

Conclusions

Once again my career and the wellness of my family are based on the existence of the StackOverflow community.

There are a bunch of liar out there that sell software solutions as Reverse Engineering but they simply sell you an ERD out of the box when the constraints are provided. Without constraints you are left alone.

I hope this might help you.

Related

Leave a Reply

You have to agree to the comment policy.