One size fits all Dynamic Data Masking

One size doesn’t fit all

Dynamic Data Masking

I can now define myself as an expert in Dynamic Data Masking.

I’ve used all aspect of it and I know the good, the bad and the ugly:

  • The Good: Data Classification is highly automated, as long as you tune the Information Protection Policy JSON file
  • The Bad: Data Masking is poorly automated. You can easily classify the data with a click… but you cannot apply the actual masking with a click. And once in place you cannot remove it with a click!
  • The Ugly:
      • If you remove the database AdventureWorks2019 and restore a new AdventureWorks2019 the Dynamic Data Masking is kept and automatically applied to the new database šŸ˜® .
      • If you apply a new Data Masking on a column which already has Data Masking in place the new Data Masking is applied without any warning (sure they will call this a feature)
      • On a column likeĀ [AdventureWorks2019].[Person].[PersonPhone] if you try to apply 'random(1, 12)' you will receive the error The data type of column ‘PhoneNumber’ does not support data masking function ‘random’. This means that there is a check on the content of the data (I suppose the parenthesis are not numbers hence they stop Data Masking). But if you then try to apply 'email()' there is no problem…
      • Dynamic Data Masking cannot be applied on constraint column. Why Microsoft? It doesn’t make any sense…. the Dynamic Data Masking is not recorded on the actual database. The proof is that if you restore a new database with the same name the Dynamic Data Masking is still there! Is just a mere masking that applies on SELECT. So why not allowing DDM on constraint columns?
      • Only 4 masking choices:
Function How it fits Examples
Default It can apply to any data type. It replace the the original value with XXXXXXX Example of alter syntax: ALTER COLUMN Gender ADD MASKED WITH (FUNCTION = 'default()')
Email It can apply to any data type. It replace the the original value with [email protected]. Example of alter syntax: ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')
Random It can apply to any data type. It replace the the original value with a random value within a specified range. Example of alter syntax: ALTER COLUMN [Month] ADD MASKED WITH (FUNCTION = 'random(1, 12)')
Custom String It can apply to any data type. It replace the the original value with prefix,[padding],suffix Example of alter syntax: ALTER COLUMN [Phone Number] ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)')

…and you cannot nest them together.

Let’s say you want to mask an IBAN number like CH5604835012345678009. The best way would be to keep the first 2 letters ('partial(2,"XXXXXXX",0)') and then create a random number of 19 digits ('random(1000000000000000000, 9999999999999999999)').
If we could nest them together the result should be something likeĀ  'partial(2,random(1000000000000000000, 9999999999999999999),0)' .
But there is no such way to do so…

Let’s at least fix the Bad

There is no one size fits all for Dynamic Data Masking but the closer I could get is this.

You can fix the Bad in SQL Server 2017 and above with this query:

SELECT 
    schema_name(O.schema_id) AS schema_name,
    O.NAME AS table_name,
    C.NAME AS column_name,
		 [Type] = 
    CASE 
      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
      ELSE ct.[name]
    END,
	information_type,
	label,
	rank,
	rank_desc
	,CASE
	/*	If we are targeting a National ID or a SSN number	*/
		WHEN information_type = 'National ID' or information_type = 'SSN'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''random(1, 12)'');'
	/*	If we are targeting a Name	*/
		WHEN information_type = 'Name'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''partial(1,".XXXX.XXXX.",1)'');'
	/*	If we are targeting an E-mail	*/
		WHEN information_type = 'Contact Info'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''email()'');'
	/*	For everything else */
		ELSE 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''default()'');'
					END  as Add_Masking
	/*	Remove Masking	*/
		,'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'   
ALTER COLUMN '+C.NAME+' DROP MASKED; ' as Remove_Masking
FROM sys.sensitivity_classifications sc
    JOIN sys.objects O
    ON  sc.major_id = O.object_id
	JOIN sys.columns C 
    ON  sc.major_id = C.object_id  AND sc.minor_id = C.column_id
	JOIN sys.types ct ON C.user_type_id = ct.user_type_id
	--AND EP.minor_id = C.column_id
	order by 1,2,3

And for SQL Server 2016 and below you have this:

SELECT schema_name(O.schema_id) AS schema_name
	,O.NAME AS table_name
	,C.NAME AS column_name
	--,ct.[name] AS DataType
	--, c.max_length,
	--  c.precision,
	--  c.scale
	, [Type] = 
    CASE 
      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
      ELSE ct.[name]
    END
	,information_type
	,sensitivity_label
	,CASE
	/*	If we are targeting a National ID or a SSN number	*/
		WHEN information_type = 'National ID' or information_type = 'SSN'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''random(1, 12)'');'
	/*	If we are targeting a Name	*/
		WHEN information_type = 'Name'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''partial(1,".XXXXX.",1)'');'
	/*	If we are targeting an E-mail	*/
		WHEN information_type = 'Contact Info'
		   THEN 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''email()'');'
	/*	For everything else */
		ELSE 'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'  
				ALTER COLUMN '+C.NAME+' '+
				    CASE 
				      WHEN ct.[name] IN ('varchar', 'char') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length AS VARCHAR(25))) + ')' 
				      WHEN ct.[name] IN ('nvarchar','nchar') THEN ct.[name] + '(' + IIF(c.max_length = -1, 'max', CAST(c.max_length / 2 AS VARCHAR(25)))+ ')'      
				      WHEN ct.[name] IN ('decimal', 'numeric') THEN ct.[name] + '(' + CAST(c.precision AS VARCHAR(25)) + ', ' + CAST(c.scale AS VARCHAR(25)) + ')'
				      WHEN ct.[name] IN ('datetime2') THEN ct.[name] + '(' + CAST(c.scale AS VARCHAR(25)) + ')'
				      ELSE ct.[name]
				    END
				+' MASKED WITH (FUNCTION = ''default()'');'
					END  as Add_Masking
	/*	Remove Masking	*/
		,'ALTER TABLE '+schema_name(O.schema_id)+'.'+O.NAME+'   
ALTER COLUMN '+C.NAME+' DROP MASKED; ' as Remove_Masking
		
FROM (
	SELECT IT.major_id
		,IT.minor_id
		,IT.information_type
		,L.sensitivity_label
	FROM (
		SELECT major_id
			,minor_id
			,value AS information_type
		FROM sys.extended_properties
		WHERE NAME = 'sys_information_type_name'
		) IT
	FULL OUTER JOIN (
		SELECT major_id
			,minor_id
			,value AS sensitivity_label
		FROM sys.extended_properties
		WHERE NAME = 'sys_sensitivity_label_name'
		) L ON IT.major_id = L.major_id
		AND IT.minor_id = L.minor_id
	) EP
JOIN sys.objects O ON EP.major_id = O.object_id
JOIN sys.columns C ON EP.major_id = C.object_id
JOIN sys.types ct ON C.user_type_id = ct.user_type_id
	AND EP.minor_id = C.column_id
	order by 1,2,3


As result you will have two columns: Add_Masking and Remove_Masking. Pretty clear what they do, right?

Dynamic Data Masking automation

While working on it I’ve also stumble upon a bug around the sys.sensitivity_classifications .

Let’s all go to bed now.

 

Related

Leave a Reply

You have to agree to the comment policy.