Obfuscating data through Data Scramble in Bulk

PROBLEM

SQL Server Data ScrambleAnd we arrive to the last chapter of this session about how to mask and obfuscate data.

This time we are going to learn how to do that through Data Scramble which basically means mixing up the data in the row.

As result you’ll have a some sort of unrecognizable data in your database.

Take it like a Lorem ipsum for databases.

SOLUTION

In the most hidden meander of the internet I found a very nice piece of code that can easily adapt to most of the data type.

You first need to create the function that we are going to use to scramble the data:

-------------------------------------------------

--A view to give you Random Values
CREATE VIEW dbo.random(value) AS SELECT RAND();

GO
-------------------------------------------------

--Randomization Procedure
CREATE FUNCTION dbo.fnRandomizedText (
@OldValue AS VARCHAR(MAX)
)RETURNS VARCHAR(MAX)

BEGIN

  DECLARE @NewValue AS VARCHAR(MAX)
  DECLARE @nCount AS INT
  DECLARE @cCurrent AS CHAR(1)
  DECLARE @cScrambled AS CHAR(1)
  DECLARE @Random AS REAL
 
  SET @NewValue = ''
  SET @nCount = 0
   WHILE (@nCount <= LEN(@OldValue))
  BEGIN
    SELECT @Random = value FROM random
    SET @cCurrent = SUBSTRING(@OldValue, @nCount, 1)
     IF ASCII(@cCurrent) BETWEEN ASCII('a') AND ASCII('z')
       SET @cScrambled = CHAR(ROUND(((ASCII('z') - ASCII('a') - 1) * @Random + ASCII('a')), 0))
    ELSE IF ASCII(@cCurrent) BETWEEN ASCII('A') AND ASCII('Z')
       SET @cScrambled = CHAR(ROUND(((ASCII('Z') - ASCII('A') - 1) * @Random + ASCII('A')), 0))
    ELSE IF ASCII(@cCurrent) BETWEEN ASCII('0') AND ASCII('9')
       SET @cScrambled = CHAR(ROUND(((ASCII('9') - ASCII('0') - 1) * @Random + ASCII('0')), 0))
    ELSE
       SET @cScrambled = @cCurrent

    SET @NewValue = @NewValue + @cScrambled
    SET @nCount = @nCount + 1

  END
   RETURN LTRIM(RTRIM(@NewValue))
END
GO 
-------------------------------------------------

Like in my previous post you first need to complete the Data Classification and once your column are classified you will find the query to execute in the last column.

This is the code for SQL Server 2017 and above:

SELECT 
    schema_name(O.schema_id) AS schema_name,
    O.NAME AS table_name,
    C.NAME AS column_name,
    information_type,
	label,
	rank,
	rank_desc
	,'UPDATE ' + schema_name(O.schema_id) + '.' + O.NAME + ' SET ' + C.NAME + ' = dbo.fnRandomizedText(' + C.NAME + ')'
FROM sys.sensitivity_classifications sc
    JOIN sys.objects O
    ON  sc.major_id = O.object_id
	JOIN sys.columns C 
    ON  sc.major_id = C.object_id  AND sc.minor_id = C.column_id

And this is for SQL Server 2016 and below:

SELECT schema_name(O.schema_id) AS schema_name
	,O.NAME AS table_name
	,C.NAME AS column_name
	,information_type
	,sensitivity_label
	,'UPDATE ' + schema_name(O.schema_id) + '.' + O.NAME + ' SET ' + C.NAME + ' = dbo.fnRandomizedText(' + C.NAME + ')'
FROM (
	SELECT IT.major_id
		,IT.minor_id
		,IT.information_type
		,L.sensitivity_label
	FROM (
		SELECT major_id
			,minor_id
			,value AS information_type
		FROM sys.extended_properties
		WHERE NAME = 'sys_information_type_name'
		) IT
	FULL OUTER JOIN (
		SELECT major_id
			,minor_id
			,value AS sensitivity_label
		FROM sys.extended_properties
		WHERE NAME = 'sys_sensitivity_label_name'
		) L ON IT.major_id = L.major_id
		AND IT.minor_id = L.minor_id
	) EP
JOIN sys.objects O ON EP.major_id = O.object_id
JOIN sys.columns C ON EP.major_id = C.object_id
	AND EP.minor_id = C.column_id
order by 1,2,3

This is the result, as you can see the last columns contains a list of queries that will make the job done for you.

SQL Server Data Scramble in bulk

As always you might stumble upon Cannot update identity column and this is normal, I let you fix it with:

  1. set identity_insert YourTable ON
  2. Paste your query
  3. set identity_insert YourTable OFF

CONCLUSION

Is like eggs and bacon, not everyone likes it in the morning but it definitely fill up your stomach.

If I have time I will write a very last query that can dynamically adapt to all data types and bypass constraints.

…but I don’t want to promise you that. Here in Switzerland is snowing and the ski resorts are opening. Holidays are coming and I want to dedicate a bit of time to improve my snowboarding skills.

I’m not any more in my twenties like Marcus Kleveland but I would like to try some park around here, we apparently have 410Km of track.

Related

Leave a Reply

You have to agree to the comment policy.