Obfuscating data through Data Shuffle in Bulk

PROBLEM

When Dynamic Data Masking is not an option you can go for Data Shuffle.

Data Shuffle is when you keep the data of your columns but you mix them together like in a card deck.

This will allow you to have as result real data that cannot be really backtracked to the original ones.

In the example below you can see how the order of the columns name, phone and country where mixed up. Who can tell what’s Martin’s phone and country?

SQl Server Data Shuffle

This obfuscation is perfect for User Acceptance Testing (UAT) environments because testers can still play with close-to-reality data.

It would be a blessing to have a query that could do that in bulk right?

SOLUTION

If you have already setup Data Classification on SQL Server 2017 or older you can now apply data shuffle in bulk thanks to this query:

And if you are on SQL Server 2016 or below you can execute this:

The last column scripts out the code for you for each column you have classified.

Keep in mind that you might receive error like Cannot update identity column and in that case you need to :

  1. set identity_insert YourTable ON
  2. Paste your query
  3. set identity_insert YourTable OFF

I preferred not to add this to the bulk scripts so you can choose what to do in those cases.

CONCLUSION

Is good if you need to have realistic data but keep in mind that if you apply data shuffle to columns like phone or e-mail those fields are still visible and they represent a data leak.

 

 

Related

Leave a Reply

Your email address will not be published. Required fields are marked *