About a year ago I wrote about a new feature in SSMS that allows you to add a data classification to your columns. Today we'll discuss what happens behind the scenes and how to view those classifications later.
I have been working in health care since September, and data security is a very important topic there; so it's always good to identify any potential PHI that you may have overlooked. It's also a good idea to tag columns so others can easily understand which columns contain PHI and which do not. This is where the data classification feature comes in handy.
So you've run the report, accepted the recommendations, and saved. Now what? How do you view them? Run the report again? What if I just want to get details about a specific column?
Before we get there, I'd like to show what happens behind the scenes. When you click "Save" on the classification report, a few queries are run for each column you've accepted. What queries, you ask? Let's see:
exec sp_addextendedproperty @name=N'sys_information_type_name', @level0type=N'schema', ...
exec sp_addextendedproperty @name=N'sys_information_type_id', @level0type=N'schema', ...
exec sp_addextendedproperty @name=N'sys_sensitivity_label_name', @level0type=N'schema', ...
    ..., @level2type=N'column', @level2name=N'City', @value=N'Confidential - GDPR'
exec sp_addextendedproperty @name=N'sys_sensitivity_label_id', @level0type=N'schema', ...
That's right: it's adding the classification to the extended properties, which is good news because we can query for them. You'll notice that four properties are added per column, which may just add noise if you already use extended properties for other things. If you query sys.extended_properties directly, you'll find it's not overly helpful.
Here’s a quick query that you may find useful to get the columns and their classification:
SELECT isc.TABLE_CATALOG as DBName,
       isc.TABLE_SCHEMA as SchemaName,
       isc.TABLE_NAME as TableName,
       isc.COLUMN_NAME as ColumnName,
       ex.value as Classification,
       CASE WHEN isc.DATA_TYPE like '%char'
            THEN CONCAT(isc.DATA_TYPE, '(', isc.CHARACTER_MAXIMUM_LENGTH, ')')
            ELSE isc.DATA_TYPE
       END as DataType
FROM INFORMATION_SCHEMA.COLUMNS isc
OUTER APPLY sys.fn_listextendedproperty('sys_sensitivity_label_name',
        'Schema', isc.TABLE_SCHEMA,
        'Table',  isc.TABLE_NAME,
        'Column', isc.COLUMN_NAME) ex
Now that we can query the classifications, you may consider using them to apply security for reports or to de-identify exported data.
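As one sketch of the de-identification idea (the masking strategy and label name here are assumptions, not something from SSMS itself), you could use those extended properties to generate a to-do list of classified columns that need masking before an export:

```sql
-- Sketch: list every classified column with a hint to NULL it out in exports.
SELECT CONCAT('Mask [', isc.TABLE_SCHEMA, '].[', isc.TABLE_NAME, '].[',
              isc.COLUMN_NAME, ']  -- label: ',
              CAST(ex.value AS nvarchar(128))) AS MaskHint
FROM INFORMATION_SCHEMA.COLUMNS isc
CROSS APPLY sys.fn_listextendedproperty('sys_sensitivity_label_name',
        'Schema', isc.TABLE_SCHEMA,
        'Table',  isc.TABLE_NAME,
        'Column', isc.COLUMN_NAME) ex;
```

CROSS APPLY (rather than OUTER APPLY) keeps only the columns that actually have a label, so the list contains just the sensitive ones.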
Before you read too far, know that this is going to be a deeply personal post. Where do I begin? As many of you know, I've been working on a UWP app to help with query tuning in SQL Server, and I showed an alpha of the tool to quite a few people at PASS Summit. My plan was to have the tool completed and published to the Windows Store by Christmas.
Shortly after Summit, my Mom passed away unexpectedly. Since then I've taken a bit of a break, and while I've intended to post more blogs, they never seemed to get written. Blogging takes quite a bit of work and determination, and I truly salute bloggers out there like Kenneth Fisher (@SQLStudent144).
My Mom raised me from 8th grade on, and because she was a single mother we didn't get to spend much time together to become close. In fact, we really weren't too close at all. My oldest son unfriended her on Facebook when he was twelve because her posts embarrassed him. She'd often call or text me at 3 AM to say it was raining in some random city. In all fairness, she knew I traveled for work and just wanted to reach out, but things like that are easy to push away from. At the funeral, my siblings all shared similar stories, and it's no wonder that only my youngest sibling was close to her.
It took a few days to set in, and then it hit me. I told my wife I'd never be able to resync with her, to let her know that we all do weird things but we still love each other. She would often do strange things. For example, every year my mom would put wrapped chocolates in our shoes for St. Nick's Day. I always thought it silly and never did it for my own children; this year, however, we added this odd act to our tradition.
I suppose all I can say is that this has been difficult to write but also therapeutic. I want her to know that she was a great mom, and I hope she felt so. I wish everyone in the #sqlfamily the best, and I want you to know that there's never a good reason to wait if you want to tell someone you care. Reach out to them and give them a hug.
In case you aren't familiar with #MSIgnite, it's a huge event where Microsoft debuts all the new shiny software you'll want your hands on immediately. The conference is happening right now and runs through Friday.
You can watch and learn all about the exciting new things at this link:
I've read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I'm going to explore whether that's the case. Let's have a look.
It's true that in many situations a scalar function is a performance bottleneck, but is there a situation where one could be used responsibly?
What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?
Let’s examine a simple join between a customer address and a state lookup table.
We have the following tables:
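The original table definitions were shown as an image; here is a minimal sketch of what they might look like (the table and column names, types, and sizes are my assumptions):

```sql
-- Assumed lookup table: an int key plus the state abbreviation
CREATE TABLE dbo.States
(
    StateID int IDENTITY(1,1) PRIMARY KEY,
    Abbr    char(2) NOT NULL
);

-- Assumed address table referencing the lookup by its int key
CREATE TABLE dbo.CustomerAddress
(
    AddressID int IDENTITY(1,1) PRIMARY KEY,
    Street    varchar(100) NOT NULL,
    City      varchar(50)  NOT NULL,
    StateID   int NOT NULL REFERENCES dbo.States (StateID)
);
```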
To join these you'd usually use a very simple inner join to get the state code. That could just as well be the full state name, which is longer than the 4 bytes of an int, and you might also keep other attributes on the States table, like geo data or the capital. For now, let's pretend Abbr is much larger than the int key so it makes sense to move it out to a lookup table.
Here's what the join query may look like:
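The query itself was pictured in the original post; assuming tables named dbo.CustomerAddress and dbo.States (names are my assumption), it would look roughly like this:

```sql
-- Standard inner join to resolve the state abbreviation
SELECT ca.Street, ca.City, s.Abbr
FROM dbo.CustomerAddress ca
INNER JOIN dbo.States s
    ON s.StateID = ca.StateID;
```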
To test the value of scalar functions, we created the following code (note that the screenshot cuts off, but you get the point):
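Since the screenshot is cut off, here is a sketch of what such a scalar function could look like (the function name and table names are assumptions):

```sql
-- Scalar function that looks up the abbreviation for a given state key
CREATE FUNCTION dbo.fn_GetStateAbbr (@StateID int)
RETURNS char(2)
AS
BEGIN
    DECLARE @Abbr char(2);

    SELECT @Abbr = Abbr
    FROM dbo.States
    WHERE StateID = @StateID;

    RETURN @Abbr;
END;
```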
Now that we have a function to test with, we want to create a query that returns the same results as our join query. Let’s have a look.
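Assuming the function and table names sketched earlier (they are my assumptions, not the post's originals), the function-based equivalent of the join query would be:

```sql
-- Same result set, but the lookup happens per row inside the function
SELECT ca.Street, ca.City,
       dbo.fn_GetStateAbbr(ca.StateID) as Abbr
FROM dbo.CustomerAddress ca;
```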
How do these compare with an actual plan?
Wow! The query with the function shows a much lower cost! Incredible results, right? Not so fast: cost can sometimes be misleading. To perform a true test we'll need to do a bit more. Let's turn off the actual plan and turn on STATISTICS TIME.
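For anyone following along, the timing setup is just:

```sql
SET STATISTICS TIME ON;
-- run each query here, then compare the CPU time and elapsed time
-- reported on the Messages tab
SET STATISTICS TIME OFF;
```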
The join query ran in 206 ms total elapsed time.
The function query took only 92 ms. That's good!
Let’s examine the query stats to see how these two compare.
Here we find some unexpected results. The execution time was less for this single run; however, CPU was greater by a significant amount.
We should dig deeper. You may not have noticed that the function isn't represented in the query plan at all. Note that this is SQL Server 2017 with the database running at the latest compatibility level.
To find out more we’re going to look at the function stats DMV.
SELECT db_name(database_id) as DB,
       object_name(object_id) as fn,
       execution_count
FROM sys.dm_exec_function_stats;
What we find is that the function was executed 192 times: once for each row in the result. This per-row execution is the primary reason scalar functions usually aren't a great choice.
One final test is to run the queries 100 times. To do this, we’re going to use GO 100.
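GO with a count is an SSMS/sqlcmd batch separator feature, not T-SQL, and it simply repeats the preceding batch. Using the assumed names from earlier, the harness looks like:

```sql
-- Repeat the function query 100 times (GO n is an SSMS/sqlcmd feature)
SELECT ca.Street, ca.City,
       dbo.fn_GetStateAbbr(ca.StateID) as Abbr
FROM dbo.CustomerAddress ca;
GO 100
```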
What have we learned? While the function may look great in the query plan, it doesn't stack up in actual runtime metrics. If there's one thing a DBA should always say, it's "it depends," but in this case it doesn't, and you should instead add this quote to your toolbox: "Never fully trust cost in query plans."
Hello Louisiana! It's that time of year again. I'll be speaking on Saturday, August 11th in Baton Rouge along with many other exciting speakers from around the country. My topic is a performance troubleshooting guide for DBAs, but all are welcome to attend. This year is going to be amazing, so be sure to register and block your calendar. Check out the registration here:
It always seems that when I give a talk on performance there are 100+ people packed in the room, but when it comes to security there are 10 to 15. No one likes patches (unless you're talking about a puppy named Patches, or something else unrelated to updates), yet it's so important that we all apply them.
SQL is a stout language, and SQL Server has so many features that it's impossible to be an expert in everything. I see a lot of people gravitate toward performance, but that's not the best place to focus: if you don't know the fundamentals of the language, you'll never be the best performance tuner.
Today we're going to look at GROUPING SETS. This is a neat feature that I have never seen anyone use, perhaps because its value is limited for application purposes, where you can simply group in a report or data grid. Where I would have expected to see people use it is for admin and discovery tasks.
Let's say your manager wants to know how much space is used per volume and by data file type. How would you proceed? One simple way is to query sys.master_files and cut & paste the data into Excel. From there you'd need to do a few more operations, and you'd have your report. Another way might be to use Report Builder to quickly generate a report. But both of these are much slower than just letting SQL do the grouping.
Here's a quick example that gets total usage in MB by file type for each volume (when mount points aren't in use):
SELECT type_desc as FileType,
       LEFT(physical_name, 1) as Volume,
       SUM(size*8/1024) as [Size MB]
FROM sys.master_files
GROUP BY GROUPING SETS (
    (type_desc, LEFT(physical_name, 1)),
    (LEFT(physical_name, 1)),
    ()
);
Using GROUPING SETS gives us a subtotal for each grouping set. Here’s the result:
You could also use CUBE, but the result would include an additional subtotal, which may be helpful.
SELECT type_desc as FileType,
       LEFT(physical_name, 1) as Volume,
       SUM(size*8/1024) as [Size MB]
FROM sys.master_files
GROUP BY CUBE(type_desc, LEFT(physical_name, 1));