SQL injection is a common form of data theft. I am hopeful we can make SQL injection protection more common.

The 2018 TrustWave Global Security Report listed SQL Injection as the second most common technique for web attacks, trailing only cross-site scripting (XSS) attacks. This is a 38% increase from the previous year. That same report also shows SQL Injection ranked fifth on a list of vulnerabilities that can be identified through simple penetration testing.

You may look at the increase and think “whoa, attacks are increasing”. But I believe that what we are seeing is a rising awareness in security. No longer the stepchild, security is a first-class citizen in application design and deployment today. As companies focus on security, they deploy tools and systems to help identify exploits, leading to more reporting of attacks.

SQL Injection is preventable. That’s the purpose of this post today, to help you understand what SQL Injection is, how to identify when it is happening, and how to prevent it from being an issue.

SQL Injection Explained

SQL injection is a technique in which an adversary appends a SQL statement to an input field on a web page or in an application, thereby sending their own custom request to the database. That request could read data, download the entire database, or even delete the data completely.

The most common examples of SQL injection attacks are found in the username and password input boxes on a web page. This login design is standard for allowing users to access a website. Unfortunately, many websites do not take precautions to block SQL injection on these input fields, leaving them open to attack.

Let’s look at a sample website built for the fictional Contoso Clinic. The source code for this can be found at https://github.com/Microsoft/azure-sql-security-sample.

On the Patients page you will find an input field at the top, next to a ‘Search’ button, and next to that a hyperlink for ‘SQLi Hints’.

Clicking on the SQLi Hints link will display some sample text to put into the search field.

I’m going to take the first statement and put it into the search field. Here is the result:

This is a common attack vector, as the adversary can use this method to determine what version of SQL Server is running. This is also a nice reminder to not allow your website to return such error details to the end user. More on that later.

Let’s talk a bit about how SQL injection works under the covers.

How SQL Injection works

The vulnerability in my sample website is the result of this piece of code:

return View(db.Patients.SqlQuery(
    "SELECT * FROM dbo.Patients " +
    "WHERE [FirstName] LIKE '%" + search + "%' " +
    "OR [LastName] LIKE '%" + search + "%' " +
    "OR [StreetAddress] LIKE '%" + search + "%' " +
    "OR [City] LIKE '%" + search + "%' " +
    "OR [State] LIKE '%" + search + "%'").ToList());

This is a common piece of code used by many websites. It is building a dynamic SQL statement based upon the input fields on the page. If I were to search the Patients page for ‘Rock’, the SQL statement sent to the database would then become:

SELECT * FROM dbo.Patients
WHERE [FirstName] LIKE '%Rock%'
OR [LastName] LIKE '%Rock%'
OR [StreetAddress] LIKE '%Rock%'
OR [City] LIKE '%Rock%'
OR [State] LIKE '%Rock%'

In the list of SQLi hints on that page you will notice that each example starts with a single quote, followed by a SQL statement, and at the end is a comment block (the two dashes). For the example I chose above, the resulting statement is as follows:

SELECT * FROM dbo.Patients
WHERE [FirstName] LIKE '%' OR CAST(@@version as int) = 1 --%'
OR [LastName] LIKE '%' OR CAST(@@version as int) = 1 --%'
OR [StreetAddress] LIKE '%' OR CAST(@@version as int) = 1 --%'
OR [City] LIKE '%' OR CAST(@@version as int) = 1 --%'
OR [State] LIKE '%' OR CAST(@@version as int) = 1 --%'

This results in the conversion error shown above. This also means that I can do interesting searches to return information about the database. Or I could do malicious things, like drop tables.

Chances are you have code like this, somewhere, right now. Let’s look at how to find out what your current code looks like.

SQL Injection Discovery

Discovering SQL injection is not trivial. You must examine your code to determine if it is vulnerable. You must also know if someone is actively trying SQL injection attacks against your website. Trying to roll your own solution can take considerable time and effort.

There are two tools I can recommend you use to help discover SQL injection.

Test Websites with sqlmap

One method is to use sqlmap, an open-source penetration testing project that will test websites for SQL injection vulnerabilities. This is a great way to uncover vulnerabilities in your code. However, sqlmap will not tell you if someone is actively using SQL injection against your website. You will need to use something else for alerts.

Azure Threat Detection

If you are using Azure SQL Database, then you have the option to enable Azure Threat Detection. This feature will discover code vulnerabilities as well as alert you to attacks. It also checks for anomalous client logins, data exfiltration, and harmful applications trying to access your database.

(In fairness, I should mention that AWS WAF allows for SQL injection detection, but their process is a bit more manual than Azure’s.)

If you try to roll your own discovery, you will want to focus on finding queries that have caused errors. Syntax errors, missing objects, permission errors, and UNION ALL errors are the most common. You can find a list of the common SQL Server error message numbers here.
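
If you go that route on SQL Server, one rough sketch (the session name and the exact error list are my own choices, not from the original post) is an Extended Events session that captures those error numbers along with the offending statement:

CREATE EVENT SESSION [SqlInjectionErrors] ON SERVER
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.client_hostname, sqlserver.username, sqlserver.sql_text)
    -- 102/105 = syntax errors, 205 = UNION column mismatch, 208 = missing object,
    -- 229 = permission denied, 245 = conversion failure
    WHERE ([error_number] = 102 OR [error_number] = 105 OR [error_number] = 205
        OR [error_number] = 208 OR [error_number] = 229 OR [error_number] = 245)
)
ADD TARGET package0.event_file (SET filename = N'SqlInjectionErrors');

ALTER EVENT SESSION [SqlInjectionErrors] ON SERVER STATE = START;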

It warrants mentioning that not all SQL injection attacks are discoverable. But when it comes to security, you will never eliminate all risk; you take steps to lower it. SQL injection discovery is one way to lower your risk.

SQL Injection Protection

Detection of SQL Injection vulnerabilities and attacks is only part of the solution. In an ideal world, your application code would not allow for SQL Injection. Here’s a handful of ways you can lower your risk of SQL injection attacks.

Parameterize Your Queries

Also known as ‘prepared statements’, this is a good way to prevent SQL injection attacks against the database. For SQL Server, prepared statements are typically done using the sp_executesql() system stored procedure.

Prepared statements should not allow an attacker to change the nature of the SQL statement by injecting additional code into the input field. I said “should”, because it is possible to write prepared statements in a way that would still be vulnerable to SQL injection. You must (1) know what you are doing and (2) learn to sanitize your inputs.
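
As a rough sketch (the actual fix in the sample application may look different), here’s how the Patients search could be parameterized with sp_executesql. The search text travels as a parameter value, so it can never change the shape of the statement:

EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Patients
      WHERE [FirstName] LIKE ''%'' + @search + ''%''
         OR [LastName] LIKE ''%'' + @search + ''%''
         OR [StreetAddress] LIKE ''%'' + @search + ''%''
         OR [City] LIKE ''%'' + @search + ''%''
         OR [State] LIKE ''%'' + @search + ''%''',
    N'@search nvarchar(100)',  -- parameter definition
    @search = N'Rock';         -- the user-supplied value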

Traditionally, one argument against the use of prepared statements centers on performance. It is possible that a prepared statement may not perform as well as the original dynamic SQL statement. However, if you are reading this and believe performance is more important than security, you should reconsider your career in IT before someone does that for you.

Use Stored Procedures

Another available method is stored procedures. Stored procedures offer additional layers of security that prepared statements may not allow. While prepared statements require permissions on the underlying tables, stored procedures can execute against objects without the user needing similar direct access.

Like prepared statements, stored procedures are not exempt from SQL injection. It is quite possible you could put vulnerable code into a stored procedure. You must take care to compose your stored procedures properly, making use of parameters. You should also consider validating the input parameters being passed to the procedure, either on the client side or in the procedure itself.
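
For example, a parameterized search procedure for the Patients table might look something like this (a sketch; the procedure name, parameter size, and the WebAppUser principal are hypothetical):

CREATE PROCEDURE dbo.SearchPatients
    @Search nvarchar(100)
AS
BEGIN
    SET NOCOUNT ON;

    -- @Search is only ever treated as a value, never as part of the statement text
    SELECT *
    FROM dbo.Patients
    WHERE [FirstName] LIKE '%' + @Search + '%'
       OR [LastName] LIKE '%' + @Search + '%'
       OR [StreetAddress] LIKE '%' + @Search + '%'
       OR [City] LIKE '%' + @Search + '%'
       OR [State] LIKE '%' + @Search + '%';
END;
GO

-- the application account only needs EXECUTE, not SELECT on dbo.Patients
GRANT EXECUTE ON dbo.SearchPatients TO WebAppUser;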

Use EXECUTE AS

You could use a security method such as EXECUTE AS to switch the context of the user as you make a request to the database. As mentioned above, stored procedures somewhat act in this manner by default. But EXECUTE AS can be used directly for requests such as prepared statements or ad-hoc queries.
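
A minimal sketch of the ad-hoc form (the PatientSearchUser principal and the query itself are hypothetical):

-- run the statement under a low-privilege user, then switch back
EXECUTE AS USER = 'PatientSearchUser';

SELECT [FirstName], [LastName]
FROM dbo.Patients
WHERE [City] = N'Boston';

REVERT;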

Remove Extended Stored Procedures

Disabling the use of extended stored procedures is a good way to limit your risk with SQL injection. Not because you won’t be vulnerable, but because you limit the surface area for the attacker. By disabling these system procedures you limit a common way that an attacker can get details about your database system.
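
As one example of the idea, xp_cmdshell is a favorite target after a successful injection, and it can be switched off with sp_configure (a sketch; it is already off by default on current versions, so check your own configuration first):

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure 'xp_cmdshell', 0;
RECONFIGURE;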

Sanitize Error Messages

You should never reveal error messages to the end user. Trap all errors and redirect to a log for review later. The less error information you bubble up, the better.
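
Most of this belongs in the application tier, but the same pattern can be sketched in T-SQL with TRY...CATCH; the dbo.ErrorLog table here is hypothetical, and dbo.SearchPatients is the sketch from earlier:

BEGIN TRY
    EXEC dbo.SearchPatients @Search = N'Rock';
END TRY
BEGIN CATCH
    -- keep the real details for the administrators...
    INSERT INTO dbo.ErrorLog (ErrorNumber, ErrorMessage, LoggedAt)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), SYSUTCDATETIME());

    -- ...and give the end user nothing useful to probe with
    THROW 50000, N'Something went wrong. Please contact support.', 1;
END CATCH;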

Use Firewalls

Whitelisting of IP addresses is a good way to limit activity from anomalous users. Use of VPNs and VNETs to segment traffic can also reduce your risk.

Summary

The #hardtruth here is that every database is susceptible to SQL injection attacks. No one platform is more at risk than any other. The weak link is the code being written on top of the database. Most code development does not emphasize security enough, leaving applications open to attack.

When you combine poor database security techniques along with poor code, you get the recipe for SQL Injection.

REFERENCES

2018 TrustWave Global Security Report
Contoso Clinic Demo Application
sqlmap: Automatic SQL injection and database takeover tool
Azure SQL Database threat detection
Working with SQL Injection Match Conditions
How to Detect SQL Injection Attacks
sp_executesql (Transact-SQL)
EXECUTE AS (Transact-SQL)
Server Configuration Options (SQL Server)

The post SQL Injection Protection appeared first on Thomas LaRock.


Building upon my earlier post, today I want to share with you the updated graphic and links for the analytics and big data services offered by Microsoft Azure and Amazon Web Services.

It is my hope that this post will be a starting guide for you when you need to research these analytic and big data services. I have included relevant links for each service, along with some commentary, in the text of this post below. I’ve done my best to align the services, but there is some overlap between offerings. (Click image to embiggen)

I’m not going to do a feature comparison here because these systems evolve so quickly I’d spend all day updating the info. Instead, you get links to the documentation for everything and you can do your own comparisons as needed. I will make an effort to update the page as frequently as I am able.

Data Warehouse

Azure offerings: SQL Data Warehouse

AWS offerings: Redshift

It feels like these two services have been around forever. That’s because, in internet years, they have. Redshift goes back to 2012, and SQL DW goes back to 2009. That’s a lot of time for both Azure and AWS to learn about data warehousing as a service.

Data Processing

Azure offerings: HDInsight

AWS offerings: Elastic MapReduce

Both services are built upon Hadoop, and both are built to hook into other platforms such as Spark, Storm, and Kafka.

Data Orchestration

Azure offerings: Data Factory, Data Catalog

AWS offerings: Data Pipeline, AWS Glue

These are true enterprise-class ETL services, complete with the ability to build a data catalog. Once you try these services you will never BCP data again.

Data Analytics

Azure offerings: Stream Analytics, Data Lake, Databricks

AWS offerings: Lake Formation, Kinesis Analytics, Elastic MapReduce

I didn’t list Event Hubs here for Azure, but if you want to stream data you are likely going to need that service as well. And Kinesis is broken down into specific streams, too. (In other words, “Analytics” is an umbrella term, and is one of the most difficult things to compare between Azure and AWS).

Data Visualization

Azure offerings: PowerBI

AWS offerings: QuickSight

I saw some demos of QuickSight while at AWS re:Invent last fall, and it looks promising. It also looks to be slightly behind PowerBI at this point. Of course, we all know most people are still using Tableau, but that is a post for a different day.

Search

Azure offerings: Elasticsearch, Azure Search

AWS offerings: Elasticsearch, CloudSearch

Elasticsearch for both is just a hook into the Elasticsearch open-source platform. For Azure, you have to get that from their marketplace (that’s what I link to because I can’t find it anywhere else). One of the biggest differences I know between the services is the number of languages supported. AWS CloudSearch claims to support 34, and Azure Search claims to support 56.

Machine Learning

Azure offerings: Machine Learning Studio, Machine Learning Service

AWS offerings: SageMaker, DeepLens

DeepLens is a piece of hardware, but I wanted to call it out because you will hear it mentioned. When you use DeepLens you use a handful of AWS services such as SageMaker, Lambda, and S3 storage. I enjoyed using Azure Machine Learning Studio during my data science and big data certifications. But the same thing is true there: you end up using associated services. This makes price comparisons difficult.

Data Discovery

Azure offerings: Data Catalog, Data Lake Analytics

AWS offerings: Athena

Imagine a library without a card catalog and you need to find one book. That’s what your data looks like right now. I know you won’t believe this, but not all data is tracked or classified in any meaningful way. That’s why services like Athena and Data Catalog exist.

Pricing

Azure Pricing calculator: https://azure.microsoft.com/en-us/pricing/calculator/

AWS Pricing Calculator: https://calculator.aws/

Same as the previous post, you will find it difficult to do an apples-to-apples comparison between services. Your best bet is to start at the pricing pages for each and work your way from there.

Summary

I hope you find this page (and this one) useful for referencing the many analytic and big data service offerings from both Microsoft Azure and Amazon Web Services. I will do my best to update this page as necessary, and offer more details and use cases as I am able.

The post Updated Analytics and Big Data Comparison: AWS vs. Azure appeared first on Thomas LaRock.


Last year I wrote a post comparing the data services offered by both AWS and Microsoft Azure. Well, there’s been some changes since, so it was time to provide an updated graphic and links.

Since both Microsoft Azure and Amazon Web Services offer many data services, I thought it worth the time to create a graphic to help everyone understand the services a bit more. Essentially, I wanted to build a cheat sheet for any data services comparison (click to embiggen):

You might notice that there is no Data Warehouse category. That category is located in the Analytics and Big Data comparison chart which I will share in a future post.

It is my hope that this post will be a starting guide for you when you need to research cloud data services. I’m not going to do a feature comparison here because these systems evolve so quickly I’d spend all day updating the info. Instead, you get links to the documentation for everything and you can do your own comparisons as needed. I hope to have future posts that help break down features and costs, but for now let’s keep it simple.

Relational

Azure offerings: SQL Database, Database for MySQL, Database for PostgreSQL, Database for MariaDB

AWS offerings: RDS, Aurora

RDS is an umbrella term covering six engines in total: Amazon Aurora, MySQL, MariaDB, Oracle, Microsoft SQL Server, and PostgreSQL. I’ve listed Aurora as a distinct offering because it is the high-end service dedicated to MySQL and PostgreSQL. Since Azure also offers those distinct services, it made sense to break Aurora out from RDS. (Or, to put it another way, if I didn’t call out Aurora here you’d finish this post and say ‘what about Aurora’, and now you don’t have to ask that question.)

NoSQL – Key/Value

Azure offerings: Cosmos DB, Table Storage

AWS offerings: DynamoDB, SimpleDB

Cosmos DB is the major NoSQL player for Azure, as it does everything (key/value, document, graph) except relational. DynamoDB is a workhorse for AWS. SimpleDB is still around, but there are rumors it will be going away. This might be due to the fact that you cannot create a SimpleDB service using the AWS Console. So, short story, look for this category to be just Cosmos DB and DynamoDB in the future.

NoSQL – Document

Azure offerings: Cosmos DB

AWS offerings: DocumentDB

Azure used to offer DocumentDB, but that platform was sunset when Cosmos DB came alive. AWS recently launched DocumentDB with MongoDB compatibility in what some people see as a major blow to open source.

NoSQL – Graph

Azure offerings: Cosmos DB

AWS offerings: Neptune

As of May 2019, Neptune is in Preview, so the documentation is likely to change in the coming weeks, months, or years (well, that’s my assumption, because Neptune has been in Preview since November 2018). Cosmos DB uses the Gremlin API for graph purposes.

In-Memory

Azure offerings: Cache for Redis

AWS offerings: ElastiCache

Both of these services are built upon Redis, so the real question here is whether you want to use Redis-as-a-service from a third-party provider as opposed to running Redis yourself.

Time Series

Azure offerings: Time Series Insights

AWS offerings: Timestream

If you are in need of a time series database for your IoT collections, then both Azure and AWS have a service to offer. Azure Time Series Insights was launched in early 2017, and AWS announced Timestream in late 2018. In other words, the world of data services is moving fast, and the two major cloud providers are able to roll out services to meet growing demand.

Ledger

Azure offerings: [Sad Trombone]

AWS offerings: Quantum Ledger Database

Setting aside the silliness of using the buzzword ‘Quantum’ in the name of this product, AWS does have a ledger database service available. As of May 2019, Azure does not offer a similar service.

Pricing

Azure Pricing calculator: https://azure.microsoft.com/en-us/pricing/calculator/

AWS Pricing Calculator: https://calculator.aws

I like using pricing as a way to start any initial comparison between data services. These calculators will help you focus on the important details. Not just costs, but how the technology works. For example, Azure SQL Database focuses on the concept of a DTU, which has no meaning in AWS. Using the calculators forces you to learn the differences between the two systems. It’s a great starting point.

That being said, trying to compare the data services offered by AWS and Azure can be frustrating. Part of me thinks this is done on purpose by both companies in an effort to win our favor without giving away more information than is necessary. This is a common practice, and I’m not bashing either company for doing what has been done for centuries. I’m here to help others figure out how to make the right choice for their needs. At the end of the day, I believe both Amazon and Microsoft want the same thing: happy customers.

By starting at the pricing pages I can then dive into the specific costs, and use that as a first-level comparison between the services. If you start by looking at resource limits and maximums you will spend a lot of time trying to compare apples to oranges. Just focus on costs, resources, throughput, and DR. That should be a good start to help you determine the cost, benefit, and risk of each service.

Summary

I hope you find this page useful for referencing the many data service offerings from both Microsoft Azure and Amazon Web Services. I will do my best to update this page as necessary, and offer more details and use cases as I am able.

The post Updated Data Services Comparison: AWS vs. Azure appeared first on Thomas LaRock.


Not a day, week, or month goes by without news of yet another data breach.

And the breaches aren’t the result of some type of Mission Impossible heist. No, it’s often an unprotected S3 bucket, maybe some SQL injection, or files left behind when relocating to a new office. Silly, fundamental mistakes made by people who should know better.

After decades of reviewing data breaches I have arrived at the following conclusion:

Data security is hard because people are dumb.

Don’t just take my word for it though. Do a quick search for “common password list” and you’ll see examples of passwords scraped from breaches. These are passwords often used by default to secure systems and data.

Chances are, these passwords are in your environment, right now.

Here’s what you can do to protect your data.

Use PWDCOMPARE() to Find SQL Logins With Weak Passwords

SQL Server ships with an internal system function, PWDCOMPARE(), that we can use to find SQL logins with weak passwords. We can combine this function, along with a list of weak passwords, and some PowerShell to do a quick check.

First, let’s build a list. I’ll store mine as a text file and it looks like this:
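
(The screenshot of the file isn’t reproduced here; hypothetically, password_list.txt would simply hold one candidate password per line, something like the following.)

password
123456
qwerty
letmein
P@ssw0rd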

I can import that file as an array into PowerShell with one line of code:

$pwdList = Get-Content .\password_list.txt

And with just a few lines of code, we can build a query and execute against our instance of SQL Server:

# Assumes $SQLServer already holds the name of the instance you want to check
foreach ($password in $pwdList) {
    $SQLText = "SELECT name FROM sys.sql_logins WHERE PWDCOMPARE('$password', password_hash) = 1;"
    Invoke-Sqlcmd -Query $SQLText -ServerInstance $SQLServer
}

And we find that the ITSupport login has a weak password:

As Dark Helmet once said, “Now you see that evil will always triumph, because good is dumb.”

Preventing Weak Passwords for SQL Logins

One of the easiest things you can do is to enable CHECK_POLICY for SQL logins. By default, enabling the CHECK_POLICY option will also force password expiration by enabling the CHECK_EXPIRATION flag. In other words, you can have passwords for SQL logins expire as if they were Windows logins, and you can enforce complex passwords.
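
For example, using the ITSupport login from the demo (a sketch; first check which of your own logins currently skip the policy):

-- find SQL logins that are not subject to the Windows password policy
SELECT name
FROM sys.sql_logins
WHERE is_policy_checked = 0;

ALTER LOGIN [ITSupport] WITH CHECK_POLICY = ON;
ALTER LOGIN [ITSupport] WITH CHECK_EXPIRATION = ON;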

However, even with those checks enabled, I would advise you still do a manual check for weak passwords. Do not assume that by enabling the password policy checks that you are secure. In fact, you should do the opposite. You should take a stance of assume compromise. This is a fundamental aspect of modern Cybersecurity practices.
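
If you prefer to stay in T-SQL for that manual check, the same PWDCOMPARE() test works against several candidate passwords at once (the candidates here are just examples):

SELECT name
FROM sys.sql_logins
WHERE PWDCOMPARE(N'password', password_hash) = 1
   OR PWDCOMPARE(N'123456', password_hash) = 1
   OR PWDCOMPARE(N'qwerty', password_hash) = 1
   OR PWDCOMPARE(name, password_hash) = 1; -- password equal to the login name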

As a side note, I also want to point out that Troy Hunt has collected the passwords from many data breaches, and he has made the passwords searchable. Do yourself a favor and take some of the passwords you’ve used throughout the web and see if they have been exposed at some point.

Summary

SQL Server offers system functions to help you search for weak passwords, as well as policies to enforce complex passwords and password expiration. You should adopt a stance of “assume compromise” and be proactive about checking the passwords in your environment to make certain they are not considered weak.

[Hey there, dear reader, if you liked this post about passwords and data security, then you might also like the full day training session I am delivering with Karen Lopez in two weeks at SQL Konferenz. The title is Advanced Data Protection: Security and Privacy in SQL Server, and you’ll learn more about how to protect your data at rest, in use, and in motion.]

The post Use PWDCOMPARE() to Find SQL Logins With Weak Passwords appeared first on Thomas LaRock.


The 28th of January is Data Privacy Day. Originally started as Data Protection Day in Europe in 2007, Data Privacy Day started here in the USA in 2008. The National Cyber Security Alliance (NCSA) leads the effort in making this a recognized day each year. You don’t need me to remind you that the 28th of January is the anniversary of the Council of Europe’s Convention 108 for the Protection of individuals with regard to automatic processing of personal data. In other words, the formation of what was to become GDPR today.

If you read this post about GDPR and data privacy you will understand that I adore the idea of having a Data Privacy Day. I plan on celebrating by rotating passwords and drilling holes in my old hard drives and devices. Seriously.

I decided that in the spirit of celebration for Data Privacy Day I would share with you some common tips, tricks, and advice on keeping your data private. You’re welcome.

24 Ways to Protect Your Data on Data Privacy Day

– Assume your data is at risk, at all times.

– Think about worst-case scenarios. What would happen if someone stole your AppleID? Walk through the scenario and take steps to minimize your risk.

– Think twice before discussing private things in public spaces, such as a coffee shop. You never know who might be listening.

– Private messages aren’t. They are stored on many servers. Everything you do online is recorded, many times, in many places.

– If it’s not a picture you want to share publicly, don’t take it. Assume everything on your phone is stored elsewhere and discoverable.

– Free email services aren’t free. You pay for the service when you consent to allow your emails to be used for targeted ads and spam.

– Free public WiFi is not secure.

– If you are using public WiFi then use a VPN. Make sure your communication channels are encrypted.

– Use a password manager such as 1Password to help manage and rotate your passwords.

– Don’t reuse passwords across multiple sites.

– Two-factor authentication is your friend and a better friend than just a strong password.

– Use different usernames across the internet. Reusing usernames allows for someone to track you and possibly gather enough data to hack into an account and even steal your online identity.

– You know those security questions that ask about your favorite food? It’s OK to make up an answer. Better yet, use a strong password for the answer.

– Your data is valuable, you have the right to keep it private. When Best Buy asks for your phone number just say no.

– Read the privacy notices for social media sites and look for the words “share” and “use”; this will tell you what they are doing with your data.

– Resist the urge to overshare. You don’t need to tell everyone where you are, where you are going, and for how long.

– Delete online accounts that you are not using. It’s OK to say goodbye to Tumblr and Flickr. In fact, if the site can’t spell, they aren’t likely to be protecting your privacy. Just avoid them.

– Avoid websites that don’t have HTTPS. It’s 2018, there’s no excuse for a website to not be using SSL.

– Think twice before clicking on the link in an email. Think three times before visiting shady (or seedy) websites.

– Shared drive services such as OneDrive make it easy to share documents. But if you email a link to one person, and someone gets access to that email, then they could have access to your files, too.

– Double check the names on that email, make sure you are sending it to the right people.

– If sending to a large group of people, consider using BCC to hide the email addresses.

– Use applications and extensions such as Disconnect.me to help avoid your online activities being tracked.

– When you dispose of old computers and devices, drill a hole through the hard drives.

Summary

Data privacy and security is important for everyone. It’s also hard work staying on top of everything necessary to protect your data. Unless you are going to move to a shack in Montana, you need to be willing to put in the effort. Otherwise, you should expect that at some point you will find yourself a victim.

The post Happy Data Privacy Day! appeared first on Thomas LaRock.


The principle of conservation of quantum information states that information can neither be created nor destroyed. Stephen Hawking used this theory to explain how a black hole does not consume photons like a giant cosmic eraser. It is clear to me that neither Stephen Hawking, nor any quantum physicist, has ever worked in IT.

Outside the realm of quantum mechanics we have the physical world of corporate offices. And in the physical world information is generated, curated, and consumed at an accelerated pace with each passing year. The similarity between both realms? Data is never destroyed.

We are now a nation, and a world, of data hoarders.

Thanks to popular processes such as DevOps, we obsess over telemetry and observability. System administrators are keen to collect as much diagnostic information as possible to help troubleshoot servers and applications when they fail. And the Internet of Things has a billion devices broadcasting data to be easily consumed into Azure and AWS.

All of this data hoarding is leading to an accelerated amount of ROT (Redundant, Outdated, Trivial information).

Stop the madness.

It’s time to shift our way of thinking about how we collect data. We need to become more data-centric and do less data-hoarding.

Becoming data-centric means you define goals or problems to solve BEFORE collecting or analyzing data. Once defined, you begin the process of collecting the necessary data. You want to collect the right data to help you make informed decisions about what actions are necessary.

Three Ways to Become Data-Centric

Here are three things you can start today in an effort to become data-centric. No matter what your role, these three ways will help put you on the right path.

Start with the question you want answered. This doesn’t have to be a complicated question. Something as simple as, “How many times was this server rebooted?” is a fine question to ask. You could also ask, “How long does it take for a server to reboot?” These examples are simple questions, yes. But I bet your current data collections do not allow for simple answers without a bit of data wrangling.

Have an end-goal statement in mind. Once you have your question(s) and you have settled on the correct data to be collected, you should think about the desired output. For example, perhaps you want to put the information into a simple slide deck. Or maybe build a real-time dashboard inside of Power BI. Knowing the end goal may influence how you collect your data.

Learn to ask good questions. Questions should help to uncover facts, not opinions. Don’t let your opinions affect how you collect or analyze your data. It is important to understand how assumptions form the basis for many questions. It’s up to you to decide if those assumptions are safe. To me, assumptions based upon something measurable are safe. For example, your gut may tell you that server reboots are a result of O/S patches applied too often. Instead of asking, “How often are patches applied?” a better question would be, “How many patches need a reboot?” then compare that number to the total number of server reboots.

Summary

When it comes to data, no one is perfect. These days, data is easy to come by, making it a cheap commodity. When data is cheap, attention becomes a premium. By shifting to a data-centric mindset, you can avoid data hoarding and reduce the amount of ROT in your enterprise. With just a little bit of effort, you can make things better for yourself, your company, and help set the example for everyone else.

The post Three Ways to Become Data-Centric appeared first on Thomas LaRock.


At VMworld in Barcelona this year a question arose regarding SQL Server Standard edition and whether it is NUMA aware. I was certain the answer was “yes”, but it was pointed out to me that the documentation says otherwise.

Sure enough, here is the relevant piece of information from https://docs.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-2016?view=sql-server-2017:

This was a topic for discussion because I’m always reminding people about the benefits of being able to run a SQL Server workload inside of a single NUMA node when possible. So I was taken aback when people were pointing out that SQL Server Standard edition was not NUMA aware.
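
(As an aside, if you want to see how an instance lays out its NUMA nodes and schedulers, a quick sketch is to query sys.dm_os_nodes:)

SELECT node_id,
       node_state_desc,
       memory_node_id,
       online_scheduler_count
FROM sys.dm_os_nodes
WHERE node_state_desc <> N'ONLINE DAC';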

It didn’t take long for me to find some relevant links about SQL Server and NUMA, because I’ve got a list of posts regarding SQL Server 2016. At the bottom of that post is a link to this post by Bob Ward:

How It Works (It Just Runs Faster): Auto Soft NUMA

Bob clearly talks about SQL Server Standard edition and soft NUMA in the post. However, there is also a quote in there that is worth noting:

“Standard Edition and CAL based licensing can restrict how many processors SQL Server can use.”

Bob Ward

And thus, we start to understand why the documentation suggests that SQL Server Standard edition is not NUMA aware. It’s because Standard has limits on the amount of hardware available.

This is leading to confusion for SQL Server customers. It would be better for Microsoft to update the documentation to reflect that SQL Server Standard is NUMA aware. Perhaps add an additional footnote, as they have footnotes for other features in that same section.

I like that idea so much I decided to do my first pull request for Microsoft documentation.

Here’s hoping they like my suggestion enough to consider updating the documentation and remove the confusion for customers.

The post Yes, SQL Server Standard Edition is NUMA Aware appeared first on Thomas LaRock.


During his keynote at AWS re:Invent, Andy Jassy made some statements that seemed…questionable. Well, questionable to me, at least. Not surprisingly, the questionable statements focused on databases, data services, and storage.

If you are interested in watching the keynote for yourself, you can see it here: https://youtu.be/ZOIkOnW640A

AWS re:Invent 2018 - Keynote with Andy Jassy - YouTube

The keynote is 2 hours and 44 minutes. It’s not action packed, so I recommend you adjust the speed to 1.5x. Doing that will save you an hour of viewing time. YouTube offers a transcript as well, making it easy to grab the quotes.

Now, I’m not writing this post to make Jassy or AWS look like fools in any way. The keynote is long, filled with a lot of wonderful information. AWS is doing wonderful things with databases and data services. I’m a fan of all things data.

What I have here today are a handful of statements, out of a very long keynote. I found these statements to be unfair. As someone who works in marketing, I know how keynotes work. But as a data professional, and an Azure fanboi, I don’t like seeing bad information presented as truth.

Thus, today’s post is my effort to fact-check the statements that irked me the most.

You’re welcome.

Let’s get started.

AWS has 11 relational and non-relational databases. Which is much more than you’ll find anywhere else, nobody has close to half of that.

Well, AWS has 13 databases now, because later in the keynote Jassy announced Timestream and QLDB. But let’s focus on the original 11, and the statement that “nobody has close to half that”.

The 11 databases that Jassy refers to, listed on the screen behind him, are as follows: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB, ElastiCache (Memcached, Redis), and Neptune.

That’s a confusing list to me, because it does not include SimpleDB, or Redshift. And Aurora is counted twice, but Aurora is really just RDS but at a higher performance tier. I don’t see how Jassy can count Aurora as something different, but he’s probably using SKU math that folks with MBAs like to use.

So, let’s count up the databases available in Azure today. And to keep it fair, I will also go by SKU, and leave off the Azure SQL Data Warehouse service.

Azure SQL Database
Azure Database for MySQL
Azure Database for PostgreSQL
Azure Database for MariaDB
Azure Cosmos DB
Azure Cache for Redis

So, that’s six. Last I checked, 6 is more than half of 11. But we are not done yet.

Cosmos DB is really three engines, and the AWS equivalent to Cosmos DB is two engines (Dynamo, Neptune). I wrote about this earlier in 2018, for reference. So, Azure offers you one SKU, and AWS offers you two. But if we break Cosmos DB out, then Azure has 8 database services. And 8 is also more than half of 11.

More to the point, what does it matter if AWS has two databases (Dynamo, Neptune) and Azure only has Cosmos DB? I fail to understand why the number of databases offered is as important as the functionality that those services offer. At the end of the day, functionality is what should matter most for those “builders” that AWS is coveting.

I get that counting the number of databases is a convenient metric. It’s also useless.

Ok, let’s move on to the next.

In AWS, it’s the only place where you have a database migration service that allows you to switch from SQL to NoSQL or actually be able to migrate your data warehouse.

Well, Jassy certainly makes it sound easy to switch between relational and non-relational. Just a few clicks, export tables to JSON, and you are done, right? Maybe….maybe not.

The AWS Data Migration Service (DMS) documentation doesn’t talk about this “SQL to NoSQL” functionality. I did, however, find this other documentation that states you can use DynamoDB as a target for DMS, and have a relational database as a source. And then this page describes how you can use DMS to extract your database to S3 buckets, which are then imported into Redshift.

So, yeah, his statement is true. He just doesn’t talk about the nightmare of deconstructing your relational database prior to the migration. Note he didn’t use the phrase “only” here, as Azure offers a robust Data Migration Service, along with a playbook for data migrations that includes sources such as Cassandra and Access (sources not offered by AWS, by the way).

AWS has 11 different ways to get your data into the cloud depending on the nature of your data and your application. Nobody else has a little bit more than half of that.

This is the list of data transfer services on stage when Jassy makes this statement:

AWS Direct Connect
AWS Snowball
AWS Snowball Edge
AWS Snowmobile
AWS Storage Gateway
Amazon Kinesis Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Amazon S3 Transfer Acceleration
AWS DataSync
AWS Transfer for SFTP

My first issue here is counting Kinesis three times. That seems to be a bit of a stretch, but OK. Oh, and Kinesis is listed under “Analytics”, not with the migration products. Warrants mentioning.

Now, let’s consider similar offerings from Azure. I’ll use the same method of accounting that AWS did for that slide.

Azure Data Box
Azure Data Box Disk
Azure Data Box Heavy
Azure Data Factory
Azure Event Hubs
SQL Server Stretch Database
Azure StorSimple
Azure VPN Gateway

That’s 8, and 8 is more than half of 11.

Amazon Neptune which we launched here a year ago and it’s off to really a raring start.

Amazon Neptune is currently ranked #129 on the DB-Engines rankings. Not exactly a fiery meteor cutting a path to the top of the leaderboard. But watch out Db4o, Neptune has you in their sights!

S3 is the most secure object store. It’s the only object store that allows you to audit any access to an object.

I don’t know what Jassy means by “most secure”. And the phrase “audit” can mean many different things. But Azure offers a lot of security features as well as logging.

S3 is the only object store that allows you to do cross region replication.

This is false. Azure Storage has offered this feature for years. No, I don’t know why or how a statement this false was allowed in the keynote. It’s disappointing.

The world of databases in the Old Guard commercial grade databases has been a miserable world for the last couple decades.

I won’t argue otherwise, but I would say that it’s not just the world of databases.

Warrants mentioning that those miserable databases are the exact ones that AWS wants to host for you. In other words, “commercial databases are miserable unless you are using them in our cloud”.

Seems legit.

Summary

OK AWS, listen up. You’ve got a great set of services for data and databases. And a lot of stuff said in the keynote is true, too. For example, you have the most powerful GPU offering on the market. You are a leader in many areas of cloud computing.

You don’t need to resort to these tactics, where you stretch the truth in order to make a point. Just focus on the awesome stuff you have. Talk about the wonderful support you offer your customers. You’re better than this.

When I hear statements like the ones above, it makes me think twice about all of the messages that are coming out of AWS.

I know that these are a handful of statements in a long keynote. But I still believe this was a poor effort on your part. A simple 5 minutes of research to compare and contrast services would have fixed everything above.

Hugs.

(Please don’t read this and decide to delay delivery of our Christmas gifts.)

References

https://azure.microsoft.com/en-us/services/
https://aws.amazon.com/products/
https://db-engines.com/en/ranking
https://thomaslarock.com/2018/03/azure-versus-aws-data-services-comparison/
https://thomaslarock.com/2018/03/azure-vs-aws-analytics-and-big-data-services-comparison/

The post Fact Checking Some Statements from the AWS re:Invent Keynote appeared first on Thomas LaRock.


“Too many secrets.” – Martin Bishop

One of the pivotal moments in the movie Sneakers is when Martin Bishop realizes that they have a device that can break any encryption methodology in the world.

Now 26 years old, the movie was ahead of its time. You might even say the movie predicted quantum computing. Well, at the very least, the movie predicts what is about to unfold as a result of quantum computing.

Let me explain, starting with some background info on quantum computing.

Quantum computing basics

To understand quantum computing, we must first look at how traditional computers operate. No matter how powerful, standard computing operates on binary units called “bits.” A bit is either a 1 or a 0, on or off, true or false. We’ve been building computers based on that architecture for the past 80 or so years. Computers today are using the same bits that Turing invented to crack German codes in World War II.

That architecture has gotten us pretty far. (In fact, to the moon and back.) But it does have limits. Enter quantum computing, where a bit can be a 0, a 1, or a 0 and a 1 at the same time. Quantum computing works with logic gates, like classic computers do. But quantum computers use quantum bits, or qubits. A gate acting on a single qubit is a matrix of four elements; with two qubits, the gate becomes a matrix of 16 elements, and at three qubits we are up to 64. For more details on qubits and gates, check out this post: Demystifying Quantum Gates—One Qubit At A Time.
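
Put as quick arithmetic (my own summary, not a quote from that post), a gate acting on n qubits is a matrix of size:

\[ 2^n \times 2^n = 4^n \ \text{elements:} \qquad 4^1 = 4, \quad 4^2 = 16, \quad 4^3 = 64. \]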

This is how quantum computers outperform today’s high-speed supercomputers. This is what makes solutions to complex problems possible. Problems today’s computers can’t solve. Things like predicting weather patterns years in advance. Or comprehending the intricacies of the human genome.

Quantum computing brings these insights, out of reach today, into our grasp.

It sounds wonderful! What could go wrong?

Hold that thought.

Quantum Supremacy

Microsoft, Google, and IBM all have working quantum computers available. There is some discussion about capacity and accuracy, but they exist.

And they are getting bigger.

At some point in time, quantum computers will outperform classical computers at the same task. This is called “Quantum Supremacy.”

The following chart shows the linear progression in quantum computing for the past 20 years.

(SOURCE: Quantum Supremacy is Near, May 2018)

There is some debate about the number of qubits necessary to achieve Quantum Supremacy. But many researchers believe it will happen within the next eight years.

So, in a short period of time, quantum computers will start to unlock answers to many questions. Advances in medicine, science, and mathematics will be within our grasp. Many secrets of the Universe are on the verge of discovery.

And we are not ready for everything to be unlocked.

Quantum Readiness

Quantum Readiness is the term used to describe whether current technology is ready for the impact of quantum computing. One of the largest impacts for everyone, on a daily basis, is encryption.

Our current encryption methods are effective due to the time necessary to break the cryptography. But quantum computing will reduce that processing time by an order of magnitude.

In other words, in less than ten years, everything you are encrypting today will be at risk.

Everything.

Databases. Emails. SSL. Backup files.

All of our data is about to be exposed.

Nothing will be safe from prying eyes.

Quantum Safe

To keep your data safe, you need to start using cryptography methods that are “Quantum-safe.”

There’s one slight problem—the methods don’t exist yet. Don’t worry, though, as we have “top men” working on the problem right now.

The Open Quantum Safe Project, for example, has some promising projects underway. And if you want to watch mathematicians go crazy reviewing research proposals during spring break, the PQCrypto conference is for you.

Let’s assume that these efforts will result in the development of quantum-safe cryptography. Here are the steps you should be taking now.

First, calculate the amount of time necessary to deploy new encryption methods throughout your enterprise. If it takes you a year to roll out such a change, then you had better get started at least a year ahead of Quantum Supremacy happening. Remember, there is no fixed date for when that will happen. Now is your opportunity to take inventory of all the things that require encryption, like databases, files, emails, etc.

Second, review the requirements around your data retention policies. If you are required to retain data for seven years, then you will need to apply new encryption methods on all of that older data. This is also a good time to make certain that data older than your policy is deleted. Remember, you can’t leave your data lying around—it will be discovered and decrypted. It’s best to assume that your data will be compromised and treat it accordingly.

One thing worth mentioning is that some data, such as emails, are (possibly) stored on the servers they touch as they traverse the internet. We will need to trust that those responsible for the mail servers are going to apply new encryption methods. Security is a shared responsibility, after all. But it’s a reminder that there are still going to be things outside your control. And maybe reconsider the data that you are making available and sharing in places like private chat messages.

Summary

Don’t wait until it’s too late. Data has value, no matter how old. Just look at the spike in phishing emails recently, where they show you an old password and try to extort money. Scams like that work, because the data has value, even if it is old.

Start thinking how to best protect that data. Build yourself a readiness plan now so that when quantum cryptography happens, you won’t be caught unprepared.

Otherwise…you will have no more secrets.

The post Quantum Computing Will Put Your Data at Risk appeared first on Thomas LaRock.


I’m 98% confident if you ask three data scientists to define Artificial Intelligence (AI), you will get five different answers.

The field of AI research dates to the mid-1950s, and even earlier when you consider the work of Alan Turing in the 1940s. So, the phrase “AI” has lasted for 60+ years, or roughly the amount of time since the last Cleveland Browns championship.

My preference for a definition to AI is this one, from Elaine Rich in the early 1990s:

“The study of how to make computers do things which, at the moment, people do better.”

But there is also this quote from Alan Turing, in his effort to describe computer intelligence:

“A computer would deserve to be called intelligent if it could deceive a human into believing it was human.”

Here’s my take.

Defining Artificial Intelligence

When I try to define AI, I combine the two thoughts:

“Anything written by a human that allows a machine to do human tasks.”

This, in turn, allows humans to find more tasks for machines to do on our behalf. Because we’re driven to be lazy.

Think about the decades spent finding ways to build better programs, and the automation of traditional human tasks. We built robots to build cars, vacuum our house, and even flip burgers.

In the world of IT, alerting is one example of where automation has shined. We started building actions, or triggers, to fire in response to alert conditions. We added triggers until we reached a point where human intervention was necessary. And then we would spend time trying to figure out a way to remove the need for a person.

This means if you ever wrote a piece of code with IF-THEN-ELSE logic, you’ve written AI. Any computer program that follows rule-based algorithms is AI. If you ever built code that has replaced a human task, then yes, you built AI.

But for many in the field of AI research, AI means more than simple code logic. It also means things like image recognition, text analysis, or a fancy “Bacon/Not-Bacon” app on your phone. AI also means talking robots, speech translations, and predicting loan default rates.

AI means so many different things to different people because AI is a very broad field. The field contains both Machine Learning and Deep Learning, as shown in this diagram:

That’s why you can find one person who thinks of AI as image classification, but another person who thinks AI is as simple as a rules-based recommendation engine. So, let’s talk about those subsets of AI called Machine Learning and Deep Learning.

Machine Learning for Mortals

Machine Learning (ML) is a subset of AI. ML offers the ability for a program to apply statistical techniques to a dataset and arrive at a determination. We call this determination a prediction, and yes, this is where the field of predictive analytics resides.

The process is simple enough: you collect data, you clean data, you classify your data, you do some math, you build a model. This model is necessary to make predictions upon similar sets of data. This is how Netflix knows what movie you want to watch next, or how Amazon knows what additional products you would want to add to your cart.

But ML requires a human to provide the input. It’s a human task to define the features used in building the model. Humans are the ones to collect and clean the data used to build the model. As you can imagine, humans desire to shed themselves of some tasks that are better suited for machines, like determining if an image is a chihuahua or a muffin.

Enter the field of Deep Learning.

Deep Learning Demystified

The first rule of Deep Learning (DL) is this: You don’t need a human to input a set of features. DL will identify features from large sets of data (think hundreds of thousands of images) and build a model without the need for any human intervention thankyouverymuch. Well, sure, some intervention is needed. After all, it’s a human that will need to collect the data, in the example above some pictures of chihuahuas, and tell the DL algorithm what each picture represents.

But that’s about all the human needs to do for DL. Through the use of Convolutional Neural Networks, DL will take the data (an image, for example), break it down into layers, do some math, and iterate through the data over and over to arrive at a predictive model. Humans will adjust the iterations in an effort to tune the model and achieve a high rate of accuracy. But DL is doing all the heavy lifting.

DL is how we handle image classifications, handwriting recognition, and speech translations. Tasks once suited for humans are now reduced to a bunch of filters and epochs.

Summary

Before I let you go, I want to mention one thing to you: beware companies that market their tools as being “predictive” when they aren’t using traditional ML methods. Sure, you can make a prediction based upon a set of rules; that’s how Deep Blue worked. But I prefer tools that use statistical techniques to arrive at a conclusion.

It’s not that these companies are knowingly lying, it’s just that they may not know the difference. After all, the definitions for AI are muddy at best, so it is easy to understand the confusion. Use this post as a guide to ask some probing questions.

As an IT pro, you should consider use cases for ML in your daily routine. The best example I can give is the use of linear regression for capacity planning (a rough sketch of that idea follows below). But ML would also help to analyze logs for better threat detection. One caveat though: if the data used to build the model never observed a specific event, then the model may not work as expected when that event happens.
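
As a sketch of that capacity-planning example (the dbo.DatabaseSizeHistory table and its columns are hypothetical), an ordinary least-squares slope can be computed right in T-SQL to estimate growth per day:

;WITH samples AS (
    -- days since an arbitrary anchor date become x, observed size becomes y
    SELECT CAST(DATEDIFF(DAY, '20190101', capture_date) AS float) AS x,
           CAST(size_mb AS float) AS y
    FROM dbo.DatabaseSizeHistory
)
SELECT (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y))
       / NULLIF(COUNT(*) * SUM(x * x) - SUM(x) * SUM(x), 0) AS growth_mb_per_day
FROM samples;

Multiply that slope by the number of days until your next capacity review and you have a rough projection of how much space to plan for.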

That’s when you realize that the machines are only as perfect as the humans that program them.

And this is why I’m not worried about Skynet.

This post originally appeared on Orange Matter.

The post Why I’m Not Worried About Skynet appeared first on Thomas LaRock.
