In this age of data breaches and cyber-threats, the best way to secure your environment is through layers. Fortunately, this multi-faceted solution doesn’t have to be complicated. Syncsort has released their “Layers of Data Security” white paper to show some potential layers that an organization may choose to protect when thinking about a security plan.
To be effective and as close as possible to foolproof, data security must be layered. Some types of threats can be blocked only at certain layers, and omitting the relevant layers leaves you with no protection against those threats.
Learn why even the most critical IT systems shouldn’t rely on a single approach to security. The strategies and research here can help your company stay resilient by eliminating the most common security vulnerabilities.
Welcome back to Syncsort Summer School. Take advantage of this traditionally slower business season to catch up on new technologies or trends to help you perform your job better and justify your next raise! We’re picking up this series of blog posts where we left off, starting with everything you need to know about Capacity Management.
If you’re just getting started, we’ve got the perfect primer for you! This article covers the basics of capacity management. You’ll also walk away with a better understanding of how it helps align IT resources with larger business goals.
Capacity Management is Good for Business
In the age of Big Data, you need to consider tools that can improve business efficiency. Capacity management helps a business get the most out of its current IT resources while planning accurately for future demand.
Capacity management delivers a vast amount of information about IT resources and their utilization. It can even feed machine learning programs that perform analytics in the background and report a “time to live” before a resource is exhausted. The key to this analysis is setting thresholds, whether those thresholds are static or self-learned. (Read more on capacity management’s role in machine learning and AI.)
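As a rough illustration of the idea (this is not Syncsort’s actual algorithm), a capacity tool can fit a linear trend to utilization samples and extrapolate to a capacity threshold. The function name and sample numbers below are invented for the example:

```python
def time_to_threshold(samples, threshold):
    """Estimate days until a resource crosses a capacity threshold.

    samples: list of (day, percent_used) observations.
    Returns days remaining, or None if utilization is flat or falling.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Ordinary least-squares slope and intercept of the utilization trend
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = num / den
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no growth trend, so the threshold is never reached
    crossing = (threshold - intercept) / slope  # day the trend hits threshold
    last_day = samples[-1][0]
    return max(0.0, crossing - last_day)

# Disk usage grows ~2% per day from 60%; at day 7 the 90% line is 8 days out.
usage = [(d, 60 + 2 * d) for d in range(8)]
print(time_to_threshold(usage, 90))  # → 8.0
```

A real capacity management product would use far richer models (seasonality, self-learned thresholds), but the “time to live” concept reduces to this kind of trend extrapolation.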
Extra credit: Check out our on-demand webcast, “The Changing Landscape of Capacity Management for the Mainframe”
Keeping Your Cloud in Check
One of the benefits of cloud storage is the idea of endless space. Some assume managing capacity is no longer necessary because if you hit your limit, you just add in more capacity from this infinitely elastic cloud. But even with physical limitations removed, you’re likely still facing a financial ceiling.
Cloud capacity management can help you balance your technical and financial needs to ensure you’re getting what you need and not paying more than you should.
Implementing or maturing a capacity management process takes executive buy-in, proper planning and the tools to make it possible – and it helps when you get to enjoy a significant return on investment from the process!
Download our eBook to discover what defines a mature capacity management process and key takeaways to become best in class.
Cross-Platform Capacity Management Software
Capacity management software, such as Syncsort’s Athene, provides automation around key processes and requires little-to-no mainframe expertise to operate cost-effectively.
As a cross-platform solution, Athene allows organizations to bring data into a centralized Capacity Management Information System (CMIS) from all components that comprise a service and provides a 360° view of those services in a single dashboard. It also has predictive analytics that give organizations insight into how the mainframe is performing today and will perform in the future, even with changes in the hardware or the workload.
You know a data quality strategy is important. But do you know how to assess how good of a job your company is doing at achieving data quality? Keep reading for tips on developing a data quality scorecard for your organization.
Data quality refers to the ability of a given set of data to serve a specific purpose. Achieving data quality is important because, without it, you will struggle to put your data to work for you. In fact, a lack of data quality could mean that your data causes you more headaches than it’s worth.
Data quality problems result from issues like inconsistent data formatting, redundant or missing entries within databases and a lack of data structure.
Implementing Data Quality Strategy
In order to maximize data quality, your company should have an overall data quality strategy in place. Although the technical dimensions of data quality control can usually be addressed only by engineers, there should be a plan for enforcing best practices related to data quality throughout the organization.
After all, virtually every employee comes into contact with data in one form or another these days. That’s why data quality is everyone’s responsibility.
Assessing Data Quality
When you make data quality the responsibility of the entire organization, it’s important to assess, on an ongoing basis, how well the organization is doing at maximizing data quality. Otherwise, you have no way of knowing how much benefit you are reaping from your data quality strategy, or of determining how to make it better.
You can evaluate the effectiveness of your data quality operations by tracking the following metrics:
Data analytics failure rates
The most obvious and direct measure of data quality is the rate at which your data analytics processes are successful. Success can be measured both in terms of technical errors during analytics operations and in the more general sense of failing to extract meaningful insight from a dataset even when there were no technical hiccups during analysis. The main purpose of a data quality plan is to enable effective data analytics, so fewer analytics failures mean you are doing a good job on the data quality front.
Database entry problems
In cases where you are working with structured datasets, you can track the number of database entry problems that exist within the datasets. For example, you might use data quality tools to assess the number of missing or redundant database entries. A decrease in the number of such errors within raw datasets means you are doing a good job of achieving high data quality at the time of data collection — which is great because the fewer data quality problems you have to start with, the faster you can turn your data into value.
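As a minimal sketch of this kind of check, a script can count missing and redundant entries in a single pass. The record fields (`id`, `email`) and the required-field rule are invented for the example:

```python
def entry_problems(records, key="id", required=("id", "email")):
    """Count records with missing required fields or a duplicated key.

    records: list of dicts representing database rows.
    """
    seen, missing, redundant = set(), 0, 0
    for rec in records:
        # A record is "missing" if any required field is absent or empty
        if any(not rec.get(f) for f in required):
            missing += 1
        # A record is "redundant" if its key was already seen
        k = rec.get(key)
        if k in seen:
            redundant += 1
        seen.add(k)
    return {"missing": missing, "redundant": redundant}

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "b@example.com"},  # duplicate id
    {"id": 2, "email": ""},               # missing email
]
print(entry_problems(rows))  # → {'missing': 1, 'redundant': 1}
```

Tracking these counts over time, rather than as one-off numbers, is what turns them into a useful quality metric.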
How long data quality tools take to analyze data
Another obvious way to assess your data quality strategy’s effectiveness is to track how long automated data quality tools take to complete their operations. Although the tools’ execution time can be affected by a number of factors that are unrelated to data quality, in general, higher-quality data can be processed more quickly.
How long it takes to migrate data
Data migration times can be a proxy for measuring data quality. The reason why is that low-quality data is harder to transform when you migrate it; your data transformation tools will struggle to work effectively with data that they encounter in unexpected formats, or that they cannot interpret because it lacks a consistent structure. Other factors can affect migration time, of course, such as disk I/O. But if you can control for those variables in your assessments, data migration time is a good way of measuring overall data quality.
How much data you are processing
Your ability to process ever-larger volumes of data is one reflection of your ability to maintain data quality. If you perform poorly on the data quality front, you are unlikely to be able to sustain a high volume of data processing and analytics.
How much your employees know about data quality
In addition to tracking the technical data points listed above, you might consider quizzing your employees periodically to ask how much they know about data quality and your data quality strategy. Can they define data quality and identify common data quality mistakes? Assessing this knowledge will help you measure how well your organization understands and adheres to your data quality strategy.
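The metrics above can be rolled up into a simple composite scorecard. The sketch below is hypothetical: the metric names, normalization to a 0–1 scale, and weights are all invented for illustration; each organization would choose its own.

```python
# Hypothetical weights; they must sum to 1.0 so a perfect score is 100.
WEIGHTS = {
    "analytics_success_rate": 0.35,   # share of analytics jobs that succeed
    "clean_entry_rate": 0.25,         # share of records with no entry problems
    "tool_runtime_score": 0.15,       # 1.0 = at or under the runtime baseline
    "migration_time_score": 0.15,     # 1.0 = at or under the migration baseline
    "staff_quiz_score": 0.10,         # average score on the employee quiz
}

def scorecard(metrics):
    """Each metric is pre-normalized to 0..1; returns a 0..100 composite."""
    total = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    return round(100 * total, 1)

print(scorecard({
    "analytics_success_rate": 0.96,
    "clean_entry_rate": 0.90,
    "tool_runtime_score": 0.80,
    "migration_time_score": 0.75,
    "staff_quiz_score": 0.70,
}))
```

Reviewing the composite alongside the individual metrics at each assessment interval shows both the overall trend and which dimension is dragging it down.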
The frequency at which you analyze the metrics listed above will vary depending on your organization’s needs, of course. As a rule of thumb, it might make sense to perform an analysis every six to twelve months, although analyzing your data quality effectiveness on a more frequent basis could be helpful when you have just implemented a data quality plan for the first time.
Regardless of how often you assess your data quality strategy, your end goal should be to strive for continuous improvement. The importance of data quality, and the amount of data you have to process will only increase with time at most organizations. Continually improving your ability to maintain data quality will help keep you prepared for the data analytics requirements of the future.
How long can you live without your data? This article explains how to answer that question, which is essential for designing a data availability strategy capable of meeting your needs.
Data Is Like Hot Water
In a sense, data is like your home’s water heater or electrical service. When either of those fails, it causes an immediate inconvenience. But at first, you’ll be able to tolerate the failure. A few hours without electricity or a day without hot water is usually acceptable to most people, provided the issue is resolved after that point.
If you were to go days without hot water or electricity, however, things would be different. Your ability to continue your normal lifestyle would break down. Most people can only go so long without taking a hot shower or running the fridge.
You can think of data availability in the same way. When a failure happens, your business is usually OK — at first. You’ll be able to continue normal operations for some amount of time — perhaps minutes, perhaps hours, perhaps days, depending on your business’s needs — even without access to your data.
But sooner or later, the lack of data availability will become a problem for your business. Operations will break down. Customers will disappear. Contractual guarantees will go unfulfilled. Ultimately, your business could collapse.
In a perfect world, everyone would be able to achieve complete data availability, meaning that data would always be accessible and usable. But in the real world, of course, even the best-managed data infrastructure sometimes fails for various reasons.
In order to prepare for this eventuality, it’s important to assess how long your business can survive without access to its data before it suffers a serious interruption to its ability to operate normally.
Calculating Acceptable Data Downtime
When determining how long your business can tolerate unavailable data, consider the following questions:
How central is data to your business? These days, virtually all businesses depend on digital data to one extent or another. But data dependencies can vary widely. A doctor’s office probably can’t function long at all if it can’t access digital patient records, for example. But a restaurant could probably survive longer: It may depend on digital databases for things like payroll, but not hour-to-hour operations.
Do you have offline data backups? In other words, is there a way for you to access hard-copy versions of your data in the event that digital records become unavailable? Hard copies are less convenient to work with, but having them on hand can help increase your tolerance for a data availability disruption.
How does time of day impact your data needs? If you have a small business with local customers, you are probably in a better position to work through a data availability disruption than you are if you operate globally and need 24/7 access to data.
Do you have regulatory data requirements? If a regulatory framework like the GDPR applies to your company, you may have a legal mandate to guarantee certain levels of data availability. This decreases your ability to tolerate a disruption.
Data Availability Metrics
Answering the questions above will help you to calculate the following metrics, which lay the foundation for a data availability strategy:
RTO, or Recovery Time Objective, is a measure of the amount of time that your business can maintain normal operations without access to its data. It’s a direct reflection of your answers to the questions above.
MTTR, or Mean-Time-to-Repair, measures how quickly you can restore data to normal availability following a disruption. It reflects factors like how effectively your IT team can respond to an infrastructure failure and how quickly data can be moved from backup archives to production servers.
You want to ensure that your MTTR is shorter than your RTO. That way, you’ll be able to restore data following a failure before your business gets to the point where it can no longer keep operating.
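The relationship between the two metrics can be sanity-checked in a few lines. The incident durations, RTO, and safety margin below are invented example numbers:

```python
def availability_ok(repair_times_hours, rto_hours, margin=0.8):
    """Return (MTTR, True/False) — True if MTTR stays under margin * RTO.

    The margin leaves headroom (e.g. 80% of the objective) so that a
    slightly worse-than-average incident still beats the RTO.
    """
    mttr = sum(repair_times_hours) / len(repair_times_hours)
    return mttr, mttr < margin * rto_hours

# Three past incidents took 2, 3, and 4 hours to restore; the RTO is 6 hours.
mttr, ok = availability_ok([2, 3, 4], rto_hours=6)
print(mttr, ok)  # → 3.0 True
```

In practice you would also watch the worst case, not just the mean: a single repair that exceeds the RTO is a business disruption regardless of the average.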
Again, data is like hot water or electricity. Most folks can survive for some time without their data, their hot water or their power service. But the exact length of disruption that they can tolerate will vary.
If you really, really like your hot showers and lights, you may want to have a backup hot water tank and generator installed so that you won’t have to suffer any disruption in the event that your main infrastructure fails. Similarly, if you have very high data availability needs, you’ll want to have the proper data backup and restoration processes in place to ensure a very low MTTR.
A little planning will go far to prevent an unexpected disruption from turning into a catastrophe.
A while back, Syncsort’s VP of Data Integration R&D, Fernanda Tavares, wrote a blog post, Overcoming Technical Challenges in Large Mainframe to Hadoop Implementations, in which she explains how Syncsort leads the way in helping enterprises leverage their mainframe data in Hadoop. This article expands on that topic by presenting a new feature that refreshes the Hadoop data lake with data from growing mainframe data sets faster and more efficiently.
While more and more applications running on the mainframe use an RDBMS like Db2 as a data repository, data set (non-RDBMS) repositories are still prevalent in both legacy and new mainframe applications. Because data in data sets cannot always be queried and analyzed as quickly and easily as data in RDBMS systems, there is a growing need to keep these data sets in sync with a system where queries and analytics can be performed quickly, easily and at an effective cost. Recognizing this need, Syncsort has built DMX CDC, a Change Data Capture (CDC) add-on to its flagship Big Data integration tool, DMX-h. DMX CDC captures changes from IBM Db2 for z/OS and VSAM sources in near-real time. The changes can then be applied to Hive and Impala, or stored in HDFS or the cloud in different file formats for further processing. In this article we will focus on the challenges with VSAM data on the mainframe and the solutions and benefits offered by Syncsort’s DMX CDC.
VSAM, the Virtual Storage Access Method, is indisputably the most widely used data set type in mainframe enterprise applications. The term VSAM refers both to an access method and to a data set type. As an access method, it provides an efficient, high-performance mechanism for managing records on disk. As a data set type, it can exist in four different organization schemes, also known as VSAM types:
Key-Sequenced Data Set (KSDS), the most commonly used type, where records are indexed by keys and can be retrieved, inserted, updated or deleted by key value.
Entry-Sequenced Data Set (ESDS), where records are kept in sequential order and accessed as such.
Relative Record Data Set (RRDS), where record numbers are used as keys to access records.
Linear Data Set (LDS), a byte-stream data set in a traditional z/OS file. It is rarely used directly by applications.
Challenges with VSAM data
In addition to serving as backend storage for many enterprise mainframe batch applications, VSAM data sets are also very commonly used as backend storage for CICS applications. CICS (Customer Information Control System) is IBM’s z/OS transaction processing subsystem, which provides a transaction service for running applications online. CICS applications process enormous volumes of commercial transactions per day, including bank and ATM transactions, and the amount of data landing in VSAM data sets keeps growing. As capable a storage backend as VSAM data sets are, they also come with some challenges:
They occupy precious disk storage on the mainframe, and archiving them to tape is not always desirable given the I/O speed of tape.
They cannot easily be queried like data in RDBMS systems using a query language like SQL.
Applications running on the mainframe that perform analytics or report from VSAM data sets can consume precious CPU cycles and increase operational costs.
VSAM data sets are often not normalized. If records in a VSAM data set were to be migrated to a normalized database, dozens or even hundreds of tables would have to be created from one VSAM data set.
For the purposes of this article, all DMX CDC VSAM features and benefits discussed here also apply to IAM data sets. IAM, the Innovation Access Method, is a reliable, high-performance indexed access method alternative to VSAM. It implements the VSAM API and supports KSDS, ESDS, RRDS and alternate indexes. Like VSAM, it can be updated by batch applications and CICS.
DMX CDC Solutions and Benefits
DMX CDC can capture record changes made to VSAM data sets in near-real time, whether they are managed by CICS or updated by a batch application. Changes to CICS-managed VSAM data sets are always captured in real time. When changes are applied to VSAM data sets by batch applications, CICS VR (VSAM Recovery) can be used to capture the changes in near-real time; otherwise, the changes can be captured on demand using a diff utility tool provided as part of the DMX CDC installation. In all cases, DMX CDC uses the z/OS system logger facility to keep track of the changes. This works in the following way:
VSAM data sets are created or altered with LOGREPLICATE enabled. A log stream is assigned to the VSAM data set.
In CICS, assuming a VSAM-backed CICS application, all associated VSAM data sets have a name entry in the FCT (File Control Table) that can be up to 8 bytes long.
In that case, when CICS operates on a record, the log stream associated with the VSAM data set logs the record and what happened to it, using the FCT name as the identifier of the source of the change.
For batch applications, as in the CICS case, LOGREPLICATE is enabled and a log stream is assigned to the VSAM data set.
When a batch application operates on a record, the log stream associated with the VSAM data set logs the record and what happened to it, using the DDNAME as the identifier of the source of the change.
DMX CDC Diff Utility
Two VSAM data sets are required: a base data set and a changed data set.
The utility is run on the two data sets to compare them using a provided key. The differences are written to a specified log stream.
This approach works with most VSAM organization schemes.
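Conceptually, a key-based diff of this kind can be sketched as follows. This is an illustrative toy, not Syncsort’s implementation, and the record keys and values are invented:

```python
def diff_by_key(base, changed):
    """Compare two keyed snapshots and emit insert/update/delete events.

    base, changed: dicts mapping record key -> record contents.
    """
    events = []
    for key, rec in changed.items():
        if key not in base:
            events.append(("INSERT", key, rec))       # new record
        elif base[key] != rec:
            events.append(("UPDATE", key, rec))       # record changed
    for key in base:
        if key not in changed:
            events.append(("DELETE", key, base[key]))  # record removed
    return events

base    = {"A1": "alice", "B2": "bob"}
changed = {"A1": "alicia", "C3": "carol"}
for event in diff_by_key(base, changed):
    print(event)
```

The real utility writes the resulting change records to the assigned log stream, so downstream replication treats on-demand diffs and continuously logged changes the same way.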
Some of the benefits DMX CDC can offer to enterprises include:
Near-real-time replication of VSAM data sets to Hadoop data lakes.
Data can be kept unchanged in HDFS or the cloud, where it can be archived or further processed.
Data can be loaded into Hive or Impala databases and kept in sync with the VSAM data on the mainframe.
Using DMX-h’s capability to process complex COBOL copybook layouts, captured VSAM data set records can be cleaned and transformed on the fly, then made available to formats and tools like Tableau, Apache Avro, Apache Parquet, Hive, or Impala for analytics, data science and machine learning.
Captured VSAM data can be pushed to streaming platforms like Kafka and MapR Streams.
Reduced replication time and mainframe resource utilization, by transferring only the records that have changed instead of the full data set.
Avoidance of the cost and uncertainty of converting or migrating existing VSAM-backed applications to Db2 or other RDBMS systems just to gain RDBMS-style query features.
For more reading on the topic of mainframe data to Hadoop, refer to these articles:
In today’s world of constant security threats and breaches, enhanced compliance regulations and data protection capabilities have moved to the forefront for all organizations. Driven by compliance regulations such as FIPS, HIPAA and GDPR, organizations are challenging their IT departments to protect their data and digital assets with unprecedented urgency. Syncsort’s new eBook, Data Encryption in the Mainframe World, explains some of the tools and technologies that continue to emerge on the IBM z platform.
Enhanced data protection for many z/OS data sets, zFS file systems, databases, and Coupling Facility structures give users the ability to encrypt data without making costly application changes. z/OS encryption policies make it possible to use encryption to protect critical datasets and files while helping ensure compliance with governmental regulations and industry mandates.
In this eBook, you’ll learn more about data encryption in the mainframe world and how z/OS encryption policies make it possible to protect your organization’s most critical datasets and files while helping to ensure compliance with governmental regulations and industry mandates.
If you’ve ever walked into a grocery store, you’ve probably seen employees stocking shelves. What you may not know is that, in many cases, the people stocking those shelves aren’t employed by the grocery store, but rather by the vendor whose products they are stocking. And it makes sense: companies would sooner trust their own employees to handle their products properly than a busy, sometimes understaffed supermarket. This is how Terry Plath, Syncsort’s Senior Vice President of Support and Services, sees managed services.
“People whose focus is on a particular solution, like HA/DR, really know it inside out,” says Plath. And that knowledge is the basis for Syncsort’s managed services offering. Customers who use MIMIX® Availability or iTERA® Availability can trust Syncsort to handle the day-to-day management of their HA/DR environments. This takes stress off in-house staff while making sure the environment stays available and switch-ready. Customers can choose pre-packaged tiered options with additional a la carte features to personalize their experience, but one common element in managed services, according to Plath, is the knowledge of industry experts.
“We have a wide pool of experts with enough experience that there are no surprises in HA/DR anymore. They’ve seen it all.”
There are many managed services providers available, and picking the right one shouldn’t be left to chance. Plath provided four key tips for finding the provider that will work best with you:
Look for specialists
“As companies are looking to hand work off, you want to look for people who have the experience in the area you want,” says Plath. “A hosting service may not manage an HA/DR environment, and even an HA/DR provider may not service your specific solution.” Plath advises customers to understand the difference between the level of service provided by an HA managed services provider like Syncsort and a general cloud or hosting provider, in order to confirm that whoever is managing your HA solution is giving you exactly what you need, not too much or too little.
Have a thorough service level agreement
Clients should make clear the exact expectations of their managed services offering. Plath even encourages clients to include penalties for under-performance in their SLAs. When an entire IT environment is on the line, there is no room for guesswork or surprises.
Ensure a training and certification program is in place
The greatest benefit of using a managed services provider is access to often decades of industry expertise. And the best way to ensure you get the proper expertise is to confirm that your service provider uses formal training and certification to find the best talent to manage your environment.
Make sure there isn’t a lot of turnover
The relationship your company forms with a managed services provider should last for years, so it’s best to avoid providers that suffer a great deal of shakeup within their ranks. Consistency in staff is the most direct path to consistency in IT management. And Terry Plath says the best way to find out whether your provider has a turnover problem is simply to ask. Directness and openness will ensure that your professional relationship starts off on the right foot.
There are also misconceptions about managed services Mr. Plath would like to dispel. Chief among them is the fear that managed services is a way for companies to reduce jobs and outsource in-house labor. He notes that many of Syncsort’s managed services customers are actually companies that need additional help performing critical tasks for their HA/DR environments.
“They have system administrators who are stretched way too thin. Their servers may not be ready for failover because HA/DR is only, say, 1/16 of their job–and sometimes that’s being generous.” More and more businesses are waking up to the fact that leanness can reach a point of diminishing returns. Insufficient staffing can occasionally lead to mistakes and long-term risk of disaster. Syncsort Managed Services looks to bridge that gap, offering a simple answer that keeps everyone on-task and satisfied.
“The fact of the matter is managed services is not going to take anyone’s job,” says Plath, “I actually look at this as job security because you know there will be no surprises with your HA solution.”
The General Data Protection Regulation (GDPR) compliance deadline of May 25, 2018 has passed, but many organizations are still grappling with the data governance challenges it has created. Whether your organization conducts most, some or just a small amount of business in Europe, there are many aspects of data management you need to consider to comply with the regulation. Here are some great sources of information from industry experts to help bring you up to speed.
In our expert interview series, Paige Bartley, Senior Analyst for Data and Enterprise Intelligence at Ovum, explains that some organizations may not have met the May 25th deadline. These will likely be smaller organizations, often based outside of Europe, that have a minority of their customers or employees based in the EU. They will still have to adhere to the guidelines, such as the documentation of processes, the correction of false data, and the transfer and ownership of data, to name a few.
Data lineage, data quality, and data availability also are inherently linked to the GDPR and play a large part in compliance.
Data lineage is needed for the records of processing activities of personal data. It can account for how the data was handled, who handled it, and where it was handled.
Good data quality helps GDPR compliance initiatives because it means data subjects will have less incorrect data to correct. Data quality is both a driver of compliance and a product of it.
Data availability is cited directly in GDPR as part of Article 32’s requirement guidelines for the Security of Processing of personal data. High availability of systems, while not absolutely mandated, is highly encouraged for GDPR compliance.
Why the GDPR Matters Outside the EU
The GDPR applies not only to organizations that are based in Europe, but also to those that collect personal data from E.U. citizens who are located within the E.U., even if the company itself is not in the E.U. What this means in practice is that if you have, say, a website form that collects the personal information of visitors, and some of the people who fill it out are E.U. citizens who are located in the E.U. at the time that they fill out the form, that data could be subject to GDPR regulation. Similarly, if you partner with an organization that collects data from E.U. citizens, and some of that data is shared with you or otherwise comes under your ownership, the GDPR may also apply to the data.
Another reason why the GDPR matters outside of the E.U., and why it is a good idea to start planning for compliance now, is that the regulation may inspire similar frameworks in other jurisdictions in the future.
If you have a mainframe, the GDPR data management requirements may apply to it, even if the mainframe is not inside the European Union. If your company has any kind of presence in Europe, you may need to bring your mainframe data management practices up to speed with the GDPR, along with those of the rest of your infrastructure.
In a post on GDPR Compliance for the Mainframe we gave some key areas that organizations should focus on when becoming GDPR compliant: data erasure, data sovereignty, timely data recovery, data pseudonymization, and data encryption. This isn’t a full list for GDPR Compliance but it’s a great place to start.
GDPR and Machine Learning
GDPR Compliance is also changing the way that organizations approach machine learning.
Katharine Jarmul, founder of the KJamistan data science consultancy, stated that GDPR compliance changes some of the ways organizations have to inform users about automated processing of their data. Organizations will want to review their current notification process and make changes accordingly. What the GDPR gives people is the motivation to get started.
We’ve released a series of short webcasts in an effort to inform people of the importance of GDPR compliance. Check out these three great videos, which focus primarily on Data Quality, Capacity Management, and IBM i Security.
If you want to learn more about GDPR compliance and how Syncsort can help, be sure to read our eBook on Data Quality-Driven GDPR.
Data is a crucial driver of value for most businesses today. But when managed improperly, data can become more of a liability than a benefit. Keep reading for tips on creating a data management strategy that helps, rather than hurts, your company.
No matter which industry your business is in, it almost certainly relies on data to a significant extent. These days, you don’t have to be a big bank or an insurance company to live and breathe data.
You could also be, for example, a fast-food restaurant that relies on data-driven systems for ordering supplies and paying employees. You could be a construction contractor who uses data to help price bids. You could be an ice cream parlor that collects and analyzes data to determine which flavors to offer.
In all of these examples, data plays a key role in keeping the business operating normally and profitably — assuming that the data is well managed. Poorly managed data can not only undercut your ability to derive meaningful insights, it can also create extra costs and legal risks that threaten your business’s overall stability.
When that happens, your data ceases to be a boon for your business, and it becomes a danger instead.
Building a Healthy Data Management Strategy
That is why crafting a data management strategy that controls for the risks associated with data at your business is essential for using data wisely.
A data management strategy is your overall plan for storing, transforming, integrating, analyzing and archiving the data that your business collects. It applies to all types of data: Machine data from IT infrastructure, manually collected data like customer records, sales transaction data and much more.
When designing a data management strategy, you’ll want to take the following factors into account in order to ensure that your strategy is efficient and appropriate for your business needs:
Compliance. Increasingly, compliance frameworks like the GDPR are imposing government regulations on the way data must be stored, managed and secured. It’s crucial to identify which regulatory requirements apply to your company based on its location and industry and make sure that your data management strategy can meet those requirements.
SLAs. Service Level Agreements, or SLAs, are contractual guarantees that mandate that you provide a certain level of availability for your applications or services. SLAs typically involve more than just data availability, but because data availability is an important part of overall availability, you want to ensure that your data management strategy allows you to meet any SLA guarantees that you provide. SLAs are especially important if your business is the type that has a lot of external contracts and customers; they may be less important if your applications and infrastructure are used only internally.
Cost. The portion of your budget that you devote to data management will vary widely depending on factors such as the size of your infrastructure and whether you use commercial or open source data tools. In any case, however, keeping data management costs in check is important for protecting the overall financial health of your business. To control data management costs, consider questions such as how much it costs you to back up data, and how many backups you can afford without breaking the budget, for example. For another example, think about the time spent transforming data before it is usable, and whether decreasing that time using automated data transformation tools can help reduce your overall data management costs.
A data management strategy that is designed to address these needs will help to ensure that data continues to drive business value, rather than creating unnecessary business risks. This goal will only become more important as the amount of data your business collects and analyzes continues to grow.
In a big data environment, the notion of data quality that is “fit for purpose” is important. For some types of data science and analytics, raw, messy data is exactly what users want. Yet, even in this case, users need to know the data’s flaws and inconsistencies so that the unexpected insights they seek are based on knowledge, not ignorance. Syncsort’s new eBook, Strategies for Improving Big Data Quality for BI and Analytics, takes a look at applying data quality methods and technologies to big data challenges that fit an organization’s objectives.
As organizations grow dependent on the data they have stored in their big data repositories, or in the cloud, for a wider range of business decisions, they need data quality management to improve the data so that it is fit for each desired purpose.
Our TDWI checklist report offers six strategies for improving big data quality:
Design big data quality strategies that are fit for each purpose
Focus on the most important data quality objectives for your requirements
Perform data quality processes natively on cloud and big data platforms
Reduce analytics and BI delays by applying flexible data quality processes
Provide data lineage tracking as part of data quality processes
Use data quality to improve governance and regulatory compliance