This free 1-hour webinar from GigaOm Research brings together experts in AI and data analytics, featuring GigaOm analyst William McKnight and a special guest from Microsoft. The discussion will focus on the promise AI holds for organizations of every industry and size, and on how to overcome today's challenges of preparing the organization for AI and planning AI applications.

The foundation for AI is data. You must have enough data to analyze to build models. Your data determines the depth of AI you can achieve — for example, statistical modeling, machine learning, or deep learning — and its accuracy. The increased availability of data is the single biggest contributor to the uptake in AI where it is thriving. Indeed, data’s highest use in the organization soon will be training algorithms. AI is providing a powerful foundation for impending competitive advantage and business disruption.

In this 1-hour webinar, you will discover:

  • AI’s impending effect on the world
  • Data’s new highest use: training AI algorithms
  • Know & change behavior
  • Data collection
  • Corporate Skill Requirements

You’ll learn how organizations need to be thinking about AI and the data for AI.

Register now to join GigaOm and Microsoft for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • CDOs
  • Business Analysts
  • Data Analysts
  • Data Engineers
  • Data Scientists

GDPR day has come and gone, and the world is still turning, just about. Some remarked that it was like the Y2K day we never had; whereas the latter’s impact was a somewhat damp squib, the former has caused more of a kerfuffle: however much the authorities might say, “It’s not about you,” it has turned out that it is about just about everyone in a job, for better or worse.

I like the thinking behind GDPR. The notion that your data was something that could be harvested, processed, bought and sold, without you having a say in the matter, was imbalanced to say the least. Data monetisers have been good at following the letter of the law whilst ignoring its spirit, which is why it is the regulation's newly expressed spirit of unambiguous clarity and agreement that is so powerful.

Meanwhile, I don’t really have a problem with the principle of advertising. A cocktail menu in a bar could be seen as context-driven, targeted marketing, and rightly so as the chances are the people in the bar are going to be on the look-out for a cocktail. The old adage of 50% of advertising being wasted (but nobody knows which 50%) helps no-one so, sure, let’s work together on improving its accuracy.

The challenge, however, comes from the nature of our regulatory processes. GDPR has been created over a long period of time, by a set of international committees with all of our best interests at heart. The resulting regulation is not only slow to arrive but also, inevitably, a compromise based on past use of technology. Note that even as the Cambridge Analytica scandal still looms, Facebook's position remains that it acted within the law.

Even now, our beloved corporations are looking at how they can work within the law and yet continue to follow the prevailing mantra of the day, which is to monetise data. This notion has taken a bit of a hit, largely because businesses now need to be much clearer about what they are doing with that data. “We will be selling your information” doesn’t quite have the same innocuous ring as “We share data with partners.”

To achieve this, most attention is on what GDPR doesn’t cover, notably around personally identifiable information (PII). In layperson’s terms, if I cannot tell who the specific person is that I am marketing to, then I am in the clear. I might still know that the ‘target’ is a left-leaning white male, aged 45-55, living in the UK, with a propensity for jazz, an iPhone 6 and a short political fuse, and all manner of other details. But nope, no name and email address, no pack-drill.

Or indeed, I might be able to exchange obfuscated details about a person with another provider (such as Facebook again), which happen to match similarly obfuscated details — a mechanism known as hashing. As long as I am not exchanging PII, again, I am not in breach of GDPR. Which is all well and good apart from the fact that it just shows how advertisers don’t need to know who I am in order to personalise their promotions to me specifically.
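To make that hashing mechanism concrete, here is a minimal sketch assuming SHA-256 over normalised email addresses; real matching services add salting and contractual safeguards, so treat this purely as an illustration of the principle rather than how any particular ad platform does it.

```python
import hashlib

def normalise_and_hash(email: str) -> str:
    """Hash a normalised email address so it can be compared without revealing it."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Each party hashes its own list locally...
our_hashes = {normalise_and_hash(e) for e in ["jon@example.com", "sam@example.org"]}
their_hashes = {normalise_and_hash(e) for e in ["sam@example.org", "kim@example.net"]}

# ...and only the digests are exchanged; the overlap identifies shared users
# without either side ever handing over a raw name or email address.
matched = our_hashes & their_hashes
print(len(matched), "matching profiles")
```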

As I say, I don’t really have a problem with advertising done right (I doubt many people do): indeed, the day on which sloppy retargeting can be consigned to the past (offering travel insurance once one has returned home, for example) cannot come too soon. However, I do have a concern that the regulation we are all finding so onerous is not actually achieving one of its central goals.

What can be done about this? I think the answer lies in renewing the contractual relationship between supplier and consumer, not in terms of non-ambiguity over corporate use of data, but to recognise the role of consumer as a data supplier. Essentially, if you want to market to me, then you can pay for it — and if you do, I’m prepared to help you focus on what I actually want.

We are already seeing these conversations start to emerge. Consider the recent story about a man selling his Facebook data on eBay; meanwhile at a recent startup event I attended, an organisation was asked about how a customer could choose to reveal certain aspects of their lifestyle, to achieve lower insurance premiums.

And let’s not forget AI. I’d personally love to be represented by a bot that could assess my data privately, compare it to what was available publicly, then perhaps do some outreach on my behalf. Remind me that I needed travel insurance, find the best deal and print off a contract without me having to fall on the goodwill of the corporate masses.

What all of this needs is the idea that individuals are not simply hapless pawns to be protected (from where comes the whole notion of privacy), but active participants in an increasingly algorithmic game. Sure, we need legislation against the hucksters and tricksters, plus continued enforcement of the balance between provider and consumer which is still tipped strongly towards “network economy” companies.

But without a recognition that individuals are data creators, whose interests extend beyond simple privacy rights, regulation will only become more onerous for all sides, without necessarily delivering the benefits it was set out to achieve.

P.S. Cocktail, anyone? Mine’s a John Collins.

Follow @jonno on Twitter.


Enterprises everywhere are on a quest to use their data efficiently and innovatively, and to maximum advantage, both in terms of operations and competitiveness. The advantages of doing so are largely taken on authority, and reasonably so: analyzing your data helps you better understand how your business actually runs.

Such insights can help you see where things can improve, and can help you make instantaneous decisions when required by emergent situations. You can even use your data to build predictive models that help you forecast operations and revenue, and, when applied correctly, these models can be used to prescribe actions and strategy in advance.

That today’s technology allows businesses to do this is exciting and inspiring. Once such practice becomes widespread, we’ll have trouble believing that our planning and decision-making wasn’t always data-driven.

Bumps in the Road

While analytics software has become enormously powerful, the enterprise data integration needed to exploit that power has become much harder.

But we need to be cautious here. Even though the technical breakthroughs we’ve had are impressive and truly transformative, there are some dependencies – prerequisites – that must be met in order for these analytics technologies to work properly. If we get too far ahead of those requirements, then we will not succeed in our initiatives to extract business insights from data.

The dependencies concern the collection, the cleanliness, and the thoughtful integration of the organization’s data within the analytics layer. And, in an unfortunate irony, while analytics software has become so powerful, the integration work that’s needed to exploit that power has become more difficult.

From Consolidated to Dispersed

The reason for this added difficulty is the fragmentation and distribution of an organization’s data. Enterprise software, for the most part, used to run on-premises and much of its functionality was consolidated into a relatively small stable of applications, many of which shared the same database platform. Integrating the databases was a manageable process if proper time and resources were allocated.

But with so much enterprise software functionality now available through Software as a Service (SaaS) offerings in the cloud, bits and pieces of an enterprise’s data are now dispersed through different cloud environments on a variety of platforms. Pulling all of this data together is a unique exercise for each of these cloud applications, multiplying the required integration work many times over.

Even on-premises, the world of data has become complex. The database world was once dominated by three major relational database management system (RDBMS) products, but that’s no longer the case. Now, in addition to the three commercial majors, two open source RDBMSs have joined them in enterprise popularity and adoption. And beyond the RDBMS world, various NoSQL databases and Big Data systems, like MongoDB and Hadoop, have joined the on-premises data fray.

A Way Forward

A major question emerges. As this data fragmentation is not merely an exception or temporary inconvenience, but rather the new normal, is there a way to approach it holistically? Can enterprises that must solve the issue of data dispersal and fragmentation at least have a unified approach to connecting to, integrating, and querying that data? While an ad hoc approach to integrating data one source at a time can eventually work, it’s a very expensive and slow way to go, and yields solutions that are very brittle.

In this report, we will explore the role of application programming interfaces (APIs) in pursuing the comprehensive data integration that is required to bring about a data-driven organization and culture. We’ll discuss the history of conventional APIs, and the web standards that most APIs use today. We’ll then explore how APIs and the metaphor of a database with tables, rows, and columns can be combined to create a new kind of API. And we’ll see how this new type of API scales across an array of data sources and is more easily accessible than older API types, by developers and analysts alike.
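As a rough sketch of the table metaphor the report goes on to explore, the snippet below wraps a generic JSON REST endpoint so its records can be read as rows and columns. The ApiTable class, the example URL and the "orders" resource are all hypothetical; only the widely used requests library is assumed, and this is not the specific API technology the report describes.

```python
import requests

class ApiTable:
    """Illustrative wrapper that presents a JSON REST endpoint as rows and columns."""

    def __init__(self, base_url: str, resource: str):
        self.url = f"{base_url}/{resource}"

    def columns(self) -> list:
        rows = self.rows()
        return sorted(rows[0].keys()) if rows else []

    def rows(self, **filters) -> list:
        # Query parameters play the role of a WHERE clause.
        response = requests.get(self.url, params=filters, timeout=10)
        response.raise_for_status()
        return response.json()

# Hypothetical usage against an API that returns a list of JSON objects:
# orders = ApiTable("https://api.example.com", "orders")
# open_orders = orders.rows(status="open")
```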


Many companies in the corporate world have attempted to set up their first data lake.  Maybe they bought a Hadoop distribution, and perhaps they spent significant time, money and effort connecting their CRM, HR, ERP and marketing systems to it.  And now that these companies have well-crafted, centralized data repositories, in many cases…they just sit there.

But maybe data lakes fall into disuse because they’re not being looked at for what they are.  Most companies see data lakes as auxiliary data warehouses.  And, sure, you can use any number of query technologies against the data in your lake to gain business insights.  But consider that data lakes can – and should – also serve as the foundation for operational, real-time corporate applications that embed AI and predictive analytics.

These two uses of data lakes — for (a) operational applications as well as for (b) insights and predictive analysis — aren’t mutually exclusive, either. With the right architecture, one can dovetail gracefully into the other.  But what database technologies can query and analyze, build machine learning models, and power microservices and applications directly on the data lake?

Join us for this free 1-hour webinar from GigaOm Research. The webinar features GigaOm analyst Andrew Brust and Splice Machine CEO and Co-Founder Monte Zweben. The discussion will explore how to leverage data lakes as the underpinning of application platforms, driving efficient operations and predictive analytics that support real-time decisions.

In this 1-hour webinar, you will discover:

  • Why data latency is the enemy and data currency is key to digital transformation success
  • Why operational database workloads, analytics and construction of predictive models should not be segregated activities
  • How operational databases can support continually trained predictive models

Register now to join GigaOm Research and Splice Machine for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • Chief Data Officers
  • Digital Transformation Facilitators
  • Application Developers
  • Business Analysts
  • Data Engineers
  • Data Scientists

Once data is under management in its best-fit leveragable platform in an organization, it is as prepared as it can be to serve its many callings. It is in a position to be used operationally and analytically, across the spectrum of need. Ideas emerge from business areas no longer encumbered with the burden of managing data, which can be 60% to 70% of the effort of bringing an idea to reality. Walls of distrust in data come down, and the organization can truly excel with an important barrier to success removed.

An important goal of the information management function in an organization is to get all data under management by this definition, and to keep it under management as systems come and go over time.

Master Data Management (MDM) is one of these key leveragable platforms. It is the elegant place for data with widespread use in the organization. It becomes the system of record for customer, product, store, material, reference and all other non-transactional data. MDM data can be accessed directly from the hub or, more commonly, mapped and distributed widely throughout the organization. This use of MDM data does not even account for the significant MDM benefit of efficiently creating and curating master data to begin with.

MDM benefits are many, including hierarchy management, data quality, data governance/workflow, data curation, and data distribution. One overlooked benefit is just having a database where trusted data can be accessed. Like any data for access, the visualization aspect of this is important. With MDM data having a strong associative quality to it, the graph representation works quite well.

Graph traversals are a natural way of analyzing network patterns. Graphs can handle high degrees of separation with ease and facilitate visualization and exploration of networks and hierarchies. Graph databases themselves are no substitute for MDM, as they provide only one of the many necessary functions that an MDM tool does. However, when graph technology is embedded within MDM, as IBM is doing in InfoSphere MDM – similar to AI (link) and blockchain (link) – it is very powerful.
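As a small illustration of the traversals described above, the sketch below builds a toy master-data graph and walks a few degrees of separation from a single record. It uses the open source networkx library; the node names and relationships are invented for the example and are not tied to InfoSphere MDM or any other product.

```python
import networkx as nx

# Toy master-data graph: customers linked to households and to each other.
g = nx.Graph()
g.add_edge("Customer:Alice", "Household:17")
g.add_edge("Customer:Bob", "Household:17")
g.add_edge("Customer:Bob", "Customer:Carol", relation="referred")
g.add_edge("Customer:Carol", "Household:42")

# Traverse several degrees of separation out from a single golden record.
within_two_hops = nx.single_source_shortest_path_length(g, "Customer:Alice", cutoff=2)
print(sorted(node for node, depth in within_two_hops.items() if depth > 0))
```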

Graph technology is one of the many ways to facilitate self-service to MDM. Long a goal of business intelligence, self-service has significant applicability to MDM as well. Self-service is opportunity oriented. Users may want to validate a hypothesis, experiment, innovate, and so on. Long development cycles or laborious processes between a user and the data can be frustrating.

Historically, the burden for all MDM functions has fallen squarely on a centralized development function. That function is overloaded and, as with the self-service business intelligence movement, needs disintermediation. IBM is fundamentally changing this dynamic with the next release of InfoSphere MDM. Its self-service data import, matching, and lightweight analytics allow the business user to find, share and get insight from both MDM and other data.

Then there’s Big Match. Big Match can analyze structured and unstructured customer data together to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to grow and curate customer information. The majority of the information in your organization that is not under management is unstructured data. Unstructured data has always been a valuable asset to organizations, but it can be difficult to manage. Emails, documents, medical records, contracts, design specifications, legal agreements, advertisements, delivery instructions, and other text-based sources of information do not fit neatly into tabular relational databases. Most BI tools on MDM data offer the ability to drill down and roll up data in reports and dashboards, which is good. But what about the ability to “walk sideways” across data sources to discover how different parts of the business interrelate?

Using unstructured data for customer profiling allows organizations to unify diverse data from inside and outside the enterprise, even the “ugly” stuff: dirty data that is incompatible with highly structured, fact-dimension data and that would have been too costly to combine using traditional integration and ETL methods.

Finally, unstructured data management enables text analytics, so that organizations can gain insight into customer sentiment, competitive trends, current news trends, and other critical business information. In text analytics, everything is fair game for consideration, including customer complaints, product reviews from the web, call center transcripts, medical records, and comment/note fields in an operational system. Combining unstructured data with artificial intelligence and natural language processing can extract new attributes and facts for entities such as people, location, and sentiment from text, which can then be used to enrich the analytic experience.
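As a hedged illustration of that kind of extraction, the sketch below pulls named entities out of a free-text call-centre note using the open source spaCy library (our choice for the example, not necessarily what any MDM product uses); the note text is invented and the small English model is assumed to be installed.

```python
import spacy

# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

note = ("Customer called from Chicago on 3 May about the Model X dishwasher; "
        "she was unhappy with the delivery delay.")

doc = nlp(note)
# Named entities (people, places, products, dates) become candidate attributes
# that could enrich a master data record.
for ent in doc.ents:
    print(ent.text, ent.label_)
```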

All of these uses and capabilities are enhanced if they can be provided using a self-service interface that users can easily leverage to enrich data from within their apps and sources. This opens up a whole new world for discovery.

With graph technology, distribution of the publishing function, and the integration of all data including unstructured data, MDM can truly bring important data under management, empower the business user, serve as a cornerstone of digital transformation and be genuinely self-service.


In a normal master data management (MDM) project, a current state business process flow is built, followed by a future state business process flow that incorporates master data management. The current state is usually ugly as it has been built piecemeal over time and represents something so onerous that the company is finally willing to do something about it and inject master data management into the process. Many obvious improvements to process come out of this exercise and the future state is usually quite streamlined, which is one of the benefits of MDM.

My contention today is that these future state processes are seldom as optimized as they could be.

Consider the following snippet, supposedly part of an optimized future state.

This leaves four people in the process who manually look at the product, do their (unspecified) thing and (hopefully) pass it along, but who may also send it backwards to an upstream participant based on nothing evident in particular.

The challenge for MDM is to optimize the flow. I suggest that many of the “approval jails” in business process workflow are ripe for reengineering. What criteria are used? They are probably based on data that will now be in MDM. If training data for machine learning (ML) is available, we can not only recreate past decisions to automate future ones; we can also look at the outcomes of those decisions, work out which decisions should have been made, and make them automatically, speeding up the flow and improving quality by an order of magnitude.
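As a minimal sketch of what learning an "approval jail" from training data might look like, the snippet below fits a classifier to a hypothetical history of past approve/reject decisions and backtests it. The feature names, the tiny dataset and the use of scikit-learn are illustrative assumptions, not a description of any particular MDM product's capability.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical history of past approval-jail decisions on product records.
history = pd.DataFrame({
    "completeness_pct": [98, 55, 87, 100, 62, 91, 40, 99],
    "price_delta_pct":  [1.0, 12.0, 2.5, 0.0, 9.0, 3.0, 20.0, 0.5],
    "new_supplier":     [0, 1, 0, 0, 1, 0, 1, 0],
    "approved":         [1, 0, 1, 1, 0, 1, 0, 1],
})

X = history.drop(columns="approved")
y = history["approved"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train on past decisions, then backtest on held-out records.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print("backtest accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Records the model approves with high confidence could skip the manual step.
```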

This concept of thinking ahead and automating decisions extends to other kinds of steps in a business flow that involve data entry, including survivorship determination. As with acceptance and rejection, data entry is highly predictable, whether it is a selection from a drop-down or free-form entry. Again, with training data and backtesting, probable contributions at that step can be generated and either entered automatically or offered as defaults for approval. The latter approach can be used while a comfort level grows.

Manual, human-scale processes are ripe for the picking, and it’s really a dereliction of duty to “do” MDM without significantly streamlining processes, much of which is achieved by eliminating the manual steps. As data volumes mount, it is often the only way to keep process times from growing. At the least, prioritizing stewardship activities, or routing activities to specific stewards based on an ML interpretation of past results (quality, quantity), is required. This approach is paramount to having timely, data-infused processes.

As a modular and scalable trusted analytics foundational element, the IBM Unified Governance & Integration platform incorporates advanced machine learning capabilities into MDM processes, simplifying the user experience and adding cognitive capabilities.

Machine learning can also discover master data by looking at actual usage patterns. ML can source, suggest or utilize external data that would aid in the goal of business processes. Another important part of MDM is data quality (DQ). ML’s ability to recommend and/or apply DQ to data, in or out of MDM, is coming on strong. Name-identity reconciliation is a specific example but generally ML can look downstream of processes to see the chaos created by data lacking full DQ and start applying the rules to the data upstream.
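Name-identity reconciliation of the kind mentioned above often begins with simple fuzzy matching. The standard-library sketch below scores candidate names against a golden record; the names and the 0.8 threshold are invented, and production matching uses far more sophisticated probabilistic techniques than this.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude similarity score between two name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

golden = "Jonathon Collins"
candidates = ["Jon Collins", "Jonathan Collins", "J. Colins", "Joan Colman"]

for name in candidates:
    score = name_similarity(golden, name)
    flag = "likely match" if score > 0.8 else "route to steward"
    print(f"{name:20s} {score:.2f}  {flag}")
```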

IBM InfoSphere Master Data Management utilizes machine learning to speed the data discovery, mapping, quality and import processes.

In the last post (link), I postulated that blockchain would impact MDM tremendously. In this post, it’s machine learning affecting MDM. (Don’t get me started on graph technology). Welcome to the new center of the data universe. MDM is about to undergo a revolution. Products will look much different in 5 years. Make sure your vendor is committed to the MDM journey with machine learning.


Master Data Management (MDM) is an approach to the management of golden records that has been around for over a decade, and it is finding a growth spurt lately as some organizations exceed their pain thresholds in the management of common data. Blockchain has a slightly shorter history, coming aboard with bitcoin, but it too is seeing a revolution these days as data gets distributed far and wide and trust takes center stage in business relationships.

Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate. However, good ideas wait for no one and today’s idea is MDM on Blockchain.

Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider. As a matter of fact, master data distribution is now usually the most time-intensive and unwieldy part of an MDM implementation. The blockchain removes overhead, costs and unreliability from authenticated peer-to-peer network partner transactions involving data exchange. It can support one of the big challenges of MDM with governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.

Another core MDM challenge is arriving at the “single version of the truth”. It’s elusive even with MDM, because everyone must tacitly agree to the process used to instantiate the data in the first place. While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, that process is still subject to criticism. The group consensus that blockchain achieves acts as a governance proxy for that elusive “single version of the truth”, providing agreement on trust as well as full lineage of data.

Blockchain enables the major components and tackles the major challenges in MDM.

Blockchain provides a distributed database, as opposed to a centralized hub, that can store certified data in perpetuity. By storing timestamped and linked blocks, the blockchain is unalterable and permanent. Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude, since blockchain removes the friction of a normal process.
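To see why timestamped, linked blocks are effectively unalterable, here is a deliberately minimal sketch of the chaining idea in plain Python. It is not Hyperledger or any production ledger; it simply shows that altering an earlier block would break the hash link to every block after it.

```python
import hashlib
import json
import time

def make_block(master_record: dict, previous_hash: str) -> dict:
    """Append-only block: the hash covers the payload, timestamp and the previous hash."""
    block = {
        "record": master_record,
        "timestamp": time.time(),
        "previous_hash": previous_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block({"customer_id": "C-001", "name": "Acme Ltd"}, previous_hash="0" * 64)
update = make_block({"customer_id": "C-001", "name": "Acme Limited"}, genesis["hash"])

# Tampering with the first block would change its hash and break this link.
print(update["previous_hash"] == genesis["hash"])
```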

Blockchain uses pre-defined rules that act as gatekeepers of data quality and govern the way in which data is utilized. Blockchains can be deployed publicly (like bitcoin) or internally (like an implementation of Hyperledger). There could be a blockchain per subject area (like customer or product) in the implementation. MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though utilization of public blockchains is inevitable.

A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data including public company information and common contract clauses with only counterparties able to see the content and destination of the communication.

Hyperledger is quickly becoming the standard for open source blockchain. Hyperledger is hosted by The Linux Foundation. IBM, with the Hyperledger Fabric, is establishing the framework for blockchain in the enterprise. Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.

Data management is about the right data at the right time, and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage. MDM can utilize the blockchain for distribution and governance, and blockchain can clearly utilize the great master data produced by MDM. Blockchain data needs data governance like any data; indeed, it needs it more, given its importance on the network.

MDM and blockchain are going to be intertwined from now on. Together they enable the key components of establishing and distributing the single version of the truth. Blockchain enables trusted, governed data, integrates that data across broad networks, prevents duplication and provides data lineage.

It will start in MDM in niches that demand these traits such as financial, insurance and government data. You can get to know the customer better with native fuzzy search and matching in the blockchain. You can track provenance, ownership, relationship and lineage of assets, do trade/channel finance and post-trade reconciliation/settlement.

Blockchain is now a disruption vector for MDM. MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, such as what IBM InfoSphere Master Data Management is doing this year. Others will lose ground.


Streaming and real-time data have high business value, but that value can decay rapidly if the data is not processed quickly. If the value of the data is not realized within a certain window of time, it is lost and the decision or action it should have triggered never occurs. Streaming data – whether from sensors, devices, applications, or events – needs special attention because a sudden price change, a critical threshold met, a sensor reading changing rapidly, or a blip in a log file can all be of immense value, but only if the alert arrives in time.

In this webinar, we will review the landscape of streaming data and message queueing technology and introduce and demonstrate a method for an organization to assess and benchmark—for their own current and future uses and workloads—the technologies currently available. We will also reveal the results of our own execution of the OpenMessaging benchmark on workloads for two of the platforms: Apache Kafka and Apache Pulsar.
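For a toy sense of the end-to-end latency being discussed, the sketch below publishes timestamped probe messages and measures how long they take to come back, assuming a local Kafka broker and the kafka-python client. It is only an illustration of the metric; the OpenMessaging benchmark covered in the webinar is far more rigorous and multi-platform.

```python
import time
from kafka import KafkaConsumer, KafkaProducer

TOPIC = "latency-probe"
BROKER = "localhost:9092"  # assumes a locally running Kafka broker

# Publish a handful of timestamped probe messages.
producer = KafkaProducer(bootstrap_servers=BROKER)
for _ in range(10):
    producer.send(TOPIC, str(time.time()).encode("utf-8"))
producer.flush()

# Read them back and report publish-to-consume latency per message.
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                         auto_offset_reset="earliest", consumer_timeout_ms=5000)
for message in consumer:
    sent_at = float(message.value.decode("utf-8"))
    print(f"latency: {(time.time() - sent_at) * 1000:.1f} ms")
```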

What Will Be Discussed:

  • The Evolution of Queuing, Messaging, and Streaming
  • Today’s Technology Landscape
  • Assessing Performance: The OpenMessaging Benchmark
  • Considerations for Your Evaluation

Register now and join GigaOm Research and our sponsor Streamlio for this free expert webinar.

Who Should Attend:

  • Enterprise developers
  • CIOs, purchasers and recommenders of data platforms
  • Data engineers and solution architects
  • Developer managers
  • IT decision makers

This free 1-hour webinar from GigaOm Research brings together leading minds in cloud data analytics, featuring GigaOm analyst Andrew Brust, joined by guests from cloud big data platform pioneer Qubole and cloud data warehouse juggernaut Snowflake Computing. The roundtable discussion will focus on enabling Enterprise ML and AI by bringing together data from different platforms, with efficiency and common sense.

In this 1-hour webinar, you will discover:

  • How the elasticity and storage economics of the cloud have made AI, ML and data analytics on high-volume data feasible, using a variety of technologies.
  • That the key to success in this new world of analytics is integrating platforms, so they can work together and share data
  • How this enables building accurate, business-critical machine learning models and produces the data-driven insights that customers need and the industry has promised

You’ll learn how to make the lake, the warehouse, ML and AI technologies and the cloud work together, technically and strategically.

Register now to join GigaOm Research, Qubole and Snowflake for this free expert webinar.

Who Should Attend:

  • CIOs
  • CTOs
  • CDOs
  • Business Analysts
  • Data Analysts
  • Data Engineers
  • Data Scientists

Many conversations around GDPR seem to follow a sequence similar to Dave Lister’s experience in the opening episode of Red Dwarf.

Holly: They’re all dead. Everybody’s dead, Dave.

Lister: Peterson isn’t, is he?

Holly: Everybody’s dead, Dave!

Lister: Not Chen!

Holly: Gordon Bennett! Yes, Chen. Everyone. Everybody’s dead, Dave!

Lister: Rimmer?

Holly: He’s dead, Dave. Everybody is dead. Everybody is dead, Dave.

Lister: Wait. Are you trying to tell me everybody’s dead?

So, yes, GDPR affects all kinds of data. Big data, small data, structured and unstructured data, online and offline, backup and archive, open or grey, digital or paper-based data. It’s all data, and therefore GDPR applies to it.

This simultaneously makes the task of GDPR compliance very easy, and very difficult. Easy, because decision makers don’t have to worry about what data is involved. And very difficult, because few organizations have a clear handle on what data is stored where. That filing cabinet in the back of a warehouse, the stack of old tapes on top of a cupboard, that rack of servers which were turned off… yeah, all of them.

Because that’s not the focus of GDPR, you know, the technology gubbins, complexity and all that. The regulation quite deliberately focuses on personally identifiable information and its potential impact on people, rather than worrying about the particular ramifications of this or that historical solution, process or lack of one.

At the same time, this does suggest quite a challenge. “But I don’t know what I have!” is a fair response, even if it is tinged with an element of panic. Here’s some other good news, however: laws around data protection, discovery, disclosure and so on never distinguished between the media upon which data was stored, nor its location.

You were always liable, and still are. The difference is that we now have a more consistent framework (which means fewer loopholes), a likelihood of stronger enforcement and, indeed, potentially bigger fines. To wit, one conversation I had with a local business: “So, this is all stuff we should have been doing anyway?” Indeed.

Of course, this doesn’t make it any easier. It is unsurprising that technology companies and consulting firms, legal advisors and other third parties are lining up to help us all deal with the situation: supply is created by, and is doing its level best to catalyse, demand. Search and information management tools vendors are making hay, and frankly, rightly so if they solve a problem.

If I had one criticism, however, it is that standard IT vendor and consulting trick of only asking the questions they can answer. When you have a hammer, all the world is a nail, goes the adage. Even if a nail-filled world seems attractive to purveyors of fine hammers, they should still be asking what purpose the nails are to serve.

To wit, for example, KPMG’s quick scan of unstructured data to identify (say) credit card numbers. Sure, it may serve a purpose. But the rhetoric (“Complete coverage, get in control over unstructured data on premises and in the cloud.”) implies that a single piece of (no doubt clever) pattern matching software can somehow solve a goodly element of your GDPR woes.
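For a sense of what such a scan actually does, and how small a slice of the GDPR picture it covers, here is a toy sketch that finds card-like numbers in text and filters false positives with the Luhn check. It illustrates generic pattern matching only; it is not a description of KPMG's or any vendor's tool.

```python
import re

# Sequences of 13-16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: weeds out digit runs that cannot be real card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

text = "Please charge 4111 1111 1111 1111 and call me on 020 7946 0000."
for match in CARD_PATTERN.finditer(text):
    candidate = match.group()
    print(candidate, "-> looks like a card" if luhn_valid(candidate) else "-> false positive")
```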

As I have written before, if you want to get there, don’t start from the place which looks at data and says “Is this bit OK? What about this bit?” A better starting point is the regulation, its rules around the kinds of data you can process and why, as documented by the Information Commissioner’s Office (ICO).  The “lawful bases” offer a great deal of clarity, and start discussions from the right point.

Mapping an understanding of what you want to do with data, against what data you need, is not cause for concern. In the vast majority of cases, this is no different to what you would do when developing an information management strategy, undertaking a process modelling exercise, or otherwise understanding what you need to do business efficiently and effectively.

The thing GDPR rules out is use of personal data people didn’t want you to have, to fulfil purposes they didn’t want you to achieve. For example, use of ‘cold lists’ by direct marketing agencies may become more trouble than it is worth — both the agency, and the organization contracting them, become culpable. Equally, selling someone’s data against their will. That sort of thing.

But meanwhile, if you were thinking of harvesting maximum amounts of data about, well, anybody, because you were thinking you could be monetizing or otherwise leveraging it, or you were buying data from others and looking to use it to sell people things, goods or services, you should probably look for other ways to make money that are less, ahm, exploitative.

But if you have concerns about GDPR, and you are ‘just’ a traditional business doing traditional kinds of things, you have an opportunity to revisit your information management strategy, policies and so on. If these are out of date, chances are your business is running less efficiently than it could be, so how about spending to save and building in compliance in the process?

Across the board right now, you can get up to speed with what GDPR means for the kind of business you run, using the free helplines the regulators (such as the ICO) offer. If you are concerned, speak to a lawyer. And indeed, talk to vendors and consulting firms about how they are helping their customers, but be aware that their perspective will link to the solutions they offer.

Thank you to Criteo and Veritas, whose briefings and articles were very useful background when writing this article. As an online display advertising firm, Criteo is keenly aware of questions around personal vs pseudonymous data, as well as the legal bases for processing. Veritas offers solutions for analysis of unstructured data sources, and has GDPR modules and methodologies available.
