Deep learning is a form of machine learning that models patterns in data as complex, multi-layered networks. Because deep learning is among the most general ways to model a problem, it has the potential to solve difficult problems, such as computer vision and natural language processing, that outstrip both conventional programming and other machine learning techniques.
Not only can deep learning produce useful results where other methods fail, it can also build more accurate models and reduce the time needed to build a useful model. However, training deep learning models requires a great deal of computing power, and the resulting models are difficult to interpret.
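The "multi-layered network" idea above can be made concrete with a tiny forward pass. This is a purely illustrative sketch with hand-picked weights (real networks learn their weights from data and have many more layers and units):

```python
def relu(xs):
    # Non-linear activation: without it, stacked layers collapse into one linear map.
    return [max(0.0, x) for x in xs]

def dense(inputs, weights, biases):
    # One fully connected layer: output_j = sum_i inputs[i] * weights[i][j] + biases[j]
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]

# Toy two-layer network: 2 inputs -> 2 hidden units -> 1 output
w1 = [[0.5, -0.2], [0.3, 0.8]]
b1 = [0.1, 0.0]
w2 = [[1.0], [-1.0]]
b2 = [0.0]

hidden = relu(dense([1.0, 2.0], w1, b1))
output = dense(hidden, w2, b2)
```

Each layer transforms its input and passes the result to the next; "deep" simply means many such layers stacked, which is what lets these models fit very complex patterns.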
Data and big data analytics are the lifeblood of any successful business. Getting the technology right can be challenging, but building the right team with the right skills to undertake big data initiatives can be even harder.
Successfully deploying big data initiatives requires more than data scientists and data analysts. It requires data architects who design the "blueprint" for your enterprise data management framework, and it requires data engineers who can build that framework and the data pipelines to bring in, process, and create business value out of data.
The financial services sector is pouring money into artificial intelligence (AI), with banks, for example, expected to spend $5.6 billion on AI in 2019 – second only to the retail sector.
Until now, the vast majority of AI projects have remained pilots, and in many cases those projects led to tech deployments without a clear business use.
Simply put, it's been trendy.
Most AI projects today aim to improve customer service efficiency and security, either by introducing chatbot technology or by deploying machine learning to uncover trends in customer behavior and needs across business lines.
Databricks, the company founded by the original developers of Apache Spark, has released Delta Lake, an open source storage layer for Spark that provides ACID transactions and other data-management functions for machine learning and other big data work.
Many kinds of data work need features like ACID transactions or schema enforcement for consistency, metadata management for security, and the ability to work with discrete versions of data. Such features don’t come standard with every data source, so Delta Lake provides them for any Spark DataFrame data source.
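Delta Lake delivers those guarantees through an ordered transaction log of commits, each of which defines a new immutable version of the table. The following is a toy, in-memory sketch of that versioned-commit idea (not Delta Lake's actual on-disk format or API), just to show why atomic commits and "time travel" reads go together:

```python
class ToyVersionedTable:
    """Append-only commit log: every commit publishes a new immutable version."""

    def __init__(self):
        self._commits = []  # each entry is the full snapshot that commit produced

    def commit(self, rows):
        # A writer prepares the new snapshot, then publishes it in one step,
        # so a reader never observes a half-written version (the "A" in ACID).
        base = self._commits[-1] if self._commits else []
        self._commits.append(base + list(rows))
        return len(self._commits) - 1  # the version number just created

    def read(self, version=None):
        # Omitting `version` reads the latest snapshot; passing an older
        # version number "time travels" to that point in the table's history.
        if not self._commits:
            return []
        if version is None:
            version = len(self._commits) - 1
        return list(self._commits[version])

table = ToyVersionedTable()
v0 = table.commit([{"id": 1}])
v1 = table.commit([{"id": 2}])
```

Because old versions are never mutated, reproducing yesterday's machine learning training set is just a read at an older version number.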
Two of IBM’s Watson-branded collection of machine-intelligence services will be available to run as standalone applications in the public or private cloud of your choice. IBM is delivering these local Watson services atop IBM Cloud Private for Data, a combined analytics and data governance platform that can be deployed on Kubernetes.
Ruchir Puri, CTO and chief architect for IBM Watson, said this move was driven by customer demand for machine learning solutions that can run where customer data already resides, typically in a multicloud or hybrid cloud environment.
In the first half of this JavaWorld introduction to Apache Kafka, you developed a couple of small-scale producer/consumer applications using Kafka. From these exercises you should be familiar with the basics of the Apache Kafka messaging system. In this second half, you'll learn how to use partitions to distribute load and scale your application horizontally, handling up to millions of messages per day. You'll also learn how Kafka uses message offsets to track and manage complex message processing, and how to protect your Apache Kafka messaging system against failure should a consumer go down. We'll develop the example application from Part 1 for both publish-subscribe and point-to-point use cases.
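The offset-tracking idea mentioned above is worth seeing in miniature. In the sketch below (a simplified stand-in, not the Kafka client API), a consumer commits the offset of the last message it finished processing per partition, so after a crash it resumes from the committed offset rather than reprocessing everything or losing messages:

```python
class ToyConsumer:
    """Tracks a committed per-partition offset so processing survives failures."""

    def __init__(self):
        self.committed = {}  # partition -> next offset to read

    def poll(self, log, partition):
        # Resume from the last committed offset (0 if nothing committed yet).
        offset = self.committed.get(partition, 0)
        return offset, log[partition][offset:]

    def commit(self, partition, next_offset):
        # Record progress; a restarted consumer picks up from here.
        self.committed[partition] = next_offset

log = {0: ["a", "b", "c"]}
consumer = ToyConsumer()

start, msgs = consumer.poll(log, 0)   # first poll reads from offset 0
consumer.commit(0, start + 2)         # suppose we processed "a" and "b", then crashed
start, msgs = consumer.poll(log, 0)   # a restarted consumer resumes at offset 2
```

Choosing when to commit is the interesting design question: committing before processing risks skipped messages on a crash, while committing after risks reprocessing duplicates, which is why consumers are typically written to be idempotent.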
When the big data movement started it was mostly focused on batch processing. Distributed data storage and querying tools like MapReduce, Hive, and Pig were all designed to process data in batches rather than continuously. Businesses would run multiple jobs every night to extract data from a database, then analyze, transform, and eventually store the data. More recently enterprises have discovered the power of analyzing and processing data and events as they happen, not just once every few hours. Most traditional messaging systems don't scale up to handle big data in real time, however. So engineers at LinkedIn built and open-sourced Apache Kafka: a distributed messaging framework that meets the demands of big data by scaling on commodity hardware.
Remember Snort? Or Asterisk? Or Jaspersoft or Zimbra? Heck, you might still be using them. All of these open source champions—InfoWorld Best of Open Source Software Award winners 10 years ago—are still going strong. And why not? They’re still perfectly useful.
Ten years ago these tools were among the best answers to pressing needs in the enterprise network—for intrusion detection, call management, reporting, and collaboration. But looking back on them now, you can’t help but think, “Wow. Software was so much simpler then.”
But even as we grapple with the likes of microservice architecture, distributed data processing frameworks, deep neural networks, and “dapps,” we remain steadfast in our commitment to bring you—this year and every year—the best that open source has to offer. Welcome to InfoWorld’s 2018 Best of Open Source Software Awards!
The best open source software for data storage and analytics
Nothing is bigger these days than data, data, data. We have more data than ever before, and we have more ways to store and analyze it—SQL databases, NoSQL databases, distributed OLTP databases, distributed OLAP platforms, distributed hybrid OLTP/OLAP platforms. Our 2018 Bossie winners in databases and data analytics platforms include innovators in stream processing as well.
If you are tuned in to the latest technology concepts around big data, you’ve likely heard the term “data lake.” The image conjures up a large reservoir of water—and that’s what a data lake is, in concept: a reservoir. Only it’s for data.
Data lake defined
A data lake holds a vast amount of raw, unstructured data in its native format.
Therefore, all you need is a device that supports a flat file system, which means you can use a mainframe if you want. The data is moved to other servers for processing. Most enterprises go with the Hadoop Distributed File System (HDFS), because it is designed for fast processing of large data sets and is common in the big data environments where data lakes tend to live.
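The key point is that ingestion into a lake does no parsing or transformation: files land in their native format, and structure is imposed later, at read time. A minimal sketch of that "land it raw" step, using a local directory as a stand-in for HDFS (the layout convention here is illustrative, not a standard):

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path

def ingest_raw(src: Path, lake_root: Path, source_name: str) -> Path:
    """Copy a file into the lake unchanged, in its native format.

    Illustrative layout: <lake_root>/<source_name>/<ingest_date>/<filename>
    """
    dest_dir = lake_root / source_name / date.today().isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # byte-for-byte copy: no parsing, no schema
    return dest

# Demo on a temporary directory standing in for a flat file system
root = Path(tempfile.mkdtemp())
src = root / "clicks.json"
src.write_text('{"user": "alice", "page": "/home"}')
stored = ingest_raw(src, root / "lake", "weblogs")
```

Deferring schema to read time is what distinguishes a lake from a warehouse, where data must be cleaned and structured before it is loaded.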