Key-value, document-oriented, column family, graph, relational... Today we seem to have as many kinds of databases as there are kinds of data. While this may make choosing a database harder, it makes choosing the right database easier. Of course, that does require doing your homework. You’ve got to know your databases.
One of the least-understood types of databases out there is the graph database. Designed for working with highly interconnected data, a graph database might be described as more “relational” than a relational database. Graph databases shine when the goal is to capture complex relationships in vast webs of information.
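To see what that means in miniature, here is a toy sketch of the kind of relationship-hopping query a graph database is optimized for, written against a plain Python dict; the social-follows data is invented for illustration, and a real graph database would index these traversals natively rather than walk them by hand.

```python
# A toy version of the workload graph databases are built for: following
# relationship edges. The "follows" data is invented; a real graph database
# indexes these hops natively instead of walking a dict.
from collections import deque

follows = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave", "erin"],
    "dave":  [],
    "erin":  ["alice"],
}

def reachable(graph, start):
    """Every account reachable from `start` by following edges."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(reachable(follows, "alice"))  # the whole web, alice included via erin
```

The deeper such multi-hop questions go, the worse they map onto relational joins, which is exactly where graph databases earn their keep.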
As data centers are called upon to handle an explosion of unstructured data fed into a variety of cutting-edge applications, the future for FPGAs looks bright.
That’s because FPGAs, or field programmable gate arrays, are essentially chips that can be programmed, after manufacturing, to act as custom accelerators for workloads including machine learning, complex data analysis, video encoding, and genomics, applications that have far-reaching consequences for communications, networking, health care, the entertainment industry and many other businesses.
Such applications lend themselves to parallel processing, an important feature of FPGAs, which can also be reconfigured on the fly to handle new features as the nature of these workloads evolves.
You’d be forgiven for passing by the announcement of Apache Spark 2.3. After all, it’s a point release, isn’t it? Sure, there will be some bug fixes, maybe an improvement or two to the MLlib framework, maybe an extra operator or something, but nothing all that major. That will be saved for Apache Spark 3.0, surely?
In fact, this is no mere point release. Apache Spark 2.3 ships with two major new features, one of which is perhaps the biggest (and often-requested) change to streaming operations since Spark Streaming was added to the project. The other is native integration with Kubernetes to execute Spark jobs in container clusters.
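For a feel of the streaming change, here is a minimal PySpark sketch of the new continuous processing trigger, using the built-in rate source and console sink that the mode supports; the app name and rate are illustrative.

```python
# A minimal sketch of Spark 2.3's continuous processing trigger in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("continuous-demo").getOrCreate()

# The built-in rate source emits (timestamp, value) rows for testing.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# trigger(continuous="1 second") is the switch away from micro-batches:
# records are processed as they arrive, and the interval controls how
# often progress is checkpointed.
query = (events.writeStream
               .format("console")
               .trigger(continuous="1 second")
               .start())

query.awaitTermination()
```

The Kubernetes side, by contrast, is a submission-time feature: pointing spark-submit at a cluster with `--master k8s://https://<host>:<port>` and setting a container image through `spark.kubernetes.container.image` schedules the driver and executors as pods.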
Today’s big data analytics market is quite different from the industry of even a few years ago. The coming decade will see change, innovation, and disruption ripple through every segment of this global industry.
In the recently published annual update to its market study, Wikibon, the analyst group of SiliconAngle Media, found that the worldwide big data analytics market grew 24.5 percent in 2017 over the year before. (I work for Wikibon.) This was faster than forecast in the previous year’s report, owing largely to stronger-than-expected public cloud deployment and utilization, as well as accelerating convergence of platforms, tools, and other solutions. Also, enterprises are moving more rapidly out of the experimentation and proof-of-concept phases with big data analytics and are achieving higher levels of business value from their deployments.
Data scientist is the best job in America, according to a Glassdoor survey, and it has topped the list year after year. With a job score of 4.8 out of 5, a median base salary of $110,000 per year, and over 4,500 current job openings, it’s a great time to be a data scientist.
But as demand for data scientists grows, traditional schools aren’t churning out qualified candidates fast enough to fill the open positions. There’s also no clear path for those who have been in the tech industry for years and want to take advantage of a lucrative job opportunity. Enter the bootcamp, a format that has quickly grown in popularity as a way to train workers in in-demand tech skills.
Was 2017 the year that every product under the sun was marketed as being cognitive, having machine learning, or being artificially intelligent? Well, yes. But don’t hate all of them. In many cases, machine learning actually did improve the functionality of products, sometimes in surprising ways.
Our reviewers didn’t give any prizes for incorporating AI, but they did pick out the most prominent tools for building and training models. These include the deep learning frameworks TensorFlow and PyTorch, H2O.ai’s automated model-building package Driverless AI, and the solid machine learning toolbox Scikit-learn.
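For readers who haven’t touched it, this is the kind of compact train-and-evaluate workflow Scikit-learn is known for; the dataset and classifier below are illustrative picks, not anything singled out by the reviewers.

```python
# A short train-and-evaluate loop in Scikit-learn. The bundled iris
# dataset and a random forest are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```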
Fueled by a $263 million capital injection that makes it the first cloud-native data warehouse startup to achieve "unicorn" status, Snowflake is set this year to expand its global footprint, offer cross-region data-sharing capabilities, and develop interoperability with a growing set of related tools.
With the new round of funding, announced Thursday, Snowflake has raised a total of $473 million at a valuation of $1.5 billion. Founded in 2012, the company has become a startup to watch because it has engineered its data warehouse from the ground up for the cloud, designing it to remove limits on how much data can be processed and how many concurrent queries can be handled.
Of the many use cases Python covers, data analytics has become perhaps the biggest and most significant. The Python ecosystem is loaded with libraries, tools, and applications that make the work of scientific computing and data analysis fast and convenient.
But for the developers behind the Julia language, aimed specifically at “scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing,” Python isn’t fast or convenient enough. It’s a trade-off: good for some parts of this work but terrible for others.
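A rough sketch of that trade-off, not anything from the Julia team: the same sum of squares written as a plain Python loop and as a single vectorized NumPy call. Exact timings vary by machine; the order-of-magnitude gap is the point.

```python
# The same reduction two ways: interpreted loop vs. vectorized call.
# Array size and timings are arbitrary illustrations.
import time
import numpy as np

data = np.random.rand(1_000_000)

t0 = time.perf_counter()
total = 0.0
for x in data:                    # interpreted: one element per iteration
    total += x * x
t1 = time.perf_counter()

vec_total = float(np.dot(data, data))   # one call into compiled code
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.5f}s")
```

Fast Python numeric code means pushing the inner loop into a compiled library; write the loop yourself and performance falls off a cliff, which is the gap Julia was designed to close.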
Created in 2009 by a four-person team and unveiled to the public in 2012, Julia is meant to address the shortcomings of Python and other languages and applications used for scientific computing and data processing. “We are greedy,” they wrote. They wanted more: the speed of C, the dynamism of Ruby, the ease of Python for general programming, and the statistical fluency of R, all in one language.
The modern ethos is that all data is valuable and should be stored forever, and that machine learning will one day magically find the value in it. You’ve probably seen that EMC picture about how there will be 44 zettabytes of data by 2020? Remember how everyone had Fitbits and Jawbone Ups for about a minute? Now Jawbone is out of business. Have you considered that this “all data is valuable” fad might be the corporate equivalent? Maybe we shouldn’t take a data storage company’s word for it that we should store all data and never delete anything.
Back in the early days of the web, it was said that the main reasons people went there were porn, jobs, or cat pictures. If we download all of those cat pictures and run a machine learning algorithm on them, we can possibly determine the most popular colors of cats, the most popular breeds of cats, and the fact that people really like their cats. But we don’t need to do this, because we already know these things. Type any of those three things into Google and you’ll find the answer. Also, with all due respect to cat owners, this isn’t terribly important data.