
We are happy to announce SANSA 0.5 – the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML and N-Quads formats
  • Reading OWL files in various standard formats
  • Query heterogeneous sources (Data Lake) using SPARQL – CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.) are supported
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify and Ontop
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple and OWL-Horst forward chaining inference
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
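
To make the forward chaining inference item in the list above more concrete, here is a minimal Python sketch of the general technique. It is not SANSA's implementation (SANSA runs distributed on Spark and Flink) and it covers only two RDFS rules; the function name and the toy triples are hypothetical and chosen purely for illustration.

```python
# Minimal sketch of forward-chaining inference over RDF triples.
# Only two RDFS rules are covered (rdfs11: subClassOf transitivity,
# rdfs9: type propagation). This is NOT SANSA's implementation.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply the two rules repeatedly until no new triple is derived."""
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        new = set()
        for sub, sup in subclass_pairs:
            for s, p, o in inferred:
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p == SUBCLASS_OF and s == sup:
                    new.add((sub, SUBCLASS_OF, o))
                # rdfs9: (x type A), (A subClassOf B) => (x type B)
                if p == RDF_TYPE and o == sub:
                    new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            # Fixpoint reached: nothing new was derived in this pass.
            return inferred
        inferred |= new


# Hypothetical toy data, not taken from any SANSA example.
graph = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
closure = forward_chain(graph)
```

The loop keeps materialising derived triples until a pass adds nothing new, which is the defining property of forward chaining. On the toy graph, the closure also states that ex:rex is an ex:Mammal and an ex:Animal.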

Noteworthy changes or updates since the previous release are:

  • A data lake concept for querying heterogeneous data sources has been integrated into SANSA
  • New clustering algorithms have been added and the interface for clustering has been unified
  • Ontop RDB2RDF engine support has been added
  • RDF data quality assessment methods have been substantially improved
  • Dataset statistics calculation has been substantially improved
  • Improved unit test coverage

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects HOBBIT, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST, MLwin and Simple-ML.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team

Boost 4.0 is the largest European initiative in Big Data for Industry 4.0. It has a budget of 20M euros, a private investment of 100M euros and a consortium of 50 companies from 16 countries, all coordinated by Innovalia Group. The initiative will lead the construction of the European Industrial Data Space to improve the competitiveness of the European automotive industry, and will guide the manufacturing industry in introducing Big Data in the factory by providing the tools needed to obtain the maximum benefit from industrial Big Data.

The Boost 4.0 initiative will establish a group of smart and connected factories in Europe that will serve as a guide for European industry. Besides enabling collaboration between the largest European industrial companies, Boost 4.0 has €20M in funding and will run for 36 months. It aims to accelerate the adoption of Big Data and advanced analytics solutions in the European automotive industry through global standards, open APIs, secure digital infrastructures, trusted Big Data middleware and digital manufacturing platforms.

Boost 4.0 is led by Basque industry and will have a great impact on the automotive and capital goods industries, which are strategic sectors of the Basque economy and pillars of the Basque Industry 4.0 strategy. Driven from the Basque Country, the initiative will bring significant improvements, with productivity increases close to 20% and reductions of 50% in unexpected maintenance operations. More specifically, Boost 4.0 is led by the Basque industrial group Innovalia, which has assembled a team of major European industrial companies including Gestamp, Volvo, Volkswagen, Capvidia, Philips, Siemens, IBM and Telefónica, among many other representative companies of European industry.

Make sure to download the Boost 4.0 project’s leaflet and take a look at the project’s website:

https://boost40.eu/

For news and updates, make sure to follow Boost 4.0 on its social media accounts:

Twitter, LinkedIn group and YouTube channel.

The BigDataEurope consortium would like to thank all stakeholders for following the project, attending our activities and contributing to the results. Your endorsement and participation contributed to a successful outcome! 

Summary of Major Results

One can summarise the results of the project in three categories:

Strengthening the Societal Communities: the large number of project activities provided a platform where stakeholders interested in the seven societal challenges could come together, network, and identify common issues and common solutions. The latter, at least those of a technical nature, were largely supported by the project. We expect that a large part of the communities strengthened by the project will continue to work closely together.

BigDataIntegrator Platform: perhaps the most concrete project result, the BDI is a flexible, open-source platform that can be easily deployed and customised to build Big Data pipelines that address open-ended challenges. Based on Docker virtualization, the base platform is enriched with a layer of services that support the setup, creation and maintenance of workflows. Supported by a simple graphical UI, it offers basic building blocks (e.g. Apache Spark, Hadoop HDFS, Apache Flink) to get started with common Big Data technologies. The BDI continues to be maintained (on GitHub) beyond the project, and is being used in various external projects and initiatives. Given its impact in the big data technical area, it is also being proposed as an Apache Incubator project.

Pilot Demonstrations: the value of the BDI has been demonstrated in 7 separate pipelines, each targeting a selected big data problem from one of the 7 societal challenges. These pipelines extend predefined BDI pipelines designed for each of the societal challenges, based on the requirements identified for each domain. The pilots have been useful to demonstrate the flexibility of the BDI platform and its ability to target big data challenges of a varying nature from any domain. Some of the final pilots have taken on a life of their own, and are being extended in follow-up projects and initiatives.

In addition to the above, the large amount of valuable material generated by the project is available for posterity on the project profiles:

YouTube Channel (50 videos): Recorded technical webinars (e.g. final BDI launch, relevance in the context of activities like the Big Data Value Association) and societal hangouts, demonstrations of the BDI and its pilots, and interviews with the developers.
SlideShare (237 slides): All project presentations, from a wide variety of physical and online activities (including all workshops, webinars, hangouts and external events).
Flickr: Pictures from the physical events (workshops, conferences and other events).

Follow-up Projects

Below is a list of H2020 and other projects that build on the BigDataEurope results, most notably by adapting a BDI pipeline for their needs. The list is current as of early 2018 and might not be updated at a later date.

Project | Timeframe | BigDataEurope take-up
BigDataOcean | 2017 – 20 | Re-use and customisation of the BDE Platform.
SPECIAL | 2017 – 19 | Re-use and customisation of the BDE Platform (in particular SANSA and Ontario).
NextGEOSS | 2016 – 20 | The SatCen pilot builds on results of H2020 projects, including the BDE SC7 pilot.
BETTER | 2017 – 20 | A share of the BETTER big data pipelines targeting a total of up to 36 challenges (12 per year) will exploit the project results, particularly the BDI and the SC7 pilot instance.
BigDataGrapes | 2018 – 20 | Re-use and customisation of the BDE Platform, specifically the extension of the software stack that has been produced in the SC2 pilot instance.
DARE | 2018 – 20 | Re-use and customisation of the BDE Platform as a basis for the DARE platform.
SUMA-I-BDA | 2018 – 21 | Smarter use of mobility assets through innovative big data analytics, using the know-how gained in the SC4 pilot.
Precision Medicine | 2018 – 19 | Re-use and customisation of the BDE Platform as a basis for the new Greek national precision medicine infrastructure. These efforts are also related to the IASIS project.
LAMBDA | 2018 – 21 | This CSA shall re-use results from the BDE project, in particular the platform.

BETTER (Big-data Earth observation Technology and Tools Enhancing Research and development) is an EU H2020 research and innovation project running from November 2017 to the end of October 2020.

The project’s main objective is to implement Big Data solutions (referred to as Data Pipelines) based on large volumes of heterogeneous Earth Observation datasets. This should help address key Societal Challenges, so that users can focus on analysing the data and extracting potential knowledge from it rather than on processing the data itself.

To achieve that, BETTER is improving the way Big Data service developers interact with end-users. After defining the challenges, the promoters validate the pipeline requirements and co-design the solution with a dedicated development team in a workshop. During implementation, promoters can continuously test and validate the pipelines. Later, the implemented pipelines will be used by the public in hackathons, enabling the use of specific solutions in other areas and the collection of additional user feedback. More information: www.ec-better.eu

We are happy to announce SANSA 0.4 – the fourth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large knowledge graphs.

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

  • Reading and writing RDF files in N-Triples, Turtle, RDF/XML and N-Quads formats
  • Reading OWL files in various standard formats
  • Support for multiple data partitioning techniques
  • SPARQL querying via Sparqlify
  • Graph-parallel querying of RDF using SPARQL (1.0) via GraphX traversals (experimental)
  • RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining inference
  • Automatic inference plan creation (experimental)
  • RDF graph clustering with different algorithms
  • Terminological decision trees (experimental)
  • Anomaly detection (beta)
  • Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
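
For readers unfamiliar with the two embedding approaches named in the list above, the following Python sketch shows the scoring functions usually associated with TransE and DistMult. It is an illustration under stated assumptions, not SANSA's code: the function names, the choice of the L2 norm for TransE, and the two-dimensional toy vectors are all hypothetical, and real embeddings are learned rather than written by hand.

```python
import math


def transe_score(head, relation, tail):
    """TransE: negative L2 distance between (head + relation) and tail.

    A score closer to zero means the triple is considered more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def distmult_score(head, relation, tail):
    """DistMult: trilinear product sum_i head_i * relation_i * tail_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical hand-written embeddings, for illustration only.
berlin = [1.0, 0.0]
capital_of = [0.0, 1.0]
germany = [1.0, 1.0]
france = [3.0, -1.0]

good = transe_score(berlin, capital_of, germany)  # berlin + capital_of lands on germany
bad = transe_score(berlin, capital_of, france)    # lands far away from france
```

One design difference is visible directly in the formulas: the DistMult score is unchanged when head and tail are swapped, so it cannot distinguish the direction of a relation, whereas TransE treats the relation as a directed translation.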

Noteworthy changes or updates since the previous release are:

  • Parser performance has been improved significantly e.g. DBpedia 2016-10 can be loaded in
  • Support for a wider range of data partitioning strategies
  • A better unified API across data representations (RDD, DataFrame, DataSet, Graph) for triple operations
  • Improved unit test coverage
  • Improved distributed statistics calculation (see ISWC paper)
  • Initial scalability tests on 6 billion triple Ethereum blockchain data on a 100 node cluster
  • New SPARQL-to-GraphX rewriter aiming at providing better performance for queries exploiting graph locality
  • Numeric outlier detection tested on DBpedia (en)
  • Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

  • There are template projects for SBT and Maven for Apache Spark as well as for Apache Flink available to get started.
  • The SANSA jar files are in Maven Central, i.e. in most IDEs you can just search for “sansa” to include the dependencies in Maven projects.
  • Example code is available for various tasks.
  • We provide interactive notebooks for running and testing code via Docker.

We want to thank everyone who helped to create this release, in particular the projects Big Data Europe, HOBBIT, SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

Spread the word by retweeting our release announcement on Twitter. For more updates, please view our Twitter feed and consider following us.

Greetings from the SANSA Development Team
