2017 will be remembered as the year when new application workloads weighed more heavily in the cloud than on-premises. According to a Cloud Security Alliance (CSA) report, 60.9% of application workloads were still in enterprise data centers in 2016. By the end of 2017, however, fewer than half (46.2%) will remain there. We’ve been witnessing a similar trend, with an increasing number of analytics and machine learning initiatives hosted in the cloud and data locality balanced between inside and outside the cloud.
To accelerate this shift and deliver even greater benefits for Azure customers, Trifacta has been working hard to support Azure services comprehensively and securely for data-driven initiatives. We’ve added Trifacta to the Microsoft Azure Marketplace under the intelligence, analytics, and compute categories, and we’ve earned the highly selective Microsoft Co-sell status, which recognizes strategic joint customers and deep technical validation of our solution on Azure.
This transition to cloud-centric computing environments was a big reason why our team recently raised a new $48 million financing round. There is a need for more efficient data preparation in this hybrid, multi-cloud world, and Trifacta’s interoperable architecture positions us extremely well to be the de facto solution for enterprises wrangling data for analytics and machine learning initiatives.
Wrangler Enterprise is designed for demanding data wrangling initiatives that span large numbers of users and data volumes at scale. It’s also for customers who wish to configure and manage their own HDInsight clusters for use with Trifacta.
Trifacta provides several options for deploying Wrangler Enterprise via the Azure Marketplace. You can create a new HDInsight cluster, add Trifacta to an existing HDInsight cluster, or use a custom Azure Resource Manager (ARM) template. Based on your data volume and processing needs, you can choose the appropriate resources during installation via any of these methods.
If you have less demanding data wrangling requirements with small to moderate data volume and you don’t want to manage the underlying data processing infrastructure, we also offer a hosted edition of Trifacta for individuals and teams – Wrangler Pro. You can find the best Trifacta edition that works for you here.
Trifacta Architecture on Microsoft Azure
Fig 1 – Typical deployment architecture of Trifacta on Microsoft Azure
Trifacta integrates natively with several components and services of the Azure cloud platform. Most importantly, it takes key security requirements into consideration to ensure that data access and processing meet strict enterprise governance standards and protocols.
You can use Trifacta to wrangle data stored in either ADLS or WASB. These Azure storage services support a wide variety of use cases, and combined with the security framework described below, data access remains secure.
Once data is wrangled in Trifacta, it can be made available to a variety of downstream analytics platforms and applications. Azure SQL Data Warehouse is the most popular platform for interoperating with analytics applications such as Power BI, Tableau, and Qlik. Trifacta can read from and write to SQL Data Warehouse via either JDBC (for small and medium-sized data) or PolyBase (for larger volumes).
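To make the two access paths concrete, here is a minimal sketch of what they look like outside of Trifacta itself, using Python and pyodbc. The connection string, table names, and the external data source and file format objects are hypothetical placeholders; in practice, Trifacta manages these interfaces for you.

```python
# A minimal sketch of the two SQL Data Warehouse access paths described above.
# All names below (server, tables, data source, file format) are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydw;UID=myuser;PWD=mypassword"
)
cursor = conn.cursor()

# Path 1: direct row-level access, suitable for small/medium volumes.
cursor.execute("SELECT TOP 10 * FROM dbo.wrangled_output")
for row in cursor.fetchall():
    print(row)

# Path 2: PolyBase, suitable for large volumes. An external table points at
# files in blob storage, which are then loaded in parallel via CTAS.
cursor.execute("""
    CREATE EXTERNAL TABLE ext.wrangled_output (
        customer_id INT,
        revenue DECIMAL(18, 2)
    )
    WITH (
        LOCATION = '/wrangled/output/',
        DATA_SOURCE = my_blob_store,   -- assumed to be defined already
        FILE_FORMAT = my_csv_format    -- assumed to be defined already
    )
""")
cursor.execute("""
    CREATE TABLE dbo.wrangled_output_big
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS SELECT * FROM ext.wrangled_output
""")
conn.commit()
```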
Whether your data volume is measured in GB, TB, or PB, Trifacta can wrangle it all on Azure by leveraging the compute engine best suited to the workload. For small to medium data volumes, Trifacta’s unique Photon in-memory compute framework runs within the application on Azure. For larger volumes, Trifacta integrates natively with Apache Spark running on the latest HDInsight (v3.6).
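As a rough illustration of the kind of Spark job that runs for larger volumes (plain PySpark, not Trifacta’s internal job format; paths and column names are invented for the example):

```python
# A hedged illustration of a wrangling job executed on Spark / HDInsight.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wrangle-example").getOrCreate()

# Read raw data from cluster-attached storage (WASB and ADLS paths work alike).
raw = spark.read.option("header", "true").csv("wasb:///data/raw/sales.csv")

# A few representative wrangling steps: type casting, filtering, aggregation.
clean = (
    raw.withColumn("revenue", F.col("revenue").cast("double"))
       .filter(F.col("revenue").isNotNull())
       .groupBy("region")
       .agg(F.sum("revenue").alias("total_revenue"))
)

clean.write.mode("overwrite").parquet("wasb:///data/wrangled/sales_by_region")
```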
Trifacta Wrangler Enterprise supports secure data access to all resources provided on Azure via various SSO technologies. By default, you can authenticate through Azure Active Directory (Azure AD), Microsoft’s fully cloud-based directory service. You can also integrate your existing LDAP directory services with Azure AD and fully leverage secured access to Trifacta.
For full enterprise security support, you can also configure your HDInsight cluster as a domain-joined cluster that is part of your Active Directory domain. Trifacta supports accessing and running wrangling jobs against a domain-joined cluster. For secured Hive access, Trifacta also supports Apache Ranger in conjunction with HDInsight.
Trifacta is Azure Co-sell Ready
This comprehensive support for Azure data services and increasing customer adoption on Azure led Microsoft to certify Trifacta as a co-sell partner. Co-sell status reflects not only a base of large strategic joint customers but also the deep technical due diligence Microsoft conducted in reviewing Trifacta’s solution on Azure. For customers, this means the joint solution has been tested at some of the world’s largest enterprises and deeply reviewed by Microsoft Azure experts. This level of vetting means your organization can have confidence rolling out Trifacta on Azure at scale.
Want to learn more about running Trifacta on Azure? Here are some additional resources to check out:
The users have spoken—for the fourth consecutive year, we’ve been named the #1 data preparation vendor in the End User Data Preparation Market Study conducted by Dresner Advisory Services. Under the Wisdom of Crowds® methodology, Dresner surveyed a wide range of participants and reported on their data prep usage patterns, the importance of data prep in their everyday work, and the data prep features they prefer. In considering 28 vendors, Trifacta came out on top.
The usability features considered to be the biggest priority among end users include “immediate preview and feedback” and “visual interface.”
Other important usability features include “machine learning and recommendations based on usage data” and “less than 2 second response time for design features,” among other nontechnical user-oriented features. We’ve specifically designed Trifacta around features like these, and are excited to hear they’re now a top priority among data prep end users.
2018 saw a shift in favor of standalone tools.
This trend resonated with us. At Trifacta, we’ve made a conscious effort to partner with best-of-breed data vendors instead of building our own application-specific solutions. Not only has this allowed us to concentrate solely on building a rich data wrangling experience—which, in and of itself, is a thorny problem to solve—but it also ensures organizations can choose the best technologies for their business.
Seventy-two percent of respondents say they “constantly” or “frequently” make use of end-user data preparation.
Only about 7 percent of respondents “rarely” or “never” perform end-user data preparation. Notably, the highest frequency of use came from respondents in sales and marketing departments, where there is often a variety of complex, business-critical data to wrangle.
Twenty-two percent say their current approach to data preparation is “highly” effective and has improved over time. Watching the data prep market develop as end users find new and more effective approaches to preparing data is exciting, though there is clearly still room for continued growth. Twenty-two percent is a small number, and we’re excited to see it grow in the years to come.
Once again, we’re honored to have received this recognition and are excited to play a key role in the continued development of the data preparation market.
From raw data to analysis: Data wrangling, also called self-service data preparation, is the process of taking raw data and discovering, structuring, cleaning, enriching, and validating it, then publishing the results in a format suited to data analysis.
But why “wrangling”? A wrangler is a cowboy, so a data wrangler can be seen as the cowboy of data, rounding up scattered data the way a cowboy rounds up or sorts his cattle. As used colloquially in the United States, “wrangling” implies that the activity is laborious, unpleasant, and tiring, but that it has to be done for the job to be done right. For example, you might tell your child to “wrangle your room,” which politely translates to “clean up this mess.”
It is tedious but important work, and it consumes ever more resources and energy as new data formats multiply and the volumes of data produced and exchanged keep growing. At the same time, IT teams don’t always have the time to prepare data for every business need, so business teams need more autonomy to prepare their own data for their specific requirements.
Here is a bit more detail on the main steps of data wrangling:
1/ Discovery
Discover and explore your raw data. Uncover the structure, content, quality, and distribution of your raw data, whatever its format or volume. The goal is to see and understand the nature of the data before manipulating it.
2/ Structuring
Restructure any type of unstructured or semi-structured data, whether raw extracts from legacy systems, logs, or hierarchical formats (XML, JSON). The goal is to create the columns and rows of data at the level of aggregation you need for your analyses.
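As a small illustration of this structuring step (using pandas rather than Trifacta itself, with an invented JSON payload), here is how a nested extract can be flattened into rows and columns:

```python
# Flattening hierarchical JSON into a tabular shape: one row per order line.
import pandas as pd

orders = [
    {"order_id": 1, "customer": {"name": "Acme", "country": "FR"},
     "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-9", "qty": 1}]},
    {"order_id": 2, "customer": {"name": "Globex", "country": "DE"},
     "items": [{"sku": "A-1", "qty": 5}]},
]

flat = pd.json_normalize(
    orders,
    record_path="items",                                # one row per item
    meta=["order_id", ["customer", "name"], ["customer", "country"]],
)
print(flat)
# Columns: sku, qty, order_id, customer.name, customer.country
```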
3/ Cleaning
Clean your data, whether the issues are data typing problems or missing values. Normalize, standardize, and add your own data types to validate data quality in your own business context. The goal is to ensure the quality of your data so you can produce reliable, accurate analyses.
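For a compact illustration of the cleaning step (again in pandas, with made-up data): fixing types, handling missing values, and standardizing a categorical field.

```python
# Typical cleaning operations: type coercion, missing-value handling,
# and standardization of a categorical column.
import pandas as pd

df = pd.DataFrame({
    "amount": ["12.5", "N/A", "7"],
    "country": ["fr", "FR ", "France"],
})

df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # "N/A" becomes NaN
df["amount"] = df["amount"].fillna(df["amount"].median())

# Standardize the country field against a small business-specific mapping.
df["country"] = df["country"].str.strip().str.upper().replace({"FRANCE": "FR"})
print(df)
```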
4/ Enrichment
Enrich your data by blending datasets from different sources using different types of joins. The goal is to enrich your analyses with varied data from multiple business sources.
5/ Validation
Validate your transformation steps at scale, that is, across entire datasets. Data manipulation and transformation scripts are best built interactively on samples, so you then need to validate the transformation and the quality of its output on the full data. The goal is to guarantee the quality of the generated data across the entirety of the original datasets.
6/ Publishing
Publish the transformation results in formats accessible to the analysis tools used across the enterprise. Generated datasets can be stored, for example, as flat files, as tables, or in formats specific to analysis tools. The goal is simple, transparent access to the results from your predictive analytics, reporting, or data visualization tools.
Data visualization had traditionally been a task left to the IT department. Since IT staff were the only ones with the keys to the data, it made sense that they’d be the ones to prep the data and use it to create dashboards and visualizations. This system had a few obvious weak points: IT’s time could be better spent on other tasks, and compared to actual analysts, IT often lacks the context to perform true analysis. The rise of self-service BI seemed to solve these problems, putting the creation of visualizations in the hands of the analysts who have the greatest context for the data. However, the impact of this self-service model has only stretched so far.
The Clean Data Bottleneck
Self-service BI has solved one set of problems while creating another. True, analysts no longer necessarily need IT to create their dashboards and visualizations, but in order to get started, they first have to get clean data from IT that meets their requirements. Good visualizations need clean, accurate data that comes in particular formats; datasets almost always have to be transformed before they can be put to good use. This means analysts and IT can get caught in a potentially confusing, error-prone data wrangling loop: the analysts don’t know what the data looks like until they get it from IT, so they transmit requirements that don’t actually line up with what they need; at the same time, IT may not have full understanding of why certain datasets need to be transformed in certain ways. The constant back-and-forth between business analysts and IT creates a limited perspective, introducing glitches that neither side knows to look for. Data quality issues can have a huge impact on visualizations—an unnoticed outlier can skew an important average, or duplicate records could distort the overall quality of the dataset.
Not only is the IT/analyst loop treacherous due to the information loss inherent in all communication between departments, it’s also time-consuming. Even in a business with active self-service BI solutions, where analysts no longer have to wait for IT to create their visualizations, upfront data wrangling still eats up 80% of the time an analytics project takes to complete. Multiply that by the increasing number of analysts trying to get work done, and you’ve got a massive bottleneck in your analysis pipeline.
Moving On Up: Self-Service Data Prep
It doesn’t have to be this way; self-service access should be moved further up the data pipeline. Analysts know exactly what kind of data they want; why not let them curate and wrangle it themselves?
Self-service data preparation is the only way to scale analysis, data science, and machine learning in a data-driven organization, giving analysts the ability to create compelling and accurate analytics without having to wait for IT to deliver clean data. IT’s duties can instead shift toward governance: granting and controlling self-service data access for users or classes of users and, as self-service data prep expands throughout an organization, maintaining data lineage. Everyone gets to concentrate on the work they’re best at without each department waiting on the other; by cutting analysts free of the feedback loop, self-service data prep dramatically slashes project times.
Welcome To Trifacta’s Wheelhouse
According to Dresner Advisory Services and Forrester, Trifacta is the #1 self-service data prep platform, providing an analyst-friendly toolset that cuts the feedback loop between IT and business users. Trifacta also includes the governance and lineage needed for IT to manage Trifacta across thousands of users. Dataset volume isn’t a problem, either; just ask the German stock exchange. When the Deutsche Börse’s Content Lab needs to wrangle datasets in the 1-1.5 petabyte range, they use Trifacta to present customers with analyses within 24 hours, a shocking reduction from their former 2-3 month turnaround. Trifacta can help any data-driven organization deliver results faster. Put the power of data wrangling back in the hands of analysts where it belongs: to try it out yourself, download Trifacta Wrangler now.
While sitting in a cubicle, doing the kind of work you would expect I’d be doing in a cubicle at a large company, I got a Gchat message from my wife. It wasn’t one of her normal Gchat questions—she was asking why her VLOOKUP in Excel was not working. Now, my wife is no dummy. Admittedly, she is much smarter than I am. So, I was loving the fact that she needed my help with a spreadsheet. We got on the phone and I started going through the checklist of things that could be causing her VLOOKUP not to work. Frustrated, she said things like, “You mean it matters what order my columns are in? And the casing? What order does the formula go in? This is stupid.”
Yes, it is stupid. I spent a lot of time in my career pulling my hair out trying to figure out why my VLOOKUP wasn’t working. I used to work as an analyst at one of the world’s largest CPG companies, and Excel was often no match for the size and variety of the data we leveraged. We struggled through endless hours of Excel, and then Access, until finally seeking out a modern approach to data prep: Trifacta. In comparison, using traditional tools like Excel and Access felt like watching a movie on VHS; Trifacta was my Netflix.
In this post, I’ll walk you through some of the common frustrations that I faced using Excel and Access to build sales forecasting reports, and how Trifacta made my life easier.
Stage one: Buried in Excel Hell
When I was an analyst, I was put on a team that was going to pilot a new forecasting program. The idea was to take retail customer data and combine it with our own internal data to increase forecast accuracy. As you can imagine, this was a tall order, since none of the data we were blending matched up 1:1—dates, item codes, and even units of measure needed to be converted so they could be linked together. Additionally, the data was stored in multiple databases, and the customer data had to be manually downloaded from the customer’s portal.
Although we could export most of the data into Excel files, Excel wasn’t a sustainable option for a report that needed regular updates. The reason? It was incredibly difficult to perform VLOOKUPs across multiple files. (It was actually easier to copy and paste the data into different tabs of one file, but that method is not efficient.) Even if I managed to get my data into one Excel file, VLOOKUPs can only be performed on one pair of key columns, and sometimes I needed three, four, or five keys for the data to blend together correctly. And when I got the dreaded #N/A error in each cell? I had to troubleshoot the same way you do with Christmas lights—when the entire string goes out, you test each bulb one by one to find the one causing the problem.
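For contrast, the multi-key lookup that VLOOKUP can’t do directly is a one-liner in most data tools today. Here is a hedged sketch in pandas, with invented data and column names:

```python
# Multi-key joining, the thing VLOOKUP forces you to fake with helper columns.
import pandas as pd

# Invented stand-ins for the internal forecast and the customer portal extract.
internal = pd.DataFrame({
    "date": ["2016-01-04", "2016-01-04"],
    "item_code": ["A1", "B2"],
    "uom": ["case", "each"],
    "forecast_units": [120, 45],
})
customer = pd.DataFrame({
    "date": ["2016-01-04"],
    "item_code": ["A1"],
    "uom": ["case"],
    "pos_units": [98],
})

# Join on three keys at once: the equivalent of concatenating helper columns
# in Excel just to give VLOOKUP a single key to match on.
combined = internal.merge(customer, on=["date", "item_code", "uom"], how="left")

# Rows where the lookup "failed" (Excel's #N/A) are simply nulls to inspect.
print(combined[combined["pos_units"].isna()])
```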
Now sure, there are a lot of tricks, add-ons, and plugins we could have Googled to figure out how to do all of this in Excel. However, the director of my department had already informed my customer that we would have a visual analytics tool for them to review the following week to get them on board with this program. (He promised this to the customer before talking to me, of course.) I needed this done, and I needed it done now. So, we pivoted (pun intended) to using Access.
Stage two: Relearning Access
I had not used Access since college, so I had to dig in my closet and blow the dust off my old Computer Science 101 textbook. Since few people on my team had much Access experience, we knew this would be an issue going forward with onboarding new team members; Access isn’t as widely used as it once was. Sure, it could accomplish a lot of the tasks we were looking for, but it was a painful process. We could join using multiple keys, but once you create the join, you have to query the results at scale to find out whether it worked. This is pretty much the case with anything in Access. You plug in your logic (which still requires you to memorize tons of formula syntax) and then you run the query. It would take a long time, so we would typically run to Starbucks or watch YouTube videos while we waited to see if what we did actually worked. If not, rinse and repeat.
We managed to get the project done and the customer bought into the program, but managing this report each week was a nightmare. I had to dedicate almost a full day to generating the report for the dashboard. If anything needed to be changed or updated in Access, I could not work on anything else for the rest of the week. We still needed a new solution, and that’s when I discovered self-service data prep platform Trifacta.
Stage three: Discovering Trifacta, the Netflix of data prep
The very first thing I saw in Trifacta was the predictive suggestions. If I simply highlighted a pattern of characters or numbers, Trifacta gave me suggestions, in natural language, of what it thought I might want to do with the data: “Do you want to extract this pattern into a new column?” However I interacted with my data, Trifacta would provide these suggestions. I did not have to memorize formulas, commands, or syntax—it felt more like working with another human than with a computer program. Cleansing and structuring my data was a breeze in Trifacta, but the next test was joining data together. Not only could I join data with multiple keys in the join interface, but I could use inner and outer join functionality, and my keys did not have to match exactly. I could join keys together regardless of white space, special characters, and casing.
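For readers who have done this by hand in code, here is roughly what that key standardization looks like in pandas (Trifacta surfaces the same steps as visual suggestions; the data and column names here are invented):

```python
# Normalizing join keys so whitespace, casing, and punctuation don't break a join.
import pandas as pd

def normalize_key(s: pd.Series) -> pd.Series:
    """Lowercase, trim whitespace, and strip special characters from a key column."""
    return (
        s.astype(str)
         .str.strip()
         .str.lower()
         .str.replace(r"[^a-z0-9]", "", regex=True)
    )

left = pd.DataFrame({"item_code": [" AB-123 ", "cd_456"]})
right = pd.DataFrame({"item_code": ["ab123", "CD-456"], "price": [9.99, 4.50]})

left["key"] = normalize_key(left["item_code"])
right["key"] = normalize_key(right["item_code"])

# Now an exact join works despite whitespace, casing, and punctuation differences.
print(left.merge(right.drop(columns="item_code"), on="key", how="inner"))
```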
I was able to get this project finished and make it sustainable. Running the report was pretty much automated. I got better at my job because I spent less time fixing data and more time focusing on what was actually in it. I became proactive instead of reactive. I started to solve problems. Now that I had the process down, I could onboard new customer data in a fraction of the time it took with Excel and Access. For one report, we had spent about three months locked in a conference room putting all of the data together. On my last day at the company, I finished a similar project by lunch.
The current low interest rate environment is affecting the bottom line of insurers, threatening income from rate-sensitive products and investments and, in turn, putting increased pressure on underwriting accuracy to make up the difference. Actuaries are now expected to provide more personalized and accurate policies using information gleaned from a host of new data sources, many of which are unstructured or lack consistency.
Gathering and analyzing an abundance of data from disparate sources is a challenge. However, when done properly, the effort results in better risk pricing and better identification of key drivers.
A Data Wrangling Challenge: Actuaries’ Disparate Data Sources
In the last 15 years, the world has seen an explosion of new digital data sources, from governmental behavioral insights to the Internet of Things. Actuaries are faced with the very real challenge of managing many data sources and wrangling that disparate data to inform their work. There are several sources of behavioral and personal data that now drive analysis and predictive modeling.
Third-party Sources: Insurers can use information like credit scores to better assess risk, after empirical evidence showed a strong correlation between credit scores and driver safety ratings.
Geographical and Demographic Information: Properly combined, the right geographic and demographic data can now be used to better predict the risk of car accidents, disease, or workplace safety issues for policyholders.
Medical Data and Health Trends: In addition to looking at prior claims made for various medical conditions, actuaries now have access to data on treatments available in distinct geographic areas. Combining the two, actuaries can tell who has a higher risk of disease and less access to care for it, and adjust the policy according to this more accurate risk assessment.
Government-sourced Behavioral Data: The federal government collects thousands of data points on its constituents, from the number of members of the household to life satisfaction to risk behaviors exhibited (e.g., smoking, not wearing seat belts). Gathering data on these factors over time can lead to more predictive analysis for potential new policyholders.
Future Scenario Modeling: Actuaries can now use complex modeling to examine the probability of natural catastrophes versus the potential intensity of those disasters in certain areas, helping them predict why a natural disaster in one region can result in thousands more deaths than the same event in a different area.
Trend Toward New Technologies
With all of these new sources providing both structured and unstructured data, actuaries need an efficient way to wrangle larger amounts of data. Traditional spreadsheet tools like Excel present a number of data wrangling challenges and are no longer the best tool for the job. Spreadsheets are not robust enough to handle the inconsistencies or the volume of incoming data, and errors are more likely to occur given the varying data sources. The traditional spreadsheet process is also labor-intensive and slow, because the manual work isn’t repeatable when new data comes in. In short, the overwhelmed actuary’s burden is only worsened by the limitations of these traditional data management tools.
The Trifacta Solution
Thankfully, Trifacta is designed to easily aggregate, clean, validate, and visualize multiple data sets for seamless collaboration between actuarial colleagues. Better understanding of the data means better risk models and ultimately better policies for the customer. With Trifacta, actuaries can produce more accurate policies, speed consumer insights, and differentiate in the market, all while decreasing their production time and costs.
Today, investing in the right data technologies is more critical than ever before. The frontier for competition has shifted and the “Data Wars” have begun. Everyone now understands they compete on information as much as the goods and services they make. Welcoming uncertainty into your business, catering to long tail segments, spotting emerging shifts in buyer preference, improving risk modeling, even ferreting out “fake news” all require massive improvements in information agility. It is this information agility that will separate the winners from the losers over the next decade. To get there, data-driven organizations understand that there are three foundational shifts they must embrace.
The “who” is changing: end users who know the data best are doing the work that previously fell only to the highly technical. This removes bottlenecks and improves outcomes, as users with context in their heads get to better, more unique insights faster.
The “what” is changing: the data is increasingly large and unstructured. Machine-generated data is the new normal. Behavioral data is everywhere. Interactions matter as much as transactions.
The “where” is changing: enterprises require the ability to make late-binding decisions about where workloads run and which tools are used to visualize and analyze data. They also want the ability to change their minds and to cost-effectively support multiple approaches while avoiding lock-in.
Without addressing these trends, information potential remains hidden and benefits go unrealized.
Solving for this will require customers to modernize their analytic stack to achieve better economics and improved flexibility. Instead of solving this with complex coding and legacy ETL frameworks, enterprises want to prepare data interactively; iteratively shaping, structuring and harmonizing complex data in real time. This requires customers to democratize the production of new data assets with self-service data wrangling in much the same way they have previously democratized the consumption of data with self-service BI. This has been Trifacta’s focus from the beginning— making the painstaking work of preparing data intuitive, efficient and fun for everyday data professionals. We pioneered a market that has since been validated by thousands of users and hundreds of enterprise customers, by industry analysts and by leading partners.
We will also use the funding to enhance our breakthrough ML-driven approach to self-service while increasing enterprise platform capabilities that allow customers to collaboratively curate data; crowdsourcing the best work and operationalizing it at scale. We will do all of this across cloud and hybrid environments and across analytic, visualization and data science tools. Our focus and commitment to making data understandable and useful for analysis regardless of where it lives, or how it is consumed, remains unchanged.
By helping our customers wrangle anything, we will connect them to everything. When we’re successful, not only will we have built a truly revolutionary business, but we’ll also have forever changed the way people, data, and computation come together to solve problems. That is the pioneering vision of the company. That is what drives us every day. That is what has established us as the market leader. That is the “Trifacta.”
To learn more about Trifacta’s latest round of financing and what it means for the company, read the full press release here.
Give Trifacta a try for yourself! Our Wrangler product is free for you to use as long as you’d like. Simply download Wrangler here and let us know what you think.
We’re hiring! If you love the challenge of solving the world’s most complex data problems and would like to be part of this amazing team, please let us know.
Insurance organizations are now going through what banks have endured for the last 15 years—significant changes in compliance and regulation. Banks initially took a siloed approach to compliance, with each business unit implementing its own specific solutions. No one knew that regulation would hit banks so hard, and over time this siloed, uncoordinated response turned out to be very costly (institutions often spend 5% to 10% of their net income on compliance, and that share is expected to increase drastically). Without the right approach and the right tools, banks experienced duplication of data and processes, inconsistent results, lack of reuse, difficulty adapting to changing regulations, and difficulty presenting to auditors.
With new banking regulations (in particular BCBS 239, which requires reporting at the transaction level), all banks have been forced to take a step back and review their approach. To make regulation reporting easier and more accurate, banks are now consolidating all their data into a single place and leaning on modern data prep tools to ensure accuracy.
Banks experienced a learning curve as they adjusted to new regulation compliance while striving to remain cost effective and efficient. Insurance companies can benefit from this experience by avoiding the same errors, and by seeking out the proper tools for the job.
IFRS 17: Not Your Average Reporting Standard
As far as insurance regulations go, Solvency II was an appetizer, and IFRS 17 is the main course—and a big one to swallow at that. It is the biggest challenge the insurance industry has faced in generations.
IFRS 17 is a monumental change because it applies a new accounting model to insurance contracts, recognizing profit as the insurer fulfills its promises to the policyholder, versus the old model, which recognizes profit at the time the contract is issued.
Accurately estimating the cash inflows and outflows means insurance companies need to look far into the future—which requires complex, fundamental changes and new calculations in accounting for both liability measurement and profitability recognition. This change is mandatory and failing to deliver on IFRS 17 can have serious repercussions.
The new standards come into effect on January 1, 2021. While that may seem to be in the distant future, this massive initiative requires technology changes to begin now.
Choosing the Right Data Platform for IFRS 17
At Trifacta, we’ve worked on many projects to modernize companies’ risk and compliance data pipelines. Over time, we’ve witnessed similar patterns in every engagement we’ve been involved in. We recommend that insurance companies look for the following benefits as they work to make their data pipelines support new regulations. A modern analytic stack should:

Result in cost savings
- Avoid data duplication by consolidating data in a single place to better supply and meet the demand for data for each regulation
- Leverage modern data processing and storage technology that is scalable and flexible
- Leverage both cloud and on-prem technology according to sensitivity and processing requirements
- Optimize IT team activities

Generate results faster
- With self-service solutions lessening IT dependence, business users are able to define their own calculations and reports
- Time-to-market matches the speed of query formulation and regulatory requirements
- Real-time data sourcing and management facilitates event-driven decisions
- Increased automation of processes and process engineering
- Continuous delivery and deployment, with frequent releases to production

Yield better results
- More accurate results with a universal data quality processing approach
- Increased governance and full transaction-level traceability for efficient auditing
- Analytics capability integrated into the architecture, with machine learning available
- A scalable, fully secured solution
Trifacta for IFRS 17
Trifacta is a critical component of a well-designed data solution for IFRS 17. Trifacta offers self-service data preparation for risk, financial, and accounting analysts. With it, they can clean and combine data, create metric calculations, and validate the data themselves to create the required reports. No matter the complexity or volume, Trifacta will deliver accurate, clean data that can be trusted for reporting.
Empowering risk and financial analysts with self-service data prep has a direct impact on the speed and design of reporting. With Trifacta, users see delivery gains of up to 90%. Some customers with very large reports went from spending nearly a day processing them to getting them delivered in less than an hour.
Perhaps the most important benefit is the shift of ownership of the metric calculation process from IT to the business users themselves. They can now keep up with any regulation changes and implement updates themselves without waiting 6 months for the IT team to deliver it. In addition, because Trifacta tracks every single data manipulation, it is easy to demonstrate to regulators how particular metrics are calculated.
To learn more about Trifacta and how it relates to regulatory reporting, watch this short demo or reach out to our team.
The increased volume of data available to the insurance industry means more information for making informed business decisions—both at the higher level of executives and the micro-level of quantitative analysts. The essential component to truly benefiting from this data, however, lies in using the right data wrangling tools available. Properly executed, wrangling provides data insights that improve both analytical inquiries and the quality of the results.
Decision-Making and Justifying the Decisions
More data does indeed mean more power in financial markets, and in particular in the insurance industry, where key decisions are made at a much more micro level.
In years gone by, decisions, and the rationale behind them, came from a much more macro level.
As data has become more prevalent in our day-to-day lives, so has the need to understand it and “do something with it.” This has pushed decisions down the chain, while the rationale given to regulators or boards of directors is still framed at the macro level. The result is a disconnect between the aggregation of these smaller micro-level decisions and a “one size fits all” macro-level rationale.
Scarce Data in Insurance
In the insurance industry—and in particular specialty insurance markets such as Lloyd’s of London—data has always been scarce. Of course, this scarcity is part of why such insurance exists in the first place: insurance collects premiums from many parties to spread the risk of concentrated exposure in any one area.
However, the difference between this and retail insurance, for example, is that retail may have many hundreds of thousands of likely policyholders wanting a very generic policy. Specialty insurance, conversely, has very few potential policyholders, who want coverage for specific needs, such as protecting their oil rig in the Gulf of Mexico from the threat of hurricanes.
Historically, specialty markets have used sparsely-available data and judgement to assess risk. The key to this has always been imperfect information. The value of this imperfect information is the difference between an insurer either making profit or seeing huge losses.
Quants and Data
Publicly available data is helping to supplement quants with more data for analysing opportunities in the market. Entities that hold a large proportion of the market share no longer hold as much of an advantage over participants with a smaller share, since that advantage came from owning a large part of the market’s data and keeping it confidential.
Let’s take the example of the oil rig again. Only a finite number of hurricanes have ripped through the Gulf of Mexico, and each time a new oil rig is built, it is designed to withstand greater hurricane force. This limits the usefulness of historical data within the models and puts increasing importance on cutting through the data and understanding what it is supposed to represent. On the face of it, quants could make quite bold recommendations that are in fact flawed, if one investigated the detail behind them.
Using Tools to Understand Data
More data can mean more power only if we are able to understand and use that data. Solutions like Trifacta are today’s answer to analyzing large amounts of data quickly, using visual displays to glean insights easily, and in turn making well-informed business decisions.
Creating homogeneous groups of data that have enough volume to be credible is always going to be a difficult task. But as an industry we should use the tools available to us, like Trifacta, to understand the data before we provide macro-level rationales for micro-level decision-making.
This week, in the spirit of the premiere of Star Wars: The Last Jedi, we devoted our weekly wrangling challenge to all things Star Wars. Could we figure out which species preferred to live on swamp planets? Or how fast spaceships fly? We tasked our team to find out. In this post, I’ll share a bit of that behind-the-scenes wrangling, and a few pictures of our post-challenge celebration.
The Wrangling Challenge: May the Force Be With You
If you’re not familiar with Trifacta, we make a product that helps data analysts of all stripes clean and prepare data for analysis. Trifacta was built for all users—from non-technical analysts who typically work in Excel, to SQL and Python pros—who want a fast, visual way to transform data, whether that’s reformatting timestamps or aggregating and averaging sales records, so it can be analyzed.
Our customers wrangle some incredibly complex datasets every day. To step into their shoes, we run bi-weekly “Wrangle U” challenges at Trifacta using our product, Wrangler Enterprise. We challenge ourselves with the same kinds of data cleansing and data prep problems our customers face, such as standardizing and structuring tissue sample manifests (like those our biotech customers manage), un-pivoting weekly sales data, or making sense of sensor data from IoT systems.
This last week, however, we did something a little different: we wrangled Star Wars data from a public source, https://swapi.co. For this challenge, we compiled data from the SWAPI database, which consists of datasets covering Star Wars planets, characters, spacecraft, and more. After pairing up Trifacta employees across the world, we challenged each duo to wrangle the data to uncover answers to questions such as:
What green-skinned character has the maximum mass, and what is that mass?
What are the spaceships that fly at 1000 km in a planet atmosphere?
What are the species that prefer to live on swamp planets?
Download Wrangler and create a recipe to answer any one of the questions above, tweet your answer to @trifacta with a screenshot of your recipe, and we’ll send you a special edition Trifacta t-shirt. If you need help with the dataset or have wrangling questions you can email me at jsilvers at trifacta.
While the dataset wasn’t Hadoop-sized, the wrangling was real. Teams had to extract data from JSON files and make sense of the alien lifeforms and fantastic spacecraft to answer both qualitative and quantitative questions.
To wrangle the data, the teams needed to join, or blend, two or more datasets, aggregate and standardize data, derive new values such as average, and more. To make this challenge more fun, we offered prizes for the funniest and most creative answers as well as for the most correct responses in the least number of steps. One Trifacta employee even solved the Porg mystery.
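If you want to sanity-check your recipe’s answer to the first question with plain Python, here is one way to do it (assuming the https://swapi.co API is reachable, and using the skin_color and mass fields it documents):

```python
# Paginate through the SWAPI people endpoint and find the heaviest
# green-skinned character. Masses arrive as strings, sometimes with
# thousands separators or the value "unknown".
import requests

url = "https://swapi.co/api/people/"
heaviest = (None, -1.0)

while url:
    page = requests.get(url).json()
    for person in page["results"]:
        if "green" not in person["skin_color"]:
            continue
        try:
            mass = float(person["mass"].replace(",", ""))
        except ValueError:   # e.g. "unknown"
            continue
        if mass > heaviest[1]:
            heaviest = (person["name"], mass)
    url = page["next"]       # None on the last page, which ends the loop

print(heaviest)
```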
Photo: The Triforce Awakens!
Off to the Movies
The Star Wars challenge we laid out for ourselves was part of a whole week of company festivities for the 2017 end-of-year holidays. We ended the week with the premiere of Star Wars: The Last Jedi, then headed to a holiday celebration to announce the winners of the Wrangle U challenge and have some fun.
Photos: holiday movie time, the Trifacta end-of-year party, the women of Trifacta celebrating, and celebrating with the Triforce.