Technology startups wanting to “Change the World” have become something of a cliché in Silicon Valley. Some would say that this attitude reflects a naïve sense of optimism, but at Ayasdi, we not only want to change the world, we are doing it. Together with our collaborators. Incrementally. One scientific breakthrough at a time.
What is the key to enabling so many breakthroughs in such diverse disciplines? Our AI platform — leveraging Topological Data Analysis, or “TDA” — is the innovation behind the solutions to all of these challenges where traditional technologies have failed.
TDA gives us the ability to handle high-dimensional, unstructured or unlabeled data with weak signals that present themselves over time. Or, you could say that TDA gives us the ability to analyze tremendous amounts of data — and how it evolves over time — to identify patterns and relationships that previously went unrecognized, and to surface answers to questions no one knew to ask.
The very purpose of data analysis and data science is hypothesis generation and hypothesis refinement. Our platform supports hypothesis generation by empowering subject matter experts with a platform that is simple to use, but delivers impressive scientific results.
As noted in our white paper, Understanding Ayasdi, we believe that for an application to be truly intelligent, it needs to meet several criteria. We call these criteria the five pillars of enterprise intelligence: Discover, Predict, Justify, Act, Learn. In this post, I will focus on the first three pillars – Discover, Predict, Justify – and how they have delivered a windfall of successful results for scientific researchers.
Why is discovery important? It is very difficult to confirm you are asking the “right” questions with any dataset, and this is especially true with complex datasets such as genomics data. In medical research applications, for example, Ayasdi’s approach does not require the development of extensive clinical hypotheses and can automatically map relevant patient subgroups based on advanced mathematical algorithms, providing researchers with answers to questions they didn’t even know to ask.
TDA uses unsupervised and semi-supervised machine learning to find patterns in data. U-BIOPRED used our artificial intelligence to discover a 1693-gene signature to meaningfully distinguish severe asthmatics from non-asthmatics and mild-to-moderate asthmatics. By segmenting the asthma population, the researchers hope to develop targeted treatments for patients who will respond to therapy. Such treatments have been effective in treating diseases that involve just a small number of genes, but it has been far more challenging to develop targeted medicines for conditions – like asthma – involving hundreds or thousands of genes. Ayasdi’s technology is well-suited to the challenge, given its ability to manage extremely high levels of complexity.
“Because asthma is a disease with a high variance in pathologies and is still not well understood, the ability to use the Ayasdi platform to drive unsupervised, multidimensional queries has been integral in accelerating our research,” said Dr. Timothy Hinks. “This progress has allowed our team to be less biased in generating hypotheses about the data. This has helped us focus on data-driven hypotheses that save time and make our work applicable to all healthcare workers treating asthma and similarly pathologically diverse diseases. Using Ayasdi, generating a network at an appropriate resolution to give significant insight takes only a few hours. This gives a clear picture of the distinct groups of asthma we as clinicians see presenting to our severe asthma clinics, and will help with identifying subgroups for future clinical trials.”
This and other asthma research is published in the American Journal of Respiratory and Critical Care Medicine, and in three different JACI articles – here, here, and here. The four studies, taken together, are helping researchers begin to build a clinical and genomic profile of patients with severe asthma. And this methodology is very applicable to other complex chronic diseases as well.
Evaluation of pathological heterogeneity using TDA with the aim of visualizing disease clusters and microclusters
TDA representation of hierarchical clustering of the severe asthma disease signature
Another advantage of TDA in the discovery phase is that it does not need labeled data but can readily incorporate it when available. Regarding diabetes, our collaborators at Mt. Sinai published a paper in Science Translational Medicine on their work using TDA to identify previously unknown diabetes subtypes, identifying three distinct subgroups of type 2 diabetes in patient-patient networks.
Mt. Sinai has a large database that pairs the genetic, clinical, and medical record data of more than 30,000 patients. In addition to genomic sequencing data, the database also includes information about each patient’s age, gender, height, weight, race, allergies, blood tests, diagnoses, and family history. The team used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals, including genetic markers and clinical data, such as blood levels and symptoms. They were able to uncover hidden patterns in large and complex datasets, enabling research institutions to expedite biomarker discovery, segment disease types, and target drug discovery.
Patient-patient network for topology patterns on T2D patients
Leveraging unsupervised and semi-supervised machine learning, we are also able to identify previously unrecognized patterns in data. Upgrade Capital, as part of the Code in Finance program, used Ayasdi to help investors better understand the current state of financial markets by identifying analogous past states. Such an exercise is very challenging in traditional risk analysis, which tends to rely heavily on dimensionality reduction. Using Ayasdi, the researcher was able to tie together four decades of macroeconomic and market data in a loop, representing the economic cycle. To confirm the validity of this interpretation, he identified distinct regimes such as expansion, contraction, and recovery, and then confirmed that they generally follow each other in a consistent order.
Economic Cycles: Data assembles into distinct regimes. The economic cycle runs clockwise.
Why is prediction important? Our collaborators are not satisfied with discovery — they want to impact the world, and we help them predict the impacts of their discoveries. To that end, we generate relevant features for use in prediction tasks and find local patches of data where supervised algorithms may struggle.
TDA uses compressed representations to generate novel features from complex inputs such as time series datasets. Stanford microbiologists used our platform to predict whether an individual will recover from disease. Currently, little data is collected when animals or humans recover from disease, so the potential impact is significant and would vary by disease indication. This work was substantial enough to be published in two separate PLOS papers, and on the strength of those two papers the team won a $6M grant from DARPA to continue their pioneering work in understanding disease recovery. Using the unsupervised learning capabilities of our platform — distinguished by visualizations of disease maps built from cross-sectional data — they mapped the way hosts loop through the disease space. TDA was the only technology that would have worked in this case: where cross-sectional data leaves gaps in a trajectory, the topological networks close those gaps, producing graphs in which the loops are plainly visible. Stanford visualized the “disease space” traversed by infected hosts and identified the different states of the infection process. Resilient systems cannot be mapped by a tree and are better described by loops, which elude most analytical approaches, but don’t elude our platform.
Reconstructed human and mouse disease space maps from longitudinal and cross-sectional data
Using hierarchical models, TDA also discovers the structure of the data and builds distinct local models. In yet another groundbreaking example of our prediction capabilities, our collaborators at the University of Montana used Ayasdi to predict earthquakes, of all things! Using Ayasdi and time series analysis, they demonstrated that large earthquakes appear to synchronize globally, in the sense that they are organized in time, according to their renewal properties, and occur in groups in response to very low-stress interactions. Major quakes appeared to cluster in time — although not in space — and the number of large earthquakes seemed to peak at 32-year intervals. The earthquakes could be somehow ‘talking’ to each other, or an external force could be nudging the earth into rupture.
A topological network [Carlsson, 2009] relating earthquakes (nodes) with similar renewal interval and date of occurrence using a variance-normalized Euclidean metric on two real-valued derived measures of event properties: L-infinity centrality and Gaussian density.
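The two derived measures named in the caption are standard Mapper lens functions and are simple to compute. Here is a minimal sketch, assuming a generic numeric point cloud; the variance normalization follows the caption, while the density bandwidth is an illustrative default of our own, not the published analysis.

```python
import numpy as np

def lens_functions(X):
    """Compute two common Mapper lenses on a point cloud X (n x d).

    L-infinity centrality: each point's distance to the point farthest
    from it.  Gaussian density: a kernel density estimate.  Both are
    standard choices in the TDA literature; the bandwidth below is an
    illustrative default, not a tuned value.
    """
    # Variance-normalize each coordinate, then form pairwise Euclidean distances
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    diffs = Xn[:, None, :] - Xn[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))

    linf_centrality = D.max(axis=1)          # distance to the farthest point
    eps = np.median(D)                       # illustrative bandwidth choice
    gaussian_density = np.exp(-(D ** 2) / (2 * eps ** 2)).sum(axis=1)
    return linf_centrality, gaussian_density
```

In a Mapper pipeline these two values per point would then be binned into overlapping intervals to build the network shown above.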
In another example of TDA’s use of compressed representations to generate novel features, UCSF used Ayasdi to discover insights into spinal cord injury/traumatic brain injury (SCI/TBI) and osteoarthritis. They verified that hypertension is a prognostic indicator of survival in SCI/TBI — simply using drugs to lower blood pressure immediately post-injury/pre-surgery could drastically improve outcomes. Using techniques designed for uncovering hidden relationships between large numbers of variables, UCSF retroactively mined old, “dark” data discarded from an old clinical trial, potentially saving millions of dollars. In another Nature paper, UCSF used TDA to identify a unique diagnostic subgroup of patients with unfavorable outcomes after mild TBI, significantly predicted by the presence of specific genetic polymorphisms — hence TDA could provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials.
Behavioral outcomes of forelimb function and histopathology were mapped onto the topological network using TDA
Methodological work-flow for integrating diverse clinical TBI data
Why is justification important? For prediction to have value it must be able to justify and explain its assertions and diagnose failures. TDA uses local information to provide justification, with the quality of the groups dictating the quality of the justifications.
Our collaboration program has demonstrated extraordinary potential to better the human condition. SumAll.org used Ayasdi to perform a quick systemic cluster analysis of a polio vaccination campaign for Syrian children. The data was provided by HumanitarianTracker.org and was complex in that it contained patient, temporal and geographical data — but was fluid, given the study area was an active warzone. Our software was able to interact with all of the available feature classes at once and the results were then visualized in such a way that any interesting behavior in the data could be quickly identified. By understanding where children are most likely not reachable for follow-up doses, SumAll.org was able to work with Humanitarian Tracker to evaluate a strategy to ensure that all individuals were receiving their doses. For humanitarian aid organizations strapped for time and resources, ease and efficiency of reliable statistics and reporting are critical, and in cases like this, Ayasdi can significantly reduce the investment needed for analysis.
SumAll.org identified two statistically distinct groups where Not Reached Doses were the largest defining factor, with statistically significant districts within each group
In the field of materials science engineering, EPFL researchers used our software to develop a pore recognition approach to quantify the similarity of pore structures and classify them in nanoporous materials, publishing in Nature. Quantifying similarity of pore structures allowed them not only to find structures geometrically similar to top-performing ones, but also to organize the set of materials with respect to the similarity of their pore shapes. This will be extremely useful to researchers designing and screening new nanoporous materials.
Mapper plot of the best zeolites (top 1%) for methane storage
Columbia University published an analysis of the topology of viral evolution in PNAS, illustrating another example of TDA identifying relevant structure in real-world data that is invisible to classical techniques. Mathematical biology has traditionally represented evolutionary processes as a branching tree, modeling Darwin’s tree of life. That is, evolutionary networks have been depicted as tree-like, without any loops in the underlying networks that model time evolution. This paper shows such loops exist in real data, and that TDA is required to find and understand them. This is the first rigorous and systematic proof that such structures exist — and now it is clear that this is a ubiquitous phenomenon.
Linking algebraic topology to evolution
These are just a handful of examples in which our collaborators are leveraging TDA to change the world — one scientific breakthrough at a time.
Healthcare data interoperability is a known challenge. The fragmented nature of data in the provider and payer domains adds complexity to an already challenging picture. Further compounding the problem are the varying database systems deployed within organizations that capture cost, billing, pharmacy, orders, labs etc.
With a handful of vendors dominating both EHR and billing systems, there is little incentive to interoperate and develop a common standard. To address this challenge, innovative healthcare pioneers developed an open standard called FHIR (Fast Healthcare Interoperability Resources) to facilitate data sharing. The standard uses modern web technologies and opens the door for others to adopt and build on their work.
We quickly realized the potential of this standard to benefit our work and became an early adopter.
While we are advocates for FHIR, we encountered some limitations. As a result, Ayasdi has developed a FHIR-based standard schema for its applications. This augmented approach underpins our award-winning clinical variation management application.
The standard FHIR elements include admitting diagnosis codes, encounter start, encounter end, etc. They can be found in the specification here.
The areas we needed to extend were user-defined constructs such as direct cost, fixed cost, and total cost, and patient metadata (e.g., BMI, risk scores, response variables). The key is to make the addition and mapping of these features simple and straightforward. By doing so, we facilitate the addition of these potentially valuable features while creating the flexibility to grow and expand down the road.
How exactly does Ayasdi use FHIR?
As part of the CVM application we typically follow a process of mapping customer data to the FHIR schema, followed by syntactic validation, transformation and platform load. The final step is semantic validation which is typically performed by a subject matter or domain expert.
Let’s consider a provider with an Epic medical record system. The EMR will have captured relevant encounters, interventions, and interactions with the patient. These elements will serve as the basis for our FHIR implementation.
For example, using the patient encounter data we can capture the start and stop time of the encounter, the cost, the length of stay, and any other relevant outcome variable we choose. These are all individually mapped over to our FHIR schema to be used as the metadata for analysis.
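As a sketch of what such a mapping might look like, the snippet below converts one hypothetical EMR encounter row into a FHIR-style Encounter record. The `period` element is standard FHIR; cost and length of stay travel as custom extensions, mirroring the user-defined constructs discussed earlier. All column names and extension URLs here are illustrative assumptions, not Ayasdi's actual schema.

```python
from datetime import datetime

def map_encounter(row):
    """Map one hypothetical EMR encounter row to a FHIR-style Encounter.

    `enc_id`, `admit_ts`, `discharge_ts`, and `total_cost` are made-up
    EMR column names; the extension URLs are placeholders.
    """
    start = datetime.fromisoformat(row["admit_ts"])
    end = datetime.fromisoformat(row["discharge_ts"])
    return {
        "resourceType": "Encounter",
        "id": row["enc_id"],
        # Standard FHIR Encounter fields
        "period": {"start": start.isoformat(), "end": end.isoformat()},
        # User-defined constructs carried as extensions on the resource
        "extension": [
            {"url": "urn:example:totalCost",
             "valueDecimal": float(row["total_cost"])},
            {"url": "urn:example:lengthOfStayDays",
             "valueDecimal": (end - start).total_seconds() / 86400.0},
        ],
    }
```

Each required resource file would get its own small mapper of this shape.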
For the specific treatment path information, we can take medication, diagnostics, and nutrition orders, for example, and map over important columns, such as the class of medication (i.e., Analgesic, Antibiotic, Cardiac, etc.). The classes could be ATC classes, NDC drug codes, or any other system used by the provider. In addition, we typically require the unique codes for each medication within a class (e.g., oxycodone 5 mg or cefazolin 2000 mg IV). This process is repeated for each of the required FHIR resource files.
This mapping process is a one-time effort and can be leveraged for any type of episode, acute or non-acute. The data architect/analyst can create the necessary SQL queries and extracts to generate this data, or they could choose to use an ETL tool to accelerate this effort. Here is a sample worksheet an analyst would create in order to populate the patient ENCOUNTER resource. The items in red are required, while all others are optional.
After all the files have been mapped, the extracts are validated against our schema. To perform this validation we have built a simple utility called CARE-ETL. The utility takes the data, validates it, transforms it, and loads it into the Ayasdi platform. We will take each of these steps in turn.
The first step in validation is inherently visual. The screenshot below profiles each file, allowing the analyst to quickly identify any glaring issues before running a detailed validation on the data.
This screenshot below shows a detailed resource by resource validation of the data files.
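A resource-by-resource validation of this kind boils down to checking each row against the required fields of its resource. The sketch below illustrates the idea; the required-field lists are illustrative assumptions, not the actual CARE-ETL schema.

```python
# Illustrative required fields per resource; not the real CARE-ETL schema
REQUIRED = {
    "Encounter": {"id", "period_start", "period_end"},
    "MedicationRequest": {"id", "encounter_id", "code", "class"},
}

def validate_resources(resource_type, rows):
    """Return (row_number, missing_fields) for every row of a resource
    extract that is missing a required value."""
    required = REQUIRED[resource_type]
    problems = []
    for n, row in enumerate(rows):
        present = {k for k, v in row.items() if v not in (None, "")}
        missing = required - present
        if missing:
            problems.append((n, sorted(missing)))
    return problems
```

An analyst would run this per extract and fix the flagged rows before loading.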
The transformation step is a critical component for analysis. When we think about multiple patients and their treatment paths, we need to consider how to align patients with each other. To do so, we must choose an “index” point, which is selected during the ETL process. This index point could be the start of surgery, the diagnosis of a condition, the first admission to the hospital, etc. After this choice is made, all patient pathways are calibrated so that the index point becomes time zero.
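The alignment step can be sketched with pandas. Column names here are illustrative; the index timestamps would come from whichever event was chosen during ETL.

```python
import pandas as pd

def align_to_index(events, index_events):
    """Calibrate every patient's event timeline to time zero at an index point.

    `events` has columns (patient_id, ts, code); `index_events` maps
    patient_id -> index timestamp (e.g. start of surgery).  Both the
    column names and the hour granularity are illustrative choices.
    """
    idx = index_events.rename("index_ts")
    out = events.join(idx, on="patient_id")
    # Hours relative to the index point; negative means before it
    out["t_hours"] = (out["ts"] - out["index_ts"]).dt.total_seconds() / 3600.0
    return out.drop(columns="index_ts")
```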
The transformation step is also critical for finding windows of activity: for each event category/class, summary features are created over these windows. These summary elements are used by our core algorithms in the platform to create treatment groups and consensus carepaths.
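A minimal sketch of such window summaries, with illustrative column names and window boundaries (not clinical guidance): count the events per category within each named window of hours around the index point.

```python
import pandas as pd

def summarize_windows(aligned, windows):
    """Build per-patient summary features: event counts per category
    within each named time window.

    `aligned` needs columns (patient_id, category, t_hours), as produced
    by an alignment step; `windows` maps a name to (lo, hi) hour bounds.
    """
    frames = []
    for name, (lo, hi) in windows.items():
        mask = (aligned["t_hours"] >= lo) & (aligned["t_hours"] < hi)
        counts = (aligned[mask]
                  .groupby(["patient_id", "category"]).size()
                  .unstack(fill_value=0)
                  .add_prefix(f"{name}_"))   # e.g. "pre_lab", "post_med"
        frames.append(counts)
    return pd.concat(frames, axis=1).fillna(0)
```

The resulting feature table is the kind of input an unsupervised grouping algorithm can consume directly.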
The final step is a semantic validation step in which the clinician or subject matter expert is involved to ensure the data is ready for analysis – this is done in the CVM application itself. Often we look at a specific patient pathway to determine whether it truly aligns with what the EMR records are telling us. In addition, we also examine whether the mapping process successfully captured all our required elements.
Now the real fun begins as our unsupervised AI can find treatment groups to begin the discovery process!
In closing, we welcome the arrival of FHIR v4.0, and as we work with our customers we will consider augmenting our schema as necessary to align with this latest version of FHIR. We continue to evangelize the need for our customers to become FHIR compliant, as this will reduce (or maybe eliminate!) much of the fragmentation that is holding healthcare back from truly reducing costs and improving outcomes.
The team at Chartis Research is about as well regarded in financial services as it gets. Their flagship report, the RiskTech100, is the bible for the industry, and their ability to go deep on the issues AND the technology sets them apart.
They have recognized Ayasdi consistently. We won their top spot for AI in 2016 and 2017 and took the runner-up slot in 2018 (a three-peat would have been unprecedented).
The team over there continues to dial it up – and the 2019 Artificial Intelligence in Financial Services Market and Vendor Landscape is a superb example. Over 24 pages, they detail the best in Data Management, Workflow and Automation, Analytics and Packaged Applications.
We were delighted to make two of those quadrants. In Analytics we were one of just three companies in the Category Leader segment. This represents a tremendous achievement as we outperformed dozens of companies that are hundreds of times larger than us.
We also made the Packaged Applications quadrant as a Best of Breed company, just barely missing another Category Leader slot. While we just missed this year, we expect our growing portfolio of applications to move us up and to the right in 2020. That suite, enabled by our ever-advancing Envision application development framework, will allow us to combine domain expertise with a library of UI widgets and machine learning recipes (segmentation + prediction, anomaly detection, hotspot detection, topological predict) across dozens of use cases. The force multiplier comes in the form of our partners and our customers. Envision is so simple that domain experts can turn the crank on applications without having to lean on IT. This is significant for partners like Navigant, Parker Fitzgerald, Crowe and Accenture.
This is something to watch in the coming year. Reading the full report requires a Chartis subscription – but it’s worth it.
The next big win for Ayasdi came courtesy of the team at Forbes. Each year they publish a list of the top 50 fintechs in the world. It is an exceptionally competitive list. There are literally thousands of companies in the fintech space from the giants like Stripe, Ripple and SoFi to emerging stars like Enigma.
Filtering through the PR pitches and marketing noise to find who actually has traction is a tall task, and Antoine Gara and his crew have done an excellent job.
The team keyed in on our success in financial crime. This is an area where our technology has changed the equation, allowing for gains in efficiency and effectiveness while leaving the existing infrastructure untouched. We make everything else better.
This has been a HUGE month for us on the recognition front. From the Accenture HealthTech Challenge win to this amazing double it represents tremendous validation for the team and the technology.
The Final Round competition included 10 amazing startups – four from North America, two each from Europe, Asia and ANZ. The judging crew grew as well, to almost 50 judges from the world’s largest pharma companies, the biggest payers in the US and some of the country’s largest hospital systems. In addition, there were representatives from companies like Best Buy, GE Healthcare, and Medtronic.
The story we told was pretty simple but spoke to a complex problem. Here is a summary, including the slides we used:
The pitch started out with the size of the opportunity. Clinical variation is an $800B+ problem in the United States alone. That number comes from the American Medical Association and generally covers all of those things that we do that do not improve the patient outcome or experience. We arrive at this figure by taking the AMA’s stats, pulling out administrative complexity, and adjusting for the current cost of healthcare in the United States ($3T). It is a massive number.
More dramatically, that number has persisted for more than three decades – despite our knowing how to solve for it.
The answer is evidence-based medicine protocols. Evidence-based medicine works. No one in healthcare is debating that. The problem is we have never been able to scale it. The cost, time and effort to produce a care process model manually is extremely high (which is why the refresh cycle is around 4 to 7 years!).
Furthermore, the acceptance of those care process models (what we call adherence) is quite low, diminishing further the utility of the effort.
But advancements in machine intelligence and the availability of data change all that. They don’t eliminate the challenges, but they do make them far more manageable, and in doing so create the opportunity to reshape healthcare.
The challenge of clinical variation management lies in its inherent complexity.
Any healthcare episode can be broken down into component parts. How granular you go is a function of what you are looking for. For the purposes of this post, let’s keep it at a high level:
Events (every lab, test, order, incision, and suture, done inpatient or outpatient)
Sequences (you don’t put the new knee in until you have removed the old one)
Timing (you don’t administer the pain killer the day before the surgery, you do it 1 hour and 40 minutes prior)
When you combine the three, it creates a picture of extraordinary complexity, where the events, the sequence of those events, and the timing of that sequence all conspire to confuse even the most competent clinician. Add to that co-morbidities and, most importantly, the mission of the organization, and it becomes overwhelming.
What do we mean by the mission? Well, the truth is that while the healthcare industry as a whole serves the patient, how individual health systems do so differs. Stanford, as a teaching hospital, has a different mission than Kaiser, which in turn has a different mission than Sutter – and we haven’t even left the confines of the Bay Area. Mission matters. Payers who use our product have a different reason for doing so than providers. Their mission matters too.
Because the problem is complex our natural instinct is to simplify. The result is providers (and payers as seen here) generally develop something rather general – something high level, often using national literature as a guide. This destroys adoption and adherence. These amorphous care paths don’t reflect the organizational mission or their patient populations – and so clinicians ignore them, adding to the mountain of wasted work.
We then made the problem very clear for the judges (we had 10 groups of 6-7 including Accenture folks):
This data is from 1,315 patients who had total knee replacement at a large midwestern hospital system. The range, for an elective procedure for which complications are minimal, is stunning. And it is not just the range; when you dig deeper into the data you find big cost blocks (15% – 20%) that sit on either side of the average.
It tells a story: this procedure is being practiced consistently within groups of patients, but unevenly across them.
So how do we attack the problem?
First, we start with the technology that has earned us two mentions on Fast Company’s World’s Most Innovative Companies list. That technology, topological data analysis, is THE state-of-the-art in unsupervised learning. Using it we can identify groups of patients that had similar treatment paths. By identifying those treatment paths we can present multiple candidate care process models for the clinicians to evaluate – often within seconds. Compare this to 8-10 doctors taking time over nine to twelve months and you start to understand how powerful this is.
Second, we solve the problem end to end. That means everything from ingesting the data using the FHIR standard to automating the creation of candidate carepaths, enabling deep inspection of the results through to publishing and tracking adherence.
Managing clinical variation is a whole organizational effort. The application needs to reflect the needs of the data architect, the clinician, and the administrator. That’s what our application does.
Finally, we put all the technology, all those features, into an intuitive user interface. Simple, but with tons of features within a click or two. The ability to set time windows and alternatives for adherence measurement. The ability to have user-defined variables. The ability to customize the list of co-morbidities the organization wants to track. The list is pretty amazing.
At that point, we dropped into the demo and then into questions (some of which were really detailed).
To see what we demoed (sans the ETL app) check out this video.
AYASDI Two Minute Tour of CVM - YouTube
That was it. Altering the trajectory of healthcare in nine minutes.
We used a slightly modified version for our final three minute, on-stage pitch which we included below.
Accenture’s HealthTech Challenge is a big honor and a massive platform from which to tell our story. We really look forward to taking advantage of it.
It has been a pretty good month for those who share our passion for clinical variation management and all things Ayasdi with awards, wins and recognition coming in from across the industry.
First, the news associated with our successful deployment at Flagler Hospital has reverberated throughout the industry, resulting in press coverage ranging from Healthcare Informatics to SearchHealthIT.
For those who are interested in hearing the story firsthand, Flagler’s CMIO did an entire webinar on the initiative. While the pneumonia results ($1,350 savings, 2.5 day LOS reduction, 7X readmission reduction) are the headlines, the fact that they are achieving these results without a single data scientist underscores the applicability of the application.
Second, we took the application on the road to the Accenture HealthTech Challenge, where an esteemed panel of over thirty judges from providers to payers to life sciences chose our solution as a global finalist – putting us through to San Francisco, where we will compete against nine other companies from EMEA and APAC. Given that the competition started with over 1,200 entries, we are humbled and flattered. Helping us to get to the finals was the fact that Clinical Variation Management is a $2T market opportunity and we have the best product ever made to address it. Interested in the deck we used? We posted it here.
Finally, our Clinical Variation Management (CVM) application just took home Fierce Healthcare’s Innovation Award in the data analytics/business intelligence category, which speaks to the power and usability of the application. In a nice bonus, we also won the Fiercest Cost Savings Award – speaking to our ability to carve billions out of the healthcare system while simultaneously improving patient outcomes.
This recognition isn’t the effort of marketing – it is the effort of engineers, product managers, data scientists and domain experts. They are the ones crafting the product into something that wins in the market and wins with those in the know. While we do it because we are passionate – a little recognition never hurt….
In their very provocative paper, Peter Battaglia and his colleagues posit that in order for artificial intelligence (AI) to achieve the capabilities of human intelligence, it must be able to compute with highly structured data types, such as the ones humans deal with. This means that AI must be able to work with information about relationships between objects, which is effectively encoded as graphs, i.e. collections of nodes, edges between those nodes, and attributes attached to the nodes and edges.
A simple example would be images, where one considers graphs which are shaped like a square grid, with the pixel values being the attribute attached to a particular node in the grid. The authors go on to describe the framework they have developed for computing with graph structured data, in particular describing how neural networks for this kind of computation can be built.
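The grid example is easy to make concrete. The sketch below encodes a small image as such a graph using plain Python structures: nodes are pixel positions, node attributes are pixel values, and edges connect horizontal and vertical neighbours.

```python
def image_to_grid_graph(pixels):
    """Encode an image as a square-grid graph.

    `pixels` is a list of rows of numbers.  Returns (node_attrs, edges),
    where node_attrs maps a (row, col) node to its pixel value and
    edges lists 4-neighbour adjacencies.
    """
    h, w = len(pixels), len(pixels[0])
    node_attrs = {(i, j): pixels[i][j] for i in range(h) for j in range(w)}
    edges = []
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                edges.append(((i, j), (i + 1, j)))  # vertical neighbour
            if j + 1 < w:
                edges.append(((i, j), (i, j + 1)))  # horizontal neighbour
    return node_attrs, edges
```

A graph network of the kind the paper describes would then pass messages along exactly these edges.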
Embedded in this work is the following question:
Where do the graphs come from?
The question recognizes the fact that most data is “sensory” in character, i.e. it is obtained by measuring numerical quantities and recording their values as coordinates of vectors in a high-dimensional space, and consequently does not come equipped with any explicit underlying graph structure.
It follows that in order to achieve human-like intelligence in studying such data, it is necessary to construct graph or relational structures from the sensory data, and that performing such constructions is itself a critical ingredient of human intelligence.
The goal of this post is to describe a method for performing exactly this task, and to describe how neural networks can be constructed directly from the output of such methods. We can summarize the approach as follows.
A key ingredient is a geometry or metric on the set of features or sensors that are capturing the sensory data.
The second key ingredient is that of a covering of the set of features by subsets. The coverings typically are chosen to be well related to the geometry in a suitable sense.
The geometry and related coverings are used in conjunction to construct graph structures from the features defining the data.
They are also used to construct analogues of the pooling constructions used in convolutional neural networks, which permit the learning to be based on various levels of resolution.
We strongly agree with the point of view expressed in the DeepMind/Google paper that their work rejects the false choice between “hand-engineering” and “end-to-end” approaches to learning, and instead argues for the recognition that the approaches are complementary and should be treated as such. It follows that we should be developing a wide variety of tools and methods that enable hand-engineering of neural networks. The methods described in the DeepMind/Google paper are a step in that direction, as are papers on TDA-based Approaches to Deep Learning and Topological Approaches to Deep Learning.
We also observe that the benefits of such an approach go beyond improving the accuracy and speed of neural net computations. Once the point of view of the DeepMind/Google paper is adopted, one will be able to create algorithms which realize the idea of machine-human interaction. In particular, they will be more transparent, so that one can understand their internal functioning, and also provide more complex “human-readable” outputs, with which humans can interact. This paper on the Exposition and Interpretation of the Topology of Neural Networks constitutes a step in that direction.
Finally, we observe that in order to fully realize this vision, it is important not only to make the computation scheme human readable, but also the input data. Without a better understanding of what is in the data, the improvements described by Battaglia and his colleagues will only get us part of the way to fully actualizing artificial intelligence. Topological methods provide tools for understanding the qualitative and relational aspects of data sets, and should be used in the understanding of the algorithms which analyze them.
Graphs from geometry
In the case of images, the graph structure on the set of attributes (pixels) is that of a square grid, with some diagonal connections included.
Our first observation is that the grid is actually contained in a continuous geometric object, namely the plane. The importance of the plane is that it is equipped with a notion of distance, i.e. a metric in the mathematical sense (see Reisel’s work here). The set of vertices is chosen so as to be well distributed through a square in the plane, so that every point in the square is close to one of the vertices. In addition, the connections in the graph can be inferred from the distance. This is so because if we let r denote the distance from a vertex to its nearest neighbor (this is independent of the choice of vertex because of the regularity of the layout), then two vertices are connected if their distance is at most √2·r.
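As a sketch of this observation, the grid graph (with diagonals) can be recovered purely from the metric: lay out points in a square and connect any pair whose distance is at most √2·r. The function name below is illustrative, not from any paper.

```python
import itertools
import math

def grid_graph(n, spacing=1.0):
    """Recover the pixel-grid graph (with diagonals) from distance alone:
    connect two vertices iff their Euclidean distance is at most sqrt(2)*r,
    where r is the nearest-neighbor distance (= spacing on a regular grid)."""
    vertices = [(i * spacing, j * spacing) for i in range(n) for j in range(n)]
    r = spacing  # nearest-neighbor distance, same for every vertex here
    edges = [
        (u, v)
        for u, v in itertools.combinations(vertices, 2)
        if math.dist(u, v) <= math.sqrt(2) * r + 1e-9  # tolerance for float error
    ]
    return vertices, edges

vertices, edges = grid_graph(3)
```

On a 3×3 grid this yields the 9 vertices and 20 edges (6 horizontal, 6 vertical, 8 diagonal) of the grid-with-diagonals graph.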
So, the graph is determined by:
(a) a choice of subset of the plane and
(b) a notion of distance in the plane
The value of this observation is that we often have sets of attributes or features that are equipped with metrics. Some interesting examples were studied in this paper, where a study of the statistical behavior of patches in natural images was performed. One of the findings was that there is an important feature that measures the “angular preference” of a patch in the direction of a particular angle θ. As we vary θ, we get a set of features which is geometrically laid out on a circle, which can be discretized and turned into the graph on the left below. A more detailed analysis showed a two-parameter set of features that are laid out in the form of a geometric object called a Klein bottle, shown with associated graph structure on the right below.
Analogous geometries can be found for 3D imaging and video imaging. The work in the Exposition paper has suggested larger spaces containing the Klein bottle which will apply to larger scale features and more abstract objects.
Purely data driven graph constructions
The constructions in the previous section rely on prior knowledge concerning the features or a detailed data analytic study of particular sensing technologies. To have a method that is more generally applicable, without complicated data analysis, it is necessary to have methods that can quickly produce a useful graph structure on the set of features. Such a method exists, and is described in both the TDA paper as well as the Topology paper.
Suppose that we have a data matrix, where the rows are the samples or observations, and the columns are features or attributes. Then the columns are vectors, and there are various notions of distance one might equip them with, including high-dimensional Euclidean distance (perhaps applied to mean centered and normalized columns), correlation, angle, or Hamming distance if the values are Boolean. There are typically many options. By choosing a distance threshold, we can build a graph whose nodes are the individual features. If the threshold is small, the graph will be sparser than if we choose a larger threshold. So, at this point we have a method for constructing a graph from sensory data. Further, the graph describes a similarity relation between the features, in the sense that if two features are connected by an edge, we know that their distance (regarded as a dissimilarity measure) is less than some fixed threshold, and therefore that they are similar to each other.
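This construction can be sketched in a few lines: mean-center and normalize the columns, then connect two features when the Euclidean distance between their normalized columns falls below a chosen threshold. The function name and threshold value are illustrative assumptions.

```python
import numpy as np

def feature_graph(X, threshold):
    """Graph on the columns (features) of data matrix X: mean-center and
    normalize each column, then connect features i and j by an edge when
    the Euclidean distance between the normalized columns is below
    `threshold`. Smaller thresholds give sparser graphs."""
    Z = X - X.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    Z = Z / np.where(norms == 0, 1, norms)  # avoid dividing by zero
    n_features = X.shape[1]
    edges = []
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if np.linalg.norm(Z[:, i] - Z[:, j]) < threshold:
                edges.append((i, j))
    return edges

# Toy data: features 0 and 1 are nearly identical, feature 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
X = np.column_stack([base,
                     base + 0.01 * rng.normal(size=100),
                     rng.normal(size=100)])
edges = feature_graph(X, threshold=0.5)
```

With this threshold, only the two nearly identical features are joined by an edge; the independent feature sits at distance close to √2 from both and stays disconnected.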
This construction certainly makes sense and generalizes the construction that is done for images. However, it produces graphs which are large in the sense that they have the same number of nodes as we have features. They may also be dense in the sense that some nodes may have a very large number of connections, which is not ideal from the point of view of computation. We will address this problem via a form of compression of the graph structure.
To understand this construction, we will discuss another aspect of the deep learning pipeline for images, namely the pooling layers. We describe it as follows.
The “pooling” refers to pooling values of subsets of pixels, specifically smaller squares. The picture below describes the situation, where we have covered the pixel grid by nine smaller subgrids, each one two by two.
The idea of pooling is to create a new graph, this time three by three, with one node for each of the subgrids. The interesting point is that the creation of the new graph can be described in terms of the intersections of the subgrids only. Specifically, the nodes corresponding to two subgrids are connected by an edge if and only if the subgrids have at least one pixel in common. The value of this observation is that this criterion generalizes to the general situation of a geometry on the space of features, provided that we can produce a covering analogous to the covering of this square by subgrids. We refer to the collection of subgrids as a covering of the set of pixels, where by a covering of a set X we mean a family U of subsets of X such that every element of X is contained in some member of U.
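The pooled graph can be computed from a covering alone, as in this sketch: one node per covering subset, with an edge whenever two subsets intersect. The covering of a 4×4 pixel grid by nine overlapping 2×2 subgrids mirrors the picture described above; the names are illustrative.

```python
import itertools

def pooled_graph(cover):
    """One node per covering subset; an edge between two nodes whenever
    the corresponding subsets share at least one element."""
    nodes = list(range(len(cover)))
    edges = [
        (i, j)
        for i, j in itertools.combinations(nodes, 2)
        if cover[i] & cover[j]  # non-empty intersection
    ]
    return nodes, edges

# Nine 2x2 subgrids covering a 4x4 pixel grid, each overlapping its
# neighbors by one row or column.
cover = [
    {(r + dr, c + dc) for dr in (0, 1) for dc in (0, 1)}
    for r in (0, 1, 2) for c in (0, 1, 2)
]
nodes, edges = pooled_graph(cover)
```

The result is a three-by-three graph (with diagonals), exactly the coarser grid described in the text.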
Topological data analysis provides a systematic way of producing such coverings, using a method referred to as Mapper, which is described in the TDA and Topology papers.
The output of this construction will be a family of graphs Γ1, …, Γr with the following properties.
Each vertex in each graph corresponds to a family of features.
The collections corresponding to vertices are small or localized in the assigned metric on the set of features.
Each graph is low-dimensional in the sense that the number of vertices that lie in a clique is bounded above.
The size of the sets of features corresponding to vertices grows as we move from Γ1 to Γr. In other words, the graphs become coarser and coarser representations of the set of features.
The construction yields a systematic way of producing graph representations of the sensory data available from the data matrix. It turns out that one can go directly from this representation to neural network architectures.
In fact, one could even imagine incrementally producing coverings using Mapper on the activation-values of the layers following the input layer. Occasionally recomputing the coverings and therefore the graph structure would allow the network to adapt its graph structure during training.
Constructing neural networks
The geometries given above yield some very simple constructions of neural network architectures adapted to them. For example, in the case of the circular feature geometry described above, a picture of such a construction would look as follows.
Similar constructions are available for the Klein bottle geometries and others.
The construction of neural networks from the purely data driven constructions above is also quite simple. Since each node of each graph corresponds to a set of features from the data matrix, we can declare that a node v in Γr is connected by a directed edge to a node w in Γr+1 if and only if the collection corresponding to v and the collection corresponding to w contain at least one feature in common. This describes the connection pattern for a simple feed-forward network with the nodes of Γr as the set of neurons in the r-th layer. A picture of a simple version of this construction is given below.
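A minimal sketch of this connection rule, assuming each layer's covering is given as a list of feature sets (the coverings shown are hypothetical):

```python
def layer_connections(cover_r, cover_r1):
    """Directed edges from layer r to layer r+1: neuron v in layer r feeds
    neuron w in layer r+1 iff their feature collections overlap."""
    return [
        (v, w)
        for v, features_v in enumerate(cover_r)
        for w, features_w in enumerate(cover_r1)
        if features_v & features_w  # at least one feature in common
    ]

# Hypothetical coverings of six features at two resolutions: the second
# layer's sets are coarser, each absorbing several fine-scale sets.
cover_r = [{0, 1}, {1, 2}, {3, 4}, {4, 5}]
cover_r1 = [{0, 1, 2}, {3, 4, 5}]
edges = layer_connections(cover_r, cover_r1)
```

Here the first two fine-scale nodes feed the first coarse neuron and the last two feed the second, giving the sparse, locality-respecting wiring pattern the text describes.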
We have given an approach to assigning graph theoretic information to sensory or continuous data. It depends on the assignment of geometric information in the form of a distance function on the set of features of a data set. There are many constructions based on the mathematical discipline of topology (the study of shape) that should be useful for this task. The methodology should yield the following:
Improvement of the transparency of the algorithms, since the sets of neurons are now organized in understandable geometric patterns. In particular, coloring the geometric structures by activation values of neurons will make clearer the functioning of the network.
Because the topological model constructions have nodes corresponding to groups of features, it is to be expected that using such models will help reduce the overfitting that often occurs in neural networks. Preliminary experiments suggest that this does in fact occur.
Structures that lend themselves to topological data analysis may be classified, and their topological properties might have interesting and useful interpretations. For example, it has been shown that the spatial filters of CNNs arrange themselves in simple circles, and that these circles are indicative of a network’s ability to generalize to unseen data. In addition, the persistent homology of the activation maps of CNNs can be used to detect adversarial examples.
The Accenture HealthTech Innovation Challenge is a big deal. Run globally, the program attracted in excess of 1,200 applications this year. The world is broken into three regions: EMEA, Asia and the Americas. There is a lengthy application process in which the team at Accenture and their stellar lineup of judges winnow each region down to 12 finalists. Those finalists then compete in what Accenture calls the Demolition Derby, where each company presents to 6-8 judges in 15-minute blocks. Given there are 8 blocks, this is both a sprint (15 minutes is not a lot of time to tell your story) and a marathon.
We were humbled to have made the cut for the Boston round, where we competed against an extraordinary group of companies. We were even more humbled to be selected as a finalist for San Francisco, where we will go up against our fellow finalists from Boston, plus six teams from the Tokyo, Sydney and Dublin rounds.
One of the reasons we made it to the final is that we are attacking a massive problem in healthcare – clinical variation management. When we say massive, we really mean massive – $812B annually in the US alone. Add another $1T for the rest of the world (using 5% of GDP for healthcare vs. the 18% in the US). That’s closing in on $2 trillion per year for labs, tests, diagnostics, medications and other care that didn’t improve patient outcomes – and in many cases diminished them.
Managing clinical variation is notoriously complex. How else could an $812B-a-year problem persist for the better part of three decades?
That’s right, we have been working the clinical variation problem for almost 30 years – ever since the AMA first started to aggressively advocate for evidence-based care guidelines. The effectiveness of evidence-based care is not in question; it has been proven out in hundreds of peer-reviewed studies. Better outcomes, lower costs. The pillars of the value-based care movement.
So if we know how to do it, why aren’t we doing it?
The answer lies in scale. The cost, time and effort to produce a care process model manually is extremely high (which is why the refresh cycle is around 4 to 7 years!). Furthermore, the acceptance of those care process models (what we call adherence) is quite low, diminishing further the utility of the effort.
Let’s examine both of these separately:
The Complexity of the Problem
Any healthcare episode can be broken down into component parts. How granular you go is a function of what you are looking for. For the purposes of this post – let’s keep it at a high level:
Events (every lab, test, order, incision and suture, done inpatient or outpatient)
Sequences (you don’t put the new knee in until you have removed the old one)
Timing (you don’t administer the pain killer the day before the surgery, you do it 1 hour and 40 minutes prior)
When you combine the three it creates a picture of extraordinary complexity where the events, the sequence of those events and the timing of the sequence of events all conspire to confuse even the most competent clinician.
The result is that providers (and payers, as seen here) generally develop something high level and broad, often using national literature as a guide.
The Challenge of Acceptance and its Impact on Adherence
Care path adherence is a particularly thorny problem. Doctors own the patient relationship. The care path doesn’t own the patient relationship, nor does the machine. Doctors know what is best for their individual patients. They understand the exceptions, the co-morbidities, the family history, the patient’s preferences.
As a result, they often deviate from the care process model. Often it is warranted and at times is innovative. Often it is not warranted and is a result of habit, or in the rare case financial gain.
When a care path is too broad it will get dismissed.
When a care path is based on dissimilar cohorts it will get dismissed.
When a care path is based on too small a sample size it will get dismissed.
When a care path looks more like the opinion of a single doctor it will get dismissed.
When a care path doesn’t reflect the objectives of the organization it will get dismissed.
Only when a care path overcomes all of these dismissal reasons (plus some others that I have likely missed) will it be accepted.
The Opportunity: Building Acceptable Care Pathways at Scale
To create acceptable care process models at scale requires a few key elements:
Use the hospital’s own data. By using the hospital’s own data one interacts with that hospital’s patient population – not some amorphous national standard. Building care process models with the hospital’s data also incorporates the work of the doctors at the hospital – ensuring that how they practice medicine gets built into the care process models.
Incorporate everything. To generate a great care path, there will need to be multiple databases involved: the EMR, billing, and pharmacy. This requires technology (FHIR, for example), but the results have higher resolution and superior explainability.
Transparency. Simply presenting a care process model, no matter how granular, without showing what produced it will result in dismissal. Every step of every care process model must be open to inspection – why is it here, what are the stats, what is an acceptable substitute?
Model the mission. Every provider and payer has a slightly different set of objectives that can result in a large variance in how they practice medicine. Consider knee replacement surgery. A teaching hospital will approach it far differently than a for-profit hospital and different again from a faith-based hospital. Presenting candidate care paths that reflect the various missions puts the physician in charge of making the right decisions.
Make it granular. Care process models are difficult and time-consuming to produce, and as a result, most are “one size fits all.” This is not inherently problematic; however, it would be superior if you had one care process model for 55+, active females getting a total knee replacement and another for 70+, inactive, overweight males.
The product we demonstrated in Boston delivers against much of what is outlined here – compressing man-years into hours and uncovering good variation (innovation) alongside bad variation. More importantly, the product has every bell and whistle all packed into a single application interface that was designed to be used by doctors, not data scientists (although they love it too).
Achieving the elements outlined above can deliver exceptional results. At Flagler in St. Augustine, Florida, they applied our software to pneumonia, resulting in a care path that saved them $1,350 per patient ($850K per year), reduced length of stay by 2.5 days and reduced readmissions by 7X. The result was a 22X ROI for the hospital. With a care pathway per month for the next 18 months, they expect to save over $20M while simultaneously improving the quality of care they deliver to their patients.
It is an amazing story that underscores the size of the opportunity. If a 335-bed hospital can save $20M, what can NewYork-Presbyterian do?
So wish us luck as we prep for the SF round. They say it is an honor to be nominated, but we would like to win – and take advantage of the platform it provides to make healthcare better, across the board.
The appeal of forecasting the future is very easy to understand, even though it is not realizable. That has not stopped an entire generation of analytics companies from selling such a promise. It also explains the myriad methods that attempt to give partial, inexact, and probabilistic information about the future.
Even if they could deliver on a crystal ball, such a capability would obviously have enormous consequences for all aspects of human existence. In truth, even small steps in this direction have major implications for society at large, as well as for specific industries and other entities.
Chief among the methods for prediction is regression (linear and logistic), often highly tuned to the specific environment. There are dozens of others, however, including deep learning, support vector machines and decision trees.
Many of these methods have as their output the estimate of a particular quantity, such as a stock price, the outcome of an athletic contest or an election, either in binary terms (who wins and who loses) or in numerical terms (point spread). Sometimes the output even comes with an estimate of the uncertainty, for example in political polling where margin of error is often quoted. The existing methods clearly do have value, but this post concerns itself with two claims:
Prediction is not always ideal for what we really want to know
Prediction is often enhanced, sometimes materially, by combining it with unsupervised learning techniques
Let’s start with the first premise: that prediction may not be the right technique in certain circumstances.
The output of a prediction methodology is often used as input to an optimization algorithm. For example, predictions of stock prices are often used in portfolio optimization, prediction of outcomes of athletic contests are used in optimizing a betting portfolio, and results of election polling used to determine the choices made by political parties in supporting various candidates. There are many powerful methods for optimization, and applying the predictive information obtained by these methods is often a straightforward process. However, in order to use them, it is required that one starts with a well-defined objective function to which one can apply optimization methods in order to maximize or minimize it.
Herein lies the challenge.
In many situations it is unclear what defines a good objective function. Further, our attempts to create a proxy for an ill-defined objective function make matters worse (for example, sometimes we choose a function that is easy to define and compute with, but that doesn’t make it good, it just makes it easy). Let’s consider a few examples:
We don’t know the objective function: Political parties often view themselves as optimizing aggregate societal well-being, but in fact it is very difficult to quantify societal well-being in a single number. For example, one might believe that per capita income is a reasonable proxy. On the other hand, it could be that inequality of incomes diminishes societal well-being even in situations where every person’s income is increased. While one’s views on this may be related to one’s political orientation – we simply don’t know the answer. Similarly, airlines may optimize their dollar cost per passenger, resulting in less space per passenger and less flexibility in scheduling flights. It is easier mathematically to focus exclusively on cost, but it might be better in the long term to optimize the customer experience with the goal of driving revenue. Again, there is no clear answer.
We disagree about the objective function: Suppose that we are trying to optimize health outcomes for individuals. One person might feel that an objective function that maximizes expected lifespan is entirely appropriate. Such an objective function is easy to obtain mathematically and statistically. On the other hand, another person, who likes beer, would not feel that an objective function that focuses entirely on expected lifespan is appropriate for him/her, since it might be the case that beer-drinking would lower (marginally) his or her expected lifespan. The tradeoff of a potentially somewhat shorter lifespan versus the pleasure of drinking beer might very well be appropriate for that person.
We care most about minimizing very bad outcomes: In many situations, it is not so important to us to find optimal strategies as it is to ensure that nothing completely catastrophic occurs. In financial portfolio management, if one is working with a client who is nearing retirement, it may be much more important to avoid significant losses than it is to optimize growth. Similarly, if one is treating patients suffering from heart disease, it is of much higher priority to avoid disastrous outcomes such as infection and/or death than to optimize for length of stay or to minimize expenses.
In all these situations, the notion of prediction becomes significantly more difficult when the role of the objective function is unclear or debatable.
Because we are not in a position to define an objective function, we cannot rely exclusively on optimization techniques to solve the problems that face us. This means we need to take a step back and understand our data better so that we might guide ourselves to better outcomes.
This is where unsupervised learning comes in.
Unsupervised learning allows us to explore various objective functions, find methods that permit the adaptation of an objective function to individual cases, and search for potentially catastrophic outcomes.
These tasks cannot be resolved by methods whose outputs are a single number. They instead require an output with which one can interact in addressing them, and therefore is equipped with more information and functionality than simple binary or numeric outputs. It is also vitally important that the methods we use let the data do the talking, because otherwise our natural inclinations are to verify what we already believe to be true, or to favor an objective function we have decided on, or to decide that we already know what all the relevant factors in an analysis are.
This is, in many ways, the underappreciated beauty in unsupervised approaches.
OK, what we have established so far is that:
Prediction needs a numerical objective function
For a variety of reasons, an objective function may be hard to come by
Unsupervised learning can help define the objective function by providing more insight into the structure of the data
So what are the capabilities required of such unsupervised methods?
Defining and analyzing subpopulations of data sets: In analyzing the effect of various objective functions, it is critical to understand their effect on distinct subpopulations within the data. For example, certain government policies may have widely different effects on various populations. It might be the case that certain governmental policy prescriptions provide an aggregate improvement for the population as a whole, but are disastrous for members of some subpopulations, and should therefore be rejected. In many situations, the subpopulations or the taxonomy of the data set are not defined ahead of time, but need to be discovered. It is therefore important that the unsupervised method provide techniques for discovering subpopulations and taxonomies. This capability is very important for the situations identified earlier – namely where we don’t know or agree upon the objective function.
Location of “weak signals” in data: When we are focused on minimizing bad outcomes, it is often the case that those very outcomes appear in past data as anomalous outcomes or outcomes which occur rarely (we call them weak signals), and then grow into stronger phenomena over time. This kind of situation is not captured in many prediction methods, since the stronger signals tend to drown out the weaker signals in the output of these methods. What is required is a methodology that performs a systematic kind of anomaly detection and is rich enough to include both strong signals and weak signals in one output. Simple numerical methods cannot have this capability. In general, it requires a method that accommodates both analysis and search of the data set.
Unsupervised analysis over time: Of course, the temporal information about data sets plays a critical role in any kind of prediction. It is often incorporated using algebraic regression methods, but for the kind of problems we are discussing, it is important to understand time dependent behavior in unsupervised analyses. For example, populations may increase or decrease over time in their proportion of the total data, they may split into smaller populations, and they may also merge. This kind of information is important in making any kind of predictions and optimizations based on the unsupervised information.
When we can deliver against these challenges, the performance of the prediction improves materially.
For example, we are working with a major manufacturer on a problem of particular value to them. They make very complicated widgets and failure rates are paramount. Every .01 percent change has tens of millions of dollars in bottom line impact. Their prediction capabilities had tapped out and no longer yielded valuable information. By employing unsupervised approaches, the company determined a key piece of information, namely that there were many different kinds of failure and those failures were not distributed evenly across their factories. This allowed them to take immediate steps to improve their failure rates.
So what are some techniques to create this one/two punch of unsupervised learning and supervised prediction?
There are many notions of unsupervised analysis, including clustering, principal component analysis, multidimensional scaling, and the topological data analysis (TDA) graph models.
The TDA models have by far the richest functionality and are, unsurprisingly, what we use in our work. They include all the capabilities described above. TDA begins with a similarity measure on a data set X, and then constructs a graph for X which acts as a similarity map or similarity model for it. Each node in the graph corresponds to a sub-collection of X. Pairs of points which lie in the same node or in adjacent nodes are more similar to each other than pairs which lie in nodes far removed from each other in the graph structure. The graphical model can of course be visualized, but it has a great deal of other functionality.
Segmentation: From the graph structure, one is able to construct decompositions of X into segments, each of which is coherent in the sense that it includes only data points that are reasonably similar. This can be done manually or automatically, based on graph theoretic ideas.
Coloring by quantities or groups: Suppose that we have a data set, equipped with a particular quantity q we might be interested in, such as survival time in a biomedical data set or revenue in a financial data set. Then we can attach a value to each node by forming the average of the values of q for all the data points in X in the group attached to the node. By assigning a color scale to the values of q, we obtain a coloring of the graph model, which yields a great deal of insight into the behavior of q. Here’s an example. The graph below is a model of a data set of roughly 200 responses to a survey that asked questions about various topics, including trust in societal institutions, right/left political preference, and membership in various groups. It is colored by the answer (on a scale of one to ten) about the respondents’ right/left political preference, with low numbers (blue) corresponding to left preference and large numbers (red) to right preference.
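The coloring step can be sketched as follows, assuming the model's node-to-data-point assignment is given as a dictionary (the nodes, members, and scores below are made up for illustration):

```python
import numpy as np

def node_colors(node_members, q):
    """Average a quantity q over the data points assigned to each node;
    the resulting per-node value is what gets mapped to a color scale."""
    return {
        node: float(np.mean([q[i] for i in members]))
        for node, members in node_members.items()
    }

# Hypothetical node-to-members assignment from a graph model, and a
# 1-10 left/right preference score per respondent (note that nodes may
# share members, since they correspond to overlapping sub-collections).
node_members = {"A": [0, 1, 2], "B": [2, 3], "C": [4]}
q = [2, 3, 4, 9, 10]
colors = node_colors(node_members, q)
```

Mapping these averages onto a blue-to-red scale gives exactly the kind of picture described in the survey example: left-leaning regions in blue, right-leaning regions in red.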
Similarly, if one has a predefined group G within X, one can assign to each node v the percentage of points within the collection corresponding to v which also belong to G, and also color code that percentage. Below is the same graph we saw above, but now colored by percentage of respondents who belong to a labor union (red is concentrated union membership).
This gives an understanding of the distribution of the group G within X, which can be very informative in designing objective functions. One can observe, for example, that the heavy concentrations of labor union members do not overlap with the darkest red regions obtained by coloring by right/left preference.
Anomalies and weak signal: TDA models have the ability to surface phenomena that have small extent and relatively small effect. For instance, one might detect a very small group of people whose income is greater than others but whose profiles are similar to theirs. Their income might not be very high relative to all the members of the data set, but the fact that it is higher than that of their “neighbors” in the model might be quite interesting. Similarly, one might see a small number of credit card transactions that appear different from previously observed transactions but are sufficiently similar to each other to invite additional scrutiny – indeed, they may be a group which grows in the future.
Explaining phenomena: Given a group G within a data set X, it is important to understand what characterizes the group as distinct from the rest of X. This can be formulated in terms of statistical tests to produce ordered lists of features in the data set that best differentiate between the group and the rest of the population. This is another capability that is extremely useful in designing objective functions.
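One simple version of such a test can be sketched as follows, ranking features by the standardized difference of means between the group and the rest of the population. This particular statistic is one illustrative choice among many; the feature names and data are made up.

```python
import numpy as np

def explain_group(X, in_group, feature_names):
    """Rank features by how strongly they separate the group from the rest,
    using the absolute standardized difference of means as the score."""
    in_g, out_g = X[in_group], X[~in_group]
    diff = in_g.mean(axis=0) - out_g.mean(axis=0)
    pooled = np.sqrt(in_g.var(axis=0) + out_g.var(axis=0)) + 1e-12
    scores = np.abs(diff / pooled)
    order = np.argsort(scores)[::-1]  # most differentiating features first
    return [(feature_names[i], float(scores[i])) for i in order]

# Synthetic data where the group differs strongly on the first feature only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
in_group = np.zeros(200, dtype=bool)
in_group[:50] = True
X[in_group, 0] += 3.0
ranking = explain_group(X, in_group, ["age", "income", "score"])
```

The ordered list puts the shifted feature first, which is exactly the kind of output one would inspect when deciding which variables belong in an objective function.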
To summarize, it is fundamentally important to move from the idea of pure numerical or binary prediction to methods that permit a better understanding of one’s underlying data, in order to design better objective functions and capture small and weak phenomena that may eventually become very significant. TDA provides a methodology that is very well suited to these requirements.
Of all the industry groups we engage with, ACAMS is different. The mission focus, the educational framework, and the overall community make its events must-attend for practitioners and vendors alike. The Las Vegas flagship event dominates the fall calendar, and so we were back in Sin City again this year – but with an even bigger team of domain experts. Our continued success in financial crimes has attracted some of the best and the brightest in the industry, and we wanted to open by introducing that team:
Leading the group is our VP and Global Head of Financial Services, Doug Stevenson. A certified anti-money laundering specialist (CAMS) and certified financial crimes investigator (FCI), Doug joined Ayasdi from Pitney Bowes, where he served as the General Manager of their Global Financial Crimes and Compliance business unit.
Doug has assembled an amazing team of fellow financial crimes specialists.
Those include David Brooks, who formerly ran the N.A. Financial Crimes Practice for Capgemini. His rich experience includes leadership roles at NICE Actimize, Fortent, and Mantas.
The team also includes Chief Architect Sridhar Govindarajan, who, like Doug, hails from the successful Pitney Bowes team. Prior to Pitney Bowes, Sridhar worked in AML and architecture roles at Citi, Deutsche Bank, and Credit Suisse.
The data science lead for financial crimes is also a recent addition. Lei “Ray” Mi comes to Ayasdi from HSBC where he was the Vice President for AML Transaction Monitoring and Global Risk Analytics. In that role, Ray pioneered the use of machine learning and AI for HSBC’s AML function and was a key participant in the creation of the bank’s center of excellence in AI and Machine Learning.
When you have this much talent, you generate a lot of great thinking. That was on full display when we got together as a team to talk about this year’s event. Here are the takeaways on the ACAMS Vegas show.
1. Artificial Intelligence is Going Mainstream
Though many vendors are touting the use of Artificial Intelligence (AI) in their AML offerings, production deployment remains the litmus test of acceptance. Compared to previous years, we found a far greater appetite for AI and far more financial institutions (FIs) actively experimenting with it in their AML programs. Though the number of FIs actually moving from AI experimentation to production is still modest, we expect it to grow considerably in the coming quarters.
This growth is a function of two drivers.
First, FIs are becoming more sophisticated and are actively sharing information and know-how within the community – which is to be expected in a community that doesn't really “compete” with itself the way other lines of business do. This manifests itself as a more informed AML buyer. In the past, discussions would revolve mainly around high-level subjects, such as the definition of AI and the differences between supervised and unsupervised learning. This year, however, AML experts wanted to dive deeper and learn more about specific ways in which AI is solving real problems in AML programs. This only gives us more confidence that AI adoption in AML will begin to boom in the coming years.
Second, the vendor community is becoming more adept at building deployable solutions – applications vs. use cases. This allows these POCs to move from the innovation area to the business far faster than before.
Smarter buyer + better product = more deployments of intelligence in financial crimes.
2. Segmentation as a Lens into Behavior
Over the last two years, Ayasdi has led the dialogue in the AML community away from tackling money laundering with a traditional (and increasingly ineffective) rules-based approach and toward a more effective (and efficient) customer behavior-based approach.
This message is gaining widespread adoption amongst both FIs and vendors – with Verafin’s panel correctly stating, “setting a rule doesn’t indicate a behavior.” Foundational to any strong AML program is a customer’s behavior – not a static risk rating or rules-based segmentation.
Much of the work we are doing in this space revolves around collaborating with the FI to determine a) the right data, b) the right features, and c) the resulting customer segments that accurately reflect behavior.
As we do this work, it is clear to the FIs (and the regulators) that accurate segmentation is critical to a successful AML program. It should also be noted that while accurate segmentation is imperative, if you can't explain it, it borders on useless. This is one of the knocks on AI in AML and another area where Ayasdi has been a pioneer.
3. Change in Behavior is Key
One AML/BSA officer we spoke with at ACAMS looked at us and said, “Wow! That is the view I've always wanted to have!” What we had shown him was how Ayasdi's AML offering could take a customer behavioral segmentation and add the variable of time, to see how customers' behaviors changed over a given period.
This is very important and will define the conversation in the coming years.
As noted, an accurate customer segmentation provides a lens into customer behavior. How that behavior changes over time is even more valuable – and not just to the AML team. If a customer moves from one group to another – there is information encoded into that move. Did the customer become riskier to the FI? Did they change how they used certain products and with what frequency? These are powerful insights that have eluded the industry – primarily because of the complexity involved in the challenge.
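A minimal sketch of the segment-transition idea, with hypothetical customer IDs and segment labels (the real work, of course, is producing accurate segmentations per period in the first place):

```python
# Hypothetical sketch: compare a customer's segment assignment across two
# periods and surface those whose behavior has materially changed.
segments_q1 = {"cust_001": "retail_low_activity",
               "cust_002": "retail_low_activity",
               "cust_003": "small_business"}
segments_q2 = {"cust_001": "retail_low_activity",
               "cust_002": "high_velocity_cash",   # moved segments
               "cust_003": "small_business"}

# A move between segments encodes information: who changed, from what, to what.
movers = {c: (segments_q1[c], segments_q2[c])
          for c in segments_q1
          if segments_q2.get(c) != segments_q1[c]}
print(movers)  # {'cust_002': ('retail_low_activity', 'high_velocity_cash')}
```

In this toy view, `cust_002`'s move into a cash-heavy segment is exactly the kind of change that would trigger diligence, while unchanged customers would not.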
The implications are profound for the $80M-a-year KYC refresh component. Change in behavior can make this more effective and efficient – eliminating diligence on those customers whose behavior is unchanged while initiating diligence on those whose behavior has changed in a material fashion.
These are exciting times for the financial crimes community. The technology has never been better, the talent level never higher.
We see progress everywhere – with our clients, our partners like Navigant, the talent joining our team. We look forward to our ACAMS webinar in January where we can expand on the subject of change in behavior.
While there is much teeth-gnashing over the hype associated with AI, the fact of the matter is that it is here to stay.
While it still needs to grow into its lofty expectations (particularly in the enterprise), there is a growing body of evidence that the small wins are stacking up. One area where AI is particularly challenged is regulated industries, which have the same complexity as other industries but the added dimension of requiring human-explainable models. This is not to suggest the models are simple – they can be complex – but they need extreme transparency and explainability (something we call justification) for the regulator to sign off on their use.
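As a toy illustration of what “the machine did this because of that” can look like, consider a generic linear-attribution sketch (made-up feature names and weights, not Ayasdi's mechanism): for a linear score, each feature's contribution to a given decision is simply weight times value, which can be reported per decision.

```python
# Toy linear-attribution sketch (hypothetical features, NOT Ayasdi's method):
# decompose one customer's score into per-feature contributions so each
# decision can be justified feature by feature.
import numpy as np

weights = np.array([0.8, -0.2, 1.5])             # hypothetical trained weights
names = ["wire_volume", "tenure", "cash_ratio"]
x = np.array([2.0, 5.0, 3.0])                    # one customer's features

contrib = weights * x                            # per-feature contributions
order = np.argsort(-np.abs(contrib))             # largest influence first
for i in order:
    print(f"{names[i]}: {contrib[i]:+.2f}")      # cash_ratio dominates here
```

Real justification has to cover far richer models than a linear score, but the shape of the output – a ranked, per-decision account of which inputs drove the action – is the same.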
Ayasdi is a pioneer in justifiable AI, and our underlying technology supports it at an atomic level (the machine did this because of that… for any action). It is why the team at Basis Technology asked us to contribute to their handbook on integrating AI in highly regulated industries. The report covers AI from a macro perspective, the regulator's perspective, and the practitioner's perspective, and is eminently consumable at just 57 pages. So hop over, fill out the form, and thank us later.