Data Science Association blog
The Data Science Association is a non-profit professional group that offers education, professional certification, a "Data Science Code of Professional Conduct" and conferences / meetups to discuss data science (e.g. predictive / prescriptive analytics, algorithm design and execution, applied machine learning, statistical modeling, and data visualization).
Data Science Association blog
3y ago
In its original, literal form, Moore's Law said only that the number of transistors would double every two years. Physics limits that scaling, so continued progress requires thinking outside the box. Below is a list of technologies that can help:
Near-term
3D chips
NVRAM as main computer memory
Medium-term
Optical computing
Quantum computing
Neuromorphic/analog computing
Alternative RAM
Memristor
Phase-change memory (PCM)
Spin-transfer torque (STT)
Magneto-resistive
Non-silicon substrates
Gallium arsenide
Silicon germanium
Carbon nanotubes
Graphene
Homogenization/deep integration of compu...
Data Science Association blog
3y ago
The Washington Post reported that scientists discovered skipping breakfast leads to weight loss after all, not to weight gain as previously believed. What led scientists astray previously was relying on observational studies, a.k.a. Quasi-Experimental Design. Only a randomized trial, the "gold standard", can establish causality.
Designing and conducting a randomized trial is extremely expensive, and many data scientists have not had the privilege of being involved in one. But by relying on already-collected data, which incidentally often falls in the category of Big Data, data scientis...
Data Science Association blog
3y ago
Brain                            | Artificial Neural Network
Asynchronous                     | Global synchronous clock
Stochastic                       | Deterministic
Shaped waves                     | Scalar values
Storage and compute synonymous   | Storage and compute separate
Training is a Mystery            | Backpropagation
Adaptive network topology        | Fixed network
Cycles in topology               | Cycle-free topology
The diagram of biological brain waves comes from med.utah.edu and the diagram of an artificial neural network neuron comes from hemming.se
The table above lists the differences between a regular artificial neural network (feed-forward non-spiking, to be specific) and a biologi...
Data Science Association blog
3y ago
Artificial General Intelligence (AGI), also known as Strong AI, distinguishes itself from the more general term AI by specifically having as its goal human-level or greater intelligence. There have been many attempts to achieve it. Pei Wang's page "Artificial General Intelligence -- A Gentle Introduction" has the best curated set of links. Various people have categorized the numerous approaches in various ways, but here is my categorization:
Neural
The neural approaches, sometimes called "connectionist", try to imitate the human brain in some way.
1. Neuroscience
The neural approaches inspired b...
Data Science Association blog
3y ago
Remember the "business rules" craze of the early 2000s? They were popular especially with mortgage lenders. An example ILOG JRules decision table for mortgage lending.
Although sometimes imbued with some kind of magic reasoning (e.g. backtracking), business rules engines were really just simplistic decision trees under the covers. The main advantage of a business rules engine at the time was that it exposed business logic in a form that business analysts and decision makers could digest and specify, as opposed to locking it up in Java or C++.
But now we are in the era of machine learnin...
Data Science Association blog
3y ago
No doubt you've encountered the image below from Gizmodo in some PowerPoint somewhere this year.
But that same PowerPoint likely didn't bother to answer the next logical question:
How to get to causality?
It's not an easy question to answer. Having really, really good correlation is definitely not the answer. First a couple of counterexamples.
Common ancestral cause
Putting aside spurious correlations such as the one above, the much more common scenario is that of a common cause, such as shown below. Finding the correlation of "street wet" and "hair wet" in some data set does not lead to the...
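The common-cause scenario can be illustrated with a small simulation. This is only a sketch under made-up assumptions: rain is treated as the common cause, and the probabilities (0.3, 0.9, 0.8, 0.1), the sample size, and the helper names `pearson` and `simulate` are hypothetical choices for illustration, not anything taken from a real data set. "Street wet" and "hair wet" never influence each other in the code, yet they come out correlated; once the data is split by whether it rained, the correlation largely disappears.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, seed=0):
    """Generate (rain, street_wet, hair_wet) rows; rain is the only cause."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rain = rng.random() < 0.3  # assumed chance of rain
        # Each effect depends on rain alone, never on the other effect.
        street_wet = rng.random() < (0.9 if rain else 0.1)
        hair_wet = rng.random() < (0.8 if rain else 0.1)
        rows.append((int(rain), int(street_wet), int(hair_wet)))
    return rows


rows = simulate(20000)

# Correlation between the two effects over the whole data set.
overall = pearson([s for _, s, _ in rows], [h for _, _, h in rows])

# Correlation between the two effects within each value of the common cause.
within = {}
for rain_value in (0, 1):
    stratum = [(s, h) for rain, s, h in rows if rain == rain_value]
    within[rain_value] = pearson([s for s, _ in stratum],
                                 [h for _, h in stratum])

print("overall correlation:", round(overall, 3))
print("correlation when dry:", round(within[0], 3))
print("correlation when raining:", round(within[1], 3))
```

Under these assumed probabilities the overall correlation is clearly positive, while the two within-stratum correlations hover near zero. That is the signature of a common cause rather than of one variable driving the other.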
Data Science Association blog
3y ago
Somewhat of a sequel to my earlier post on causality, where do hypotheses come from?
The ideal hypothesis:
Has basis in a reasonable engineering, physical, or economic, etc. model.
Is as simple as can be in terms of number of variables. I.e. Occam's Razor has been applied.
Either has been vetted against a number of other hypotheses and selected as the most reasonable, or will be tested along with other reasonable hypotheses.
Will be tested in the gold standard, the randomized controlled experiment.
Is actionable.
Real life is not ideal, so below I discuss compromises and trade-offs inv...
Data Science Association blog
3y ago
We see it all the time when reading scientific papers, "controlling for confounding variables," but how do they do it? The term "quasi-experimental design" is unknown even to many who today call themselves "data scientists." College curricula exacerbate the matter by dictating that probability be learned before statistics, yet this simple concept from statistics requires no probability background, and would help many to understand and produce scientific and data science results.
As discussed previously, a controlled randomized experiment from scratch is the "gold standard". The reason is beca...
Data Science Association blog
3y ago
When constructing a recommender system and selecting algorithms, there is more to consider than just "accuracy". The most "accurate" recommender system would recommend the same items (whether those "items" are books, websites, options available to a software end user, etc.) over and over again, focused on a narrow topic area, and ignorant of context. Below are features of various recommender systems that, if combined, would perhaps form the ideal recommender system to produce "useful" rather than "accurate" results. However, in reality, some of these features are at odds with one another, suc...
Data Science Association blog
3y ago
A Business Insider infographic, "20 cognitive biases that screw up your decisions", went viral this past week. Each and every one of those 20 biases can negatively affect data scientists in their work:
Anchoring bias. A data scientist might find an interesting result in early exploration, and ignore other possible results or worse, ignore conflicting information.
Availability heuristic. As I blogged in "If All Your Data is Big Data, You May Not Be a Complete Data Scientist", data scientists all too often rely only on pre-collected data and do not conduct randomized controlled experiments of their...