Data Mining in MATLAB

I am a data scientist with more years of experience than I care to remember. I've worked in a variety of fields and used a wide array of tools, but MATLAB is my tool of choice. This blog is about exploring data mining using MATLAB (and sometimes MATLAB Toolboxes).

2y ago

Oddsmakers provide a useful prediction mechanism for many subjects of interest. Beyond sports, they host prediction markets for events in politics, entertainment, current events and other fields. There are some subtleties, however, in converting payout odds to probabilities. Note that payout odds (also called payoff odds or house odds) are expressed in this article as fractional odds. Casinos and online bookmakers use several other popular formats, such as decimal odds and American odds, but these simply convey the same information in a different way.
Payout ...
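As a rough illustration of the conversion, here is a minimal MATLAB sketch. It assumes the usual reading of fractional odds a/b as an implied probability of b/(a+b), and it uses simple proportional normalization to remove the bookmaker's margin. The odds values are hypothetical, and the normalization is only one of several possible approaches, not necessarily the one the full article describes.

```matlab
% Hypothetical fractional payout odds for a three-outcome event,
% one row per outcome as [numerator denominator]: 2/1, 3/1 and 1/1.
odds = [2 1; 3 1; 1 1];

% Implied probability of an outcome quoted at a/b is b / (a + b).
implied = odds(:,2) ./ (odds(:,1) + odds(:,2));

% The implied probabilities of a bookmaker's full slate usually total
% more than 1; the excess is the bookmaker's margin.
overround = sum(implied);

% Assumption: rescale proportionally so the probabilities sum to 1.
normalized = implied / overround;

disp([implied normalized]);
```

With these made-up odds the implied probabilities are 1/3, 1/4 and 1/2, which total 13/12, so rescaling is needed before they can be read as a proper probability distribution.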

4y ago

The relentless improvement in the speed of computers continues. While some technical barriers to this progress have begun to emerge, exploitation of parallelism has actually increased the rate of acceleration for many purposes, especially in applied mathematical fields such as data mining. Interestingly, new, powerful hardware has been put to the task of running ever more baroque algorithms. Feedforward neural networks, which once took several days to train, now train in minutes on affordable desktop hardware. Over time, ever fancier algorithms have been fed to these machines: boosting, support vector m ...

4y ago

Harrisburg University of Science and Technology (Harrisburg, Pennsylvania) has just finished hosting Data Analytics Summit III. This is a multi-day event featuring a mix of presenters from the private sector, government and government-related businesses, and academia, spanning research, practice and more visionary ("big picture") topics. The theme was "Analytics Applied: Case Studies, Measuring Impact, and Communicating Results". Regrettably, I was unable to attend this time because I was traveling for business, but I was at Data Analytics Summit II, which was held in December of 2015. If y ...

4y ago

Recently, I wanted to calculate the distance between locations on the Earth. Having found a handy solution, I thought readers might be interested. In my situation, location data included ZIP codes (American postal codes). Also available to me was a look-up table of the latitude and longitude of the geometric centroid of each ZIP code. Since the areas identified by ZIP codes are usually geographically small, and making the "close enough" assumption that this planet is perfectly spherical, trigonometry will allow distance calculations which are, for most purposes, precise enough. Given the latitude and ...
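One standard way to carry out a spherical-Earth distance calculation is the haversine formula, sketched below in MATLAB. This is an assumption for illustration: the full post may use a different trigonometric form. The name `haversine_km` and the mean Earth radius of 6371 km are choices made here, not taken from the post.

```matlab
R = 6371;            % assumed mean Earth radius, in kilometers
d2r = pi / 180;      % degrees-to-radians conversion factor

% Great-circle distance (km) between two points given in decimal degrees.
% haversine_km is a hypothetical helper name used only in this sketch.
haversine_km = @(lat1, lon1, lat2, lon2) 2 * R * asin(sqrt( ...
    sin((lat2 - lat1) * d2r / 2).^2 + ...
    cos(lat1 * d2r) .* cos(lat2 * d2r) .* sin((lon2 - lon1) * d2r / 2).^2));

% Sanity check: two points on the equator, 90 degrees of longitude apart,
% are a quarter of the equator apart, i.e. R*pi/2.
quarter = haversine_km(0, 0, 0, 90);
fprintf('Quarter equator: %.1f km\n', quarter);
```

Because the function uses element-wise operators, it also accepts vectors of centroid latitudes and longitudes, which suits a look-up table of ZIP code centroids.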

4y ago

Below are listed four books on statistics which I feel are worth owning. They largely take a "traditional" statistics perspective, as opposed to a machine learning/data mining one. With the exception of "The Statistical Sleuth", these are less like textbooks than guide-books, with information reflecting the experience and practical advice of their respective authors. Comparatively few of their pages are devoted to predictive modeling; rather, they cover a host of topics relevant to the statistical analyst: sample size determination, hypothesis testing, assumptions, sampling technique, etc ...

4y ago

Categorical variables as candidate predictors pose a distinct challenge to the analyst, especially when they exhibit high cardinality (a large number of distinct values). Numerical models (for instance, linear regression and most neural networks) cannot accept these variables directly as inputs, since operations between categories and numbers are not defined. It is sometimes advantageous (even necessary) to re-code such variables as one or more numeric dummy variables, with each new variable containing a 0 or 1 value indicating the presence (1) or absence (0) of one distinct value. This often ...
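The dummy-variable re-coding described above can be sketched in base MATLAB as follows. The `colors` variable and its values are hypothetical, and this is just one simple way to build the 0/1 columns, not necessarily the approach taken in the full post.

```matlab
% Hypothetical categorical predictor with three distinct values.
colors = {'red'; 'green'; 'blue'; 'green'; 'red'};

% levels holds the sorted distinct values; idx maps each row to its level.
[levels, ~, idx] = unique(colors);

% One column per distinct value: 1 where the row has that value, else 0.
dummies = double(bsxfun(@eq, idx(:), 1:numel(levels)));

disp(levels');
disp(dummies);
```

Each row contains exactly one 1. When the dummies feed a linear model that includes an intercept, one column is typically dropped to avoid perfect collinearity.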

4y ago

In "Teaching Data Science in English (Not in Math)", the Feb-08-2016 entry of his Web log, "The Datatist", Charles Givre criticizes the use of specialized math symbols (capital sigma for summation, etc.) and Greek letters as being confusing, especially to newcomers to the field. He offers, as an example, the following traditional definition of the sum of squared errors:
He suggests that "English" (pseudo-code) be used instead, such as the following:
    residuals_squared = (actual_values - predictions) ^ 2
    RSS = sum( residuals_squared )
Although there are some flaws in this particular compari ...
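For comparison, the conventional notation for this quantity is usually written as $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. This is the standard textbook form and is assumed here, not copied from the post. A minimal MATLAB rendering of the pseudo-code, with hypothetical data, might look like this:

```matlab
% Hypothetical observed values and model predictions.
actual_values = [3.0; 5.0; 7.5];
predictions   = [2.5; 5.5; 7.0];

% Element-wise squaring needs .^ in MATLAB; a bare ^ is the matrix power
% and would raise an error for these column vectors.
residuals_squared = (actual_values - predictions) .^ 2;

% Residual sum of squares.
RSS = sum(residuals_squared);
fprintf('RSS = %.2f\n', RSS);
```

Here every residual has magnitude 0.5, so RSS is 3 × 0.25 = 0.75.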

4y ago

Introduction
Though Halloween is months away, I found the following interesting and thought readers might enjoy examining my solution. Recently, I was given the following probability question to answer:
Halloween Probability Puzzler
The number of trick-or-treaters knocking on my door in any five-minute interval between 6 and 8pm on Halloween night is distributed as a Poisson with a mean of 5 (ignoring time effects). The number of pieces of candy taken by each child, in addition to the expected one piece per child, is distributed as a Poisson with a mean of 1. What is the minimum number of ...
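The model stated in the puzzle can be summarized with a few lines of arithmetic. The MATLAB sketch below computes only the mean and variance of the total candy taken over the two hours. It assumes that intervals and children are independent, which the puzzle does not state, and it does not attempt to answer the truncated question itself.

```matlab
intervals  = (8 - 6) * 60 / 5;   % number of five-minute intervals: 24
mean_kids  = 5;                  % Poisson mean of children per interval
mean_extra = 1;                  % Poisson mean of extra pieces per child

% Assumption: intervals are independent, so the total number of children
% is Poisson with mean equal to the sum of the interval means.
lambda_total = intervals * mean_kids;            % 120 children expected

% Each child takes 1 + Poisson(1) pieces.
mean_per_kid = 1 + mean_extra;                   % 2 pieces
var_per_kid  = mean_extra;                       % Poisson variance = mean

% Compound Poisson sum: mean = lambda*E[X], variance = lambda*E[X^2].
expected_candy = lambda_total * mean_per_kid;
var_candy      = lambda_total * (var_per_kid + mean_per_kid^2);
sd_candy       = sqrt(var_candy);

fprintf('Expected candy: %d, std. dev.: %.1f\n', expected_candy, sd_candy);
```

Under these assumptions the expected demand is 240 pieces, with a variance of 600.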