Linear Regression Essentials in R
STHDA · 3y ago
Linear regression (or linear model) is used to predict a quantitative outcome variable (y) on the basis of one or multiple predictor variables (x) (James et al. 2014; Bruce and Bruce 2017). The goal is to build a mathematical formula that defines y as a function of the x variables. Once we have built a statistically significant model, it can be used to predict future outcomes on the basis of new x values. When you build a regression model, you need to assess the performance of the predictive model. In other words, you need to evaluate how well the model predicts the outcome …
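The workflow the teaser describes — fit, inspect significance, then predict on new x values — can be sketched in base R. The built-in mtcars data set stands in for the post's own example data (an assumption, not the post's code):

```r
# Fit a simple linear model: mpg as a function of car weight (built-in mtcars)
model <- lm(mpg ~ wt, data = mtcars)
summary(model)   # coefficients, p-values and R-squared for assessing the model

# Once the model is judged adequate, predict the outcome for new x values
new_cars <- data.frame(wt = c(2.5, 3.0))
predict(model, newdata = new_cars)
```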
Interaction Effect in Multiple Regression: Essentials
STHDA · 3y ago
This chapter describes how to compute multiple linear regression with interaction effects. Previously, we have described how to build a multiple linear regression model (Chapter @ref(linear-regression)) for predicting a continuous outcome variable (y) based on multiple predictor variables (x). For example, to predict sales based on advertising budgets spent on youtube and facebook, the model equation is sales = b0 + b1*youtube + b2*facebook, where b0 is the intercept, and b1 and b2 are the regression coefficients associated with the predictor variables youtube and facebook, respectively. …
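A minimal sketch of the interaction model described above, using mtcars in place of the post's youtube/facebook advertising data (our substitution):

```r
# Additive model: the effects of weight and horsepower on mpg are independent
fit_main <- lm(mpg ~ wt + hp, data = mtcars)

# Interaction model: wt * hp expands to wt + hp + wt:hp, so the effect of
# one predictor is allowed to depend on the level of the other
fit_int <- lm(mpg ~ wt * hp, data = mtcars)

summary(fit_int)$coefficients   # the wt:hp row is the interaction term
anova(fit_main, fit_int)        # F-test: does the interaction improve the fit?
```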
Regression with Categorical Variables: Dummy Coding Essentials in R
STHDA · 3y ago
This chapter describes how to compute regression with categorical variables. Categorical variables (also known as factor or qualitative variables) classify observations into groups. They have a limited number of different values, called levels. For example, the gender of an individual is a categorical variable that can take two levels: Male or Female. Regression analysis requires numerical variables, so when a researcher wishes to include a categorical variable in a regression model, supplementary steps are required to make the results interpretable. …
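The dummy-coding step the teaser alludes to is something R performs automatically for factor variables; a short sketch with the built-in iris data (our example, not the post's):

```r
# R dummy-codes factor predictors automatically; iris$Species has 3 levels
fit <- lm(Sepal.Length ~ Species, data = iris)
summary(fit)   # setosa is the reference level; the two Species coefficients
               # are differences in mean Sepal.Length relative to setosa

# Inspect the 0/1 dummy coding R builds behind the scenes
contrasts(iris$Species)
head(model.matrix(~ Species, data = iris))
```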
Nonlinear Regression Essentials in R: Polynomial and Spline Regression Models
STHDA · 3y ago
In some cases, the true relationship between the outcome and a predictor variable might not be linear. There are different solutions extending the linear regression model (Chapter @ref(linear-regression)) for capturing these nonlinear effects, including: polynomial regression, the simplest approach, which adds polynomial terms (squares, cubes, etc.) to a regression; spline regression, which fits a smooth curve with a series of polynomial segments, where the values delimiting the segments are called knots; and generalized additive models (GAMs) …
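Both approaches named in the teaser can be sketched with base R and the bundled splines package; mtcars is our stand-in data set:

```r
library(splines)  # ships with base R, no extra installation needed

# Polynomial regression: quadratic fit of mpg on horsepower
fit_poly <- lm(mpg ~ poly(hp, 2), data = mtcars)

# Spline regression: cubic B-spline with knots at the quartiles of hp
knots <- quantile(mtcars$hp, probs = c(0.25, 0.5, 0.75))
fit_spline <- lm(mpg ~ bs(hp, knots = knots), data = mtcars)

summary(fit_poly)$r.squared
summary(fit_spline)$r.squared
```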
Linear Regression Assumptions and Diagnostics in R: Essentials
STHDA · 3y ago
Linear regression (Chapter @ref(linear-regression)) makes several assumptions about the data at hand. This chapter describes regression assumptions and the built-in plots for regression diagnostics in R. After performing a regression analysis, you should always check whether the model works well for the data at hand. A first step in regression diagnostics is to inspect the significance of the regression beta coefficients, as well as the R2, which tells us how well the linear regression model fits the data. …
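The built-in diagnostic plots the teaser mentions come directly from calling plot() on a fitted model; a sketch with our own mtcars example:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)

# Numeric companions to the plots
summary(fit)$r.squared      # how well the model fits the data
head(cooks.distance(fit))   # influence of individual observations
```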
Multicollinearity Essentials and VIF in R
STHDA · 3y ago
In multiple regression (Chapter @ref(linear-regression)), two or more predictor variables might be correlated with each other. This situation is referred to as collinearity. There is an extreme situation, called multicollinearity, where collinearity exists between three or more variables even if no pair of variables has a particularly high correlation. This means that there is redundancy between predictor variables. In the presence of multicollinearity, the solution of the regression model becomes unstable. For a given predictor (p), multicollinearity can be assessed by computing a score called the variance inflation factor (VIF) …
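The VIF score is usually obtained with car::vif(); as a dependency-free sketch it can also be computed from its definition, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors (the helper function name below is ours):

```r
# Manual VIF: one auxiliary regression per predictor
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

# disp is strongly correlated with both wt and hp, so its VIF is inflated
vif_manual(mtcars[, c("wt", "hp", "disp")])
```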
Confounding Variable Essentials
STHDA · 3y ago
A confounding variable is an important variable that should be included in the predictive model but has been omitted. Naive interpretation of such models can lead to invalid conclusions. For example, consider that we want to model life expectancy in different countries based on GDP per capita, using the gapminder data set: library(gapminder); lm(lifeExp ~ gdpPercap, data = gapminder). In this example, it is clear that the continent is an important variable: countries in Europe are estimated to have a higher life expectancy compared to countries in Africa. Therefore, continent is a confounding variable …
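The gapminder example requires that package; the same omitted-variable effect can be sketched with built-in data, where the number of cylinders confounds the displacement–mpg relationship (our substitution, not the post's example):

```r
naive    <- lm(mpg ~ disp, data = mtcars)                # confounder omitted
adjusted <- lm(mpg ~ disp + factor(cyl), data = mtcars)  # confounder included

coef(naive)["disp"]      # apparent effect of displacement alone
coef(adjusted)["disp"]   # effect after adjusting for cylinder count
```

Comparing the two coefficients shows how much of the naive effect was carried by the omitted variable.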
Regression Model Accuracy Metrics: R-square, AIC, BIC, Cp and more
STHDA · 3y ago
In this chapter we’ll describe different statistical metrics for measuring the performance of a regression model (Chapter @ref(linear-regression)). Next, we’ll provide practical examples in R for comparing the performance of two models in order to select the best one for our data. Contents: model performance metrics; loading required R packages; example data; building regression models; assessing model quality; comparing regression model performance; discussion. The book: Machine Learning Essentials: Practical Guide in R. In regression models, the most …
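Comparing two models with the metrics named in the title needs only base R; a sketch on our stand-in mtcars data:

```r
# Two candidate models for the same outcome
m1 <- lm(mpg ~ wt,      data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

data.frame(
  model  = c("mpg ~ wt", "mpg ~ wt + hp"),
  adj_r2 = c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared),
  AIC    = c(AIC(m1), AIC(m2)),
  BIC    = c(BIC(m1), BIC(m2))
)
# Higher adjusted R-squared and lower AIC/BIC both favour the second model here
```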
Cross-Validation Essentials in R
STHDA · 3y ago
Cross-validation refers to a set of methods for measuring the performance of a given predictive model on new test data. The basic idea behind cross-validation techniques is to divide the data into two sets: the training set, used to train (i.e. build) the model; and the testing set (or validation set), used to test (i.e. validate) the model by estimating the prediction error. Cross-validation is also known as a resampling method because it involves fitting the same statistical method multiple times using different subsets of the data. …
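The train/test split idea generalises to k-fold cross-validation, which can be written by hand in base R (the post itself likely uses a helper package; this sketch avoids that dependency):

```r
set.seed(123)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

rmse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]   # build the model on k-1 folds
  test  <- mtcars[folds == i, ]   # validate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

mean(rmse)   # cross-validated estimate of the prediction error
```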
Bootstrap Resampling Essentials in R
STHDA · 3y ago
Similarly to cross-validation techniques (Chapter @ref(cross-validation)), the bootstrap resampling method can be used to measure the accuracy of a predictive model. Additionally, it can be used to measure the uncertainty associated with any statistical estimator. Bootstrap resampling consists of repeatedly selecting a sample of n observations from the original data set and evaluating the model on each copy. An average standard error is then calculated, and the results provide an indication of the overall variance of the model’s performance. …
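The repeated resample-and-refit loop the teaser describes can be sketched in base R, here bootstrapping the slope of a simple regression (our mtcars example):

```r
set.seed(123)

# Resample rows with replacement, refit, and collect the slope each time
boot_slopes <- replicate(1000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))["wt"]
})

sd(boot_slopes)                          # bootstrap standard error of the slope
quantile(boot_slopes, c(0.025, 0.975))   # 95% percentile confidence interval
```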