The Analysis Factor

The founder and president of The Analysis Factor, a statistics consulting company, tackles issues in applied statistics and data analysis on her blog. Complex statistical concepts and analyses are explained with clarity, though some background in stats is needed to understand most posts.

1M ago

How do you know when to use a time series and when to use a linear mixed model for longitudinal data? What’s the difference between repeated measures data and longitudinal? In this training, you’ll learn over a dozen different ways of analyzing longitudinal, repeated measures, and time series data, starting with the simplest Stage 1 […]
The post "Member Training: Types of Longitudinal, Repeated Measures, and Time Series" appeared first on The Analysis Factor.

3M ago

If you’ve tried coding in Stata, you may have found it strange. The syntax rules are straightforward, but different from what I’d expect.
I had experience coding in Java and R before I ever used Stata. Because of this, I expected commands to be followed by parentheses, and for this to make it easy to read the code’s structure.
Stata does not work this way.
An Example of How Stata Code Works
To see the way Stata handles a linear regression, go to the command line and type
h reg or help regress
You will see a help page pop up, with this Syntax line near the top.
(If you need a refresher on getti […]

5M ago

Structural Equation Modeling (SEM) is a popular method to test hypothetical relationships between constructs in the social sciences. These constructs may be unobserved (a.k.a., “latent”) or observed (a.k.a., “manifest”).
In this training, you will learn the different types of SEM: confirmatory factor analysis, path analysis for manifest and latent variables, and latent growth modeling (i.e., the application of SEM to longitudinal data).
We’ll discuss the different terminology, the commonly used symbols, and the different ways a model can be specified, as well as how to present results and eva […]

5M ago

Many designs can be considered repeated measures designs, and they all share one key feature: you measure the outcome variable for each subject on several occasions, treatments, or locations.
Understanding this design is important for avoiding analysis mistakes. For example, you can’t treat multiple observations on the same subject as independent observations.
Example
Suppose that you recruit 10 subjects for an athletic training experiment. Each subject runs a mile on three separate occasions, and your outcome of interest is run time. So you have 30 measures of the outcome varia […]
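As a rough illustration of the design in this example, the sketch below lays out the 30 run times in long format, one row per subject per occasion. The run times are invented placeholders, and the field names `subject`, `occasion`, and `run_time` are assumptions made for the illustration, not names from the original post.

```python
import random
from collections import defaultdict

random.seed(1)  # reproducible placeholder data

N_SUBJECTS = 10
N_OCCASIONS = 3

# Long format: one row per subject per occasion.
# Run times (in minutes) are invented for illustration only.
rows = []
for subject in range(1, N_SUBJECTS + 1):
    baseline = random.uniform(7.0, 11.0)  # each subject has their own typical pace
    for occasion in range(1, N_OCCASIONS + 1):
        run_time = baseline + random.gauss(0, 0.3)
        rows.append({"subject": subject, "occasion": occasion, "run_time": run_time})

# Group the rows by subject to make the nesting explicit.
by_subject = defaultdict(list)
for row in rows:
    by_subject[row["subject"]].append(row["run_time"])

print(len(rows), "observations from", len(by_subject), "subjects")
# prints "30 observations from 10 subjects"
```

The grouping step is the point: the 30 rows come from only 10 subjects, so the three times recorded for one runner share that runner's baseline and cannot be treated as independent observations.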

6M ago

In part 3 of this series, we explored the Stata graphics menu. In this post, let’s look at the Stata Statistics menu.
Statistics Menu
Let’s use the Statistics menu to see if price varies by car origin (foreign).
We are testing whether a continuous variable has a different mean for the two categories of a categorical variable. So we should do a 2-sample t-test.
Say we want to use a 90% confidence level, and we have reason to suspect the two groups have unequal variance.
Click Statistics -> Summaries, tables, and tests -> Classical tests of hypothesis -> t test (mean-comparison test […]
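For readers who want to see what a two-sample t-test with unequal variances computes, here is a minimal Python sketch of Welch's t statistic and its approximate degrees of freedom. It is not what Stata runs internally. The helper name `welch_t` and both price lists are hypothetical, and the numbers are not taken from the Stata dataset. The sketch stops short of the 90% confidence interval because Python's standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean; variances are NOT pooled.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical prices for two groups (invented values, for illustration only).
group_domestic = [4100, 4750, 3800, 5200, 6100, 4500, 3900, 5800]
group_foreign = [5900, 6300, 4300, 7100, 8100, 5700]

t_stat, df = welch_t(group_domestic, group_foreign)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Because the variances are not pooled, the degrees of freedom are generally not a whole number and fall between the smaller group's n - 1 and the pooled n_a + n_b - 2.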

7M ago

Regression is one of the most common analyses in statistics. Most of us learn it in grad school, and we learn it in one specific software package: maybe SPSS, maybe another. The thing is, depending on your training and when you did it, there is SO MUCH to know about doing a regression analysis in SPSS.
There are the general procedures everyone needs to know, the options that are important for testing assumptions and plotting results for reporting, and more.
And, perhaps most surprisingly, there are two totally different procedures for running linear regression in SPSS: Regression and […]

8M ago

I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the errors, εi, or the response variable, Yi.
The asker had a situation where Y, the response, was not normally distributed, but the residuals were.
Quick Answer: It’s just the errors.
In fact, if you look at any (good) statistics textbook on linear models, you’ll see the assumptions stated just below the model:
εi ~ i.i.d. N(0, σ²)
That εi is the random error term.
The i.i.d. means every error is independent and identically distri […]
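A small simulation can make the asker's situation concrete: a response Y that is clearly not normal while the residuals look fine. The sketch below is an illustration under stated assumptions, not the original post's example. It uses one binary predictor with a large effect (10 units) and standard normal errors, and it uses excess kurtosis as a crude check of shape. All names and numbers are invented.

```python
import random
from statistics import mean

random.seed(42)  # reproducible simulated data


def excess_kurtosis(values):
    """Population excess kurtosis; roughly 0 for normally distributed data."""
    m = mean(values)
    squared_dev = [(v - m) ** 2 for v in values]
    var = mean(squared_dev)
    fourth_moment = mean([d ** 2 for d in squared_dev])
    return fourth_moment / var ** 2 - 3.0


# Assumed model: y = 10 * x + error, with x in {0, 1} and error ~ N(0, 1).
n_per_group = 1000
x = [0] * n_per_group + [1] * n_per_group
errors = [random.gauss(0, 1) for _ in x]
y = [10.0 * xi + ei for xi, ei in zip(x, errors)]

# With a single binary predictor, the least-squares fitted values are the group means.
group_mean = {g: mean(yi for xi, yi in zip(x, y) if xi == g) for g in (0, 1)}
residuals = [yi - group_mean[xi] for xi, yi in zip(x, y)]

print(f"excess kurtosis of Y:         {excess_kurtosis(y):.2f}")
print(f"excess kurtosis of residuals: {excess_kurtosis(residuals):.2f}")
```

Here Y is a mix of two well-separated clusters, so its distribution is bimodal and its excess kurtosis is strongly negative, while the residuals stay close to zero excess kurtosis. Near-zero kurtosis is only consistent with normality and does not prove it, but the contrast shows why the assumption is checked on the errors and not on Y.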

8M ago

The objective of quasi-experimental designs is to establish cause-and-effect relationships between the dependent and independent variables. However, they face one big challenge in achieving this objective: the lack of an established control group.
There are ways, though, to create a post-hoc control group. One way is to match non-treated subjects with treated subjects.
The most common matching method is Propensity Score Matching. Gaining popularity as a matching method is Coarsened Exact Matching. How are these matching methods different?
To understand the differences, this Stats Amore Training e […]
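To give a feel for the basic idea behind Coarsened Exact Matching, here is a minimal sketch: coarsen each covariate into bins, group units by their combination of bins, and keep only the groups that contain both treated and control units. Propensity Score Matching, by contrast, matches on an estimated probability of treatment and is not shown here. The data, the cut points, and the names `units` and `coarsen` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical units: (id, treated flag, age, income). All values are invented.
units = [
    ("a", 1, 23, 31000), ("b", 0, 27, 34000), ("c", 0, 45, 52000),
    ("d", 1, 48, 58000), ("e", 0, 61, 40000), ("f", 1, 35, 90000),
]


def coarsen(age, income):
    """Map exact covariate values to coarse bins (the cut points are assumptions)."""
    age_bin = "under30" if age < 30 else "30to49" if age < 50 else "50plus"
    income_bin = "low" if income < 40000 else "mid" if income < 70000 else "high"
    return age_bin, income_bin


# Build strata: units with identical coarsened covariates land in the same stratum.
strata = defaultdict(lambda: {"treated": [], "control": []})
for unit_id, treated, age, income in units:
    key = coarsen(age, income)
    strata[key]["treated" if treated else "control"].append(unit_id)

# Keep only strata that contain at least one treated and one control unit.
matched = {key: group for key, group in strata.items()
           if group["treated"] and group["control"]}

for key, group in matched.items():
    print(key, group)
```

In this toy data, units "e" and "f" are dropped because no unit with the opposite treatment status shares their stratum. Choosing wider or narrower bins trades the closeness of matches against the number of units retained.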

9M ago

From our first 2 posts, you should be comfortable navigating the windows and menus of Stata. We can now get into the real meat of programming in Stata: do-files.
Why Do-Files?
A do-file is a Stata file that provides a list of commands to run. You can run an entire do-file at once, or you can highlight and run particular lines from the file.
If you set up your do-file correctly, you can just click “run” after opening it. The do-file will set you to the correct directory, open your dataset, do all analyses, and save any graphs or results you want saved.
I’ll start off by saying this: Any analysi […]

9M ago

Do you ever wish your data analysis project were a little more organized?
Statistical analysis projects vary in complexity, ranging from a single t-test to multi-analyst, multi-year projects with large and diverse datasets, time-consuming models, frequent data/code updates, and complex reporting. Having organized systems is always a good idea, and for projects on the complex end, preparing process flow, file structure, version control, and intermediate computations can help to reduce chaos and increase the likelihood of successful outcomes.
In this training, you’ll learn common ways to ma […]