Correlation is a measure of association between two quantitative variables.
Its purpose is to measure the strength and direction of the linear relationship between the two variables.
\[ r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar x) (y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2 \, \sum_{i=1}^n (y_i - \bar y)^2}} \]
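To make the definition concrete, the sketch below computes Pearson's r by hand for `wt` and `mpg` in `mtcars`: the sum of the cross-products of deviations, divided by the square root of the product of the two sums of squared deviations. The variable names (`dx`, `dy`, `r_manual`) are illustrative only; the result should agree with R's built-in `cor()`.

```{r}
# Pearson's r computed step by step (illustrative variable names)
x <- mtcars$wt
y <- mtcars$mpg

dx <- x - mean(x)   # deviations of x from its mean
dy <- y - mean(y)   # deviations of y from its mean

# Numerator: sum of cross-products; denominator: sqrt of the product
# of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Should agree with the built-in function up to floating-point error
all.equal(r_manual, cor(x, y))
r_manual
```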
Correlation coefficient | Psychology | Politics and economics | Medicine |
---|---|---|---|
± 1.0 | Perfect | Perfect | Perfect |
± 0.9 | Strong | Very strong | Very strong |
± 0.8 | Strong | Very strong | Very strong |
± 0.7 | Strong | Very strong | Moderate |
± 0.6 | Moderate | Strong | Moderate |
± 0.5 | Moderate | Strong | Fair |
± 0.4 | Moderate | Strong | Fair |
± 0.3 | Weak | Moderate | Fair |
± 0.2 | Weak | Weak | Poor |
± 0.1 | Weak | Negligible | Poor |
± 0.0 | Zero | None | None |
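As a rough illustration of how the table might be applied, the sketch below looks up a label from the Psychology column. The helper `interpret_r_psych()` is hypothetical, and two choices in it are assumptions rather than part of the table: values are snapped to the nearest tenth, and only |r| = 1 is labelled "Perfect".

```{r}
# Hypothetical helper: label a correlation using the Psychology column
interpret_r_psych <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1)
  a <- abs(r)                       # the table is symmetric in sign
  if (isTRUE(all.equal(a, 1))) return("Perfect")
  # Assumption: snap to the nearest table row, keeping
  # non-perfect values below the 1.0 row
  tenths <- min(round(a * 10), 9)
  if (tenths >= 7) {
    "Strong"
  } else if (tenths >= 4) {
    "Moderate"
  } else if (tenths >= 1) {
    "Weak"
  } else {
    "Zero"
  }
}

interpret_r_psych(cor(mtcars$mpg, mtcars$wt))
```

The other two columns could be handled the same way with different cut-offs; the point is that the same coefficient can receive different labels depending on the field.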
```{r}
# Load ggplot2 for plotting
library(ggplot2)

# Use the built-in 'mtcars' dataset
data <- mtcars

# Calculate the Pearson correlation between mpg and weight
correlation <- cor(data$mpg, data$wt)

# Create scatter plot, annotated with the correlation coefficient
ggplot(data, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", alpha = 0.7) +
  labs(
    title = "Scatter Plot of Miles Per Gallon vs Car Weight",
    x = "Car Weight (1000 lbs)",
    y = "Miles Per Gallon (MPG)"
  ) +
  annotate("text",
    x = max(data$wt) * 0.7,
    y = max(data$mpg) * 0.9,
    label = paste("Correlation:", round(correlation, 2)),
    size = 6,
    color = "red"
  ) +
  theme_minimal()
```
Two things that go together are not necessarily causally related.
One variable can be strongly related to another, yet not cause it.
Correlation does not imply causation.
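One way to see this is with simulated data in which a third variable drives both of the others. The sketch below is only an illustration: `z`, `x`, and `y` are made-up variables, and the seed and noise levels are arbitrary assumptions.

```{r}
set.seed(211)  # arbitrary seed so the simulation is reproducible
n <- 1000

z <- rnorm(n)                 # hidden common cause
x <- z + rnorm(n, sd = 0.5)   # x depends on z, not on y
y <- z + rnorm(n, sd = 0.5)   # y depends on z, not on x

# Strong correlation, even though neither variable causes the other
cor(x, y)

# Remove the part of x and y explained by z, then correlate what is left
cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
```

Once the common cause is accounted for, the remaining correlation should be close to zero, which is why a strong r on its own says nothing about which variable, if any, is the cause.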
Type | Data type | When to use |
---|---|---|
|
|
|
|
|
|
R activity
Test whether there is a relationship between mpg and car weight using the mtcars dataset.
Step 1: Read in the data
data(mtcars)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Step 3: Create scatterplot
Step 4: Perform Pearson correlation
cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
Pearson's product-moment correlation
data: mtcars$mpg and mtcars$wt
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9338264 -0.7440872
sample estimates:
cor
-0.8676594
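To read this output programmatically, the pieces of the test can be pulled out of the object that `cor.test()` returns. The sketch below is one way to do it; the 0.05 significance level and the names `result`, `r_hat`, `p_val`, and `ci` are assumptions made for illustration.

```{r}
result <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

r_hat <- unname(result$estimate)  # sample correlation coefficient
p_val <- result$p.value           # p-value for H0: true correlation = 0
ci    <- result$conf.int          # 95 percent confidence interval (default)

alpha <- 0.05  # assumed significance level
if (p_val < alpha) {
  message("Reject H0: the correlation differs from 0 (r = ",
          round(r_hat, 2), ")")
} else {
  message("Fail to reject H0 (r = ", round(r_hat, 2), ")")
}
```

Here the p-value is far below 0.05 and the confidence interval excludes 0, so the data are consistent with a strong negative linear relationship between mpg and weight.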
R activity
Test whether there is a relationship between mpg and car horsepower using the mtcars dataset.
Step 1: Read in the data
data(mtcars)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Step 3: Create scatterplot
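A possible solution for this activity, following the same pattern as the weight activity, might look like the sketch below. It assumes ggplot2 is installed; the plot title, axis labels, and the object names `p` and `hp_test` are illustrative choices.

```{r}
library(ggplot2)  # assumes ggplot2 is installed

# Scatter plot of mpg against horsepower
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue", alpha = 0.7) +
  labs(
    title = "Scatter Plot of Miles Per Gallon vs Horsepower",
    x = "Gross Horsepower (hp)",
    y = "Miles Per Gallon (MPG)"
  ) +
  theme_minimal()
print(p)

# Pearson correlation test for mpg and horsepower
hp_test <- cor.test(mtcars$mpg, mtcars$hp, method = "pearson")
hp_test
```

The sign of the estimate gives the direction of the relationship, and the p-value and confidence interval are read the same way as in the weight activity.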
The cause-and-effect relationship between two variables.
The strength and direction of the linear relationship between two variables.
The difference in means between two groups.
The probability of an event occurring.
0.25
-0.70
0.10
0.50
A strong positive relationship.
A weak positive relationship.
A strong negative relationship.
No relationship.
0 to 1
-1 to 1
-∞ to ∞
0 to ∞
Outliers in the data.
The units of measurement of the variables.
The sample size.
All of the above.
AgEc 211: statistical methods