ANOVA also known as Analysis of Variance
is a powerful statistical method to test a hypothesis involving more than two groups (also known as treatments). However, ANOVA is limited in providing a detailed insights between different treatments or groups, and this is where, Tukey (T) test
also known as T-test comes in to play. In this tutorial, I will show how to prepare input
files and run ANOVA and Tukey test in R software. For detailed information on ANOVA and R, please read this article at this
link.
Step 1.0 Download and install R software and R studio
- Download and install the latest version of the R software from this link
- Download and install R studio from this link
- Finally, install the library qtl in R
Step 1.2 - Setup working directory following the below steps:
Step 1.3: Preparing the Input file
Create an input file as shown in the example below:
Step 2: Run ANOVA in R
2.1 Import R package
Install R package agricolae
and open the library typing the below command line:
library(agricolae)
Note: Please remember to install the correct R package for ANOVA!
2.2 Import data
Import your data by typing the below command line:
data= read.table(file = "fileName.txt", header = T)
2.3 Check data
Once the data is imported, check it by typing the below command line:
head(data_pressure)
tail(data_pressure)
2.4 Conduct ANOVA
Now, Simply run ANOVA
by typing the below command lines:
data.lm <- lm(data$Dependent_variable ~ data$Treatment, data = data)
data.av <- aov(data.lm)
summary(data.av)
2.5 Regression Coefficient
Obtain regression coefficient of the predictors in the data using below code:
summary(data.lm)
Output –>
> summary(data.lm)
Call:
lm(formula = data$Dependent_variable ~ data$Treatment, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.32500 -0.48500 0.05917 0.23979 2.68500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9383 0.4481 4.325 0.000329 ***
data$TreatmentB 2.5267 0.6338 3.987 0.000726 ***
data$TreatmentD 5.7450 0.6338 9.065 1.61e-08 ***
data$TreatmentE 5.7258 0.6338 9.035 1.70e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.098 on 20 degrees of freedom
Multiple R-squared: 0.8524, Adjusted R-squared: 0.8302
F-statistic: 38.49 on 3 and 20 DF, p-value: 1.692e-08
To add another coefficient, add the symbol “+” for every additional variable you want to add to the model.
2.6 Overall model’s performance
The overall model’s performane cand be obtained using the below code:
summary(data.av)
Output –>
> summary(data.av)
Df Sum Sq Mean Sq F value Pr(>F)
data$Treatment 3 139.2 46.38 38.49 1.69e-08 ***
Residuals 20 24.1 1.20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2.7 Good fit of the linear model
The coefficient of determination or R² is a good measure to test if the linear model has the good fit, and is measured by the proportion of the total variability explained by the regression model.
R² of a linear model can be obtained using below code:
summary(data.lm)$r.squared
Output –>
summary(data.lm)$r.squared
[1] 0.8523754
The model can explain ~85% of the total variability, which tells us that the model fits the data very well.
3.0 Conduct Tukey test
From the summary output, one can interpret that there is a significant difference (i.e. P < 0.001) between the Treatments
, however, we perfom Tukey’s Test to investigate the differences between all treaments using steps below.
Type below commands to run Tukey test:
data.test <- TukeyHSD(data.av)
data.test
Below is the summary of the Tukey test:
> data.test
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = data.lm)
$`data$Treatment`
diff lwr upr p adj
B-A 2.52666667 0.7527896 4.300544 0.0037260
D-A 5.74500000 3.9711229 7.518877 0.0000001
E-A 5.72583333 3.9519563 7.499710 0.0000001
D-B 3.21833333 1.4444563 4.992210 0.0003106
E-B 3.19916667 1.4252896 4.973044 0.0003326
E-D -0.01916667 -1.7930437 1.754710 0.9999897
From the above T-test, one can conclude that there is a significant difference in the most of groups, except between-groups E-D at P <0.001
Finally, one can plot the above results using the below command:
plot(data.test)
Output:
--- End of Tutorial ---
Thank you for reading this tutorial. If you have any questions or comments, please let me know in the comment section below or send me an email.
Bibliography
Felipe de Mendiburu (2019). agricolae: Statistical Procedures for Agricultural Research. R package version 1.3-1. https://CRAN.R-project.org/package=agricolae