ANOVA (Analysis of Variance) is a powerful statistical method for testing a hypothesis that involves more than two groups (also known as treatments). However, ANOVA only tells us whether the groups differ overall; it does not show which treatments or groups differ from each other. This is where Tukey's test, also known as Tukey's Honest Significant Difference (HSD) test, comes into play. In this tutorial, I will show how to prepare the input file and run ANOVA and Tukey's test in R. For detailed information on ANOVA and R, please read this article at this link.

3. Finally, install the library qtl in R

Step 1.3: Preparing the Input file

Create an input file as shown in the example below:
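As a sketch of one layout that works with the commands in the following steps, the input can be a tab-delimited text file with a header row and two columns, Treatment and Dependent_variable. The file name matches the one used in step 2.2, but the values below are simulated placeholders, and the design of four treatments (A, B, D, E) with six replicates each is an assumption inferred from the outputs shown later.

```r
# Hypothetical example data: 4 treatments x 6 replicates = 24 rows.
# The numbers are simulated for illustration, not real measurements.
set.seed(1)
example <- data.frame(
  Treatment = rep(c("A", "B", "D", "E"), each = 6),
  Dependent_variable = round(
    rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6)), 2
  )
)

# Write a tab-delimited file with a header row.
input_file <- "fileName.txt"
write.table(example, file = input_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Show the header and the first three data rows of the file.
writeLines(readLines(input_file, n = 4))
```

Each row holds one observation: the treatment label in the first column and the measured value in the second.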

Step 2: Run ANOVA in R

2.1 Import R package

Install the R package agricolae (for example with install.packages("agricolae")), then load the library by typing the command below:

library(agricolae)


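Note: library(agricolae) only works after the package has been installed. The lm(), aov() and TukeyHSD() functions used in the steps below come from the stats package, which is loaded with R by default.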

2.2 Import data

Import your data by typing the command below (stringsAsFactors = TRUE makes R read the Treatment column as a factor):

data <- read.table(file = "fileName.txt", header = TRUE, stringsAsFactors = TRUE)


2.3 Check data

Once the data is imported, check the first and last rows by typing the commands below:

head(data)
tail(data)
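Beyond looking at the first and last rows, it can help to confirm the column types and the number of replicates per treatment before fitting the model. The sketch below uses simulated stand-in data with the same two column names; the values and the six-replicate design are assumptions.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = rep(c("A", "B", "D", "E"), each = 6),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

# The grouping column should be a factor, not plain text or numbers.
data$Treatment <- factor(data$Treatment)

str(data)               # column types and a preview of the values
table(data$Treatment)   # number of observations per treatment
anyNA(data)             # TRUE if any value is missing
```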


2.4 Conduct ANOVA

Now, run ANOVA by typing the commands below:

data.lm <- lm(data$Dependent_variable ~ data$Treatment, data = data)

data.av <- aov(data.lm)
summary(data.av)
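As a side note, the same model can be written with bare column names in the formula, which is the more common R idiom when a data argument is supplied. The sketch below, on simulated stand-in data, suggests that both forms give the same estimates and differ only in how the terms are labelled (for example TreatmentB instead of data$TreatmentB). The outputs shown in this tutorial come from the data$ form.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = factor(rep(c("A", "B", "D", "E"), each = 6)),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

# Formula as written in this tutorial.
fit_dollar <- lm(data$Dependent_variable ~ data$Treatment, data = data)

# Same model with bare column names looked up in `data`.
fit_bare <- lm(Dependent_variable ~ Treatment, data = data)

# The estimates agree; only the labels differ.
all.equal(unname(coef(fit_dollar)), unname(coef(fit_bare)))
names(coef(fit_dollar))
names(coef(fit_bare))

# The ANOVA table can be produced from either fit.
summary(aov(fit_bare))
```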


2.5 Regression Coefficient

Obtain the regression coefficients of the predictors in the model using the code below:

summary(data.lm)


Output –>

> summary(data.lm)

Call:
lm(formula = data$Dependent_variable ~ data$Treatment, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.32500 -0.48500  0.05917  0.23979  2.68500

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.9383     0.4481   4.325 0.000329 ***
data$TreatmentB   2.5267     0.6338   3.987 0.000726 ***
data$TreatmentD   5.7450     0.6338   9.065 1.61e-08 ***
data$TreatmentE   5.7258     0.6338   9.035 1.70e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.098 on 20 degrees of freedom
Multiple R-squared:  0.8524,    Adjusted R-squared:  0.8302
F-statistic: 38.49 on 3 and 20 DF,  p-value: 1.692e-08


To add another predictor to the model, append it to the right-hand side of the formula with the “+” symbol, using one “+” for every additional variable.

2.6 Overall model’s performance

The overall model’s performance can be obtained using the code below:

summary(data.av)


Output –>

> summary(data.av)
               Df Sum Sq Mean Sq F value   Pr(>F)
data$Treatment  3  139.2   46.38   38.49 1.69e-08 ***
Residuals      20   24.1    1.20
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
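To read the ANOVA table: the F value is the treatment mean square divided by the residual mean square, and the P value is the upper tail of the F distribution with the two Df values. In the table above, 46.38 / 1.20 gives about 38.6 from the rounded display, close to the reported 38.49, which R computes from unrounded values. The sketch below repeats that calculation on simulated stand-in data; the data values and the names tab, f_value and p_value are illustrative assumptions.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = factor(rep(c("A", "B", "D", "E"), each = 6)),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

data.av <- aov(Dependent_variable ~ Treatment, data = data)

# summary() of an aov fit returns a list; its first element is the table.
tab <- summary(data.av)[[1]]

# Row 1 is the treatment term, row 2 the residuals.
ms_treat <- tab[1, "Mean Sq"]
ms_resid <- tab[2, "Mean Sq"]

# F ratio and its upper-tail probability.
f_value <- ms_treat / ms_resid
p_value <- pf(f_value, df1 = tab[1, "Df"], df2 = tab[2, "Df"],
              lower.tail = FALSE)

c(F = f_value, p = p_value)
```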


2.7 Good fit of the linear model

The coefficient of determination, R², is a good measure of how well the linear model fits the data. It is the proportion of the total variability in the dependent variable that is explained by the regression model.

The R² of a linear model can be obtained using the code below:


summary(data.lm)$r.squared


Output –>

> summary(data.lm)$r.squared
[1] 0.8523754



The model can explain ~85% of the total variability, which tells us that the model fits the data very well.
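The same number can be recovered from the ANOVA table: R² is the treatment sum of squares divided by the total sum of squares, here 139.2 / (139.2 + 24.1) ≈ 0.852. A minimal sketch of that check on simulated stand-in data follows; the data values and the helper names are assumptions for illustration.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = factor(rep(c("A", "B", "D", "E"), each = 6)),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

data.lm <- lm(Dependent_variable ~ Treatment, data = data)
tab <- summary(aov(data.lm))[[1]]   # ANOVA table

# Explained and unexplained sums of squares.
ss_treat <- tab[1, "Sum Sq"]
ss_resid <- tab[2, "Sum Sq"]

# R-squared = explained / total variability.
r2_manual <- ss_treat / (ss_treat + ss_resid)

# Compare with the value reported by summary().
all.equal(r2_manual, summary(data.lm)$r.squared)
```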

3.0 Conduct Tukey test

From the ANOVA summary output, one can conclude that there is a significant difference among the treatments (P < 0.001). However, ANOVA does not tell us which treatments differ from each other, so we perform Tukey’s test to compare all pairs of treatments using the steps below.

Type the commands below to run Tukey’s test:

data.test <- TukeyHSD(data.av)
data.test


Below is the summary of the Tukey test:

> data.test
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = data.lm)

$`data$Treatment`
           diff        lwr      upr     p adj
B-A  2.52666667  0.7527896 4.300544 0.0037260
D-A  5.74500000  3.9711229 7.518877 0.0000001
E-A  5.72583333  3.9519563 7.499710 0.0000001
D-B  3.21833333  1.4444563 4.992210 0.0003106
E-B  3.19916667  1.4252896 4.973044 0.0003326
E-D -0.01916667 -1.7930437 1.754710 0.9999897


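The four columns are the difference in means, the lower and upper bounds of the 95% confidence interval, and the adjusted P value. From the Tukey test above, one can conclude that all pairs of treatments differ significantly at P < 0.01, except E-D (adjusted P ≈ 1.00), whose confidence interval includes zero.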
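The result of TukeyHSD() is a list with one matrix per factor, so the significant pairs can also be pulled out programmatically. The sketch below does this on simulated stand-in data, assuming a 0.05 cutoff and bare column names in the formula (so the list element is called Treatment); the name cmp is illustrative.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = factor(rep(c("A", "B", "D", "E"), each = 6)),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

data.av <- aov(Dependent_variable ~ Treatment, data = data)
data.test <- TukeyHSD(data.av)

# Matrix of pairwise comparisons: columns diff, lwr, upr, p adj.
cmp <- data.test$Treatment

# Keep only the pairs whose adjusted P value is below 0.05.
significant <- cmp[cmp[, "p adj"] < 0.05, , drop = FALSE]
significant

# The diff column is simply the difference between the two group means.
group_means <- tapply(data$Dependent_variable, data$Treatment, mean)
unname(group_means["B"] - group_means["A"])
cmp["B-A", "diff"]
```

At the default 95% family-wise level, a pair has an adjusted P value below 0.05 exactly when its confidence interval excludes zero, so either column can be used to read off the significant pairs.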

Finally, one can plot the above results using the command below:

plot(data.test)


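Output: a plot of the 95% family-wise confidence intervals for the differences in mean levels of each pair of treatments, with a dashed vertical line at zero.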
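To keep a copy of the figure, the plot can be written to a file instead of the screen. This is a minimal sketch on simulated stand-in data; the output file name and location are assumptions, and pdf() is used here only because it is available in every R installation.

```r
# Stand-in data with the tutorial's column names (values are simulated).
set.seed(1)
data <- data.frame(
  Treatment = factor(rep(c("A", "B", "D", "E"), each = 6)),
  Dependent_variable = rnorm(24, mean = rep(c(2, 4.5, 7.5, 7.5), each = 6))
)

data.test <- TukeyHSD(aov(Dependent_variable ~ Treatment, data = data))

# Open a PDF device, draw the plot, then close the device to save the file.
plot_file <- file.path(tempdir(), "tukey_plot.pdf")
pdf(plot_file, width = 7, height = 5)
plot(data.test, las = 1)   # las = 1 prints the pair labels horizontally
dev.off()

plot_file   # path of the saved figure
```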

--- End of Tutorial ---

Thank you for reading this tutorial. If you have any questions or comments, please let me know in the comment section below or send me an email.

Bibliography

Felipe de Mendiburu (2019). agricolae: Statistical Procedures for Agricultural Research. R package version 1.3-1. https://CRAN.R-project.org/package=agricolae