Ideally alleles or haplotype of a single or multiple markers identified post-GWAS or QTL analysis are used to evaluate their effect on a trait of interest. The goal of this tutorial is to visualize the allele effects of a genetic marker on a trait (phenotype) utilizing a R package ggstatsplot developed by Indrajeet Patil, and use its in-built parameters to test the statistical significance of the effect sizes. ggstatsplot
produces publication quality figures with descriptive and inferential statisitcs, and requires minimal coding experience to analyze their data.
- Install ggstatsplot in R
- Creating input file
- Download sample file
- Importing data in R environment
- Plotting allele effect & Parameters
- Citation
1. Install ggstatsplot in R
The latest version 0.9.1 of the ggstatsplot
can be installed using the below code:
install.packages("ggstatsplot")
library(ggstatsplot)
- ggstatsplot requires R version >=4.0
2. Creating input file
An input file can be a text(.txt) or comma separated file (.csv) in a format shown in an screenshot below:
Description of the columns or header in the file are:
Genotypes : List of sample names
Phenotype : Column containing phenotypic data
Marker_s10_1532223 : Marker name and its allele calls of the corresponding samples
3. Download sample file
Please downnload sample file from below links:
4. Importing data in R environment
The input file can imported into R environment via below line of codes:
library(ggstatsplot)
input <- read.table("inputFile.txt", header = T)
head(input)
Output –>
> head(input)
Genotypes Phenotype Marker_s10_1532223
1 Sample_1 2.00 A
2 Sample_2 2.00 A
3 Sample_3 2.50 A
4 Sample_4 2.75 A
5 Sample_5 3.00 A
6 Sample_8 3.00 A
Note: Alleles of multiple markers can be added via concating the allele calls of a marker set for a list of sample. For example: A;G;A. Also, please remove any missing allele (“N”) calls in the data or else it will be considered as an allele state and not treated as missing data.
5. Plotting allele effect & Parameters
Code to plot the allele effects using ggbetweenstats
function:
ggbetweenstats(
data = input,
x = Marker_s10_1532223 ,
y = Phenotype , type = "n", pairwise.comparisons = T,
xlab = "Alleles",
ylab = 'Phenotype',
title = "Allele Effects of the marker s10_1532223"
)
Output –>
Description of the parameters in the above code are:
data : Obejct containing the input file
x : X-axis of the marker containing allele/haplotype calls
y : Y-axis of the phenotypic values
xlab : renaming x-axis label
ylab : renaming y-axis label
title : title of the plot
type : distribution of the data type
Note: You can define the distribution of your data, that is: parameteric or non-paratemeric using the type
parameter, and perform pairwise-comparison
. Also, if you have unbalanced data per group make sure type=np
to calculate the non-parameteric pairwise-comparsion test a.k.a Dunn test.
To see all supported plots and statistical analyses, see the package website: https://indrajeetpatil.github.io/ggstatsplot/
Thank you for reading this tutorial. I really hope these steps will assist in your analysis. If you have any questions or comments, please comment below or send an email.
6. Citation
Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167