After a GWAS or QTL analysis, the alleles or haplotypes of one or more identified markers are typically evaluated for their effect on a trait of interest. The goal of this tutorial is to visualize the allele effects of a genetic marker on a trait (phenotype) using the R package ggstatsplot, developed by Indrajeet Patil, and to use its built-in parameters to test the statistical significance of the effect sizes. ggstatsplot produces publication-quality figures with descriptive and inferential statistics, and it requires minimal coding experience.

• Install ggstatsplot in R
• Creating input file
• Importing data in R environment
• Plotting allele effect & Parameters
• Citation

## 1. Install ggstatsplot in R

The latest version of ggstatsplot (0.9.1 at the time of writing) can be installed using the code below:

install.packages("ggstatsplot")

library(ggstatsplot)

• ggstatsplot requires R version >=4.0
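
Before installing, it can help to confirm that the running R session meets this requirement. The snippet below is a minimal sketch using only base R functions; the variable names `r_ok` and `pkg_installed` are illustrative and not part of ggstatsplot.

```r
# Compare the running R version against the minimum stated above (4.0)
r_ok <- getRversion() >= "4.0.0"

# Check whether ggstatsplot is already available, without attaching it
pkg_installed <- requireNamespace("ggstatsplot", quietly = TRUE)

if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than 4.0; please update R first.")
} else if (!pkg_installed) {
  message("ggstatsplot is not installed; run install.packages(\"ggstatsplot\").")
} else {
  message("ggstatsplot ", as.character(packageVersion("ggstatsplot")), " is installed.")
}
```

The check only prints a message, so it is safe to run repeatedly and leaves the decision to install with you.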

## 2. Creating input file

The input file can be a text (.txt) or comma-separated (.csv) file with one row per sample.

The columns (headers) in the file are:

Genotypes : List of sample names

Phenotype : Column containing phenotypic data

Marker_s10_1532223 : Marker name (used as the column header) and the allele call for each sample
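
If the data are already in R, a file with this layout can also be written from a data frame. The sketch below is only an illustration: the phenotype values and allele calls are made up, and the file name `allele_input.csv` and the object name `input_table` are assumptions.

```r
# Illustrative table with the three columns described above;
# all values here are hypothetical
input_table <- data.frame(
  Genotypes = paste0("Sample_", 1:6),
  Phenotype = c(2.00, 2.50, 3.00, 4.00, 4.25, 4.50),
  Marker_s10_1532223 = c("A", "A", "A", "G", "G", "G"),
  stringsAsFactors = FALSE
)

# Write a comma-separated file without the row-number column
csv_path <- file.path(tempdir(), "allele_input.csv")
write.csv(input_table, csv_path, row.names = FALSE)

# Show the first lines of the file as they appear on disk
print(readLines(csv_path, n = 3))
```

Writing with `row.names = FALSE` keeps the header to exactly the three expected columns.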

## 4. Importing data in R environment

The input file can be imported into the R environment with the lines of code below:

library(ggstatsplot)
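
A minimal sketch of the import step, assuming a comma-separated file: the temporary file written here is only a stand-in for your own input file, and `input` is the object name used in the plotting code later in this tutorial.

```r
# Stand-in for the real input file: a small CSV written to a temporary location
csv_path <- file.path(tempdir(), "allele_input.csv")
writeLines(c(
  "Genotypes,Phenotype,Marker_s10_1532223",
  "Sample_1,2.00,A",
  "Sample_2,2.00,A",
  "Sample_3,2.50,A"
), csv_path)

# Read the file; for a tab-delimited .txt file, read.delim() is the analogous call.
# Forcing the marker column to character guards against a column that contains
# only "T" (or only "F") calls being converted to logical values.
input <- read.csv(
  csv_path,
  header = TRUE,
  colClasses = c(Marker_s10_1532223 = "character")
)

# Preview the first rows and check the column types
print(head(input))
str(input)
```

Replace `csv_path` with the path to your own file when adapting this.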



Output –>


  Genotypes Phenotype Marker_s10_1532223
1  Sample_1      2.00                  A
2  Sample_2      2.00                  A
3  Sample_3      2.50                  A
4  Sample_4      2.75                  A
5  Sample_5      3.00                  A
6  Sample_8      3.00                  A


Note: Alleles of multiple markers can be combined by concatenating the allele calls of a marker set for each sample, for example A;G;A. Also, remove any missing allele calls (“N”) from the data; otherwise “N” will be treated as an allele state rather than as missing data.
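
The note above can be sketched in base R as follows. The data and the column names `Marker_1` to `Marker_3` are hypothetical, and dropping every sample that has an “N” at any marker is one possible way to handle missing calls, not the only one.

```r
# Hypothetical allele calls for three markers; "N" marks a missing call
calls <- data.frame(
  Genotypes = paste0("Sample_", 1:5),
  Phenotype = c(2.00, 2.50, 3.00, 4.00, 4.50),
  Marker_1 = c("A", "A", "G", "N", "G"),
  Marker_2 = c("G", "G", "T", "T", "T"),
  Marker_3 = c("A", "A", "A", "C", "N"),
  stringsAsFactors = FALSE
)
marker_cols <- c("Marker_1", "Marker_2", "Marker_3")

# Flag samples with a missing call at any marker and drop them
has_missing <- apply(calls[, marker_cols] == "N", 1, any)
complete <- calls[!has_missing, ]

# Concatenate the remaining calls into one haplotype string per sample
complete$Haplotype <- do.call(paste, c(complete[, marker_cols], sep = ";"))

print(complete[, c("Genotypes", "Phenotype", "Haplotype")])
```

The resulting `Haplotype` column can then be used in place of a single marker column as the grouping variable.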

## 5. Plotting allele effect & Parameters

The following code plots the allele effects using the ggbetweenstats() function:

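ggbetweenstats(
  data = input,                  # data frame imported above
  x = Marker_s10_1532223,        # grouping variable: allele calls
  y = Phenotype,                 # phenotypic values
  type = "np",                   # nonparametric tests
  pairwise.comparisons = TRUE,   # compare each pair of alleles
  xlab = "Alleles",
  ylab = "Phenotype",
  title = "Allele Effects of the marker s10_1532223"
)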


Output –>

The parameters in the above code are:

data : Object (data frame) containing the imported input file

x : Column plotted on the x-axis, here the marker column containing the allele/haplotype calls

y : Column plotted on the y-axis, here the phenotypic values

xlab : renaming x-axis label

ylab : renaming y-axis label

title : title of the plot

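type : type of statistical test to use, chosen according to the distribution of the data (parametric or nonparametric)

pairwise.comparisons : whether to perform pairwise comparisons between the groups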

Note: Use the type parameter to specify whether your data should be analyzed with parametric or nonparametric tests; the pairwise comparisons are chosen accordingly. If the number of samples per group is unbalanced, set type = "np" so that the nonparametric pairwise comparison test (the Dunn test) is used.
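
To see whether the groups are unbalanced, the number of samples per allele can be counted before plotting. The sketch below uses only base R and made-up data. It also runs a Kruskal-Wallis test, the rank-based test across all groups, as a quick independent check. The Dunn test itself is not part of base R and is not shown here.

```r
# Hypothetical data standing in for `input`
input <- data.frame(
  Phenotype = c(2.00, 2.00, 2.50, 2.75, 3.00, 3.00, 4.00, 4.50, 5.00, 3.50, 3.75),
  Marker_s10_1532223 = c(rep("A", 6), rep("G", 3), rep("T", 2)),
  stringsAsFactors = FALSE
)

# Samples per allele; unequal counts indicate unbalanced groups
group_sizes <- table(input$Marker_s10_1532223)
print(group_sizes)

# Median phenotype per allele as a simple descriptive summary
print(aggregate(Phenotype ~ Marker_s10_1532223, data = input, FUN = median))

# Kruskal-Wallis rank sum test across the allele groups
kw <- kruskal.test(Phenotype ~ Marker_s10_1532223, data = input)
print(kw$p.value)
```

Very small groups, such as an allele carried by only two samples, give the tests little power, so the counts are worth reporting alongside the plot.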

To see all supported plots and statistical analyses, see the package website: https://indrajeetpatil.github.io/ggstatsplot/

Thank you for reading this tutorial. I really hope these steps will assist in your analysis. If you have any questions or comments, please comment below or send an email.

## 6. Citation

Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167