Machine Learning algorithms for Genomic Selection of Quantitative Traits with varying Heritabilities

Multiple quantitative traits with varying heritabilities were evaluated to study how the heritability of a trait affects the performance of genomic selection and prediction models. Two machine learning algorithms, Random Forest and GLMnet (Lasso and Elastic-Net Regularized Generalized Linear Models), were used to develop genomic selection models using the publicly available... [Read More]
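
One common way to obtain traits with controlled heritabilities is simulation. A minimal sketch follows, assuming a purely additive model and Gaussian environmental noise. The names `simulate_genetic_values` and `simulate_phenotypes` are hypothetical and are not taken from the post. Environmental variance is set to Var(G)(1 - h2)/h2, so that Var(G)/Var(P) comes out close to the target h2.

```python
import random
import statistics


def simulate_genetic_values(n_individuals, n_snps, seed=1):
    """Hypothetical additive model: g_i = sum_j x_ij * a_j with x_ij in {0, 1, 2}."""
    rng = random.Random(seed)
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    # Choosing from (0, 1, 1, 2) gives Hardy-Weinberg proportions for allele frequency 0.5.
    genotypes = [[rng.choice((0, 1, 1, 2)) for _ in range(n_snps)]
                 for _ in range(n_individuals)]
    values = [sum(x * a for x, a in zip(row, effects)) for row in genotypes]
    return genotypes, values


def simulate_phenotypes(genetic_values, h2, seed=2):
    """Add Gaussian noise so that Var(G) / Var(P) is approximately h2."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    var_g = statistics.pvariance(genetic_values)
    # Var(E) = Var(G) * (1 - h2) / h2  =>  h2 = Var(G) / (Var(G) + Var(E))
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genetic_values]
```

For example, `_, g = simulate_genetic_values(2000, 50)` followed by `simulate_phenotypes(g, 0.3)` produces a low-heritability trait on the same genetic background. Repeating this for several h2 values isolates the effect of heritability on prediction accuracy.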

RNA-Seq data analysis in R - Investigate differentially expressed genes in your data!

In this tutorial, a negative binomial model was used to perform differential gene expression analysis in R with the DESeq2, pheatmap and tidyverse packages. The workflow for the RNA-Seq data is: obtain the FASTQ sequencing files from the sequencing facility; assess the quality of the sequencing reads; perform genome alignment to identify the... [Read More]
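
Before it fits the negative binomial model, DESeq2 corrects for differences in sequencing depth using median-of-ratios size factors. The sketch below reproduces only that normalization step in plain Python, so it is not DESeq2 itself and it omits the dispersion estimation and the negative binomial test. For each sample, the size factor is the median across genes of count / geometric mean of that gene across samples. Genes with a zero count in any sample are skipped. The function names are illustrative.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors; counts[i][j] = reads for gene i in sample j."""
    n_samples = len(counts[0])
    # A gene with a zero in any sample has geometric mean 0, so it is excluded.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median is taken on the log scale, then exponentiated.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

If one library was sequenced twice as deeply as another, its size factor comes out twice as large. After `normalize`, the two libraries give comparable counts for the same gene.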

Utilizing Machine Learning algorithms (GLMnet and Random Forest models) for Genomic Prediction of a Quantitative trait

In this tutorial, I used two popular machine learning algorithms, Random Forest and GLMnet, for genomic prediction of a quantitative trait. Several machine learning packages are available in R; in this tutorial, I used the caret package. The objective was to develop two models, Random Forest and glmnet, using real... [Read More]
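
glmnet fits the elastic-net penalty by cyclic coordinate descent. The sketch below shows that idea in pure Python. It minimizes (1/2n)||y - Xb||^2 + lambda[(1 - alpha)/2 ||b||^2 + alpha ||b||_1], which is glmnet's parameterization: alpha = 1 gives the lasso and alpha = 0 gives ridge. Unlike glmnet, it does not standardize markers, uses no warm-started lambda path and has no convergence check. It does not reproduce the caret tuning from the post. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_sweeps=500):
    """Coordinate descent for the elastic net; X is individuals x markers.

    Returns (intercept, coefficients). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Center X and y so the intercept can be recovered after fitting.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]            # residuals when all b_j = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                        # monomorphic marker stays at 0
            # z = (1/n) * x_j . (partial residual that excludes marker j)
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(m * b for m, b in zip(x_means, beta))
    return intercept, beta
```

Once the model is fitted, the predicted genomic value of a new individual is `intercept + sum(x * b for x, b in zip(markers, beta))`. The soft-threshold step sets weak markers exactly to zero, which is why lasso-type models act as marker selectors.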

Check for Mendelian errors using pedigree and genotype information

Cluster-based methods such as multidimensional scaling (MDS) and principal component analysis (PCA) are traditionally used to identify samples with genotypic inconsistencies; however, it is also important to identify genotypes with high rates of Mendelian inconsistency before any genetic or statistical analysis. In this tutorial, I would like to share how one... [Read More]
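
At a single biallelic marker, a Mendelian check reduces to one question: can the offspring genotype be formed from one allele of each parent? The minimal sketch below assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls. It uses hypothetical function names and is not the exact procedure from the tutorial.

```python
# Alleles a parent can transmit, with genotypes coded as alternate-allele counts.
# A missing parent (None) is treated as able to transmit either allele.
GAMETES = {0: (0,), 1: (0, 1), 2: (1,), None: (0, 1)}


def is_consistent(child, father, mother):
    """True if the child's genotype can arise from one allele of each parent."""
    if child is None:
        return True
    return any(a + b == child for a in GAMETES[father] for b in GAMETES[mother])


def mendelian_errors(child, father, mother):
    """Count inconsistent markers for one trio; returns (errors, markers_tested)."""
    errors = tested = 0
    for c, f, m in zip(child, father, mother):
        if c is None or (f is None and m is None):
            continue  # nothing can be tested at this marker
        tested += 1
        if not is_consistent(c, f, m):
            errors += 1
    return errors, tested
```

Dividing `errors` by `tested` gives a per-trio error rate that can be compared against a threshold. With only one genotyped parent, this coding can detect only opposite homozygotes (for example, a 0 parent with a 2 offspring). Markers on sex chromosomes need different inheritance rules.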

Pruning genetic markers based on their physical distance and linkage disequilibrium (LD)

In high-density marker panels, closely spaced markers often carry largely redundant information, so they can be pruned based on the physical distance between adjacent markers and on linkage disequilibrium (LD). In this tutorial, I will show how to prune markers based on their physical position in the TASSEL software, and based on LD in the PLINK software.... [Read More]
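
The two pruning ideas can be sketched in a few lines. This is a simplified illustration, not the algorithm TASSEL or PLINK implements. PLINK's `--indep-pairwise`, for instance, works with a sliding window, a step size and an r² threshold. The sketch assumes that markers are sorted by position on one chromosome and that genotypes are 0/1/2 allele counts. It uses the squared Pearson correlation of those counts as r². All function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1/2 genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: r^2 undefined, treated as 0 here
    return sxy * sxy / (sxx * syy)


def thin_by_distance(positions, min_gap):
    """Keep a marker only if it is >= min_gap bp from the last kept marker."""
    kept, last = [], None
    for i, pos in enumerate(positions):
        if last is None or pos - last >= min_gap:
            kept.append(i)
            last = pos
    return kept


def prune_by_ld(genotypes, r2_max, window):
    """Greedy pruning: drop a marker whose r^2 with any of the last `window`
    kept markers exceeds r2_max; returns indices of kept markers."""
    kept = []
    for i, g in enumerate(genotypes):
        if all(r_squared(g, genotypes[k]) <= r2_max for k in kept[-window:]):
            kept.append(i)
    return kept
```

The greedy scan always keeps the first marker of a correlated block. Real tools instead apply their own rules, for example a sliding window with a step size, to decide which marker of a correlated pair to drop.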

Investigate genetic admixture using STRUCTURE software

STRUCTURE is a freely available software package that can be used to rigorously investigate admixed individuals, identify hybrids and migrants, and estimate the overall structure of a population using commonly used genetic markers such as SNPs and SSRs. This software was developed by Pritchard... [Read More]

QTL mapping using the Composite Interval Mapping (CIM) method in R

In quantitative trait locus (QTL) analysis, the composite interval mapping (CIM) method estimates QTL positions with higher accuracy and statistical power than simple interval mapping by combining interval mapping with multiple regression on selected markers (cofactors). This method also controls background noise from genetic variation in other regions of the genome that affects the detection of... [Read More]
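
As a rough sketch, the idea can be written in the regression (Haley-Knott-style) formulation with normally distributed residuals, which differs from the full mixture-model likelihood. At each test position t, CIM fits the putative QTL together with a set C of cofactor markers, and the LOD score compares the residual sums of squares with and without the QTL term:

```latex
y_i = \mu + b_t\, x_{it} + \sum_{k \in C} b_k\, x_{ik} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

\mathrm{LOD}(t) = \frac{n}{2}\,\log_{10}\!\left(\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1(t)}\right)
```

Here n is the number of individuals and x_it is the expected genotype code at t given the flanking markers. RSS_0 is the residual sum of squares of the model with only the intercept and cofactors, and RSS_1(t) is that of the model that also includes the QTL term. The cofactor terms absorb variation from QTL elsewhere in the genome, which is the background noise the excerpt mentions. Cofactors lying close to t are usually excluded so they do not absorb the effect being tested.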

Amplicon Sequencing (AmpSeq): Concept and data analysis

Amplicon sequencing (AmpSeq) combines highly multiplexed PCR amplicons from multiple barcoded samples in a single sequencing reaction. Amplicons can target SNPs, haplotypes, SSRs and presence/absence variants. Currently, about 400 amplicons and 3,000 samples can be processed simultaneously in a single reaction. In this post, my goal is to introduce the... [Read More]
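
Because many barcoded samples share one reaction, the first data-analysis step is demultiplexing, which assigns each read back to its sample. The sketch below is hypothetical. It assumes an inline barcode at the 5' end of each read and tolerates one mismatch. Real AmpSeq runs often carry barcodes in separate index reads and are demultiplexed by the sequencing facility's software, so the read layout, barcodes and function names here are illustrative only.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign reads to samples by an inline 5' barcode (hypothetical layout).

    Returns (assigned, unassigned), where assigned maps sample -> trimmed reads.
    Reads that match no barcode, or more than one, are left unassigned.
    """
    assigned = {sample: [] for sample in barcodes}
    unassigned = []
    for read in reads:
        hits = [
            sample
            for sample, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatches
        ]
        if len(hits) == 1:
            sample = hits[0]
            assigned[sample].append(read[len(barcodes[sample]):])  # trim barcode
        else:
            unassigned.append(read)
    return assigned, unassigned
```

Rejecting reads that match more than one barcode keeps ambiguous reads out of every sample. This protection only works when barcodes differ at enough positions for the chosen mismatch tolerance.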

Genetic map construction in Lep-MAP3

Building genetic maps can be challenging and sometimes quite stressful, especially when dealing with thousands or even millions of markers. In this post, I hope to help anyone who would like to get started building a decent genetic map with the open-source software Lep-MAP3, and finally, evaluating... [Read More]
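
Mapping software estimates recombination fractions between ordered markers and reports positions in centimorgans through a map function. The sketch below shows the two usual choices. Haldane assumes no crossover interference, and Kosambi allows for some. It does not assert which function Lep-MAP3 uses by default, and the function names are illustrative.

```python
import math


def haldane_cm(r):
    """Haldane distance in cM: -50 * ln(1 - 2r), for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi distance in cM: 25 * ln((1 + 2r) / (1 - 2r)), for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def cumulative_positions(fractions, map_function=haldane_cm):
    """Marker positions (cM) from recombination fractions between adjacent markers."""
    positions = [0.0]
    for r in fractions:
        positions.append(positions[-1] + map_function(r))
    return positions
```

For r = 0.1, the Haldane distance is about 11.16 cM and the Kosambi distance is about 10.14 cM. When comparing maps, check which function each one used, because the two give different map lengths from the same data.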