In the Quantitative Trait Locus (QTL) analysis, composite interval mapping (CIM) method estimates the QTL positon with higher accuracy and statistical significance by combining interval mapping with multiple regression. This method also controls background noise resulting from genetic variations in other regions of the genome that affect the detection of the true QTL. In this post, my objective is to describe and walk you through all the basic steps one might need run the composite inteval mapping in R software. For further descriptive information on R/qtl, please read the paper by Karl Broman at this link: http://www.rqtl.org/

To get started

Step 1. Download and install R software and R studio

Download and install the latest version of the R software from this link
Download and install R studio from this link
Finally, install the library qtl in R

Step 1.1 - Install qtl library in R following the below steps:

Step 2. Prepare the input files

Phenotype data
Genetic map

Both the phenotype and genetic map infomation can be put together in a single .csv file. See below the screenshot to understand the formatting of the input file.

Step 2.1 - Setup working directory following the below steps:

Step 3. Running CIM in R

Please double check the formatting of the file is correct. Once ready, open R studio and type the following commands to get started.

Step 3.1. Import qtl library

library(qtl)

Step 3.2. Load the .csv file with the genotype and phenotype data

phenoGeno <-read.cross(format = "csv", file = "file.csv",  
                 na.strings= c('NA'), genotypes = c("A","B","H"))

In the above command, if your genetic data is formatted differently then specify the in genotype section. To execute the above command, highlight the command and press CTRL + ENTER. You should see output in the terminal window, and please check for any warnings or errors.

Step 3.3. Summary of the data file and plotting

Type:

summary(phenoGeno)
plot(phenoGeno)

You should see a plot similiar to shown below:

Step 3.4. Calculating genotype probabilities

In this step, Hidden Markov model technology is used to calculate the probabilities of the true underlying genotypes.

phenoGeno <- calc.genoprob(phenoGeno, step=0, off.end=0.0, error.prob=1.0e-4,stepwidth = "fixed", map.function="kosambi")

phenoGeno_cross <- sim.geno(phenoGeno,  n.draws=32, step=0, off.end=0.0, error.prob=1.0e-4, stepwidth = "fixed", map.function="kosambi")

Important Make sure to rename phenoGeno to phenoGeno_cross in the second step of the above command.

Step 3.5. running QTL scan using CIM method with permutations

Type the below command in the console to run the qtl scan using CIM method.
```
scan.cim = cim(phenoGeno_cross, pheno.col=2, map.function="kosambi")
```
Note In #value type the column number of the phenotype
Run permutation test by typing the below command:
```
scan.cim.perm = cim(phenoGeno_cross, pheno.col=2, map.function="kosambi", n.perm=1000)
```
It is recommended to run at 1,000 permutation test.
check the LOD threshold after permutations and significant QTL above the threshold.
```
summary(scan.cim.perm)
summary(scan.cim, threshold = #value)
```
Note In #value Add the LOD score at desired alpha (0.05 0r 0.1)
Plot the QTL scan
```
plot(scan.cim)
```

Step 4.0 QTL effect plot

Type the below command to plot the effect plot of the QTL on the phenotype.

plotPXG(phenoGeno_cross, pheno.col = #value, marker = c("#value"))

Provide phenotype colum number in pheno.col and marker name in marker =

Output:

Step 5.0 Make QTL model with the significant marker

qtl <- makeqtl(phenoGeno_cross, chr=c(#value), pos=c(#value),what=c("prob")) 
       
fitqtl <- fitqtl(don, pheno.col=c(#value), formula = y~Q1, qtl= qtl, method = "hk", get.ests = T)

summary(fitqtl)

Note In #value Add the signifincat marker’s coordinates i.e. chromosome and pos.

Below is the screenshot of the output one might except to see in the R shell:

fitqtl summary

Method: Haley-Knott regression 
Model:  normal phenotype
Number of observations : 93 

Full model result
----------------------------------  
Model formula: y ~ Q1 

      	df    SS   MS   LOD    %var 	Pvalue(Chi2) Pvalue(F)
Model 	 2  13.07	6.53  30.4  77.8           0         0
Error	 90  3.724093 0.04137881                                        
Total	 92 16.795699 

Estimated effects:
-----------------
               est       SE      t
Intercept  3.01835  0.09467 31.882
9@2.5a    -0.22761  0.09566 -2.379
9@2.5d    -1.41852  0.18935 -7.492

Note: Additive effect (a) = (BB - AA)/2 and Dominance effect (d) = AB - (AA + BB)/2.

Step 6.0 Identify QTL intervals using LOD drop

lodint(results = scan.cim, chr = #value, drop = 1.8)

Provide the cM to drop in the drop = and the chr

An example output is show below:

>   lodint(scan.cim, chr=2, drop=1.8)
           chr    pos      lod
2_14850509   3 45.290 11.44917
2_15846197   3 48.827 14.43254
2_19297395   3 50.312 12.45010

--- End of Tutorial ---

Thank you for reading this tutorial. I really hope this helpful in giving you the concept and technology behind AmpSeq and the data analysis. If you have any questions or comments, please let comment below or send me an email.

Bibliography

Broman KW, Wu H, Sen Ś, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19:889-890 [PubMed | pdf (236k) | Screen pdf (288k)]