#|
library(tidyverse)
library(here)
library(rrBLUP)
library(switchgrassGWAS)
<- read_rds(here("data", "Kinship_van_Raden_630_individuals_SNPs_r2_20percent.rds")) k_full
Narrow-sense heritability, or h2, is a descriptive statistic calculated by geneticists and crop breeders prior to modeling the association between a trait (such as height or yield) and molecular variation present in a population (such as single nucleotide polymorphisms, or SNPs).
It varies between 0 and 1 and is a measure of the fraction of the total variation in a trait attributable to additive effects of the molecular variation present in the group of individuals where the trait was measured. If it’s zero, or as a rule of thumb, 0.2 or less, there is little additive genetic variation for that trait, and any modeling of the association between the trait and molecular variation is likely to be unsuccessful.
Here I calculate h2 for biomass data in the switchgrass diversity panel dataset.
To calculate h2, I need trait data for my group of interest (total biomass for genotypes in the switchgrass diversity panel data, at the Austin, Texas site, in 2019) and a kinship matrix. The kinship matrix here contains information about how the switchgrass individuals are genetically related to one another. I’ve pre-calculated the kinship matrix for these individuals using the Van Raden method (VanRaden 2008).
There are many factors that go in to making the kinship matrix, but I won’t go into the details of that here.
First, load the R packages I’ll need to compute this statistic and load the kinship matrix.
The kinship matrix is a matrix with column and row names that are individual plant IDs (PLANT_ID
in the trait dataframe).
str(k_full)
num [1:630, 1:630] 1.3406 0.0871 0.0746 0.0404 0.1924 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:630] "J181.A" "J250.C" "J251.C" "J352.A" ...
..$ : chr [1:630] "J181.A" "J250.C" "J251.C" "J352.A" ...
1:5, 1:5] k_full[
J181.A J250.C J251.C J352.A J001.A
J181.A 1.34059943 0.08712208 0.07459743 0.04039814 0.19244956
J250.C 0.08712208 1.25057146 0.30333836 0.09003609 0.14031216
J251.C 0.07459743 0.30333836 1.26267550 0.05864841 0.16804045
J352.A 0.04039814 0.09003609 0.05864841 1.35194085 0.04047479
J001.A 0.19244956 0.14031216 0.16804045 0.04047479 1.18593277
Next, load the trait data. Here I am going to use the publicly available trait data from the switchgrassGWAS R package.
<- switchgrassGWAS::pvdiv_phenotypes trait_data
To calculate narrow-sense heritability for one trait, CLMB_BIOMASS
, I need to manipulate the trait dataframe so that the PLANT_ID
included match the ones in the kinship matrix.
- 1
- Filter individuals to those with data in the CLMB_BIOMASS column
- 2
- Filter individuals to those with PLANT_ID that match the names in the kinship matrix.
Then, I need to make sure that the same PLANT_ID
are present in the kinship matrix and in the dataframe, in the same order. To do this, first I find the numerical values of the column names that are in the trait dataframe, then I subset the kinship matrix to just those values.
<- which(colnames(k_full) %in% one_trait$PLANT_ID)
k_in_phe <- k_full[k_in_phe, k_in_phe] k_one
Then, make sure that the PLANT_ID in the trait dataframe are in the same order as those in the kinship matrix by making the PLANT_ID from the kinship matrix into a dataframe, and joining the trait dataframe to it.
<- enframe(colnames(k_one), name = NULL, value = "PLANT_ID")
k_order <- k_order |>
one_trait left_join(one_trait, by = "PLANT_ID")
Finally, check that all the PLANT_ID in the two dataframes are equal.
all.equal(k_order, select(one_trait, PLANT_ID))
[1] TRUE
Now, we used mixed.solve() from the rrBLUP package to calculate both the additive genetic variation (Vu) and the error variation in the trait (Ve). Added together, Vu and Ve make up the total trait variation in the population.
<- mixed.solve(y = one_trait$CLMB_BIOMASS, K = k_one)
mod2
<- mod2$Vu / (mod2$Vu + mod2$Ve)
h2
round(h2, digits = 4)
[1] 0.8923
h2 for the biomass at CLMB is 0.8923. This is a very high heritability, suggesting that additive SNP-trait associations may explain much of the trait variation at this location.
References
Citation
@online{macqueen2024,
author = {MacQueen, Alice},
title = {Calculating Narrow-Sense Heritability},
date = {2024-04-03},
url = {https://alice-macqueen.github.io/posts/2024-04-03-heritability/},
langid = {en}
}