Alice MacQueen - Calculating narrow-sense heritability

Narrow-sense heritability, or h², is a descriptive statistic calculated by geneticists and crop breeders prior to modeling the association between a trait (such as height or yield) and molecular variation present in a population (such as single nucleotide polymorphisms, or SNPs).

It varies between 0 and 1 and is a measure of the fraction of the total variation in a trait attributable to additive effects of the molecular variation present in the group of individuals where the trait was measured. If it’s zero, or as a rule of thumb, 0.2 or less, there is little additive genetic variation for that trait, and any modeling of the association between the trait and molecular variation is likely to be unsuccessful.

Here I calculate h² for biomass data in the switchgrass diversity panel dataset.

To calculate h², I need trait data for my group of interest (total biomass for genotypes in the switchgrass diversity panel data, at the Austin, Texas site, in 2019) and a kinship matrix. The kinship matrix here contains information about how the switchgrass individuals are genetically related to one another. I’ve pre-calculated the kinship matrix for these individuals using the Van Raden method (VanRaden 2008).

There are many factors that go in to making the kinship matrix, but I won’t go into the details of that here.

First, load the R packages I’ll need to compute this statistic and load the kinship matrix.

#|
library(tidyverse)
library(here)
library(rrBLUP)
library(switchgrassGWAS)

k_full <- read_rds(here("data", "Kinship_van_Raden_630_individuals_SNPs_r2_20percent.rds"))

The kinship matrix is a matrix with column and row names that are individual plant IDs (PLANT_ID in the trait dataframe).

str(k_full)

 num [1:630, 1:630] 1.3406 0.0871 0.0746 0.0404 0.1924 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:630] "J181.A" "J250.C" "J251.C" "J352.A" ...
  ..$ : chr [1:630] "J181.A" "J250.C" "J251.C" "J352.A" ...

k_full[1:5, 1:5]

           J181.A     J250.C     J251.C     J352.A     J001.A
J181.A 1.34059943 0.08712208 0.07459743 0.04039814 0.19244956
J250.C 0.08712208 1.25057146 0.30333836 0.09003609 0.14031216
J251.C 0.07459743 0.30333836 1.26267550 0.05864841 0.16804045
J352.A 0.04039814 0.09003609 0.05864841 1.35194085 0.04047479
J001.A 0.19244956 0.14031216 0.16804045 0.04047479 1.18593277

Next, load the trait data. Here I am going to use the publicly available trait data from the switchgrassGWAS R package.

trait_data <- switchgrassGWAS::pvdiv_phenotypes

To calculate narrow-sense heritability for one trait, CLMB_BIOMASS, I need to manipulate the trait dataframe so that the PLANT_ID included match the ones in the kinship matrix.

one_trait <- trait_data |>
1  filter(!is.na(CLMB_BIOMASS)) |>
2  filter(PLANT_ID %in% colnames(k_full))

1: Filter individuals to those with data in the CLMB_BIOMASS column
2: Filter individuals to those with PLANT_ID that match the names in the kinship matrix.

Then, I need to make sure that the same PLANT_ID are present in the kinship matrix and in the dataframe, in the same order. To do this, first I find the numerical values of the column names that are in the trait dataframe, then I subset the kinship matrix to just those values.

k_in_phe <- which(colnames(k_full) %in% one_trait$PLANT_ID)
k_one <- k_full[k_in_phe, k_in_phe]

Then, make sure that the PLANT_ID in the trait dataframe are in the same order as those in the kinship matrix by making the PLANT_ID from the kinship matrix into a dataframe, and joining the trait dataframe to it.

k_order <- enframe(colnames(k_one), name = NULL, value = "PLANT_ID")
one_trait <- k_order |>
  left_join(one_trait, by = "PLANT_ID")

Finally, check that all the PLANT_ID in the two dataframes are equal.

all.equal(k_order, select(one_trait, PLANT_ID))

[1] TRUE

Now, we used mixed.solve() from the rrBLUP package to calculate both the additive genetic variation (Vu) and the error variation in the trait (Ve). Added together, Vu and Ve make up the total trait variation in the population.

mod2 <- mixed.solve(y = one_trait$CLMB_BIOMASS, K = k_one)

h2 <- mod2$Vu / (mod2$Vu + mod2$Ve)

round(h2, digits = 4)

[1] 0.8923

h² for the biomass at CLMB is 0.8923. This is a very high heritability, suggesting that additive SNP-trait associations may explain much of the trait variation at this location.

References

VanRaden, P. M. 2008. “Efficient Methods to Compute Genomic Predictions.” Journal of Dairy Science 91 (11): 4414–23. https://doi.org/10.3168/jds.2007-0980.

Citation

BibTeX citation:

@online{macqueen2024,
  author = {MacQueen, Alice},
  title = {Calculating Narrow-Sense Heritability},
  date = {2024-04-03},
  url = {https://alice-macqueen.github.io/posts/2024-04-03-heritability/},
  langid = {en}
}

For attribution, please cite this work as:

MacQueen, Alice. 2024. “Calculating Narrow-Sense Heritability.” April 3, 2024. https://alice-macqueen.github.io/posts/2024-04-03-heritability/.