annotations.Rmd
The function pvdiv_table_topsnps
creates dataframes containing annotation information for the switchgrass diversity panel. To use this function, first load the provided annotation information from wherever you have saved it (currently, this is version 5.1 of the annotation information for Panicum virgatum):
library(AnnotationDbi)
library(VariantAnnotation)
txdb <- loadDb(file = "Pvirgatum_516_v5.1.gene.txdb.sqlite")
We will use the same toy SNP dataset that we used for genome-wide association, and find annotations for the same phenotype, GWAS_CT.
library(bigsnpr)
#> Loading required package: bigstatsr
snpfile <- system.file("extdata", "example_bigsnp.rds", package = "switchgrassGWAS")
gwasfile <- system.file("extdata", "GWAS_datatable_example_GWAS_CT.rds",
package = "switchgrassGWAS")
gwas <- readRDS(gwasfile)
snp <- snp_attach(snpfile)
Then, specify the number of top SNPs you want to find annotation information for, a FDR threshold to find annotation information for, and a set of distances away from the associated SNP (in bp) for which to pull annotations. Here, we find genes 1kb and 20kb away from 1) top 10 associations and 2) associations above a FDR of 10%.
anno_tables <- pvdiv_table_topsnps(df = gwas, type = "bigsnp", n = 10,
FDRalpha = 0.1, rangevector = c(1000, 20000),
snp = snp, txdb = txdb)
Let’s look at the annotations for genes within 1000bp of SNPs above the 10% FDR cutoff. In our toy example, there are seven of these:
kable(anno_tables[[2]])
CHR | region_start | region_end | width | region_strand | Annotation | QUERYID | TXID | Gene ID | POS | SNP Effect | SNP standard error | bigsnpscore | p value | log10p | FDR Adjusted p value | gene_start | gene_end | gene_width | gene_strand | pacId | transcriptName | peptideName | Pfam | Panther Categories | KOG | ec | KO | GO | Arabidopsis thaliana homolog | A. thaliana gene name | A. thaliana gene annotation | Rice homolog | Rice gene name | Rice gene annotation | d2_start | d2_end | distance from gene |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Chr01N | 34181686 | 34182686 | 1001 | * | intergenic | 1 | NA | NA | 34182186 | 0.9888194 | 0.2562594 | 3.858665 | 0.0001261 | 3.899256 | 0.0756650 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Chr02K | 68243567 | 68244567 | 1001 | * | intergenic | 2 | NA | NA | 68244067 | -0.9049649 | 0.2237918 | -4.043781 | 0.0000593 | 4.226830 | 0.0533842 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Chr03N | 31494893 | 31495893 | 1001 | * | intergenic | 3 | NA | NA | 31495393 | 1.2009167 | 0.2758415 | 4.353647 | 0.0000157 | 4.804299 | 0.0209074 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Chr05K | 49122041 | 49123041 | 1001 | - | promoter | 4 | 66970 | Pavir.5KG551800 | 49122541 | -1.1140440 | 0.2251390 | -4.948250 | 0.0000010 | 6.013622 | 0.0034888 | 49122303 | 49122857 | 555 | - | 41544150 | Pavir.5KG551800.1 | Pavir.5KG551800.1.p | PF01167 | PTHR16517,PTHR16517:SF32 | NA | NA | NA | NA | AT1G53320.1 | AtTLP7,TLP7 | tubby like protein 7 | LOC_Os05g43850.1 | NA | OsFBT9 - F-box and tubby domain containing protein, expressed | 238 | -316 | 238 |
Chr07N | 34540063 | 34541063 | 1001 | - | promoter | 5 | 96920 | Pavir.7NG229717 | 34540563 | -0.7609417 | 0.1757385 | -4.329965 | 0.0000174 | 4.758882 | 0.0209074 | 34539192 | 34539791 | 600 | - | 41627406 | Pavir.7NG229717.1 | Pavir.7NG229717.1.p | NA | NA | NA | NA | NA | NA | NA | NA | NA | ChrUn.fgenesh.mRNA.22 | NA | expressed protein | 1371 | 772 | 772 |
Chr07N | 34540063 | 34541063 | 1001 | * | intergenic | 5 | NA | NA | 34540563 | -0.7609417 | 0.1757385 | -4.329965 | 0.0000174 | 4.758882 | 0.0209074 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Chr08K | 4378182 | 4379182 | 1001 | * | intergenic | 6 | NA | NA | 4378682 | -0.6237182 | 0.1604175 | -3.888093 | 0.0001121 | 3.950457 | 0.0756650 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
pvdiv_table_topsnps
will return a list of all of the tables you requested, named according to the criteria used to create the table. You can then save these tables individually as csv or any other type of file.
If the resultant tables are small, say, less than 2000 entries apiece, I favor saving these tables as individual sheets in an Excel file using the R package XLConnect
.