
Extract putative causal variants within a candidate gene from a tabix-indexed snpEff annotated VCF file.
Source: R/variant_annotation.R
Usage
extract_variant(
cand_gene_id,
gff_path,
vcf_dir,
vcf_file,
output_path = tempdir(),
outfile_suffix = "variants"
)
Arguments
- cand_gene_id
A character value specifying the candidate gene ID.
- gff_path
A character value indicating the path to the GFF file, including the complete file name.
- vcf_dir
A character value indicating the path to the directory containing the snpEff-annotated VCF files.
- vcf_file
A character value indicating the file name of the snpEff-annotated VCF file, including the .vcf.gz extension.
- output_path
A character value indicating the path to the directory where the extracted variants are saved.
- outfile_suffix
A character value indicating the file name to be used for saving extracted variants.
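The gene ID and the GFF file together determine the genomic region to search. As a rough sketch of that lookup, and not the package's actual implementation, the base R code below finds the chromosome and coordinates of a gene in GFF3 records. The helper `find_gene_region()`, the gene IDs, and the coordinates are all made up for illustration; real GFF3 attribute fields may be formatted differently.

```r
# Toy GFF3 records with made-up IDs and coordinates (illustration only)
gff_lines <- c(
  "##gff-version 3",
  "Chr01\ttoy\tgene\t1000\t5000\t.\t+\t.\tID=toyGene1;Name=toyGene1",
  "Chr01\ttoy\tmRNA\t1000\t5000\t.\t+\t.\tID=toyGene1.1;Parent=toyGene1",
  "Chr02\ttoy\tgene\t200\t900\t.\t-\t.\tID=toyGene2;Name=toyGene2"
)

# Read the nine standard GFF3 columns, skipping '#' header lines
gff <- read.delim(
  text = gff_lines, header = FALSE, comment.char = "#",
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the region of a gene identified by its ID
find_gene_region <- function(gff, gene_id) {
  genes <- gff[gff$type == "gene", ]
  # Pull the value of the ID= attribute and compare it exactly
  ids <- sub("^ID=([^;]+).*$", "\\1", genes$attributes)
  hit <- genes[ids == gene_id, ]
  if (nrow(hit) != 1) {
    stop("Expected exactly one gene record for ID: ", gene_id)
  }
  list(chrom = hit$seqid, start = hit$start, end = hit$end)
}

region <- find_gene_region(gff, "toyGene1")
print(region)
```

The exact comparison on the extracted ID avoids treating the dots in IDs such as Sobic.005G213600 as regular-expression wildcards.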
Details
This wrapper function operates on tabix-indexed, snpEff-annotated VCF files. If a tabix index is not available, the function can create one from the input VCF file.
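To illustrate what checking for a tabix index could look like outside the function, here is a minimal sketch. It assumes the Bioconductor package Rsamtools is installed and that the VCF file is already bgzip-compressed; the helper name `ensure_tabix_index()` is hypothetical, and this is not necessarily how extract_variant() builds its index internally.

```r
# Hypothetical helper: return the path to the .tbi index, creating it if needed
ensure_tabix_index <- function(vcf_path) {
  index_path <- paste0(vcf_path, ".tbi")

  # Reuse an existing index when one sits next to the VCF file
  if (file.exists(index_path)) {
    return(index_path)
  }

  # Assumption: Rsamtools is available for building the index
  if (!requireNamespace("Rsamtools", quietly = TRUE)) {
    stop("Rsamtools is needed to build a tabix index.")
  }

  # indexTabix() expects a bgzip-compressed input file
  Rsamtools::indexTabix(vcf_path, format = "vcf")
  index_path
}

# Usage (paths as in the Examples section):
# ensure_tabix_index(file.path(vcf_dir, vcf_file))
```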
The file names of the snpEff-annotated VCF files are expected to consist of three components: a common prefix, a chromosome tag, and a common suffix.
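As an illustration of the three-component naming convention, the sketch below splits the example file name used in the Examples section into a prefix, a chromosome tag, and a suffix. The regular expression assumes the chromosome tag has the form "Chr" followed by digits; that pattern is an assumption made here, and the package may delimit the components differently.

```r
vcf_name <- "Sorghum_d8.noduplicates.Chr05.indel._markernamesadded_imputed_snpeff.vcf.gz"

# Assumed pattern: everything before the tag, the tag itself ("Chr" + digits),
# and everything after the tag
m <- regexec("^(.*?)(Chr[0-9]+)(.*)$", vcf_name, perl = TRUE)
parts <- regmatches(vcf_name, m)[[1]]

components <- c(prefix = parts[2], chrom_tag = parts[3], suffix = parts[4])
print(components)
```

Under this assumption, files for other chromosomes would share the same prefix and suffix and differ only in the chromosome tag.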
Examples
# example code
# \donttest{
library(panGenomeBreedr)
# Work from the tempdir
vcf_dir <- tempdir()
# Google drive link to gff3 file
flink1 <- "https://drive.google.com/file/d/1XjYyJ2JLywbbniIU6oUIIxAmEBKfmHpz/view?usp=sharing"
# Download gff3 file to tempdir()
gff3 <- folder_download_gd(drive_link = flink1,
                           output_path = vcf_dir,
                           is.folder = FALSE)
#> ℹ Not logged in as any specific Google user.
#> File downloaded:
#> • Sbicolor_730_v5.1.gene.gff3 <id: 1XjYyJ2JLywbbniIU6oUIIxAmEBKfmHpz>
#> Saved locally as:
#> • /var/folders/n_/swy48fpx1w76xyqp3qx2prz00000gn/T//RtmprGBEmF/Sbicolor_730_v5.1.gene.gff3
# Google drive link to indel snpEff annotated vcf file on Chr05
flink2 <- "https://drive.google.com/file/d/1LiOeDsfIwbsCuHbw9rCJ1FLOZqICTrfs/view?usp=sharing"
# Download indel snpEff annotated vcf file to tempdir()
vcf_file_indel <- folder_download_gd(drive_link = flink2,
                                     output_path = vcf_dir,
                                     is.folder = FALSE)
#> ℹ Not logged in as any specific Google user.
#> File downloaded:
#> • Sorghum_d8.noduplicates.Chr05.indel._markernamesadded_imputed_snpeff.vcf.gz
#> <id: 1LiOeDsfIwbsCuHbw9rCJ1FLOZqICTrfs>
#> Saved locally as:
#> • /var/folders/n_/swy48fpx1w76xyqp3qx2prz00000gn/T//RtmprGBEmF/Sorghum_d8.noduplicates.Chr05.indel._markernamesadded_imputed_snpeff.vcf.gz
# View downloaded files in tempdir
# list.files(vcf_dir)
# InDel variant extraction for lgs1 (Sobic.005G213600)
extract_variant(cand_gene_id = 'Sobic.005G213600',
                gff_path = gff3,
                vcf_dir = vcf_dir,
                vcf_file = basename(vcf_file_indel),
                output_path = vcf_dir,
                outfile_suffix = 'lgs_variants_indel')
# Clean tempdir after variant extraction
# unlink(vcf_dir, recursive = TRUE)
# }