http://seqanswers.com/forums/archive/index.php/t-10182.html
To my knowledge, FreeBayes differs significantly from other variant
detection systems in common use in that it is not limited to the
analysis of haploid or diploid individuals.
The GATK can be used to call the sex chromosomes (X and Y) without
explicit knowledge of the sex of the samples. In an ideal world,
with perfect upfront data processing, we would get perfect genotypes on
the sex chromosomes without knowing which samples are diploid on X with
no Y and which are hemizygous on both. However, misalignment and
mismapping are especially problematic on these chromosomes, because
their reference sequence is clearly of lower quality than the autosomal
regions of the genome. Nevertheless, it is possible to get reasonably
good SNP calls even with simple data processing and basic filtering, and
full data processing following the GATK best practices should lead to
very good calls. You can view the presentation "The GATK Unified
Genotyper on chrX and chrY" in the GSA Public Drop Box.
What I ended up doing was running GATK's UnifiedGenotyper, manually
extracting the likelihoods of the two homozygous genotypes, and calling
a SNP if the likelihood of the alternative allele exceeded that of the
reference allele by a set margin (I believe I required the alt
likelihood to be at least three times the ref likelihood, although I
haven't tested extensively to find the best threshold).
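The thresholding rule described above can be sketched in Python as follows. This is a minimal sketch under several assumptions. The two homozygote likelihoods are taken from the PL field of a biallelic site, ordered 0/0, 0/1, 1/1. The helper names `parse_pl` and `call_haploid_snp` and the default ratio of 3 are illustrative and are not part of GATK. Because PL values are Phred-scaled, a likelihood ratio of 3 corresponds to a PL difference of 10 * log10(3), which is roughly 4.77.

```python
import math
from typing import List


def parse_pl(pl_field: str) -> List[int]:
    """Parse a VCF PL value such as '0,30,300' (order: 0/0, 0/1, 1/1)."""
    return [int(x) for x in pl_field.split(",")]


def call_haploid_snp(pl_hom_ref: int, pl_hom_alt: int, min_ratio: float = 3.0) -> bool:
    """Hypothetical helper: call a SNP when L(1/1) >= min_ratio * L(0/0).

    Only the two homozygote likelihoods are compared; the heterozygote is
    ignored because a hemizygous (haploid) site cannot be heterozygous.
    Since L = 10^(-PL/10), the ratio L(1/1) / L(0/0) equals
    10^((PL(0/0) - PL(1/1)) / 10), so the test is done on the PL difference,
    which avoids floating-point underflow for large PL values.
    """
    return pl_hom_ref - pl_hom_alt >= 10 * math.log10(min_ratio)


if __name__ == "__main__":
    # Hypothetical PL fields for four sites.
    for field in ["0,30,300", "60,20,0", "3,0,2", "5,9,0"]:
        pl = parse_pl(field)
        print(field, call_haploid_snp(pl[0], pl[2]))
```

With the default ratio, this calls a SNP for `60,20,0` and `5,9,0` but not for `0,30,300` or `3,0,2`. Comparing PL differences, rather than converting back to linear likelihoods, keeps the test well defined even when PLs run into the thousands.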
I have used FreeBayes on haploid sequences with good results and recommend it.
http://www.broadinstitute.org/gsa/wiki/index.php/Understanding_the_Unified_Genotyper%27s_VCF_files
git clone --recursive https://github.com/ekg/freebayes.git
Field: Meaning
GT: The genotype of this sample. For a diploid, the GT field
indicates the two alleles carried by the sample, encoded by a 0 for the
REF allele, 1 for the first ALT allele, 2 for the second ALT allele,
etc. When there is a single ALT allele (by far the more common case), GT
will be one of:
- 0/0 - the sample is homozygous reference
- 0/1 - the sample is heterozygous, carrying one copy each of the REF and ALT alleles
- 1/1 - the sample is homozygous alternate
In the three examples above, NA12878 is T/G, G/G, and C/T.
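The following small Python sketch makes the GT encoding concrete. It splits a GT value into allele indices and maps them back to bases. The function names and the REF/ALT values in the usage example are hypothetical. The sketch treats the unphased (`/`) and phased (`|`) separators the same way and does not preserve phasing.

```python
import re
from typing import List, Optional, Tuple


def decode_gt(gt: str) -> Tuple[Optional[int], ...]:
    """Split a GT value such as '0/1' or '1|1' into allele indices.

    0 is the REF allele, 1 the first ALT allele, 2 the second ALT allele,
    and so on. A missing allele ('.') is returned as None.
    """
    return tuple(None if a == "." else int(a) for a in re.split(r"[/|]", gt))


def classify_diploid(gt: str) -> str:
    """Label a diploid genotype as hom-ref, het, hom-alt, or no call."""
    alleles = decode_gt(gt)
    if None in alleles:
        return "no call"
    if alleles[0] != alleles[1]:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == 0 else "homozygous alternate"


def gt_to_bases(gt: str, ref: str, alts: List[str]) -> str:
    """Translate allele indices into bases, always joined with '/'."""
    bases = [ref] + alts  # index 0 is REF, index i >= 1 is the i-th ALT
    return "/".join("." if a is None else bases[a] for a in decode_gt(gt))


if __name__ == "__main__":
    # Hypothetical site with REF=T and a single ALT=G.
    for gt in ["0/0", "0/1", "1/1"]:
        print(gt, classify_diploid(gt), gt_to_bases(gt, "T", ["G"]))
```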
GQ: The Genotype Quality, a Phred-scaled confidence that the true
genotype is the one provided in GT. In the diploid case, if GT is 0/1,
then GQ = -10 * log10(1 - P), where P = L(0/1) / (L(0/0) + L(0/1) + L(1/1))
and L is the likelihood of the NGS sequencing data under the model that
the sample is 0/0, 0/1, or 1/1.
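The GQ definition for the 0/1 case can be sketched in Python. This sketch reads "Phred-scaled confidence" as -10 * log10(1 - P). Here P is the likelihood of the called genotype divided by the sum of all genotype likelihoods, which amounts to assuming flat priors. It is an illustrative sketch that starts from PL values. The function name is hypothetical, and GATK's reported GQ may differ in rounding and capping.

```python
import math
from typing import Sequence


def gq_from_pl(pls: Sequence[int]) -> float:
    """Phred-scaled confidence that the most likely genotype (lowest PL) is correct.

    Assumes flat genotype priors, so P(called genotype is correct) is its
    likelihood divided by the sum of all genotype likelihoods. Returns
    math.inf if every other genotype's likelihood underflows to zero.
    """
    best_idx = min(range(len(pls)), key=lambda i: pls[i])
    best_pl = pls[best_idx]
    # Likelihoods of the other genotypes relative to the best one (which is 1.0).
    others = sum(10 ** (-(pl - best_pl) / 10) for i, pl in enumerate(pls) if i != best_idx)
    if others == 0:
        return math.inf
    p_wrong = others / (1.0 + others)  # equals 1 - L(best) / sum(L)
    return -10 * math.log10(p_wrong)
```

For PL = 0,30,300 this returns about 30. The runner-up genotype is 1000 times less likely, so the call is wrong with a probability of roughly 1/1001. A tie such as 0,0,300 returns only about 3.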
AD and DP: See the online documentation for AD and DP.
PL: We provide the AD and DP fields because they are usually what
downstream users want. However, truly sophisticated users will want to
use the likelihoods of the three genotypes 0/0, 0/1, and 1/1 provided
in the PL field directly. These are normalized, Phred-scaled likelihoods
for each of 0/0, 0/1, and 1/1, without priors. To be concrete, for the
het case this is L(data | the true genotype is 0/1). The most likely
genotype (the one in GT) is scaled so that its P = 1.0 (0 when
Phred-scaled), and the other likelihoods are Phred-scaled relative to
this most likely genotype. PL is currently only provided when the site
is biallelic.
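The PL normalization can be sketched in Python as follows. The sketch assumes the raw per-genotype likelihoods are available as log10 values, the scale used by the VCF GL field. Working in log space avoids underflow for very unlikely genotypes. The function names and the input numbers are hypothetical, and this is not GATK's own implementation. Values are rounded with Python's built-in `round`, which rounds exact halves to the nearest even integer.

```python
from typing import List, Sequence


def pls_from_log10_likelihoods(log10_liks: Sequence[float]) -> List[int]:
    """Normalized, Phred-scaled likelihoods in 0/0, 0/1, 1/1 order.

    Input is log10 L(data | genotype) for each genotype, with no priors.
    Each value is scaled relative to the most likely genotype, so that
    genotype gets PL 0 (P = 1.0) and the others get -10 * log10(L / L_max).
    """
    best = max(log10_liks)
    return [round(-10 * (ll - best)) for ll in log10_liks]


def called_genotype(pls: Sequence[int], labels: Sequence[str] = ("0/0", "0/1", "1/1")) -> str:
    """Return the first genotype whose PL is 0, i.e. the one reported in GT."""
    return labels[pls.index(0)]


if __name__ == "__main__":
    # Hypothetical log10 likelihoods for 0/0, 0/1, 1/1 at a biallelic site.
    pls = pls_from_log10_likelihoods([-12.0, -3.0, -0.5])
    print(pls, called_genotype(pls))  # [115, 25, 0] 1/1
```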