`
wanguan2000
  • 浏览: 67083 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论

haploid genome

    博客分类:
  • seq
 
阅读更多

http://seqanswers.com/forums/archive/index.php/t-10182.html

To my knowledge, FreeBayes is significantly different than other variant detection systems in common use in that it is not limited to the analysis of haploid or diploid individuals

The GATK can be used to call the sex (X and Y) chromosomes, without explicit knowledge of the gender of the samples. In an ideal world, with perfect upfront data processing, we would get perfect genotypes on the sex chromosomes without knowledge of who is diploid on X and has no Y, and who is hemizygous on both. However, misalignment and mismapping contributes especially to these chromosomes, as their reference sequence is clearly of lower quality than the autosomal regions of the genome. Nevertheless, it is possible to get reasonably good SNP calls, even with simple data processing and basic filtering. Results with proper, full data processing as per the best practices in the GATK should lead to very good calls. You can view a presentation "The GATK Unified Genotyper on chrX and chrY" in GSA Public Drop Box

 

What I ended up doing was using GATK's UnifiedGenotyper, manually extracting the likelihoods for both of the homozygote genotypes, and calling a SNP if the likelihood of the alternative allele was above a certain amount higher than the likelihood of the reference allele (I believe I required the likelihood of the alt allele to be at least 3X greater than the ref allele, although I haven't tested extensively to find the best threshold).

I have used FreeBayes on haploid sequences with good results; it is recommended.

http://www.broadinstitute.org/gsa/wiki/index.php/Understanding_the_Unified_Genotyper%27s_VCF_files

 

git clone --recursive git://github.com/ekg/freebayes.git

 

 

 

Field Meaning GT GQ AD and DP PL
The genotype of this sample. For a diploid, the GT field indicates the two alleles carried by the sample, encoded by a 0 for the REF allele, 1 for the first ALT allele, 2 for the second ALT allele, etc. When there's a single ALT allele (the by far more common case), GT will be either:
  • 0/0 - the sample is homozygous reference
  • 0/1 - the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
  • 1/1 - the sample is homozygous alternate

In the three examples above, NA12878 is T/G, G/G, and C/T.

The Genotype Quality, as a Phred-scaled confidence at the true genotype is the one provided in GT. In diploid case, if GT is 0/1, then GQ is really L(0/1) / (L(0/0) + L(0/1) + L(1/1)), where L is the likelihood of the NGS sequencing data under the model of that the sample is 0/0, 0/1/, or 1/1.
See the online documentation for AD and DP .
We provide the AD and DP fields since this is usually what downstream users want. However, the truly sophisticated users will want to directly use the likelihoods of the three genotypes 0/0, 0/1, and 1/1 provide in the PL field. These are normalized, Phred-scaled likelihoods for each of the 0/0, 0/1, and 1/1, without priors. To be concrete, for the het case, this is L(data given that the true genotype is 0/1). The most likely genotype (the one in GT) is scaled so that it's P = 1.0 (0 when Phred-scaled), and the other likelihoods reflect their Phred-scaled likelihoods relative to this most likely genotype. Currently only provided when the site is biallelic.
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics