
Homologous over-extension: a challenge for iterative similarity searches

Added by quwubin on 2010-4-12 22:13 | 2112 views | 0 comments
  •  Authors

    Gonzalez MW, Pearson WR
  •  Abstract

    We have characterized a novel type of PSI-BLAST error, homologous over-extension (HOE), using embedded Pfam domain queries on searches against a reference library containing Pfam-annotated UniProt sequences and random synthetic sequences. PSI-BLAST makes two types of errors: alignments to non-homologous regions and HOE alignments that begin in a homologous region, but extend beyond the homology into neighboring sequence regions. When the neighboring sequence region contains a non-homologous domain, PSI-BLAST can incorporate the unrelated sequence into its position specific scoring matrix, which then finds non-homologous proteins with significant expectation values. HOE accounts for the largest fraction of the initial false positive (FP) errors, and the largest fraction of FPs at iteration 5. In searches against complete protein sequences, 5-9% of alignments at iteration 5 are non-homologous. HOE frequently begins in a partial protein domain; when partial domains are removed from the library, HOE errors decrease from 16 to 3% of weighted coverage (hard queries; 35-5% for sampled queries) and no-error searches increase from 2 to 58% weighted coverage (hard; 16-78% sampled). When HOE is reduced by not extending previously found sequences, PSI-BLAST specificity improves 4-8-fold, with little loss in sensitivity.
  •  Details

    • Document type: Journal Article
    • Journal: Nucleic Acids Research
    • Journal abbreviation: Nucleic Acids Res
    • Issue/volume/pages: 2010
    • Address: Department of Biological Sciences, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 and Department of Biochemistry and Molecular Genetics, Jordan Hall Box 800733, Charlottesville, VA 22908, USA
    • ISSN: 0305-1048
    • Notes: PMID: 20064877
  • Subject area: Biomedicine » Biology

  •  Tags

  • Related links: DOI, URL

  •  quwubin's literature notes

    [Original]

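    This paper characterizes the two types of error that cause PSI-BLAST (see note) to report false positives, the second of which is newly described:

    1. Non-homologous alignments: alignments to regions that are not homologous to the query;
    2. HOE (homologous over-extension): the alignment begins in a homologous region but extends past it into the neighboring non-homologous region.
    The authors find that HOE is the main source of false positives: it accounts for the largest fraction of the initial false positives and also of those at iteration 5. The test data set consisted of: Query: Pfam domains; Database: Pfam-annotated UniProt sequences plus random synthetic sequences.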
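
    The distinction between the two error types can be illustrated with a small sketch. It assumes that the homologous domain boundaries on the subject sequence are known (for example from Pfam annotation) and that coordinates are 1-based and inclusive. The function name `classify_alignment` and the `max_overhang` tolerance are hypothetical choices made for this illustration; they are not the criteria used in the paper.

```python
def classify_alignment(aln_start, aln_end, dom_start, dom_end, max_overhang=10):
    """Classify an alignment on the subject sequence against one annotated
    homologous domain. Coordinates are 1-based and inclusive.

    max_overhang is an illustrative tolerance (in residues), not a value
    taken from the paper.
    """
    # Number of aligned residues that fall inside the homologous domain.
    overlap = min(aln_end, dom_end) - max(aln_start, dom_start) + 1
    if overlap <= 0:
        # The alignment never touches the homologous domain.
        return "non-homologous"

    # Residues aligned outside the domain, on either side.
    overhang = max(0, dom_start - aln_start) + max(0, aln_end - dom_end)
    if overhang > max_overhang:
        # Starts in homology but runs on into the neighboring region.
        return "over-extended"
    return "homologous"


# Domain annotated at residues 40-160 of the subject sequence.
print(classify_alignment(50, 150, 40, 160))   # inside the domain
print(classify_alignment(50, 260, 40, 160))   # runs 100 residues past the domain
print(classify_alignment(200, 300, 40, 160))  # no overlap with the domain
```

    In a real evaluation the tolerance would have to be tuned, since domain boundaries in annotation databases are themselves approximate.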

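    These results show that PSI-BLAST should be used with care, and that its parameters should be chosen and evaluated with even more care: test and evaluate the parameters in advance against your own data and experimental needs, so as to avoid HOE-induced false positives as far as possible.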

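    Note: PSI-BLAST (Position-Specific Iterated BLAST) is an improved method for searching protein sequence databases. Its main idea is to refine the results over multiple iterations. Concretely, the results of the first search are used to build a position-specific scoring matrix (PSSM), which is used for the second search; the results of the second search feed the third, and so on, until the search converges (no new sequences are found) or an iteration limit is reached.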
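
    The iteration scheme described in the note can be sketched as a toy program. This is not PSI-BLAST itself: it assumes ungapped, fixed-width windows, a uniform background distribution, a raw score cutoff instead of E-values, and no sequence weighting. All names (`build_pssm`, `best_window`, `iterative_search`) and the threshold value are hypothetical and chosen only for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(segments, pseudocount=1.0):
    """Log-odds PSSM from equal-length segments (uniform background assumed)."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*segments):
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def best_window(pssm, seq):
    """Return (score, window) for the best ungapped window of seq."""
    width = len(pssm)
    best = (float("-inf"), None)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(col[aa] for col, aa in zip(pssm, window))
        if score > best[0]:
            best = (score, window)
    return best


def iterative_search(query, database, threshold, max_iterations=5):
    """Search, rebuild the PSSM from the hits, and repeat until the hit set
    stops changing or max_iterations is reached."""
    pssm = build_pssm([query])
    included = {}
    for _ in range(max_iterations):
        hits = {}
        for name, seq in database.items():
            score, window = best_window(pssm, seq)
            if window is not None and score >= threshold:
                hits[name] = window
        if hits.keys() == included.keys():
            break  # converged: no change in the set of included sequences
        included = hits
        # The next round scores with a matrix built from query plus hits.
        pssm = build_pssm([query] + list(included.values()))
    return sorted(included)


database = {
    "close": "ACDEFGWW",      # 6 of 8 positions match the query
    "distant": "ACDEKKWW",    # only 4 of 8 positions match the query
    "unrelated": "MNPQRSTV",  # no positions match
}
print(iterative_search("ACDEFGHI", database, threshold=5.0, max_iterations=1))
print(iterative_search("ACDEFGHI", database, threshold=5.0))
```

    With a single iteration only `close` passes the cutoff. Once `close` has been folded into the matrix, `distant` scores above the cutoff as well. The same mechanism explains why a wrongly included region is harmful: whatever enters the matrix shapes every later round.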
