Identifying gene-disease associations using centrality on a literature mined gene-interaction network

Added by lyshaerbin on 2010-4-22 21:01 | 1804 views | 0 comments

Authors
Ozgur A, Vu T, Erkan G, Radev DR
Abstract
Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network.
Results: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study.
Availability: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org
Details
- Document type: Journal
- Journal name: Bioinformatics
- Journal abbreviation: Bioinformatics
- Year, volume, issue, pages: 2008, vol. 24, no. 13, pp. i277-i285
- ISSN: 1367-4803
Groups

bioinformatics
Tags

disease gene prediction

Literature notes by lyshaerbin

[Original]

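This paper predicts disease genes by combining text mining with network analysis. Known disease genes are first collected as seed genes, and a gene-interaction network is then built by text mining: sentences in which two genes co-occur together with an interaction word are selected from full-text articles, and dependency parsing and an SVM are used to extract the interactions. The method considers not only interactions among seed genes and between seed genes and their neighbors, but also interactions among non-seed genes. Earlier methods ignored interactions among non-seed genes, which biased them toward the seed genes. Once the network is built, the genes are ranked by four topological measures and the top 20 are taken as candidate disease genes. The four measures are degree, eigenvector, closeness, and betweenness centrality. Earlier methods treated every gene as contributing equally; this paper uses eigenvector centrality to account for the differing contributions of genes to the disease. This is a new approach, and its main idea comes from the prestige effect in social networks. Ranking the genes with the four measures shows that degree and eigenvector centrality perform better, while betweenness and closeness predict more genes that remain as candidates for later experimental analysis. The disease analyzed is prostate cancer, and the database used for validation is PGDB; if a new prediction is not in this database, the related literature is searched to confirm it.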
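
The ranking step described in this note can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy network, the node names `A` to `F` (placeholders for gene symbols), and the helper names are all assumptions made for illustration, and the text-mining step that would produce the edges is not shown. Closeness here assumes a connected network, and betweenness is left unnormalized.

```python
from collections import deque

# Hypothetical toy network; nodes stand in for genes, edges for mined interactions.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]

def build_graph(edges):
    """Undirected adjacency sets built from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degree_centrality(graph):
    """Fraction of the other nodes each node is connected to (needs >= 2 nodes)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def eigenvector_centrality(graph, iterations=200):
    """Power iteration on A + I, which has the same leading eigenvector as A."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in graph[v]) for v in graph}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return score

def bfs(graph, source):
    """BFS visit order, distances, shortest-path counts, and predecessors."""
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds

def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for v in graph:
        dist = bfs(graph, v)[1]
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' dependency accumulation."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, _, sigma, preds = bfs(graph, s)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice in an undirected graph.
    return {v: x / 2 for v, x in score.items()}

def top_k(scores, k=20):
    """Node names sorted by descending score, ties broken by name."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]

graph = build_graph(EDGES)
rankings = {
    "degree": degree_centrality(graph),
    "eigenvector": eigenvector_centrality(graph),
    "closeness": closeness_centrality(graph),
    "betweenness": betweenness_centrality(graph),
}
for name, scores in rankings.items():
    print(name, top_k(scores, k=3))
```

On a real disease-specific network, `top_k(scores, k=20)` would correspond to the top-20 candidate list mentioned above, computed once per measure.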

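Takeaway: Earlier disease-gene prioritization methods treat all seed genes as contributing equally to the disease. Could the eigenvector centrality proposed in this paper be used to weight the seed genes, and the candidate disease genes then be prioritized with those weights?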
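
One possible reading of this question, sketched under strong assumptions: score each non-seed gene by the summed eigenvector centrality of the seed genes it interacts with, and compare that with counting every seed equally. The scoring rule, the toy network, and the names `S1` to `S3` (seeds) and `C1`, `C2` (candidates) are hypothetical and do not come from the paper.

```python
# Hypothetical toy network: S* are seed genes, C* are candidate genes.
EDGES = [("S1", "S2"), ("S1", "S3"), ("S1", "C1"), ("S2", "C2"), ("S3", "C2")]
SEEDS = {"S1", "S2", "S3"}

def build_graph(edges):
    """Undirected adjacency sets built from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def eigenvector_centrality(graph, iterations=200):
    """Power iteration on A + I, which has the same leading eigenvector as A."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in graph[v]) for v in graph}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return score

def seed_weighted_scores(graph, seeds, weights):
    """Score each non-seed gene by the summed weight of its seed neighbors."""
    return {
        gene: sum(weights[s] for s in graph[gene] if s in seeds)
        for gene in graph
        if gene not in seeds
    }

graph = build_graph(EDGES)
centrality = eigenvector_centrality(graph)

# Baseline: every seed contributes equally.
equal_scores = seed_weighted_scores(graph, SEEDS, {s: 1.0 for s in SEEDS})
# Variant: seeds contribute in proportion to their eigenvector centrality.
weighted_scores = seed_weighted_scores(graph, SEEDS, centrality)

print("equal weights:     ", equal_scores)
print("centrality weights:", weighted_scores)
```

Whether such weighting actually improves prioritization would have to be checked against a validation set such as the PGDB database mentioned above.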
