Project number: | 1354146
|
Project title: | Estimating the Bayesian Phylogenetic Information Content of Systematic Data |
Author: | Paul Lewis
|
Host institution: | University of Connecticut
|
Year approved: | 2013
|
Start date: | 2014-09-01
|
End date: | 2018-08-31
|
Award amount: | USD 600,000
|
Funding source: | US-NSF
|
Project type: | Standard Grant
|
Country: | US
|
Language: | English
|
Subject classification: | Biological Sciences - Environmental Biology
|
Keywords: | information content
; much information
; marginal likelihood
; bayesian statistical framework
; data set
; systematic community
; topological information
; phylogenetic information
; datum subset
; bayesian phylogenetic software
|
Abstract: | This research will develop and test new analytical methods for estimating the quality and information content of biological data sets used to determine genealogical relationships (phylogenies) among species. Phylogenies are crucial in many areas of biology, from identifying emerging pathogens to studying the mechanisms of evolutionary change and the functioning of communities of species in nature. The methods to be developed in this research project will be announced in scientific journals, and will be provided in free, open-source software enabling researchers to apply the new methods to their own data. This project will also facilitate interdisciplinary training of postdoctoral associates, graduate students and faculty in biology and statistics. High school students will be involved in developing mobile apps that will provide useful, free tools to the scientific community.
The Bayesian statistical framework is widely used in phylogenetics and molecular evolution; however, the means for estimating the information content of data remains poorly developed. One reason for this may be the fact that marginal likelihoods and topology posterior probabilities are required for estimating Kullback-Leibler (KL) divergences, and accurate means of estimating these quantities have only recently been achieved. Correspondingly, primary objectives for this research are: (1) Evaluate the utility of KL-based information content measurement to answer a diversity of questions important to systematists, including: (a) How much information about tree topology is present in a data set? (b) Can topological information be separated from information about substitution model parameters? (c) How much information does one data subset have compared to a different data subset? (d) Do two data subsets contain conflicting phylogenetic information? (e) How much information is there about particular model parameters (e.g. a divergence time of interest)? (f) How much information is there for resolving particular clades? (2) Explore issues related to information content, such as: (a) a topological information content definition of saturation; (b) a KL-based method for estimating variable-tree marginal likelihoods for purposes of model selection; and (c) polytomy analyses. (3) Implement KL estimation of information content in existing Bayesian phylogenetics software to make it freely available to the systematics community. These objectives will be pursued using the variable tree IDR method to provide accurate estimates of all marginal likelihoods needed for KL estimation. The recently published conditional clade method allows accurate posterior probabilities of tree topologies to be estimated from sample conditional clade frequencies. 
Computer simulation experiments and analyses of relevant real world data sets will be used to evaluate the effectiveness of KL in measuring information content and in providing answers to the questions posed above. |
Resource type: | Project
|
Identifier: | http://119.78.100.158/handle/2HF3EXSE/95793
|
Appears in Collections: | Impacts, Adaptation and Vulnerability; Climate Mitigation and Adaptation
|
Recommended Citation: |
Paul Lewis. Estimating the Bayesian Phylogenetic Information Content of Systematic Data. 2013-01-01.
|
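The KL-based measure of topological information described in the abstract can be illustrated with a small sketch. It is not the project's implementation: it assumes a flat prior over all unrooted, fully resolved topologies and takes the topology posterior probabilities as given, whereas the abstract proposes estimating them with the conditional clade method. The function names and the example posterior are hypothetical.

```python
import math


def num_unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted, fully resolved topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def topology_information(posterior: dict, n_taxa: int) -> tuple:
    """Return (KL divergence in nats, fraction of the maximum possible).

    Assumption: the prior is uniform over all unrooted binary topologies.
    Topologies missing from `posterior` have probability 0 and add nothing.
    """
    n_trees = num_unrooted_topologies(n_taxa)
    prior = 1.0 / n_trees
    total = sum(posterior.values())  # normalise in case of rounding
    kl = 0.0
    for p in posterior.values():
        q = p / total
        if q > 0.0:
            kl += q * math.log(q / prior)
    max_kl = math.log(n_trees)  # reached when one topology has all the mass
    fraction = kl / max_kl if max_kl > 0.0 else 0.0
    return kl, fraction


# Hypothetical posterior over the three possible topologies for four taxa.
example_posterior = {"((A,B),(C,D))": 0.90, "((A,C),(B,D))": 0.07, "((A,D),(B,C))": 0.03}
kl_nats, fraction_of_max = topology_information(example_posterior, n_taxa=4)
print(f"KL = {kl_nats:.3f} nats, {100 * fraction_of_max:.1f}% of maximum")
```

Under these assumptions, a posterior equal to the flat prior yields zero information and a posterior concentrated on a single topology yields the maximum, log of the number of topologies.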