Tuesday, February 06, 2007

Timeless questions in the study of population genetics

Those of us using genetic data to infer population structure and/or history, or phylogenetic or phylogeographic patterns, always end up asking the same questions:

  1. How much sequence data do I need?
  2. Which loci, and how many loci?
  3. mtDNA versus nuclear DNA?

Anyway, in studies aiming for resolution of fine-scale patterns, a combination approach is usually taken. Let’s say 1kb of mtDNA plus nuclear intron data… Microsatellites could be used to determine super-fine-scale population structure. Good scientists use power analyses to figure out how much data is needed; others keep adding loci until the $$$ runs short. Like everything else, data are subject to the law of diminishing returns, so while you could sequence a complete mt genome, much less will probably get you what you need (not to mention the probabilistic notion that, ceteris paribus, smaller datasets will contain fewer errors)…

A forthcoming paper in Molecular Phylogenetics and Evolution (Non et al., in press) answers exactly this question: how much mtDNA sequence data will give you maximum resolution without going overboard and wasting money? They were explicitly looking at TMRCA (time to the most recent common ancestor) and phylogenetic reconstruction.

Their results: forget the whole mt genome sequence; you get the same resolution from ½ of that. Which half? Bases 11399 to 3668, wrapping around the numbering origin of the circular genome, which includes ND1, 16S, 12S, D-loop, cytB, ND5 and ND6… (Re: the table, ratios/percents are polymorphic sites.)

Now 8kb of sequence is still a shit-load, but a lot less than 16kb, the size of the whole mt genome…
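
For anyone who wants to check the "about half" arithmetic, here is a quick sketch in Python. It assumes human mtDNA numbered against the revised Cambridge Reference Sequence (16,569 bp); the post doesn't state a reference length, so that number and the helper name circular_span are my assumptions, not anything taken from the paper.

```python
# Assumption: human mtDNA with rCRS numbering, a 16,569 bp circular genome.
GENOME_LENGTH = 16569


def circular_span(start, end, genome_length=GENOME_LENGTH):
    """Count bases from start to end (1-based, inclusive) on a circular genome."""
    if start <= end:
        # Ordinary region that does not cross the numbering origin.
        return end - start + 1
    # Region runs past the last base and continues from position 1.
    return (genome_length - start + 1) + end


span = circular_span(11399, 3668)
fraction = span / GENOME_LENGTH
print(f"{span} bp, {fraction:.1%} of the genome")
```

Under that assumption the region comes out a little over half of the genome, which fits the rough 8kb versus 16kb comparison.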

I’m about to see how the resolution of this study compares to other studies using mt+nuclear data… Stay tuned…
