During our lab meeting today, we discussed a recently published paper on insect phylogenomics in Proc. R. Soc. B entitled "Insect phylogenomics: results, problems and the impact of matrix composition" by Letsch et al. This is one of several studies on all-insect phylogenomics (for some previous studies, see: 1, 2, 3, 4).
This study employs an extensive data set of ESTs (Expressed Sequence Tags), which are short sequences derived from reverse transcription of messenger RNA, and also a proteome data set. We did not quite figure out how the proteome data set differs from the EST one; it seems quite a bit larger (787k bp vs 445k). The study also uses a matrix optimization method to trim the data sets down to about 15% of their original size. Several analyses were done with combinations of unreduced, reduced, EST and proteome data sets. There are some interesting results. Most of the previously proposed deeper relationships among insect lineages were recovered, such as the monophyly of Polyneoptera, Paraneoptera (minus the human louse), Holometabola, and Eumetabola (Paraneoptera + Holometabola). However, some of the relationships did make us raise our eyebrows (the authors raised theirs too). These include mayflies nested within the Orthoptera (making the latter paraphyletic), stick insects sister to webspinners, and Auchenorrhyncha consistently paraphyletic.
The strangest of all is the position of the human louse. It goes from being the sister of all the remaining insects sampled to being sister to Holometabola. This is also where the authors try very hard to spin their paper. They claim/speculate that the problem of the human louse is due to matrix composition biases, and that the human louse shares more genes with the taxa it is not related to than with the taxa it should be related to (based on morphology). Well, this could be an interesting and worthwhile hypothesis. But where is the test? The authors did not even give the number of genes sampled for the human louse. Also, they did not discuss whether the human louse was the only taxon with the alleged matrix composition bias. If it is, then their hypothesis would receive somewhat more intuitive support. But if another 10 species also have biased matrix composition and were nevertheless recovered with the 'expected' relationships and positions, all we could say is that the focus on the human louse is completely expedient. A lab mate suggested a method for actually testing the hypothesis on matrix composition bias. They could have randomized their data sets and removed the genes that they think cause the problems of the human louse. They could also have randomized the selection of taxa.
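To make the "where is the test?" complaint a bit more concrete, here is a minimal sketch of a cheap first check. It is not the tree-based randomization my lab mate proposed (that would require rerunning the phylogenetic analyses), and it is not anything from the paper. It is just a simple permutation test on gene overlap, asking whether a focal taxon shares more genes with its supposed non-relatives than with its supposed relatives, beyond what a random grouping of the same taxa would give. The data layout (a dict mapping each taxon to the set of genes sampled for it) and all function and taxon names are my own assumptions for illustration.

```python
import random

def shared_genes(matrix, a, b):
    """Number of genes sampled in both taxon a and taxon b."""
    return len(matrix[a] & matrix[b])

def overlap_bias(matrix, focal, relatives, others):
    """Mean overlap with 'others' minus mean overlap with 'relatives'.

    A positive value means the focal taxon shares more genes, on average,
    with the taxa it is NOT expected to be related to.
    """
    rel = sum(shared_genes(matrix, focal, t) for t in relatives) / len(relatives)
    oth = sum(shared_genes(matrix, focal, t) for t in others) / len(others)
    return oth - rel

def permutation_test(matrix, focal, relatives, others, n_perm=1000, seed=0):
    """One-sided permutation test on the overlap bias of a focal taxon.

    The relative/other labels are shuffled among the non-focal taxa, and we
    count how often a random grouping gives a bias at least as large as the
    observed one. Returns (observed_bias, p_value).
    """
    rng = random.Random(seed)
    observed = overlap_bias(matrix, focal, relatives, others)
    pool = list(relatives) + list(others)
    k = len(relatives)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if overlap_bias(matrix, focal, pool[:k], pool[k:]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: taxon -> set of gene IDs present in the matrix.
toy = {
    "louse":  set(range(1, 11)),
    "bugA":   {1, 2},
    "bugB":   {2, 3},
    "beetle": set(range(1, 9)),
    "fly":    set(range(3, 11)),
}
bias, p = permutation_test(toy, "louse", ["bugA", "bugB"], ["beetle", "fly"])
```

Running the same function with every taxon in turn as the focal one would also address the other question above: whether the louse is the only taxon with such a bias, or just the one that happened to land in an awkward place.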
Other problems of the study include the small taxon sampling and the lack of samples from some important lineages such as stoneflies and Mantodea (and several more). These were acknowledged by the authors.
The situation of phylogenomic studies reminds me of what Sydney Brenner called "low input, high throughput, no output science". A huge amount of data has been generated to 'resolve' insect relationships, deep or shallow. Have we come to some trustworthy conclusions yet? Have we improved insect classifications? I am afraid not, or not much. The results often either corroborate existing hypotheses (i.e., no change in classification) or create wild relationships that no one would ever believe (well, not just superstitiously), and thus no one dares or is willing to change the classification.
Ok, enough grumbling there. We did have some good discussions on approaches and issues in phylogenomics and I briefly summarize them here.
Finally, I do not really intend to discredit phylogenomic approaches altogether. Any kind of data, taken as observations and evidence, should be considered and utilized. Since phylogenomics is still a relatively new player in the game, it should be given some more time to see where it is heading and what it can offer.