phiGT1 gp16 alignment and HMM models

Summary

The gene at the head of the podoviral internal virion protein module is usually first flagged by a match to the GNAT acetyltransferase family, and only subsequently found to be similar to P22 gene 14 or T7 IVPA. The purpose of the model building here was to understand how the phage proteins relate to cellular acetyltransferases, and to test whether their phylogenetic relationships to one another mirror those of the tubular tail proteins encoded just upstream.

Technical

Starting from the set of sequences matched to P22 gene 14 after four rounds of PSI-BLAST, SAM was used to assemble an alignment. SAM was then used to expand the developing family with homolog sets for the IVPA proteins of T7, phiGT1, LUZ24, Mx8 and phiM5, producing a global alignment that recognizes all of these phage proteins. Representatives of each of the numerous matching PDB models of cellular acetyltransferases were then included in the same way. Other details were as described.

A preliminary tree showed that the cellular acetyltransferases cluster separately from the phage proteins. The tree was therefore broken down into clades, and the clades were scored against each other by HMM-HMM comparison with HHpred. The SAM interclade alignments and the HHpred interclade alignments disagreed often enough that I chose to realign the clades to conform to the HHpred results.

The two nexus files contain a selected set of aligned sequences. One includes some cellular acetyltransferase members, among them an expansion of the 5NNP model, which represents a eukaryotic family; the other is confined to phage sequences. The split is needed because, in tree analysis without a clock, the cellular acetyltransferases diverge more slowly, so they cannot be included in the phage gene timetree without corrupting its time scale. They can, however, be used to show that the root of the most divergent phage clades coincides in time with the root of the most divergent cellular clades. I assume this time to be the beginning of life.

The slower divergence of the cellular lineages, combined with the general lack of good-quality models of the phage genes, explains why the phage genes tend to be recognized by matches to cellular families rather than to other phage genes. This holds even though tree analysis indicates that the phage and cellular lineages have been isolated since the inception of the acetyl-CoA-driven acetyltransferase.
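The decision to realign the clades to HHpred rather than SAM rests on how far the two interclade alignments disagree. One simple way to put a number on that disagreement is the fraction of residue pairs aligned by one method that the other method also aligns. This is sketched below as an assumption and is not the procedure actually used for these files. The sketch applies the measure to a single pair of representative rows. The function names and toy sequences are hypothetical, and gaps are assumed to be written as '-' or '.', as in a2m-style rows.

```python
from __future__ import annotations

GAP_CHARS = "-."  # assumption: gaps written as '-' or '.', as in a2m-style rows


def ungapped(row: str) -> str:
    """Strip gap characters and case so rows from different aligners can be compared."""
    return "".join(c for c in row if c not in GAP_CHARS).upper()


def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return (i, j) residue-index pairs that share a column in a pairwise alignment.

    Indices count residues, not columns, starting at 0.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs: set[tuple[int, int]] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        a_is_res = a not in GAP_CHARS
        b_is_res = b not in GAP_CHARS
        if a_is_res and b_is_res:
            pairs.add((i, j))
        if a_is_res:
            i += 1
        if b_is_res:
            j += 1
    return pairs


def agreement(ref: tuple[str, str], test: tuple[str, str]) -> float:
    """Fraction of residue pairs aligned in `ref` that `test` also aligns."""
    for r, t in zip(ref, test):
        if ungapped(r) != ungapped(t):
            raise ValueError("both alignments must contain the same two sequences")
    ref_pairs = aligned_pairs(*ref)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)


if __name__ == "__main__":
    # Hypothetical toy rows: the two methods place the indel differently.
    sam_like = ("MKV-LAG", "MKVELAG")
    hhpred_like = ("MKVL-AG", "MKVELAG")
    print(f"{agreement(hhpred_like, sam_like):.3f}")  # prints 0.833
```

The two toy alignments differ only in where the single-residue indel falls, so five of the six aligned pairs are shared. A low score across many clade pairs would be one way to express "sufficient discrepancies". The threshold for acting on it remains a judgment call.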
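The claim that slowly diverging cellular sequences would corrupt the phage timetree can be made concrete with strict-clock arithmetic, in which a node's age is its root-to-tip distance divided by the substitution rate. The sketch below makes two simplifying assumptions. First, timetree methods normally allow rates to vary. Second, the depths are invented for illustration and were not measured from the nexus files.

Suppose the phage and cellular roots are the same age but the cellular lineages have accumulated less change. A single rate calibrated on the phage branches then places the cellular root too recently. Read the other way, assuming equal root ages turns the ratio of depths into a relative rate.

```python
def strict_clock_age(root_to_tip: float, rate: float) -> float:
    """Node age under a strict clock: distance (subs/site) / rate (subs/site per time unit)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return root_to_tip / rate


def rate_ratio(depth_slow: float, depth_fast: float) -> float:
    """If two clades have roots of the same age, their rates scale like their depths."""
    if depth_fast <= 0:
        raise ValueError("depth_fast must be positive")
    return depth_slow / depth_fast


if __name__ == "__main__":
    # Hypothetical root-to-tip depths in substitutions/site; NOT measured from these files.
    phage_depth = 3.0
    cellular_depth = 1.2
    root_age = 1.0  # arbitrary units; the text assumes this is the beginning of life

    phage_rate = phage_depth / root_age
    # Dating the cellular root with the phage-calibrated rate makes it look younger.
    apparent = strict_clock_age(cellular_depth, phage_rate)
    print(f"apparent cellular root age: {apparent:.2f} (true: {root_age:.2f})")
    print(f"cellular/phage rate ratio: {rate_ratio(cellular_depth, phage_depth):.2f}")
```

With these made-up numbers, the cellular root appears at 0.40 of its true age. This is the kind of distortion that keeping the cellular sequences out of the phage timetree avoids.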

Files included