phiGT1 gp16 alignment and HMM models
Summary
The gene at the head of the podoviral internal virion protein module
is usually first flagged by a match to a GNAT acetyltransferase
family, and only subsequently found to be similar to P22 gene 14 or
T7 IVPA. The purpose of model building here was to understand the
relationship of the phage proteins to cellular acetyltransferases,
and to test whether their phylogenetic relationships to each other
mirror those of the tubular tail proteins encoded just upstream.
Technical
Starting from the set of sequences matched by four rounds of
PSI-BLAST against P22 gene 14, SAM was used to assemble an alignment.
SAM was then used to expand the developing family with the IVPA
homolog sets from T7, phiGT1, LUZ24, Mx8, and phiM5, creating a
global alignment that recognizes all of these phage proteins.
Representatives of each of the numerous matching PDB models of
cellular acetyltransferases were then
similarly included. Other details were as
described. A preliminary tree revealed that the cellular
acetyltransferases clustered separately from the phage proteins. The
tree was then broken down into clades and clades were scored against
each other by HMM to HMM comparison by HHpred. There were sufficient
discrepancies between the SAM interclade alignments and the HHpred
interclade alignments that I opted to realign the clades to conform
to the HHpred results. The two nexus files contain a selected set of
aligned sequences: one includes some cellular acetyltransferase
members, among them an expansion of the 5NNP model, which is a
eukaryotic family, and the other is confined to phage sequences. The
files are kept separate because, in tree analysis without a clock,
the cellular acetyltransferases diverge more slowly and hence cannot
be included in the phage gene timetree without corrupting the time
scale. They can, however, be used to show that the root of the most
divergent phage clades falls at the same time as the root of the most
divergent cellular clades. I assume this time to be the beginning of
life.
The slower divergence of the cellular lineages combined with the
general lack of quality models of the phage genes explains why the
phage genes tend to be recognized by matching to cellular families
rather than to other phage genes, even though by tree analysis phage
and cellular lineages appear to have been isolated since the
inception of the acetyl-CoA-driven acetyltransferase.
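To see which sequences distinguish the two nexus files described above, one can compare the taxon names in their MATRIX blocks. The following is a minimal sketch, not the procedure used to build the files. It assumes a simple MrBayes-style layout (a line reading `matrix`, then one `name sequence` pair per line, closed by `;`) with unquoted names that contain no spaces. The function name `read_nexus_matrix` and the miniature inputs, including every taxon name and sequence, are invented for illustration.

```python
def read_nexus_matrix(text):
    """Return {taxon: sequence} from the MATRIX block of a NEXUS string.

    Assumes unquoted taxon names without spaces; interleaved blocks are
    handled by appending chunks to the same taxon.
    """
    seqs = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            if line.lower() == "matrix":
                in_matrix = True
            continue
        if line.startswith(";"):
            break  # end of the MATRIX block
        if not line or line.startswith("["):
            continue  # skip blank lines and bracketed comments
        ends_block = line.endswith(";")
        parts = line.rstrip(";").split()
        if len(parts) >= 2:
            name, chunk = parts[0], "".join(parts[1:])
            seqs[name] = seqs.get(name, "") + chunk
        if ends_block:
            break
    return seqs


# Hypothetical miniature inputs; real files would be read from disk.
PHAGE_ONLY = """#NEXUS
begin data;
dimensions ntax=2 nchar=6;
format datatype=protein gap=-;
matrix
phage_A MKLA-G
phage_B MKLAYG
;
end;
"""

PHAGE_AND_CELL = """#NEXUS
begin data;
dimensions ntax=3 nchar=6;
format datatype=protein gap=-;
matrix
phage_A MKLA-G
phage_B MKLAYG
cell_X  MRLAYG
;
end;
"""

phage = read_nexus_matrix(PHAGE_ONLY)
both = read_nexus_matrix(PHAGE_AND_CELL)

# Taxa present only in the combined file are the cellular additions.
cellular_only = sorted(set(both) - set(phage))
print(cellular_only)
```

With the real files, the two strings would be replaced by the contents of the phage-only and phage-plus-cellular nexus files. A dedicated NEXUS parser is likely the safer choice if the files use quoted names or additional blocks.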
Files included
- GT1gp16.documentation.html - this file.
- GT1gp16.p3.a2m - SAM alignment.
- GT1gp16.p3.asc.mod - SAM-formatted HMM.
- GT1gp16.p3.hmm - HMMER3-formatted HMM.
- GT1gp16.p3.hhm - HHpred-formatted HMM.
- GT1gp16.phage.set1.mb.nex - nexus file used to calculate the
timetree in [1].
- GT1gp16.phage_and_cell.mb.nex - nexus file with cellular
acetyltransferases included.
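
For readers who want to inspect the SAM alignment listed above without SAM itself, the sketch below reads A2M-formatted text and reduces each sequence to its match columns. It relies on the usual A2M convention: uppercase letters and `-` occupy match columns, while lowercase letters and `.` mark insertions and their padding. The function names `read_a2m` and `match_columns` and the two example records are hypothetical and are not taken from GT1gp16.p3.a2m.

```python
def read_a2m(text):
    """Parse A2M/FASTA-style text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def match_columns(seq):
    """Keep only match-state columns: uppercase residues and '-' deletions."""
    return "".join(c for c in seq if c.isupper() or c == "-")


# Invented two-sequence example, not real data.
EXAMPLE = """>seq1
MKt.LA-G
>seq2
MK.vLAYG
"""

records = read_a2m(EXAMPLE)
trimmed = {name: match_columns(seq) for name, seq in records}
for name, seq in trimmed.items():
    print(name, seq)
```

After trimming, every sequence has the same length, one character per match state of the model, which makes the result convenient for column-wise comparison.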