phiGT1 gp7 scaffold alignment and HMMs

Summary

The scaffold family build was intended to define the phylogenetic neighborhood of the phiGT1 scaffold protein with a timetree, to test whether it has descended coherently with the other phiGT1 structural genes, and to ascertain whether it shares ancient common ancestry with the P22 scaffold protein.

Technical.

The hits from a four-round PSI-BLAST search of nr plus envnr were aligned and filtered to retain sequences matching significantly over at least 200 characters. Most of the shorter matches removed were marine metagenome sequences, which are often partial. phiGT1 gp7 is 312 residues long, of which 296 were considered to be in significant alignment. Numerous named phages were in the alignment. These include: EPV2, HIM624-A, VPp1, vB_ValP_IME271, phiM5, UFV-P2, S-CBP2, NV1, tf, 1.262.O._10N.286.51.A9, Bjorn, SL4, PaP4, LUZ24, phiIBB-PAA2=TL, phiIBB-PAA2, TL, vB_PaeP_C2-10_Ab22, DL54, vB_PaeP_C1-14_Or, vB_PaeP_p2-10_Or1, MR299-2, PaP3, vB_CsaP_Ss1, PhiCHU, NJ01=172-1, NJ01, 172-1, myPSH2311, phiEco32, ECBP2, LAMP, vB_EcoP_SU10, SE131, 7-11, KBNP1711, 1, vB_CsaP_GAP52, pCB2047-C=pCB2047-A=NYA-2014a, pCB2047-C, pCB2047-A, NYA-2014a, KSP100, and Vp_R1. In the examples examined, the gene was upstream of the major capsid protein gene. Many of these genes have been annotated as scaffold, presumably because of this relationship. The deepest common ancestor of the collection explored thus far is estimated at about 1.5 Gya. These sequences do not match any Pfam scaffold family at InterPro, and they match the P22 scaffold model at the HHpred server with an insignificant score. However, a one-to-one HMM-to-HMM comparison using hhsearch matches the GT1 family HMM to a comparable P22 family HMM at E = 1.9 x 10^-6. Because E values are proportional to the number of models screened, this would correspond to E ~= 0.01 in a full HMM-to-HMM search of Pfam, if both the query and the target model were expanded to the same extent. In combination with the genes being of similar size, in syntenic position, and of similar secondary structure, I consider this to confirm a common ancestor. The common ancestor with P22, based on the surrounding structural genes, is expected to be > 2.5 Gya.
Because the residue posterior scores indicate only a couple of dozen residues in confident alignment, I have not yet attempted to fuse the GT1 and P22 family alignments. Other technical details are as described.
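
The E value scaling quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the simple linear rule stated in the text (E for a full search = one-to-one E x number of models screened). The number of Pfam models is not given in this document, so the code derives the count implied by the two quoted E values rather than asserting a database size. The function names are illustrative and are not part of any HMM package.

```python
def scale_evalue(e_single, n_models):
    """Scale a one-to-one E value to a screen of n_models models.

    Assumes E is directly proportional to the number of models
    screened, as stated in the text.
    """
    return e_single * n_models


def implied_models(e_single, e_full):
    """Number of models implied by a one-to-one E and a full-search E."""
    return e_full / e_single


# Values quoted in the Technical section.
E_PAIR = 1.9e-6   # hhsearch, GT1 family HMM vs. P22 family HMM
E_FULL = 0.01     # approximate E quoted for a full Pfam-scale search

if __name__ == "__main__":
    n = round(implied_models(E_PAIR, E_FULL))
    print("implied number of models:", n)
    print("rescaled E value:", scale_evalue(E_PAIR, n))
```

The two quoted values imply a screen of roughly five thousand models. A larger model collection would raise the full-search E value proportionally.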

Files included

GT1gp7.scaffold.documentation.html  -  this file.
GT1gp7.a3.over200.a2m  -  The SAM alignment
GT1gp7.a3.over200.asc.mod  -  SAM HMM
GT1gp7.a3.over200.hmm  -  hmmer3 HMM
GT1gp7.a3.over200.hhm  -  HHpred HMM
GT1gp7.PP54.set2.mb.nex  -  Nexus file with the alignment used for the timetree in [1]
GT1gp7.P22.hhr  -  GT1 to P22 HHpred comparison
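
The "over200" in the file names refers to the 200-character filter described under Technical. The sketch below shows one way such a filter might be applied to an a2m alignment. It assumes the usual a2m convention that uppercase letters are residues in match columns while lowercase letters and "." are insertions, and it uses the count of uppercase residues as a stand-in for "significantly matching" characters. The actual significance criterion used for this build is not specified here, and the function names are hypothetical.

```python
def read_a2m(text):
    """Parse FASTA-style a2m text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_residues(seq):
    """Count residues in match columns (uppercase letters in a2m)."""
    return sum(1 for c in seq if c.isupper())


def filter_by_match_length(records, min_match=200):
    """Keep records with at least min_match residues in match columns."""
    return [(h, s) for h, s in records if match_residues(s) >= min_match]


if __name__ == "__main__":
    # Toy alignment; a real run would read GT1gp7.a3.over200.a2m instead.
    demo = ">a\nACDE-f.G\n>b\nA---cd..\n"
    kept = filter_by_match_length(read_a2m(demo), min_match=3)
    print([h for h, _ in kept])
```

Counting match-column residues rather than raw sequence length keeps long sequences that align over only a short stretch from passing the filter.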

Relationship to other scaffold proteins.

There is some structural information on the P22 scaffold protein (Sun et al., 2000; PDB 2GP8). The scaffold protein is present in copious amounts inside the prohead and aids in formation of the capsid structure. It leaves the capsid during maturation. Many, but not all, tailed phages have a scaffold protein of this description. Sometimes it is attached to another head protein as a propeptide that is cleaved away by a prohead protease, as in HK97. To my knowledge, it has not been established whether all scaffold proteins share a common ancestor or act by the same mechanism. The P22 scaffold, like phiGT1 gp7, is about 300 residues. The HK97 scaffold is only about 100 residues. The bacteriophage lambda scaffold, nu3, is 201 residues. Some works apply the term "scaffold" to less analogous functions, such as head decorations that remain on the exterior of the mature capsid and stabilize the structure.

Citations

Sun Y, Parker MH, Weigele P, Casjens S, Prevelige PE Jr., Krishna NR. 2000. Structure of the coat protein-binding domain of the scaffolding protein from a double-stranded DNA virus. J Mol Biol 297:1195-1202.