phiGT1 gp7 scaffold alignment and HMMs
Summary
The scaffold family build had two purposes: to define the
phylogenetic neighborhood of the phiGT1 scaffold with a timetree, in
order to test whether it descended coherently with the other phiGT1
structural genes, and to determine whether it shares ancient common
ancestry with the P22 scaffold protein.
Technical.
The hits from a four-round psiblast search of nr plus envnr were
aligned and filtered to keep sequences with a significant match over
at least 200 characters. Most of the shorter matches removed were
marine metagenome sequences, which are often partial. PhiGT1gp7 is
312 residues long, of which 296 were considered to be in significant
alignment. Numerous named phages were in the alignment. These
include: EPV2, HIM624-A, VPp1, vB_ValP_IME271, phiM5, UFV-P2,
S-CBP2, NV1, tf, 1.262.O._10N.286.51.A9, Bjorn, SL4, PaP4, LUZ24,
phiIBB-PAA2=TL, phiIBB-PAA2, TL, vB_PaeP_C2-10_Ab22, DL54,
vB_PaeP_C1-14_Or, vB_PaeP_p2-10_Or1, MR299-2, PaP3, vB_CsaP_Ss1,
PhiCHU, NJ01=172-1, NJ01, 172-1, myPSH2311, phiEco32, ECBP2, LAMP,
vB_EcoP_SU10, SE131, 7-11, KBNP1711, 1, vB_CsaP_GAP52,
pCB2047-C=pCB2047-A=NYA-2014a, pCB2047-C, pCB2047-A, NYA-2014a,
KSP100, and Vp_R1. In the examples examined, the gene lies upstream
of the major capsid protein gene. Many of these have been annotated
as scaffold, presumably because of this arrangement. The deepest
common ancestor of the collection thus far explored is estimated to
be about 1.5 Gya. These sequences do not match any Pfam scaffold
family at InterPro, and they match the P22 scaffold model at the
HHpred server only with an insignificant score. However, a
one-to-one HMM-to-HMM comparison using hhsearch matches the GT1
family HMM to a comparable P22 family HMM at E = 1.9 x 10^-6. Because
E values are proportional to the number of models screened, this
would correspond to E ~= 0.01 in a full HMM-to-HMM search of Pfam, if
both the query and the target model were expanded to the same extent.
In combination with the genes being of similar size, in syntenic
position, and of similar secondary structure, I consider this to
confirm a common ancestor. Based on the surrounding structural genes,
the common ancestor with P22 is expected to be > 2.5 Gya. Because the
residue posterior scores indicate only a couple of dozen residues in
confident alignment, I have not yet attempted to fuse the GT1 and P22
family alignments. Other technical details are as described.
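The E value scaling argument above can be checked with a little
arithmetic. The sketch below is only an illustration: it assumes the
E value grows linearly with the number of models screened, and the
function names and the library sizes in the loop are hypothetical
(they are not actual Pfam counts). Under that assumption, going from
the one-to-one E = 1.9 x 10^-6 to E ~= 0.01 implies a library on the
order of five thousand models.

```python
def scaled_evalue(pairwise_evalue: float, n_models: int) -> float:
    """Scale a one-to-one E value to a search against n_models models.

    Assumes the E value is proportional to the number of models screened.
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    return pairwise_evalue * n_models


def implied_model_count(pairwise_evalue: float, target_evalue: float) -> float:
    """Number of models at which the scaled E value reaches target_evalue."""
    return target_evalue / pairwise_evalue


PAIRWISE_E = 1.9e-6  # GT1 family HMM vs P22 family HMM, taken from the text

# Hypothetical library sizes, for illustration only (not actual Pfam counts).
for n in (1, 1_000, 5_000, 20_000):
    print(f"{n:>6} models -> E = {scaled_evalue(PAIRWISE_E, n):.2g}")

implied = implied_model_count(PAIRWISE_E, 0.01)
print(f"E ~= 0.01 implies about {implied:.0f} models")
```

The same two functions can be reused with any other pairwise E value
to see how sensitive the conclusion is to the assumed library size.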
Files included
GT1gp7.scaffold.documentation.html - this file.
GT1gp7.a3.over200.a2m - The SAM alignment
GT1gp7.a3.over200.asc.mod - SAM HMM
GT1gp7.a3.over200.hmm - hmmer3 HMM
GT1gp7.a3.over200.hhm - HHpred HMM
GT1gp7.PP54.set2.mb.nex - nexus file with alignment used for timetree in [1]
GT1gp7.P22.hhr - GT1 to P22 HHpred comparison
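
The "over200" filter described in the Technical section can be
sketched as follows. This is not the script actually used to build
the files above; it is a minimal illustration that assumes the A2M
convention in which uppercase letters are residues in match columns,
lowercase letters are insertions, and "-" marks a deletion, and it
assumes that "matching for at least 200 characters" means at least
200 match-column residues. The function names and the toy alignment
are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]


def read_a2m(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-style A2M text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def aligned_length(seq: str) -> int:
    """Count residues in match columns (uppercase letters in A2M)."""
    return sum(1 for ch in seq if ch.isupper())


def filter_by_aligned_length(records: Iterable[Record],
                             min_len: int = 200) -> List[Record]:
    """Keep records with at least min_len match-column residues."""
    return [(h, s) for h, s in records if aligned_length(s) >= min_len]


# Toy alignment with 10 match columns; real input would be read from a file,
# e.g. with open("GT1gp7.a3.over200.a2m") as fh: records = read_a2m(fh)
demo = """>query
ACDEFGHIKL
>partial_hit
--DEf------
>full_hit
ACDEFGHIK-
""".splitlines()

kept = filter_by_aligned_length(read_a2m(demo), min_len=5)
print([header for header, _ in kept])
```

With the toy threshold of 5, the partial hit (two aligned residues
plus one insertion) is dropped, which mirrors how short metagenome
fragments would fall below a 200-residue cutoff.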
Relationship to other scaffold proteins.
Some structural information is available for the P22 scaffold
protein (Sun et al., 2000; PDB 2GP8). The scaffold protein is
present in copious amounts inside the prohead and aids in formation
of the capsid structure. It leaves the capsid during maturation.
Many, but not all, tailed phages have a scaffold protein of this
description. Sometimes it is attached to another head protein as a
propeptide that is cleaved away by a prohead protease, as in HK97. To
my knowledge, it has not been established whether all scaffold
proteins share a common ancestor or act by the same mechanism. The
P22-like scaffold is also about 300 residues, whereas the HK97
scaffold is only about 100 residues and the bacteriophage lambda
scaffold, nu3, is 201 residues. Some works apply the term "scaffold"
to less analogous functions, such as head decorations that remain on
the exterior of the mature capsid and stabilize the structure.
Citations
Sun Y, Parker MH, Weigele P, Casjens S, Prevelige PE Jr., Krishna
NR. 2000. Structure of the coat protein-binding domain
of the scaffolding protein from a double-stranded DNA virus. J
Mol Biol 297:1195-1202.