Documentation for large terminase model
Summary
This large terminase alignment was derived from an effort to include
the large terminase genes of all known tailed phages (Serwer et al.,
2004). This was done with repeated application of the target2k
script of the UCSC Sequence Alignment and Modeling system
(SAM). This alignment has been expanded many times since its
inception with inclusion of newly sequenced phages, as well as
prospective prophage sequences and metagenomic sequences to flesh
out various clades of interest. It has periodically been thinned as
a matter of practicality. It includes all large terminase families
from Pfam. It is a two-domain model, including the N-terminal P-loop
ATPase domain and the C-terminal ruvC domain. Unlike Pfam models, it
is not curated to avoid significant matching of its HMM to other
ATPases or other ruvC family enzymes. In a phage genome, it can be
expected to match a variety of helicases in addition to the large
terminase, and sometimes a ruvC recombinase. But for identification
of large terminase genes it is generally unambiguous as long as one
insists on matching to both domains. The picovirus packaging ATPases
match both domains, although the catalytic residues in the ruvC
domain are not conserved, hence alignment in that domain is less
accurate for them. The main function of this superfamily construct
in my work is to act as an alignment tool allowing construction of
timetrees across the full scope of tailed phages. In my studies, the
large terminase is used to establish the time scale for the other
structural proteins by congruence (Hardies et al., 2016).
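The rule of insisting on a match to both domains can be sketched as a simple coverage check on the model coordinates of a hit. This is only an illustration under stated assumptions: the region boundaries, the 0.5 coverage threshold, and the names ModelRegion, overlap_fraction, and matches_both_domains are all hypothetical and are not taken from the archive. The real domain boundaries would have to be read from the model itself, and a hit reported as several separate segments would need to be merged first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRegion:
    """A span of HMM match columns, 1-based and inclusive."""
    start: int
    end: int


def overlap_fraction(hit_from: int, hit_to: int, region: ModelRegion) -> float:
    """Fraction of `region` covered by the hit's model coordinates."""
    overlap = min(hit_to, region.end) - max(hit_from, region.start) + 1
    return max(0, overlap) / (region.end - region.start + 1)


def matches_both_domains(hit_from: int, hit_to: int,
                         atpase: ModelRegion, nuclease: ModelRegion,
                         min_fraction: float = 0.5) -> bool:
    """True only if the hit covers enough of BOTH domain regions."""
    return (overlap_fraction(hit_from, hit_to, atpase) >= min_fraction
            and overlap_fraction(hit_from, hit_to, nuclease) >= min_fraction)


# Placeholder boundaries for illustration only (NOT the real model layout).
ATPASE = ModelRegion(1, 250)
NUCLEASE = ModelRegion(251, 450)

# A hit spanning most of the model, as expected for a large terminase.
terminase_like = matches_both_domains(20, 430, ATPASE, NUCLEASE)

# A hit confined to the ATPase region, as expected for a helicase.
helicase_like = matches_both_domains(10, 240, ATPASE, NUCLEASE)
```

With these placeholder boundaries, the first hit passes and the second is rejected because it does not reach the C-terminal region at all.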
Technical
This version of the large terminase alignment was augmented with
homologs of phiGT1, LUZ24, and P22 large terminases to improve
definition in those clades. Based on prior information about the
root of the large terminase tree, the bacteriophage lambda and T4
clades were retained to represent the root. For treemaking, it has
been observed that large terminase genes in phages can be
recombinant, particularly between the N- and C-terminal
domains. The .a2m file was filtered to retain only sequences
with over 350 residues in alignment; 710 sequences were retained.
Otherwise as described.
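A length filter of this kind can be sketched in a few lines. The sketch below is not the procedure actually used to build the archive. It assumes that "residues in alignment" means residues in match columns, which the A2M format writes as uppercase letters (lowercase letters are insertions, "-" is a deletion, and "." pads insert columns). The function names read_a2m, aligned_residues, and filter_a2m are made up for illustration.

```python
from typing import Iterator, List, Tuple


def read_a2m(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, aligned_sequence) pairs from FASTA-like A2M text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def aligned_residues(seq: str) -> int:
    """Count uppercase residues (assumed here to mean aligned residues)."""
    return sum(1 for c in seq if c.isupper())


def filter_a2m(text: str, min_aligned: int = 350) -> List[Tuple[str, str]]:
    """Keep sequences with MORE than `min_aligned` aligned residues."""
    return [(h, s) for h, s in read_a2m(text)
            if aligned_residues(s) > min_aligned]


# Toy input with a small threshold so the behaviour is easy to follow.
toy = ">a\nACD-ef..G\n>b\nAC--ef..-\n"
kept = filter_a2m(toy, min_aligned=3)
```

In the toy input, sequence "a" has four uppercase residues and is kept, while "b" has only two and is dropped. For a real file, the text would be read from disk and the default threshold of 350 used.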
Files included in archive
- lg-ter.documentation.html - this file
- lg-ter.k3.over350.a2m - the SAM alignment
- lg-ter.k3.over350.asc.mod - SAM-formatted HMM
- lg-ter.k3.over350.hmm - HMMER3-formatted HMM
- lg-ter.k3.over350.hhm - HHpred-formatted HMM
- lg-ter.pfuLUZset4.mb.nex - nexus file used in [1]
Citations
Serwer P, Hayes SJ, Zaman S, Lieman K, Rolando M, Hardies SC.
2004. Improved isolation of undersampled bacteriophages: finding
of distant terminase genes. Virology 329:412-24. doi:
10.1016/j.virol.2004.08.021.
Hardies SC, Thomas JA, Black L, Weintraub L, Hwang CY, Cho BC.
2016. Identification of structural and morphogenesis genes of Pseudoalteromonas
phage phiRIO-1 and placement within the evolutionary history of Podoviridae.
Virology 489:116-27. doi: 10.1016/j.virol.2015.12.005.