Documentation for large terminase model

Summary

This large terminase alignment was derived from an effort to include the large terminase genes of all known tailed phages (Serwer et al., 2004). This was done by repeated application of the target2k script of the UCSC Sequence Alignment and Modeling system (SAM). Since its inception, the alignment has been expanded many times to include newly sequenced phages, as well as prospective prophage sequences and metagenomic sequences that flesh out various clades of interest. It has periodically been thinned as a matter of practicality.

The alignment includes all large terminase families from Pfam. It is a two-domain model, comprising the N-terminal P-loop ATPase domain and the C-terminal ruvC domain. Unlike Pfam models, it is not curated to prevent its HMM from significantly matching other ATPases or other ruvC family enzymes. In a phage genome, it can be expected to match a variety of helicases in addition to the large terminase, and sometimes a ruvC recombinase. For identification of large terminase genes, however, it is generally unambiguous as long as one insists on a match to both domains. The picovirus packaging ATPases match both domains, but the catalytic residues in the ruvC domain are not conserved, so the alignment in that domain is less accurate for them.

The main function of this superfamily construct in my work is to serve as an alignment tool for constructing timetrees across the full scope of tailed phages. In my studies, the large terminase is used to establish the time scale for the other structural proteins by congruence (Hardies et al., 2016).

Technical

This version of the large terminase alignment was augmented with homologs of the phiGT1, LUZ24, and P22 large terminases to improve definition in those clades. Based on prior information about the root of the large terminase tree, the bacteriophage lambda and T4 clades were retained to represent the root. When making trees, note that large terminase genes in phages have been observed to be recombinant, particularly between the N- and C-terminal domains. The .a2m file was filtered to retain only sequences with over 350 residues in alignment, which left 710 sequences. Otherwise as described.
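
The length filter described above can be sketched in Python. This is a minimal illustration, not the procedure actually used to build the archive. It assumes that "residues in alignment" means residues placed in match columns, which the A2M format writes as uppercase letters (lowercase letters are insertions, "-" marks a deletion, and "." pads insertion columns). The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_a2m(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-style A2M text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def aligned_residue_count(seq: str) -> int:
    """Count uppercase letters, i.e. residues in match columns (assumption)."""
    return sum(1 for ch in seq if ch.isupper())


def filter_a2m(
    records: Iterable[Tuple[str, str]], min_aligned: int = 350
) -> List[Tuple[str, str]]:
    """Keep records with strictly more than min_aligned aligned residues."""
    return [(h, s) for h, s in records if aligned_residue_count(s) > min_aligned]
```

To apply it, open the .a2m file, pass the file handle to read_a2m, and pass the result to filter_a2m. The default threshold of 350 follows the "over 350 residues" wording above, so a sequence with exactly 350 aligned residues would be dropped.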

Files included in archive

Citations

Serwer P, Hayes SJ, Zaman S, Lieman K, Rolando M, Hardies SC. 2004. Improved isolation of undersampled bacteriophages: finding of distant terminase genes.  Virology 329:412-24.  doi: 10.1016/j.virol.2004.08.021.

Hardies SC, Thomas JA, Black L, Weintraub L, Hwang CY, Cho BC. 2016. Identification of structural and morphogenesis genes of Pseudoalteromonas phage phiRIO-1 and placement within the evolutionary history of Podoviridae. Virology 489:116-27. doi: 10.1016/j.virol.2015.12.005.