Program source:
https://users.soe.ucsc.edu/~karplus/projects-compbio-html/sam2src/
Citations:
The software is still available, but it is no longer maintained
and may be difficult to install.
General use:
target2k -seed {fasta file, or prior .a2m file} -homologs {fasta file of proposed homologs} -full_seq_align 1 -out {initial alignment name} >log 2>&1
target2k -seed {initial alignment.a2m} -tuneup -full_seq_align 1 -out {tuned alignment name} >log 2>&1
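The two commands above can also be scripted so that each run gets its own log (as written, the second >log overwrites the first). The Python sketch below is a minimal wrapper, not part of SAM: it builds the argument lists from exactly the flags shown above and runs them with subprocess. It assumes target2k is on the PATH and that -out NAME writes NAME.a2m, which is what the second command's seed name suggests; the function names and file names are hypothetical.

```python
import subprocess


def target2k_commands(seed, homologs, initial_name, tuned_name):
    """Return the initial-alignment and tuneup target2k argument lists."""
    initial = ["target2k", "-seed", seed, "-homologs", homologs,
               "-full_seq_align", "1", "-out", initial_name]
    # Assumption: target2k writes {initial_name}.a2m, which seeds the tuneup run.
    tuneup = ["target2k", "-seed", initial_name + ".a2m", "-tuneup",
              "-full_seq_align", "1", "-out", tuned_name]
    return initial, tuneup


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path (like >log 2>&1)."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
```

For example, after initial, tuneup = target2k_commands("seed.fasta", "homologs.fasta", "fam_init", "fam_tuned"), calling run_logged(initial, "init.log") and then run_logged(tuneup, "tuneup.log") repeats the two steps with separate logs; check=True stops the sequence if the first run fails.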
Generally I retain alignments and may later add sequences to them
from homolog sets drawn from a variety of sources.
Characteristics of the output:
SAM iteratively builds an HMM of the input alignment (or, initially,
of a single sequence) and screens the homolog set for statistically
significant matches. Accepted sequences are added to the alignment,
and the whole alignment is re-optimized for use in the next
iteration. Unlike Clustal, the alignment is not built progressively;
instead, the HMM is trained with the Baum-Welch algorithm. This lets
SAM refine its placement of gaps as more sequences are added, so it
is more effective at sweeping gaps out of regions of conserved
secondary structure. Within the aligned sequences, segments that do
not meet a threshold for significant alignment across at least half
of the entries are demoted to "insert" status. These are written in
lowercase in the .a2m file and are not passed on to the tree-making
step. However, they may be rescued by adding new sequences in a
later step. There is typically a marked increase in aligned residues
during the tuneup operation.
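To measure the increase in aligned residues after tuneup, or to see exactly which columns reach the tree-making step, the .a2m file can be read directly. The Python sketch below relies on the conventions described above (uppercase residues and '-' occupy match columns, lowercase residues are inserts) and additionally assumes that '.' characters only pad for inserts in other sequences; the function names are hypothetical.

```python
def parse_a2m(text):
    """Parse FASTA-style .a2m text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_states(seq):
    """Count aligned (uppercase), deleted ('-') and insert (lowercase) positions."""
    aligned = sum(1 for c in seq if c.isupper())
    deleted = seq.count("-")
    inserted = sum(1 for c in seq if c.islower())
    return aligned, deleted, inserted


def match_columns(seq):
    """Keep only match-state columns (uppercase residues and '-'); drop inserts and '.'."""
    return "".join(c for c in seq if c.isupper() or c == "-")
```

For example, count_states("ACdeF-G") returns (4, 1, 2) and match_columns("ACdeF-G") returns "ACF-G". Summing the first value of count_states over all entries, before and after tuneup, gives a simple check on the gain in aligned residues.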
Post-processing:
After tuneup, the resulting .a2m file contains two copies of each
sequence.
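If only one copy of each sequence is wanted downstream, the duplicates can be dropped before further processing. This Python sketch assumes the two copies share the same identifier (the first word of the header line) and keeps the first one seen; that assumption should be checked against a real tuneup file, and the function names are hypothetical.

```python
def drop_duplicate_entries(records):
    """Keep the first (header, sequence) record seen for each identifier.

    Assumption: duplicate copies share the first whitespace-delimited word
    of their header line.
    """
    seen = set()
    kept = []
    for header, seq in records:
        words = header.split()
        key = words[0] if words else header
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


def format_a2m(records):
    """Write records back out as FASTA-style text, one sequence line per entry."""
    return "".join(">" + header + "\n" + seq + "\n" for header, seq in records)
```

Writing each sequence on a single line avoids any question of how a downstream tool handles wrapped alignment rows.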
The tuned alignment can then be used in one of the following ways:
Within SAM:
w0.5 {file.a2m} {file.mod} creates a SAM-formatted HMM.
hmmscore {result name} -i {file.mod} -db {multifasta file or another .a2m file} -sw 0 scores a sequence collection against the SAM HMM.
To convert to an HMMER3 HMM:
hmmconvert (from the SAM package) {modelname.asc} -model_file {file.mod} converts the SAM binary model file to ASCII format.
S3H2convert.pl (authors: Martin Madera, Julian Gough; obtained from
http://www.mrc-lmb.cam.ac.uk/genomes/julian/convert/convert.html):
S3H2convert.pl {SAM ascii model} produces an HMMER2-formatted file with the extension .con.hmm.
hmmconvert (from the HMMER package) {HMMER2-formatted HMM} > {HMMER3-formatted HMM} converts the model to HMMER3 format.
The HMMER3 HMM can be used with hmmalign (HMMER package) or with
Clustal Omega (--hmm-in option) to align an arbitrary set of
homologs consistently with the alignment from which the original
HMM was made.
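Because SAM and HMMER each provide a program named hmmconvert, the conversion chain above is easiest to script with explicit paths. The Python sketch below strings together the three steps exactly as listed; the install paths are hypothetical placeholders, and the HMMER2 file name is passed in because the exact .con.hmm name that S3H2convert.pl writes should be confirmed on a real run.

```python
import subprocess

# Hypothetical install locations: both packages provide a program named hmmconvert.
SAM_HMMCONVERT = "/opt/sam/bin/hmmconvert"
HMMER_HMMCONVERT = "/opt/hmmer3/bin/hmmconvert"


def hmmer3_conversion_steps(sam_mod, sam_asc, hmmer2_file):
    """Return the three commands: SAM binary -> SAM ASCII -> HMMER2 -> HMMER3."""
    return [
        [SAM_HMMCONVERT, sam_asc, "-model_file", sam_mod],  # SAM .mod -> ASCII model
        ["S3H2convert.pl", sam_asc],                        # ASCII -> HMMER2 (.con.hmm)
        [HMMER_HMMCONVERT, hmmer2_file],                    # HMMER2 -> HMMER3 on stdout
    ]


def run_conversion(steps, hmmer3_out):
    """Run the steps in order, saving the last step's stdout as the HMMER3 file."""
    for cmd in steps[:-1]:
        subprocess.run(cmd, check=True)
    with open(hmmer3_out, "w") as out:
        subprocess.run(steps[-1], stdout=out, check=True)
```

Keeping the command lists separate from the function that runs them makes it easy to print and inspect the exact commands before anything is executed.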
To convert to an HHpred-style HMM:
addss.pl (from the HH-suite package) {.a2m file} {.a3m file} -a3m adds PSIPRED secondary-structure predictions to the alignment, using the .a2m alignment itself rather than running a PSI-BLAST search.
hhmake -i {.a3m file} -o {hhpred model.hhm} builds the HHpred model.
hhsearch -cal -i {hhpred model.hhm} -d {scope database from hhpred libraries} -o {hhpred.cal.hhr} calibrates the model.
hhsearch -i {one hhpred model} -d {another hhpred model} -o {model1x2.hhr} gives a detailed report of how well the two models correspond.
Tree checking
When alignments are expanded to the limits of similarity
detection, there is a risk that a cluster of nonhomologous
sequences will be included, or that parts of the alignment will
not be accurate enough to support a reliable tree. In these
cases, I typically make an NJ tree for the full alignment and
select representatives of each major clade for a MrBayes tree.
Then, for the most distantly linked clades, I extract the
sequences of each clade, make a new alignment for each, and run
the HMM-to-HMM comparison to find which parts of the sequence, if
any, have good posterior residue-alignment scores. Based on
control experiments in which structural homology was used as the
arbiter of "correct" alignment, segments scoring mostly 7, 8, or 9
tend to correspond to regions of structural homology.
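The "mostly 7, 8, or 9" rule can be applied mechanically. In HH-suite .hhr reports, the Confidence line under each alignment block gives a posterior-probability digit (0 to 9) per aligned column. The Python sketch below assumes those digits have already been collected into a single string, with spaces where no digit is printed, and flags windows in which at least min_good of the window's columns score 7 or higher. The window size and cutoff are illustrative defaults chosen for this sketch, not values from the control experiments.

```python
def high_confidence_segments(conf, min_score=7, window=10, min_good=7):
    """Return half-open (start, end) column ranges that are mostly >= min_score.

    conf: per-column confidence digits ('0'-'9'), spaces for unscored columns.
    A column is covered if it lies in some window of `window` columns that
    contains at least `min_good` digits >= min_score (min_good must be >= 1).
    Each merged segment is trimmed to start and end on a high-scoring column.
    """
    good = [c in "0123456789" and int(c) >= min_score for c in conf]
    n = len(good)
    covered = [False] * n
    for start in range(n - window + 1):
        if sum(good[start:start + window]) >= min_good:
            for i in range(start, start + window):
                covered[i] = True

    segments = []
    i = 0
    while i < n:
        if not covered[i]:
            i += 1
            continue
        j = i
        while j < n and covered[j]:
            j += 1
        # Trim the merged run so it begins and ends on a high-scoring column.
        hits = [k for k in range(i, j) if good[k]]
        segments.append((hits[0], hits[-1] + 1))
        i = j
    return segments
```

For example, high_confidence_segments("0000" + "9" * 10 + "0000", window=5, min_good=4) returns [(4, 14)]. The ranges index positions in the supplied string, so they line up with alignment columns only if the string was extracted with its column offsets intact.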