EMBOSS: scopalign


Program scopalign

Function

Generate alignments for SCOP families

Description

scopalign parses a SCOP classification file in EMBL-like format generated by the EMBOSS applications scope or nrscope, and domain coordinate files generated by the EMBOSS application domainer, and calls stamp to generate structural alignments for each SCOP family in turn.

VERY IMPORTANT NOTE

scopalign will only run with a version of stamp that has been modified to accept PDB ID codes longer than 4 characters. This involves a trivial change to the stamp module getdomain.c (around line number 155), where a 4 must be changed to a 7. The original line:
temp=getfile(domain[0].id,dirfile,4,OUTPUT);
becomes:
temp=getfile(domain[0].id,dirfile,7,OUTPUT);

The modified code is kept on the HGMP file system in /packages/stamp/src2. WHEN RUNNING SCOPALIGN AT THE HGMP IT IS ESSENTIAL THAT THE COMMAND 'use stamp2' (which runs the script /packages/menu/USE/stamp2) IS GIVEN BEFORE SCOPALIGN IS RUN. This will ensure that the modified version of stamp is used.
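
For sites without access to the HGMP copy, the one-character change described above can be applied with a short script. The following is a minimal Python sketch, not part of stamp or EMBOSS: the function name patch_getdomain and any path passed to it are assumptions for illustration, and it assumes the line in getdomain.c matches the form quoted above character for character. It refuses to write anything unless that line is found exactly once.

```python
from pathlib import Path

# The two forms of the line, exactly as quoted in the note above.
OLD = "temp=getfile(domain[0].id,dirfile,4,OUTPUT);"
NEW = "temp=getfile(domain[0].id,dirfile,7,OUTPUT);"


def patch_getdomain(path):
    """Change the ID length passed to getfile() from 4 to 7 in getdomain.c.

    Returns True if the file was changed, False if it was already patched.
    Raises ValueError if the expected line is missing or ambiguous.
    """
    source_file = Path(path)
    text = source_file.read_text()
    if OLD not in text:
        if NEW in text:
            return False  # already patched, nothing to do
        raise ValueError(f"expected getfile() call not found in {path}")
    if text.count(OLD) != 1:
        raise ValueError(f"getfile() call appears more than once in {path}")
    source_file.write_text(text.replace(OLD, NEW))
    return True
```

After patching, stamp must be rebuilt for the change to take effect.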

Usage

Here is a sample session with scopalign:

% scopalign

Command line arguments

   Mandatory qualifiers:
  [-scopf]             infile     Name of scop file for input (embl-like
                                  format)
  [-path]              string     Location of alignment files for output

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier        Description                                      Allowed values           Default

Mandatory qualifiers
[-scopf]         Name of scop file for input (embl-like format)   Input file               Escop.dat
(Parameter 1)
[-path]          Location of alignment files for output           Any string is accepted   ./
(Parameter 2)

Optional qualifiers
(none)

Advanced qualifiers
(none)
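
For use from a script, the two mandatory qualifiers can be passed explicitly on the command line. The sketch below is an illustration in Python rather than part of EMBOSS: the helper names scopalign_command and run_scopalign are hypothetical, and the defaults are simply the ones listed in the table above. Actually running it requires scopalign and the modified stamp to be installed.

```python
import subprocess


def scopalign_command(scopf="Escop.dat", path="./"):
    """Build the argument list for a scopalign run (hypothetical helper)."""
    return ["scopalign", "-scopf", scopf, "-path", path]


def run_scopalign(scopf="Escop.dat", path="./"):
    """Run scopalign with the given qualifiers and return its exit status."""
    return subprocess.run(scopalign_command(scopf, path)).returncode
```

Because scopalign always exits with status 0 (see Exit status), the return code alone cannot be used to detect a failed run; a calling script should check that the expected alignment files were written.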

Input file format

scopalign parses a SCOP classification file in EMBL-like format generated by the EMBOSS applications scope or nrscope, and domain coordinate files generated by the EMBOSS application domainer.

Output file format

The output files are named after the families given in the SCOP classification records. If a file of that name already exists, a suffix ("_1", "_2", etc.) is added as appropriate.
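
The naming rule can be sketched as follows. This is an illustrative Python sketch, not the scopalign source: the function name unique_output_name is hypothetical, and it assumes that the suffix is appended directly to the family name and that numbering starts at 1 and increases until an unused name is found.

```python
import os


def unique_output_name(directory, family):
    """Return a file name for `family` that does not yet exist in `directory`."""
    candidate = family
    suffix = 0
    # Try "family", then "family_1", "family_2", ... until a name is free.
    while os.path.exists(os.path.join(directory, candidate)):
        suffix += 1
        candidate = f"{family}_{suffix}"
    return candidate
```

Under these assumptions, a second run for the family "Globins" into a directory that already holds a file called Globins would yield Globins_1.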

The format of the scopalign output file (Figure 1) is similar to the output file generated by stamp when issued with the following three types of command:

(1) stamp -l ./stamps_file.dom -s -n 2 -slide 5 -prefix ./stamps_file -d ./stamps_file.set;sorttrans -f ./stamps_file.scan -s Sc 2.5 > ./stamps_file.sort;stamp -l ./stamps_file.sort -prefix ./stamps_file > ./stamps_file.log

(2) poststamp -f ./stamps_file.3 -min 0.5

(3) ver2hor -f ./stamps_file.3.post > ./stamps_file.out

However, the SCOP classification records for the appropriate family are written above the alignment, no dssp assignments are given, and only the 'Post similar' line is given. Also, the 7-character domain identifier codes taken from the scop classification file are used.

Figure 1 Example of scopalign output file

CL   All alpha proteins
XX
FO   Globin-like
XX
SF   Globin-like
XX
FA   Globins
XX
Number               10        20        30        40        50    
d1vrea_              LSAAQRQVVASTWKDIAgsdngAGVGKECFTKFLSAHHDMAAV f gFS
d3sdhb_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
d3hbia_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
d3sdha_      svydaaaqLTADVKKDLRDSWKVIG sd kKGNGVALMTTLFADNQETIGYfkrlGN
Post_similar --------11111111111111111-00-1111111111111111111111-0-111

Number        60        70        80        90       100       110 
d1vrea_      GAS   dpGVADLGAKVLAQIGVAVSHLgDEGKMVAEMKAVGVRHKgygnkhIKAEY
d3sdhb_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKFAVNHI  t rkISAAE
d3hbia_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKLAVNHI  t rkISAAE
d3sdha_      VSQgmandKLRGHSITLMYALQNFIDQLdNPDDLVCVVEKFAVNHI  t rkISAAE
Post_similar 111---0011111111111111111111011111111111111111--0-0011111

Number          120       130       140       150       160
d1vrea_      FEPlGASL LSAMEhriggkMNAAAKDAWAAAYADisgalisglqs
d3sdhb_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
d3hbia_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
d3sdha_      FGK INGPiKKVLA s k nFGDKYANAWAKLVAVvqa al     
Post_similar 111-1111-11111-0-0-1111111111111111100-00-----
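
A file in the layout of Figure 1 can be read back into full-length aligned strings with a few lines of code. The following Python sketch is an illustration only, not an EMBOSS parser: the function name parse_scopalign is hypothetical, and it assumes (from Figure 1, not from a format specification) that the label column is 13 characters wide, that alignment blocks are separated by blank lines, and that spaces inside a sequence are significant and must be preserved.

```python
LABEL_WIDTH = 13  # assumption: width of the label column as seen in Figure 1
RECORD_TAGS = ("CL", "FO", "SF", "FA")


def parse_scopalign(text):
    """Return (records, rows) parsed from the text of a scopalign output file.

    records maps CL/FO/SF/FA to their values; rows maps each domain
    identifier (and 'Post_similar') to its full-length aligned string.
    """
    records, rows, block = {}, {}, []

    def flush():
        # Pad every row of the block to a common width so that columns
        # stay aligned even if trailing spaces were stripped from the file.
        width = max((len(seq) for _, seq in block), default=0)
        for label, seq in block:
            rows[label] = rows.get(label, "") + seq.ljust(width)
        block.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the current alignment block
        elif line[:2] in RECORD_TAGS and line[2:5] == "   ":
            records[line[:2]] = line[5:].strip()
        elif line.startswith("XX") or line.startswith("Number"):
            continue  # separator and column-ruler lines carry no data
        else:
            block.append((line[:LABEL_WIDTH].strip(), line[LABEL_WIDTH:]))
    flush()
    return records, rows
```

With the rows in hand, the columns marked "1" on the Post_similar line can be selected with, for example, `[i for i, c in enumerate(rows["Post_similar"]) if c == "1"]`.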

Data files

None

Notes

See the VERY IMPORTANT NOTE under Description: scopalign requires a modified version of stamp.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
contacts       Reads coordinate files and writes contact files
dichet         Parse dictionary of heterogen groups
interface      Reads coordinate files and writes inter-chain contact files
psiblasts      Runs PSI-BLAST given scopalign alignments
seqsort        Removes ambiguities from a set of hits resulting from a database search
siggen         Generates a sparse protein signature
sigscan        Scans a sparse protein signature against swissprot

Author(s)

This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)

History

Written (May 2001) - Jon Ison

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments