contacts

 

Function

Reads CCF files (clean coordinate files) and writes CON files (contact files) of intra-chain residue-residue contact data

Description

This program is part of a suite of EMBOSS applications that directly or indirectly make use of the protein structure databases pdb and scop. This program is part of an experimental analysis pipeline described in an accompanying document. We provide the software in the hope that it will be useful. The applications were designed for specific research purposes and may not be useful or reliable in contexts other than the described pipeline. The development of the suite was coordinated by Jon Ison, to whom enquiries and bug reports should be sent (email jison@hgmp.mrc.ac.uk).

Knowledge of the physical contacts that amino acid residues within a protein or domain make with one another is required for several different analyses. contacts calculates intra-chain residue-residue contact data from protein and domain coordinate files.

Algorithm

Two residues are in contact when the van der Waals surface of any atom of the first residue comes within the threshold contact distance of the van der Waals surface of any atom of the second residue. The threshold contact distance is user-defined, with a default value of 1 Angstrom.
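The definition above can be sketched in a few lines of Python. This is an illustration only, not the contacts source code: the function names and the representation of a residue as a list of (coordinates, radius) pairs are assumptions, as is the choice to treat a surface gap exactly equal to the threshold as a contact.

```python
import math


def atoms_in_contact(xyz_a, radius_a, xyz_b, radius_b, threshold=1.0):
    """True if the two van der Waals surfaces are within `threshold` Angstroms.

    The gap between surfaces is the distance between atom centres minus both
    radii. Treating a gap equal to the threshold as a contact is an assumption.
    """
    surface_gap = math.dist(xyz_a, xyz_b) - radius_a - radius_b
    return surface_gap <= threshold


def residues_in_contact(residue_a, residue_b, threshold=1.0):
    """Each residue is a list of ((x, y, z), radius) pairs (hypothetical layout).

    Two residues are in contact if any atom pair is in contact.
    """
    return any(
        atoms_in_contact(xyz_a, radius_a, xyz_b, radius_b, threshold)
        for xyz_a, radius_a in residue_a
        for xyz_b, radius_b in residue_b
    )


# Two single-atom "residues" with radii taken from the Evdw.dat excerpt (N and CA).
near = residues_in_contact([((0.0, 0.0, 0.0), 1.7)], [((4.0, 0.0, 0.0), 1.9)])
far = residues_in_contact([((0.0, 0.0, 0.0), 1.7)], [((6.0, 0.0, 0.0), 1.9)])
print(near, far)
```

In the first call the atom centres are 4.0 Angstroms apart, so the surfaces are 0.4 Angstroms apart and the pair counts as a contact at the default threshold. In the second call the gap is 2.4 Angstroms and it does not.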

Usage

Here is a sample session with contacts


% contacts 
Reads CCF files (clean coordinate files) and writes CON files (contact
files) of intra-chain residue-residue contact data.
Location of CCF files (clean coordinate files) (input) [./]: structure
Name of data file with van der Waals radii [Evdw.dat]: 
Threshold contact distance [1.0]: 1
Location of CON files (contact files) (output) [./]: 
Name of log file for the build [contacts.log]: 

1cs4
1ii7
D1CS4A_
D1II7A_
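
The same session can be run without prompts from a script. The sketch below builds the command line from the qualifiers documented on this page and only runs it on request. It assumes that the contacts executable is on the PATH and that Evdw.dat can be found by EMBOSS; the helper names are hypothetical.

```python
import subprocess


def build_contacts_command(ccf_dir, con_dir, threshold=1.0,
                           vdw_file="Evdw.dat", log_file="contacts.log"):
    """Return the argument list for a non-interactive contacts run."""
    return [
        "contacts",
        "-cpdbdir", str(ccf_dir),        # input directory of CCF files
        "-vdwfile", vdw_file,            # van der Waals radii data file
        "-threshold", str(threshold),    # threshold contact distance
        "-conoutdir", str(con_dir),      # output directory for CON files
        "-conerrfile", log_file,         # log file
        "-auto",                         # turn off prompts
    ]


def run_contacts(ccf_dir, con_dir, **options):
    """Run contacts; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_contacts_command(ccf_dir, con_dir, **options),
                          check=True)


command = build_contacts_command("structure", "./")
print(" ".join(command))
```

Calling run_contacts("structure", "./") would then execute the program on a machine where EMBOSS is installed.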


Command line arguments

   Standard (Mandatory) qualifiers:
  [-cpdbdir]           dirlist    This option specifies the location of CCF
                                  files (clean coordinate files) (input). A
                                  'clean coordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
   -vdwfile            datafile   This option specifies the name of the data
                                  file with van der Waals radii of atoms for
                                  different amino acid residues.
   -threshold          float      Contact between two residues is defined as
                                  when the van der Waals surface of any atom
                                  of the first residue comes within the
                                  threshold contact distance of the van der
                                  Waals surface of any atom of the second
                                  residue. The threshold contact distance is a
                                  user-defined distance with a default value
                                  of 1 Angstrom.
  [-conoutdir]         outdir     This option specifies the location of CON
                                  files (contact files) (output). A 'contact
                                  file' contains contact data for a protein or
                                  a domain from SCOP or CATH, in the CON
                                  format (EMBL-like). The contacts may be
                                  intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The files
                                  are generated by using CONTACTS, INTERFACE
                                  and FUNKY.
   -conerrfile         outfile    The log file contains messages about any
                                  errors arising while contacts ran.

   Additional (Optional) qualifiers:
   -[no]ccfnaming      boolean    This option specifies whether to use pdbid
                                  code to name the output files. If set, the
                                  PDB identifier code (from the PDB file) is
                                  used to name the file. Otherwise, the output
                                  files have the same names as the input
                                  files.
   -skip               boolean    Whether to calculate contacts between
                                  residues adjacent in sequence.
   -ignore             float      If any two atoms from two different residues
                                  are at least this distance apart then no
                                  further inter-atomic contacts will be checked
                                  for that residue pair. This speeds the
                                  calculation up considerably.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-conerrfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths


Standard (Mandatory) qualifiers

[-cpdbdir] (Parameter 1)
    This option specifies the location of CCF files (clean coordinate files) (input). A 'clean coordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.
    Allowed values: Directory with files
    Default: ./

-vdwfile
    This option specifies the name of the data file with van der Waals radii of atoms for different amino acid residues.
    Allowed values: Data file
    Default: Evdw.dat

-threshold
    Contact between two residues is defined as when the van der Waals surface of any atom of the first residue comes within the threshold contact distance of the van der Waals surface of any atom of the second residue. The threshold contact distance is a user-defined distance with a default value of 1 Angstrom.
    Allowed values: Any numeric value
    Default: 1.0

[-conoutdir] (Parameter 2)
    This option specifies the location of CON files (contact files) (output). A 'contact file' contains contact data for a protein or a domain from SCOP or CATH, in the CON format (EMBL-like). The contacts may be intra-chain residue-residue, inter-chain residue-residue or residue-ligand. The files are generated by using CONTACTS, INTERFACE and FUNKY.
    Allowed values: Output directory
    Default: ./

-conerrfile
    The log file contains messages about any errors arising while contacts ran.
    Allowed values: Output file
    Default: contacts.log

Additional (Optional) qualifiers

-[no]ccfnaming
    This option specifies whether to use pdbid code to name the output files. If set, the PDB identifier code (from the PDB file) is used to name the file. Otherwise, the output files have the same names as the input files.
    Allowed values: Boolean value Yes/No
    Default: Yes

-skip
    Whether to calculate contacts between residues adjacent in sequence.
    Allowed values: Boolean value Yes/No
    Default: No

-ignore
    If any two atoms from two different residues are at least this distance apart then no further inter-atomic contacts will be checked for that residue pair. This speeds the calculation up considerably.
    Allowed values: Any numeric value
    Default: 20.0

Advanced (Unprompted) qualifiers

(none)
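
The -ignore shortcut can be illustrated with a short Python sketch. It is a reading of the qualifier's description, not the program's actual code: the function name and the residue layout (a list of (coordinates, radius) pairs) are hypothetical, and it is an assumption that the ignore distance is compared with the distance between atom centres.

```python
import math


def residues_in_contact_with_ignore(residue_a, residue_b,
                                    threshold=1.0, ignore=20.0):
    """Contact test that abandons a residue pair once any atom pair is far apart.

    Residues are lists of ((x, y, z), radius) pairs (hypothetical layout).
    """
    for xyz_a, radius_a in residue_a:
        for xyz_b, radius_b in residue_b:
            distance = math.dist(xyz_a, xyz_b)
            if distance >= ignore:
                # Atoms this far apart: check no further atom pairs for
                # this residue pair, which is what saves time.
                return False
            if distance - radius_a - radius_b <= threshold:
                return True
    return False


close_pair = residues_in_contact_with_ignore(
    [((0.0, 0.0, 0.0), 1.7)], [((3.5, 0.0, 0.0), 1.9)])
distant_pair = residues_in_contact_with_ignore(
    [((0.0, 0.0, 0.0), 1.7)], [((30.0, 0.0, 0.0), 1.9), ((31.0, 0.0, 0.0), 1.2)])
print(close_pair, distant_pair)
```

For the distant pair the loop stops at the first atom pair, so the second atom is never examined. A residue is far smaller than the 20 Angstrom default, which is why skipping the remaining atom pairs is unlikely to miss a contact.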

Input file format

contacts reads CCF files (clean coordinate files) from the input directory.

The format of the clean coordinate file is described in the documentation for pdbparse.

Output file format

Output files for usage example

File: 1cs4.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1cs4; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  47362A43 CRC32;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM 163; LI .
XX
SM   ASP 2 ; ILE 3
SM   ASP 2 ; GLU 4
SM   ASP 2 ; ASP 46
SM   ASP 2 ; CYS 47
SM   ILE 3 ; GLU 4
SM   ILE 3 ; GLY 5
SM   ILE 3 ; PHE 6
SM   ILE 3 ; LEU 9
SM   ILE 3 ; LEU 25
SM   ILE 3 ; ASP 46
SM   GLU 4 ; GLY 5
SM   GLU 4 ; PHE 6
SM   GLY 5 ; PHE 6
SM   GLY 5 ; THR 7
SM   GLY 5 ; SER 8
SM   GLY 5 ; LEU 9
SM   PHE 6 ; THR 7
SM   PHE 6 ; SER 8
SM   PHE 6 ; LEU 9
SM   PHE 6 ; ALA 10
SM   PHE 6 ; LEU 18
SM   PHE 6 ; LEU 22
SM   PHE 6 ; GLY 45
SM   PHE 6 ; ASP 46
SM   THR 7 ; SER 8
SM   THR 7 ; LEU 9
SM   THR 7 ; ALA 10
SM   THR 7 ; SER 11
SM   SER 8 ; LEU 9
SM   SER 8 ; ALA 10
SM   SER 8 ; SER 11


  [Part of this file has been deleted for brevity]

SM   PHE 29 ; LYS 31
SM   PHE 29 ; LEU 32
SM   PHE 29 ; ALA 33
SM   ASP 30 ; LYS 31
SM   ASP 30 ; LEU 32
SM   ASP 30 ; ALA 33
SM   ASP 30 ; ALA 34
SM   ASP 30 ; ARG 40
SM   LYS 31 ; LEU 32
SM   LYS 31 ; ALA 33
SM   LYS 31 ; ALA 34
SM   LYS 31 ; GLU 35
SM   LEU 32 ; ALA 33
SM   LEU 32 ; ALA 34
SM   LEU 32 ; GLU 35
SM   LEU 32 ; ASN 36
SM   ALA 33 ; ALA 34
SM   ALA 33 ; GLU 35
SM   ALA 33 ; ASN 36
SM   ALA 33 ; HIS 37
SM   ALA 33 ; CYS 38
SM   ALA 34 ; GLU 35
SM   ALA 34 ; ASN 36
SM   ALA 34 ; HIS 37
SM   GLU 35 ; ASN 36
SM   GLU 35 ; HIS 37
SM   ASN 36 ; HIS 37
SM   ASN 36 ; CYS 38
SM   HIS 37 ; CYS 38
SM   HIS 37 ; LEU 39
SM   CYS 38 ; LEU 39
SM   CYS 38 ; ARG 40
SM   LEU 39 ; ARG 40
SM   LEU 39 ; ILE 41
SM   ARG 40 ; ILE 41
SM   ARG 40 ; LYS 42
SM   ARG 40 ; ILE 43
SM   ILE 41 ; LYS 42
SM   LYS 42 ; ILE 43
SM   LYS 42 ; LEU 44
SM   LYS 42 ; CYS 47
SM   ILE 43 ; LEU 44
SM   ILE 43 ; GLY 45
SM   ILE 43 ; CYS 47
SM   LEU 44 ; GLY 45
SM   LEU 44 ; ASP 46
SM   LEU 44 ; CYS 47
SM   GLY 45 ; ASP 46
SM   GLY 45 ; CYS 47
SM   ASP 46 ; CYS 47
//

File: 1ii7.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1ii7; DOM .; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE    65 AA;   7396 MW;  0CFB92A3 CRC32;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM 151; LI .
XX
SM   ASP 8 ; ILE 9
SM   ASP 8 ; HIS 10
SM   ASP 8 ; GLY 48
SM   ASP 8 ; ASP 49
SM   ILE 9 ; HIS 10
SM   ILE 9 ; LEU 11
SM   ILE 9 ; PHE 25
SM   ILE 9 ; PHE 29
SM   ILE 9 ; ILE 46
SM   ILE 9 ; ASP 49
SM   ILE 9 ; LEU 50
SM   HIS 10 ; LEU 11
SM   HIS 10 ; GLY 12
SM   HIS 10 ; TYR 13
SM   HIS 10 ; PHE 25
SM   HIS 10 ; ASP 49
SM   HIS 10 ; LEU 50
SM   LEU 11 ; GLY 12
SM   LEU 11 ; TYR 13
SM   LEU 11 ; ALA 26
SM   LEU 11 ; PHE 29
SM   LEU 11 ; LEU 50
SM   GLY 12 ; TYR 13
SM   GLY 12 ; GLU 14
SM   GLY 12 ; GLU 22
SM   TYR 13 ; GLU 14
SM   TYR 13 ; GLN 15
SM   TYR 13 ; GLU 22
SM   TYR 13 ; PHE 25
SM   GLU 14 ; GLN 15


  [Part of this file has been deleted for brevity]

SM   ASN 31 ; ILE 35
SM   ALA 32 ; LEU 33
SM   ALA 32 ; GLU 34
SM   ALA 32 ; ILE 35
SM   ALA 32 ; ALA 36
SM   LEU 33 ; GLU 34
SM   LEU 33 ; ILE 35
SM   LEU 33 ; ALA 36
SM   LEU 33 ; VAL 37
SM   LEU 33 ; ILE 44
SM   GLU 34 ; ILE 35
SM   GLU 34 ; ALA 36
SM   GLU 34 ; VAL 37
SM   GLU 34 ; GLN 38
SM   ILE 35 ; ALA 36
SM   ILE 35 ; VAL 37
SM   ILE 35 ; GLN 38
SM   ILE 35 ; GLU 39
SM   ALA 36 ; VAL 37
SM   ALA 36 ; GLN 38
SM   ALA 36 ; GLU 39
SM   ALA 36 ; ASN 40
SM   ALA 36 ; VAL 41
SM   ALA 36 ; ILE 44
SM   VAL 37 ; GLN 38
SM   VAL 37 ; GLU 39
SM   VAL 37 ; ASN 40
SM   GLN 38 ; GLU 39
SM   GLN 38 ; ASN 40
SM   GLU 39 ; ASN 40
SM   GLU 39 ; VAL 41
SM   ASN 40 ; VAL 41
SM   ASN 40 ; ASP 42
SM   VAL 41 ; ASP 42
SM   VAL 41 ; PHE 43
SM   VAL 41 ; ILE 44
SM   ASP 42 ; PHE 43
SM   PHE 43 ; ILE 44
SM   PHE 43 ; LEU 45
SM   ILE 44 ; LEU 45
SM   ILE 44 ; ILE 46
SM   LEU 45 ; ILE 46
SM   LEU 45 ; ALA 47
SM   ILE 46 ; ALA 47
SM   ILE 46 ; GLY 48
SM   ILE 46 ; LEU 50
SM   ALA 47 ; GLY 48
SM   GLY 48 ; ASP 49
SM   GLY 48 ; LEU 50
SM   ASP 49 ; LEU 50
//

File: d1cs4a_.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1CS4; DOM D1CS4A_; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 225; NRES2 .
XX
S1   SEQUENCE   225 AA;  25486 MW;  437C8290 CRC32;
     MHHHHHHAME MKADINAKQE DMMFHKIYIQ KHDNVSILFA DIEGFTSLAS QCTAQELVMT
     LNELFARFDK LAAENHCLRI KILGDCYYCV SGLPEARADH AHCCVEMGMD MIEAISLVRE
     MTGVNVNMRV GIHSGRVHCG VLGLRKWQFD VWSNDVTLAN HMEAGGKAGR IHITKATLSY
     LNGDYEVEPG CGGERNAYLK EHSIETFLIL RCTQKRKEEK AMIAK
XX
NC   SM 843; LI .
XX
SM   MET 22 ; MET 23
SM   MET 22 ; PHE 24
SM   MET 22 ; HIS 25
SM   MET 22 ; LYS 26
SM   MET 23 ; PHE 24
SM   MET 23 ; HIS 25
SM   PHE 24 ; HIS 25
SM   PHE 24 ; LYS 26
SM   HIS 25 ; LYS 26
SM   HIS 25 ; ILE 27
SM   HIS 25 ; LEU 144
SM   LYS 26 ; ILE 27
SM   LYS 26 ; TYR 28
SM   ILE 27 ; TYR 28
SM   ILE 27 ; ILE 29
SM   TYR 28 ; ILE 29
SM   TYR 28 ; GLY 140
SM   TYR 28 ; VAL 141
SM   TYR 28 ; TRP 147
SM   ILE 29 ; GLN 30
SM   ILE 29 ; HIS 138
SM   ILE 29 ; CYS 139
SM   ILE 29 ; GLY 140
SM   ILE 29 ; VAL 141
SM   ILE 29 ; LEU 142
SM   ILE 29 ; TRP 152
SM   GLN 30 ; LYS 31
SM   GLN 30 ; HIS 32


  [Part of this file has been deleted for brevity]

SM   GLY 192 ; GLU 194
SM   GLY 192 ; ARG 195
SM   GLY 192 ; ASN 196
SM   GLY 192 ; LEU 199
SM   GLY 192 ; THR 206
SM   GLY 193 ; GLU 194
SM   GLY 193 ; ARG 195
SM   GLY 193 ; ASN 196
SM   GLY 193 ; LEU 199
SM   GLY 193 ; LYS 200
SM   GLU 194 ; ARG 195
SM   GLU 194 ; ASN 196
SM   ARG 195 ; ASN 196
SM   ASN 196 ; ALA 197
SM   ASN 196 ; TYR 198
SM   ASN 196 ; LEU 199
SM   ASN 196 ; LYS 200
SM   ALA 197 ; TYR 198
SM   ALA 197 ; LEU 199
SM   ALA 197 ; LYS 200
SM   ALA 197 ; GLU 201
SM   TYR 198 ; LEU 199
SM   TYR 198 ; LYS 200
SM   TYR 198 ; GLU 201
SM   TYR 198 ; HIS 202
SM   LEU 199 ; LYS 200
SM   LEU 199 ; GLU 201
SM   LEU 199 ; HIS 202
SM   LEU 199 ; SER 203
SM   LEU 199 ; ILE 204
SM   LEU 199 ; THR 206
SM   LYS 200 ; GLU 201
SM   LYS 200 ; HIS 202
SM   LYS 200 ; SER 203
SM   GLU 201 ; HIS 202
SM   GLU 201 ; SER 203
SM   HIS 202 ; SER 203
SM   HIS 202 ; ILE 204
SM   SER 203 ; ILE 204
SM   SER 203 ; GLU 205
SM   ILE 204 ; GLU 205
SM   ILE 204 ; THR 206
SM   GLU 205 ; THR 206
SM   GLU 205 ; PHE 207
SM   THR 206 ; PHE 207
SM   PHE 207 ; LEU 208
SM   PHE 207 ; ILE 209
SM   LEU 208 ; ILE 209
SM   LEU 208 ; LEU 210
SM   ILE 209 ; LEU 210
//

File: d1ii7a_.con

XX   Intra-chain residue-residue contact data.
XX
TY   INTRA
XX
EX   THRESH 1.0; IGNORE 20.0; NMOD 1; NCHA 1
XX
NE   1
XX
EN   [1]
XX
ID   PDB 1II7; DOM D1II7A_; LIG .
XX
CN   MO 1; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE    65 AA;   7396 MW;  0CFB92A3 CRC32;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM 151; LI .
XX
SM   ASP 8 ; ILE 9
SM   ASP 8 ; HIS 10
SM   ASP 8 ; GLY 48
SM   ASP 8 ; ASP 49
SM   ILE 9 ; HIS 10
SM   ILE 9 ; LEU 11
SM   ILE 9 ; PHE 25
SM   ILE 9 ; PHE 29
SM   ILE 9 ; ILE 46
SM   ILE 9 ; ASP 49
SM   ILE 9 ; LEU 50
SM   HIS 10 ; LEU 11
SM   HIS 10 ; GLY 12
SM   HIS 10 ; TYR 13
SM   HIS 10 ; PHE 25
SM   HIS 10 ; ASP 49
SM   HIS 10 ; LEU 50
SM   LEU 11 ; GLY 12
SM   LEU 11 ; TYR 13
SM   LEU 11 ; ALA 26
SM   LEU 11 ; PHE 29
SM   LEU 11 ; LEU 50
SM   GLY 12 ; TYR 13
SM   GLY 12 ; GLU 14
SM   GLY 12 ; GLU 22
SM   TYR 13 ; GLU 14
SM   TYR 13 ; GLN 15
SM   TYR 13 ; GLU 22
SM   TYR 13 ; PHE 25
SM   GLU 14 ; GLN 15


  [Part of this file has been deleted for brevity]

SM   ASN 31 ; ILE 35
SM   ALA 32 ; LEU 33
SM   ALA 32 ; GLU 34
SM   ALA 32 ; ILE 35
SM   ALA 32 ; ALA 36
SM   LEU 33 ; GLU 34
SM   LEU 33 ; ILE 35
SM   LEU 33 ; ALA 36
SM   LEU 33 ; VAL 37
SM   LEU 33 ; ILE 44
SM   GLU 34 ; ILE 35
SM   GLU 34 ; ALA 36
SM   GLU 34 ; VAL 37
SM   GLU 34 ; GLN 38
SM   ILE 35 ; ALA 36
SM   ILE 35 ; VAL 37
SM   ILE 35 ; GLN 38
SM   ILE 35 ; GLU 39
SM   ALA 36 ; VAL 37
SM   ALA 36 ; GLN 38
SM   ALA 36 ; GLU 39
SM   ALA 36 ; ASN 40
SM   ALA 36 ; VAL 41
SM   ALA 36 ; ILE 44
SM   VAL 37 ; GLN 38
SM   VAL 37 ; GLU 39
SM   VAL 37 ; ASN 40
SM   GLN 38 ; GLU 39
SM   GLN 38 ; ASN 40
SM   GLU 39 ; ASN 40
SM   GLU 39 ; VAL 41
SM   ASN 40 ; VAL 41
SM   ASN 40 ; ASP 42
SM   VAL 41 ; ASP 42
SM   VAL 41 ; PHE 43
SM   VAL 41 ; ILE 44
SM   ASP 42 ; PHE 43
SM   PHE 43 ; ILE 44
SM   PHE 43 ; LEU 45
SM   ILE 44 ; LEU 45
SM   ILE 44 ; ILE 46
SM   LEU 45 ; ILE 46
SM   LEU 45 ; ALA 47
SM   ILE 46 ; ALA 47
SM   ILE 46 ; GLY 48
SM   ILE 46 ; LEU 50
SM   ALA 47 ; GLY 48
SM   GLY 48 ; ASP 49
SM   GLY 48 ; LEU 50
SM   ASP 49 ; LEU 50
//

File: contacts.log

1cs4
1ii7
D1CS4A_
D1II7A_

contacts reads a directory of domain or protein coordinate files and, for each file in the input directory, writes a contact file of intra-chain residue-residue contact data in EMBL-like format. Each output file contains residue contact data for every chain of every model in a protein coordinate file, or contact data for the single SCOP domain where a domain coordinate file is read. The paths and extensions for the coordinate (input) and contact (output) files are specified by the user. The SCOP domain or PDB identifier codes are used as appropriate to name the output files. A log file is also written.

The EMBL-like format used for the contact files (excerpt below) uses the following records:

(1) ID - either the 4-character PDB identifier code (where clean protein coordinate files are used as input) or the 7-character domain identifier code taken from SCOP (where domain coordinate files are used as input; see the documentation for the EMBOSS application scope for further information).

(2) DE - bibliographic information. The text "Residue-residue contact data" is always given.

(4) EX - experimental information. The value of the threshold contact distance is given as a floating point number after 'THRESH'. The number of models and number of polypeptide chains are given after 'NMOD' and 'NCHA' respectively. For domain coordinate files a 1 is always given. Following the EX record, the file has a section containing CN, IN and SM records (see below) for each chain. The sections for each chain of a model are given after the MO record.

(5) MO - model number. The number given in brackets after this record indicates the start of a section of model-specific data.

(6) CN - chain number. The number given in brackets after this record indicates the start of a section of chain-specific data.

(7) IN - chain-specific data. The character given after ID is the PDB chain identifier taken from the input file (a '.' is given where a chain identifier was not specified in the original PDB file or where, for domain coordinate files, the domain is comprised of more than one chain). The number of amino acid residues comprising the chain (or the chains from which a domain is comprised) is given after NR. The number of residue-residue contacts is given after NSMCON.

(8) SM - line of residue contact data. Pairs of amino acid identifiers and residue numbers are delimited by a ';'. Residue numbers are taken from the clean coordinate file and give a correct index into the sequence (i.e. they are not necessarily the same as in the original PDB file).

(9) XX - used for spacing.

(10) // - given on the last line of the file only.

Note - SM records are used for contacts between either side-chain or main-chain atoms as defined above. In a future implementation, SS will be used for side-chain only contacts, MM will be used for main-chain only contacts, and there will probably be several other forms of contact too.

Excerpt from contacts output file

ID   D1HBBB_
XX
DE   Residue-residue side-chain contact data
XX
EX   THRESH 10.0; NMOD 1; NCHA 1;
XX
MO   [1]
XX
CN   [1]
XX
IN   ID B; NR 146; NSMCON 2514;
XX
SM   VAL 1 ; HIS 2
SM   VAL 1 ; LEU 3
SM   VAL 1 ; THR 4
SM   VAL 1 ; PRO 5
SM   VAL 1 ; GLU 6
SM   VAL 1 ; GLU 7
SM   VAL 1 ; LYS 8
SM   VAL 1 ; VAL 11
SM   VAL 1 ; PHE 71
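
As an illustration of how the SM records might be consumed downstream, the following Python sketch collects the residue pairs from a contact file. The function name is hypothetical, and the parser relies only on the 'SM' line layout shown in the excerpts on this page; every other record is skipped.

```python
def parse_sm_records(lines):
    """Return a list of ((name, number), (name, number)) residue pairs.

    Only lines starting with the 'SM' record code are read.
    """
    contacts = []
    for line in lines:
        if not line.startswith("SM "):
            continue
        # After the record code, the two residues are separated by ';'.
        first, second = line[2:].split(";")
        name_a, number_a = first.split()
        name_b, number_b = second.split()
        contacts.append(((name_a, int(number_a)), (name_b, int(number_b))))
    return contacts


sample = """\
EX   THRESH 10.0; NMOD 1; NCHA 1;
XX
SM   VAL 1 ; HIS 2
SM   VAL 1 ; PHE 71
//
"""
contacts = parse_sm_records(sample.splitlines())
print(contacts)
```

For a real file, the lines can be read with open(path) and passed to the same function.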

Data files

contacts uses a data file containing van der Waals radii for atoms in proteins (an excerpt is shown below). The file Evdw.dat is such a data file and is part of the EMBOSS distribution.

DE   File of van der Waals radii for atoms in proteins
XX
NR   24
XX
AA   ALA
XX
ID   A
XX
NN   12
XX
AT   N    ; 1.7
AT   CA   ; 1.9
AT   C    ; 1.7
AT   O    ; 1.4
AT   CB   ; 1.9
AT   OXT  ; 1.4
AT   H    ; 1.2
AT   OH   ; 1.4
AT   HA   ; 1.2
AT   HB   ; 1.2
AT   HG   ; 1.2
AT   D    ; 1.2
//
AA   ARG
XX
ID   R
XX
NN   31
XX
AT   N    ; 1.7
AT   CA   ; 1.9
AT   C    ; 1.7
AT   O    ; 1.4
AT   N    ; 1.7
**
< data omitted for clarity >
**
//
AA   XAA
XX
ID   X
XX
NN   6
XX
AT   C    ; 1.9
AT   N    ; 1.7
AT   O    ; 1.4
AT   H    ; 1.2
AT   S    ; 1.8
AT   D    ; 1.2
//              
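
A radii file in this layout can be read into a nested dictionary with a few lines of Python. This is a sketch based only on the excerpt above (AA starts a residue block, AT lines give 'atom ; radius', '//' ends the block); the function name is hypothetical, and the full Evdw.dat may contain details the excerpt does not show.

```python
def parse_vdw_radii(lines):
    """Return {residue_name: {atom_name: radius}} from AA / AT records."""
    radii = {}
    current = None
    for line in lines:
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "AA":
            # Start of a residue block, e.g. "AA   ALA".
            current = radii.setdefault(value, {})
        elif code == "AT" and current is not None:
            # Atom line, e.g. "AT   CA   ; 1.9". If an atom name is
            # repeated within a block, the last value wins.
            atom, radius = value.split(";")
            current[atom.strip()] = float(radius)
        elif code.startswith("//"):
            current = None
    return radii


sample = """\
DE   File of van der Waals radii for atoms in proteins
XX
AA   ALA
XX
ID   A
XX
NN   2
XX
AT   N    ; 1.7
AT   CA   ; 1.9
//
AA   XAA
XX
ID   X
XX
NN   1
XX
AT   S    ; 1.8
//
"""
radii = parse_vdw_radii(sample.splitlines())
print(radii)
```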

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

contacts generates a log file, an excerpt of which is shown below. If there is a problem in processing a coordinate file, three lines are written: the record '//', the SCOP domain or PDB identifier code, and an error message. The text 'WARN file open error filename', 'ERROR file read error filename' or 'ERROR file write error filename' is reported when an error is encountered during a file open, read or write respectively. Various other error messages may also be given (in case of difficulty email Jon Ison, jison@hgmp.mrc.ac.uk).

Excerpt of log file
//
DS002__
WARN  Could not open for reading cpdb file s002.pxyz
//
DS003__
WARN  Could not open for reading cpdb file s003.pxyz   
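
The three-line error entries can be picked out of a log file with a short script. The sketch below is an illustration, with a hypothetical function name, that assumes the layout shown in the excerpt: a '//' line, then the identifier, then the message. Lines outside such groups, such as the plain identifiers logged for successfully processed files, are skipped.

```python
def parse_error_log(lines):
    """Return (identifier, message) pairs for each '//'-introduced entry."""
    stripped = [line.strip() for line in lines if line.strip()]
    entries = []
    i = 0
    while i < len(stripped):
        if stripped[i] == "//" and i + 2 < len(stripped):
            # The two lines after '//' are the identifier and the message.
            entries.append((stripped[i + 1], stripped[i + 2]))
            i += 3
        else:
            i += 1
    return entries


sample = """\
//
DS002__
WARN  Could not open for reading cpdb file s002.pxyz
//
DS003__
WARN  Could not open for reading cpdb file s003.pxyz
"""
entries = parse_error_log(sample.splitlines())
for identifier, message in entries:
    print(identifier, "->", message)
```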

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name  Description
contactcount  Counts specific versus non-specific contacts in a directory of cleaned protein chain contact files
domainalign   Generates DAF files (domain alignment files) of structure-based sequence alignments for nodes in a DCF file (domain classification file)
domainrep     Reorder DCF file (domain classification file) so that the representative structure of each user-specified node is given first
domainreso    Removes low resolution domains from a DCF file (domain classification file)
interface     Reads CCF files (clean coordinate files) and writes CON files (contact files) of inter-chain residue-residue contact data
libgen        Generates various types of discriminating elements for each alignment in a directory
psiphi        Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file
rocon         Reads a DHF file (domain hits file) of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a 'hits file' for the hits, which are classified and rank-ordered on the basis of score
rocplot       Provides interpretation and graphical display of the performance of discriminating elements (e.g. profiles for protein families). rocplot reads file(s) of hits from discriminator-database search(es), performs ROC analysis on the hits, and writes graphs illustrating the diagnostic performance of the discriminating elements
seqalign      Reads a DAF file (domain alignment file) and a DHF file (domain hits file) and writes a DAF file extended with the hits
seqfraggle    Removes fragments from DHF files (domain hits files) or other files of sequences
seqsearch     Generate database hits (sequences) for nodes in a DCF file (domain classification file) by using PSI-BLAST
seqsort       Reads DHF files (domain hits files) of database hits (sequences) and removes hits of ambiguous classification
seqwords      Generates DHF files (domain hits files) of database hits (sequences) for nodes in a DCF file (domain classification file) by keyword search of UniProt
siggen        Generates a sparse protein signature from an alignment and residue contact data
sigscan       Generates a DHF file (domain hits file) of hits (sequences) from scanning a signature against a sequence database

A 'protein coordinate file' contains protein coordinate and other data extracted from a single pdb file. The files, generated by pdbparse, are in embl-like format and contain 'cleaned-up' data that is self-consistent and error-corrected.

A 'domain coordinate file' contains coordinate and other data for a single scop domain. The files are generated by domainer and are in embl-like and pdb formats.

siggen uses contacts files as input.

Author(s)

Jon Ison (jison@rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Written (2003) - Jon Ison

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.