pdbtosp
Some research applications require knowledge of the database sequence(s) that correspond to the sequence(s) given in a PDB file. A 'swissprot:pdb equivalence' file listing accession numbers and SWISS-PROT database identifiers for certain PDB codes is available, but its format is not consistent with the flat file formats used for protein structural data in EMBOSS. pdbtosp parses the swissprot:pdb equivalence file in its raw format and converts it to an EMBL-like format.
pdbtosp parses the swissprot:pdb equivalence table available at http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt (URL 1) and writes the data out to a file in EMBL-like format (Figure 2). The raw (input) file can be obtained by doing a "save as ... text format" from the web page at URL 1. No changes are made to the data other than changing the format in which it is held. The input and output files are specified by the user.
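The conversion described above can be illustrated with a minimal Python sketch. This is not the pdbtosp source code; the function names `parse_raw` and `to_embl_like` are hypothetical. It assumes that each data line of the raw file looks like `101M (08-APR-98) : MYG_PHYCA (P02185)`, that several "identifier (accession)" pairs may follow the colon on the same line, and that the output records use a two-letter code followed by three spaces. Continuation lines are not handled.

```python
import re
from typing import List, Tuple

# One data line of the raw listing, e.g.
#   101M (08-APR-98) : MYG_PHYCA (P02185)
ENTRY_LINE = re.compile(r"^(\w{4})\s+\(\d{2}-[A-Z]{3}-\d{2}\)\s*:\s*(.+)$")
# One "identifier (accession)" pair, e.g. MYG_PHYCA (P02185)
ID_ACC = re.compile(r"(\w+)\s+\((\w+)\)")

Entry = Tuple[str, List[Tuple[str, str]]]


def parse_raw(text: str) -> List[Entry]:
    """Collect (pdb_code, [(identifier, accession), ...]) from the raw listing."""
    entries: List[Entry] = []
    for line in text.splitlines():
        match = ENTRY_LINE.match(line.strip())
        if match is None:
            continue  # header, footer, or separator line
        pdb_code, rest = match.groups()
        entries.append((pdb_code, ID_ACC.findall(rest)))
    return entries


def to_embl_like(entries: List[Entry]) -> str:
    """Write EN / NE / IN records; the column spacing is an assumption."""
    out: List[str] = []
    for pdb_code, pairs in entries:
        out.append(f"EN   {pdb_code}")
        out.append("XX")
        out.append(f"NE   {len(pairs)}")
        out.append("XX")
        for identifier, accession in pairs:
            out.append(f"IN   {identifier} ID; {accession} ACC;")
        out.append("XX")
        out.append("//")
    return "\n".join(out) + "\n"
```

Calling `to_embl_like(parse_raw(raw_text))` on the text of a saved pdbtosp.txt would yield one EN/NE/IN block per PDB code, with header and footer lines skipped because they do not match the data-line pattern.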
% pdbtosp
Convert raw swissprot:PDB equivalence file to EMBL-like format
Name of raw swissprot:PDB equivalence file (input): pdbtosp.txt
Name of swissprot:PDB equivalence file (EMBL-like format output) [Epdbtosp.dat]:
Standard (Mandatory) qualifiers:
  [-infile]    infile   This option specifies the name of raw swissprot:PDB equivalence file (input). pdbtosp parses this file, which is available at URL http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt
  [-outfile]   outfile  This option specifies the name of swissprot:PDB equivalence file (EMBL-like format). This is the PDBTOSP output file.

Additional (Optional) qualifiers: (none)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:
  "-outfile" associated qualifiers
  -odirectory2   string   Output directory

General qualifiers:
  -auto      boolean   Turn off prompts
  -stdout    boolean   Write standard output
  -filter    boolean   Read standard input, write standard output
  -options   boolean   Prompt for standard and additional values
  -debug     boolean   Write debug output to program.dbg
  -verbose   boolean   Report some/full command line options
  -help      boolean   Report command line options. More information on associated and general qualifiers can be found with -help -verbose
  -warning   boolean   Report warnings
  -error     boolean   Report errors
  -fatal     boolean   Report fatal errors
  -die       boolean   Report deaths
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-infile] (Parameter 1) | This option specifies the name of raw swissprot:PDB equivalence file (input). pdbtosp parses this file, which is available at URL http://www.expasy.ch/cgi-bin/lists?pdbtosp.txt | Input file | Required |
[-outfile] (Parameter 2) | This option specifies the name of swissprot:PDB equivalence file (EMBL-like format). This is the PDBTOSP output file. | Output file | Epdbtosp.dat |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
Example input file (pdbtosp.txt)

------------------------------------------------------------------------
ExPASy Home page Site Map Search ExPASy Contact us SWISS-PROT Hosted by SIB Mirror USA[new] Switzerland sites: AustraliaCanada China Korea Taiwan
------------------------------------------------------------------------

----------------------------------------------------------------------------
SWISS-PROT Protein Knowledgebase
Swiss Institute of Bioinformatics (SIB); Geneva, Switzerland
European Bioinformatics Institute (EBI); Hinxton, United Kingdom
----------------------------------------------------------------------------

Description: Index of Protein Data Bank (PDB) entries referenced in SWISS-PROT
Name:        PDBTOSP.TXT
Release:     40.9 of 31-Jan-2002

----------------------------------------------------------------------------

The PDB database is available at the following URL:

USA: http://www.rcsb.org/pdb/
EBI: http://www2.ebi.ac.uk/pdb/

- Number of PDB entries referenced in SWISS-PROT: 9901
- Number of SWISS-PROT entries with one or more pointers to PDB: 3260

PDB   Last revision
code  date           SWISS-PROT entry name(s)
____  ___________    __________________________________________
101M  (08-APR-98)  : MYG_PHYCA (P02185)
102L  (31-OCT-93)  : LYCV_BPT4 (P00720)
102M  (08-APR-98)  : MYG_PHYCA (P02185)
103L  (31-OCT-93)  : LYCV_BPT4 (P00720)
103M  (08-APR-98)  : MYG_PHYCA (P02185)
9XIA  (15-JUL-92)  : XYLA_STRRU (P24300)
9XIM  (15-JUL-93)  : XYLA_ACTMI (P12851)

----------------------------------------------------------------------------
SWISS-PROT is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL Outstation - the European Bioinformatics Institute. There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme see: http://www.isb-sib.ch/announce/ or send an email to license@isb-sib.ch.
----------------------------------------------------------------------------

------------------------------------------------------------------------
ExPASy Home page Site Map Search ExPASy Contact us SWISS-PROT Hosted by SIB Mirror USA[new] Switzerland sites: AustraliaCanada China Korea Taiwan
------------------------------------------------------------------------
Example output file (Epdbtosp.dat)

EN   101M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   102L
XX
NE   1
XX
IN   LYCV_BPT4 ID; P00720 ACC;
XX
//
EN   102M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   103L
XX
NE   1
XX
IN   LYCV_BPT4 ID; P00720 ACC;
XX
//
EN   103M
XX
NE   1
XX
IN   MYG_PHYCA ID; P02185 ACC;
XX
//
EN   9XIA
XX
NE   1
XX
IN   XYLA_STRRU ID; P24300 ACC;
XX
//
EN   9XIM
XX
NE   1
XX
IN   XYLA_ACTMI ID; P12851 ACC;
XX
//
An excerpt from the swissprot:pdb equivalence file in embl-like format is shown below. The records used to describe an entry are as follows.
(1) EN - PDB identifier code. This is the 4-character PDB identifier code.
(2) NE - Number of entries. This is the number of accession numbers that were given for that PDB code in the equivalence file.
(3) IN - Code line. The SWISS-PROT database identifier code and accession number are given, preceding ID; and ACC; respectively.
Excerpt from embl-like format swissprot:pdb equivalence file
EN   3SDH
XX
NE   2
XX
IN   LEU3_THETH ID; P00351 ACC;
IN   LEU4_THEFF ID; P02351 ACC;
XX
//
EN   2SDH
XX
NE   1
XX
IN   LEU1_FDFTH ID; P11351 ACC;
XX
//
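A downstream script that needs to look up SWISS-PROT entries by PDB code could read this format along the following lines. This is a minimal sketch under stated assumptions, not part of EMBOSS: the function name `parse_embl_like` is hypothetical, fields are assumed to be whitespace-separated, and the NE count is not checked against the number of IN lines.

```python
from typing import Dict, List, Optional, Tuple


def parse_embl_like(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each PDB code to its list of (identifier, accession) pairs."""
    table: Dict[str, List[Tuple[str, str]]] = {}
    pdb_code: Optional[str] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "XX":
            continue  # blank or spacer line
        tag = fields[0]
        if tag == "EN":
            # EN <pdb code> starts a new entry
            pdb_code = fields[1]
            table[pdb_code] = []
        elif tag == "IN" and pdb_code is not None:
            # IN <identifier> ID; <accession> ACC;
            table[pdb_code].append((fields[1], fields[3]))
        elif tag == "//":
            pdb_code = None  # end of entry
    return table
```

For the excerpt above, the entry for `3SDH` would hold two (identifier, accession) pairs and the entry for `2SDH` one.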
Program name | Description |
---|---|
aaindexextract | Extract data from AAINDEX |
allversusall | Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values |
cathparse | Reads raw CATH classification files and writes DCF file (domain classification file) |
cutgextract | Extract data from CUTG |
domainer | Reads CCF files (clean coordinate files) for proteins and writes CCF files for domains, taken from a DCF file (domain classification file) |
domainnr | Removes redundant domains from a DCF file (domain classification file). The file must contain domain sequence information, which can be added by using DOMAINSEQS |
domainseqs | Adds sequence records to a DCF file (domain classification file) |
domainsse | Adds secondary structure records to a DCF file (domain classification file) |
hetparse | Converts raw dictionary of heterogen groups to a file in EMBL-like format |
pdbparse | Parses PDB files and writes CCF files (clean coordinate files) for proteins |
pdbplus | Add residue solvent accessibility and secondary structure data to a CCF file (clean coordinate file) for a protein or domain |
printsextract | Extract data from PRINTS |
prosextract | Builds the PROSITE motif database for patmatmotifs to search |
rebaseextract | Extract data from REBASE |
scopparse | Reads raw SCOP classification files and writes a DCF file (domain classification file) |
seqnr | Removes redundancy from DHF files (domain hits files) or other files of sequences |
sites | Reads CCF files (clean coordinate files) and writes CON files (contact files) of residue-ligand contact data for domains in a DCF file (domain classification file) |
ssematch | Searches a DCF file (domain classification file) for secondary structure matches |
tfextract | Extract data from TRANSFAC |
scopseqs uses the pdbtosp output file as input.