SEQSEARCH documentation |
ID D1CS4A_
XX
EN 1CS4
XX
SI 53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Adenylyl and guanylyl cyclase catalytic domain
XX
FA Adenylyl and guanylyl cyclase catalytic domain
XX
DO Adenylyl cyclase VC1, domain C1a
XX
OS Dog (Canis familiaris)
XX
NC 1
XX
CN [1]
XX
CH A CHAIN; . START; . END;
//
ID D1II7A_
XX
EN 1II7
XX
SI 53931 CL; 56299 FO; 56300 SF; 64427 FA; 64428 DO; 64429 SO; 62415 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Metallo-dependent phosphatases
XX
SF Metallo-dependent phosphatases
XX
FA DNA double-strand break repair nuclease
XX
DO Mre11
XX
OS Archaeon Pyrococcus furiosus
XX
NC 1
XX
CN [1]
XX
CH A CHAIN; . START; . END;
//
>ODO2_FUGRU Q90512 DIHYDROLIPOAMIDE SUCCINYLTRANSFERASE COMPONENT OF 2-OXOGLUTARATE DEHYDROGENASE COMPLEX PRECURSOR (EC 2.3.1.61) (E2) (E2K) (FRAGMENT). SSVCRRLIFRTSRPGERASSQNSFHVRYFRTSVVHRDDLVTVKTPAFAESVTEGDVRWEK AVGDSVTEDEVVCEIETDKTSVQVPSPAAGVIEELLVPDGGKVEGGTPLFKLRKGAAAEA APSSVTEPVTAAPPPPPPPVSAPTAMPSVPPVPTQALQAKPVPAPTLPEPSTLGGRGESR VKMSRMRLRIAQRLKEAQNTCAMLTTFNEVDMSNIQEMRTLHKDAFLKKHSIKLGFMSAF VKAAAHALTDQPAVNAVIDGATNEIVYRDYVDISVAVATPKGLVVPVIRNVETMNFADIE RTINALGEKARNNELAVEDMDGGTFTISNGGVFGSLFGTPIINPPQSAILGMHGIFQRPV AVDGKAEIRPMMYVALTYDHRLVDGREAVTFLRKIKAAVEDPRALLLDM >TM21_FUGRU Q90515 TRANSMEMBRANE PROTEIN TMP21 PRECURSOR (S31III125). MARLTALLFLPVLIESAFSISFFLPVNTRKCLREEIHKDVLVTGEYEISEQVVTVHTSST VVGDGSIFKITDSSSHTLYSKEDATKGKFAFTTEDYDMFEVCFESKCTGRVPDQLVNLDM KHGVEAKNYEEIAKVEKLKPLEVELRRLEDLSESIVNDFAYMKKREEEMRDTNESTNTRV LYFSIFSMFCLIGLATWQVFYLRRFFKAKKLIE >CO9_FUGRU P79755 COMPLEMENT COMPONENT C9 PRECURSOR. MRTEAALQLGFCALCVMLALLGEGMGRELPDPPAVNCVWSRWAPWSSCDPCTNTRRRSRG VEVFGQFAGIACQGSVGDREYCITNAKCNLPPPRECSDSEFQCESGSCIKLRLKCNGDYD CEDGSDEDCEPLRKTCPPTVLDTNEQGRTAGYGINILGADPRMNPFNNDFFNGRCDKVRN PNTLQLDRLPWNIGVLNYQTLVEETASREIYEDSYSLLREMLKEMSIKVDAGLSFKFKST EPSMSNNSLKLDASLEYEKKTMIKDVSELTNIKNKSFMRVKGRLQLSTYRMRSHQLQVAD EFVAHVKSLPLEYEKGIYYAFLEDYGTHYTKNGKSGGEYELVYVLNQDTIKAKNLTERKI QECLKIGIEAEFATTSVQDGKAHAKLNKCDDVTTKSQGDVEGKAVVDNVMTSVKGGSLES AVTMRAKLNKEGVMDIATYQNWARTIASAPALINSEPEPIYMLIPTDIPGANSRIANLKQ ATADYVAEYNVCKCRPCHNGGTLALLDGKCICMCSNLFEGLGCQNFKGDKARVPAARPAV TQEGNWSCWSSWSNCQGQKRSRTRYCNTEGVLGAECRGEIRSEEYC >E2BB_FUGRU Q90511 TRANSLATION INITIATION FACTOR EIF-2B BETA SUBUNIT (EIF-2B GDP-GTP EXCHANGE FACTOR) (S20I15). 
MPGADKEVDLTERIEAFLSDLKRGGSGTGPLRGSSETARETTALLRRITAQARWSSAGDL MEIIRKEGRRLIAAQPSETTVGNMIRRVLKIIREEYARSRGSSEEADQQESLHKLLTSGG LSEENFRQHFAALRANVIEAINELLTELEGTTDNIAMQALEHIHSNEVIMTVGRSRTVEA FLKDAARKRKFHVIVAECAPFCQGHKMATSLSTAGIETTVIADAAIFAVMSRVNKVIIGT QTVLANGGLRAVNGTHTLALAAKHHSTPLIVCAPMFKLSPQFPNEEDTFHKFVSPHEVLP FTEGEILSKVNVHCPVFDYVPPELITLFISNIGGHAPSYIYRLMSELYHPEDHEL >FOS_FUGRU P53450 P55-C-FOS PROTO-ONCOGENE PROTEIN. MMFTSFNAECDSSSRCSASPVGDNLYYPSPAGSYSSMGSPQSQDFTDLTASSASFIPTVT AISTSPDLQWMVQPLISSVAPSHRAHPYSPSPSYKRTVMRSAASKAHGKRSRVEQTTPEE EEKKRIRRERNKQAAAKCRNRRRELTDTLQAETDQLEDEKSSLQNDIANLLKEKERLEFI LAAHQPICKIPSQMDTDFSVVSMSPVHACLSTTVSTQLQTSIPEATTVTSSHSTFTSTSN SIFSGSSDSLLSTATVSNSVVKMTDLDSSVLEESLDLLAKTEAETARSVPDVNLSNSLFA AQDWEPLHATISSSDFEPLCTPVVTCTPACTTLTSSFVFTFPEAETFPTCGVAHRRRSNS NDQSSDSLSSPTLLAL >FABP_FUGRU O42494 FATTY ACID BINDING PROTEIN. MSFSGKYQQVSQENFEPFMKAIGLPDEVIQQVKELKSTSEIEQNGNDFKITITTGPKVTV NKFTIGKETEMDTITGEKIKTVFHLDGNKLKVSLKGIESVTELADPNTITMTLGDVVYKT TSKRM >KPB2_FUGRU Q9W6R1 PHOSPHORYLASE B KINASE ALPHA REGULATORY CHAIN, LIVER ISOFORM (PHOSPHORYLASE KINASE ALPHA L SUBUNIT). 
MRSRSNSGVRLDGYARLVHETILGFQNPVTGLLPASVQKKDAWVRDNVYSILAVWGLGMA YRKNADRDEDKAKAYELEQSVVKLMQGLLHCMMRQVAKVEKFKHTQSTTDCLHAKYDTST CATVVGDDQWGHLQVDATSIYLLMLAQMTASGLRIISNLDEVAFVQNLVFYIEAAYKVAD YGMWERGDKTNQGLPELNGSSVGMAKAALEAIDELDLFGAHGGPKSVIHVLPDEVEHCQS ILCSMLPRASPSKEIDAGLLSVISFPAFAVEDADLVTITKSEIINKLQGRYGCCRFIRDG YHCPKEDPTRLHYDPAELKLFENIECEWPVFWTYLILDGIFAEDQVQVQEYREALEGVLI RGKNGIKLLPELYTVPFDKVEEEYRNPHSVDREATGQLPHMWEQSLYILGCLLAEGFLAP GEIDPLNRRFSTSFKPDVVVQVCVLAESQEIKALLSEQGMVVQTVAEVLPIRVMSARVLS QIYVRLGNCKKLSLSGRPYRHIGVLGTSKFYEIRNHTYTFTPQFLDQHHFYLALDNQMIV EMLRTELAYLSSCWRMTGRPTLTFPVTRSMLVEDGDAVDPCILSTLRKLQDGYFAGARVQ MSDLSTFQTTSFHTRLSFLDEEHDDSLLEDDEEQEEEEEDKFEDDYNNYGPSGNNQVCYV SKDKFDQYLTQLLHSTTQKCHLPPIQRGQHHVFSAEHTTRDILSFMAQVQGLNVPKSSMY LPVTPLKSKHRRSLNLLDVPHPQHGPHLKQNKVGTFNSVLAADLHLPRDPQGKTDFATLV KQLKECPTLQDQADILYILNTSKGADWLVELSGPGQGGVSVHTLLEELYIQAGACKEWGL IRYISGILRKRVEVLAEACTDLISHHKQLTVGLPPEPRERVITVPLPPEELNTLIYEASG QDISVAVLTQEIMVYLAMYIRSQPALFGDMLRLRIGLIMQVMATELARSLHCSGEEASES LMSLSPFDMKNLLHHILSGKEFGVERSMRPIQSTATSPAISIHEIGHTGATKTERTGIRK LKSEIKQRCSSPSTPSGILSPVGPGPADGQLHWVERQGQWLRRRRLDGAINRVPVGFYQK VWKILQKCHGLSIDGYVLPSSTTREMTAGEIKFAVQVESVLNHVPQPEYRQLLVESVMVL GLVADVDVESIGSIIYVDRILHLANDLFLTDQKSYSAGDYFLEKDPETGICNFFYDSAPS GIYGTMTYLSKAAVTYIQDFLPSSSCIMQ >NEUI_FUGRU O42493 ISOTICIN-NEUROPHYSIN IT 1 PRECURSOR [CONTAINS: ISOTOCIN (IT); NEUROPHYSIN IT 1]. MTGTAISVCLLFLLSVCSACYISNCPIGGKRSIMDAPQRKCMSCGPGDRGRCFGPGICCG ESFGCLMGSPESARCAEENYLLTPCQAGGRPCGSEGGLCASSGLCCDAESCTMDQSCLSE EEGDERGSLFDGSDSGDVILKLLRLAGLTSPHQTH >NEUV_FUGRU O42499 VASOTOCIN-NEUROPHYSIN VT 1 PRECURSOR [CONTAINS: VASOTOCIN (VT); NEUROPHYSIN VT 1]. MPQCALLLSLLGLLALSSACYIQNCPRGGKRALPETGIRQCMSCGPRDRGRCFGPNICCG EALGCLMGSPETARCAGENYLLTPCQAGGRPCGSEGGRCAVSGLCCNSESCAVDSDCLGE TESLEPGDSSADSSPTELLLRLLHMSSRGQSEY |
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
SI 54894
XX
NS 0
XX

CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Adenylyl and guanylyl cyclase catalytic domain
XX
FA Adenylyl and guanylyl cyclase catalytic domain
XX
SI 55074
XX
NS 0
XX

// /ebi/services/idata/pmr/hgmp/test/data/structure/54894.salign
// /ebi/services/idata/pmr/hgmp/test/data/structure/55074.salign
Standard (Mandatory) qualifiers:
-mode menu This option specifies the mode of SEQSEARCH operation. SEQSEARCH takes as input a directory of either i. single sequences or ii. sets of sequences (unaligned or aligned, but typically aligned sequences within a domain alignment file). The user has to specify which.
[-inseqspath] dirlist This option specifies the location of sequences, e.g. DAF files (domain alignment files) (input). SEQSEARCH takes as input a database of either i. single sequences, ii. sets of unaligned sequences or iii. sets of aligned sequences, e.g. a domain alignment file. A 'domain alignment file' contains a sequence alignment of domains belonging to the same SCOP or CATH family. The file is in clustal format annotated with domain family classification information. The files generated by using SCOPALIGN will contain a structure-based sequence alignment of domains of known structure only. Such alignments can be extended with sequence relatives (of unknown structure) by using SEQALIGN.
[-database] string Name of BLAST-indexed database to search.
-niter integer This option specifies the number of PSIBLAST iterations that are performed in a search.
-evalue float This option specifies the threshold E-value for a PSIBLAST hit to be retained, i.e. for inclusion in the family.
-maxhits integer This option specifies the maximum number of PSIBLAST hits that are retained. It should normally be set high so that nothing is discarded.
[-dhfoutdir] outdir This option specifies the location of DHF files (domain hits files) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in FASTA format. The hits are relatives of a SCOP or CATH family and are found from a search of a sequence database. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH.
-logfile outfile This option specifies the name of the log file for the build. The log file contains messages about any errors arising while SEQSEARCH ran.

Additional (Optional) qualifiers: (none)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:
"-logfile" associated qualifiers
-odirectory string Output directory

General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report deaths
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
-mode | This option specifies the mode of SEQSEARCH operation. SEQSEARCH takes as input a directory of either i. single sequences or ii. sets of sequences (unaligned or aligned, but typically aligned sequences within a domain alignment file). The user has to specify which. | | 1 |
[-inseqspath] (Parameter 1) | This option specifies the location of sequences, e.g. DAF files (domain alignment files) (input). SEQSEARCH takes as input a database of either i. single sequences, ii. sets of unaligned sequences or iii. sets of aligned sequences, e.g. a domain alignment file. A 'domain alignment file' contains a sequence alignment of domains belonging to the same SCOP or CATH family. The file is in clustal format annotated with domain family classification information. The files generated by using SCOPALIGN will contain a structure-based sequence alignment of domains of known structure only. Such alignments can be extended with sequence relatives (of unknown structure) by using SEQALIGN. | Directory with files | ./ |
[-database] (Parameter 2) | Name of BLAST-indexed database to search. | Any string is accepted | swissprot |
-niter | This option specifies the number of PSIBLAST iterations that are performed in a search. | Any integer value | 1 |
-evalue | This option specifies the threshold E-value for a PSIBLAST hit to be retained, i.e. for inclusion in the family. | Any numeric value | 0.001 |
-maxhits | This option specifies the maximum number of PSIBLAST hits that are retained. It should normally be set high so that nothing is discarded. | Any integer value | 1000 |
[-dhfoutdir] (Parameter 3) | This option specifies the location of DHF files (domain hits files) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in FASTA format. The hits are relatives of a SCOP or CATH family and are found from a search of a sequence database. Files containing hits retrieved by PSIBLAST are generated by using SEQSEARCH. | Output directory | ./ |
-logfile | This option specifies the name of the log file for the build. The log file contains messages about any errors arising while SEQSEARCH ran. | Output file | seqsearch.log |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
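The qualifiers listed above can also be supplied on the command line instead of being answered at the prompts. The following is a minimal sketch, not part of SEQSEARCH itself, of how a script might assemble such a command line in Python. The function name `build_seqsearch_args` is hypothetical; the qualifier names and defaults are taken from the table, and `-auto` is the general qualifier that turns off prompts. The sketch only builds the argument list and does not run the program.

```python
import shlex


def build_seqsearch_args(mode=1, inseqspath="./", database="swissprot",
                         niter=1, evalue=0.001, maxhits=1000,
                         dhfoutdir="./", logfile="seqsearch.log"):
    """Return an argument list for a non-interactive seqsearch run.

    Hypothetical helper: the defaults mirror the qualifier table, and
    -auto is appended so that no prompts are issued.
    """
    return [
        "seqsearch",
        "-mode", str(mode),
        "-inseqspath", inseqspath,
        "-database", database,
        "-niter", str(niter),
        "-evalue", str(evalue),
        "-maxhits", str(maxhits),
        "-dhfoutdir", dhfoutdir,
        "-logfile", logfile,
        "-auto",
    ]


if __name__ == "__main__":
    # Print the command line as it would be typed in a shell.
    print(shlex.join(build_seqsearch_args(database="swnew", maxhits=100)))
```

The list can be passed to `subprocess.run` on a system where EMBOSS and the embassy package providing seqsearch are installed.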
% seqsearch
Generate files of hits for families in a scop classification file by using PSI-BLAST with seed alignments.
Name of scop classification file (embl format input): all.scop
Location of scop alignment files (input) [./]: structure
Extension of scop alignment files (input) [.salign]:
Name of BLAST-indexed database to search [swissprot]: swnew
Number of PSIBLAST iterations [1]: 1
Threshold E-value for inclusion in family [0.001]: 0.001
Maximum number of hits [1000]: 100
Location of scop hits files (output) [./]:
Extension of scop hits files (output) [.hits]:
Name of log file for the build [seqsearch.log]:
[blastpgp] WARNING: posFindAlignmentDimensions: Attempting to recover data from multiple alignment file
[blastpgp] WARNING: posProcessAlignment: Alignment recovered successfully
[blastpgp] WARNING: posFindAlignmentDimensions: Attempting to recover data from multiple alignment file
[blastpgp] WARNING: posProcessAlignment: Alignment recovered successfully
PROCESSING /ebi/services/idata/pmr/hgmp/test/data/structure/54894.salign
blastpgp -i ./seqsearch-1095239004.25358.seqin -B ./seqsearch-1095239004.25358.seqsin -j 1 -e 0.001000 -b 100 -v 100 -d ../../data/swnew > ./seqsearch-1095239004.25358.psiout
PROCESSING /ebi/services/idata/pmr/hgmp/test/data/structure/55074.salign
blastpgp -i ./seqsearch-1095239004.6149.seqin -B ./seqsearch-1095239004.6149.seqsin -j 1 -e 0.001000 -b 100 -v 100 -d ../../data/swnew > ./seqsearch-1095239004.6149.psiout
All domain alignment files (with the file extension .daf specified
in the ACD file) were read from the directory /test_data; in this case
two domain alignment files, 54894.salign and 55074.salign, were read.
Sets of sequences extracted from these files were used to search the
sequence database swissprot by using PSIBLAST. PSIBLAST was
configured to perform 1 iteration with a threshold E-value for
acceptance of a hit of 0.001, and no more than 100 hits were generated
from each iteration. Domain hits files were written to
/test_data/seqsearch (the file extension .dhf was specified in the
ACD file); in this case two files, /test_data/54894.dhf and
/test_data/55074.dhf, were written. A log file called
/test_data/seqsearch/seqsearch.log was also written.
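The example session shows the blastpgp command that SEQSEARCH issues for each input file. As an illustration only, the sketch below reproduces the shape of that command from the search parameters. The function names are hypothetical, and the mapping of qualifiers to blastpgp options (-niter to -j, -evalue to -e, -maxhits to both -b and -v, -database to -d) is inferred from the commands echoed in the session rather than from the SEQSEARCH source code.

```python
import subprocess


def blastpgp_command(seqin, seqsin, database, niter=1, evalue=0.001, maxhits=1000):
    """Build a blastpgp argument list shaped like the one echoed by SEQSEARCH.

    seqin  : file holding the query sequence (-i)
    seqsin : file holding the seed alignment (-B)
    """
    return [
        "blastpgp",
        "-i", seqin,
        "-B", seqsin,
        "-j", str(niter),       # number of iterations
        "-e", f"{evalue:f}",    # E-value threshold, printed as 0.001000
        "-b", str(maxhits),     # maximum number of alignments reported
        "-v", str(maxhits),     # maximum number of one-line descriptions
        "-d", database,
    ]


def run_blastpgp(cmd, psiout):
    """Run the command and redirect standard output to psiout (needs blastpgp installed)."""
    with open(psiout, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, `blastpgp_command("q.seqin", "q.seqsin", "../../data/swnew", maxhits=100)` yields the same options as the commands in the session, apart from the temporary file names.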
FILE TYPE | FORMAT | DESCRIPTION | CREATED BY | SEE ALSO |
---|---|---|---|---|
Domain hits file | DHF format (FASTA-like format with domain classification information). | Database hits (sequences) with domain classification information. The hits are relatives of a SCOP or CATH family and are found from a search of a sequence database. | SEQSEARCH (hits retrieved by PSIBLAST) | N.A. |
Domain alignment file | DAF format (CLUSTAL-like format with domain classification information). | Contains a sequence alignment of domains belonging to the same SCOP or CATH family. The file is annotated with domain family classification information. | DOMAINALIGN (structure-based sequence alignment of domains of known structure). | DOMAINALIGN alignments can be extended with sequence relatives (of unknown structure) to the family in question by using SEQALIGN. |
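Because the domain hits file is described as FASTA-like, its records can in principle be read with a generic FASTA reader. The sketch below is a rough illustration under that assumption: it treats any line starting with `>` as a record header and joins the following lines into one sequence. It does not interpret the domain classification information, whose exact layout in the header is not specified here, and the function name `read_fasta_like` is hypothetical.

```python
def read_fasta_like(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-like text lines.

    Assumption: a record starts at a line beginning with '>' and its
    sequence is the concatenation of the non-blank lines that follow.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Toy input, not taken from a real domain hits file.
    sample = [">hit1 example", "MSFSGK", "YQQV", "", ">hit2", "MTGT"]
    for name, seq in read_fasta_like(sample):
        print(name, len(seq))
```

A file can be read by passing an open file object, since the reader iterates over lines.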
Program name | Description |
---|---|
contactcount | Counts specific versus non-specific contacts in a directory of cleaned protein chain contact files |
contacts | Reads CCF files (clean coordinate files) and writes CON files (contact files) of intra-chain residue-residue contact data |
domainalign | Generates DAF files (domain alignment files) of structure-based sequence alignments for nodes in a DCF file (domain classification file) |
domainrep | Reorder DCF file (domain classification file) so that the representative structure of each user-specified node is given first |
domainreso | Removes low resolution domains from a DCF file (domain classification file) |
interface | Reads CCF files (clean coordinate files) and writes CON files (contact files) of inter-chain residue-residue contact data |
libgen | Generates various types of discriminating elements for each alignment in a directory |
psiphi | Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file |
rocon | Reads a DHF file (domain hits file) of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a 'hits file' for the hits, which are classified and rank-ordered on the basis of score |
rocplot | Provides interpretation and graphical display of the performance of discriminating elements (e.g. profiles for protein families). rocplot reads file(s) of hits from discriminator-database search(es), performs ROC analysis on the hits, and writes graphs illustrating the diagnostic performance of the discriminating elements |
seqalign | Reads a DAF file (domain alignment file) and a DHF file (domain hits file) and writes a DAF file extended with the hits |
seqfraggle | Removes fragments from DHF files (domain hits files) or other files of sequences |
seqsort | Reads DHF files (domain hits files) of database hits (sequences) and removes hits of ambiguous classification |
seqwords | Generates DHF files (domain hits files) of database hits (sequences) for nodes in a DCF file (domain classification file) by keyword search of UniProt |
siggen | Generates a sparse protein signature from an alignment and residue contact data |
sigscan | Generates a DHF file (domain hits file) of hits (sequences) from scanning a signature against a sequence database |
See also http://emboss.sourceforge.net/