EMBOSS: tfextract
TRANSFAC started in 1988 as a printed compilation (Nucleic Acids Res. 16: 1879-1902, 1988) and was transferred into computer-readable format in 1990 (BioTechForum - Advances in Molecular Genetics (J. Collins, A.J. Driesel, eds.) 4:95-108, 1991). The basic structures of Tables 1 and 2 of the compilation were taken as the core of the emergent database. The aim of the early compilation as well as of the TRANSFAC database is
The program tfextract extracts data from the TRANSFAC database file site.dat. This file contains information on individual (putative) regulatory protein binding sites. About half of these refer to sites within eukaryotic genes. Just under half of them resulted from mutagenesis studies, in vitro selection procedures starting from random oligonucleotide mixtures, or specific theoretical considerations. Finally, about 5% are consensus binding sequences given in the IUPAC code, many of them taken from the compilation of Faisst and Meyer (Nucleic Acids Res. 20:3-26, 1992). A number of consensus sequences have been generated by the TRANSFAC team, generally derived from the profiles stored in the MATRIX table.
The data is split up by taxonomic group (fungi, insect, plant, vertebrate and other) and placed in individual files: tffungi, tfinsect, tfplant, tfvertebrate and tfother.
These files are stored in the EMBOSS data directory, see Data Files below.
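The splitting step can be illustrated with a minimal Python sketch. It is not the tfextract implementation. It assumes a flat-file layout with two-letter line codes, an `AC` accession line, an `OC` classification line and `//` as the entry terminator, and it uses a hypothetical keyword table to pick the group. The sample records and accession values are synthetic. Only the output file names are taken from this page.

```python
from collections import defaultdict

# Hypothetical keyword-to-file mapping; the real classification rules used
# by tfextract are not described in this document.
GROUP_KEYWORDS = {
    "fungi": "tffungi",
    "insecta": "tfinsect",
    "plantae": "tfplant",
    "vertebrata": "tfvertebrate",
}


def classify(classification):
    """Return the output file name for one classification string."""
    lowered = classification.lower()
    for keyword, group in GROUP_KEYWORDS.items():
        if keyword in lowered:
            return group
    return "tfother"  # anything unmatched goes to the catch-all file


def split_by_group(text):
    """Group entry accessions by the taxonomic keyword on their OC lines."""
    groups = defaultdict(list)
    entry = {}
    for line in text.splitlines():
        if line.startswith("//"):  # assumed end-of-entry marker
            if "AC" in entry:
                groups[classify(entry.get("OC", ""))].append(entry["AC"])
            entry = {}
            continue
        code, _, value = line.partition("  ")
        if code in ("AC", "OC"):
            entry[code] = (entry.get(code, "") + " " + value).strip()
    return dict(groups)


# Synthetic sample, not real TRANSFAC data.
SAMPLE = """AC  R00001
OC  eukaryota; animalia; metazoa; chordata; vertebrata
//
AC  R00002
OC  eukaryota; fungi; ascomycotina
//
AC  R00003
OC  viridae; dsDNA viruses
//
"""

print(split_by_group(SAMPLE))
```

An entry that is not closed by `//` is ignored in this sketch, which keeps the parser simple at the cost of dropping a truncated final record.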
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /data/transfac/site.dat
Mandatory qualifiers:
  [-inf] infile Full pathname of transfac SITE.DAT
Optional qualifiers: (none)
Advanced qualifiers: (none)
General qualifiers:
  -help bool report command line options. More information on associated and general qualifiers can be found with -help -verbose
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Mandatory qualifiers | | | |
[-inf] (Parameter 1) | Full pathname of transfac SITE.DAT | Input file | Required |
Optional qualifiers | | | |
(none) | | | |
Advanced qualifiers | | | |
(none) | | | |
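For scripted use, the single mandatory qualifier can be passed on the command line instead of answering the prompt. The following Python sketch shows one way to wrap that call. The helper names `tfextract_command` and `run_tfextract` are hypothetical, and the sketch assumes that the `tfextract` executable is on the `PATH`; only the `-inf` qualifier comes from the listing above.

```python
import shutil
import subprocess


def tfextract_command(site_dat):
    """Build the argument list for a tfextract run with -inf supplied."""
    return ["tfextract", "-inf", site_dat]


def run_tfextract(site_dat):
    """Run tfextract if it is installed; return True when it was run."""
    if shutil.which("tfextract") is None:
        return False  # EMBOSS is not available on this machine
    # check=True raises CalledProcessError if tfextract exits with an error.
    subprocess.run(tfextract_command(site_dat), check=True)
    return True


print(" ".join(tfextract_command("/data/transfac/site.dat")))
```

Passing the arguments as a list avoids shell quoting problems when the path to site.dat contains spaces.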
http://transfac.gbf.de/cgi-bin/download/download.pl
These files are used by the tfscan program to search for TRANSFAC sites in sequences.
% ls -1s emboss/data/tf*
18 emboss/data/tffungi
17 emboss/data/tfinsect
56 emboss/data/tfother
4 emboss/data/tfplant
112 emboss/data/tfvertebrate
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
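The directories are searched in the following order: the current directory (.), .embossdata under the current directory, the home directory (~/), and ~/.embossdata.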
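A first-match lookup over such a list of directories can be sketched in Python as follows. This is an illustration, not the EMBOSS code. It assumes the order is: current directory, its ".embossdata" subdirectory, the home directory, then its ".embossdata" subdirectory, with the standard data directory tried last when one is given. The function names and the temporary directory layout in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def candidate_dirs(cwd, home, emboss_data=None):
    """List the directories to search, in the assumed order."""
    cwd, home = Path(cwd), Path(home)
    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:
        dirs.append(Path(emboss_data))  # assumed to come last
    return dirs


def find_data_file(name, dirs):
    """Return the first existing copy of a data file, or None."""
    for directory in dirs:
        path = Path(directory) / name
        if path.is_file():
            return path
    return None


# Demo: a project-level copy shadows the copy in the home directory.
with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp) / "project"
    home = Path(tmp) / "home"
    (cwd / ".embossdata").mkdir(parents=True)
    (home / ".embossdata").mkdir(parents=True)
    (home / ".embossdata" / "tfplant").write_text("home copy\n")
    (cwd / ".embossdata" / "tfplant").write_text("project copy\n")
    found = find_data_file("tfplant", candidate_dirs(cwd, home))
    demo_result = found.relative_to(tmp).as_posix()

print(demo_result)
```

Because the search stops at the first match, a modified copy of a file such as tfplant in a project directory takes precedence over a copy kept for all runs in the home directory.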
Program name | Description |
---|---|
cutgextract | Extract data from CUTG |
domainer | Build domain coordinate files |
nrscope | Converts redundant EMBL-format SCOP file to non-redundant one |
pdbtosp | Convert raw swissprot:pdb equivalence file to embl-like format |
printsextract | Extract data from PRINTS |
prosextract | Builds the PROSITE motif database for patmatmotifs to search |
rebaseextract | Extract data from REBASE |
scope | Convert raw scop classification file to embl-like format |
scopparse | Reads raw scop classifications file and writes embl-like format scop classification file |
seqnr | Converts redundant database results to a non-redundant set of hits |