org.biojava.bio.seq.db
Class EmblCDROMIndexStore

java.lang.Object
  extended by org.biojava.bio.seq.db.EmblCDROMIndexStore
All Implemented Interfaces:
IndexStore

public class EmblCDROMIndexStore
extends Object
implements IndexStore

EmblCDROMIndexStores implement a read-only IndexStore backed by EMBL CD-ROM format binary indices. The required index files are typically named "division.lkp" and "entrynam.idx". As an IndexStore performs lookups by sequence ID, the index files "acnum.trg" and "acnum.hit" (which store additional accession number data) are not used.

The sequence IDs are found using a binary search via a pointer into the index file. The whole file is not read unless a request for all the IDs is made using the getIDs() method. The set of IDs is then cached after the first pass. This class also has a close() method to free resources associated with the underlying RandomAccessFile.
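The binary-search lookup described above can be sketched in plain Java. This is a minimal illustration, not BioJava's code. It assumes the ID records are exposed through a hypothetical recordAt(i) accessor (ID plus byte offset, sorted by ID), which stands in for seeking to record i in entrynam.idx. The real record layout is not described on this page, and the IDs and offsets below are made up.

```java
public class BinaryIdSearch {
    /** Hypothetical fixed record: a sequence ID and the byte offset of its entry. */
    static final class IdRecord {
        final String id;
        final long offset;
        IdRecord(String id, long offset) { this.id = id; this.offset = offset; }
    }

    private final IdRecord[] records; // must be sorted by id, as in a binary index
    private int reads = 0;            // counts record accesses, to show only O(log n) are touched

    BinaryIdSearch(IdRecord[] sortedRecords) { this.records = sortedRecords; }

    /** Stand-in for seeking to record i in the index file and reading it. */
    private IdRecord recordAt(int i) {
        reads++;
        return records[i];
    }

    /** Returns the offset stored for id, or -1 if the id is absent. */
    long lookup(String id) {
        int lo = 0, hi = records.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;          // overflow-safe midpoint
            IdRecord r = recordAt(mid);
            int cmp = r.id.compareTo(id);
            if (cmp == 0) return r.offset;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    int reads() { return reads; }

    public static void main(String[] args) {
        IdRecord[] recs = {
            new IdRecord("AB000001", 0L),
            new IdRecord("AB000002", 1520L),
            new IdRecord("AB000003", 3377L),
            new IdRecord("AB000004", 5012L),
        };
        BinaryIdSearch s = new BinaryIdSearch(recs);
        System.out.println(s.lookup("AB000003")); // prints 3377
        System.out.println(s.lookup("ZZ999999")); // prints -1
    }
}
```

Only the records on the search path are read. This is why a single fetch does not need to scan the whole index.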

The binary index files may be created using the EMBOSS programs dbifasta, dbiblast, dbiflat or dbigcg. The least useful from the BioJava perspective is dbigcg because we do not have a SequenceFormat implementation for GCG format files.

The Index instances returned by this class do not have the record length set because this information is not available in the binary index. The value -1 is used instead, as described in the Index interface.
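The -1 convention means that callers must check the record length before using it. The following is a minimal sketch that uses a hypothetical helper, bytesAvailable (not part of BioJava). When the length is unknown, it assumes the caller falls back to reading up to the end of the file and relies on the sequence parser to stop at the end of the entry.

```java
public class RecordLengthSketch {
    /** Sentinel for "record length not known", as used by this index store. */
    static final long UNKNOWN_LENGTH = -1L;

    /**
     * Hypothetical helper: the number of bytes a reader may consume for an entry
     * starting at start. With an unknown length the only bound is the end of the
     * file, so the parser must detect the end of the entry itself.
     */
    static long bytesAvailable(long start, long length, long fileLength) {
        if (start < 0 || start > fileLength) {
            throw new IllegalArgumentException("start outside file: " + start);
        }
        if (length == UNKNOWN_LENGTH) {
            return fileLength - start; // length unknown: bounded only by end of file
        }
        return Math.min(length, fileLength - start);
    }

    public static void main(String[] args) {
        System.out.println(bytesAvailable(100L, 250L, 10_000L));           // prints 250
        System.out.println(bytesAvailable(100L, UNKNOWN_LENGTH, 10_000L)); // prints 9900
    }
}
```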

Since:
1.2
Author:
Keith James

Constructor Summary
EmblCDROMIndexStore(File pathPrefix, File divisionLkp, File entryNamIdx, SequenceFormat format, SequenceBuilderFactory factory, SymbolTokenization parser)
          Creates a new EmblCDROMIndexStore backed by a random access binary index.
EmblCDROMIndexStore(File divisionLkp, File entryNamIdx, SequenceFormat format, SequenceBuilderFactory factory, SymbolTokenization parser)
          Creates a new EmblCDROMIndexStore backed by a random access binary index.
 
Method Summary
 void close()
          close closes the underlying EntryNamRandomAccess, which in turn closes the lower-level RandomAccessFile.
 void commit()
          commit commits changes.
 Index fetch(String id)
          Fetch an Index based upon an ID.
 Set getFiles()
          Retrieve the Set of files that are currently indexed.
 SequenceFormat getFormat()
          Retrieve the format of the index file.
 Set getIDs()
          Retrieve the set of all current IDs.
 String getName()
          getName returns the database name as defined within the EMBL CD-ROM index.
 File getPathPrefix()
          getPathPrefix returns the abstract path currently being prepended to the raw sequence database filenames extracted from the binary index.
 SequenceBuilderFactory getSBFactory()
          Retrieve the SequenceBuilderFactory used to build Sequence instances.
 SymbolTokenization getSymbolParser()
          Retrieve the symbol parser used to turn the sequence characters into Symbol objects.
 void rollback()
          rollback rolls back changes made since the last commit.
 void setPathPrefix(File pathPrefix)
          setPathPrefix sets the abstract path to be prepended to sequence database filenames retrieved from the binary index.
 void store(Index index)
          store adds an Index to the store.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EmblCDROMIndexStore

public EmblCDROMIndexStore(File divisionLkp,
                           File entryNamIdx,
                           SequenceFormat format,
                           SequenceBuilderFactory factory,
                           SymbolTokenization parser)
                    throws IOException
Creates a new EmblCDROMIndexStore backed by a random access binary index.

Parameters:
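divisionLkp - a File containing the master index (typically "division.lkp").
entryNamIdx - a File containing the sequence IDs and offsets (typically "entrynam.idx").
format - the SequenceFormat of the indexed sequence files.
factory - the SequenceBuilderFactory used to build Sequence instances.
parser - the SymbolTokenization used to turn sequence characters into Symbol objects.
Throws:
IOException - if the index files cannot be opened or read.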

EmblCDROMIndexStore

public EmblCDROMIndexStore(File pathPrefix,
                           File divisionLkp,
                           File entryNamIdx,
                           SequenceFormat format,
                           SequenceBuilderFactory factory,
                           SymbolTokenization parser)
                    throws IOException
Creates a new EmblCDROMIndexStore backed by a random access binary index.

Parameters:
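pathPrefix - a File containing the abstract path to be prepended to sequence database filenames retrieved from the binary index.
divisionLkp - a File containing the master index (typically "division.lkp").
entryNamIdx - a File containing the sequence IDs and offsets (typically "entrynam.idx").
format - the SequenceFormat of the indexed sequence files.
factory - the SequenceBuilderFactory used to build Sequence instances.
parser - the SymbolTokenization used to turn sequence characters into Symbol objects.
Throws:
IOException - if the index files cannot be opened or read.

Method Detail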

getPathPrefix

public File getPathPrefix()
getPathPrefix returns the abstract path currently being prepended to the raw sequence database filenames extracted from the binary index. This value defaults to the empty abstract path.

Returns:
a File containing the current path prefix.

setPathPrefix

public void setPathPrefix(File pathPrefix)
setPathPrefix sets the abstract path to be prepended to sequence database filenames retrieved from the binary index. For example, if the binary index refers to the database as 'SWALL' and the pathPrefix is set to "/usr/local/share/data/seq/", the IndexStore will know the database path as "/usr/local/share/data/seq/swall". Any Index instances produced by the store will then return this path when their getFile() method is called. This value defaults to the empty abstract path.

Parameters:
pathPrefix - a File specifying the abstract path to prepend.
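Combining the prefix with a filename taken from the index can be sketched with java.io.File. The resolve helper below is hypothetical and is not BioJava's implementation. It handles the documented empty-path default explicitly, because the JDK resolves new File(new File(""), name) against a system-dependent default directory (for example "/swall" on Unix) rather than leaving the name relative.

```java
import java.io.File;

public class PathPrefixSketch {
    /**
     * Hypothetical helper: resolve a database filename taken from the binary index
     * against a path prefix. An empty (or null) prefix leaves the name unchanged.
     */
    static File resolve(File pathPrefix, String dbFileName) {
        if (pathPrefix == null || pathPrefix.getPath().isEmpty()) {
            // new File(new File(""), name) would resolve against a default
            // directory, so the empty prefix is handled explicitly here.
            return new File(dbFileName);
        }
        return new File(pathPrefix, dbFileName);
    }

    public static void main(String[] args) {
        File prefix = new File("/usr/local/share/data/seq/");
        // On Unix-like systems this prints /usr/local/share/data/seq/swall
        System.out.println(resolve(prefix, "swall").getPath());
        // The empty default prefix leaves the name as-is: prints swall
        System.out.println(resolve(new File(""), "swall").getPath());
    }
}
```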

getName

public String getName()
getName returns the database name as defined within the EMBL CD-ROM index.

Specified by:
getName in interface IndexStore
Returns:
the database name.

store

public void store(Index index)
           throws IllegalIDException,
                  BioException
store adds an Index to the store. As EMBL CD-ROM indices are read-only, this implementation throws a BioException.

Specified by:
store in interface IndexStore
Parameters:
index - an Index.
Throws:
IllegalIDException - if an error occurs.
BioException - always, because EMBL CD-ROM indices are read-only.

commit

public void commit()
            throws BioException
commit commits changes. As EMBL CD-ROM indices are read-only, this implementation throws a BioException.

Specified by:
commit in interface IndexStore
Throws:
BioException - always, because EMBL CD-ROM indices are read-only.

rollback

public void rollback()
rollback rolls back changes made since the last commit. As EMBL CD-ROM indices are read-only, this implementation does nothing.

Specified by:
rollback in interface IndexStore
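Under the read-only contract of store(), commit() and rollback(), every mutation is refused, while rollback() remains harmless. Here is a minimal sketch with hypothetical types. StoreException stands in for BioException, and ReadOnlyStore is not the IndexStore interface. The ID and offset values are illustrative.

```java
import java.util.Collections;
import java.util.Map;

public class ReadOnlyStoreSketch {
    /** Hypothetical checked exception standing in for BioJava's BioException. */
    static class StoreException extends Exception {
        StoreException(String msg) { super(msg); }
    }

    /** Hypothetical read-only store: lookups work, every mutation is refused. */
    static final class ReadOnlyStore {
        private final Map<String, Long> offsets;

        ReadOnlyStore(Map<String, Long> offsets) {
            this.offsets = Collections.unmodifiableMap(offsets);
        }

        /** Returns the stored offset, or null if the ID is unknown. */
        Long fetch(String id) { return offsets.get(id); }

        void store(String id, long offset) throws StoreException {
            throw new StoreException("Cannot store " + id + ": index is read-only");
        }

        void commit() throws StoreException {
            throw new StoreException("Cannot commit: index is read-only");
        }

        void rollback() {
            // Nothing can have changed since the last commit, so there is nothing to undo.
        }
    }

    public static void main(String[] args) {
        ReadOnlyStore s = new ReadOnlyStore(Collections.singletonMap("AB000001", 0L));
        try {
            s.store("AB000002", 42L);
        } catch (StoreException e) {
            System.out.println(e.getMessage()); // prints "Cannot store AB000002: index is read-only"
        }
        s.rollback(); // no-op, never throws
    }
}
```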

fetch

public Index fetch(String id)
            throws IllegalIDException,
                   BioException
Description copied from interface: IndexStore
Fetch an Index based upon an ID.

Specified by:
fetch in interface IndexStore
Parameters:
id - The ID of the sequence Index to retrieve
Throws:
IllegalIDException - if the ID couldn't be found
BioException - if the fetch fails in the underlying storage mechanism

getIDs

public Set getIDs()
Description copied from interface: IndexStore
Retrieve the set of all current IDs.

This set should either be immutable, or modifiable totally separately from the IndexStore.

Specified by:
getIDs in interface IndexStore
Returns:
a Set of all legal IDs
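For this class, the whole index is read only the first time getIDs() is called, and the resulting set is cached. Returning an unmodifiable view is one way to meet the interface requirement that the set be immutable. This minimal sketch uses a hypothetical fullScan supplier in place of reading entrynam.idx. It is not BioJava's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class CachedIdSet {
    private final Supplier<List<String>> fullScan; // hypothetical: reads every ID from the index
    private Set<String> cachedIds;                  // null until the first getIDs() call
    private int scans = 0;

    CachedIdSet(Supplier<List<String>> fullScan) { this.fullScan = fullScan; }

    /** Scans the whole index once, then returns the same unmodifiable set. */
    synchronized Set<String> getIDs() {
        if (cachedIds == null) {
            scans++;
            cachedIds = Collections.unmodifiableSet(new LinkedHashSet<>(fullScan.get()));
        }
        return cachedIds;
    }

    int scans() { return scans; }

    public static void main(String[] args) {
        CachedIdSet store = new CachedIdSet(() -> Arrays.asList("AB000001", "AB000002"));
        Set<String> first = store.getIDs();
        Set<String> second = store.getIDs();
        System.out.println(first == second); // prints true: the cached set is reused
        System.out.println(store.scans());   // prints 1
        try {
            first.add("AB000003");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view");
        }
    }
}
```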

getFiles

public Set getFiles()
Description copied from interface: IndexStore
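Retrieve the Set of files that are currently indexed.

This set should either be immutable, or modifiable totally separately from the IndexStore.

Specified by:
getFiles in interface IndexStore
Returns:
a Set of all indexed files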

getFormat

public SequenceFormat getFormat()
Description copied from interface: IndexStore
Retrieve the format of the indexed sequence files.

Specified by:
getFormat in interface IndexStore
Returns:
the SequenceFormat used to read the indexed files

getSBFactory

public SequenceBuilderFactory getSBFactory()
Description copied from interface: IndexStore
Retrieve the SequenceBuilderFactory used to build Sequence instances.

Specified by:
getSBFactory in interface IndexStore
Returns:
the associated SequenceBuilderFactory

getSymbolParser

public SymbolTokenization getSymbolParser()
Description copied from interface: IndexStore
Retrieve the symbol parser used to turn the sequence characters into Symbol objects.

Specified by:
getSymbolParser in interface IndexStore
Returns:
the associated SymbolTokenization

close

public void close()
           throws IOException
close closes the underlying EntryNamRandomAccess, which in turn closes the lower-level RandomAccessFile. This frees the resources associated with the file.

Throws:
IOException - if an error occurs while closing the underlying file.
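The class summary lists only IndexStore as an implemented interface. Assuming the store is therefore not AutoCloseable, a try/finally block is a straightforward way to guarantee that close() runs. The sketch below shows the pattern with a hypothetical IndexReader wrapper around java.io.RandomAccessFile, not the BioJava class itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class CloseSketch {
    /** Hypothetical reader holding a RandomAccessFile, mirroring a store with a close() method. */
    static final class IndexReader {
        private final RandomAccessFile raf;

        IndexReader(File f) throws IOException { raf = new RandomAccessFile(f, "r"); }

        int readByteAt(long pos) throws IOException {
            raf.seek(pos);
            return raf.read();
        }

        void close() throws IOException { raf.close(); }
    }

    /** Reads one byte and always releases the file handle, even if the read fails. */
    static int firstByte(File f) throws IOException {
        IndexReader reader = new IndexReader(f);
        try {
            return reader.readByteAt(0);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("index", ".idx");
        tmp.deleteOnExit();
        try (RandomAccessFile w = new RandomAccessFile(tmp, "rw")) {
            w.write(65); // writes the byte 'A'
        }
        System.out.println(firstByte(tmp)); // prints 65
    }
}
```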