Class Bio::NCBI::REST
In: lib/bio/io/ncbirest.rb
Parent: Object

Methods

Classes and Modules

Class Bio::NCBI::REST::EFetch
Class Bio::NCBI::REST::ESearch

Constants

NCBI_INTERVAL = 1   Make no more than one request per second. (NCBI's own restriction is "Make no more than 3 requests every 1 second.", but this class limits itself to one request per second, partly so that the value can be kept as an integer.)
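The method sources on this page call ncbi_access_wait before each request, but its body is not listed here. As a rough sketch of how such a minimum-interval wait could work, assuming a hypothetical Throttle class (not part of BioRuby) with an injectable clock and sleeper so it can be tried without real delays:

```ruby
# Hypothetical helper, not part of BioRuby: enforces a minimum interval
# between consecutive calls, in the spirit of NCBI_INTERVAL.
class Throttle
  def initialize(interval, clock: -> { Time.now.to_f }, sleeper: ->(s) { sleep(s) })
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @last = nil
  end

  # Waits (via the sleeper) until at least @interval seconds have passed
  # since the previous call, then records the new access time.
  # Returns the number of seconds waited.
  def wait
    waited = 0.0
    if @last
      elapsed = @clock.call - @last
      if elapsed < @interval
        waited = @interval - elapsed
        @sleeper.call(waited)
      end
    end
    @last = @clock.call
    waited
  end
end
```

With the defaults, Throttle.new(1).wait returns immediately on the first call and sleeps for the remainder of the second on a call made too soon afterwards.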

Public Class methods

  # File lib/bio/io/ncbirest.rb, line 252
  def self.efetch(*args)
    self.new.efetch(*args)
  end

  # File lib/bio/io/ncbirest.rb, line 240
  def self.einfo
    self.new.einfo
  end

  # File lib/bio/io/ncbirest.rb, line 244
  def self.esearch(*args)
    self.new.esearch(*args)
  end

  # File lib/bio/io/ncbirest.rb, line 248
  def self.esearch_count(*args)
    self.new.esearch_count(*args)
  end

Public Instance methods

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
 ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

 Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
 Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "omim", …
    • retmode: "text", "xml", "html", …
    • rettype: "gb", "gbc", "medline", "count",…
  • step: maximum number of entries retrieved at a time
Returns: String
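To make the ids and step arguments concrete, here is a minimal sketch of the batching they imply: ids may be an Array or a comma-separated string, and at most step IDs go into each request. The batch_ids helper is hypothetical (not part of BioRuby) and uses each_slice rather than the index-based loop in the method source.

```ruby
# Hypothetical helper, not part of BioRuby: normalizes the ids argument
# and splits it into comma-joined batches of at most `step` entries.
def batch_ids(ids, step = 100)
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |chunk| chunk.join(",") }
end

batch_ids("185041, J00231,AAA52805", 2)  # => ["185041,J00231", "AAA52805"]
batch_ids(%w[a b c], 100)                # => ["a,b,c"]
batch_ids("", 100)                       # => []
```

Each element of the result corresponds to the "id" value of one request.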

  # File lib/bio/io/ncbirest.rb, line 212
  def efetch(ids, hash = {}, step = 100)
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    opts = {
      "tool"     => "bioruby",
      "retmode"  => "text",
    }
    opts.update(hash)

    case ids
    when Array
      list = ids
    else
      list = ids.to_s.split(/\s*,\s*/)
    end

    result = ""
    0.step(list.size, step) do |i|
      opts["id"] = list[i, step].join(',')
      unless opts["id"].empty?
        ncbi_access_wait
        response = Bio::Command.post_form(serv, opts)
        result += response.body
      end
    end
    return result.strip
    #return result.strip.split(/\n\n+/)
  end
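The efetch source merges the caller's hash over two defaults, so a caller-supplied "retmode" (or "tool") replaces the built-in value. A small sketch of that merge, using a hypothetical efetch_options helper that is not part of BioRuby:

```ruby
# Hypothetical helper, not part of BioRuby: reproduces the option merge
# seen in the efetch source, returning a new hash instead of mutating one.
def efetch_options(hash = {})
  defaults = { "tool" => "bioruby", "retmode" => "text" }
  defaults.merge(hash)
end

efetch_options({ "db" => "nuccore", "rettype" => "gb" })
# => {"tool"=>"bioruby", "retmode"=>"text", "db"=>"nuccore", "rettype"=>"gb"}
efetch_options({ "db" => "nuccore", "retmode" => "xml" })["retmode"]
# => "xml"
```

Note that the keys are strings; a hash with symbol keys would not override the string-keyed defaults in this sketch.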

Lists the NCBI database names using the E-Utils (einfo) service.

 pubmed protein nucleotide nuccore nucgss nucest structure genome
 books cancerchromosomes cdd gap domains gene genomeprj gensat geo
 gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
 popset probe proteinclusters pcassay pccompound pcsubstance snp
 taxonomy toolkit unigene unists

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.einfo

 Bio::NCBI::REST.einfo

Returns: array of strings (database names)
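The returned array is built by scanning the response body for DbName elements. The following sketch applies the same pattern to a hand-written XML fragment; the fragment is illustrative only and is not a captured einfo response.

```ruby
# Illustrative XML fragment only; a real einfo response is longer.
sample = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Capture the text of every DbName element; scan returns one
# single-element array per match, so flatten yields plain strings.
db_names = sample.scan(/<DbName>(.*?)<\/DbName>/m).flatten
# db_names => ["pubmed", "protein", "nuccore"]
```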

  # File lib/bio/io/ncbirest.rb, line 68
  def einfo
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
    opts = {}
    response = Bio::Command.post_form(serv, opts)
    result = response.body
    list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
    return list
  end

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", …
    • retmode: "text", "xml", "html", …
    • rettype: "gb", "medline", "count", …
    • retmax: integer (default 100)
    • retstart: integer
    • field:
      • "titl": Title [TI]
      • "tiab": Title/Abstract [TIAB]
      • "word": Text words [TW]
      • "auth": Author [AU]
      • "affl": Affiliation [AD]
      • "jour": Journal [TA]
      • "vol": Volume [VI]
      • "iss": Issue [IP]
      • "page": First page [PG]
      • "pdat": Publication date [DP]
      • "ptyp": Publication type [PT]
      • "lang": Language [LA]
      • "mesh": MeSH term [MH]
      • "majr": MeSH major topic [MAJR]
      • "subh": MeSH subheadings [SH]
      • "mhda": MeSH date [MHDA]
      • "ecno": EC/RN Number [rn]
      • "si": Secondary source ID [SI]
      • "uid": PubMed ID (PMID) [UI]
      • "fltr": Filter [FILTER] [SB]
      • "subs": Subset [SB]
    • reldate: 365
    • mindate: 2001
    • maxdate: 2002/01/01
    • datetype: "edat"
  • limit: maximum number of entries to be returned (0 for unlimited; nil for the "retmax" value in the hash or the internal default value (=100))
  • step: maximum number of entries retrieved at a time
Returns: array of entry IDs, or the number of results when "rettype" is "count"
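The interplay of limit, step and "retstart" is easiest to see as a list of request windows. The search_windows helper below is hypothetical (not part of BioRuby); it follows the same stepping arithmetic as the method source, except that it skips a trailing window of size zero, which is an assumption of this sketch.

```ruby
# Hypothetical helper, not part of BioRuby: lists the [retstart, retmax]
# pairs a paged search would request for a given limit and step.
def search_windows(limit, step = 10000, retstart = 0)
  windows = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min
    # Skip the empty trailing window that appears when limit is a
    # multiple of step (a choice made for this sketch).
    windows << [i + retstart, retmax] if retmax > 0
  end
  windows
end

search_windows(25, 10)      # => [[0, 10], [10, 10], [20, 5]]
search_windows(100)         # => [[0, 100]]
search_windows(20, 10, 50)  # => [[50, 10], [60, 10]]
```

With the default limit of 100 and step of 10000, a search therefore needs a single request.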

  # File lib/bio/io/ncbirest.rb, line 135
  def esearch(str, hash = {}, limit = nil, step = 10000)
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts = {
      "tool"   => "bioruby",
      "term"   => str,
    }
    opts.update(hash)

    case opts["rettype"]
    when "count"
      count = esearch_count(str, opts)
      return count
    else
      retstart = 0
      retstart = hash["retstart"].to_i if hash["retstart"]

      limit ||= hash["retmax"].to_i if hash["retmax"]
      limit ||= 100 # default limit is 100
      limit = esearch_count(str, opts) if limit == 0   # unlimit

      list = []
      0.step(limit, step) do |i|
        retmax = [step, limit - i].min
        opts.update("retmax" => retmax, "retstart" => i + retstart)
        ncbi_access_wait
        response = Bio::Command.post_form(serv, opts)
        result = response.body
        list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
      end
      return list
    end
  end

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns only the number of matching entries.

Arguments: same as the esearch method (str and hash only)
Returns: number of results (Integer)

  # File lib/bio/io/ncbirest.rb, line 170
  def esearch_count(str, hash = {})
    serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    opts = {
      "tool"   => "bioruby",
      "term"   => str,
    }
    opts.update(hash)
    opts.update("rettype" => "count")
    #ncbi_access_wait
    response = Bio::Command.post_form(serv, opts)
    result = response.body
    count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
    return count
  end
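The count is taken from the first Count element in the response body. The sketch below applies the same extraction to a hand-written XML fragment (illustrative only, not a captured esearch response) and shows what happens when no Count element is present.

```ruby
# Illustrative XML fragment only; a real esearch response has more fields.
sample = <<~XML
  <eSearchResult>
    <Count>42</Count>
    <RetMax>0</RetMax>
  </eSearchResult>
XML

# First Count element, converted to an Integer.
count = sample.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
# count => 42

# With no Count element, first returns nil and nil.to_i is 0.
empty = "<eSearchResult/>".scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
# empty => 0
```

A result of 0 can therefore mean either "no hits" or "no Count element in the response"; callers that need to tell these apart would have to inspect the body themselves.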
