Class | Bio::PhyloXML::Parser |
In: | lib/bio/db/phyloxml/phyloxml_parser.rb |
Parent: | Object |
Bio::PhyloXML::Parser parses phyloXML format files.
The libxml2 XML parser is required. Install the libxml-ruby bindings from libxml.rubyforge.org, or with RubyGems:
  gem install -r libxml-ruby
  require 'bio'

  # Create new phyloxml parser
  phyloxml = Bio::PhyloXML::Parser.open('example.xml')

  # Print the names of all trees in the file
  phyloxml.each do |tree|
    puts tree.name
  end
Schema documentation: www.phyloxml.org/documentation/version_100/phyloxml.xsd.html
other | [R] | Array of PhyloXML::Other objects. Once all the trees have been parsed, any remaining content in another XML format is saved here. |
Initializes LibXML::Reader and reads from the IO until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.for_io($stdin)
Arguments:
  (required) io: IO object to read from
  (optional) validate: true or false; defaults to true
Returns: | Bio::PhyloXML::Parser object |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 218
  def self.for_io(io, validate=true)
    obj = new(nil, validate)
    obj.instance_eval {
      @reader = XML::Reader.io(io,
                               { :options =>
                                 LibXML::XML::Parser::Options::NONET })
      _skip_leader
    }
    obj
  end
Initializes LibXML::Reader and reads the PhyloXML-formatted string until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
  str = File.read("./phyloxml_examples.xml")
  p = Bio::PhyloXML::Parser.new(str)
Deprecated usage: Reads data from a file. str is a filename.
p = Bio::PhyloXML::Parser.new("./phyloxml_examples.xml")
Passing a filename is deprecated. Use Bio::PhyloXML::Parser.open(filename) instead.
Arguments:
  (required) str: PhyloXML-formatted string
  (optional) validate: true or false; defaults to true
Returns: | Bio::PhyloXML::Parser object |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 318
  def initialize(str, validate=true)

    @other = []

    return unless str

    # For compatibility, if filename-like string is given,
    # treat it as a filename.
    if /[\<\>\r\n]/ !~ str and File.exist?(str) then
      # assume that str is filename
      warn "Bio::PhyloXML::Parser.new(filename) is deprecated. Use Bio::PhyloXML::Parser.open(filename)."
      filename = _secure_filename(str)
      _validate(:file, filename) if validate
      @reader = XML::Reader.file(filename)
      _skip_leader
      return
    end

    # initialize for string
    @reader = XML::Reader.string(str,
                                 { :options =>
                                   LibXML::XML::Parser::Options::NONET })
    _skip_leader
  end
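The compatibility check in the constructor can be looked at in isolation. The following is a minimal sketch, not BioRuby code: filename_like? is a hypothetical helper name, and it only mirrors the condition shown in the source (no '<', '>', CR or LF in the string, and a file with that name exists).

```ruby
require 'tempfile'

# Hypothetical helper (not part of BioRuby). It mirrors the constructor's
# test: the string has no '<', '>', CR or LF and names an existing file.
def filename_like?(str)
  !(/[<>\r\n]/ =~ str) && File.exist?(str)
end

xml = "<phyloxml><phylogeny/></phyloxml>"
puts filename_like?(xml)        # false: contains '<' and '>'

Tempfile.create('example') do |f|
  puts filename_like?(f.path)   # true: plain path to an existing file
end
```

A consequence of this heuristic is that a string is only treated as a filename when the file actually exists; a mistyped path would be handed to the XML reader as document text.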
Initializes LibXML::Reader and reads the file until it reaches the first phylogeny element.
Example: Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
If the optional code block is given, the Bio::PhyloXML::Parser object is passed to the block as an argument. When the block terminates, the parser object is automatically closed, and the open method returns the value of the block.
Example: Get the first tree in the file.
  tree = Bio::PhyloXML::Parser.open("example.xml") do |px|
    px.next_tree
  end
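Arguments:
  (required) filename: path to the file to parse
  (optional) validate: whether to validate the file before parsing; defaults to true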
Returns: | (without block) Bio::PhyloXML::Parser object |
Returns: | (with block) the value of the block |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 102
  def self.open(filename, validate=true)
    obj = new(nil, validate)
    obj.instance_eval {
      filename = _secure_filename(filename)
      _validate(:file, filename) if validate
      # XML::Parser::Options::NONET for security reason
      @reader = XML::Reader.file(filename,
                                 { :options =>
                                   LibXML::XML::Parser::Options::NONET })
      _skip_leader
    }
    if block_given? then
      begin
        ret = yield obj
      ensure
        obj.close if obj and !obj.closed?
      end
      ret
    else
      obj
    end
  end
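The block form follows the usual Ruby resource pattern: yield the object, then close it in an ensure clause so that it is closed even if the block raises. The sketch below illustrates that pattern only. DemoParser is a hypothetical stand-in that reads from an array instead of a file; it is not the BioRuby class.

```ruby
# Hypothetical stand-in for a parser; it models only open/close behaviour.
class DemoParser
  def initialize(trees)
    @trees = trees.dup
    @closed = false
  end

  # Without a block: return the open object.
  # With a block: yield it, always close it, return the block's value.
  def self.open(trees)
    obj = new(trees)
    return obj unless block_given?
    begin
      yield obj
    ensure
      obj.close unless obj.closed?
    end
  end

  def next_tree
    @trees.shift
  end

  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end
end

handle = nil
first = DemoParser.open(%w[tree_a tree_b]) do |px|
  handle = px
  px.next_tree
end
puts first            # tree_a
puts handle.closed?   # true
```

Because the close happens in ensure, the object captured in handle is closed regardless of how the block exits.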
Initializes LibXML::Reader and reads the document at the given URI until it reaches the first phylogeny element.
Create a new Bio::PhyloXML::Parser object.
p = Bio::PhyloXML::Parser.open_uri("http://www.phyloxml.org/examples/apaf.xml")
If the optional code block is given, the Bio::PhyloXML::Parser object is passed to the block as an argument. When the block terminates, the parser object is automatically closed, and the open_uri method returns the value of the block.
Arguments:
  (required) uri: URI object or String
  (optional) validate: true or false; defaults to true
Returns: | (without block) Bio::PhyloXML::Parser object |
Returns: | (with block) the value of the block |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 143
  def self.open_uri(uri, validate=true)
    case uri
    when URI
      uri = uri.to_s
    else
      # raises error if not a String
      uri = uri.to_str
      # raises error if invalid URI
      URI.parse(uri)
    end

    obj = new(nil, validate)
    obj.instance_eval {
      @reader = XML::Reader.file(uri)
      _skip_leader
    }
    if block_given? then
      begin
        ret = yield obj
      ensure
        obj.close if obj and !obj.closed?
      end
    else
      obj
    end
  end
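The argument handling at the top of open_uri can be tried on its own with the standard uri library. This is a sketch under the assumption that only the normalization step matters here; normalize_uri is a hypothetical helper name, and nothing is fetched from the network.

```ruby
require 'uri'

# Hypothetical helper (not part of BioRuby). It applies the same checks as
# the case statement in open_uri and returns the URI as a String.
def normalize_uri(uri)
  case uri
  when URI
    uri.to_s
  else
    str = uri.to_str   # NoMethodError if the argument is not string-like
    URI.parse(str)     # URI::InvalidURIError if the string is malformed
    str
  end
end

puts normalize_uri(URI.parse("http://www.phyloxml.org/examples/apaf.xml"))
puts normalize_uri("http://www.phyloxml.org/examples/apaf.xml")

begin
  normalize_uri("http://bad uri")
rescue URI::InvalidURIError => e
  puts "rejected: #{e.class}"
end
```

Both accepted forms end up as a plain String, which is what the source then passes to XML::Reader.file.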
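Access the specified tree in the file, counting from 0. Trees are parsed one after another, starting at the parser's current position, until the specified tree is reached; next_tree is called i+1 times, so the index is relative to the trees that have not been read yet.
  # Get the 3rd tree in the file (counting starts at 0).
  parser = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
  tree = parser[2]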
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 364
  def [](i)
    tree = nil
    (i+1).times do
      tree = self.next_tree
    end
    return tree
  end
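Since the method simply calls next_tree i+1 times, indexing is forward-only. The sketch below reproduces the same loop on a hypothetical stand-in class (ForwardOnlyTrees, backed by an array of names rather than an XML reader) to show what a second lookup returns.

```ruby
# Hypothetical stand-in: next_tree consumes a forward-only list of names.
class ForwardOnlyTrees
  def initialize(names)
    @names = names.dup
  end

  def next_tree
    @names.shift
  end

  # Same loop as in the source above: advance i+1 times, keep the last result.
  def [](i)
    tree = nil
    (i + 1).times { tree = next_tree }
    tree
  end
end

trees = ForwardOnlyTrees.new(%w[t0 t1 t2 t3 t4])
puts trees[2]   # t2
puts trees[0]   # t3, because reading resumes after t2
p trees[5]      # nil, the list is exhausted
```

For repeated random access, collecting the trees into an array first is likely the simpler approach.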
Closes the LibXML::Reader inside the object. It also closes the opened file if the object was created with the Bio::PhyloXML::Parser.open method.
Closing an already closed object, or using a closed object in any other way, raises LibXML::XML::Error.
Returns: | nil |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 188
  def close
    @reader.close
    @reader = ClosedPhyloXMLParser.new
    nil
  end
If the object is closed by using the close method or equivalent, returns true. Otherwise, returns false.
Returns: | true or false |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 198
  def closed?
    if @reader.kind_of?(ClosedPhyloXMLParser) then
      true
    else
      false
    end
  end
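close and closed? work together through a sentinel object: close swaps the reader for a ClosedPhyloXMLParser instance, and closed? tests for that class. The definition of ClosedPhyloXMLParser is not shown on this page, so the sketch below assumes a sentinel that raises on every call. ClosedReaderStub and DemoHandle are hypothetical names, and IOError stands in for LibXML::XML::Error so the example needs only the standard library.

```ruby
require 'stringio'

# Hypothetical sentinel that replaces the real reader after close.
# Assumption: every method call on it raises, so stale use fails loudly.
class ClosedReaderStub
  def method_missing(name, *_args)
    raise IOError, "closed parser: #{name} called"
  end

  def respond_to_missing?(_name, _include_private = false)
    false
  end
end

# Hypothetical handle wrapping a StringIO in place of an XML reader.
class DemoHandle
  def initialize(text)
    @reader = StringIO.new(text)
  end

  def read
    @reader.read
  end

  def close
    @reader.close
    @reader = ClosedReaderStub.new
    nil
  end

  def closed?
    @reader.kind_of?(ClosedReaderStub)
  end
end

handle = DemoHandle.new("<phyloxml/>")
puts handle.closed?   # false
handle.close
puts handle.closed?   # true
begin
  handle.close        # the second close reaches the sentinel
rescue IOError => e
  puts e.message      # closed parser: close called
end
```

With this design no separate closed flag is needed, and every method that touches the reader fails in the same way after close.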
Iterate through all trees in the file.
  phyloxml = Bio::PhyloXML::Parser.open('example.xml')
  phyloxml.each do |tree|
    puts tree.name
  end
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 351
  def each
    while tree = next_tree
      yield tree
    end
  end
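The each loop above stops at the first nil from next_tree and consumes the input as it goes. The sketch below shows the same loop on a hypothetical stand-in (TreeStream). Including Enumerable and returning an enumerator when no block is given are additions made for illustration; this page does not say whether Bio::PhyloXML::Parser does either.

```ruby
# Hypothetical stand-in that yields names instead of parsed trees.
class TreeStream
  include Enumerable   # assumption for illustration, not confirmed for the parser

  def initialize(names)
    @names = names.dup
  end

  def next_tree
    @names.shift
  end

  # Same loop as in the source above, plus an enumerator fallback.
  def each
    return enum_for(:each) unless block_given?
    while (tree = next_tree)
      yield tree
    end
  end
end

stream = TreeStream.new(%w[alpha beta gamma])
puts stream.map(&:upcase).inspect   # ["ALPHA", "BETA", "GAMMA"]
puts stream.to_a.inspect            # [] (single pass: already consumed)
```

The second call returns an empty array because the stream, like a forward-only XML reader, cannot be rewound.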
Parses and returns the next phylogeny tree. If there are no more phylogeny elements, nil is returned. If anything other than a phylogeny element is found, it is saved in PhyloXML::Parser#other.
  p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
  tree = p.next_tree
Returns: | Bio::PhyloXML::Tree, or nil if there is no further phylogeny element |
  # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 381
  def next_tree()

    if not is_element?('phylogeny')
      if @reader.node_type == XML::Reader::TYPE_END_ELEMENT
        if is_end_element?('phyloxml')
          return nil
        else
          @reader.read
          @reader.read
          if is_end_element?('phyloxml')
            return nil
          end
        end
      end
      # phyloxml can hold only phylogeny and "other" elements. If this is not
      # phylogeny element then it is other. Also, "other" always comes after
      # all phylogenies
      @other << parse_other
      # return nil for tree, since this is not valid phyloxml tree.
      return nil
    end

    tree = Bio::PhyloXML::Tree.new

    # keep track of current node in clades array/stack. Current node is the
    # last element in the clades array
    clades = []
    clades.push tree

    # keep track of current edge to be able to parse branch_length tag
    current_edge = nil

    # we are going to parse clade iteratively by pointing (and changing) to
    # the current node in the tree. Since the property element is both in
    # clade and in the phylogeny, we need some boolean to know if we are
    # parsing the clade (there can be only max 1 clade in phylogeny) or
    # parsing phylogeny
    parsing_clade = false

    while not is_end_element?('phylogeny') do
      break if is_end_element?('phyloxml')

      # parse phylogeny elements, except clade
      if not parsing_clade

        if is_element?('phylogeny')
          @reader["rooted"] == "true" ? tree.rooted = true : tree.rooted = false
          @reader["rerootable"] == "true" ? tree.rerootable = true : tree.rerootable = false
          parse_attributes(tree, ["branch_length_unit", 'type'])
        end

        parse_simple_elements(tree, [ "name", 'description', "date"])

        if is_element?('confidence')
          tree.confidences << parse_confidence
        end

      end

      if @reader.node_type == XML::Reader::TYPE_ELEMENT
        case @reader.name
        when 'clade'
          # parse clade element

          parsing_clade = true

          node= Bio::PhyloXML::Node.new

          branch_length = @reader['branch_length']

          parse_attributes(node, ["id_source"])

          # add new node to the tree
          tree.add_node(node)
          # The first clade will always be root since by xsd schema phyloxml can
          # have 0 to 1 clades in it.
          if tree.root == nil
            tree.root = node
          else
            current_edge = tree.add_edge(clades[-1], node,
                                         Bio::Tree::Edge.new(branch_length))
          end
          clades.push node
          # end if clade element
        else
          parse_clade_elements(clades[-1], current_edge) if parsing_clade
        end
      end

      # end clade element, go one parent up
      if is_end_element?('clade')

        # if we have reached the closing tag of the top-most clade, then our
        # current node should point to the root. If that's the case, we are done
        # parsing the clade element
        if clades[-1] == tree.root
          parsing_clade = false
        else
          # set current node (clades[-1]) to the previous clade in the array
          clades.pop
        end
      end

      # parsing phylogeny elements
      if not parsing_clade

        if @reader.node_type == XML::Reader::TYPE_ELEMENT
          case @reader.name
          when 'property'
            tree.properties << parse_property

          when 'clade_relation'
            clade_relation = CladeRelation.new
            parse_attributes(clade_relation, ["id_ref_0", "id_ref_1", "distance", "type"])

            #@ add unit test for this
            if not @reader.empty_element?
              @reader.read
              if is_element?('confidence')
                clade_relation.confidence = parse_confidence
              end
            end
            tree.clade_relations << clade_relation

          when 'sequence_relation'
            sequence_relation = SequenceRelation.new
            parse_attributes(sequence_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
            if not @reader.empty_element?
              @reader.read
              if is_element?('confidence')
                sequence_relation.confidence = parse_confidence
              end
            end
            tree.sequence_relations << sequence_relation
          when 'phylogeny'
            # do nothing
          else
            tree.other << parse_other
            #puts "Not recognized element. #{@reader.name}"
          end
        end
      end
      # go to next element
      @reader.read
    end # end while not </phylogeny>
    # move on to the next tag after /phylogeny which is text, since phylogeny
    # end tag is empty element, which value is nil, therefore need to move to
    # the next meaningful element (therefore @reader.read twice)
    @reader.read
    @reader.read

    return tree
  end
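The core of next_tree is the clades array, used as a stack whose last element is the clade currently being parsed. The sketch below isolates that idea with a simplified, hypothetical event format ([:open, name] and [:close] pairs) in place of a real XML reader. CladeNode and build_tree are illustrative names, and unlike the source the sketch also pops the root when its closing event arrives.

```ruby
# Hypothetical node type for the sketch; not a BioRuby class.
CladeNode = Struct.new(:name, :children)

# Build a nested tree from open/close events. The stack's last element is
# always the clade currently being filled, as with `clades` in next_tree.
def build_tree(events)
  root = nil
  stack = []
  events.each do |type, name|
    case type
    when :open
      node = CladeNode.new(name, [])
      if root.nil?
        root = node                    # the first clade becomes the root
      else
        stack.last.children << node    # attach to the current clade
      end
      stack.push(node)
    when :close
      stack.pop                        # go one parent up
    end
  end
  root
end

events = [
  [:open, 'root'],
  [:open, 'A'], [:close],
  [:open, 'B'],
  [:open, 'B1'], [:close],
  [:close],
  [:close]
]
tree = build_tree(events)
puts tree.children.map(&:name).inspect                 # ["A", "B"]
puts tree.children.last.children.map(&:name).inspect   # ["B1"]
```

Using an explicit stack instead of recursion fits a pull-style reader, which delivers one node at a time and has no call structure mirroring the nesting of the document.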