Class: Bio::Sequence::NA
In: lib/bio/sequence/na.rb
    lib/bio/sequence/compat.rb
    lib/bio/shell/plugin/midi.rb
Parent: String
Generate a nucleic acid sequence object from a string.
  s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")
or, if the nucleic acid sequence is stored in a file,
  s = Bio::Sequence::NA.new(File.read('dna.txt'))
Nucleic acid sequences are always stored in lowercase in BioRuby.
  s = Bio::Sequence::NA.new("AAGcTtGG")
  puts s  #=> "aagcttgg"
Whitespace is stripped from the sequence.
  s = Bio::Sequence::NA.new("atg\nggg\ttt\r gc")
  puts s  #=> "atggggttgc"
Arguments:
  (required) str: String
Returns: Bio::Sequence::NA object
  # File lib/bio/sequence/na.rb, line 75
  def initialize(str)
    super
    self.downcase!
    self.tr!(" \t\n\r", '')
  end
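The normalization done by the constructor can be tried without BioRuby installed. The following is a minimal sketch, assuming a hypothetical `MiniNA` class that is not part of the library; it only mirrors the two steps in the source listing above (lowercasing, then deleting spaces, tabs, and line breaks).

```ruby
# Hypothetical stand-in for Bio::Sequence::NA; not part of BioRuby.
class MiniNA < String
  def initialize(str)
    super                 # store a copy of the raw string
    downcase!             # sequences are kept in lowercase
    delete!(" \t\n\r")    # drop spaces, tabs, newlines, carriage returns
  end
end

puts MiniNA.new("AAGcTtGG")            # prints aagcttgg
puts MiniNA.new("atg\nggg\ttt\r gc")   # prints atggggttgc
```

Because the cleanup happens once at construction time, every later method can assume a lowercase, whitespace-free string.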
Generate a new random sequence containing the given number of each base. The sequence length is the sum of the counts. (See also Bio::Sequence::Common#randomize, which creates a new randomized sequence object using the base composition of an existing sequence instance.)
  counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
  puts Bio::Sequence::NA.randomize(counts)  #=> "ggcttgttac" (for example)
You may also feed the output of randomize into a block
  actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
  Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
  actual_counts  #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
Arguments:
  hash: Hash object giving the count of each base, as in the examples above
Returns: Bio::Sequence::NA object
  # File lib/bio/sequence/compat.rb, line 82
  def self.randomize(*arg, &block)
    self.new('').randomize(*arg, &block)
  end
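The idea of building a random sequence from base counts can be sketched in plain Ruby. `random_sequence` below is a hypothetical helper and not BioRuby's implementation: it expands each base by its count, shuffles, and joins, so the length always equals the sum of the counts.

```ruby
# Hypothetical helper, not BioRuby's implementation.
def random_sequence(counts, rng: Random.new)
  bases = counts.flat_map { |base, n| [base] * n }  # e.g. ['a', 'c', 'c', ...]
  bases.shuffle(random: rng).join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_sequence(counts)
puts seq.length                                        # prints 10
puts counts.all? { |base, n| seq.count(base) == n }    # prints true
```

Passing a seeded `Random` as `rng` makes the result reproducible, which is convenient in tests.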
Calculate the ratio of AT / ATGC bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.at_content  #=> 0.444444444444444
Returns: Float
  # File lib/bio/sequence/na.rb, line 317
  def at_content
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    return 0.0 if at + gc == 0
    return at.quo(at + gc)
  end
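The calculation is easy to follow in a standalone sketch. `at_content` below is a hypothetical free function, not the library method. One point worth checking in your own environment: on current Ruby versions `Integer#quo` returns a Rational for integer operands, so an explicit `to_f` may be needed to get a Float.

```ruby
# Plain-Ruby sketch of the AT-content calculation (hypothetical helper).
def at_content(seq)
  count = Hash.new(0)
  seq.downcase.each_char { |base| count[base] += 1 }
  at = count['a'] + count['t'] + count['u']   # U is counted as T
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0                  # avoid division by zero
  at.quo(at + gc)                             # exact ratio of two Integers
end

ratio = at_content('atggcgtga')
puts ratio          # prints 4/9
puts ratio.to_f     # prints 0.4444444444444444
```

Bases outside a, t, u, g, c are ignored in both the numerator and the denominator.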
Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atgttgttgttc')
  puts s.at_skew  #=> -0.75
Returns: Float
  # File lib/bio/sequence/na.rb, line 345
  def at_skew
    count = self.composition
    a = count['a']
    t = count['t'] + count['u']
    return 0.0 if a + t == 0
    return (a - t).quo(a + t)
  end
Returns counts of each codon in the sequence in a hash.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.codon_usage  #=> {"gcg"=>1, "tga"=>1, "atg"=>1}
This method does not validate codons! Any three-letter group is counted as a 'codon'. For example,
  s = Bio::Sequence::NA.new('atggNNtga')
  puts s.codon_usage  #=> {"tga"=>1, "gnn"=>1, "atg"=>1}
  s = Bio::Sequence::NA.new('atgg--tga')
  puts s.codon_usage  #=> {"tga"=>1, "g--"=>1, "atg"=>1}
Also, there is no option to work in any frame other than the first.
Returns: Hash object
  # File lib/bio/sequence/na.rb, line 273
  def codon_usage
    hash = Hash.new(0)
    self.window_search(3, 3) do |codon|
      hash[codon] += 1
    end
    return hash
  end
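Since the method always starts at the first base, counting codons in another frame means trimming the sequence first. The sketch below shows the idea in plain Ruby; `codon_counts` and its 0-based `offset` parameter are hypothetical and not part of BioRuby.

```ruby
# Hypothetical helper: count non-overlapping triplets starting at a 0-based
# offset, so an offset of 1 or 2 selects the second or third frame.
def codon_counts(seq, offset = 0)
  counts = Hash.new(0)
  tail = seq[offset..-1] || ''                 # nil when offset is past the end
  tail.scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

puts codon_counts('atggcgtga') == { 'atg' => 1, 'gcg' => 1, 'tga' => 1 }  # prints true
puts codon_counts('atggcgtga', 1) == { 'tgg' => 1, 'cgt' => 1 }           # prints true
```

As with the library method, nothing is validated: any three characters form a key, and a trailing group shorter than three is dropped.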
Example:
  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('EcoRI')
or
  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('g^aattc')
See Bio::RestrictionEnzyme::Analysis.cut
  # File lib/bio/sequence/na.rb, line 479
  def cut_with_enzyme(*args)
    Bio::RestrictionEnzyme::Analysis.cut(self, *args)
  end
Returns a new sequence object with any ‘u’ bases changed to ‘t’. The original sequence is not modified.
  s = Bio::Sequence::NA.new('augc')
  puts s.dna  #=> 'atgc'
  puts s      #=> 'augc'
Returns: new Bio::Sequence::NA object
  # File lib/bio/sequence/na.rb, line 423
  def dna
    self.tr('u', 't')
  end
Changes any ‘u’ bases in the original sequence to ‘t’. The original sequence is modified.
  s = Bio::Sequence::NA.new('augc')
  puts s.dna!  #=> 'atgc'
  puts s       #=> 'atgc'
Returns: current Bio::Sequence::NA object (modified)
  # File lib/bio/sequence/na.rb, line 435
  def dna!
    self.tr!('u', 't')
  end
Returns a new complementary sequence object (without reversing). The original sequence object is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement  #=> 'tacg'
  puts s                     #=> 'atgc'
Returns: new Bio::Sequence::NA object
  # File lib/bio/sequence/na.rb, line 100
  def forward_complement
    s = self.class.new(self)
    s.forward_complement!
    s
  end
Converts the current sequence into its complement (without reversing). The original sequence object is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement!  #=> 'tacg'
  puts s                      #=> 'tacg'
Returns: current Bio::Sequence::NA object (modified)
  # File lib/bio/sequence/na.rb, line 114
  def forward_complement!
    if self.rna?
      self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
    else
      self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
    end
    self
  end
Calculate the ratio of GC / ATGC bases. U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_content  #=> 0.555555555555556
Returns: Float
  # File lib/bio/sequence/na.rb, line 303
  def gc_content
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    return 0.0 if at + gc == 0
    return gc.quo(at + gc)
  end
Calculate the ratio of GC / ATGC bases as a percentage, truncated to a whole number (the calculation uses integer division, so the fractional part is discarded rather than rounded). U is regarded as T.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_percent  #=> 55
Returns: Fixnum
  # File lib/bio/sequence/na.rb, line 288
  def gc_percent
    count = self.composition
    at = count['a'] + count['t'] + count['u']
    gc = count['g'] + count['c']
    return 0 if at + gc == 0
    gc = 100 * gc / (at + gc)
    return gc
  end
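The example value of 55 follows from integer division: 5 of the 9 bases are G or C, and 500 / 9 is 55 with the remainder discarded, whereas rounding 55.56 would give 56. A small sketch of the difference, using two hypothetical helpers that are not part of BioRuby:

```ruby
# Hypothetical helpers contrasting truncation with rounding.
def percent_truncated(part, total)
  100 * part / total             # Integer division discards the remainder
end

def percent_rounded(part, total)
  (100.0 * part / total).round   # Float division, then round to nearest
end

# 'atggcgtga' has 5 G/C bases out of 9
puts percent_truncated(5, 9)     # prints 55
puts percent_rounded(5, 9)       # prints 56
```

If a rounded percentage is what you need, computing it from the GC ratio yourself is one option.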
Calculate the ratio of (G - C) / (G + C) bases.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_skew  #=> 0.6
Returns: Float
  # File lib/bio/sequence/na.rb, line 331
  def gc_skew
    count = self.composition
    g = count['g']
    c = count['c']
    return 0.0 if g + c == 0
    return (g - c).quo(g + c)
  end
Returns an alphabetically sorted array of any non-standard bases (other than ‘atgcu’).
  s = Bio::Sequence::NA.new('atgStgQccR')
  puts s.illegal_bases  #=> ["q", "r", "s"]
Returns: Array object
  # File lib/bio/sequence/na.rb, line 360
  def illegal_bases
    self.scan(/[^atgcu]/).sort.uniq
  end
Estimate molecular weight (using the values from BioPerl's SeqStats.pm module).
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.molecular_weight  #=> 2841.00708
RNA and DNA do not have the same molecular weights,
  s = Bio::Sequence::NA.new('auggcguga')
  puts s.molecular_weight  #=> 2956.94708
Returns: Float object
  # File lib/bio/sequence/na.rb, line 376
  def molecular_weight
    if self.rna?
      Bio::NucleicAcid.weight(self, true)
    else
      Bio::NucleicAcid.weight(self)
    end
  end
Generate the list of the full names of each nucleotide along the sequence. The names are looked up in Bio::NucleicAcid.names, as the source below shows.
  s = Bio::Sequence::NA.new('atg')
  puts s.names  #=> ["Adenine", "Thymine", "Guanine"]
Returns: Array object
  # File lib/bio/sequence/na.rb, line 407
  def names
    array = []
    self.each_byte do |x|
      array.push(Bio::NucleicAcid.names[x.chr.upcase])
    end
    return array
  end
Returns a new sequence object with the reverse complement sequence to the original. The original sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement  #=> 'gcat'
  puts s                     #=> 'atgc'
Returns: new Bio::Sequence::NA object
  # File lib/bio/sequence/na.rb, line 131
  def reverse_complement
    s = self.class.new(self)
    s.reverse_complement!
    s
  end
Converts the original sequence into its reverse complement. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement!  #=> 'gcat'
  puts s                      #=> 'gcat'
Returns: current Bio::Sequence::NA object (modified)
  # File lib/bio/sequence/na.rb, line 145
  def reverse_complement!
    self.reverse!
    self.forward_complement!
  end
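The reverse complement is simply a string reversal followed by the character translation used in forward_complement!. The sketch below reproduces that in plain Ruby. The `reverse_complement` function and its `rna:` flag are hypothetical; how the library itself decides between the DNA and RNA tables (its `rna?` check) is not reproduced here.

```ruby
# Translation tables copied from the forward_complement! source listing.
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'
RNA_FROM = 'augcrymkdhvbswn'
RNA_TO   = 'uacgyrkmhdbvswn'

# Hypothetical helper: reverse the string, then complement base by base.
def reverse_complement(seq, rna: false)
  from, to = rna ? [RNA_FROM, RNA_TO] : [DNA_FROM, DNA_TO]
  seq.reverse.tr(from, to)
end

puts reverse_complement('atgc')              # prints gcat
puts reverse_complement('augc', rna: true)   # prints gcau
puts reverse_complement('aarg')              # prints cytt
```

The last call shows that ambiguity codes are complemented too: `r` (purine) becomes `y` (pyrimidine).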
Returns a new sequence object with any ‘t’ bases changed to ‘u’. The original sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.rna  #=> 'augc'
  puts s      #=> 'atgc'
Returns: new Bio::Sequence::NA object
  # File lib/bio/sequence/na.rb, line 447
  def rna
    self.tr('t', 'u')
  end
Changes any ‘t’ bases in the original sequence to ‘u’. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.rna!  #=> 'augc'
  puts s       #=> 'augc'
Returns: current Bio::Sequence::NA object (modified)
  # File lib/bio/sequence/na.rb, line 459
  def rna!
    self.tr!('t', 'u')
  end
style:
  Hash of :tempo, :scale, :tones
scale:
  C  C# D  D# E  F  F# G  G# A  A# B
  0  1  2  3  4  5  6  7  8  9  10 11
tones:
  Hash of :prog, :base, :range -- tone, vol? or len?, octaves
drum:
  true (with rhythm part), false (without rhythm part)
  # File lib/bio/shell/plugin/midi.rb, line 351
  def to_midi(style = {}, drum = true)
    default = MidiTrack::Styles["Ichinose"]
    if style.is_a?(String)
      style = MidiTrack::Styles[style] || default
    end
    tempo = style[:tempo] || default[:tempo]
    scale = style[:scale] || default[:scale]
    tones = style[:tones] || default[:tones]

    track = []

    tones.each_with_index do |tone, i|
      ch = i
      ch += 1 if i >= 9 # skip rhythm track
      track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
    end

    if drum
      rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
      track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
    end

    cur = 0
    window_search(4) do |s|
      track[cur % track.length].push(s)
      cur += 1
    end

    track.each do |t|
      t.push_silent(12)
    end

    ans = track[0].header(track.length, tempo)
    track.each do |t|
      ans += t.encode
    end
    return ans
  end
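The loop over `cur % track.length` in the listing above deals successive 4-base windows out to the tracks in round-robin order. The sketch below illustrates only that distribution step in plain Ruby. `distribute_windows` is a hypothetical helper, and it assumes the window advances one base at a time, which is not confirmed by the listing itself.

```ruby
# Hypothetical helper: deal sliding windows out to `track_count` tracks in
# round-robin order. Assumes a step of one base between windows.
def distribute_windows(seq, track_count, window = 4)
  tracks = Array.new(track_count) { [] }
  seq.chars.each_cons(window).with_index do |bases, cur|
    tracks[cur % track_count] << bases.join
  end
  tracks
end

distribute_windows('atggcgtga', 3).each_with_index do |windows, i|
  puts "track #{i}: #{windows.join(' ')}"
end
# prints:
#   track 0: atgg gcgt
#   track 1: tggc cgtg
#   track 2: ggcg gtga
```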
Create a Ruby regular expression (Regexp) instance from the sequence.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.to_re  #=> /atggcgtga/
Returns: Regexp object
  # File lib/bio/sequence/na.rb, line 391
  def to_re
    if self.rna?
      Bio::NucleicAcid.to_re(self.dna, true)
    else
      Bio::NucleicAcid.to_re(self)
    end
  end
Translate into an amino acid sequence.
  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.translate  #=> "MA*"
By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well,
  puts s.translate(2)  #=> "WR"
  puts s.translate(3)  #=> "GV"
You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6)
  puts s.translate(-1)                    #=> "SRH"
  puts s.translate(4)                     #=> "SRH"
  puts s.reverse_complement.translate(1)  #=> "SRH"
The default codon table in the translate function is the Standard Eukaryotic codon table. The translate function takes either a number or a Bio::CodonTable object for its table argument. The available tables are (NCBI):
   1. "Standard (Eukaryote)"
   2. "Vertebrate Mitochondrial"
   3. "Yeast Mitochondrial"
   4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
   5. "Invertebrate Mitochondrial"
   6. "Ciliate Macronuclear and Dasycladacean"
   9. "Echinoderm Mitochondrial"
  10. "Euplotid Nuclear"
  11. "Bacteria"
  12. "Alternative Yeast Nuclear"
  13. "Ascidian Mitochondrial"
  14. "Flatworm Mitochondrial"
  15. "Blepharisma Macronuclear"
  16. "Chlorophycean Mitochondrial"
  21. "Trematode Mitochondrial"
  22. "Scenedesmus obliquus mitochondrial"
  23. "Thraustochytrium Mitochondrial"
If you are using anything other than the default table, you must specify frame in the translate method call,
  puts s.translate       #=> "MA*" (using defaults)
  puts s.translate(1,1)  #=> "MA*" (same as above, but explicit)
  puts s.translate(1,2)  #=> "MAW" (different codon table)
and using a Bio::CodonTable instance in the translate method call,
  mt_table = Bio::CodonTable[2]
  puts s.translate(1, mt_table)  #=> "MAW"
By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by 'X' in the translated sequence. You may change this to any character of your choice.
  s = Bio::Sequence::NA.new('atgcNNtga')
  puts s.translate           #=> "MX*"
  puts s.translate(1,1,'9')  #=> "M9*"
The translate method considers gaps to be unknown characters and treats them as such (i.e. does not collapse sequences prior to translation), so
  s = Bio::Sequence::NA.new('atgc--tga')
  puts s.translate  #=> "MX*"
Arguments:
  (optional) frame: one of 1, 2, 3, 4, 5, 6, -1, -2, -3 (default 1)
  (optional) table: codon table number or Bio::CodonTable object (default 1)
  (optional) unknown: character used for unknown codons (default 'X')
Returns: Bio::Sequence::AA object
  # File lib/bio/sequence/na.rb, line 232
  def translate(frame = 1, table = 1, unknown = 'X')
    if table.is_a?(Bio::CodonTable)
      ct = table
    else
      ct = Bio::CodonTable[table]
    end
    naseq = self.dna
    case frame
    when 1, 2, 3
      from = frame - 1
    when 4, 5, 6
      from = frame - 4
      naseq.complement!
    when -1, -2, -3
      from = -1 - frame
      naseq.complement!
    else
      from = 0
    end
    nalen = naseq.length - from
    nalen -= nalen % 3
    aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
    return Bio::Sequence::AA.new(aaseq)
  end
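The frame arithmetic can be traced with a small standalone sketch. `toy_translate` and `TOY_TABLE` below are hypothetical: the table holds only the ten codons needed for the examples above (a real codon table has 64 entries), and the reverse-complement step is written out with `reverse` and `tr` rather than calling the library.

```ruby
# Toy codon table covering only the codons in the examples (hypothetical).
TOY_TABLE = {
  'atg' => 'M', 'gcg' => 'A', 'tga' => '*', 'tgg' => 'W', 'cgt' => 'R',
  'ggc' => 'G', 'gtg' => 'V', 'tca' => 'S', 'cgc' => 'R', 'cat' => 'H'
}.freeze

# Hypothetical stand-in for the frame handling in the listing above.
def toy_translate(seq, frame = 1, unknown = 'X')
  dna = seq.downcase.tr('u', 't')
  case frame
  when 1, 2, 3
    from = frame - 1                       # frames 1..3 skip 0..2 bases
  when 4, 5, 6
    from = frame - 4                       # frames 4..6 use the reverse strand
    dna = dna.reverse.tr('atgc', 'tacg')
  when -1, -2, -3
    from = -1 - frame                      # -1..-3 map to offsets 0..2
    dna = dna.reverse.tr('atgc', 'tacg')
  else
    from = 0
  end
  len = dna.length - from
  len -= len % 3                           # drop the trailing partial codon
  dna[from, len].to_s.gsub(/.{3}/) { |codon| TOY_TABLE[codon] || unknown }
end

puts toy_translate('atggcgtga')        # prints MA*
puts toy_translate('atggcgtga', 2)     # prints WR
puts toy_translate('atggcgtga', -1)    # prints SRH
puts toy_translate('atgc--tga')        # prints MX*
```

Any triplet missing from the table, including one containing gap characters, falls through to the `unknown` character, matching the behaviour described above.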