jksrcなmatrix scanツール: dnaMotifFind
% dnaMotifFind dnaMotifFind - Locate preexisting motifs in DNA sequence usage: dnaMotifFind motifFile sequence.fa output.tab options: -markov=level Level of Markov background model - 0 1 or 2 -background=seq.fa Sequence to use for background model -threshold=N significance threshold (ln based, default 8.0) -rc Include reverse complement
例によって、ファイルはオレオレかつ変態的なわけで
% cat ~/tmp/kent/src/hg/lib/dnaMotif.as table dnaMotif "A gapless DNA motif" ( string name; "Motif name." int columnCount; "Count of columns in motif." float[columnCount] aProb; "Probability of A's in each column." float[columnCount] cProb; "Probability of C's in each column." float[columnCount] gProb; "Probability of G's in each column." float[columnCount] tProb; "Probability of T's in each column." )
つまり、こんな感じでmatrixを扱う
% cat ~/tmp/kent/src/hg/geneBounds/dnaMotifFind/fkh1.motif FKH1 8 0.115000,0.037000,0.165000,0.017000,0.005000,0.019000,0.754000,0.051000, 0.085000,0.024000,0.017000,0.038000,0.012000,0.021000,0.068000,0.599000, 0.051000,0.025000,0.775000,0.021000,0.021000,0.042000,0.080000,0.035000, 0.750000,0.914000,0.042000,0.924000,0.962000,0.918000,0.098000,0.315000,
う〜ん、変態的。
で、そんなヒトとつきあうためのコード。
#!/usr/bin/env ruby def pfm2motif(infile) ### read buf = [] open(infile) do |ifh| ifh.each{|line| buf << line.strip.split(/\s+/)} end ### column check columnCount = buf.collect{|row| row.length}.uniq raise "Inconsistent column number" unless columnCount.length == 1 ### format name = File.basename(infile,".pfm") buf = buf.collect{|row| row.collect{|cell| cell.to_f/100}.join(",")+","} return [name, columnCount.first, buf].flatten.join("\t") end infiles = ARGV infiles.each do |infile| puts pfm2motif(infile) end