variables - Checking which lines of a file are not in another file time-efficiently in Perl

- March 15, 2010

I want to compare one file with another and find the lines in the input file that are not found in the file it is being compared to.

This is my script right now:

#!/usr/bin/perl
use strict;
use warnings;

my $data_file = "file.txt";
my @data;
{
    # Load every non-blank line of file.txt as an array of fields.
    open my $fh, "<", $data_file
        or die qq{Unable to open "$data_file" for input: $!};
    while ( <$fh> ) {
        next unless /\S/;
        push @data, [ split ];
    }
}

my $found;
while ( <> ) {
    $found = 0;
    my ($num, $spot, $sstart, $sstop, $name, $id, $start, $stop) = split;
    # Linear scan of all of file.txt for every input line.
    for my $item ( @data ) {
        my ($unum, $uspotstart, $uspotstop, $uspot, $udontuse, $ustart, $ustop, $uname) = @$item;
        if ( $uname eq $name and $start == $ustart and $stop == $ustop and $unum eq $num ) {
            $found = 1;
            last;
        }
    }
    if ( $found == 0 ) {
        print $_;
    }
}

The script works, but the problem is that it never finishes running, because file.txt contains 200,000 lines and the input file contains 20,000 lines.

This is an example of what is in file.txt:

1   1729    1858    25  g   6600    6700    sam
15  9302    9030    12  t   3900    4500    frodo
19  0   2000    13  y   3300    3800    merry
20  0   510 13  h   6300    6500    pippin

while the input file to the program is:

1   25  1600    1700    sam 40  6600    6700
15  11  1500    2000    frodo   67  3900    4500
15  11  1500    2000    frodo   67  3800    4500
17  10  3000    3100    bilbo   50  2300    2600
19  20  3400    3700    merry   39  3300    3800
20  90  3900    4200    pippin  80  6300    6500

This should output:

15  11  1500    2000    frodo   67  3800    4500
17  10  3000    3100    bilbo   50  2300    2600

With the number of lines I'm looking at, I can't do this in a time-efficient way.

I want the script to do less processing when it is used on a larger scale.

Thank you!

Use a hash instead of an array. If file.txt is too large, hash the smaller input file instead. You can use a concatenation of the important input fields as the key and the rest as the value, or use a hash of hashes with each important field as the key of one level and the remaining values as the value (as a string or an array).

$hash{$name}{$start}{$stop}{$num} = [ $spot, $sstart, $sstop, $id ];
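
The concatenated-key variant could be sketched roughly as follows. This is only an illustration under a few assumptions: the sub names build_seen and lines_not_in are made up for this example, the key is name, start, stop and num joined with a tab, and numbers are assumed to be written identically in both files (the original script compares start and stop numerically, whereas a hash key compares them as strings). The sample data is taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup of composite keys from the reference lines.
# file.txt layout: num, spotstart, spotstop, spot, dontuse, start, stop, name
# (build_seen is a hypothetical helper name, not from the original script).
sub build_seen {
    my ($ref_lines) = @_;
    my %seen;
    for my $line (@$ref_lines) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($unum, undef, undef, undef, undef, $ustart, $ustop, $uname)
            = split ' ', $line;
        $seen{ join "\t", $uname, $ustart, $ustop, $unum } = 1;
    }
    return \%seen;
}

# Return the input lines whose composite key is absent from the lookup.
# Input layout: num, spot, sstart, sstop, name, id, start, stop
# (lines_not_in is likewise a hypothetical name).
sub lines_not_in {
    my ($seen, $input_lines) = @_;
    my @missing;
    for my $line (@$input_lines) {
        next unless $line =~ /\S/;
        my ($num, undef, undef, undef, $name, undef, $start, $stop)
            = split ' ', $line;
        push @missing, $line
            unless exists $seen->{ join "\t", $name, $start, $stop, $num };
    }
    return @missing;
}

# Sample data from the question, held in memory for the demonstration.
my @reference = (
    "1   1729    1858    25  g   6600    6700    sam",
    "15  9302    9030    12  t   3900    4500    frodo",
    "19  0   2000    13  y   3300    3800    merry",
    "20  0   510 13  h   6300    6500    pippin",
);
my @input = (
    "1   25  1600    1700    sam 40  6600    6700",
    "15  11  1500    2000    frodo   67  3900    4500",
    "15  11  1500    2000    frodo   67  3800    4500",
    "17  10  3000    3100    bilbo   50  2300    2600",
    "19  20  3400    3700    merry   39  3300    3800",
    "20  90  3900    4200    pippin  80  6300    6500",
);

# Prints the frodo (3800-4500) and bilbo lines, the expected output above.
print "$_\n" for lines_not_in( build_seen(\@reference), \@input );
```

To run it on the real files, the lines of file.txt would be read into build_seen and the lines from <> passed to lines_not_in. Each file is then read once and every input line costs a single hash lookup, instead of up to 200,000 comparisons per input line in the nested loop.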
