variables - Checking which lines of a file are not in another file time-efficiently in Perl
I want to compare one file against another file and find the lines that are in the input file but not in the file being compared to.
This is my script right now:
#!/usr/bin/perl
use strict;
use warnings;

my $data_file = "file.txt";
my @data;
{
    open my $fh, "<", $data_file
        or die qq{unable to open "$data_file" for input: $!};
    while ( <$fh> ) {
        next unless /\S/;    # skip blank lines
        push @data, [ split ];
    }
}

while ( <> ) {
    my $found = 0;
    my ($num, $spot, $sstart, $sstop, $name, $id, $start, $stop) = split;
    # linear scan of every file.txt row for every input line
    foreach my $item ( @data ) {
        my ($unum, $uspotstart, $uspotstop, $uspot, $udontuse, $ustart, $ustop, $uname) = @$item;
        if ( $uname eq $name and $start == $ustart and $stop == $ustop and $unum eq $num ) {
            $found = 1;
            last;
        }
    }
    print $_ if $found == 0;
}
The script works, but the problem is that it never finishes running, because file.txt contains 200,000 lines and the input file contains 20,000 lines.
This is an example of what is in file.txt:
1 1729 1858 25 g 6600 6700 sam
15 9302 9030 12 t 3900 4500 frodo
19 0 2000 13 y 3300 3800 merry
20 0 510 13 h 6300 6500 pippin
while the input file to the program is:
1 25 1600 1700 sam 40 6600 6700
15 11 1500 2000 frodo 67 3900 4500
15 11 1500 2000 frodo 67 3800 4500
17 10 3000 3100 bilbo 50 2300 2600
19 20 3400 3700 merry 39 3300 3800
20 90 3900 4200 pippin 80 6300 6500
This should output:
15 11 1500 2000 frodo 67 3800 4500
17 10 3000 3100 bilbo 50 2300 2600
With the number of lines I'm looking at, I can't do this time-efficiently.
I want the script to involve less processing when used on a larger scale.
Thank you!
Use a hash instead of an array. If file.txt is large, hash the smaller input file instead. You can use a concatenation of the important input fields as the key and the rest as the value, or use a hash of hashes with each important field as the key of one level and the remaining values as the value (as a string or an array):
$hash{$name}{$start}{$stop}{$num} = [ $spot, $sstart, $sstop, $id ];
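
For the concatenated-key variant, a minimal sketch might look like the following. It is not the asker's final script, only an illustration under a few assumptions: the sample rows from the question stand in for the two files, the helper name make_key is made up for this example, and the hash is built from file.txt (the roles could be swapped to hash the smaller file if memory matters). The nested loop in the question can make up to 20,000 x 200,000 = 4 billion comparisons; here each input line costs one hash lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the question; in real use these would be read
# from file.txt and from the input file instead (assumption for the demo).
my @reference = (
    '1 1729 1858 25 g 6600 6700 sam',
    '15 9302 9030 12 t 3900 4500 frodo',
    '19 0 2000 13 y 3300 3800 merry',
    '20 0 510 13 h 6300 6500 pippin',
);
my @input = (
    '1 25 1600 1700 sam 40 6600 6700',
    '15 11 1500 2000 frodo 67 3900 4500',
    '15 11 1500 2000 frodo 67 3800 4500',
    '17 10 3000 3100 bilbo 50 2300 2600',
    '19 20 3400 3700 merry 39 3300 3800',
    '20 90 3900 4200 pippin 80 6300 6500',
);

# make_key is a hypothetical helper: it joins the four compared fields
# into one string. A space is a safe separator because the fields come
# from a whitespace split. Adding 0 normalizes the numeric fields so the
# string key behaves like the == comparison in the original script.
sub make_key {
    my ($name, $start, $stop, $num) = @_;
    return join ' ', $name, $start + 0, $stop + 0, $num;
}

# Index file.txt once: one hash key per row.
my %seen;
for my $line (@reference) {
    next unless $line =~ /\S/;    # skip blank lines
    my ($unum, $uspotstart, $uspotstop, $uspot, $udontuse,
        $ustart, $ustop, $uname) = split ' ', $line;
    $seen{ make_key($uname, $ustart, $ustop, $unum) } = 1;
}

# Collect the input rows whose key was never seen in file.txt.
my @missing;
for my $line (@input) {
    my ($num, $spot, $sstart, $sstop, $name, $id, $start, $stop)
        = split ' ', $line;
    push @missing, $line
        unless $seen{ make_key($name, $start, $stop, $num) };
}

print "$_\n" for @missing;
```

With the sample data this prints the frodo 3800 row and the bilbo row, matching the expected output in the question. The hash-of-hashes form can be queried the same way, but testing a deep element with exists creates the intermediate levels as a side effect (autovivification), so the single concatenated key is often the simpler choice.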