How do I filter data from a CSV file in Excel or with Ruby?
I have a huge CSV file (over 57,000 rows, 50 columns) that I need to analyze.
Edit: Hi guys, thanks for the answers and comments, but I'm still confused about how to do this in Ruby, and I have no idea how to use MySQL. I'll try to be more specific:
The CSV files:
- CSV of storm data details, 2015
- CSV of storm data details, 2000
The questions: before the questions start, the instructions say that for all answers, rows whose county/parish, zone, or marine name begins with the letter A, B, or C must be excluded.
- Find the month in 2015 in which the state of Washington had the largest number of storm events. How many days of storm-free weather occurred in that month?
- How many storms impacting trees happened between 8 PM EST and 8 AM EST in 2000?
- In which year (2000 or 2015) did storms have a higher monetary impact within the boundaries of the 13 original colonies?
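Since the A/B/C exclusion applies to every question, one option in Ruby is to apply it once while loading. A minimal sketch, assuming the file has a header row and that the county/parish, zone, or marine name sits in a column called CZ_NAME (check your own header); `keep_row?` and `load_filtered` are hypothetical helper names.

```ruby
require 'csv'

# Assumed column name: CZ_NAME holds the county/parish, zone, or marine name.
EXCLUDED_PREFIX = /\A[ABC]/i

# True for rows to keep: the name does NOT start with A, B, or C
# (case-insensitive). Rows with a blank name are kept.
def keep_row?(row)
  !row["CZ_NAME"].to_s.strip.match?(EXCLUDED_PREFIX)
end

# Reads a CSV file using its header row and drops the excluded rows.
# Returns an Array of CSV::Row objects, accessible by column name.
def load_filtered(path)
  CSV.foreach(path, headers: true).select { |row| keep_row?(row) }
end
```

Usage would be `rows = load_filtered(path)` with the path to one of the downloaded files; every later filter then starts from `rows` instead of the raw file.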
The problems:
1) I was able to use filters in Excel to determine that the "Thunderstorm Wind" events in Washington happened in July (6 entries), and that there were 27 days of storm-free weather. However, when I tried to check my work in Spotfire, I got different results (7 entries in May, and 28 days of storm-free weather in May; Excel found only 2 Thunderstorm Wind events in May). Does anyone know what is causing this discrepancy?
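A minimal Ruby sketch of the per-month count and the storm-free-day count, assuming the rows were loaded with `headers: true` and that the header uses the column names STATE, BEGIN_YEARMONTH (e.g. "201507"), BEGIN_DAY, END_YEARMONTH and END_DAY. It counts every event type (add an EVENT_TYPE check to restrict it to "Thunderstorm Wind") and treats a day as stormy if any event was in progress on it. Both are choices rather than the definitive rule, and writing them down explicitly may help when comparing against what Excel or Spotfire counted.

```ruby
require 'csv'
require 'date'
require 'set'

# Assumed column names: STATE, BEGIN_YEARMONTH, BEGIN_DAY,
# END_YEARMONTH, END_DAY.

# Builds a Date from a "YYYYMM" string and a day (-1 = last day of month).
def ym_date(yearmonth, day)
  Date.new(yearmonth[0, 4].to_i, yearmonth[4, 2].to_i, day)
end

# Number of events for one state, keyed by the "YYYYMM" string they began in.
def events_per_month(rows, state)
  rows.select { |r| r["STATE"] == state }
      .group_by { |r| r["BEGIN_YEARMONTH"] }
      .transform_values(&:count)
end

# Days in the month on which no event for the state was in progress.
# A multi-day event marks every day from its begin date to its end date,
# clipped to the month being checked.
def storm_free_days(rows, state, yearmonth)
  month_start = ym_date(yearmonth, 1)
  month_end   = ym_date(yearmonth, -1)
  stormy = Set.new
  rows.each do |r|
    next unless r["STATE"] == state
    first = [ym_date(r["BEGIN_YEARMONTH"], r["BEGIN_DAY"].to_i), month_start].max
    last  = [ym_date(r["END_YEARMONTH"], r["END_DAY"].to_i), month_end].min
    (first..last).each { |d| stormy << d } # empty range if outside the month
  end
  (month_end - month_start).to_i + 1 - stormy.size
end
```

With the 2015 rows loaded, `counts = events_per_month(rows, "WASHINGTON")` gives the counts, `busiest = counts.max_by { |_, n| n }.first` picks the month, and `storm_free_days(rows, "WASHINGTON", busiest)` counts the storm-free days in it. State names are assumed to be stored in upper case.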
2) There are 2 columns in which damage to trees might be mentioned: EVENT_NARRATIVE and EPISODE_NARRATIVE. Is it possible to search both columns for "tree" and filter the spreadsheet down to those results? Multiple-column filtering is apparently impossible in Excel. I also need to find a way to omit the word "street" from the results (because it contains the word "tree").
The method I came up with for the time range was to filter for the EST and AST results, filter BEGIN_TIME for 2000 to 2359 and 0 to 759, and then repeat those ranges to filter END_TIME. This appears to work.
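In Ruby, both narrative columns can be searched in one condition, and a word-boundary regex keeps "street" out. A minimal sketch, assuming the column names EVENT_NARRATIVE, EPISODE_NARRATIVE, BEGIN_TIME and END_TIME, with times stored as hhmm numbers such as 2130 or 759. It mirrors the spreadsheet method above (both BEGIN_TIME and END_TIME in the 20:00 to 07:59 window) but does not include the EST/AST filter or any time-zone conversion; times are compared as recorded. The regex `/\btrees?\b/i` is a choice made here: `\btree\b` alone would not match "trees".

```ruby
require 'csv'

# Word boundaries reject "street"; "s?" also accepts the plural "trees".
TREE = /\btrees?\b/i

# True if either narrative column mentions trees (blank cells are skipped).
def mentions_trees?(row)
  %w[EVENT_NARRATIVE EPISODE_NARRATIVE].any? { |col| row[col].to_s.match?(TREE) }
end

# True for an hhmm time from 2000 to 2359 or from 0 to 759.
# A blank time is treated as not overnight.
def overnight?(hhmm)
  s = hhmm.to_s.strip
  return false if s.empty?
  t = s.to_i
  t >= 2000 || t <= 759
end

# Tree-related events whose begin AND end times are both overnight.
def overnight_tree_events(rows)
  rows.select do |r|
    mentions_trees?(r) && overnight?(r["BEGIN_TIME"]) && overnight?(r["END_TIME"])
  end
end
```

For the 2000 file, `overnight_tree_events(rows).count` would give the number of matching rows, with `rows` loaded using `headers: true`.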
3) I was able to filter for the states Delaware, Pennsylvania, New Jersey, Georgia, Connecticut, Massachusetts, Maryland, South Carolina, New Hampshire, Virginia, New York, North Carolina, and Rhode Island. It seems like a simple task to add the values in columns Y and Z (DAMAGE_PROPERTY, DAMAGE_CROPS) and compare them between the 2 years, but the values are written in the form "32.79K", and I cannot figure out how to make an addition formula work with that format or how to convert the values to integers.
Also, the question asks about the original territory of the colonies, which is not the same territory the states occupy today. Does anyone know of a way to resolve this issue? Even if I had time to check each city listed, there does not seem to be a database of cities in the original 13 colonies online, and if there were, the names of the cities may be different.
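For the damage totals, the "32.79K" strings can be converted in code before adding. A minimal sketch, assuming the columns are named STATE, DAMAGE_PROPERTY and DAMAGE_CROPS and that state names are stored in upper case. The "K" suffix comes from the question; "M" and "B" are assumed to appear for millions and billions. `Rational` is used so decimal values add up exactly. This still filters by present-day state, so it does not solve the colonial-boundary problem.

```ruby
require 'csv'

# Suffix multipliers: "K" from the question's "32.79K"; "M" and "B" assumed.
MULTIPLIERS = { "" => 1, "K" => 1_000, "M" => 1_000_000, "B" => 1_000_000_000 }.freeze

# The 13 states listed in the question (upper case assumed to match the file).
COLONY_STATES = [
  "DELAWARE", "PENNSYLVANIA", "NEW JERSEY", "GEORGIA", "CONNECTICUT",
  "MASSACHUSETTS", "MARYLAND", "SOUTH CAROLINA", "NEW HAMPSHIRE",
  "VIRGINIA", "NEW YORK", "NORTH CAROLINA", "RHODE ISLAND"
].freeze

# Converts e.g. "32.79K" to 32790 (as an exact Rational). A blank value
# counts as 0; anything else unrecognized raises, so bad data is noticed.
def damage_to_dollars(value)
  s = value.to_s.strip.upcase
  return 0 if s.empty?
  m = s.match(/\A(\d+(?:\.\d+)?)([KMB]?)\z/)
  raise ArgumentError, "unrecognized damage value: #{value.inspect}" unless m
  Rational(m[1]) * MULTIPLIERS.fetch(m[2])
end

# Total property + crop damage, in dollars, for rows in the given states.
def total_damage(rows, states = COLONY_STATES)
  rows.select { |r| states.include?(r["STATE"]) }
      .sum { |r| damage_to_dollars(r["DAMAGE_PROPERTY"]) + damage_to_dollars(r["DAMAGE_CROPS"]) }
end
```

Calling `total_damage(rows)` once with the 2000 rows and once with the 2015 rows gives two totals to compare; `.to_f` turns each into an ordinary number for printing.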
I am learning Ruby, and people have suggested I try to use Ruby's CSV library to put the data into an array. I have looked at tutorials that sort of describe how to do that, but I still don't understand how to filter the data down to what I need.
Can anyone help?
Thank you!
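I downloaded the data to play with it. You can get the record count pretty easily in Ruby. I did this in IRB:

```ruby
require 'csv'

details = []
CSV.foreach("StormEvents_details-ftp_v1.0_d2015_c20160818.csv") do |row|
  details << row
end

# In this file's column layout, field[-2] is EVENT_NARRATIVE,
# field[-3] is EPISODE_NARRATIVE and field[8] is STATE.
# .to_s guards against empty (nil) narrative cells.
results = details.select do |field|
  [field[-2], field[-3]].any? { |el| el.to_s[/\btree\b/i] } && field[8] == "CALIFORNIA"
end

results.count
# => 125
```

I used array indices. You could zip things together and make hashes for better readability.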
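The "zip things and make hashes" idea could look like the following minimal sketch: the header row is zipped with each data row so columns are looked up by name instead of index. The helper names are hypothetical and the column names are assumed from the file's header row. `CSV.read(path, headers: true)` provides similar name-based access without the manual zip.

```ruby
require 'csv'

# Turns parsed rows (first row = header) into hashes keyed by column name.
def rows_to_hashes(rows)
  header, *data = rows
  data.map { |row| header.zip(row).to_h }
end

# The same tree filter as the index-based version, written with column names.
def tree_events_in_state(records, state)
  records.select do |h|
    [h["EVENT_NARRATIVE"], h["EPISODE_NARRATIVE"]].any? { |text| text.to_s.match?(/\btree\b/i) } &&
      h["STATE"] == state
  end
end
```

Usage would be `records = rows_to_hashes(CSV.read(path))` followed by `tree_events_in_state(records, "CALIFORNIA").count`.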