Can I read a CSV represented as a string into Apache Spark using spark-csv?
I know how to read a CSV file into Spark using spark-csv (https://github.com/databricks/spark-csv), but I have a CSV file represented as a string and would like to convert that string directly into a DataFrame. Is that possible?
You actually can, though it uses library internals and is not an advertised API: create and use your own CsvParser instance. The example below works for me on Spark 1.6.0 with spark-csv_2.10-1.4.0:
```scala
import com.databricks.spark.csv.CsvParser
import org.apache.spark.sql.DataFrame

val csvData = """
  |userid,organizationid,userfirstname,usermiddlename,userlastname,usertitle
  |1,1,user1,m1,l1,mr
  |2,2,user2,m2,l2,mr
  |3,3,user3,m3,l3,mr
  |""".stripMargin

// Turn the in-memory string into an RDD of lines, then hand it to CsvParser
val rdd = sc.parallelize(csvData.lines.toList)
val csvParser = new CsvParser()
  .withUseHeader(true)
  .withInferSchema(true)
val csvDataFrame: DataFrame = csvParser.csvRdd(sqlContext, rdd)
```
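If you are on a newer Spark version, note that since Spark 2.2 the built-in CSV reader can consume a `Dataset[String]` directly, so no spark-csv internals are needed. A minimal sketch, assuming a local `SparkSession` (the session setup and column names here are illustrative, not from the original post):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-from-string")
  .getOrCreate()
import spark.implicits._

val csvData = """userid,userfirstname
                |1,user1
                |2,user2""".stripMargin

// DataFrameReader.csv(Dataset[String]) parses the lines in place (Spark 2.2+).
// split("\n") avoids the String.lines ambiguity across Scala versions.
val df: DataFrame = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(csvData.split("\n").toSeq.toDS())

df.show()
```

This keeps everything on public API, so it will not break when spark-csv internals change (the package was folded into Spark itself from 2.0 onward).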