Spark 1.5.2: NaN while calculating stddev
I am getting NaN while calculating the standard deviation (stddev). A simple use case is described below:

    val df = Seq(("1", 19603176695L), ("2", 26438904194L), ("3", 29640527990L), ("4", 21034972928L), ("5", 23975L)).toDF("v", "data")

I have stddev defined as a UDF:

    def stddev(col: Column) = { sqrt(mean(col * col) - mean(col) * mean(col)) }

I'm getting NaN when I call the UDF as shown below:

    df.agg(stddev(col("data")).as("stddev")).show()

It produces the following:

    +------+
    |stddev|
    +------+
    |   NaN|
    +------+

What am I doing wrong?

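Given this data, both mean(col*col) and mean(col)*mean(col) are larger than the maximum value of Long (about 9.22e18). The product col*col is computed on a Long column, so it overflows; the difference under the square root can then come out negative, and sqrt of a negative number is NaN. You can try casting the input column to double first:

    df.agg(stddev(col("data").cast("double")).as("stddev"))

In general, though, this formula won't be particularly stable on large numbers.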
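
To illustrate both points, here is a minimal sketch in plain Scala (no Spark needed), using the same five values as in the question. The names StddevSketch, squareAsLong, squareAsDouble and welfordStddev are made up for this illustration and are not part of Spark. It shows that squaring as Long wraps around while squaring as Double does not, and it computes the population standard deviation with Welford's one-pass update. That is a different technique from the mean(col*col) formula above, and it is generally better behaved on large values.

```scala
// Illustration only: plain Scala, hypothetical names, not a Spark API.
object StddevSketch {
  // The same five values as the "data" column in the question.
  val data: Seq[Long] =
    Seq(19603176695L, 26438904194L, 29640527990L, 21034972928L, 23975L)

  // Long multiplication wraps around silently when the result exceeds Long.MaxValue.
  def squareAsLong(x: Long): Long = x * x

  // Converting to Double before multiplying avoids the wrap-around
  // (at the cost of limited floating-point precision).
  def squareAsDouble(x: Long): Double = x.toDouble * x.toDouble

  // Population standard deviation using Welford's one-pass update.
  // It accumulates squared deviations from the running mean instead of
  // subtracting two very large, nearly equal quantities.
  def welfordStddev(xs: Seq[Long]): Double = {
    var n = 0L      // number of values seen so far
    var mean = 0.0  // running mean
    var m2 = 0.0    // running sum of squared deviations from the mean
    for (x <- xs) {
      n += 1
      val delta = x - mean
      mean += delta / n
      m2 += delta * (x - mean)
    }
    if (n == 0) Double.NaN else math.sqrt(m2 / n)
  }
}
```

Calling StddevSketch.welfordStddev(StddevSketch.data) returns a finite Double for this data rather than NaN, and comparing squareAsLong with squareAsDouble on any of the large values shows where the original formula goes wrong.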