Spark 1.5.2: NaN while calculating stddev
I'm getting NaN while calculating the standard deviation (stddev). A simple use case is described below:
import sqlContext.implicits._
val df = Seq(("1", 19603176695L), ("2", 26438904194L), ("3", 29640527990L), ("4", 21034972928L), ("5", 23975L)).toDF("v", "data")
I have stddev defined as a UDF:
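import org.apache.spark.sql.Column; import org.apache.spark.sql.functions.{col, mean, sqrt}
def stddev(col: Column): Column = sqrt(mean(col * col) - mean(col) * mean(col))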
I'm getting NaN when I call the UDF as shown below:
df.agg(stddev(col("data")).as("stddev")).show()
It produces the following:
+------+
|stddev|
+------+
|   NaN|
+------+
What am I doing wrong?
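Given your data, both mean(col*col) and mean(col)*mean(col) are larger than the maximum value of Long. Because data is a Long column, col*col is computed with Long arithmetic and silently overflows, so the value passed to sqrt ends up negative and sqrt returns NaN. You can try casting the input column to double first: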
df.agg(stddev(col("data").cast("double")).as("stddev"))
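To see the overflow without Spark, here is a minimal plain-Scala sketch that mimics both versions of the formula on the question's data. It assumes, as the NaN suggests, that Spark multiplies Long columns with ordinary wrapping JVM Long arithmetic and that mean returns a Double; the object and method names are hypothetical.

```scala
object OverflowDemo {
  // Same values as the "data" column in the question.
  val data: Seq[Long] =
    Seq(19603176695L, 26438904194L, 29640527990L, 21034972928L, 23975L)

  // Mimics stddev(col("data")): x * x is a Long product that silently wraps
  // on overflow; each wrapped product is then averaged as a Double.
  def naiveOnLongs(xs: Seq[Long]): Double = {
    val meanOfSquares = xs.map(x => (x * x).toDouble).sum / xs.size
    val mean = xs.map(_.toDouble).sum / xs.size
    math.sqrt(meanOfSquares - mean * mean) // negative argument => NaN
  }

  // Mimics stddev(col("data").cast("double")): squares are computed in Double.
  def naiveOnDoubles(xs: Seq[Long]): Double = {
    val ds = xs.map(_.toDouble)
    val meanOfSquares = ds.map(d => d * d).sum / ds.size
    val mean = ds.sum / ds.size
    math.sqrt(meanOfSquares - mean * mean)
  }

  def main(args: Array[String]): Unit = {
    println(s"Long arithmetic:   ${naiveOnLongs(data)}")   // NaN
    println(s"Double arithmetic: ${naiveOnDoubles(data)}") // roughly 1.03e10
  }
}
```

Every wrapped Long square is at most about 9.2e18, while mean(col)*mean(col) is about 3.7e20, so the difference is guaranteed to be negative here.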
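In general, though, the E[X^2] - E[X]^2 formula isn't numerically stable on large numbers: subtracting two large, nearly equal doubles can lose much of the precision (catastrophic cancellation).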
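One commonly used, more stable alternative is Welford's online algorithm. It is swapped in here as an illustration; the answer above does not name a specific method. This is a plain-Scala sketch rather than a Spark aggregate, and WelfordStddev/stddevPop are hypothetical names; using it inside Spark would require writing a custom aggregate, which is not shown.

```scala
object WelfordStddev {
  // Population standard deviation using Welford's online algorithm:
  // it updates the mean and the sum of squared deviations (m2) one value at a
  // time, avoiding the subtraction of two large, nearly equal quantities.
  def stddevPop(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "stddevPop needs at least one value")
    var n = 0L
    var mean = 0.0
    var m2 = 0.0
    for (x <- xs) {
      n += 1
      val delta = x - mean
      mean += delta / n
      m2 += delta * (x - mean) // uses the already-updated mean
    }
    math.sqrt(m2 / n)
  }

  def main(args: Array[String]): Unit = {
    // Large values with a small spread: offsets 4, 7, 13, 16 have population variance 22.5.
    val xs = Seq(4.0, 7.0, 13.0, 16.0).map(_ + 1e9)
    println(stddevPop(xs)) // sqrt(22.5), about 4.743
  }
}
```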