Spark 1.5.2: NaN while calculating stddev


I am getting NaN while calculating a standard deviation (stddev). A simple use case is described below:

val df = Seq(("1", 19603176695L), ("2", 26438904194L), ("3", 29640527990L), ("4", 21034972928L), ("5", 23975L)).toDF("v", "data")

I have stddev defined as a UDF:

def stddev(col: Column) = sqrt(mean(col * col) - mean(col) * mean(col))

I get NaN when I call the UDF as shown below:

df.agg(stddev(col("data")).as("stddev")).show()  

It produces the following:

+------+
|stddev|
+------+
|   NaN|
+------+

What am I doing wrong?

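Given your data, both mean(col*col) and mean(col)*mean(col) are larger than the maximum value of Long (about 9.22e18). Since the data column is a Long, col*col is computed in Long arithmetic and overflows, so mean(col*col) ends up smaller than mean(col)*mean(col), the expression under sqrt becomes negative, and the result is NaN. You can try casting the input column to double first: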

df.agg(stddev(col("data").cast("double")).as("stddev")) 

But in general this formula won't be particularly stable on large numbers, because it subtracts two very large values from each other.
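To see the arithmetic outside Spark, here is a minimal plain-Scala sketch using the same five values. It is only an illustration: the object name StddevDemo, the mean helper and the value names are hypothetical, and the two-pass variant at the end is one common way to avoid subtracting two huge numbers, not something taken from the Spark API.

```scala
object StddevDemo {
  // Same values as the "data" column in the question.
  val data: Seq[Long] =
    Seq(19603176695L, 26438904194L, 29640527990L, 21034972928L, 23975L)

  // Hypothetical helper: arithmetic mean of a sequence of doubles.
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  val doubles: Seq[Double] = data.map(_.toDouble)
  val m: Double = mean(doubles)

  // Long arithmetic: x * x silently wraps around past Long.MaxValue.
  val wrappedSquares: Seq[Long] = data.map(x => x * x)

  // Mean of the wrapped squares is at most ~9.22e18, while m * m is ~3.7e20,
  // so the value under the square root is negative and sqrt returns NaN.
  val overflowed: Double = math.sqrt(mean(wrappedSquares.map(_.toDouble)) - m * m)

  // Same formula after converting to Double first: no overflow.
  val naive: Double = math.sqrt(mean(doubles.map(x => x * x)) - m * m)

  // Two-pass form: subtract the mean before squaring, which avoids
  // taking the difference of two very large intermediate values.
  val twoPass: Double = math.sqrt(mean(doubles.map(x => (x - m) * (x - m))))

  def main(args: Array[String]): Unit = {
    println(s"overflowed = $overflowed")
    println(s"naive      = $naive")
    println(s"twoPass    = $twoPass")
  }
}
```

Both the naive and the two-pass versions compute the population standard deviation, like the formula in the question. For this data set they agree closely once the input is a Double; the two-pass form mainly helps when the mean is large relative to the spread of the values.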

