pyspark - Best way to do aggregations in Spark


I'm running out of memory when I try this aggregation. It works fine, though slowly, on a small subset of the data. I'm running it in PySpark. Is there an alternative way to take the average of a column based on a specific group that will run better?

df = df.groupby("id", "timestamp").avg("accel_lat", "accel_long", "accel_vert") 

The other thing I can think of is the data types of id and timestamp. Make sure those two are not strings. Try to reduce the size of the types or change the schema of the df.
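
As a minimal sketch of that idea, assuming the column names from the question and that narrower numeric/timestamp types are appropriate for your data:

from pyspark.sql import functions as F

# Inspect the current schema; if id or timestamp arrive as strings,
# the shuffle behind the groupBy carries more bytes than it needs to.
df.printSchema()

# Cast to narrower / more appropriate types (assumed target types).
df = (df
      .withColumn("id", F.col("id").cast("int"))
      .withColumn("timestamp", F.col("timestamp").cast("timestamp"))
      .withColumn("accel_lat", F.col("accel_lat").cast("float"))
      .withColumn("accel_long", F.col("accel_long").cast("float"))
      .withColumn("accel_vert", F.col("accel_vert").cast("float")))

# The same aggregation, written with agg() so each average gets an explicit alias.
result = (df.groupBy("id", "timestamp")
            .agg(F.avg("accel_lat").alias("avg_accel_lat"),
                 F.avg("accel_long").alias("avg_accel_long"),
                 F.avg("accel_vert").alias("avg_accel_vert")))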

