pyspark - Best way to do aggregations in Spark
I'm running out of memory when I try this aggregation. It works fine, though slowly, on a small subset of the data. I'm running it in PySpark. Is there an alternative way to take the average of a column based on a specific group that would run better?
df = df.groupBy("id", "timestamp").avg("accel_lat", "accel_long", "accel_vert")
The other thing I can think of is the data types of id and timestamp. Make sure those two are not strings. Try to reduce the size of the types, or change the schema of the df.
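A minimal sketch of that idea, assuming "id" can be cast to a numeric type and "timestamp" parses as a timestamp (the target types here are assumptions, not from the question):

from pyspark.sql import functions as F
from pyspark.sql.types import LongType, TimestampType

# Inspect the current schema to see which grouping columns are strings.
df.printSchema()

# Hypothetical casts: narrower key types than strings reduce the memory
# needed to shuffle and hold the group keys during the aggregation.
df = (
    df.withColumn("id", F.col("id").cast(LongType()))
      .withColumn("timestamp", F.col("timestamp").cast(TimestampType()))
)

# The same aggregation as in the question, now grouping on the narrower types.
df = df.groupBy("id", "timestamp").avg("accel_lat", "accel_long", "accel_vert")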