pyspark - Best way to do aggregations in Spark
I'm running out of memory when I try this aggregation. It works fine, though slowly, on a small subset of the data. I'm running it in PySpark. Is there an alternative way to take the average of a column based on a specific group that would run better?
df = df.groupBy("id", "timestamp").avg("accel_lat", "accel_long", "accel_vert")
The other thing I can think of is the data types of id and timestamp. Make sure those two are not strings. Try to reduce the size of the types, or change the schema of the df.
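A minimal sketch of that idea, assuming "id" can be cast to a numeric type and "timestamp" parses as a timestamp (the target types here are assumptions, not from the question):

from pyspark.sql import functions as F
from pyspark.sql.types import LongType, TimestampType

# Inspect the current schema to see which grouping columns are strings.
df.printSchema()

# Hypothetical casts: narrower key types than strings reduce the memory
# needed to shuffle and hold the group keys during the aggregation.
df = (
    df.withColumn("id", F.col("id").cast(LongType()))
      .withColumn("timestamp", F.col("timestamp").cast(TimestampType()))
)

# The same aggregation as in the question, now grouping on the narrower types.
df = df.groupBy("id", "timestamp").avg("accel_lat", "accel_long", "accel_vert")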