Spark Streaming with Kafka Direct approach: horizontal scalability issue


I am facing a problem using Spark Streaming with Apache Kafka, where Spark is deployed on YARN. I am using the direct approach (no receivers) to read data from Kafka, with 1 topic and 48 partitions. With this setup on a 5-node (4 worker) Spark cluster (24 GB of memory available on each machine) and the Spark configuration spark.executor.memory=2g, spark.executor.cores=1, there should be 48 executors on the Spark cluster (12 executors on each machine).
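For reference, a minimal sketch of the executor sizing described above as it might be set on a SparkConf; the application name and the explicit spark.executor.instances value are assumptions and not part of the original setup:

```scala
import org.apache.spark.SparkConf

// Executor sizing as described: 2 GB heap and a single core per executor.
// Note: on YARN the executor count is taken from spark.executor.instances
// (or --num-executors on spark-submit) or from dynamic allocation, not from
// the number of Kafka/RDD partitions; the value 48 below is an assumption
// matching the expectation of 12 executors on each of the 4 workers.
val conf = new SparkConf()
  .setAppName("KafkaDirectStreaming")        // hypothetical app name
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "1")
  .set("spark.executor.instances", "48")
```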

The Spark Streaming documentation confirms that there is a one-to-one mapping between Kafka partitions and RDD partitions. With 48 Kafka partitions, there should be 48 RDD partitions, each partition being processed by one executor. But when I run this, only 12 executors are created, so the Spark cluster's capacity remains unused and I am not able to achieve the desired throughput.
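As a point of reference, here is a minimal sketch of the direct (receiver-less) read path using the spark-streaming-kafka 0.8 integration; the broker list, topic name, and batch interval are placeholders rather than values from the original setup:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaDirectStreaming")
val ssc = new StreamingContext(conf, Seconds(10))    // placeholder batch interval

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("events")                           // hypothetical topic with 48 partitions

// Direct stream: each generated RDD has exactly one partition per Kafka
// partition of the subscribed topic (48 here), with no receivers involved.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

// Log the partition count per batch to confirm the one-to-one mapping.
stream.foreachRDD { rdd =>
  println(s"RDD partitions in this batch: ${rdd.partitions.length}")
}

ssc.start()
ssc.awaitTermination()
```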

It seems that the direct approach for reading data from Kafka in Spark Streaming is not behaving according to the Spark Streaming documentation. Can anyone suggest what I am doing wrong here, since I am not able to scale horizontally to increase throughput?

