Output sorted text file from Google Cloud Dataflow


I have a PCollection<String> in Google Cloud Dataflow, and I'm outputting it to text files via TextIO.Write.to:

PCollection<String> lines = ...;
lines.apply(TextIO.Write.to("gs://bucket/output.txt"));

Currently, the lines of each shard of the output are in random order.

Is it possible to get Dataflow to output the lines in sorted order?

This is not directly supported by Dataflow.

For a bounded PCollection, if you shard your input finely enough, you can write sorted files with a Sink implementation that sorts each shard. You may want to refer to the TextSink implementation for a basic outline.
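The shard-then-sort idea can be illustrated outside Dataflow. The following is only a minimal sketch in plain Java, not a Sink implementation and not the Dataflow API: the class `ShardSortSketch`, the methods `shardFor` and `sortedShards`, and the split points are all hypothetical names chosen for illustration. It assumes that lines are assigned to shards by key range, so that every line in one shard sorts before every line in the next, and that each shard is small enough to sort in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ShardSortSketch {

    // Hypothetical helper: returns the index of the shard a line belongs to.
    // splitPoints must be sorted ascending; shard i holds the lines that are
    // >= splitPoints[i - 1] and < splitPoints[i].
    static int shardFor(String line, List<String> splitPoints) {
        int shard = 0;
        while (shard < splitPoints.size()
                && line.compareTo(splitPoints.get(shard)) >= 0) {
            shard++;
        }
        return shard;
    }

    // Range-partitions the lines into splitPoints.size() + 1 shards and
    // sorts each shard in memory.
    static List<List<String>> sortedShards(List<String> lines,
                                           List<String> splitPoints) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i <= splitPoints.size(); i++) {
            shards.add(new ArrayList<>());
        }
        for (String line : lines) {
            shards.get(shardFor(line, splitPoints)).add(line);
        }
        for (List<String> shard : shards) {
            Collections.sort(shard);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> lines =
                Arrays.asList("pear", "apple", "zebra", "mango", "banana");
        List<List<String>> shards =
                sortedShards(lines, Arrays.asList("h", "q"));
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i));
        }
    }
}
```

Because the shards here cover disjoint, increasing key ranges, reading them in shard order yields a globally sorted sequence. If the shards were assigned arbitrarily instead, each file would be sorted internally but the files would still need a merge step to produce one sorted output.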

