pyspark.RDD.pipe

RDD.pipe(command, env=None, checkCode=False)

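Return an RDD created by piping elements to a forked external process. The command is run once per partition: each element is written to the process's standard input as one line, and each line of its standard output becomes a string element of the returned RDD.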

Parameters:
command : str

command to run.

env : dict, optional

environment variables to set.

checkCode : bool, optional

whether to check the exit code of the external command; if True, a nonzero exit code raises an error.

Examples

>>> sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
['1', '2', '', '3']
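
To make the line-in, line-out behaviour concrete, here is a rough standard-library sketch of what piping a single partition through a command could look like. It is an illustration under stated assumptions, not PySpark's implementation: `pipe_partition` and its `check_code` argument are hypothetical names, the command is split with `shlex.split` rather than run through a shell, `env` is simply forwarded to `subprocess`, and a POSIX environment with `cat` and `grep` on the path is assumed.

```python
import shlex
import subprocess


def pipe_partition(elements, command, env=None, check_code=False):
    """Hypothetical helper: pipe one partition's elements through a command."""
    # Write each element to the child's stdin as exactly one line.
    payload = "".join(str(e).rstrip("\n") + "\n" for e in elements)
    proc = subprocess.run(
        shlex.split(command),          # no shell involved in this sketch
        input=payload.encode("utf-8"),
        stdout=subprocess.PIPE,
        env=env,                       # None means inherit the parent's environment
    )
    # Mirror the checkCode idea: fail loudly on a nonzero exit code.
    if check_code and proc.returncode != 0:
        raise RuntimeError(f"{command!r} exited with code {proc.returncode}")
    # Each stdout line becomes one string element of the result.
    lines = proc.stdout.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()                    # drop the empty piece after the final newline
    return lines
```

With this sketch, `pipe_partition(['1', '2', '', '3'], 'cat')` returns `['1', '2', '', '3']`, matching the example above, while `pipe_partition(['1'], 'grep x', check_code=True)` raises because `grep` exits with a nonzero code when nothing matches.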