pyspark.RDD.toLocalIterator

RDD.toLocalIterator(prefetchPartitions=False)

Return an iterator that contains all of the elements in this RDD. The iterator will consume as much memory as the largest partition in this RDD. With prefetchPartitions=True, it may consume up to the memory of the two largest partitions.

New in version 1.3.0.

Parameters

prefetchPartitions (bool, optional): Whether Spark should pre-fetch the next partition before it is needed. Defaults to False.

Returns

collections.abc.Iterator: An iterator that contains all of the elements in this RDD.

See also

RDD.collect()
pyspark.sql.DataFrame.toLocalIterator()

Examples

>>> rdd = sc.parallelize(range(10))
>>> [x for x in rdd.toLocalIterator()]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
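
The list comprehension above materializes every element on the driver, much like RDD.collect(). As a minimal sketch of how the iterator might be consumed without building a list, the helper below folds the elements one at a time and passes prefetchPartitions through. The name streaming_sum and its prefetch argument are hypothetical, introduced here only for illustration; they are not part of PySpark.

```python
def streaming_sum(rdd, prefetch=False):
    """Sum the elements of an RDD on the driver, one element at a time.

    `streaming_sum` is a hypothetical helper for illustration only.
    """
    total = 0
    # Iterate lazily instead of building a list, so the driver does not
    # hold the whole dataset at once the way collect() would.
    # prefetch=True asks Spark to fetch the next partition ahead of time,
    # which per the description above may roughly double peak memory use.
    for x in rdd.toLocalIterator(prefetchPartitions=prefetch):
        total += x
    return total
```

Assuming an existing SparkContext named sc, as in the example above:

>>> rdd = sc.parallelize(range(10), 3)
>>> streaming_sum(rdd, prefetch=True)
45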