pyspark.pandas.Series.pipe

Series.pipe(func: Callable[[…], Any], *args: Any, **kwargs: Any) → Any
Apply func(self, *args, **kwargs).

Parameters
- func : function
  Function to apply to the DataFrame. args and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the DataFrame.
- args : iterable, optional
  Positional arguments passed into func.
- kwargs : mapping, optional
  A dictionary of keyword arguments passed into func.
 
Returns
- object : the return type of func.
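As a minimal sketch of the basic call form (the helper add_then_scale below is illustrative only, not part of the API), s.pipe(func, *args, **kwargs) behaves like func(s, *args, **kwargs):

>>> import pyspark.pandas as ps
>>> def add_then_scale(s, offset, factor=1):
...     # offset is forwarded as a positional argument, factor as a keyword
...     return (s + offset) * factor
>>> s = ps.Series([1, 2, 3])
>>> s.pipe(add_then_scale, 10, factor=2)
0    22
1    24
2    26
dtype: int64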
Notes

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. For example, given

>>> df = ps.DataFrame({'category': ['A', 'A', 'B'],
...                    'col1': [1, 2, 3],
...                    'col2': [4, 5, 6]},
...                   columns=['category', 'col1', 'col2'])

>>> def keep_category_a(df):
...     return df[df['category'] == 'A']

>>> def add_one(df, column):
...     return df.assign(col3=df[column] + 1)

>>> def multiply(df, column1, column2):
...     return df.assign(col4=df[column1] * df[column2])

instead of writing

>>> multiply(add_one(keep_category_a(df), column="col1"), column1="col2", column2="col3")
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe(multiply, column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

If you have a function that takes the data as the second argument, pass a tuple indicating which keyword expects the data. For example, suppose multiply_2 takes its data as df:

>>> def multiply_2(column1, df, column2):
...     return df.assign(col4=df[column1] * df[column2])

Then you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe((multiply_2, 'df'), column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

You can use a lambda as well:

>>> ps.Series([1, 2, 3]).pipe(lambda x: (x + 1).rename("value"))
0    2
1    3
2    4
Name: value, dtype: int64
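The tuple examples above use a DataFrame; as a Series-flavoured sketch of the same (callable, data_keyword) form (scale_by is a made-up helper, not part of the API), the tuple tells pipe which keyword should receive the Series:

>>> def scale_by(factor, data):
...     # the Series arrives via the 'data' keyword named in the tuple
...     return data * factor
>>> ps.Series([1, 2, 3]).pipe((scale_by, 'data'), factor=10)
0    10
1    20
2    30
dtype: int64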