pyspark.sql.DataFrame.to#

DataFrame.to(schema)[source]#

Returns a new DataFrame where each row is reconciled to match the specified schema.

New in version 3.4.0.

Parameters

schemaStructType: Specified schema.

Returns

DataFrame: Reconciled DataFrame.

Notes

Reorder columns and/or inner fields by name to match the specified schema.
Project away columns and/or inner fields that are not needed by the specified schema.
Missing columns and/or inner fields (present in the specified schema but not input DataFrame) lead to failures.
Cast the columns and/or inner fields to match the data types in the specified schema,
if the types are compatible, e.g., numeric to numeric (error if overflows), but not string to int.
Carry over the metadata from the specified schema, while the columns and/or inner fields
still keep their own metadata if not overwritten by the specified schema.
Fail if the nullability is not compatible. For example, the column and/or inner field
is nullable but the specified schema requires them to be not nullable.

Supports Spark Connect.

Examples

>>> from pyspark.sql.types import StructField, StringType
>>> df = spark.createDataFrame([("a", 1)], ["i", "j"])
>>> df.schema
StructType([StructField('i', StringType(), True), StructField('j', LongType(), True)])

>>> schema = StructType([StructField("j", StringType()), StructField("i", StringType())])
>>> df2 = df.to(schema)
>>> df2.schema
StructType([StructField('j', StringType(), True), StructField('i', StringType(), True)])
>>> df2.show()
+---+---+
|  j|  i|
+---+---+
|  1|  a|
+---+---+