Clare S. Y. Huang Data Scientist | Atmospheric Dynamicist

Simple pyspark solutions

Here is a collection of solutions to simple problems I have encountered when working with pyspark.

How to replace a string in a column?

Source

from pyspark.sql.functions import regexp_replace

# pattern is interpreted as a regular expression, not as a literal string
new_df = df.withColumn(col_name, regexp_replace(col_name, pattern, replacement))

How to avoid duplicate columns when joining two dataframes on columns with the same name?

Source

df = left_df.join(right_df, ["name"])