Simple pyspark solutions
28 Nov 2018
Here is a collection of solutions to simple problems encountered when working with pyspark.
How to replace a string in a column?
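from pyspark.sql.functions import regexp_replace
# pattern is a Java regular expression; the column col_name is overwritten in the result
new_df = df.withColumn(col_name, regexp_replace(col_name, pattern, replacement))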
How to avoid duplicate columns when joining two dataframes on columns with the same name?
# Passing a list of column names (instead of a join expression) keeps a single "name" column
df = left_df.join(right_df, ["name"])