Situation A: I have a dataset with favourite music genre = [rock, jazz, pop, jazz, folk, metal, funky, blues, soul, rock, pop].
For some really advanced and scientific reasons, I want to recode them into two groups: 1 for quite loud [pop, rock, metal, and funky] and 0 for all other genres (not quite loud).
The following function checks whether a genre is defined as loud: if so, it returns 1; if not, it returns 0.
def check_category(source):
    # Genres treated as "quite loud"; the values must be quoted strings
    loud_genres = ["pop", "rock", "metal", "funky"]
    if source in loud_genres:
        return 1
    return 0
You can apply the function as follows and make a new variable called loudness:
df["loudness"] = df["genre"].apply(check_category)
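To see the recoding end to end, here is a minimal, self-contained sketch. The small DataFrame is built from the genre list in Situation A, and the column name genre follows the call above. The extra column loudness_isin is my own addition for comparison: it uses pandas' isin, which should give the same result without a custom function.

```python
import pandas as pd

# Sample data taken from the genre list in Situation A
df = pd.DataFrame({"genre": ["rock", "jazz", "pop", "jazz", "folk", "metal",
                             "funky", "blues", "soul", "rock", "pop"]})

LOUD_GENRES = ["pop", "rock", "metal", "funky"]

def check_category(source):
    # Return 1 for a "quite loud" genre, 0 for everything else
    if source in LOUD_GENRES:
        return 1
    return 0

df["loudness"] = df["genre"].apply(check_category)

# Alternative (added for illustration): vectorised membership test
df["loudness_isin"] = df["genre"].isin(LOUD_GENRES).astype(int)

print(df)
```

For a simple membership check like this one, isin is usually the shorter option; the custom function pays off once the rule needs more than one condition.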
Situation B: I want to create a new column that registers observations with a sugar intake of more than 30 g a day as “sweet_tooth” and observations with a sugar intake of less than 30 g a day as “healthy” (again, for really scientific reasons).
The following function should do it:
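def generate_new_column(row):
    # Label each row by its daily sugar intake (in grams)
    if row["sugar"] > 30:
        row["new column"] = "sweet_tooth"
    if row["sugar"] < 30:
        row["new column"] = "healthy"
    # Note: a value of exactly 30 matches neither condition, so that row gets no label
    return row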
You can apply it as follows:
df = df.apply(generate_new_column, axis=1)
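Here is a small runnable sketch of Situation B. The sugar values are made up for illustration; only the column name sugar and the two labels come from the function above.

```python
import pandas as pd

# Hypothetical daily sugar intake in grams (illustrative values only)
df = pd.DataFrame({"sugar": [12, 45, 8, 31]})

def generate_new_column(row):
    # Label each row by its daily sugar intake
    if row["sugar"] > 30:
        row["new column"] = "sweet_tooth"
    if row["sugar"] < 30:
        row["new column"] = "healthy"
    return row

# axis=1 passes each row (not each column) to the function
df = df.apply(generate_new_column, axis=1)

print(df)
```

Because the function receives the whole row, it can read any number of columns when deciding on the label, which is the main reason to use axis=1 here.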
I have tried to keep the functions above as simple as possible, but you can make them as complex as your analysis requires by adding more conditions.
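As one possible illustration of adding conditions, the sketch below extends the sugar rule with a third group. The function name classify_sugar, the 60 g cut-off, and the "very_sweet_tooth" label are all hypothetical, and treating exactly 30 g as "healthy" is an assumption I made so that every row gets a label.

```python
import pandas as pd

def classify_sugar(grams):
    # Conditions are checked from the highest threshold down
    if grams > 60:      # hypothetical extra cut-off
        return "very_sweet_tooth"
    elif grams > 30:
        return "sweet_tooth"
    else:               # assumption: exactly 30 counts as healthy
        return "healthy"

# Hypothetical sample values, including the boundary case of 30
sugar_df = pd.DataFrame({"sugar": [10, 30, 45, 80]})
sugar_df["sugar_group"] = sugar_df["sugar"].apply(classify_sugar)

print(sugar_df)
```

Using elif and else means each value falls into exactly one group, so no observation is left without a label.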