Recode data in Python

Lilian Li
Mar 2, 2022

How you recode data depends on the type of data you have. We will walk through two common situations.

Situation A: I have a dataset with favourite music genre = [rock, jazz, pop, jazz, folk, metal, funky, blues, soul, rock, pop].

For some really advanced and scientific reasons, I want to recode them into two groups: 1 for quite loud genres [pop, rock, metal, and funky] and 0 for all other genres (not quite loud).

The following function checks whether a genre is defined as loud. If so, it returns 1; if not, it returns 0.

def check_category(source):
    # Genres treated as "quite loud"; everything else counts as not loud
    loud_genres = ["pop", "rock", "metal", "funky"]
    if source in loud_genres:
        return 1
    return 0

You can apply the function to the genre column as follows to create a new column called loudness:

df["loudness"] = df["genre"].apply(check_category)
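To try this end to end, here is a minimal sketch. The DataFrame is built by hand from the genres listed in Situation A, and the column name loudness_isin is made up for this example. It also shows pandas' built-in isin() as a possible alternative that avoids calling a Python function on every row.

```python
import pandas as pd

# Hypothetical sample data, using the genres listed in Situation A
df = pd.DataFrame({
    "genre": ["rock", "jazz", "pop", "jazz", "folk", "metal",
              "funky", "blues", "soul", "rock", "pop"]
})

LOUD_GENRES = ["pop", "rock", "metal", "funky"]

def check_category(source):
    # Return 1 for a loud genre, 0 for anything else
    if source in LOUD_GENRES:
        return 1
    return 0

df["loudness"] = df["genre"].apply(check_category)

# Alternative: isin() gives True/False, astype(int) turns that into 1/0
df["loudness_isin"] = df["genre"].isin(LOUD_GENRES).astype(int)

print(df)
```

Both columns should hold the same values, so which one you use is mostly a matter of taste; isin() tends to be faster on large datasets.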

Situation B: I want to create a new column that labels observations with a sugar intake > 30g a day as “sweet_tooth” and observations with a sugar intake < 30g a day as “healthy” (again, for really scientific reasons).

The following function should do it:

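def generate_new_column(row):
    # Label each row by its daily sugar intake (in grams)
    if row["sugar"] > 30:
        row["new column"] = "sweet_tooth"
    elif row["sugar"] < 30:
        row["new column"] = "healthy"
    # Exactly 30 matches neither condition, so that row gets no label
    return row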

You can apply it as follows:

df = df.apply(generate_new_column, axis=1)
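As a quick check, here is a small runnable sketch of the same idea. The sugar values are invented for this example, and one of them is exactly 30 to show what happens to a value that is neither above nor below the threshold.

```python
import pandas as pd

# Hypothetical sugar intake values in grams per day
df = pd.DataFrame({"sugar": [12, 45, 30, 28, 60]})

def generate_new_column(row):
    # Label each row by its daily sugar intake
    if row["sugar"] > 30:
        row["new column"] = "sweet_tooth"
    elif row["sugar"] < 30:
        row["new column"] = "healthy"
    # The row with exactly 30 matches neither condition
    return row

# axis=1 passes each row (not each column) to the function
df = df.apply(generate_new_column, axis=1)

print(df)
```

The row with exactly 30g should come back without a label, so if your data can contain the threshold value itself, it is worth deciding which group it belongs to and using >= or <= accordingly.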

I have tried to keep the functions above as simple as possible, but you can make them as complex as your analysis needs by adding more conditions.
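For instance, a third group could be added with one extra condition. This is only a sketch: the 50g cut-off, the "very_sweet_tooth" label, and the sugar_level names are all made up for illustration, and here a value of exactly 30g is counted as healthy.

```python
import pandas as pd

# Hypothetical sugar intake values in grams per day
df = pd.DataFrame({"sugar": [12, 45, 30, 28, 60]})

def sugar_level(grams):
    # The 30 g threshold comes from Situation B; the 50 g cut-off and the
    # "very_sweet_tooth" label are invented for this example
    if grams > 50:
        return "very_sweet_tooth"
    if grams > 30:
        return "sweet_tooth"
    return "healthy"  # 30 g or less

# Apply to a single column, so the function receives one value at a time
df["sugar_level"] = df["sugar"].apply(sugar_level)

print(df)
```

The conditions are checked from top to bottom, so the strictest one (above 50g) has to come first.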
