Recode data in Python

Mar 2, 2022

How you recode data depends on the type of data you have. We will walk through two common situations: recoding a categorical variable and recoding a numeric one.

Situation A: I have a dataset with favourite music genre = [rock, jazz, pop, jazz, folk, metal, funky, blues, soul, rock, pop].

For some really advanced and scientific reasons, I want to recode them into two groups: 1 for quite loud [pop, rock, metal, funky] and 0 for all other genres (not quite loud).

The following function checks whether the genre is defined as loud. If so, it returns 1; if not, it returns 0.

def check_category(source):
    # Genres treated as "quite loud"; every other genre is recoded to 0
    loud_genres = ["pop", "rock", "metal", "funky"]
    if source in loud_genres:
        return 1
    return 0

Assuming the data sits in a pandas DataFrame df with a genre column, you can apply the function as follows to create a new column called loudness:

df["loudness"] = df["genre"].apply(check_category)
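
To see Situation A end to end, here is a minimal sketch that builds a small DataFrame from the genre list above and recodes it. The DataFrame construction, the LOUD_GENRES name and the isin() variant are my own additions for illustration, and the sketch assumes pandas is installed.

```python
import pandas as pd

# The genre list from Situation A, wrapped in a one-column DataFrame
genres = ["rock", "jazz", "pop", "jazz", "folk", "metal",
          "funky", "blues", "soul", "rock", "pop"]
df = pd.DataFrame({"genre": genres})

# Genres treated as "quite loud" (name chosen for this sketch)
LOUD_GENRES = {"pop", "rock", "metal", "funky"}

def check_category(source):
    """Return 1 if the genre counts as loud, otherwise 0."""
    return 1 if source in LOUD_GENRES else 0

df["loudness"] = df["genre"].apply(check_category)

# Alternative without a helper function:
# isin() yields True/False per row, astype(int) turns that into 1/0
df["loudness_isin"] = df["genre"].isin(LOUD_GENRES).astype(int)

print(df)
```

Both columns should hold the same 0/1 values; the isin() version simply avoids calling a Python function for every row.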

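Situation B: I want to create a new column that labels observations with sugar intake > 30g a day as “sweet_tooth” and observations with sugar intake < 30g a day as “healthy” (again, for really scientific reasons). Note that an intake of exactly 30g matches neither rule, so decide which group it belongs to and use >= or <= accordingly.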

The following function should do it:

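def generate_new_column(row):
    # Label each row by its daily sugar intake in grams
    if row["sugar"] > 30:
        row["new column"] = "sweet_tooth"
    elif row["sugar"] < 30:
        row["new column"] = "healthy"
    # A value of exactly 30 matches neither branch and gets no label
    return row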

You can apply it row by row (axis=1 passes each row to the function) as follows:

df = df.apply(generate_new_column, axis=1)
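
Since the rule only looks at one column, the same recoding can also be written as a function of the value rather than the whole row. The sketch below is one way to do that, assuming pandas is installed; the sugar values are made up and the label_sugar name is hypothetical.

```python
import pandas as pd

# Made-up daily sugar intakes in grams, including the edge case of exactly 30
df = pd.DataFrame({"sugar": [12, 45, 30, 28, 61]})

def label_sugar(grams):
    """Label a single sugar intake value according to the Situation B rule."""
    if grams > 30:
        return "sweet_tooth"
    if grams < 30:
        return "healthy"
    return None  # exactly 30 g: the rule does not cover it, so leave it empty

# Apply to the column only, instead of passing whole rows with axis=1
df["new column"] = df["sugar"].apply(label_sugar)

print(df)
```

The row with exactly 30 g ends up with a missing label, which makes the gap in the rule visible instead of silently assigning it to a group.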

I have tried to keep the functions above as simple as possible, but you can make them as complex as your analysis needs by adding more conditions.
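
As a rough illustration of adding a condition, the sketch below splits sugar intake into three groups instead of two. The "moderate" label and the 10g cut-off are invented for this example and are not part of the rules above; the sketch assumes pandas is installed.

```python
import pandas as pd

# Made-up daily sugar intakes in grams
df = pd.DataFrame({"sugar": [5, 10, 25, 30, 31, 80]})

def recode_sugar(grams):
    """Recode sugar intake into three groups (cut-offs are hypothetical)."""
    if grams > 30:
        return "sweet_tooth"
    elif grams >= 10:
        return "moderate"  # invented middle band: 10 g up to and including 30 g
    else:
        return "healthy"

df["sugar_group"] = df["sugar"].apply(recode_sugar)

print(df)
```

Because the branches are checked in order and end with else, every value falls into exactly one group, including the boundary values 10 and 30.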

Written by Lilian Li