Splitting a column in your data frame is necessary when a single column contains the values of more than one variable. Data rarely arrives as tidy as we would like. When one column carries multiple features, splitting it is a must.
Suppose your data has columns whose headings encode more than one variable.
The column ‘m014’, for example, represents the number of males in the 0-14 age group.
The first step could be to melt the data. The objective is to have only two distinct columns for gender and age group. If you recall from the post on melting data, the ‘country’ and ‘year’ columns are kept by making them id_vars.
import pandas as pd
df_melt = pd.melt(df, id_vars=['country', 'year'])
The melt function converts columns like ‘m014’ into rows. A new ‘variable’ column holds the gender / age-group code, and a new ‘value’ column holds the number of people in that group.
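To make the reshaping concrete, here is a minimal sketch on a tiny made-up data frame. The country codes and counts are invented for illustration; only the ‘country’, ‘year’ and ‘m014’ style headings come from the example above.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    'country': ['AD', 'AE'],
    'year': [2000, 2000],
    'm014': [0, 2],
    'f014': [1, 3],
})

# Keep 'country' and 'year' fixed; every other column is turned into rows.
df_melt = pd.melt(df, id_vars=['country', 'year'])

# The result has four columns: country, year, variable, value.
print(df_melt)
```

Each original row now appears once per melted column, so two rows and two value columns give four rows.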
Use the pandas string accessor .str to slice the gender / age-group codes in the ‘variable’ column like so:
# Create the 'gender' column from the first character
df_melt['gender'] = df_melt.variable.str[0]
# Create the 'age_group' column from the remaining characters
df_melt['age_group'] = df_melt.variable.str[1:]
This creates two more columns, headed ‘gender’ and ‘age_group’. Print the head to see the results.
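Putting the steps together, the following sketch runs end to end on a small made-up data frame. The sample values and the ‘f1524’ heading are assumptions for illustration. The slicing relies on the first character of each code being the gender and the rest being the age group.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    'country': ['AD', 'AE'],
    'year': [2000, 2000],
    'm014': [0, 2],
    'f1524': [1, 3],
})

# Melt so each gender / age-group code becomes a row.
df_melt = pd.melt(df, id_vars=['country', 'year'])

# First character is the gender, the rest is the age group.
df_melt['gender'] = df_melt['variable'].str[0]
df_melt['age_group'] = df_melt['variable'].str[1:]

print(df_melt.head())
```

Positional slicing like this only works when every code has the same layout. If the gender prefix varied in length, a different splitting approach would be needed.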