Tag rows based on other columns values

I have a pandas dataframe:

street_name        eircode

Malborough Road    BLT12

123 Fake Road      NaN

My Street          NaN

I would like to create another column called unique based on these conditions:

If it has eircode, return 'yes' in the unique column, THEN

If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column

I came up with this solution where:

I changed the data types to string for both columns street_name and eircode

Get the first string using a lambda function

Defined a tagging function to be applied to the data frame

# change data types df['eircode'] = df['eircode'].astype('str') df['street_name'] = df['street_name'].astype('str')

# get the first string from street_name column df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

def tagging(x):

if x['eircode'] != 'nan':

    return 'yes'

elif x['first_str'].isdigit() == True:

    return 'yes'

else:

    return 'no'



df['unique'] = df.apply(tagging, axis=1)

The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

add a comment |

I have a pandas dataframe:

street_name        eircode

Malborough Road    BLT12

123 Fake Road      NaN

My Street          NaN

I would like to create another column called unique based on these conditions:

If it has eircode, return 'yes' in the unique column, THEN

If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column

I came up with this solution where:

I changed the data types to string for both columns street_name and eircode

Get the first string using a lambda function

Defined a tagging function to be applied to the data frame

# change data types df['eircode'] = df['eircode'].astype('str') df['street_name'] = df['street_name'].astype('str')

# get the first string from street_name column df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

def tagging(x):

if x['eircode'] != 'nan':

    return 'yes'

elif x['first_str'].isdigit() == True:

    return 'yes'

else:

    return 'no'



df['unique'] = df.apply(tagging, axis=1)

The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

add a comment |

I have a pandas dataframe:

street_name        eircode

Malborough Road    BLT12

123 Fake Road      NaN

My Street          NaN

I would like to create another column called unique based on these conditions:

If it has eircode, return 'yes' in the unique column, THEN

If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column

I came up with this solution where:

I changed the data types to string for both columns street_name and eircode

Get the first string using a lambda function

Defined a tagging function to be applied to the data frame

# change data types df['eircode'] = df['eircode'].astype('str') df['street_name'] = df['street_name'].astype('str')

# get the first string from street_name column df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

def tagging(x):

if x['eircode'] != 'nan':

    return 'yes'

elif x['first_str'].isdigit() == True:

    return 'yes'

else:

    return 'no'



df['unique'] = df.apply(tagging, axis=1)

The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

I have a pandas dataframe:

street_name        eircode

Malborough Road    BLT12

123 Fake Road      NaN

My Street          NaN

I would like to create another column called unique based on these conditions:

If it has eircode, return 'yes' in the unique column, THEN

If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column

I came up with this solution where:

I changed the data types to string for both columns street_name and eircode

Get the first string using a lambda function

Defined a tagging function to be applied to the data frame

# change data types df['eircode'] = df['eircode'].astype('str') df['street_name'] = df['street_name'].astype('str')

# get the first string from street_name column df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

def tagging(x):

if x['eircode'] != 'nan':

    return 'yes'

elif x['first_str'].isdigit() == True:

    return 'yes'

else:

    return 'no'



df['unique'] = df.apply(tagging, axis=1)

The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?

python string pandas

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

edited Nov 23 '18 at 16:37

jpp

92.8k2054103

asked Nov 21 '18 at 17:40

mahf_i

679

asked Nov 21 '18 at 17:40

mahf_i

679

asked Nov 21 '18 at 17:40

mahf_i

679

add a comment |

2 Answers
2

active

oldest

votes

You can supply those separate conditions using the | operator, then map the resulting Boolean array to yes and no. The first condition just looks wether eircode is null, and the second uses a regex to check that street_name starts with a digit:

df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})

>>> df

       street_name eircode unique

0  Malborough Road   BLT12    yes

1    123 Fake Road     NaN    yes

2        My Street     NaN     no

answered Nov 21 '18 at 17:45

sacul

29.9k41740

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

add a comment |

With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],

                   'eircode': ['BLT12', None, None]})



cond1 = df['eircode'].isnull()

cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()



df['unique'] = np.where(cond1 & cond2, 'no', 'yes')



print(df)



  eircode      street_name unique

0   BLT12  Malborough Road    yes

1    None    123 Fake Road    yes

2    None        My Street     no

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

1

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53417777%2ftag-rows-based-on-other-columns-values%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})

>>> df

       street_name eircode unique

0  Malborough Road   BLT12    yes

1    123 Fake Road     NaN    yes

2        My Street     NaN     no

answered Nov 21 '18 at 17:45

sacul

29.9k41740

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

add a comment |

df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})

>>> df

       street_name eircode unique

0  Malborough Road   BLT12    yes

1    123 Fake Road     NaN    yes

2        My Street     NaN     no

answered Nov 21 '18 at 17:45

sacul

29.9k41740

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

add a comment |

df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})

>>> df

       street_name eircode unique

0  Malborough Road   BLT12    yes

1    123 Fake Road     NaN    yes

2        My Street     NaN     no

answered Nov 21 '18 at 17:45

sacul

29.9k41740

df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})

>>> df

       street_name eircode unique

0  Malborough Road   BLT12    yes

1    123 Fake Road     NaN    yes

2        My Street     NaN     no

answered Nov 21 '18 at 17:45

sacul

29.9k41740

answered Nov 21 '18 at 17:45

sacul

29.9k41740

answered Nov 21 '18 at 17:45

sacul

29.9k41740

answered Nov 21 '18 at 17:45

sacul

29.9k41740

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

add a comment |

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40

add a comment |

With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],

                   'eircode': ['BLT12', None, None]})



cond1 = df['eircode'].isnull()

cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()



df['unique'] = np.where(cond1 & cond2, 'no', 'yes')



print(df)



  eircode      street_name unique

0   BLT12  Malborough Road    yes

1    None    123 Fake Road    yes

2    None        My Street     no

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

1

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

add a comment |

With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],

                   'eircode': ['BLT12', None, None]})



cond1 = df['eircode'].isnull()

cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()



df['unique'] = np.where(cond1 & cond2, 'no', 'yes')



print(df)



  eircode      street_name unique

0   BLT12  Malborough Road    yes

1    None    123 Fake Road    yes

2    None        My Street     no

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

1

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

add a comment |

With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],

                   'eircode': ['BLT12', None, None]})



cond1 = df['eircode'].isnull()

cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()



df['unique'] = np.where(cond1 & cond2, 'no', 'yes')



print(df)



  eircode      street_name unique

0   BLT12  Malborough Road    yes

1    None    123 Fake Road    yes

2    None        My Street     no

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

With Pandas, it's best to use column-wise calculations; apply with a custom function represents an inefficient, Python-level row-wise loop.

df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],

                   'eircode': ['BLT12', None, None]})



cond1 = df['eircode'].isnull()

cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()



df['unique'] = np.where(cond1 & cond2, 'no', 'yes')



print(df)



  eircode      street_name unique

0   BLT12  Malborough Road    yes

1    None    123 Fake Road    yes

2    None        My Street     no

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

answered Nov 21 '18 at 17:45

jpp

92.8k2054103

1

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

add a comment |

1

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

4DEiARF Rwjtx7WhUym,dMN EgAoCn 3mA,HI9yauIe08o1 aopLAHVtmzbgDmu2V22iB,b7aHgS2BgGZWQ9,7S12QuTz

搜尋此網誌

Tukukkk