Tag rows based on other columns values
I have a pandas dataframe:
street_name eircode
Malborough Road BLT12
123 Fake Road NaN
My Street NaN
I would like to create another column called unique based on these conditions:
- If it has eircode, return 'yes' in the unique column, THEN
- If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column
I came up with this solution where:
- I changed the data types to string for both columns street_name and eircode
- Get the first string using a lambda function
- Defined a tagging function to be applied to the data frame
# change data types
df['eircode'] = df['eircode'].astype('str')
df['street_name'] = df['street_name'].astype('str')
# get the first string from street_name column
df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])
def tagging(x):
if x['eircode'] != 'nan':
return 'yes'
elif x['first_str'].isdigit() == True:
return 'yes'
else:
return 'no'
df['unique'] = df.apply(tagging, axis=1)
The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?
python string pandas
add a comment |
I have a pandas dataframe:
street_name eircode
Malborough Road BLT12
123 Fake Road NaN
My Street NaN
I would like to create another column called unique based on these conditions:
- If it has eircode, return 'yes' in the unique column, THEN
- If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column
I came up with this solution where:
- I changed the data types to string for both columns street_name and eircode
- Get the first string using a lambda function
- Defined a tagging function to be applied to the data frame
# change data types
df['eircode'] = df['eircode'].astype('str')
df['street_name'] = df['street_name'].astype('str')
# get the first string from street_name column
df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])
def tagging(x):
if x['eircode'] != 'nan':
return 'yes'
elif x['first_str'].isdigit() == True:
return 'yes'
else:
return 'no'
df['unique'] = df.apply(tagging, axis=1)
The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?
python string pandas
add a comment |
I have a pandas dataframe:
street_name eircode
Malborough Road BLT12
123 Fake Road NaN
My Street NaN
I would like to create another column called unique based on these conditions:
- If it has eircode, return 'yes' in the unique column, THEN
- If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column
I came up with this solution where:
- I changed the data types to string for both columns street_name and eircode
- Get the first string using a lambda function
- Defined a tagging function to be applied to the data frame
# change data types
df['eircode'] = df['eircode'].astype('str')
df['street_name'] = df['street_name'].astype('str')
# get the first string from street_name column
df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])
def tagging(x):
if x['eircode'] != 'nan':
return 'yes'
elif x['first_str'].isdigit() == True:
return 'yes'
else:
return 'no'
df['unique'] = df.apply(tagging, axis=1)
The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?
python string pandas
I have a pandas dataframe:
street_name eircode
Malborough Road BLT12
123 Fake Road NaN
My Street NaN
I would like to create another column called unique based on these conditions:
- If it has eircode, return 'yes' in the unique column, THEN
- If it doesn't have an eircode, check the first string in the street_name:
- if the first string is a digit, return 'yes' in the unique column
- if it is not, return 'no' in the unique column
I came up with this solution where:
- I changed the data types to string for both columns street_name and eircode
- Get the first string using a lambda function
- Defined a tagging function to be applied to the data frame
# change data types
df['eircode'] = df['eircode'].astype('str')
df['street_name'] = df['street_name'].astype('str')
# get the first string from street_name column
df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])
def tagging(x):
if x['eircode'] != 'nan':
return 'yes'
elif x['first_str'].isdigit() == True:
return 'yes'
else:
return 'no'
df['unique'] = df.apply(tagging, axis=1)
The issue with this is that I have to change the data type and then have to make separate column. Is there a more elegant way or a more concise way to achieve the same result?
python string pandas
python string pandas
edited Nov 23 '18 at 16:37
jpp
92.8k2054103
92.8k2054103
asked Nov 21 '18 at 17:40
mahf_i
679
679
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
You can supply those separate conditions using the |
operator, then map the resulting Boolean array to yes
and no
. The first condition just looks wether eircode
is null, and the second uses a regex to check that street_name
starts with a digit:
df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})
>>> df
street_name eircode unique
0 Malborough Road BLT12 yes
1 123 Fake Road NaN yes
2 My Street NaN no
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
add a comment |
With Pandas, it's best to use column-wise calculations; apply
with a custom function represents an inefficient, Python-level row-wise loop.
df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
'eircode': ['BLT12', None, None]})
cond1 = df['eircode'].isnull()
cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()
df['unique'] = np.where(cond1 & cond2, 'no', 'yes')
print(df)
eircode street_name unique
0 BLT12 Malborough Road yes
1 None 123 Fake Road yes
2 None My Street no
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53417777%2ftag-rows-based-on-other-columns-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can supply those separate conditions using the |
operator, then map the resulting Boolean array to yes
and no
. The first condition just looks wether eircode
is null, and the second uses a regex to check that street_name
starts with a digit:
df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})
>>> df
street_name eircode unique
0 Malborough Road BLT12 yes
1 123 Fake Road NaN yes
2 My Street NaN no
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
add a comment |
You can supply those separate conditions using the |
operator, then map the resulting Boolean array to yes
and no
. The first condition just looks wether eircode
is null, and the second uses a regex to check that street_name
starts with a digit:
df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})
>>> df
street_name eircode unique
0 Malborough Road BLT12 yes
1 123 Fake Road NaN yes
2 My Street NaN no
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
add a comment |
You can supply those separate conditions using the |
operator, then map the resulting Boolean array to yes
and no
. The first condition just looks wether eircode
is null, and the second uses a regex to check that street_name
starts with a digit:
df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})
>>> df
street_name eircode unique
0 Malborough Road BLT12 yes
1 123 Fake Road NaN yes
2 My Street NaN no
You can supply those separate conditions using the |
operator, then map the resulting Boolean array to yes
and no
. The first condition just looks wether eircode
is null, and the second uses a regex to check that street_name
starts with a digit:
df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True:'yes',False:'no'})
>>> df
street_name eircode unique
0 Malborough Road BLT12 yes
1 123 Fake Road NaN yes
2 My Street NaN no
answered Nov 21 '18 at 17:45
sacul
29.9k41740
29.9k41740
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
add a comment |
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
Accepted this as an answer because the use of regex will evaluate any numbers and won't throw an error even if the number is a float type.
– mahf_i
Nov 23 '18 at 16:40
add a comment |
With Pandas, it's best to use column-wise calculations; apply
with a custom function represents an inefficient, Python-level row-wise loop.
df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
'eircode': ['BLT12', None, None]})
cond1 = df['eircode'].isnull()
cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()
df['unique'] = np.where(cond1 & cond2, 'no', 'yes')
print(df)
eircode street_name unique
0 BLT12 Malborough Road yes
1 None 123 Fake Road yes
2 None My Street no
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
add a comment |
With Pandas, it's best to use column-wise calculations; apply
with a custom function represents an inefficient, Python-level row-wise loop.
df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
'eircode': ['BLT12', None, None]})
cond1 = df['eircode'].isnull()
cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()
df['unique'] = np.where(cond1 & cond2, 'no', 'yes')
print(df)
eircode street_name unique
0 BLT12 Malborough Road yes
1 None 123 Fake Road yes
2 None My Street no
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
add a comment |
With Pandas, it's best to use column-wise calculations; apply
with a custom function represents an inefficient, Python-level row-wise loop.
df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
'eircode': ['BLT12', None, None]})
cond1 = df['eircode'].isnull()
cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()
df['unique'] = np.where(cond1 & cond2, 'no', 'yes')
print(df)
eircode street_name unique
0 BLT12 Malborough Road yes
1 None 123 Fake Road yes
2 None My Street no
With Pandas, it's best to use column-wise calculations; apply
with a custom function represents an inefficient, Python-level row-wise loop.
df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
'eircode': ['BLT12', None, None]})
cond1 = df['eircode'].isnull()
cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()
df['unique'] = np.where(cond1 & cond2, 'no', 'yes')
print(df)
eircode street_name unique
0 BLT12 Malborough Road yes
1 None 123 Fake Road yes
2 None My Street no
answered Nov 21 '18 at 17:45
jpp
92.8k2054103
92.8k2054103
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
add a comment |
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
1
1
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
code for cond2 is excellent as I was trying to do the same (Selecting the first word after the split) but did not know the correct way to do it. great use of np.where which I've never used before. one issue I encountered using this solution is that when the number extracted is a float it will return an error.
– mahf_i
Nov 23 '18 at 16:35
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53417777%2ftag-rows-based-on-other-columns-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown