pandas - filter on groups which have at least one column containing non-null values in a groupby
I have the following python pandas dataframe:
df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})
>>> print(df)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
5 3 FALSE def 456
I want to end up with the following dataframe:
>>> print(dfout)
Id A B C
0 1 TRUE abc NaN
The same Id value can appear on multiple rows. Each Id will either have the value TRUE or FALSE in column A consistently on all its rows. Columns B and C can have any value, including NaN.
I want one row in dfout for each Id that has A=TRUE and show the max value seen in columns B and C. But if the only values seen in columns B and C = NaN for all of an Id's rows, then that Id is to be excluded from dfout.
- Id 1 has
A=TRUE
, and hasB=abc
in its third row, so it meets
the requirements. - Id 2 has
A=TRUE
, but columns B and C areNaN
for
both its rows, so it does not. - Id 3 has
A=FALSE
, so it does not
meet requirements.
I created a groupby
df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN
for all rows in columns B and C.
grouped = df.groupby(['Id'])
mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
Then I tried several things along the lines of:
df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull
But getting errors, like:
" TypeError: unhashable type: 'list' ".
Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by
python pandas filter
add a comment |
I have the following python pandas dataframe:
df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})
>>> print(df)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
5 3 FALSE def 456
I want to end up with the following dataframe:
>>> print(dfout)
Id A B C
0 1 TRUE abc NaN
The same Id value can appear on multiple rows. Each Id will either have the value TRUE or FALSE in column A consistently on all its rows. Columns B and C can have any value, including NaN.
I want one row in dfout for each Id that has A=TRUE and show the max value seen in columns B and C. But if the only values seen in columns B and C = NaN for all of an Id's rows, then that Id is to be excluded from dfout.
- Id 1 has
A=TRUE
, and hasB=abc
in its third row, so it meets
the requirements. - Id 2 has
A=TRUE
, but columns B and C areNaN
for
both its rows, so it does not. - Id 3 has
A=FALSE
, so it does not
meet requirements.
I created a groupby
df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN
for all rows in columns B and C.
grouped = df.groupby(['Id'])
mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
Then I tried several things along the lines of:
df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull
But getting errors, like:
" TypeError: unhashable type: 'list' ".
Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by
python pandas filter
add a comment |
I have the following python pandas dataframe:
df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})
>>> print(df)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
5 3 FALSE def 456
I want to end up with the following dataframe:
>>> print(dfout)
Id A B C
0 1 TRUE abc NaN
The same Id value can appear on multiple rows. Each Id will either have the value TRUE or FALSE in column A consistently on all its rows. Columns B and C can have any value, including NaN.
I want one row in dfout for each Id that has A=TRUE and show the max value seen in columns B and C. But if the only values seen in columns B and C = NaN for all of an Id's rows, then that Id is to be excluded from dfout.
- Id 1 has
A=TRUE
, and hasB=abc
in its third row, so it meets
the requirements. - Id 2 has
A=TRUE
, but columns B and C areNaN
for
both its rows, so it does not. - Id 3 has
A=FALSE
, so it does not
meet requirements.
I created a groupby
df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN
for all rows in columns B and C.
grouped = df.groupby(['Id'])
mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
Then I tried several things along the lines of:
df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull
But getting errors, like:
" TypeError: unhashable type: 'list' ".
Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by
python pandas filter
I have the following python pandas dataframe:
df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})
>>> print(df)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
5 3 FALSE def 456
I want to end up with the following dataframe:
>>> print(dfout)
Id A B C
0 1 TRUE abc NaN
The same Id value can appear on multiple rows. Each Id will either have the value TRUE or FALSE in column A consistently on all its rows. Columns B and C can have any value, including NaN.
I want one row in dfout for each Id that has A=TRUE and show the max value seen in columns B and C. But if the only values seen in columns B and C = NaN for all of an Id's rows, then that Id is to be excluded from dfout.
- Id 1 has
A=TRUE
, and hasB=abc
in its third row, so it meets
the requirements. - Id 2 has
A=TRUE
, but columns B and C areNaN
for
both its rows, so it does not. - Id 3 has
A=FALSE
, so it does not
meet requirements.
I created a groupby
df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN
for all rows in columns B and C.
grouped = df.groupby(['Id'])
mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
Id A B C
0 1 TRUE NaN NaN
1 1 TRUE NaN NaN
2 1 TRUE abc NaN
3 2 TRUE NaN NaN
4 2 TRUE NaN NaN
Then I tried several things along the lines of:
df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull
But getting errors, like:
" TypeError: unhashable type: 'list' ".
Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by
python pandas filter
python pandas filter
edited Dec 7 '18 at 1:33
BilboC
asked Nov 22 '18 at 20:25
BilboCBilboC
247
247
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
The solution has three parts to it.
- Filter dataframe to keep rows where column A is True
- Groupby Id and use first which will return first not null value
Use dropna on the resulting dataframe on columns B and C and how = all
df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
Id A B C
0 1 True abc NaN
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437636%2fpandas-filter-on-groups-which-have-at-least-one-column-containing-non-null-val%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The solution has three parts to it.
- Filter dataframe to keep rows where column A is True
- Groupby Id and use first which will return first not null value
Use dropna on the resulting dataframe on columns B and C and how = all
df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
Id A B C
0 1 True abc NaN
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
add a comment |
The solution has three parts to it.
- Filter dataframe to keep rows where column A is True
- Groupby Id and use first which will return first not null value
Use dropna on the resulting dataframe on columns B and C and how = all
df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
Id A B C
0 1 True abc NaN
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
add a comment |
The solution has three parts to it.
- Filter dataframe to keep rows where column A is True
- Groupby Id and use first which will return first not null value
Use dropna on the resulting dataframe on columns B and C and how = all
df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
Id A B C
0 1 True abc NaN
The solution has three parts to it.
- Filter dataframe to keep rows where column A is True
- Groupby Id and use first which will return first not null value
Use dropna on the resulting dataframe on columns B and C and how = all
df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
Id A B C
0 1 True abc NaN
answered Nov 22 '18 at 20:42
VaishaliVaishali
19.2k41030
19.2k41030
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
add a comment |
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
1
1
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example
– BilboC
Nov 22 '18 at 21:01
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437636%2fpandas-filter-on-groups-which-have-at-least-one-column-containing-non-null-val%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown