pandas - filter on groups which have at least one column containing non-null values in a groupby

I have the following python pandas dataframe:

df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})



>>> print(df)

  Id      A    B    C

0  1   TRUE  NaN  NaN

1  1   TRUE  NaN  NaN

2  1   TRUE  abc  NaN

3  2   TRUE  NaN  NaN

4  2   TRUE  NaN  NaN

5  3  FALSE  def  456

I want to end up with the following dataframe:

>>> print(dfout)

  Id     A    B   C

0  1  TRUE  abc NaN

The same Id value can appear on multiple rows. Each Id will either have the value TRUE or FALSE in column A consistently on all its rows. Columns B and C can have any value, including NaN.

I want one row in dfout for each Id that has A=TRUE and show the max value seen in columns B and C. But if the only values seen in columns B and C = NaN for all of an Id's rows, then that Id is to be excluded from dfout.

Id 1 has A=TRUE, and has B=abc in its third row, so it meets
the requirements.

Id 2 has A=TRUE, but columns B and C are NaN for
both its rows, so it does not.

Id 3 has A=FALSE, so it does not
meet requirements.

I created a groupby df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN for all rows in columns B and C.

grouped = df.groupby(['Id'])

mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)

df.loc[mask].reset_index(drop=True)



  Id     A    B    C

0  1  TRUE  NaN  NaN

1  1  TRUE  NaN  NaN

2  1  TRUE  abc  NaN

3  2  TRUE  NaN  NaN

4  2  TRUE  NaN  NaN

Then I tried several things along the lines of:

df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull

But getting errors, like:

" TypeError: unhashable type: 'list' ".

Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

add a comment |

I have the following python pandas dataframe:

df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})



>>> print(df)

  Id      A    B    C

0  1   TRUE  NaN  NaN

1  1   TRUE  NaN  NaN

2  1   TRUE  abc  NaN

3  2   TRUE  NaN  NaN

4  2   TRUE  NaN  NaN

5  3  FALSE  def  456

I want to end up with the following dataframe:

>>> print(dfout)

  Id     A    B   C

0  1  TRUE  abc NaN

Id 1 has A=TRUE, and has B=abc in its third row, so it meets
the requirements.

Id 2 has A=TRUE, but columns B and C are NaN for
both its rows, so it does not.

Id 3 has A=FALSE, so it does not
meet requirements.

I created a groupby df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN for all rows in columns B and C.

grouped = df.groupby(['Id'])

mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)

df.loc[mask].reset_index(drop=True)



  Id     A    B    C

0  1  TRUE  NaN  NaN

1  1  TRUE  NaN  NaN

2  1  TRUE  abc  NaN

3  2  TRUE  NaN  NaN

4  2  TRUE  NaN  NaN

Then I tried several things along the lines of:

df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull

But getting errors, like:

" TypeError: unhashable type: 'list' ".

Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

add a comment |

I have the following python pandas dataframe:

df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})



>>> print(df)

  Id      A    B    C

0  1   TRUE  NaN  NaN

1  1   TRUE  NaN  NaN

2  1   TRUE  abc  NaN

3  2   TRUE  NaN  NaN

4  2   TRUE  NaN  NaN

5  3  FALSE  def  456

I want to end up with the following dataframe:

>>> print(dfout)

  Id     A    B   C

0  1  TRUE  abc NaN

Id 1 has A=TRUE, and has B=abc in its third row, so it meets
the requirements.

Id 2 has A=TRUE, but columns B and C are NaN for
both its rows, so it does not.

Id 3 has A=FALSE, so it does not
meet requirements.

I created a groupby df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN for all rows in columns B and C.

grouped = df.groupby(['Id'])

mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)

df.loc[mask].reset_index(drop=True)



  Id     A    B    C

0  1  TRUE  NaN  NaN

1  1  TRUE  NaN  NaN

2  1  TRUE  abc  NaN

3  2  TRUE  NaN  NaN

4  2  TRUE  NaN  NaN

Then I tried several things along the lines of:

df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull

But getting errors, like:

" TypeError: unhashable type: 'list' ".

Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

I have the following python pandas dataframe:

df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'], 'B': [np.nan, np.nan, 'abc', np.nan, np.nan, 'def'],'C': [np.nan, np.nan, np.nan, np.nan, np.nan, '456']})



>>> print(df)

  Id      A    B    C

0  1   TRUE  NaN  NaN

1  1   TRUE  NaN  NaN

2  1   TRUE  abc  NaN

3  2   TRUE  NaN  NaN

4  2   TRUE  NaN  NaN

5  3  FALSE  def  456

I want to end up with the following dataframe:

>>> print(dfout)

  Id     A    B   C

0  1  TRUE  abc NaN

Id 1 has A=TRUE, and has B=abc in its third row, so it meets
the requirements.

Id 2 has A=TRUE, but columns B and C are NaN for
both its rows, so it does not.

Id 3 has A=FALSE, so it does not
meet requirements.

I created a groupby df on Id, then applied a mask to only include rows with A=TRUE. But having trouble understanding how to remove the rows with NaN for all rows in columns B and C.

grouped = df.groupby(['Id'])

mask = grouped['A'].transform(lambda x: 'TRUE' == x.max()).astype(bool)

df.loc[mask].reset_index(drop=True)



  Id     A    B    C

0  1  TRUE  NaN  NaN

1  1  TRUE  NaN  NaN

2  1  TRUE  abc  NaN

3  2  TRUE  NaN  NaN

4  2  TRUE  NaN  NaN

Then I tried several things along the lines of:

df.loc[mask].reset_index(drop=True).all(['B'],['C']).isnull

But getting errors, like:

" TypeError: unhashable type: 'list' ".

Using python 3.6, pandas 0.23.0; Looked here for help: keep dataframe rows meeting a condition into each group of the same dataframe grouped by

python pandas filter

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

edited Dec 7 '18 at 1:33

asked Nov 22 '18 at 20:25

BilboC

247

asked Nov 22 '18 at 20:25

BilboC

247

asked Nov 22 '18 at 20:25

BilboC

247

add a comment |

1 Answer
1

active

oldest

votes

The solution has three parts to it.

Filter dataframe to keep rows where column A is True

Groupby Id and use first which will return first not null value

Use dropna on the resulting dataframe on columns B and C and how = all

df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
```
    Id  A       B   C

0   1   True    abc NaN
```

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

1

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437636%2fpandas-filter-on-groups-which-have-at-least-one-column-containing-non-null-val%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The solution has three parts to it.

Filter dataframe to keep rows where column A is True

Groupby Id and use first which will return first not null value

Use dropna on the resulting dataframe on columns B and C and how = all

df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
```
    Id  A       B   C

0   1   True    abc NaN
```

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

1

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

add a comment |

The solution has three parts to it.

Filter dataframe to keep rows where column A is True

Groupby Id and use first which will return first not null value

Use dropna on the resulting dataframe on columns B and C and how = all

df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
```
    Id  A       B   C

0   1   True    abc NaN
```

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

1

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

add a comment |

The solution has three parts to it.

Filter dataframe to keep rows where column A is True

Groupby Id and use first which will return first not null value

Use dropna on the resulting dataframe on columns B and C and how = all

df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
```
    Id  A       B   C

0   1   True    abc NaN
```

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

The solution has three parts to it.

Filter dataframe to keep rows where column A is True

Groupby Id and use first which will return first not null value

Use dropna on the resulting dataframe on columns B and C and how = all

df.loc[df['A'] == True].groupby('Id', as_index = False).first().dropna(subset = ['B', 'C'], how = 'all')
```
    Id  A       B   C

0   1   True    abc NaN
```

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

answered Nov 22 '18 at 20:42

Vaishali

19.2k41030

1

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

add a comment |

1

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

perfect - very clean; i modified the bit to df['A'] == 'TRUE' for my example

– BilboC
Nov 22 '18 at 21:01

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk