Use issubset to compare set values between two pandas dataframe columns
I have a pandas DataFrame with two columns whose values are meant to be sets (they are built from lists in the example below). I want to check, row by row, whether the values in one column are a subset of the values in the other column. I thought the code below would work, but it seems you cannot call .issubset() on a Series of sets.
Ex:
import pandas as pd
data = [[['one','orange','green'],['one','orange']],[['milk','honey'],['Clarke', 'honey']]]
df = pd.DataFrame(data, columns=['Column_1','Column_2'])
# The attempt that fails: a Series has no .issubset() method, so this raises AttributeError
Are_all_column_2_values_valid = df.loc[:, 'Column_2'].apply(set).issubset(df.loc[:, 'Column_1'])
desired_output = pd.Series([True,False])
All values in both sets will be strings.
Any help would be greatly appreciated!
python python-3.x pandas dataframe set
edited Nov 20 at 0:16
jpp
86.2k194898
asked Nov 19 at 23:36
S M
212
2 Answers
First ensure you actually have series of sets:
df = df.apply(lambda x: x.apply(set))
Then use the syntactic sugar <= for set.issubset:
print(df['Column_2'] <= df['Column_1'])
0 True
1 False
dtype: bool
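As a quick sanity check of the claim above, here is a minimal sketch using plain Python sets taken from the question's sample data. It only illustrates how `<=` relates to `set.issubset`; the variable names `a`, `b`, `list_ok` and `operator_accepts_list` are made up for this example. One difference worth knowing: the method accepts any iterable, while the operator needs a set on both sides, which is why the columns are converted first.

```python
import pandas as pd

data = [[['one', 'orange', 'green'], ['one', 'orange']],
        [['milk', 'honey'], ['Clarke', 'honey']]]
df = pd.DataFrame(data, columns=['Column_1', 'Column_2'])

# Row 0 of each column, converted to real sets
a = set(df.loc[0, 'Column_2'])
b = set(df.loc[0, 'Column_1'])

# For two sets, the operator and the method give the same answer
same_answer = (a <= b) == a.issubset(b)

# The method form also accepts a plain list as its argument
list_ok = a.issubset(df.loc[0, 'Column_1'])

# The operator form does not: set <= list raises TypeError
try:
    a <= df.loc[0, 'Column_1']
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
```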
Interesting solution. I am trying to apply set to two columns of a larger pandas DataFrame. I tried df['Column_1'] = df['Column_1'].apply(lambda x: x.apply(set)) but got the error "AttributeError: 'list' object has no attribute 'apply'". Do you know how to fix this?
– S M
Nov 20 at 18:24
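A possible fix for the error in the comment above, as a sketch rather than the answerer's own reply: `Series.apply` already hands each cell (here a list) to the function, so the inner `x.apply(set)` is being called on a list. Passing `set` directly to `Series.apply` should be enough. The extra column `other` and the name `set_columns` are hypothetical, added only to mimic "a larger DataFrame".

```python
import pandas as pd

data = [[['one', 'orange', 'green'], ['one', 'orange']],
        [['milk', 'honey'], ['Clarke', 'honey']]]
df = pd.DataFrame(data, columns=['Column_1', 'Column_2'])
df['other'] = [1, 2]  # hypothetical unrelated column that must stay untouched

# Convert only the columns that hold lists; each cell is passed to set()
set_columns = ['Column_1', 'Column_2']
for col in set_columns:
    df[col] = df[col].apply(set)

# Row-wise subset test on the converted columns, keeping the original index
result = pd.Series(
    [a <= b for a, b in zip(df['Column_2'], df['Column_1'])],
    index=df.index,
)
```

Looping over an explicit list of column names leaves every other column of the DataFrame as it was.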
You can use a list comprehension like this:
>>> [set(v).issubset(i) for v, i in zip(df.Column_2, df.Column_1)]
[True, False]
Or as a Series:
>>> pd.Series(set(v).issubset(i) for v, i in zip(df.Column_2, df.Column_1))
0 True
1 False
dtype: bool
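If the result needs to line up with the original rows (for example to use it as a boolean mask), a variant of the comprehension above can reuse the DataFrame's index. This is a sketch under assumptions: the helper name `is_subset_rowwise` and the index labels `'x'` and `'y'` are invented for illustration and are not part of the question.

```python
import pandas as pd


def is_subset_rowwise(df, inner, outer):
    """Return a boolean Series: is each row's `inner` value a subset of `outer`?"""
    return pd.Series(
        [set(a).issubset(b) for a, b in zip(df[inner], df[outer])],
        index=df.index,  # keep the original row labels
        dtype=bool,
    )


data = [[['one', 'orange', 'green'], ['one', 'orange']],
        [['milk', 'honey'], ['Clarke', 'honey']]]
# Hypothetical non-default index, to show that the labels are preserved
df = pd.DataFrame(data, columns=['Column_1', 'Column_2'], index=['x', 'y'])

mask = is_subset_rowwise(df, 'Column_2', 'Column_1')
invalid_rows = df[~mask]      # rows where Column_2 has values missing from Column_1
all_valid = bool(mask.all())  # single yes/no answer for the whole column
```

Because `set(a).issubset(b)` accepts any iterable for `b`, this works whether the cells hold lists or sets.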
edited Nov 20 at 0:28
answered Nov 20 at 0:15
jpp
86.2k194898
edited Nov 20 at 14:50
answered Nov 19 at 23:46
sacul
28.5k41639