Multidimensional grouper for a groupby
How could I use a multidimensional Grouper, in this case another dataframe, as a Grouper for another dataframe? Can it be done in one step?
My question is essentially about how to perform the actual grouping under these circumstances; to make it more specific, say I then want to transform and take the sum.
Consider for example:
import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8]})
print(df1)
a b
0 1 5
1 2 6
2 3 7
3 4 8
df2 = pd.DataFrame({'a':['A','B','A','B'], 'b':['A','A','B','B']})
print(df2)
a b
0 A A
1 B A
2 A B
3 B B
Then, the expected output would be:
a b
0 4 11
1 6 11
2 4 15
3 6 15
Where columns a
and b
in df1
have been grouped by columns a
and b
from df2
respectively.
python pandas
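For reference, a straightforward two-step version (one way to read the requirement: group each column separately, then broadcast the group sums back) produces the expected output; the question asks whether this can be condensed into a single step:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df2 = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['A', 'A', 'B', 'B']})

# Group each column of df1 by the matching column of labels in df2,
# then broadcast each group's sum back to the original row positions.
out = df1.copy()
for col in df1.columns:
    out[col] = df1[col].groupby(df2[col]).transform('sum')
print(out)
#    a   b
# 0  4  11
# 1  6  11
# 2  4  15
# 3  6  15
```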
can you elaborate on the desired output? not clear what the rule is
– Yuca
2 hours ago
Sure, added a brief explanation. Let me know if still not clear
– yatu
2 hours ago
What do you group by? Your output has the same number of rows and columns as the input.
– Zoe
2 hours ago
So you are grouping rows 1 and 3 in df1 because rows 1 and 3 are grouped in df2, correct?
– Yuca
2 hours ago
Yes, that is correct. The resulting df has the same shape as df1, with the sum of the grouped values
– yatu
2 hours ago
asked 2 hours ago by yatu (edited 2 hours ago)
4 Answers
Try using apply to apply a lambda function to each column of your dataframe, then use the name of that pd.Series to group by the second dataframe:
df1.apply(lambda x: x.groupby(df2[x.name]).transform('sum'))
Output:
a b
0 4 11
1 6 11
2 4 15
3 6 15
answered 2 hours ago by Scott Boston
Nice! Guessing it can't be done directly using groupby rather than applying along columns, right? Nice alternative in any case
– yatu
2 hours ago
No, I don't think you can apply two different groupings to a dataframe based on a column.
– Scott Boston
1 hour ago
Ok, thanks. Will leave for some time see if I get any other answers, otherwise will accept
– yatu
1 hour ago
Using stack and unstack:
(df1.stack()
    .groupby([df2.stack().index.get_level_values(level=1), df2.stack()])
    .transform('sum')
    .unstack())
Out[291]:
a b
0 4 11
1 6 11
2 4 15
3 6 15
answered 1 hour ago by W-B
Thanks @W-B interesting approach!!
– yatu
55 mins ago
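To unpack the one-liner: stacking turns each frame into a long Series indexed by (row, column), so both frames align element by element. A sketch of the intermediate steps (the variable names are mine):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df2 = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['A', 'A', 'B', 'B']})

s1 = df1.stack()   # values, MultiIndex of (row, column)
s2 = df2.stack()   # labels, same MultiIndex

# Group by (column name, group label) so each column is grouped
# independently by its own labels, then pivot back to wide form.
out = (s1.groupby([s2.index.get_level_values(1), s2])
         .transform('sum')
         .unstack())
print(out)
```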
You will have to group each column individually since each column uses a different grouping scheme.
If you want a cleaner version, I would recommend a list comprehension over the column names, and call pd.concat on the resultant series:
pd.concat([df1[c].groupby(df2[c]).transform('sum') for c in df1.columns], axis=1)
a b
0 4 11
1 6 11
2 4 15
3 6 15
Not to say there's anything wrong with using apply as in the other answer, just that I don't like apply, so this is my suggestion :-)
Here are some timeits for your perusal. Just for your sample data, you will notice the difference in timings is obvious.
%%timeit
(df1.stack()
.groupby([df2.stack().index.get_level_values(level=1), df2.stack()])
.transform('sum').unstack())
%%timeit
df1.apply(lambda x: x.groupby(df2[x.name]).transform('sum'))
%%timeit
pd.concat([df1[c].groupby(df2[c]).transform('sum') for c in df1.columns], axis=1)
8.99 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.35 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.13 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Not to say apply is slow, but explicit iteration in this case is faster. Additionally, the second and third timed solutions will scale better with length versus breadth, since the number of iterations depends on the number of columns.
answered 1 hour ago by coldspeed (edited 1 hour ago)
Yep, getting rid of that apply and using a list comprehension here makes a lot of sense.
– Scott Boston
1 hour ago
@ScottBoston I have already upvoted your answer for its simplicity B)
– coldspeed
1 hour ago
Thanks!! Yes, using a list comprehension with pd.concat was what I had in mind; I was curious whether looping could be avoided. Nice to see other alternatives here too, though. And thanks for the timeits :)
– yatu
59 mins ago
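The scaling argument above can be sanity-checked on a larger frame; the sizes and label sets below are arbitrary:

```python
import numpy as np
import pandas as pd

# Arbitrary larger test case: many rows, still only two columns, so the
# list-comprehension approach runs just two groupby operations in total.
n = 100_000
rng = np.random.default_rng(0)
df1 = pd.DataFrame({'a': rng.integers(0, 100, n),
                    'b': rng.integers(0, 100, n)})
df2 = pd.DataFrame({'a': rng.choice(list('ABC'), n),
                    'b': rng.choice(list('ABC'), n)})

res = pd.concat([df1[c].groupby(df2[c]).transform('sum') for c in df1.columns],
                axis=1)
assert res.shape == df1.shape  # same shape as the input, as required
# Every row of a group carries that group's total, so within one group
# of labels the transformed column holds a single value:
assert res.loc[df2['a'] == 'A', 'a'].nunique() == 1
```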
You could do something like the following:
res = (df1.assign(a=lambda df: df['a'].groupby(df2['a']).transform('sum'))
          .assign(b=lambda df: df['b'].groupby(df2['b']).transform('sum')))
Results:
a b
0 4 11
1 6 11
2 4 15
3 6 15
answered 1 hour ago by PMende