Multidimensional grouper for a groupby
How could I use a multidimensional Grouper, in this case another dataframe, as a Grouper for another dataframe? Can it be done in one step?



My question is essentially regarding how to perform an actual grouping under these circumstances, but to make it more specific, say I want to then transform and take the sum.



Consider for example:



df1 = pd.DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8]})

print(df1)
   a  b
0  1  5
1  2  6
2  3  7
3  4  8

df2 = pd.DataFrame({'a':['A','B','A','B'], 'b':['A','A','B','B']})

print(df2)
   a  b
0  A  A
1  B  A
2  A  B
3  B  B


Then, the expected output would be:



   a   b
0  4  11
1  6  11
2  4  15
3  6  15


Where columns a and b in df1 have been grouped by columns a and b from df2 respectively.










python pandas

– asked by yatu
  • can you elaborate on the desired output? not clear what the rule is

    – Yuca
    2 hours ago











  • Sure, added a brief explanation. Let me know if still not clear

    – yatu
    2 hours ago











  • What do you group by? Your output has the same number of rows and columns as the input.

    – Zoe
    2 hours ago











  • So you are grouping rows 1 and 3 in df1 because rows 1 and 3 are grouped in df2, correct?

    – Yuca
    2 hours ago











  • Yes that is correct. The resulting df has the same shape as df1, with the sum of the grouped values

    – yatu
    2 hours ago
















4 Answers

Try using apply to run a lambda function on each column of your dataframe; the name of that pd.Series then selects the matching column of the second dataframe to group by:



df1.apply(lambda x: x.groupby(df2[x.name]).transform('sum'))


Output:



   a   b
0  4  11
1  6  11
2  4  15
3  6  15
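The same pattern accepts any groupby transform; for instance, swapping 'sum' for 'max' (a minimal sketch, re-creating the question's df1/df2; `res` is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df2 = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['A', 'A', 'B', 'B']})

# x.name is the column label, so each column of df1 is grouped
# by the same-named column of df2
res = df1.apply(lambda x: x.groupby(df2[x.name]).transform('max'))
print(res)
```

which yields the per-group maxima while keeping the original shape.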





– answered by Scott Boston

  • Nice! Guessing it can't be done directly using groupby rather than applying along columns, right? Nice alternative in any case

    – yatu
    2 hours ago











  • No, I don't think you can apply two different groupings to a dataframe based on a column.

    – Scott Boston
    1 hour ago






  • Ok, thanks. Will leave it for some time to see if I get any other answers; otherwise will accept

    – yatu
    1 hour ago
Using stack and unstack:



df1.stack().groupby([df2.stack().index.get_level_values(level=1), df2.stack()]).transform('sum').unstack()
Out[291]:
   a   b
0  4  11
1  6  11
2  4  15
3  6  15
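To see why this works, it helps to look at the stacked intermediates (a sketch, re-creating the question's frames; `s1`, `s2`, `cols`, `res` are illustrative names): df1.stack() and df2.stack() are Series indexed by (row, column), so grouping by the column level together with the stacked df2 labels pools exactly the cells that share both a column and a label:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df2 = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['A', 'A', 'B', 'B']})

s1, s2 = df1.stack(), df2.stack()          # Series with a (row, column) MultiIndex
cols = s2.index.get_level_values(level=1)  # the column label of every stacked cell
res = s1.groupby([cols, s2]).transform('sum').unstack()
print(res)
```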





– answered by W-B

  • Thanks @W-B interesting approach!!

    – yatu
    55 mins ago
You will have to group each column individually since each column uses a different grouping scheme.



If you want a cleaner version, I would recommend a list comprehension over the column names, calling pd.concat on the resultant series:



pd.concat([df1[c].groupby(df2[c]).transform('sum') for c in df1.columns], axis=1)

   a   b
0  4  11
1  6  11
2  4  15
3  6  15


Not to say there's anything wrong with using apply as in the other answer, just that I don't like apply, so this is my suggestion :-)





Here are some timeits for your perusal. Even on just your sample data, the difference in timings is noticeable.



%%timeit 
(df1.stack()
.groupby([df2.stack().index.get_level_values(level=1), df2.stack()])
.transform('sum').unstack())
%%timeit
df1.apply(lambda x: x.groupby(df2[x.name]).transform('sum'))
%%timeit
pd.concat([df1[c].groupby(df2[c]).transform('sum') for c in df1.columns], axis=1)

8.99 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.35 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.13 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Not to say apply is slow, but explicit iteration in this case is faster. Additionally, the second and third timed solutions will scale better as the frames grow longer rather than wider, since the number of iterations depends on the number of columns.
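An equivalent way to write the list comprehension assembles a dict of transformed columns instead of concatenating along axis=1 (a sketch, re-creating the question's frames; `res` is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df2 = pd.DataFrame({'a': ['A', 'B', 'A', 'B'], 'b': ['A', 'A', 'B', 'B']})

# one groupby/transform per column, reassembled into a frame
res = pd.DataFrame({c: df1[c].groupby(df2[c]).transform('sum')
                    for c in df1.columns})
print(res)
```

This keeps the column names without an explicit axis argument.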






– answered by coldspeed

  • Yep, getting rid of that apply and using a list comprehension makes a lot of sense here.

    – Scott Boston
    1 hour ago






  • @ScottBoston I have already upvoted your answer for its simplicity B)

    – coldspeed
    1 hour ago











  • Thanks!! Yes, using a list comprehension with pd.concat was what I had in mind; was curious to know whether looping could be avoided. Nice to see other alternatives here too though. And thanks for the timeits :)

    – yatu
    59 mins ago
You could do something like the following, overwriting both columns in a single assign call:

res = df1.assign(a=lambda df: df['a'].groupby(df2['a']).transform('sum'),
                 b=lambda df: df['b'].groupby(df2['b']).transform('sum'))

Results:

   a   b
0  4  11
1  6  11
2  4  15
3  6  15





– answered by PMende