Pandas DataFrame Groupby two columns and get counts












83















I have a pandas dataframe in the following format:



df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']


df:



   col1 col2 col3     col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1


Now I want to group this by two columns like following:



df.groupby(['col5','col2']).reset_index()


OutPut:



             index col1 col2 col3     col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6


I want to get the count by each row like following.
Expected Output:



col5 col2 count
1 A 1
D 3
2 B 2
etc...


How to get my expected output? And I want to find largest count for each 'col2' value?










share|improve this question























  • A very similar question just came up yesterday.. see here.

    – bdiamante
    Jul 16 '13 at 14:29













  • For an idiomatic solution that only uses a single groupby, see this answer below

    – Ted Petrou
    Nov 5 '17 at 19:38











  • Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

    – jpp
    Jun 25 '18 at 14:01
















83















I have a pandas dataframe in the following format:



df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']


df:



   col1 col2 col3     col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1


Now I want to group this by two columns like following:



df.groupby(['col5','col2']).reset_index()


OutPut:



             index col1 col2 col3     col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6


I want to get the count by each row like following.
Expected Output:



col5 col2 count
1 A 1
D 3
2 B 2
etc...


How to get my expected output? And I want to find largest count for each 'col2' value?










share|improve this question























  • A very similar question just came up yesterday.. see here.

    – bdiamante
    Jul 16 '13 at 14:29













  • For an idiomatic solution that only uses a single groupby, see this answer below

    – Ted Petrou
    Nov 5 '17 at 19:38











  • Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

    – jpp
    Jun 25 '18 at 14:01














83












83








83


31






I have a pandas dataframe in the following format:



df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']


df:



   col1 col2 col3     col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1


Now I want to group this by two columns like following:



df.groupby(['col5','col2']).reset_index()


OutPut:



             index col1 col2 col3     col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6


I want to get the count by each row like following.
Expected Output:



col5 col2 count
1 A 1
D 3
2 B 2
etc...


How to get my expected output? And I want to find largest count for each 'col2' value?










share|improve this question














I have a pandas dataframe in the following format:



df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']


df:



   col1 col2 col3     col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1


Now I want to group this by two columns like following:



df.groupby(['col5','col2']).reset_index()


OutPut:



             index col1 col2 col3     col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6


I want to get the count by each row like following.
Expected Output:



col5 col2 count
1 A 1
D 3
2 B 2
etc...


How to get my expected output? And I want to find largest count for each 'col2' value?







python pandas dataframe






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Jul 16 '13 at 14:19









Nilani AlgiriyageNilani Algiriyage

6,319216192




6,319216192













  • A very similar question just came up yesterday.. see here.

    – bdiamante
    Jul 16 '13 at 14:29













  • For an idiomatic solution that only uses a single groupby, see this answer below

    – Ted Petrou
    Nov 5 '17 at 19:38











  • Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

    – jpp
    Jun 25 '18 at 14:01



















  • A very similar question just came up yesterday.. see here.

    – bdiamante
    Jul 16 '13 at 14:29













  • For an idiomatic solution that only uses a single groupby, see this answer below

    – Ted Petrou
    Nov 5 '17 at 19:38











  • Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

    – jpp
    Jun 25 '18 at 14:01

















A very similar question just came up yesterday.. see here.

– bdiamante
Jul 16 '13 at 14:29







A very similar question just came up yesterday.. see here.

– bdiamante
Jul 16 '13 at 14:29















For an idiomatic solution that only uses a single groupby, see this answer below

– Ted Petrou
Nov 5 '17 at 19:38





For an idiomatic solution that only uses a single groupby, see this answer below

– Ted Petrou
Nov 5 '17 at 19:38













Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

– jpp
Jun 25 '18 at 14:01





Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series

– jpp
Jun 25 '18 at 14:01












6 Answers
6






active

oldest

votes


















61














Followed by @Andy's answer, you can do following to solve your second question:



In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3





share|improve this answer



















  • 1





    Can I get "col5" values for this like C...1...3?

    – Nilani Algiriyage
    Jul 16 '13 at 15:12



















80














You are looking for size:



In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64




To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:



In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64





share|improve this answer





















  • 1





    I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

    – Nilani Algiriyage
    Jul 16 '13 at 14:40



















13














Inserting data into a pandas dataframe and providing column name.



import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.


This is our printed data:



enter image description here



For making a group of dataframe in pandas and counter,

You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.



Like this:



df['COUNTER'] =1       #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)


OUTPUT:



enter image description here






share|improve this answer





















  • 2





    How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

    – blissweb
    Jan 6 '18 at 6:54



















5














Idiomatic solution that uses only a single groupby



df.groupby(['col5', 'col2']).size() 
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')

col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1


Explanation



The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.






share|improve this answer
























  • There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

    – mmBs
    Nov 25 '18 at 21:27











  • pandas.pydata.org/pandas-docs/stable/generated/…

    – Ted Petrou
    Nov 26 '18 at 11:20











  • Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

    – mmBs
    Nov 26 '18 at 11:42



















1














Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:



df.count_column=df.groupby(['col5','col2']).col5.transform('count')


(I picked 'col5' as it contains no nan)






share|improve this answer































    -2














    You can just use the built-in function count follow by the groupby function



    df.groupby(['col5','col2']).count()





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f17679089%2fpandas-dataframe-groupby-two-columns-and-get-counts%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      6 Answers
      6






      active

      oldest

      votes








      6 Answers
      6






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      61














      Followed by @Andy's answer, you can do following to solve your second question:



      In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
      Out[56]:
      0
      col2
      A 3
      B 2
      C 1
      D 3





      share|improve this answer



















      • 1





        Can I get "col5" values for this like C...1...3?

        – Nilani Algiriyage
        Jul 16 '13 at 15:12
















      61














      Followed by @Andy's answer, you can do following to solve your second question:



      In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
      Out[56]:
      0
      col2
      A 3
      B 2
      C 1
      D 3





      share|improve this answer



















      • 1





        Can I get "col5" values for this like C...1...3?

        – Nilani Algiriyage
        Jul 16 '13 at 15:12














      61












      61








      61







      Followed by @Andy's answer, you can do following to solve your second question:



      In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
      Out[56]:
      0
      col2
      A 3
      B 2
      C 1
      D 3





      share|improve this answer













      Followed by @Andy's answer, you can do following to solve your second question:



      In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
      Out[56]:
      0
      col2
      A 3
      B 2
      C 1
      D 3






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Jul 16 '13 at 14:53









      waitingkuowaitingkuo

      36.3k178699




      36.3k178699








      • 1





        Can I get "col5" values for this like C...1...3?

        – Nilani Algiriyage
        Jul 16 '13 at 15:12














      • 1





        Can I get "col5" values for this like C...1...3?

        – Nilani Algiriyage
        Jul 16 '13 at 15:12








      1




      1





      Can I get "col5" values for this like C...1...3?

      – Nilani Algiriyage
      Jul 16 '13 at 15:12





      Can I get "col5" values for this like C...1...3?

      – Nilani Algiriyage
      Jul 16 '13 at 15:12













      80














      You are looking for size:



      In [11]: df.groupby(['col5', 'col2']).size()
      Out[11]:
      col5 col2
      1 A 1
      D 3
      2 B 2
      3 A 3
      C 1
      4 B 1
      5 B 2
      6 B 1
      dtype: int64




      To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:



      In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
      Out[12]:
      col2
      A 3
      B 2
      C 1
      D 3
      dtype: int64





      share|improve this answer





















      • 1





        I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

        – Nilani Algiriyage
        Jul 16 '13 at 14:40
















      80














      You are looking for size:



      In [11]: df.groupby(['col5', 'col2']).size()
      Out[11]:
      col5 col2
      1 A 1
      D 3
      2 B 2
      3 A 3
      C 1
      4 B 1
      5 B 2
      6 B 1
      dtype: int64




      To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:



      In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
      Out[12]:
      col2
      A 3
      B 2
      C 1
      D 3
      dtype: int64





      share|improve this answer





















      • 1





        I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

        – Nilani Algiriyage
        Jul 16 '13 at 14:40














      80












      80








      80







      You are looking for size:



      In [11]: df.groupby(['col5', 'col2']).size()
      Out[11]:
      col5 col2
      1 A 1
      D 3
      2 B 2
      3 A 3
      C 1
      4 B 1
      5 B 2
      6 B 1
      dtype: int64




      To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:



      In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
      Out[12]:
      col2
      A 3
      B 2
      C 1
      D 3
      dtype: int64





      share|improve this answer















      You are looking for size:



      In [11]: df.groupby(['col5', 'col2']).size()
      Out[11]:
      col5 col2
      1 A 1
      D 3
      2 B 2
      3 A 3
      C 1
      4 B 1
      5 B 2
      6 B 1
      dtype: int64




      To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:



      In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
      Out[12]:
      col2
      A 3
      B 2
      C 1
      D 3
      dtype: int64






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Apr 2 '15 at 16:45

























      answered Jul 16 '13 at 14:37









      Andy HaydenAndy Hayden

      183k52431419




      183k52431419








      • 1





        I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

        – Nilani Algiriyage
        Jul 16 '13 at 14:40














      • 1





        I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

        – Nilani Algiriyage
        Jul 16 '13 at 14:40








      1




      1





      I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

      – Nilani Algiriyage
      Jul 16 '13 at 14:40





      I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?

      – Nilani Algiriyage
      Jul 16 '13 at 14:40











      13














      Inserting data into a pandas dataframe and providing column name.



      import pandas as pd
      df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
      df.columns = [['Alphabet','Words']]
      print(df) #printing dataframe.


      This is our printed data:



      enter image description here



      For making a group of dataframe in pandas and counter,

      You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.



      Like this:



      df['COUNTER'] =1       #initially, set that counter to 1.
      group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
      print(group_data)


      OUTPUT:



      enter image description here






      share|improve this answer





















      • 2





        How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

        – blissweb
        Jan 6 '18 at 6:54
















      13














      Inserting data into a pandas dataframe and providing column name.



      import pandas as pd
      df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
      df.columns = [['Alphabet','Words']]
      print(df) #printing dataframe.


      This is our printed data:



      enter image description here



      For making a group of dataframe in pandas and counter,

      You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.



      Like this:



      df['COUNTER'] =1       #initially, set that counter to 1.
      group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
      print(group_data)


      OUTPUT:



      enter image description here






      share|improve this answer





















      • 2





        How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

        – blissweb
        Jan 6 '18 at 6:54














      13












      13








      13







      Inserting data into a pandas dataframe and providing column name.



      import pandas as pd
      df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
      df.columns = [['Alphabet','Words']]
      print(df) #printing dataframe.


      This is our printed data:



      enter image description here



      For making a group of dataframe in pandas and counter,

      You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.



      Like this:



      df['COUNTER'] =1       #initially, set that counter to 1.
      group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
      print(group_data)


      OUTPUT:



      enter image description here






      share|improve this answer















      Inserting data into a pandas dataframe and providing column name.



      import pandas as pd
      df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
      df.columns = [['Alphabet','Words']]
      print(df) #printing dataframe.


      This is our printed data:



      enter image description here



      For making a group of dataframe in pandas and counter,

      You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.



      Like this:



      df['COUNTER'] =1       #initially, set that counter to 1.
      group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
      print(group_data)


      OUTPUT:



      enter image description here







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Sep 20 '17 at 12:02









      Vaibhav Mule

      3,06112444




      3,06112444










      answered Jul 21 '16 at 11:53









      The Gr8 AdakronThe Gr8 Adakron

      588612




      588612








      • 2





        How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

        – blissweb
        Jan 6 '18 at 6:54














      • 2





        How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

        – blissweb
        Jan 6 '18 at 6:54








      2




      2





      How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

      – blissweb
      Jan 6 '18 at 6:54





      How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??

      – blissweb
      Jan 6 '18 at 6:54











      5














      Idiomatic solution that uses only a single groupby



      df.groupby(['col5', 'col2']).size() 
      .sort_values(ascending=False)
      .reset_index(name='count')
      .drop_duplicates(subset='col2')

      col5 col2 count
      0 3 A 3
      1 1 D 3
      2 5 B 2
      6 3 C 1


      Explanation



      The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.






      share|improve this answer
























      • There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

        – mmBs
        Nov 25 '18 at 21:27











      • pandas.pydata.org/pandas-docs/stable/generated/…

        – Ted Petrou
        Nov 26 '18 at 11:20











      • Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

        – mmBs
        Nov 26 '18 at 11:42
















      5














      Idiomatic solution that uses only a single groupby



      df.groupby(['col5', 'col2']).size() 
      .sort_values(ascending=False)
      .reset_index(name='count')
      .drop_duplicates(subset='col2')

      col5 col2 count
      0 3 A 3
      1 1 D 3
      2 5 B 2
      6 3 C 1


      Explanation



      The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.






      share|improve this answer
























      • There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

        – mmBs
        Nov 25 '18 at 21:27











      • pandas.pydata.org/pandas-docs/stable/generated/…

        – Ted Petrou
        Nov 26 '18 at 11:20











      • Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

        – mmBs
        Nov 26 '18 at 11:42














      5












      5








      5







      Idiomatic solution that uses only a single groupby



      df.groupby(['col5', 'col2']).size() 
      .sort_values(ascending=False)
      .reset_index(name='count')
      .drop_duplicates(subset='col2')

      col5 col2 count
      0 3 A 3
      1 1 D 3
      2 5 B 2
      6 3 C 1


      Explanation



      The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.






      share|improve this answer













      Idiomatic solution that uses only a single groupby



      df.groupby(['col5', 'col2']).size() 
      .sort_values(ascending=False)
      .reset_index(name='count')
      .drop_duplicates(subset='col2')

      col5 col2 count
      0 3 A 3
      1 1 D 3
      2 5 B 2
      6 3 C 1


      Explanation



      The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Nov 5 '17 at 19:37









      Ted PetrouTed Petrou

      22.6k87166




      22.6k87166













      • There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

        – mmBs
        Nov 25 '18 at 21:27











      • pandas.pydata.org/pandas-docs/stable/generated/…

        – Ted Petrou
        Nov 26 '18 at 11:20











      • Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

        – mmBs
        Nov 26 '18 at 11:42



















      • There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

        – mmBs
        Nov 25 '18 at 21:27











      • pandas.pydata.org/pandas-docs/stable/generated/…

        – Ted Petrou
        Nov 26 '18 at 11:20











      • Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

        – mmBs
        Nov 26 '18 at 11:42

















      There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

      – mmBs
      Nov 25 '18 at 21:27





      There is no param called name in reset_index() in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…

      – mmBs
      Nov 25 '18 at 21:27













      pandas.pydata.org/pandas-docs/stable/generated/…

      – Ted Petrou
      Nov 26 '18 at 11:20





      pandas.pydata.org/pandas-docs/stable/generated/…

      – Ted Petrou
      Nov 26 '18 at 11:20













      Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

      – mmBs
      Nov 26 '18 at 11:42





      Ok, my bad. I used it when working with DataFrame not Series. Thanks for the link.

      – mmBs
      Nov 26 '18 at 11:42











      1














      Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:



      df.count_column=df.groupby(['col5','col2']).col5.transform('count')


      (I picked 'col5' as it contains no nan)






      share|improve this answer




























        1














        Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:



        df.count_column=df.groupby(['col5','col2']).col5.transform('count')


        (I picked 'col5' as it contains no nan)






        share|improve this answer


























          1












          1








          1







          Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:



          df.count_column=df.groupby(['col5','col2']).col5.transform('count')


          (I picked 'col5' as it contains no nan)






          share|improve this answer













          Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:



          df.count_column=df.groupby(['col5','col2']).col5.transform('count')


          (I picked 'col5' as it contains no nan)







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Jun 6 '17 at 12:20









          TomTom

          525




          525























              -2














              You can just use the built-in function count follow by the groupby function



              df.groupby(['col5','col2']).count()





              share|improve this answer




























                -2














                You can just use the built-in function count follow by the groupby function



                df.groupby(['col5','col2']).count()





                share|improve this answer


























                  -2












                  -2








                  -2







                  You can just use the built-in function count follow by the groupby function



                  df.groupby(['col5','col2']).count()





                  share|improve this answer













                  You can just use the built-in function count follow by the groupby function



                  df.groupby(['col5','col2']).count()






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Dec 2 '16 at 2:22









                  seansio1995seansio1995

                  11




                  11






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f17679089%2fpandas-dataframe-groupby-two-columns-and-get-counts%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      404 Error Contact Form 7 ajax form submitting

                      How to know if a Active Directory user can login interactively

                      TypeError: fit_transform() missing 1 required positional argument: 'X'