Pandas DataFrame Groupby two columns and get counts
I have a pandas dataframe in the following format:
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']
df:
col1 col2 col3 col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1
Now I want to group this by two columns like following:
df.groupby(['col5','col2']).reset_index()
OutPut:
index col1 col2 col3 col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6
I want to get the count by each row like following.
Expected Output:
col5 col2 count
1 A 1
D 3
2 B 2
etc...
How to get my expected output? And I want to find largest count for each 'col2' value?
python pandas dataframe
add a comment |
I have a pandas dataframe in the following format:
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']
df:
col1 col2 col3 col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1
Now I want to group this by two columns like following:
df.groupby(['col5','col2']).reset_index()
OutPut:
index col1 col2 col3 col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6
I want to get the count by each row like following.
Expected Output:
col5 col2 count
1 A 1
D 3
2 B 2
etc...
How to get my expected output? And I want to find largest count for each 'col2' value?
python pandas dataframe
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
For an idiomatic solution that only uses a singlegroupby
, see this answer below
– Ted Petrou
Nov 5 '17 at 19:38
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01
add a comment |
I have a pandas dataframe in the following format:
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']
df:
col1 col2 col3 col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1
Now I want to group this by two columns like following:
df.groupby(['col5','col2']).reset_index()
OutPut:
index col1 col2 col3 col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6
I want to get the count by each row like following.
Expected Output:
col5 col2 count
1 A 1
D 3
2 B 2
etc...
How to get my expected output? And I want to find largest count for each 'col2' value?
python pandas dataframe
I have a pandas dataframe in the following format:
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']
df:
col1 col2 col3 col4 col5
0 1.1 A 1.1 x/y/z 1
1 1.1 A 1.7 x/y 3
2 1.1 A 2.5 x/y/z/n 3
3 2.6 B 2.6 x/u 2
4 2.5 B 3.3 x 4
5 3.4 B 3.8 x/u/v 2
6 2.6 B 4 x/y/z 5
7 2.6 A 4.2 x 3
8 3.4 B 4.3 x/u/v/b 6
9 3.4 C 4.5 - 3
10 2.6 B 4.6 x/y 5
11 1.1 D 4.7 x/y/z 1
12 1.1 D 4.7 x 1
13 3.3 D 4.8 x/u/v/w 1
Now I want to group this by two columns like following:
df.groupby(['col5','col2']).reset_index()
OutPut:
index col1 col2 col3 col4 col5
col5 col2
1 A 0 0 1.1 A 1.1 x/y/z 1
D 0 11 1.1 D 4.7 x/y/z 1
1 12 1.1 D 4.7 x 1
2 13 3.3 D 4.8 x/u/v/w 1
2 B 0 3 2.6 B 2.6 x/u 2
1 5 3.4 B 3.8 x/u/v 2
3 A 0 1 1.1 A 1.7 x/y 3
1 2 1.1 A 2.5 x/y/z/n 3
2 7 2.6 A 4.2 x 3
C 0 9 3.4 C 4.5 - 3
4 B 0 4 2.5 B 3.3 x 4
5 B 0 6 2.6 B 4 x/y/z 5
1 10 2.6 B 4.6 x/y 5
6 B 0 8 3.4 B 4.3 x/u/v/b 6
I want to get the count by each row like following.
Expected Output:
col5 col2 count
1 A 1
D 3
2 B 2
etc...
How to get my expected output? And I want to find largest count for each 'col2' value?
python pandas dataframe
python pandas dataframe
asked Jul 16 '13 at 14:19
Nilani AlgiriyageNilani Algiriyage
6,319216192
6,319216192
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
For an idiomatic solution that only uses a singlegroupby
, see this answer below
– Ted Petrou
Nov 5 '17 at 19:38
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01
add a comment |
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
For an idiomatic solution that only uses a singlegroupby
, see this answer below
– Ted Petrou
Nov 5 '17 at 19:38
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
For an idiomatic solution that only uses a single
groupby
, see this answer below– Ted Petrou
Nov 5 '17 at 19:38
For an idiomatic solution that only uses a single
groupby
, see this answer below– Ted Petrou
Nov 5 '17 at 19:38
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01
add a comment |
6 Answers
6
active
oldest
votes
Followed by @Andy's answer, you can do following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
add a comment |
You are looking for size
:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
add a comment |
Inserting data into a pandas dataframe and providing column name.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.
This is our printed data:
For making a group of dataframe in pandas and counter,
You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
add a comment |
Idiomatic solution that uses only a single groupby
df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.
There is no param calledname
inreset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working withDataFrame
notSeries
. Thanks for the link.
– mmBs
Nov 26 '18 at 11:42
add a comment |
Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:
df.count_column=df.groupby(['col5','col2']).col5.transform('count')
(I picked 'col5' as it contains no nan)
add a comment |
You can just use the built-in function count follow by the groupby function
df.groupby(['col5','col2']).count()
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f17679089%2fpandas-dataframe-groupby-two-columns-and-get-counts%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
Followed by @Andy's answer, you can do following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
add a comment |
Followed by @Andy's answer, you can do following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
add a comment |
Followed by @Andy's answer, you can do following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
Followed by @Andy's answer, you can do following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
answered Jul 16 '13 at 14:53
waitingkuowaitingkuo
36.3k178699
36.3k178699
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
add a comment |
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
1
1
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
Can I get "col5" values for this like C...1...3?
– Nilani Algiriyage
Jul 16 '13 at 15:12
add a comment |
You are looking for size
:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
add a comment |
You are looking for size
:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
add a comment |
You are looking for size
:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
You are looking for size
:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly cleaner, is to groupby the level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
edited Apr 2 '15 at 16:45
answered Jul 16 '13 at 14:37
Andy HaydenAndy Hayden
183k52431419
183k52431419
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
add a comment |
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
1
1
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
I don't know Why I forgot this :O, Any way what about my second question?Find largest count for each "col2" value and get corresponding "col5" value?
– Nilani Algiriyage
Jul 16 '13 at 14:40
add a comment |
Inserting data into a pandas dataframe and providing column name.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.
This is our printed data:
For making a group of dataframe in pandas and counter,
You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
add a comment |
Inserting data into a pandas dataframe and providing column name.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.
This is our printed data:
For making a group of dataframe in pandas and counter,
You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
add a comment |
Inserting data into a pandas dataframe and providing column name.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.
This is our printed data:
For making a group of dataframe in pandas and counter,
You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
Inserting data into a pandas dataframe and providing column name.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = [['Alphabet','Words']]
print(df) #printing dataframe.
This is our printed data:
For making a group of dataframe in pandas and counter,
You need to provide one more column which counts the grouping, let's call that column as, "COUNTER" in dataframe.
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
edited Sep 20 '17 at 12:02
Vaibhav Mule
3,06112444
3,06112444
answered Jul 21 '16 at 11:53
The Gr8 AdakronThe Gr8 Adakron
588612
588612
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
add a comment |
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
2
2
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
How can I get the Alphabet column (eg.A) to repeat below and not leave the gaps in the first column ??
– blissweb
Jan 6 '18 at 6:54
add a comment |
Idiomatic solution that uses only a single groupby
df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.
There is no param calledname
inreset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working withDataFrame
notSeries
. Thanks for the link.
– mmBs
Nov 26 '18 at 11:42
add a comment |
Idiomatic solution that uses only a single groupby
df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.
There is no param calledname
inreset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working withDataFrame
notSeries
. Thanks for the link.
– mmBs
Nov 26 '18 at 11:42
add a comment |
Idiomatic solution that uses only a single groupby
df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.
Idiomatic solution that uses only a single groupby
df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2')
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.
answered Nov 5 '17 at 19:37
Ted PetrouTed Petrou
22.6k87166
22.6k87166
There is no param calledname
inreset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working withDataFrame
notSeries
. Thanks for the link.
– mmBs
Nov 26 '18 at 11:42
add a comment |
There is no param calledname
inreset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…
– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working withDataFrame
notSeries
. Thanks for the link.
– mmBs
Nov 26 '18 at 11:42
There is no param called
name
in reset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…– mmBs
Nov 25 '18 at 21:27
There is no param called
name
in reset_index()
in the current version of pandas: pandas.pydata.org/pandas-docs/stable/generated/…– mmBs
Nov 25 '18 at 21:27
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
pandas.pydata.org/pandas-docs/stable/generated/…
– Ted Petrou
Nov 26 '18 at 11:20
Ok, my bad. I used it when working with
DataFrame
not Series
. Thanks for the link.– mmBs
Nov 26 '18 at 11:42
Ok, my bad. I used it when working with
DataFrame
not Series
. Thanks for the link.– mmBs
Nov 26 '18 at 11:42
add a comment |
Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:
df.count_column=df.groupby(['col5','col2']).col5.transform('count')
(I picked 'col5' as it contains no nan)
add a comment |
Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:
df.count_column=df.groupby(['col5','col2']).col5.transform('count')
(I picked 'col5' as it contains no nan)
add a comment |
Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:
df.count_column=df.groupby(['col5','col2']).col5.transform('count')
(I picked 'col5' as it contains no nan)
Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:
df.count_column=df.groupby(['col5','col2']).col5.transform('count')
(I picked 'col5' as it contains no nan)
answered Jun 6 '17 at 12:20
TomTom
525
525
add a comment |
add a comment |
You can just use the built-in function count follow by the groupby function
df.groupby(['col5','col2']).count()
add a comment |
You can just use the built-in function count follow by the groupby function
df.groupby(['col5','col2']).count()
add a comment |
You can just use the built-in function count follow by the groupby function
df.groupby(['col5','col2']).count()
You can just use the built-in function count follow by the groupby function
df.groupby(['col5','col2']).count()
answered Dec 2 '16 at 2:22
seansio1995seansio1995
11
11
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f17679089%2fpandas-dataframe-groupby-two-columns-and-get-counts%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
A very similar question just came up yesterday.. see here.
– bdiamante
Jul 16 '13 at 14:29
For an idiomatic solution that only uses a single
groupby
, see this answer below– Ted Petrou
Nov 5 '17 at 19:38
Note on performance, including alternatives: Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series
– jpp
Jun 25 '18 at 14:01