Adding a grouped, aggregate nunique column to pandas dataframe

I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.

my df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but not collapse track/type combos in the resulting df). Same number of rows, 1 more column.

something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

nor is

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

in R this is easily done in data.table with

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

thanks!

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

Can paste a copy of your source dataframe and what the expected output should look like?

– Scott Boston
May 1 '17 at 21:27

add a comment |

something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

nor is

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

in R this is easily done in data.table with

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

thanks!

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

Can paste a copy of your source dataframe and what the expected output should look like?

– Scott Boston
May 1 '17 at 21:27

add a comment |

something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

nor is

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

in R this is easily done in data.table with

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

thanks!

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

nor is

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

in R this is easily done in data.table with

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

thanks!

python pandas dataframe aggregate pandas-groupby

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

edited Nov 22 '18 at 11:52

jpp

95.1k2157108

asked May 1 '17 at 21:14

wbarts

162

asked May 1 '17 at 21:14

wbarts

162

asked May 1 '17 at 21:14

wbarts

162

Can paste a copy of your source dataframe and what the expected output should look like?

– Scott Boston
May 1 '17 at 21:27

add a comment |

Can paste a copy of your source dataframe and what the expected output should look like?

– Scott Boston
May 1 '17 at 21:27

Can paste a copy of your source dataframe and what the expected output should look like?

– Scott Boston
May 1 '17 at 21:27

add a comment |

1 Answer
1

active

oldest

votes

df.groupby(['track', 'type'])['id'].transform(nunique)

Implies that there is a name nunique in the name space that performs some function. transform will take a function or a string that it knows a function for. nunique is definitely one of those strings.

As pointed out by @root, often the method that pandas will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True even for passing numpy functions in some cases.

For example transform('sum') should be preferred over transform(sum).

Try this instead

df.groupby(['track', 'type'])['id'].transform('nunique')

demo

df = pd.DataFrame(dict(

    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

print(df)



  id track type

0  X     1    A

1  X     1    A

2  Y     1    A

3  Z     1    A

4  W     2    B

5  W     2    B

6  W     2    B

7  W     2    B



df.groupby(['track', 'type'])['id'].transform('nunique')



0    3

1    3

2    3

3    3

4    1

5    1

6    1

7    1

Name: id, dtype: int64

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

4

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f43726631%2fadding-a-grouped-aggregate-nunique-column-to-pandas-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

df.groupby(['track', 'type'])['id'].transform(nunique)

For example transform('sum') should be preferred over transform(sum).

Try this instead

df.groupby(['track', 'type'])['id'].transform('nunique')

demo

df = pd.DataFrame(dict(

    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

print(df)



  id track type

0  X     1    A

1  X     1    A

2  Y     1    A

3  Z     1    A

4  W     2    B

5  W     2    B

6  W     2    B

7  W     2    B



df.groupby(['track', 'type'])['id'].transform('nunique')



0    3

1    3

2    3

3    3

4    1

5    1

6    1

7    1

Name: id, dtype: int64

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

4

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

add a comment |

df.groupby(['track', 'type'])['id'].transform(nunique)

For example transform('sum') should be preferred over transform(sum).

Try this instead

df.groupby(['track', 'type'])['id'].transform('nunique')

demo

df = pd.DataFrame(dict(

    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

print(df)



  id track type

0  X     1    A

1  X     1    A

2  Y     1    A

3  Z     1    A

4  W     2    B

5  W     2    B

6  W     2    B

7  W     2    B



df.groupby(['track', 'type'])['id'].transform('nunique')



0    3

1    3

2    3

3    3

4    1

5    1

6    1

7    1

Name: id, dtype: int64

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

4

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

add a comment |

df.groupby(['track', 'type'])['id'].transform(nunique)

For example transform('sum') should be preferred over transform(sum).

Try this instead

df.groupby(['track', 'type'])['id'].transform('nunique')

demo

df = pd.DataFrame(dict(

    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

print(df)



  id track type

0  X     1    A

1  X     1    A

2  Y     1    A

3  Z     1    A

4  W     2    B

5  W     2    B

6  W     2    B

7  W     2    B



df.groupby(['track', 'type'])['id'].transform('nunique')



0    3

1    3

2    3

3    3

4    1

5    1

6    1

7    1

Name: id, dtype: int64

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

df.groupby(['track', 'type'])['id'].transform(nunique)

For example transform('sum') should be preferred over transform(sum).

Try this instead

df.groupby(['track', 'type'])['id'].transform('nunique')

demo

df = pd.DataFrame(dict(

    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

print(df)



  id track type

0  X     1    A

1  X     1    A

2  Y     1    A

3  Z     1    A

4  W     2    B

5  W     2    B

6  W     2    B

7  W     2    B



df.groupby(['track', 'type'])['id'].transform('nunique')



0    3

1    3

2    3

3    3

4    1

5    1

6    1

7    1

Name: id, dtype: int64

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

edited May 1 '17 at 21:31

answered May 1 '17 at 21:27

piRSquared

153k22144287

answered May 1 '17 at 21:27

piRSquared

153k22144287

answered May 1 '17 at 21:27

piRSquared

153k22144287

4

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

add a comment |

4

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g. .transform(sum) uses python sum, whereas .transfrom('sum') would use numpy and be more computationally efficient.

– root
May 1 '17 at 21:31

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk