filtering specific value with aggregate function in R












-1















enter image description here



Hi,



I would like to filter the largest datetime values for each customer by the first three digits of mccmnc.



As you can see in the picture, customer == 'abghsd' has two different mccmnc values '53208' and '53210'. The first three digits of mccmnc, however, are the same (532). So I want to filter customer abghsd's maximum datetime value with mccmnc = '532'. For customer = 'abbaedl', I need to filter the maximum datetime for mccmnc = '623' and mccmnc = '451'.



So may I ask how to give conditions for this problem?
With the query below, I was able to filter datetime by customer and mccmnc, but I want to filter mccmnc's first three digits.



processed <- aggregate(datetime ~ customer + mccmnc, data =raw_data3, max)


This is the result that I want to get:



Customer       datetime mccmnc
abghsd 20181123222022 53210
abbaedl 20181226121213 62330
abbaedl 20181227191919 45123


Thank you.










share|improve this question




















  • 1





    please include data so we can help you. Use the dput(data) command and include the output in your questions

    – RAB
    Nov 25 '18 at 11:47











  • What is the algorithm for this? Just grab first three values from mccmnc?

    – Roman Luštrik
    Nov 25 '18 at 12:02











  • You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

    – Roman
    Nov 25 '18 at 12:46











  • @RomanLuštrik yeah I need to grab first three values from mccmnc

    – HJ LEE
    Nov 25 '18 at 13:05
















-1















enter image description here



Hi,



I would like to filter the largest datetime values for each customer by the first three digits of mccmnc.



As you can see in the picture, customer == 'abghsd' has two different mccmnc values '53208' and '53210'. The first three digits of mccmnc, however, are the same (532). So I want to filter customer abghsd's maximum datetime value with mccmnc = '532'. For customer = 'abbaedl', I need to filter the maximum datetime for mccmnc = '623' and mccmnc = '451'.



So may I ask how to give conditions for this problem?
With the query below, I was able to filter datetime by customer and mccmnc, but I want to filter mccmnc's first three digits.



processed <- aggregate(datetime ~ customer + mccmnc, data =raw_data3, max)


This is the result that I want to get:



Customer       datetime mccmnc
abghsd 20181123222022 53210
abbaedl 20181226121213 62330
abbaedl 20181227191919 45123


Thank you.










share|improve this question




















  • 1





    please include data so we can help you. Use the dput(data) command and include the output in your questions

    – RAB
    Nov 25 '18 at 11:47











  • What is the algorithm for this? Just grab first three values from mccmnc?

    – Roman Luštrik
    Nov 25 '18 at 12:02











  • You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

    – Roman
    Nov 25 '18 at 12:46











  • @RomanLuštrik yeah I need to grab first three values from mccmnc

    – HJ LEE
    Nov 25 '18 at 13:05














-1












-1








-1








enter image description here



Hi,



I would like to filter the largest datetime values for each customer by the first three digits of mccmnc.



As you can see in the picture, customer == 'abghsd' has two different mccmnc values '53208' and '53210'. The first three digits of mccmnc, however, are the same (532). So I want to filter customer abghsd's maximum datetime value with mccmnc = '532'. For customer = 'abbaedl', I need to filter the maximum datetime for mccmnc = '623' and mccmnc = '451'.



So may I ask how to give conditions for this problem?
With the query below, I was able to filter datetime by customer and mccmnc, but I want to filter mccmnc's first three digits.



processed <- aggregate(datetime ~ customer + mccmnc, data =raw_data3, max)


This is the result that I want to get:



Customer       datetime mccmnc
abghsd 20181123222022 53210
abbaedl 20181226121213 62330
abbaedl 20181227191919 45123


Thank you.










share|improve this question
















enter image description here



Hi,



I would like to filter the largest datetime values for each customer by the first three digits of mccmnc.



As you can see in the picture, customer == 'abghsd' has two different mccmnc values '53208' and '53210'. The first three digits of mccmnc, however, are the same (532). So I want to filter customer abghsd's maximum datetime value with mccmnc = '532'. For customer = 'abbaedl', I need to filter the maximum datetime for mccmnc = '623' and mccmnc = '451'.



So may I ask how to give conditions for this problem?
With the query below, I was able to filter datetime by customer and mccmnc, but I want to filter mccmnc's first three digits.



processed <- aggregate(datetime ~ customer + mccmnc, data =raw_data3, max)


This is the result that I want to get:



Customer       datetime mccmnc
abghsd 20181123222022 53210
abbaedl 20181226121213 62330
abbaedl 20181227191919 45123


Thank you.







r filter dplyr rstudio aggregate






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 25 '18 at 21:15









Roman

2,0891531




2,0891531










asked Nov 25 '18 at 11:33









HJ LEEHJ LEE

1




1








  • 1





    please include data so we can help you. Use the dput(data) command and include the output in your questions

    – RAB
    Nov 25 '18 at 11:47











  • What is the algorithm for this? Just grab first three values from mccmnc?

    – Roman Luštrik
    Nov 25 '18 at 12:02











  • You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

    – Roman
    Nov 25 '18 at 12:46











  • @RomanLuštrik yeah I need to grab first three values from mccmnc

    – HJ LEE
    Nov 25 '18 at 13:05














  • 1





    please include data so we can help you. Use the dput(data) command and include the output in your questions

    – RAB
    Nov 25 '18 at 11:47











  • What is the algorithm for this? Just grab first three values from mccmnc?

    – Roman Luštrik
    Nov 25 '18 at 12:02











  • You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

    – Roman
    Nov 25 '18 at 12:46











  • @RomanLuštrik yeah I need to grab first three values from mccmnc

    – HJ LEE
    Nov 25 '18 at 13:05








1




1





please include data so we can help you. Use the dput(data) command and include the output in your questions

– RAB
Nov 25 '18 at 11:47





please include data so we can help you. Use the dput(data) command and include the output in your questions

– RAB
Nov 25 '18 at 11:47













What is the algorithm for this? Just grab first three values from mccmnc?

– Roman Luštrik
Nov 25 '18 at 12:02





What is the algorithm for this? Just grab first three values from mccmnc?

– Roman Luštrik
Nov 25 '18 at 12:02













You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

– Roman
Nov 25 '18 at 12:46





You seem to have serious trouble with your underlying data structure. Is, e.g., mccmnc always exactly five digits long?

– Roman
Nov 25 '18 at 12:46













@RomanLuštrik yeah I need to grab first three values from mccmnc

– HJ LEE
Nov 25 '18 at 13:05





@RomanLuštrik yeah I need to grab first three values from mccmnc

– HJ LEE
Nov 25 '18 at 13:05












1 Answer
1






active

oldest

votes


















1















Editing your original code, you can just add substr():



processed <- aggregate(datetime ~ customer + substr(mccmnc, 1, 3), data = raw_data3, max)


Alternatively, a tidyverse solution:

Code



library(tidyverse)
df %>%
# Group by customer ID and first 3 characters of mccmnc
group_by(customer, mccmnc_group = substr(mccmnc, 1, 3)) %>%
# Get the max datetime per group
summarise(max_datetime = max(datetime)) %>%
# Put columns in original order
select(1, 3, 2)

# A tibble: 3 x 3
# Groups: customer [2]
customer max_datetime mccmnc_group
<fct> <dbl> <chr>
1 John Package 20181201 532
2 Miranda Nuts 20181227 451
3 Miranda Nuts 20181226 623


Data



df <- data.frame(customer = c(rep("John Package", 3), rep("Miranda Nuts", 4)),
datetime = c(20181123, 20181201, 20181124, 20181125, 20181226, 20181226, 20181227),
mccmnc = c("532-08", "532-08", "532-10", "623-12", "623-30", "451-21", "451-23"))

> df
customer datetime mccmnc
1 John Package 20181123 532-08
2 John Package 20181201 532-08
3 John Package 20181124 532-10
4 Miranda Nuts 20181125 623-12
5 Miranda Nuts 20181226 623-30
6 Miranda Nuts 20181226 451-21
7 Miranda Nuts 20181227 451-23





share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53467009%2ffiltering-specific-value-with-aggregate-function-in-r%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1















    Editing your original code, you can just add substr():



    processed <- aggregate(datetime ~ customer + substr(mccmnc, 1, 3), data = raw_data3, max)


    Alternatively, a tidyverse solution:

    Code



    library(tidyverse)
    df %>%
    # Group by customer ID and first 3 characters of mccmnc
    group_by(customer, mccmnc_group = substr(mccmnc, 1, 3)) %>%
    # Get the max datetime per group
    summarise(max_datetime = max(datetime)) %>%
    # Put columns in original order
    select(1, 3, 2)

    # A tibble: 3 x 3
    # Groups: customer [2]
    customer max_datetime mccmnc_group
    <fct> <dbl> <chr>
    1 John Package 20181201 532
    2 Miranda Nuts 20181227 451
    3 Miranda Nuts 20181226 623


    Data



    df <- data.frame(customer = c(rep("John Package", 3), rep("Miranda Nuts", 4)),
    datetime = c(20181123, 20181201, 20181124, 20181125, 20181226, 20181226, 20181227),
    mccmnc = c("532-08", "532-08", "532-10", "623-12", "623-30", "451-21", "451-23"))

    > df
    customer datetime mccmnc
    1 John Package 20181123 532-08
    2 John Package 20181201 532-08
    3 John Package 20181124 532-10
    4 Miranda Nuts 20181125 623-12
    5 Miranda Nuts 20181226 623-30
    6 Miranda Nuts 20181226 451-21
    7 Miranda Nuts 20181227 451-23





    share|improve this answer






























      1















      Editing your original code, you can just add substr():



      processed <- aggregate(datetime ~ customer + substr(mccmnc, 1, 3), data = raw_data3, max)


      Alternatively, a tidyverse solution:

      Code



      library(tidyverse)
      df %>%
      # Group by customer ID and first 3 characters of mccmnc
      group_by(customer, mccmnc_group = substr(mccmnc, 1, 3)) %>%
      # Get the max datetime per group
      summarise(max_datetime = max(datetime)) %>%
      # Put columns in original order
      select(1, 3, 2)

      # A tibble: 3 x 3
      # Groups: customer [2]
      customer max_datetime mccmnc_group
      <fct> <dbl> <chr>
      1 John Package 20181201 532
      2 Miranda Nuts 20181227 451
      3 Miranda Nuts 20181226 623


      Data



      df <- data.frame(customer = c(rep("John Package", 3), rep("Miranda Nuts", 4)),
      datetime = c(20181123, 20181201, 20181124, 20181125, 20181226, 20181226, 20181227),
      mccmnc = c("532-08", "532-08", "532-10", "623-12", "623-30", "451-21", "451-23"))

      > df
      customer datetime mccmnc
      1 John Package 20181123 532-08
      2 John Package 20181201 532-08
      3 John Package 20181124 532-10
      4 Miranda Nuts 20181125 623-12
      5 Miranda Nuts 20181226 623-30
      6 Miranda Nuts 20181226 451-21
      7 Miranda Nuts 20181227 451-23





      share|improve this answer




























        1












        1








        1








        Editing your original code, you can just add substr():



        processed <- aggregate(datetime ~ customer + substr(mccmnc, 1, 3), data = raw_data3, max)


        Alternatively, a tidyverse solution:

        Code



        library(tidyverse)
        df %>%
        # Group by customer ID and first 3 characters of mccmnc
        group_by(customer, mccmnc_group = substr(mccmnc, 1, 3)) %>%
        # Get the max datetime per group
        summarise(max_datetime = max(datetime)) %>%
        # Put columns in original order
        select(1, 3, 2)

        # A tibble: 3 x 3
        # Groups: customer [2]
        customer max_datetime mccmnc_group
        <fct> <dbl> <chr>
        1 John Package 20181201 532
        2 Miranda Nuts 20181227 451
        3 Miranda Nuts 20181226 623


        Data



        df <- data.frame(customer = c(rep("John Package", 3), rep("Miranda Nuts", 4)),
        datetime = c(20181123, 20181201, 20181124, 20181125, 20181226, 20181226, 20181227),
        mccmnc = c("532-08", "532-08", "532-10", "623-12", "623-30", "451-21", "451-23"))

        > df
        customer datetime mccmnc
        1 John Package 20181123 532-08
        2 John Package 20181201 532-08
        3 John Package 20181124 532-10
        4 Miranda Nuts 20181125 623-12
        5 Miranda Nuts 20181226 623-30
        6 Miranda Nuts 20181226 451-21
        7 Miranda Nuts 20181227 451-23





        share|improve this answer
















        Editing your original code, you can just add substr():



        processed <- aggregate(datetime ~ customer + substr(mccmnc, 1, 3), data = raw_data3, max)


        Alternatively, a tidyverse solution:

        Code



        library(tidyverse)
        df %>%
        # Group by customer ID and first 3 characters of mccmnc
        group_by(customer, mccmnc_group = substr(mccmnc, 1, 3)) %>%
        # Get the max datetime per group
        summarise(max_datetime = max(datetime)) %>%
        # Put columns in original order
        select(1, 3, 2)

        # A tibble: 3 x 3
        # Groups: customer [2]
        customer max_datetime mccmnc_group
        <fct> <dbl> <chr>
        1 John Package 20181201 532
        2 Miranda Nuts 20181227 451
        3 Miranda Nuts 20181226 623


        Data



        df <- data.frame(customer = c(rep("John Package", 3), rep("Miranda Nuts", 4)),
        datetime = c(20181123, 20181201, 20181124, 20181125, 20181226, 20181226, 20181227),
        mccmnc = c("532-08", "532-08", "532-10", "623-12", "623-30", "451-21", "451-23"))

        > df
        customer datetime mccmnc
        1 John Package 20181123 532-08
        2 John Package 20181201 532-08
        3 John Package 20181124 532-10
        4 Miranda Nuts 20181125 623-12
        5 Miranda Nuts 20181226 623-30
        6 Miranda Nuts 20181226 451-21
        7 Miranda Nuts 20181227 451-23






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 25 '18 at 13:08

























        answered Nov 25 '18 at 12:56









        RomanRoman

        2,0891531




        2,0891531
































            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53467009%2ffiltering-specific-value-with-aggregate-function-in-r%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            Refactoring coordinates for Minecraft Pi buildings written in Python