Retrieve all rows for specific text after grouping by application and user id












3














When a user completes a step digitally column is_digitally_signed changes to YES.
What I am trying to do: If any step is completed digitally I want to retrieve all rows for same application_id and user_id. Please check beneath my desired output.



R code to replicate my dataset



df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3), 
user_id = c(123,123,123,456,456,456,789,789,789),
application_status = c("incomplete", "details_verified", "complete"),
date = c("01/01/2018", "02/01/2018", "03/01/2018"),
is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
mutate(date = as.Date(date, "%d/%m/%Y"))


With an output



df
application_id user_id application_status date is_digitally_signed
1 123 incomplete 2018-01-01 NULL
1 123 details_verified 2018-01-02 NULL
1 123 complete 2018-01-03 YES
2 456 incomplete 2018-01-01 NULL
2 456 details_verified 2018-01-02 NULL
2 456 complete 2018-01-03 NULL
3 789 incomplete 2018-01-01 NULL
3 789 details_verified 2018-01-02 YES
3 789 complete 2018-01-03 NULL


My (unsuccessful) effort



df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))


Desired outcome



application_id user_id application_status       date is_digitally_signed
1 123 incomplete 2018-01-01 NULL
1 123 details_verified 2018-01-02 NULL
1 123 complete 2018-01-03 YES
3 789 incomplete 2018-01-01 NULL
3 789 details_verified 2018-01-02 YES
3 789 complete 2018-01-03 NULL









share|improve this question



























    3














    When a user completes a step digitally column is_digitally_signed changes to YES.
    What I am trying to do: If any step is completed digitally I want to retrieve all rows for same application_id and user_id. Please check beneath my desired output.



    R code to replicate my dataset



    df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3), 
    user_id = c(123,123,123,456,456,456,789,789,789),
    application_status = c("incomplete", "details_verified", "complete"),
    date = c("01/01/2018", "02/01/2018", "03/01/2018"),
    is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
    mutate(date = as.Date(date, "%d/%m/%Y"))


    With an output



    df
    application_id user_id application_status date is_digitally_signed
    1 123 incomplete 2018-01-01 NULL
    1 123 details_verified 2018-01-02 NULL
    1 123 complete 2018-01-03 YES
    2 456 incomplete 2018-01-01 NULL
    2 456 details_verified 2018-01-02 NULL
    2 456 complete 2018-01-03 NULL
    3 789 incomplete 2018-01-01 NULL
    3 789 details_verified 2018-01-02 YES
    3 789 complete 2018-01-03 NULL


    My (unsuccessful) effort



    df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))


    Desired outcome



    application_id user_id application_status       date is_digitally_signed
    1 123 incomplete 2018-01-01 NULL
    1 123 details_verified 2018-01-02 NULL
    1 123 complete 2018-01-03 YES
    3 789 incomplete 2018-01-01 NULL
    3 789 details_verified 2018-01-02 YES
    3 789 complete 2018-01-03 NULL









    share|improve this question

























      3












      3








      3







      When a user completes a step digitally column is_digitally_signed changes to YES.
      What I am trying to do: If any step is completed digitally I want to retrieve all rows for same application_id and user_id. Please check beneath my desired output.



      R code to replicate my dataset



      df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3), 
      user_id = c(123,123,123,456,456,456,789,789,789),
      application_status = c("incomplete", "details_verified", "complete"),
      date = c("01/01/2018", "02/01/2018", "03/01/2018"),
      is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
      mutate(date = as.Date(date, "%d/%m/%Y"))


      With an output



      df
      application_id user_id application_status date is_digitally_signed
      1 123 incomplete 2018-01-01 NULL
      1 123 details_verified 2018-01-02 NULL
      1 123 complete 2018-01-03 YES
      2 456 incomplete 2018-01-01 NULL
      2 456 details_verified 2018-01-02 NULL
      2 456 complete 2018-01-03 NULL
      3 789 incomplete 2018-01-01 NULL
      3 789 details_verified 2018-01-02 YES
      3 789 complete 2018-01-03 NULL


      My (unsuccessful) effort



      df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))


      Desired outcome



      application_id user_id application_status       date is_digitally_signed
      1 123 incomplete 2018-01-01 NULL
      1 123 details_verified 2018-01-02 NULL
      1 123 complete 2018-01-03 YES
      3 789 incomplete 2018-01-01 NULL
      3 789 details_verified 2018-01-02 YES
      3 789 complete 2018-01-03 NULL









      share|improve this question













      When a user completes a step digitally column is_digitally_signed changes to YES.
      What I am trying to do: If any step is completed digitally I want to retrieve all rows for same application_id and user_id. Please check beneath my desired output.



      R code to replicate my dataset



      df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3), 
      user_id = c(123,123,123,456,456,456,789,789,789),
      application_status = c("incomplete", "details_verified", "complete"),
      date = c("01/01/2018", "02/01/2018", "03/01/2018"),
      is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
      mutate(date = as.Date(date, "%d/%m/%Y"))


      With an output



      df
      application_id user_id application_status date is_digitally_signed
      1 123 incomplete 2018-01-01 NULL
      1 123 details_verified 2018-01-02 NULL
      1 123 complete 2018-01-03 YES
      2 456 incomplete 2018-01-01 NULL
      2 456 details_verified 2018-01-02 NULL
      2 456 complete 2018-01-03 NULL
      3 789 incomplete 2018-01-01 NULL
      3 789 details_verified 2018-01-02 YES
      3 789 complete 2018-01-03 NULL


      My (unsuccessful) effort



      df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))


      Desired outcome



      application_id user_id application_status       date is_digitally_signed
      1 123 incomplete 2018-01-01 NULL
      1 123 details_verified 2018-01-02 NULL
      1 123 complete 2018-01-03 YES
      3 789 incomplete 2018-01-01 NULL
      3 789 details_verified 2018-01-02 YES
      3 789 complete 2018-01-03 NULL






      r dplyr data-manipulation






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 21 '18 at 16:40









      Greconomist

      130111




      130111
























          2 Answers
          2






          active

          oldest

          votes


















          3














          dplyr



          We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':



          library(dplyr)

          df %>%
          group_by(application_id, user_id) %>%
          filter(any(is_digitally_signed == "YES"))


          or use the all function to subset groups where not all is_digitally_signed == "NULL":



          df %>% 
          group_by(application_id, user_id) %>%
          filter(!all(is_digitally_signed == "NULL"))


          data.table



          We can also use data.table since you've already loaded your data as a DT:



          library(data.table)
          dt = setDT(df)
          dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]


          or with .SD:



          dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]


          Output:



          # A tibble: 6 x 5
          # Groups: application_id, user_id [2]
          application_id user_id application_status date is_digitally_signed
          <dbl> <dbl> <fct> <date> <fct>
          1 1 123 incomplete 2018-01-01 NULL
          2 1 123 details_verified 2018-01-02 NULL
          3 1 123 complete 2018-01-03 YES
          4 3 789 incomplete 2018-01-01 NULL
          5 3 789 details_verified 2018-01-02 YES
          6 3 789 complete 2018-01-03 NULL





          share|improve this answer































            3














            As there is only a single column to test, we can simply use filter with any



            library(dplyr)
            df %>%
            group_by(application_id,user_id) %>%
            filter(any(is_digitally_signed == "YES"))
            # A tibble: 6 x 5
            # Groups: application_id, user_id [2]
            # application_id user_id application_status date is_digitally_signed
            # <dbl> <dbl> <chr> <date> <chr>
            #1 1 123 incomplete 2018-01-01 NULL
            #2 1 123 details_verified 2018-01-02 NULL
            #3 1 123 complete 2018-01-03 YES
            #4 3 789 incomplete 2018-01-01 NULL
            #5 3 789 details_verified 2018-01-02 YES
            #6 3 789 complete 2018-01-03 NULL




            Or another option is using %in% to return a single TRUE/FALSE output that gets recycled



            df %>% 
            group_by(application_id,user_id) %>%
            filter("YES" %in% is_digitally_signed)




            Or we can use base R



            df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416759%2fretrieve-all-rows-for-specific-text-after-grouping-by-application-and-user-id%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              3














              dplyr



              We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':



              library(dplyr)

              df %>%
              group_by(application_id, user_id) %>%
              filter(any(is_digitally_signed == "YES"))


              or use the all function to subset groups where not all is_digitally_signed == "NULL":



              df %>% 
              group_by(application_id, user_id) %>%
              filter(!all(is_digitally_signed == "NULL"))


              data.table



              We can also use data.table since you've already loaded your data as a DT:



              library(data.table)
              dt = setDT(df)
              dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]


              or with .SD:



              dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]


              Output:



              # A tibble: 6 x 5
              # Groups: application_id, user_id [2]
              application_id user_id application_status date is_digitally_signed
              <dbl> <dbl> <fct> <date> <fct>
              1 1 123 incomplete 2018-01-01 NULL
              2 1 123 details_verified 2018-01-02 NULL
              3 1 123 complete 2018-01-03 YES
              4 3 789 incomplete 2018-01-01 NULL
              5 3 789 details_verified 2018-01-02 YES
              6 3 789 complete 2018-01-03 NULL





              share|improve this answer




























                3














                dplyr



                We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':



                library(dplyr)

                df %>%
                group_by(application_id, user_id) %>%
                filter(any(is_digitally_signed == "YES"))


                or use the all function to subset groups where not all is_digitally_signed == "NULL":



                df %>% 
                group_by(application_id, user_id) %>%
                filter(!all(is_digitally_signed == "NULL"))


                data.table



                We can also use data.table since you've already loaded your data as a DT:



                library(data.table)
                dt = setDT(df)
                dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]


                or with .SD:



                dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]


                Output:



                # A tibble: 6 x 5
                # Groups: application_id, user_id [2]
                application_id user_id application_status date is_digitally_signed
                <dbl> <dbl> <fct> <date> <fct>
                1 1 123 incomplete 2018-01-01 NULL
                2 1 123 details_verified 2018-01-02 NULL
                3 1 123 complete 2018-01-03 YES
                4 3 789 incomplete 2018-01-01 NULL
                5 3 789 details_verified 2018-01-02 YES
                6 3 789 complete 2018-01-03 NULL





                share|improve this answer


























                  3












                  3








                  3






                  dplyr



                  We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':



                  library(dplyr)

                  df %>%
                  group_by(application_id, user_id) %>%
                  filter(any(is_digitally_signed == "YES"))


                  or use the all function to subset groups where not all is_digitally_signed == "NULL":



                  df %>% 
                  group_by(application_id, user_id) %>%
                  filter(!all(is_digitally_signed == "NULL"))


                  data.table



                  We can also use data.table since you've already loaded your data as a DT:



                  library(data.table)
                  dt = setDT(df)
                  dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]


                  or with .SD:



                  dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]


                  Output:



                  # A tibble: 6 x 5
                  # Groups: application_id, user_id [2]
                  application_id user_id application_status date is_digitally_signed
                  <dbl> <dbl> <fct> <date> <fct>
                  1 1 123 incomplete 2018-01-01 NULL
                  2 1 123 details_verified 2018-01-02 NULL
                  3 1 123 complete 2018-01-03 YES
                  4 3 789 incomplete 2018-01-01 NULL
                  5 3 789 details_verified 2018-01-02 YES
                  6 3 789 complete 2018-01-03 NULL





                  share|improve this answer














                  dplyr



                  We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':



                  library(dplyr)

                  df %>%
                  group_by(application_id, user_id) %>%
                  filter(any(is_digitally_signed == "YES"))


                  or use the all function to subset groups where not all is_digitally_signed == "NULL":



                  df %>% 
                  group_by(application_id, user_id) %>%
                  filter(!all(is_digitally_signed == "NULL"))


                  data.table



                  We can also use data.table since you've already loaded your data as a DT:



                  library(data.table)
                  dt = setDT(df)
                  dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]


                  or with .SD:



                  dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]


                  Output:



                  # A tibble: 6 x 5
                  # Groups: application_id, user_id [2]
                  application_id user_id application_status date is_digitally_signed
                  <dbl> <dbl> <fct> <date> <fct>
                  1 1 123 incomplete 2018-01-01 NULL
                  2 1 123 details_verified 2018-01-02 NULL
                  3 1 123 complete 2018-01-03 YES
                  4 3 789 incomplete 2018-01-01 NULL
                  5 3 789 details_verified 2018-01-02 YES
                  6 3 789 complete 2018-01-03 NULL






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 21 '18 at 17:09

























                  answered Nov 21 '18 at 16:43









                  avid_useR

                  11.8k41830




                  11.8k41830

























                      3














                      As there is only a single column to test, we can simply use filter with any



                      library(dplyr)
                      df %>%
                      group_by(application_id,user_id) %>%
                      filter(any(is_digitally_signed == "YES"))
                      # A tibble: 6 x 5
                      # Groups: application_id, user_id [2]
                      # application_id user_id application_status date is_digitally_signed
                      # <dbl> <dbl> <chr> <date> <chr>
                      #1 1 123 incomplete 2018-01-01 NULL
                      #2 1 123 details_verified 2018-01-02 NULL
                      #3 1 123 complete 2018-01-03 YES
                      #4 3 789 incomplete 2018-01-01 NULL
                      #5 3 789 details_verified 2018-01-02 YES
                      #6 3 789 complete 2018-01-03 NULL




                      Or another option is using %in% to return a single TRUE/FALSE output that gets recycled



                      df %>% 
                      group_by(application_id,user_id) %>%
                      filter("YES" %in% is_digitally_signed)




                      Or we can use base R



                      df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]





                      share|improve this answer




























                        3














                        As there is only a single column to test, we can simply use filter with any



                        library(dplyr)
                        df %>%
                        group_by(application_id,user_id) %>%
                        filter(any(is_digitally_signed == "YES"))
                        # A tibble: 6 x 5
                        # Groups: application_id, user_id [2]
                        # application_id user_id application_status date is_digitally_signed
                        # <dbl> <dbl> <chr> <date> <chr>
                        #1 1 123 incomplete 2018-01-01 NULL
                        #2 1 123 details_verified 2018-01-02 NULL
                        #3 1 123 complete 2018-01-03 YES
                        #4 3 789 incomplete 2018-01-01 NULL
                        #5 3 789 details_verified 2018-01-02 YES
                        #6 3 789 complete 2018-01-03 NULL




                        Or another option is using %in% to return a single TRUE/FALSE output that gets recycled



                        df %>% 
                        group_by(application_id,user_id) %>%
                        filter("YES" %in% is_digitally_signed)




                        Or we can use base R



                        df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]





                        share|improve this answer


























                          3












                          3








                          3






                          As there is only a single column to test, we can simply use filter with any



                          library(dplyr)
                          df %>%
                          group_by(application_id,user_id) %>%
                          filter(any(is_digitally_signed == "YES"))
                          # A tibble: 6 x 5
                          # Groups: application_id, user_id [2]
                          # application_id user_id application_status date is_digitally_signed
                          # <dbl> <dbl> <chr> <date> <chr>
                          #1 1 123 incomplete 2018-01-01 NULL
                          #2 1 123 details_verified 2018-01-02 NULL
                          #3 1 123 complete 2018-01-03 YES
                          #4 3 789 incomplete 2018-01-01 NULL
                          #5 3 789 details_verified 2018-01-02 YES
                          #6 3 789 complete 2018-01-03 NULL




                          Or another option is using %in% to return a single TRUE/FALSE output that gets recycled



                          df %>% 
                          group_by(application_id,user_id) %>%
                          filter("YES" %in% is_digitally_signed)




                          Or we can use base R



                          df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]





                          share|improve this answer














                          As there is only a single column to test, we can simply use filter with any



                          library(dplyr)
                          df %>%
                          group_by(application_id,user_id) %>%
                          filter(any(is_digitally_signed == "YES"))
                          # A tibble: 6 x 5
                          # Groups: application_id, user_id [2]
                          # application_id user_id application_status date is_digitally_signed
                          # <dbl> <dbl> <chr> <date> <chr>
                          #1 1 123 incomplete 2018-01-01 NULL
                          #2 1 123 details_verified 2018-01-02 NULL
                          #3 1 123 complete 2018-01-03 YES
                          #4 3 789 incomplete 2018-01-01 NULL
                          #5 3 789 details_verified 2018-01-02 YES
                          #6 3 789 complete 2018-01-03 NULL




                          Or another option is using %in% to return a single TRUE/FALSE output that gets recycled



                          df %>% 
                          group_by(application_id,user_id) %>%
                          filter("YES" %in% is_digitally_signed)




                          Or we can use base R



                          df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Nov 21 '18 at 17:31

























                          answered Nov 21 '18 at 16:43









                          akrun

                          398k13187260




                          398k13187260






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416759%2fretrieve-all-rows-for-specific-text-after-grouping-by-application-and-user-id%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              404 Error Contact Form 7 ajax form submitting

                              How to know if a Active Directory user can login interactively

                              Refactoring coordinates for Minecraft Pi buildings written in Python