Data Tables: Creating New Column By Examining Multiple Columns On Multiple Rows











up vote
1
down vote

favorite
1












I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.



I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.



Below is an example of my desired output:



   id                  prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13


Any guidance would be appreciated, thanks!



Edit:



I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:



DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]


It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:



   id                  prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14


In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.










share|improve this question




















  • 2




    Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
    – chinsoon12
    Nov 19 at 22:20










  • This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
    – Michael
    Nov 26 at 12:19










  • Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
    – chinsoon12
    Nov 26 at 13:20

















up vote
1
down vote

favorite
1












I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.



I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.



Below is an example of my desired output:



   id                  prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13


Any guidance would be appreciated, thanks!



Edit:



I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:



DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]


It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:



   id                  prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14


In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.










share|improve this question




















  • 2




    Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
    – chinsoon12
    Nov 19 at 22:20










  • This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
    – Michael
    Nov 26 at 12:19










  • Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
    – chinsoon12
    Nov 26 at 13:20















up vote
1
down vote

favorite
1









up vote
1
down vote

favorite
1






1





I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.



I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.



Below is an example of my desired output:



   id                  prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13


Any guidance would be appreciated, thanks!



Edit:



I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:



DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]


It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:



   id                  prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14


In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.










share|improve this question















I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.



I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.



Below is an example of my desired output:



   id                  prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13


Any guidance would be appreciated, thanks!



Edit:



I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:



DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]


It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:



   id                  prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14


In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.







r data.table






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 26 at 12:19

























asked Nov 19 at 17:59









Michael

3017




3017








  • 2




    Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
    – chinsoon12
    Nov 19 at 22:20










  • This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
    – Michael
    Nov 26 at 12:19










  • Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
    – chinsoon12
    Nov 26 at 13:20
















  • 2




    Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
    – chinsoon12
    Nov 19 at 22:20










  • This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
    – Michael
    Nov 26 at 12:19










  • Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
    – chinsoon12
    Nov 26 at 13:20










2




2




Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20




Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20












This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19




This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19












Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20






Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20














2 Answers
2






active

oldest

votes

















up vote
0
down vote



accepted










This is now recursive with two recursive steps, using dplyr and data.table functionality.



dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]

> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a


Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.



library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}


dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)


The resulting data.table looks then like this:



> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>





share|improve this answer























  • Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
    – Michael
    Nov 26 at 11:58










  • Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
    – hannes101
    Nov 26 at 12:20










  • Much appreciated hannes!
    – Michael
    Nov 26 at 12:32










  • Updated the answer, please check if that's helping you.
    – hannes101
    Nov 26 at 12:57










  • Thanks Hannes! I made a recursive variant of your solution and am making great progress.
    – Michael
    yesterday


















up vote
0
down vote













I managed to solve this by defining a recursive function and applying it to all rows.



find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}

df$originatorId = apply(df, 1, find_originatorId, df)





share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380257%2fdata-tables-creating-new-column-by-examining-multiple-columns-on-multiple-rows%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote



    accepted










    This is now recursive with two recursive steps, using dplyr and data.table functionality.



    dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

    data.table(left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% left_join(
    y = dt[,.(id,prevId)]
    , by = c("prevId" = "id")
    ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]

    > id prevId originatorId
    1: 11a <NA> 11a
    2: 11b 11a 11a
    3: 11c 11b 11a
    4: 12a <NA> 12a
    5: 12b 12a 12a


    Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.



    library(dplyr)
    left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
    , y = dt[,.(Second.id = id, Second.prevId = prevId)]
    , by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}


    dt.join.final.data <- rbindlist(list( dt.join.1
    , dt.join.2)
    , fill = TRUE
    , idcol = "id"
    , use.names = TRUE)


    The resulting data.table looks then like this:



    > dt.join.final.data
    id id prevId Second.prevId
    1: 1 11a <NA> <NA>
    2: 1 11b 11a <NA>
    3: 1 11c 11b <NA>
    4: 1 12a <NA> <NA>
    5: 1 12b 12a <NA>
    6: 2 11a <NA> <NA>
    7: 2 11b 11a <NA>
    8: 2 11c 11b 11a
    9: 2 12a <NA> <NA>
    10: 2 12b 12a <NA>





    share|improve this answer























    • Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
      – Michael
      Nov 26 at 11:58










    • Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
      – hannes101
      Nov 26 at 12:20










    • Much appreciated hannes!
      – Michael
      Nov 26 at 12:32










    • Updated the answer, please check if that's helping you.
      – hannes101
      Nov 26 at 12:57










    • Thanks Hannes! I made a recursive variant of your solution and am making great progress.
      – Michael
      yesterday















    up vote
    0
    down vote



    accepted










    This is now recursive with two recursive steps, using dplyr and data.table functionality.



    dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

    data.table(left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% left_join(
    y = dt[,.(id,prevId)]
    , by = c("prevId" = "id")
    ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]

    > id prevId originatorId
    1: 11a <NA> 11a
    2: 11b 11a 11a
    3: 11c 11b 11a
    4: 12a <NA> 12a
    5: 12b 12a 12a


    Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.



    library(dplyr)
    left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
    , y = dt[,.(Second.id = id, Second.prevId = prevId)]
    , by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}


    dt.join.final.data <- rbindlist(list( dt.join.1
    , dt.join.2)
    , fill = TRUE
    , idcol = "id"
    , use.names = TRUE)


    The resulting data.table looks then like this:



    > dt.join.final.data
    id id prevId Second.prevId
    1: 1 11a <NA> <NA>
    2: 1 11b 11a <NA>
    3: 1 11c 11b <NA>
    4: 1 12a <NA> <NA>
    5: 1 12b 12a <NA>
    6: 2 11a <NA> <NA>
    7: 2 11b 11a <NA>
    8: 2 11c 11b 11a
    9: 2 12a <NA> <NA>
    10: 2 12b 12a <NA>





    share|improve this answer























    • Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
      – Michael
      Nov 26 at 11:58










    • Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
      – hannes101
      Nov 26 at 12:20










    • Much appreciated hannes!
      – Michael
      Nov 26 at 12:32










    • Updated the answer, please check if that's helping you.
      – hannes101
      Nov 26 at 12:57










    • Thanks Hannes! I made a recursive variant of your solution and am making great progress.
      – Michael
      yesterday













    up vote
    0
    down vote



    accepted







    up vote
    0
    down vote



    accepted






    This is now recursive with two recursive steps, using dplyr and data.table functionality.



    dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

    data.table(left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% left_join(
    y = dt[,.(id,prevId)]
    , by = c("prevId" = "id")
    ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]

    > id prevId originatorId
    1: 11a <NA> 11a
    2: 11b 11a 11a
    3: 11c 11b 11a
    4: 12a <NA> 12a
    5: 12b 12a 12a


    Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.



    library(dplyr)
    left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
    , y = dt[,.(Second.id = id, Second.prevId = prevId)]
    , by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}


    dt.join.final.data <- rbindlist(list( dt.join.1
    , dt.join.2)
    , fill = TRUE
    , idcol = "id"
    , use.names = TRUE)


    The resulting data.table looks then like this:



    > dt.join.final.data
    id id prevId Second.prevId
    1: 1 11a <NA> <NA>
    2: 1 11b 11a <NA>
    3: 1 11c 11b <NA>
    4: 1 12a <NA> <NA>
    5: 1 12b 12a <NA>
    6: 2 11a <NA> <NA>
    7: 2 11b 11a <NA>
    8: 2 11c 11b 11a
    9: 2 12a <NA> <NA>
    10: 2 12b 12a <NA>





    share|improve this answer














    This is now recursive with two recursive steps, using dplyr and data.table functionality.



    dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

    data.table(left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% left_join(
    y = dt[,.(id,prevId)]
    , by = c("prevId" = "id")
    ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]

    > id prevId originatorId
    1: 11a <NA> 11a
    2: 11b 11a 11a
    3: 11c 11b 11a
    4: 12a <NA> 12a
    5: 12b 12a 12a


    Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.



    library(dplyr)
    left_join(x = dt
    , y = dt[,.(prevId)]
    , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
    , y = dt[,.(Second.id = id, Second.prevId = prevId)]
    , by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}


    dt.join.final.data <- rbindlist(list( dt.join.1
    , dt.join.2)
    , fill = TRUE
    , idcol = "id"
    , use.names = TRUE)


    The resulting data.table looks then like this:



    > dt.join.final.data
    id id prevId Second.prevId
    1: 1 11a <NA> <NA>
    2: 1 11b 11a <NA>
    3: 1 11c 11b <NA>
    4: 1 12a <NA> <NA>
    5: 1 12b 12a <NA>
    6: 2 11a <NA> <NA>
    7: 2 11b 11a <NA>
    8: 2 11c 11b 11a
    9: 2 12a <NA> <NA>
    10: 2 12b 12a <NA>






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 26 at 12:46

























    answered Nov 20 at 14:31









    hannes101

    626515




    626515












    • Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
      – Michael
      Nov 26 at 11:58










    • Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
      – hannes101
      Nov 26 at 12:20










    • Much appreciated hannes!
      – Michael
      Nov 26 at 12:32










    • Updated the answer, please check if that's helping you.
      – hannes101
      Nov 26 at 12:57










    • Thanks Hannes! I made a recursive variant of your solution and am making great progress.
      – Michael
      yesterday


















    • Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
      – Michael
      Nov 26 at 11:58










    • Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
      – hannes101
      Nov 26 at 12:20










    • Much appreciated hannes!
      – Michael
      Nov 26 at 12:32










    • Updated the answer, please check if that's helping you.
      – hannes101
      Nov 26 at 12:57










    • Thanks Hannes! I made a recursive variant of your solution and am making great progress.
      – Michael
      yesterday
















    Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
    – Michael
    Nov 26 at 11:58




    Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
    – Michael
    Nov 26 at 11:58












    Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
    – hannes101
    Nov 26 at 12:20




    Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
    – hannes101
    Nov 26 at 12:20












    Much appreciated hannes!
    – Michael
    Nov 26 at 12:32




    Much appreciated hannes!
    – Michael
    Nov 26 at 12:32












    Updated the answer, please check if that's helping you.
    – hannes101
    Nov 26 at 12:57




    Updated the answer, please check if that's helping you.
    – hannes101
    Nov 26 at 12:57












    Thanks Hannes! I made a recursive variant of your solution and am making great progress.
    – Michael
    yesterday




    Thanks Hannes! I made a recursive variant of your solution and am making great progress.
    – Michael
    yesterday












    up vote
    0
    down vote













    I managed to solve this by defining a recursive function and applying it to all rows.



    find_originatorId = function(current_row, df){
    if(current_row[["prevId"]] == ""){
    current_row[["id"]]
    } else {
    parent_row = df[df$id == current_row[["prevId"]],]
    find_originatorId(parent_row, df)
    }
    }

    df$originatorId = apply(df, 1, find_originatorId, df)





    share|improve this answer

























      up vote
      0
      down vote













      I managed to solve this by defining a recursive function and applying it to all rows.



      find_originatorId = function(current_row, df){
      if(current_row[["prevId"]] == ""){
      current_row[["id"]]
      } else {
      parent_row = df[df$id == current_row[["prevId"]],]
      find_originatorId(parent_row, df)
      }
      }

      df$originatorId = apply(df, 1, find_originatorId, df)





      share|improve this answer























        up vote
        0
        down vote










        up vote
        0
        down vote









        I managed to solve this by defining a recursive function and applying it to all rows.



        find_originatorId = function(current_row, df){
        if(current_row[["prevId"]] == ""){
        current_row[["id"]]
        } else {
        parent_row = df[df$id == current_row[["prevId"]],]
        find_originatorId(parent_row, df)
        }
        }

        df$originatorId = apply(df, 1, find_originatorId, df)





        share|improve this answer












        I managed to solve this by defining a recursive function and applying it to all rows.



        find_originatorId = function(current_row, df){
        if(current_row[["prevId"]] == ""){
        current_row[["id"]]
        } else {
        parent_row = df[df$id == current_row[["prevId"]],]
        find_originatorId(parent_row, df)
        }
        }

        df$originatorId = apply(df, 1, find_originatorId, df)






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 19 at 20:04









        vectorson

        193




        193






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380257%2fdata-tables-creating-new-column-by-examining-multiple-columns-on-multiple-rows%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            TypeError: fit_transform() missing 1 required positional argument: 'X'