Data Tables: Creating New Column By Examining Multiple Columns On Multiple Rows

up vote
1
down vote

favorite

I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.

I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.

Below is an example of my desired output:

   id                  prevId originatorId createdAt

1: 11a                        11a          2018/1/12 

2: 11b                 11a    11a          2018/1/13

3: 11c                 11b    11a          2018/1/14

4: 12a                        12a          2018/1/12        

5: 12b                 12a    12a          2018/1/13

Any guidance would be appreciated, thanks!

Edit:

I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:

DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]

It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:

   id                  prevId originatorId createdAt

1: 10a                        10a          2018/1/12 

2: 11b                 11a    10a          2018/1/13

3: 11c                 11b    10a          2018/1/14

In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

2

Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20

This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19

Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20

add a comment |

up vote
1
down vote

favorite

I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.

Below is an example of my desired output:

   id                  prevId originatorId createdAt

1: 11a                        11a          2018/1/12 

2: 11b                 11a    11a          2018/1/13

3: 11c                 11b    11a          2018/1/14

4: 12a                        12a          2018/1/12        

5: 12b                 12a    12a          2018/1/13

Any guidance would be appreciated, thanks!

Edit:

DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]

It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:

   id                  prevId originatorId createdAt

1: 10a                        10a          2018/1/12 

2: 11b                 11a    10a          2018/1/13

3: 11c                 11b    10a          2018/1/14

In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

2

Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20

This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19

Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20

add a comment |

up vote
1
down vote

favorite

I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.

Below is an example of my desired output:

   id                  prevId originatorId createdAt

1: 11a                        11a          2018/1/12 

2: 11b                 11a    11a          2018/1/13

3: 11c                 11b    11a          2018/1/14

4: 12a                        12a          2018/1/12        

5: 12b                 12a    12a          2018/1/13

Any guidance would be appreciated, thanks!

Edit:

DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]

It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:

   id                  prevId originatorId createdAt

1: 10a                        10a          2018/1/12 

2: 11b                 11a    10a          2018/1/13

3: 11c                 11b    10a          2018/1/14

In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.

Below is an example of my desired output:

   id                  prevId originatorId createdAt

1: 11a                        11a          2018/1/12 

2: 11b                 11a    11a          2018/1/13

3: 11c                 11b    11a          2018/1/14

4: 12a                        12a          2018/1/12        

5: 12b                 12a    12a          2018/1/13

Any guidance would be appreciated, thanks!

Edit:

DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]

It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:

   id                  prevId originatorId createdAt

1: 10a                        10a          2018/1/12 

2: 11b                 11a    10a          2018/1/13

3: 11c                 11b    10a          2018/1/14

In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.

r data.table

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

edited Nov 26 at 12:19

asked Nov 19 at 17:59

Michael

3017

asked Nov 19 at 17:59

Michael

3017

asked Nov 19 at 17:59

Michael

3017

2

Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20

This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19

Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20

add a comment |

2

Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20

This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19

Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20

Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20

This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19

Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20

add a comment |

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

This is now recursive with two recursive steps, using dplyr and data.table functionality.

dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))



data.table(left_join(x = dt

       , y = dt[,.(prevId)]

       , by = c("id" = "prevId")) %>% left_join(

                                            y = dt[,.(id,prevId)]

                                            , by = c("prevId" = "id")

       ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]



>  id   prevId  originatorId

1: 11a   <NA>          11a

2: 11b    11a          11a

3: 11c    11b          11a

4: 12a   <NA>          12a

5: 12b    12a          12a

Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.

library(dplyr)

left_join(x = dt

          , y = dt[,.(prevId)]

          , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1}   %>% left_join(x = .

                                                                                 , y = dt[,.(Second.id = id, Second.prevId = prevId)]

                                                                                , by = c("prevId" = "Second.id")) %>%  data.table(.) %>% { . ->> dt.join.2}





dt.join.final.data <- rbindlist(list(  dt.join.1

                                       , dt.join.2)

                                , fill = TRUE

                                , idcol = "id"

                                , use.names = TRUE)

The resulting data.table looks then like this:

> dt.join.final.data

    id  id prevId Second.prevId

 1:  1 11a   <NA>          <NA>

 2:  1 11b    11a          <NA>

 3:  1 11c    11b          <NA>

 4:  1 12a   <NA>          <NA>

 5:  1 12b    12a          <NA>

 6:  2 11a   <NA>          <NA>

 7:  2 11b    11a          <NA>

 8:  2 11c    11b           11a

 9:  2 12a   <NA>          <NA>

10:  2 12b    12a          <NA>

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

add a comment |

up vote
0
down vote

I managed to solve this by defining a recursive function and applying it to all rows.

find_originatorId = function(current_row, df){

    if(current_row[["prevId"]] == ""){

        current_row[["id"]]

    } else {

        parent_row = df[df$id == current_row[["prevId"]],]

        find_originatorId(parent_row, df)

    }

}



df$originatorId = apply(df, 1, find_originatorId, df)

answered Nov 19 at 20:04

vectorson

193

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380257%2fdata-tables-creating-new-column-by-examining-multiple-columns-on-multiple-rows%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
0
down vote

accepted

This is now recursive with two recursive steps, using dplyr and data.table functionality.

dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))



data.table(left_join(x = dt

       , y = dt[,.(prevId)]

       , by = c("id" = "prevId")) %>% left_join(

                                            y = dt[,.(id,prevId)]

                                            , by = c("prevId" = "id")

       ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]



>  id   prevId  originatorId

1: 11a   <NA>          11a

2: 11b    11a          11a

3: 11c    11b          11a

4: 12a   <NA>          12a

5: 12b    12a          12a

library(dplyr)

left_join(x = dt

          , y = dt[,.(prevId)]

          , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1}   %>% left_join(x = .

                                                                                 , y = dt[,.(Second.id = id, Second.prevId = prevId)]

                                                                                , by = c("prevId" = "Second.id")) %>%  data.table(.) %>% { . ->> dt.join.2}





dt.join.final.data <- rbindlist(list(  dt.join.1

                                       , dt.join.2)

                                , fill = TRUE

                                , idcol = "id"

                                , use.names = TRUE)

The resulting data.table looks then like this:

> dt.join.final.data

    id  id prevId Second.prevId

 1:  1 11a   <NA>          <NA>

 2:  1 11b    11a          <NA>

 3:  1 11c    11b          <NA>

 4:  1 12a   <NA>          <NA>

 5:  1 12b    12a          <NA>

 6:  2 11a   <NA>          <NA>

 7:  2 11b    11a          <NA>

 8:  2 11c    11b           11a

 9:  2 12a   <NA>          <NA>

10:  2 12b    12a          <NA>

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

add a comment |

up vote
0
down vote

accepted

This is now recursive with two recursive steps, using dplyr and data.table functionality.

dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))



data.table(left_join(x = dt

       , y = dt[,.(prevId)]

       , by = c("id" = "prevId")) %>% left_join(

                                            y = dt[,.(id,prevId)]

                                            , by = c("prevId" = "id")

       ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]



>  id   prevId  originatorId

1: 11a   <NA>          11a

2: 11b    11a          11a

3: 11c    11b          11a

4: 12a   <NA>          12a

5: 12b    12a          12a

library(dplyr)

left_join(x = dt

          , y = dt[,.(prevId)]

          , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1}   %>% left_join(x = .

                                                                                 , y = dt[,.(Second.id = id, Second.prevId = prevId)]

                                                                                , by = c("prevId" = "Second.id")) %>%  data.table(.) %>% { . ->> dt.join.2}





dt.join.final.data <- rbindlist(list(  dt.join.1

                                       , dt.join.2)

                                , fill = TRUE

                                , idcol = "id"

                                , use.names = TRUE)

The resulting data.table looks then like this:

> dt.join.final.data

    id  id prevId Second.prevId

 1:  1 11a   <NA>          <NA>

 2:  1 11b    11a          <NA>

 3:  1 11c    11b          <NA>

 4:  1 12a   <NA>          <NA>

 5:  1 12b    12a          <NA>

 6:  2 11a   <NA>          <NA>

 7:  2 11b    11a          <NA>

 8:  2 11c    11b           11a

 9:  2 12a   <NA>          <NA>

10:  2 12b    12a          <NA>

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

add a comment |

up vote
0
down vote

accepted

This is now recursive with two recursive steps, using dplyr and data.table functionality.

dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))



data.table(left_join(x = dt

       , y = dt[,.(prevId)]

       , by = c("id" = "prevId")) %>% left_join(

                                            y = dt[,.(id,prevId)]

                                            , by = c("prevId" = "id")

       ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]



>  id   prevId  originatorId

1: 11a   <NA>          11a

2: 11b    11a          11a

3: 11c    11b          11a

4: 12a   <NA>          12a

5: 12b    12a          12a

library(dplyr)

left_join(x = dt

          , y = dt[,.(prevId)]

          , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1}   %>% left_join(x = .

                                                                                 , y = dt[,.(Second.id = id, Second.prevId = prevId)]

                                                                                , by = c("prevId" = "Second.id")) %>%  data.table(.) %>% { . ->> dt.join.2}





dt.join.final.data <- rbindlist(list(  dt.join.1

                                       , dt.join.2)

                                , fill = TRUE

                                , idcol = "id"

                                , use.names = TRUE)

The resulting data.table looks then like this:

> dt.join.final.data

    id  id prevId Second.prevId

 1:  1 11a   <NA>          <NA>

 2:  1 11b    11a          <NA>

 3:  1 11c    11b          <NA>

 4:  1 12a   <NA>          <NA>

 5:  1 12b    12a          <NA>

 6:  2 11a   <NA>          <NA>

 7:  2 11b    11a          <NA>

 8:  2 11c    11b           11a

 9:  2 12a   <NA>          <NA>

10:  2 12b    12a          <NA>

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

This is now recursive with two recursive steps, using dplyr and data.table functionality.

dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA,  "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))



data.table(left_join(x = dt

       , y = dt[,.(prevId)]

       , by = c("id" = "prevId")) %>% left_join(

                                            y = dt[,.(id,prevId)]

                                            , by = c("prevId" = "id")

       ))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]



>  id   prevId  originatorId

1: 11a   <NA>          11a

2: 11b    11a          11a

3: 11c    11b          11a

4: 12a   <NA>          12a

5: 12b    12a          12a

library(dplyr)

left_join(x = dt

          , y = dt[,.(prevId)]

          , by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1}   %>% left_join(x = .

                                                                                 , y = dt[,.(Second.id = id, Second.prevId = prevId)]

                                                                                , by = c("prevId" = "Second.id")) %>%  data.table(.) %>% { . ->> dt.join.2}





dt.join.final.data <- rbindlist(list(  dt.join.1

                                       , dt.join.2)

                                , fill = TRUE

                                , idcol = "id"

                                , use.names = TRUE)

The resulting data.table looks then like this:

> dt.join.final.data

    id  id prevId Second.prevId

 1:  1 11a   <NA>          <NA>

 2:  1 11b    11a          <NA>

 3:  1 11c    11b          <NA>

 4:  1 12a   <NA>          <NA>

 5:  1 12b    12a          <NA>

 6:  2 11a   <NA>          <NA>

 7:  2 11b    11a          <NA>

 8:  2 11c    11b           11a

 9:  2 12a   <NA>          <NA>

10:  2 12b    12a          <NA>

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

edited Nov 26 at 12:46

answered Nov 20 at 14:31

hannes101

626515

answered Nov 20 at 14:31

hannes101

626515

answered Nov 20 at 14:31

hannes101

626515

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

add a comment |

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58

Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20

Much appreciated hannes!
– Michael
Nov 26 at 12:32

Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57

Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday

add a comment |

up vote
0
down vote

I managed to solve this by defining a recursive function and applying it to all rows.

find_originatorId = function(current_row, df){

    if(current_row[["prevId"]] == ""){

        current_row[["id"]]

    } else {

        parent_row = df[df$id == current_row[["prevId"]],]

        find_originatorId(parent_row, df)

    }

}



df$originatorId = apply(df, 1, find_originatorId, df)

answered Nov 19 at 20:04

vectorson

193

add a comment |

up vote
0
down vote

I managed to solve this by defining a recursive function and applying it to all rows.

find_originatorId = function(current_row, df){

    if(current_row[["prevId"]] == ""){

        current_row[["id"]]

    } else {

        parent_row = df[df$id == current_row[["prevId"]],]

        find_originatorId(parent_row, df)

    }

}



df$originatorId = apply(df, 1, find_originatorId, df)

answered Nov 19 at 20:04

vectorson

193

add a comment |

up vote
0
down vote

I managed to solve this by defining a recursive function and applying it to all rows.

find_originatorId = function(current_row, df){

    if(current_row[["prevId"]] == ""){

        current_row[["id"]]

    } else {

        parent_row = df[df$id == current_row[["prevId"]],]

        find_originatorId(parent_row, df)

    }

}



df$originatorId = apply(df, 1, find_originatorId, df)

answered Nov 19 at 20:04

vectorson

193

I managed to solve this by defining a recursive function and applying it to all rows.

find_originatorId = function(current_row, df){

    if(current_row[["prevId"]] == ""){

        current_row[["id"]]

    } else {

        parent_row = df[df$id == current_row[["prevId"]],]

        find_originatorId(parent_row, df)

    }

}



df$originatorId = apply(df, 1, find_originatorId, df)

answered Nov 19 at 20:04

vectorson

193

answered Nov 19 at 20:04

vectorson

193

answered Nov 19 at 20:04

vectorson

193

answered Nov 19 at 20:04

vectorson

193

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

lVLsuPzfGBoqLwD,cnCYQmTjo

搜尋此網誌

Tukukkk