Data Tables: Creating New Column By Examining Multiple Columns On Multiple Rows
up vote
1
down vote
favorite
I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.
I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.
Below is an example of my desired output:
id prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13
Any guidance would be appreciated, thanks!
Edit:
I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:
DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:
id prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14
In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.
r data.table
add a comment |
up vote
1
down vote
favorite
I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.
I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.
Below is an example of my desired output:
id prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13
Any guidance would be appreciated, thanks!
Edit:
I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:
DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:
id prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14
In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.
r data.table
2
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.
I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.
Below is an example of my desired output:
id prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13
Any guidance would be appreciated, thanks!
Edit:
I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:
DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:
id prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14
In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.
r data.table
I presently have a data table containing two columns - 'id' and 'prevId'. I would like to create a chain of ids by starting with a first row, finding a second row whose 'id' equals the first row's 'prevId', and repeating this process until 'prevId' is blank.
I would then like to identify the 'id' in the end of the chain without a 'prevId', and create a new column listing this originator 'id' for every row.
Below is an example of my desired output:
id prevId originatorId createdAt
1: 11a 11a 2018/1/12
2: 11b 11a 11a 2018/1/13
3: 11c 11b 11a 2018/1/14
4: 12a 12a 2018/1/12
5: 12b 12a 12a 2018/1/13
Any guidance would be appreciated, thanks!
Edit:
I have noticed a caveat while testing some of the proposed solutions. There may be instances where an element has a 'prevId' that is not seen as an 'id' in any other element. If I were to use to run chinsoon's solution:
DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
It would cause such elements to list the closest peer without a 'prevId' as its originator, even if that peer was not part of the id-prevId chain. An example:
id prevId originatorId createdAt
1: 10a 10a 2018/1/12
2: 11b 11a 10a 2018/1/13
3: 11c 11b 10a 2018/1/14
In this instance, '11a' should be the originator for the 2nd and 3rd elements, but because it isn't present, '10a' gets the spot instead. Is there a tweak that could fix this? Much appreciated.
r data.table
r data.table
edited Nov 26 at 12:19
asked Nov 19 at 17:59
Michael
3017
3017
2
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20
add a comment |
2
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20
2
2
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20
add a comment |
2 Answers
2
active
oldest
votes
up vote
0
down vote
accepted
This is now recursive with two recursive steps, using dplyr
and data.table
functionality.
dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA, "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]
> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a
Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.
library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}
dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)
The resulting data.table looks then like this:
> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
add a comment |
up vote
0
down vote
I managed to solve this by defining a recursive function and applying it to all rows.
find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}
df$originatorId = apply(df, 1, find_originatorId, df)
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
accepted
This is now recursive with two recursive steps, using dplyr
and data.table
functionality.
dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA, "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]
> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a
Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.
library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}
dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)
The resulting data.table looks then like this:
> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
add a comment |
up vote
0
down vote
accepted
This is now recursive with two recursive steps, using dplyr
and data.table
functionality.
dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA, "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]
> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a
Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.
library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}
dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)
The resulting data.table looks then like this:
> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
add a comment |
up vote
0
down vote
accepted
up vote
0
down vote
accepted
This is now recursive with two recursive steps, using dplyr
and data.table
functionality.
dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA, "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]
> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a
Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.
library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}
dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)
The resulting data.table looks then like this:
> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>
This is now recursive with two recursive steps, using dplyr
and data.table
functionality.
dt <- structure(list(id = c("11a", "11b", "11c", "12a", "12b"), prevId = c(NA, "11a", "11b", NA, "12a")), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
data.table(left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% left_join(
y = dt[,.(id,prevId)]
, by = c("prevId" = "id")
))[, .(id, prevId, originatorId = ifelse(is.na(prevId.y), ifelse(is.na(prevId), id, prevId), prevId.y ))]
> id prevId originatorId
1: 11a <NA> 11a
2: 11b 11a 11a
3: 11c 11b 11a
4: 12a <NA> 12a
5: 12b 12a 12a
Expanded the example to incorporate the comment by @Michael. It is pretty scalable and allows to adjust the number of recursive steps, by adding additional joins into the pipe. It saves the resulting joined data.table after each iteration and thus allows to follow the matching steps pretty easily. Finally, the results of each join are combined and the resulting table should offer a good overview over the chain of ids in the data.
library(dplyr)
left_join(x = dt
, y = dt[,.(prevId)]
, by = c("id" = "prevId")) %>% data.table(.) %>% { . ->> dt.join.1} %>% left_join(x = .
, y = dt[,.(Second.id = id, Second.prevId = prevId)]
, by = c("prevId" = "Second.id")) %>% data.table(.) %>% { . ->> dt.join.2}
dt.join.final.data <- rbindlist(list( dt.join.1
, dt.join.2)
, fill = TRUE
, idcol = "id"
, use.names = TRUE)
The resulting data.table looks then like this:
> dt.join.final.data
id id prevId Second.prevId
1: 1 11a <NA> <NA>
2: 1 11b 11a <NA>
3: 1 11c 11b <NA>
4: 1 12a <NA> <NA>
5: 1 12b 12a <NA>
6: 2 11a <NA> <NA>
7: 2 11b 11a <NA>
8: 2 11c 11b 11a
9: 2 12a <NA> <NA>
10: 2 12b 12a <NA>
edited Nov 26 at 12:46
answered Nov 20 at 14:31
hannes101
626515
626515
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
add a comment |
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Great logic hannes! I apologise for not specifying in the question that I was looking for a solution that could scale the number of ids in each chain. I tested your solution and it works perfectly with those chain lengths but breaks once the lengths differ. Would you happen to know a fix?
– Michael
Nov 26 at 11:58
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Since, I was also looking for the exact same problem. I created solution which is easily scalable to more than one recursive steps. I will try to adapt it to your MWE and will update my answer accordingly.
– hannes101
Nov 26 at 12:20
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Much appreciated hannes!
– Michael
Nov 26 at 12:32
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Updated the answer, please check if that's helping you.
– hannes101
Nov 26 at 12:57
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
Thanks Hannes! I made a recursive variant of your solution and am making great progress.
– Michael
yesterday
add a comment |
up vote
0
down vote
I managed to solve this by defining a recursive function and applying it to all rows.
find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}
df$originatorId = apply(df, 1, find_originatorId, df)
add a comment |
up vote
0
down vote
I managed to solve this by defining a recursive function and applying it to all rows.
find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}
df$originatorId = apply(df, 1, find_originatorId, df)
add a comment |
up vote
0
down vote
up vote
0
down vote
I managed to solve this by defining a recursive function and applying it to all rows.
find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}
df$originatorId = apply(df, 1, find_originatorId, df)
I managed to solve this by defining a recursive function and applying it to all rows.
find_originatorId = function(current_row, df){
if(current_row[["prevId"]] == ""){
current_row[["id"]]
} else {
parent_row = df[df$id == current_row[["prevId"]],]
find_originatorId(parent_row, df)
}
}
df$originatorId = apply(df, 1, find_originatorId, df)
answered Nov 19 at 20:04
vectorson
193
193
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380257%2fdata-tables-creating-new-column-by-examining-multiple-columns-on-multiple-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Are you looking for DT[, originatorId:=id[1L], by=cumsum(prevId==“”)]
– chinsoon12
Nov 19 at 22:20
This is almost perfect chinsoon. One minor problem is that if there is a row where there is no earlier row with a 'id' equal to its 'prevId', it will set the closest row without a prevId as its originator. See the example in my edit. Would you know a tweak that could fix this?
– Michael
Nov 26 at 12:19
Use prevId[2L] instead of id[1L] if you want the whole originator to be 11a. If not, can you show your desired output?
– chinsoon12
Nov 26 at 13:20