Subset a dataframe in two dataframes by values in two columns of another dataframe [duplicate]

This question already has an answer here:

Filter by ranges supplied by two vectors, without a join operation

2 answers

Efficient way to filter one data frame by ranges in another

3 answers

Subset by multiple ranges [duplicate]

3 answers

I have two dataframes.
df1 looks like (or the column of df1 i am interested in):

position

2

6

12

18

25

31

and df2 looks like:

start   end

2       17

24      29

I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:

position

2

6

12

25

Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.

df4 would look like:

position

18

31

I can do this the perl way using a for loop (coming from perl and currently learning R) but I am pretty sure I can somehow use filter or some other dplyr or base R combination of functions to achieve this.

Any help would be appreciated!

EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

marked as duplicate by Henrik r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 '18 at 18:05

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

you mean 18, 31 for df4

– Andre Elrico
Nov 23 '18 at 12:56

Yes, corrected this!

– RacktheMan
Nov 23 '18 at 13:36

add a comment |

This question already has an answer here:

Filter by ranges supplied by two vectors, without a join operation

2 answers

Efficient way to filter one data frame by ranges in another

3 answers

Subset by multiple ranges [duplicate]

3 answers

I have two dataframes.
df1 looks like (or the column of df1 i am interested in):

position

2

6

12

18

25

31

and df2 looks like:

start   end

2       17

24      29

I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:

position

2

6

12

25

Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.

df4 would look like:

position

18

31

Any help would be appreciated!

EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

marked as duplicate by Henrik r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 '18 at 18:05

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

you mean 18, 31 for df4

– Andre Elrico
Nov 23 '18 at 12:56

Yes, corrected this!

– RacktheMan
Nov 23 '18 at 13:36

add a comment |

This question already has an answer here:

Filter by ranges supplied by two vectors, without a join operation

2 answers

Efficient way to filter one data frame by ranges in another

3 answers

Subset by multiple ranges [duplicate]

3 answers

I have two dataframes.
df1 looks like (or the column of df1 i am interested in):

position

2

6

12

18

25

31

and df2 looks like:

start   end

2       17

24      29

I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:

position

2

6

12

25

Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.

df4 would look like:

position

18

31

Any help would be appreciated!

EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

This question already has an answer here:

Filter by ranges supplied by two vectors, without a join operation

2 answers

Efficient way to filter one data frame by ranges in another

3 answers

Subset by multiple ranges [duplicate]

3 answers

I have two dataframes.
df1 looks like (or the column of df1 i am interested in):

position

2

6

12

18

25

31

and df2 looks like:

start   end

2       17

24      29

I want to keep the positions in df1 that only fall between (<= or >=) the start and end coordinates of df2, so that df1 looks like this after filtering:

position

2

6

12

25

Then I want to keep the filtered out "leftover" values of df1 in another dataframe, let's call it df4.

df4 would look like:

position

18

31

Any help would be appreciated!

EDIT: Added df4 calculation as my question was marked as duplicate and this is sth not found in the other similar threads. This is something I am interested in doing to make my code faster!

This question already has an answer here:

Filter by ranges supplied by two vectors, without a join operation

2 answers

Efficient way to filter one data frame by ranges in another

3 answers

Subset by multiple ranges [duplicate]

3 answers

r dplyr

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

edited Nov 23 '18 at 13:36

asked Nov 22 '18 at 17:20

RacktheMan

asked Nov 22 '18 at 17:20

RacktheMan

asked Nov 22 '18 at 17:20

RacktheMan

marked as duplicate by Henrik r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 '18 at 18:05

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

marked as duplicate by Henrik r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function() {
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function() {
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function() {
$hover.showInfoMessage('', {
messageElement: $msg.clone().show(),
transient: false,
position: { my: 'bottom left', at: 'top center', offsetTop: -7 },
dismissable: false,
relativeToBody: true
});
},
function() {
StackExchange.helpers.removeMessages();
}
);
});
});
Nov 22 '18 at 18:05

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

you mean 18, 31 for df4

– Andre Elrico
Nov 23 '18 at 12:56

Yes, corrected this!

– RacktheMan
Nov 23 '18 at 13:36

add a comment |

you mean 18, 31 for df4

– Andre Elrico
Nov 23 '18 at 12:56

Yes, corrected this!

– RacktheMan
Nov 23 '18 at 13:36

you mean 18, 31 for df4

– Andre Elrico
Nov 23 '18 at 12:56

Yes, corrected this!

– RacktheMan
Nov 23 '18 at 13:36

add a comment |

5 Answers
5

active

oldest

votes

We can full_join these two data frames and then filter for the rows within the start and end column. The Flag column in the example is just for the join. Finally, we can use distinct to remove duplicated rows.

library(dplyr)



df3 <- df1 %>%

  mutate(Flag = 1) %>%

  full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%

  filter(position >= start, position <= end) %>%

  distinct(position)

df3

#   position

# 1        3

# 2        6

# 3       12

# 4       25

DATA

df1 <- read.table(text = "position

3

                  6

                  12

                  18

                  25

                  31", header = TRUE)



df2 <- read.table(text = "start   end

2       17

24      29",

                  header = TRUE)

answered Nov 22 '18 at 17:31

www

26.2k112240

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

add a comment |

Single line, simple base solution:

df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]

The apply simply generates a vector of all the cases that fall between starts and ends.

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

add a comment |

Here is a base R option

do.call(rbind, Map(function(i, j) 

  df1[df1$position > i & df1$position < j, , drop = FALSE], 

      df2$start, df2$end))

#    position

#1        3

#2        6

#3       12

#5       25

Or using fuzzy_join

library(fuzzyjoin)

library(dplyr)

fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'), 

        match_fun = list(`>`, `<`)) %>%

    select(position)

#  position

#1        3

#2        6

#3       12

#4       25

Or use a non-equi join from data.table

setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]

#   position

#1:        3

#2:        6

#3:       12

#4:       25

data

df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 

 -6L), class = "data.frame")



df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)), 

 class = "data.frame", row.names = c(NA, -2L))

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

add a comment |

base R solution (no packages needed)

keepRows<-

sapply(df1$position, function(x_o) {

    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))

})



df1[keepRows,, drop = FALSE]

Result:

#  position

#1        3

#2        6

#3       12

#5       25

Please note:

This is basically a double loop, I don't know how else to solve this with base::.

Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.

To get the "leftover" use negation:

df1[!keepRows,, drop = FALSE]

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

add a comment |

Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):

subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))

You should probably run some benchmarks on the proposed approaches before making a decision.

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

add a comment |

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

library(dplyr)



df3 <- df1 %>%

  mutate(Flag = 1) %>%

  full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%

  filter(position >= start, position <= end) %>%

  distinct(position)

df3

#   position

# 1        3

# 2        6

# 3       12

# 4       25

DATA

df1 <- read.table(text = "position

3

                  6

                  12

                  18

                  25

                  31", header = TRUE)



df2 <- read.table(text = "start   end

2       17

24      29",

                  header = TRUE)

answered Nov 22 '18 at 17:31

www

26.2k112240

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

add a comment |

library(dplyr)



df3 <- df1 %>%

  mutate(Flag = 1) %>%

  full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%

  filter(position >= start, position <= end) %>%

  distinct(position)

df3

#   position

# 1        3

# 2        6

# 3       12

# 4       25

DATA

df1 <- read.table(text = "position

3

                  6

                  12

                  18

                  25

                  31", header = TRUE)



df2 <- read.table(text = "start   end

2       17

24      29",

                  header = TRUE)

answered Nov 22 '18 at 17:31

www

26.2k112240

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

add a comment |

library(dplyr)



df3 <- df1 %>%

  mutate(Flag = 1) %>%

  full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%

  filter(position >= start, position <= end) %>%

  distinct(position)

df3

#   position

# 1        3

# 2        6

# 3       12

# 4       25

DATA

df1 <- read.table(text = "position

3

                  6

                  12

                  18

                  25

                  31", header = TRUE)



df2 <- read.table(text = "start   end

2       17

24      29",

                  header = TRUE)

answered Nov 22 '18 at 17:31

www

26.2k112240

library(dplyr)



df3 <- df1 %>%

  mutate(Flag = 1) %>%

  full_join(df2 %>% mutate(Flag = 1), by = "Flag") %>%

  filter(position >= start, position <= end) %>%

  distinct(position)

df3

#   position

# 1        3

# 2        6

# 3       12

# 4       25

DATA

df1 <- read.table(text = "position

3

                  6

                  12

                  18

                  25

                  31", header = TRUE)



df2 <- read.table(text = "start   end

2       17

24      29",

                  header = TRUE)

answered Nov 22 '18 at 17:31

www

26.2k112240

answered Nov 22 '18 at 17:31

www

26.2k112240

answered Nov 22 '18 at 17:31

www

26.2k112240

answered Nov 22 '18 at 17:31

www

26.2k112240

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

add a comment |

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

Hello www, this an easy for me to understand dplyr solution! I am more familiar with dplyr than base r, so thanx for the great response! Also, this answer provides back a dataframe as I asked for! Is there a way to keep what is filtered out in a df4 without having to repeat this code? (See question EDIT)

– RacktheMan
Nov 23 '18 at 11:18

I found that out I think. I could just use:df4<-anti_join(df1, df3, by="position")!

– RacktheMan
Nov 23 '18 at 13:04

add a comment |

Single line, simple base solution:

df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]

The apply simply generates a vector of all the cases that fall between starts and ends.

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

add a comment |

Single line, simple base solution:

df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]

The apply simply generates a vector of all the cases that fall between starts and ends.

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

add a comment |

Single line, simple base solution:

df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]

The apply simply generates a vector of all the cases that fall between starts and ends.

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

Single line, simple base solution:

df1[df1$position %in% unlist(apply(df2,1,function(x) x["start"]:x["end"])),]

The apply simply generates a vector of all the cases that fall between starts and ends.

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

edited Nov 22 '18 at 17:51

answered Nov 22 '18 at 17:45

iod

3,7392722

answered Nov 22 '18 at 17:45

iod

3,7392722

answered Nov 22 '18 at 17:45

iod

3,7392722

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

add a comment |

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

After a quick check, this seems to be the fastest solution.

– lith
Nov 26 '18 at 8:47

add a comment |

Here is a base R option

do.call(rbind, Map(function(i, j) 

  df1[df1$position > i & df1$position < j, , drop = FALSE], 

      df2$start, df2$end))

#    position

#1        3

#2        6

#3       12

#5       25

Or using fuzzy_join

library(fuzzyjoin)

library(dplyr)

fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'), 

        match_fun = list(`>`, `<`)) %>%

    select(position)

#  position

#1        3

#2        6

#3       12

#4       25

Or use a non-equi join from data.table

setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]

#   position

#1:        3

#2:        6

#3:       12

#4:       25

data

df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 

 -6L), class = "data.frame")



df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)), 

 class = "data.frame", row.names = c(NA, -2L))

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

add a comment |

Here is a base R option

do.call(rbind, Map(function(i, j) 

  df1[df1$position > i & df1$position < j, , drop = FALSE], 

      df2$start, df2$end))

#    position

#1        3

#2        6

#3       12

#5       25

Or using fuzzy_join

library(fuzzyjoin)

library(dplyr)

fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'), 

        match_fun = list(`>`, `<`)) %>%

    select(position)

#  position

#1        3

#2        6

#3       12

#4       25

Or use a non-equi join from data.table

setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]

#   position

#1:        3

#2:        6

#3:       12

#4:       25

data

df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 

 -6L), class = "data.frame")



df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)), 

 class = "data.frame", row.names = c(NA, -2L))

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

add a comment |

Here is a base R option

do.call(rbind, Map(function(i, j) 

  df1[df1$position > i & df1$position < j, , drop = FALSE], 

      df2$start, df2$end))

#    position

#1        3

#2        6

#3       12

#5       25

Or using fuzzy_join

library(fuzzyjoin)

library(dplyr)

fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'), 

        match_fun = list(`>`, `<`)) %>%

    select(position)

#  position

#1        3

#2        6

#3       12

#4       25

Or use a non-equi join from data.table

setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]

#   position

#1:        3

#2:        6

#3:       12

#4:       25

data

df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 

 -6L), class = "data.frame")



df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)), 

 class = "data.frame", row.names = c(NA, -2L))

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

Here is a base R option

do.call(rbind, Map(function(i, j) 

  df1[df1$position > i & df1$position < j, , drop = FALSE], 

      df2$start, df2$end))

#    position

#1        3

#2        6

#3       12

#5       25

Or using fuzzy_join

library(fuzzyjoin)

library(dplyr)

fuzzy_inner_join(df1, df2, by = c('position' = 'start', 'position' = 'end'), 

        match_fun = list(`>`, `<`)) %>%

    select(position)

#  position

#1        3

#2        6

#3       12

#4       25

Or use a non-equi join from data.table

setDT(df2)[df1, on = .(start < position, end > position), .(position), nomatch = 0]

#   position

#1:        3

#2:        6

#3:       12

#4:       25

data

df1 <- structure(list(position = c(3L, 6L, 12L, 18L, 25L, 31L)), row.names = c(NA, 

 -6L), class = "data.frame")



df2 <- structure(list(start = c(2L, 24L), end = c(17L, 29L)), 

 class = "data.frame", row.names = c(NA, -2L))

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

edited Nov 22 '18 at 18:03

answered Nov 22 '18 at 17:32

akrun

403k13194266

answered Nov 22 '18 at 17:32

akrun

403k13194266

answered Nov 22 '18 at 17:32

akrun

403k13194266

add a comment |

base R solution (no packages needed)

keepRows<-

sapply(df1$position, function(x_o) {

    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))

})



df1[keepRows,, drop = FALSE]

Result:

#  position

#1        3

#2        6

#3       12

#5       25

Please note:

This is basically a double loop, I don't know how else to solve this with base::.

Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.

To get the "leftover" use negation:

df1[!keepRows,, drop = FALSE]

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

add a comment |

base R solution (no packages needed)

keepRows<-

sapply(df1$position, function(x_o) {

    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))

})



df1[keepRows,, drop = FALSE]

Result:

#  position

#1        3

#2        6

#3       12

#5       25

Please note:

This is basically a double loop, I don't know how else to solve this with base::.

Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.

To get the "leftover" use negation:

df1[!keepRows,, drop = FALSE]

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

add a comment |

base R solution (no packages needed)

keepRows<-

sapply(df1$position, function(x_o) {

    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))

})



df1[keepRows,, drop = FALSE]

Result:

#  position

#1        3

#2        6

#3       12

#5       25

Please note:

This is basically a double loop, I don't know how else to solve this with base::.

Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.

To get the "leftover" use negation:

df1[!keepRows,, drop = FALSE]

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

base R solution (no packages needed)

keepRows<-

sapply(df1$position, function(x_o) {

    any(apply(df2, 1, function(x) {x_o => x[1] & x_o <= x[2]}))

})



df1[keepRows,, drop = FALSE]

Result:

#  position

#1        3

#2        6

#3       12

#5       25

Please note:

This is basically a double loop, I don't know how else to solve this with base::.

Should the border values be included? You are very vague with "between". Currently, I don't include them. You can change that using <=, >=.

To get the "leftover" use negation:

df1[!keepRows,, drop = FALSE]

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

edited Nov 23 '18 at 12:55

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

answered Nov 22 '18 at 17:31

Andre Elrico

5,68911028

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

add a comment |

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

Made my question more clear upon your suggestion.

– RacktheMan
Nov 23 '18 at 11:19

add a comment |

Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):

subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))

You should probably run some benchmarks on the proposed approaches before making a decision.

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

add a comment |

Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):

subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))

You should probably run some benchmarks on the proposed approaches before making a decision.

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

add a comment |

Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):

subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))

You should probably run some benchmarks on the proposed approaches before making a decision.

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

Here is another take that starts with df2 (I don't say this is wiser than Andre's approach):

subset(df1, apply(apply(df2, 1, function (x) {dplyr::between(df1$position, x["start"], x["end"])}), 1, any))

You should probably run some benchmarks on the proposed approaches before making a decision.

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

edited Nov 26 '18 at 15:09

answered Nov 22 '18 at 17:51

lith

577217

answered Nov 22 '18 at 17:51

lith

577217

answered Nov 22 '18 at 17:51

lith

577217

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

add a comment |

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

@AndreElrico Something like DT sometimes has to convert the data. This doesn't make it the ideal solution for all kind of problems. It's okay if the data already is a data.table though.

– lith
Nov 26 '18 at 8:49

@AndreElrico Since I don't use data.table the function I use actually is from dplyr. Thanks for the hint though. I forgot it isn't in base.

– lith
Nov 26 '18 at 15:09

add a comment |

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk