Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the...
This question builds on a question I asked earlier: Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation.
The data I am working with looks like this:
df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5), Time = rep(c(1:20,1:20), each = 5), Replicate = c(1:5))
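Note that data.frame() recycles the shorter columns here: Effect (100 values) and Replicate (5 values) are repeated to match the 200 values of Time. A quick base-R check, assuming only the definition above, confirms the dimensions the answer below relies on (200 rows, 50 of them with Time > 15). It also shows that Time is exactly 20 × Effect in this toy data, which explains the near-zero errors of the linear fit further down.

```r
# Toy data exactly as defined in the question
df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20, 1:20), each = 5),
                 Replicate = c(1:5))

# data.frame() recycles Effect (length 100) and Replicate (length 5)
# to the length of Time (200)
dim(df)                             # 200 3
sum(df$Time > 15)                   # 50 rows fall in the hold-out range

# Time is exactly 20 * Effect, so a linear fit is numerically perfect
all.equal(df$Time, 20 * df$Effect)  # TRUE
```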
Essentially, I would like to create custom partitions like those generated by caret::groupKFold, but restricted to a specified range (here, Time > 15 days). Each fold should withhold one point from that range as the test set and use all other data for training, repeating until every point in the range has been used as the test set once. @missuse wrote some code toward this end in the linked question that gets close to the desired output.
I would show the desired output, but honestly the output of caret::groupKFold confuses me, so I hope the description above suffices. Happy to clarify if needed!
r cross-validation r-caret data-partitioning
asked Nov 20 at 20:25 by André.B (edited Dec 17 at 21:54)

You can proceed as in the linked answer, but instead of splitting by time, split by a dummy variable that is an integer sequence, 1:n(). If you still have problems, I can post an answer with code.
– missuse
Nov 22 at 7:09

I am not exactly sure how to implement this, and I think I may have been a little misleading in how I represented the data... I have just updated the question with a more representative dataset. Sorry for any trouble this may have caused, and thank you again for the help!
– André.B
Nov 30 at 3:00
1 Answer
Here is one way to create the desired partitions using the tidyverse:
library(tidyverse)
df %>%
  mutate(id = row_number()) %>% #create a column called id which will hold the row numbers
  filter(Time > 15) %>% #subset data frame according to your description 
  split(.$id)  %>% #split the data frame into lists by id (row number)
  map(~ .x %>% select(id) %>% #clean up so it works with indexOut argument in trainControl
        unlist %>%
        unname) -> folds_cv
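For comparison, a base-R sketch that should produce the same named list of single held-out row indices without the tidyverse. It assumes only the df from the question; test_rows is an illustrative name, not something from the original answer.

```r
# Toy data from the question
df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20, 1:20), each = 5),
                 Replicate = c(1:5))

# Row positions that fall inside the hold-out range (Time > 15)
test_rows <- which(df$Time > 15)

# One single-row test set per list element, named by row number,
# mirroring the folds_cv list built by the tidyverse pipeline
folds_cv <- setNames(as.list(test_rows), test_rows)

length(folds_cv)  # 50
```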
EDIT: The indexOut argument does not seem to behave as expected, but the index argument does. So after creating folds_cv, take the complement of each fold with setdiff to get the training indices:
folds_cv <- lapply(folds_cv, function(x) setdiff(1:nrow(df), x))
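A quick self-contained check (it re-creates df and folds_cv in base R rather than reusing the tidyverse objects) that each inverted fold has the shape trainControl's index argument expects: a vector of training row positions, here 199 rows that never include the row held out in that fold.

```r
# Toy data from the question
df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20, 1:20), each = 5),
                 Replicate = c(1:5))

test_rows <- which(df$Time > 15)

# Test indices first, then inverted into training indices as in the answer
folds_cv <- setNames(as.list(test_rows), test_rows)
folds_cv <- lapply(folds_cv, function(x) setdiff(1:nrow(df), x))

# Each training set should contain nrow(df) - 1 = 199 rows ...
all(lengths(folds_cv) == nrow(df) - 1)
# ... and never the row held out in that fold
all(mapply(function(train, i) !(i %in% train), folds_cv, test_rows))
```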
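Then pass the training indices to trainControl through index and fit the model (train and trainControl come from caret):
library(caret)
test_control <- trainControl(index = folds_cv,
                             savePredictions = "final")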
quad.lm2 <- train(Time ~ Effect,
                  data = df,
                  method = "lm",
                  trControl = test_control)
This fits the model, with a warning:
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
> quad.lm2
Linear Regression 
200 samples
  1 predictor
No pre-processing
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 199, 199, 199, 199, 199, 199, ... 
Resampling results:
  RMSE          Rsquared  MAE         
  3.552714e-16  NaN       3.552714e-16
Tuning parameter 'intercept' was held constant at a value of TRUE
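So each resample trained on 199 rows and predicted the single held-out row, repeating this for each of the 50 rows we wanted to hold out one at a time. (The "Bootstrapped" label in the printout only reflects trainControl's default method = "boot"; the resamples actually used are the 50 custom training sets passed via index.) This can be verified in: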
quad.lm2$pred
I am not sure why Rsquared is missing; I will dig a bit deeper.
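With only one row per resample, the squared correlation for each resample is undefined (NA), which is presumably why the averaged Rsquared prints as NaN. One option is to pool the 50 held-out predictions and compute the metrics once over all of them. This is a sketch assuming quad.lm2$pred holds the pred and obs columns caret saves with savePredictions = "final"; pooled_metrics is a hypothetical helper, not a caret function, and the toy values only illustrate the call.

```r
# Hypothetical helper: RMSE, R-squared (squared Pearson correlation between
# observed and predicted) and MAE computed over pooled held-out predictions
pooled_metrics <- function(pred_df) {
  err <- pred_df$pred - pred_df$obs
  c(RMSE = sqrt(mean(err^2)),
    Rsquared = cor(pred_df$obs, pred_df$pred)^2,
    MAE = mean(abs(err)))
}

# With caret this would be: pooled_metrics(quad.lm2$pred)
# Stand-alone illustration with made-up predictions:
toy <- data.frame(obs = c(16, 17, 18, 19, 20),
                  pred = c(16.2, 16.9, 18.1, 18.8, 20.1))
pooled_metrics(toy)
```

caret's postResample(pred = quad.lm2$pred$pred, obs = quad.lm2$pred$obs) should give equivalent pooled numbers. Pooling also changes what RMSE means here: with one row per resample, each per-resample RMSE is just the absolute error, so caret's averaged RMSE equals the MAE (as in the printout above), whereas the pooled RMSE is the usual leave-one-out RMSE.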
answered Nov 30 at 8:11 by missuse (edited Dec 18 at 7:35)

Hey @missuse, I have just gotten around to running this again, and it looks like there is a slight issue with the code: the above produces a list of single integers to be used as test sets rather than training sets. Is there a way to invert it? I think trainControl needs the training sets specified rather than the test sets. Sorry for the trouble, and thanks again for the help!
– André.B
Dec 17 at 21:59

You can specify the test indices in trainControl using the indexOut argument; all other rows will be used for training. As noted in my answer: "#clean up so it works with indexOut argument in trainControl"
– missuse
Dec 17 at 22:02

I tried that as suggested, but I get an error with the test data when I run test_control <- trainControl(indexOut = folds_cv, method = "cv") and then quad.lm2 <- train(Time ~ Effect, data = df, method = "lm", trControl = test_control). Any idea what I am doing wrong, @missuse?
– André.B
Dec 17 at 23:42

You are correct. It does not appear to work, although I am sure I have used it successfully in a previous caret version. I have edited the answer with a working example. There is still a minor problem with R2, which I will try to get to.
– missuse
Dec 18 at 7:36

I suspect it might be returning NaN for R^2 because one can't tell how well one variable is correlated with another from a single point. What do you think?
– André.B
Dec 18 at 22:39