Using dplyr and pipes for logistic regression plotting
up vote
1
down vote
favorite
I have successfully created a plot of a binomial glm using example data. https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/125/2018/11/parasites.txt
The predictors of the model include 3 predictors (one categorical, 2 continuous)
The code works fine but I have been wanting to try and incorporating more dplyr functions and pipes to streamline code. Ultimately, I want to make my block of code into a function that works with any model with the same type and number of predictors for a binomial glm. Are there better ways to carry out my code with more tidyverse/dplyr code?
#import parasites file
df<-parasites
m1<-glm(data=df, infected~age+weight+sex, family = "binomial")
summary(m1)
age_grid <- round(seq(min(df$age), max(df$age), length.out = 15))
weight_grid <- round(seq(min(df$weight), max(df$weight), length.out = 15))
newdat <- expand.grid(weight =weight_grid,
age = age_grid, sex = c("female", "male"))
pred <- predict.glm(m1, newdata = newdat, type="link", se=TRUE)
ymin <- m1$family$linkinv(pred$fit - 1.96 * pred$se.fit)
ymax <- m1$family$linkinv(pred$fit + 1.96 * pred$se.fit)
fit <- m1$family$linkinv(pred$fit)
z <- matrix(fit, length(age_grid))
ci.low <- matrix(ymin, length(age_grid))
ci.up <- matrix(ymax, length(age_grid))
x<-data.frame(pred = fit,
low = ymin,
high = ymax,
newdat) %>% mutate(category=cut(age, breaks=c(0, 69, 138, 206), labels =
c("0-69", "70-139", "139-206")))
x$age<-as.factor(x$age)
library(ggplot2)
finalgraph<-ggplot(data=x)+
geom_line(aes(x = weight, y = pred, color = age))+
geom_ribbon(aes(x = weight, ymin = low, ymax = high, fill = age), alpha = 0.1) +
facet_grid(category~sex) +theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
ylab(expression(bold(y = "Infection Probability"))) + xlab(expression(bold("Weight"))) +
theme(legend.position = "right",strip.text.x = element_text(face = "bold", size=12),
strip.text.y = element_text(size=10),
axis.text.y = element_text(size=10, face = "bold"), axis.text.x = element_text(size=10),
axis.title = element_text(size=12),
legend.text=element_text(size=10), legend.title = element_text(size=12, face="bold"))+
labs(linetype="Age (months)", colour="Age (months)", fill = "Age (months)")
finalgraph
Code notes:
Essentially I made a model, created a bunch of values from my predictors (age_grid, v_grid) and made all possible combinations of these values along with the categorical variable of sex using expand.grid.
Then I just used the predict.glm function to extract predicted values based off of expand.grid object. I also extracted std. errors and calculated confidence intervals (ci.up and ci. low). Then I used some dplyr functions to create a dataframe with all this information and also made a new column called category. Category breaks down one of my variables (age) into four distinct groups based of f of breaks I decided on and labelled as decided as well. Then I plotted all of this data using ggplot2.
r statistics data-visualization
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
up vote
1
down vote
favorite
I have successfully created a plot of a binomial glm using example data. https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/125/2018/11/parasites.txt
The predictors of the model include 3 predictors (one categorical, 2 continuous)
The code works fine but I have been wanting to try and incorporating more dplyr functions and pipes to streamline code. Ultimately, I want to make my block of code into a function that works with any model with the same type and number of predictors for a binomial glm. Are there better ways to carry out my code with more tidyverse/dplyr code?
#import parasites file
df<-parasites
m1<-glm(data=df, infected~age+weight+sex, family = "binomial")
summary(m1)
age_grid <- round(seq(min(df$age), max(df$age), length.out = 15))
weight_grid <- round(seq(min(df$weight), max(df$weight), length.out = 15))
newdat <- expand.grid(weight =weight_grid,
age = age_grid, sex = c("female", "male"))
pred <- predict.glm(m1, newdata = newdat, type="link", se=TRUE)
ymin <- m1$family$linkinv(pred$fit - 1.96 * pred$se.fit)
ymax <- m1$family$linkinv(pred$fit + 1.96 * pred$se.fit)
fit <- m1$family$linkinv(pred$fit)
z <- matrix(fit, length(age_grid))
ci.low <- matrix(ymin, length(age_grid))
ci.up <- matrix(ymax, length(age_grid))
x<-data.frame(pred = fit,
low = ymin,
high = ymax,
newdat) %>% mutate(category=cut(age, breaks=c(0, 69, 138, 206), labels =
c("0-69", "70-139", "139-206")))
x$age<-as.factor(x$age)
library(ggplot2)
finalgraph<-ggplot(data=x)+
geom_line(aes(x = weight, y = pred, color = age))+
geom_ribbon(aes(x = weight, ymin = low, ymax = high, fill = age), alpha = 0.1) +
facet_grid(category~sex) +theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
ylab(expression(bold(y = "Infection Probability"))) + xlab(expression(bold("Weight"))) +
theme(legend.position = "right",strip.text.x = element_text(face = "bold", size=12),
strip.text.y = element_text(size=10),
axis.text.y = element_text(size=10, face = "bold"), axis.text.x = element_text(size=10),
axis.title = element_text(size=12),
legend.text=element_text(size=10), legend.title = element_text(size=12, face="bold"))+
labs(linetype="Age (months)", colour="Age (months)", fill = "Age (months)")
finalgraph
Code notes:
Essentially I made a model, created a bunch of values from my predictors (age_grid, v_grid) and made all possible combinations of these values along with the categorical variable of sex using expand.grid.
Then I just used the predict.glm function to extract predicted values based off of expand.grid object. I also extracted std. errors and calculated confidence intervals (ci.up and ci. low). Then I used some dplyr functions to create a dataframe with all this information and also made a new column called category. Category breaks down one of my variables (age) into four distinct groups based of f of breaks I decided on and labelled as decided as well. Then I plotted all of this data using ggplot2.
r statistics data-visualization
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I have successfully created a plot of a binomial glm using example data. https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/125/2018/11/parasites.txt
The predictors of the model include 3 predictors (one categorical, 2 continuous)
The code works fine but I have been wanting to try and incorporating more dplyr functions and pipes to streamline code. Ultimately, I want to make my block of code into a function that works with any model with the same type and number of predictors for a binomial glm. Are there better ways to carry out my code with more tidyverse/dplyr code?
#import parasites file
df<-parasites
m1<-glm(data=df, infected~age+weight+sex, family = "binomial")
summary(m1)
age_grid <- round(seq(min(df$age), max(df$age), length.out = 15))
weight_grid <- round(seq(min(df$weight), max(df$weight), length.out = 15))
newdat <- expand.grid(weight =weight_grid,
age = age_grid, sex = c("female", "male"))
pred <- predict.glm(m1, newdata = newdat, type="link", se=TRUE)
ymin <- m1$family$linkinv(pred$fit - 1.96 * pred$se.fit)
ymax <- m1$family$linkinv(pred$fit + 1.96 * pred$se.fit)
fit <- m1$family$linkinv(pred$fit)
z <- matrix(fit, length(age_grid))
ci.low <- matrix(ymin, length(age_grid))
ci.up <- matrix(ymax, length(age_grid))
x<-data.frame(pred = fit,
low = ymin,
high = ymax,
newdat) %>% mutate(category=cut(age, breaks=c(0, 69, 138, 206), labels =
c("0-69", "70-139", "139-206")))
x$age<-as.factor(x$age)
library(ggplot2)
finalgraph<-ggplot(data=x)+
geom_line(aes(x = weight, y = pred, color = age))+
geom_ribbon(aes(x = weight, ymin = low, ymax = high, fill = age), alpha = 0.1) +
facet_grid(category~sex) +theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
ylab(expression(bold(y = "Infection Probability"))) + xlab(expression(bold("Weight"))) +
theme(legend.position = "right",strip.text.x = element_text(face = "bold", size=12),
strip.text.y = element_text(size=10),
axis.text.y = element_text(size=10, face = "bold"), axis.text.x = element_text(size=10),
axis.title = element_text(size=12),
legend.text=element_text(size=10), legend.title = element_text(size=12, face="bold"))+
labs(linetype="Age (months)", colour="Age (months)", fill = "Age (months)")
finalgraph
Code notes:
Essentially I made a model, created a bunch of values from my predictors (age_grid, v_grid) and made all possible combinations of these values along with the categorical variable of sex using expand.grid.
Then I just used the predict.glm function to extract predicted values based off of expand.grid object. I also extracted std. errors and calculated confidence intervals (ci.up and ci. low). Then I used some dplyr functions to create a dataframe with all this information and also made a new column called category. Category breaks down one of my variables (age) into four distinct groups based of f of breaks I decided on and labelled as decided as well. Then I plotted all of this data using ggplot2.
r statistics data-visualization
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
I have successfully created a plot of a binomial glm using example data. https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/125/2018/11/parasites.txt
The predictors of the model include 3 predictors (one categorical, 2 continuous)
The code works fine but I have been wanting to try and incorporating more dplyr functions and pipes to streamline code. Ultimately, I want to make my block of code into a function that works with any model with the same type and number of predictors for a binomial glm. Are there better ways to carry out my code with more tidyverse/dplyr code?
#import parasites file
df<-parasites
m1<-glm(data=df, infected~age+weight+sex, family = "binomial")
summary(m1)
age_grid <- round(seq(min(df$age), max(df$age), length.out = 15))
weight_grid <- round(seq(min(df$weight), max(df$weight), length.out = 15))
newdat <- expand.grid(weight =weight_grid,
age = age_grid, sex = c("female", "male"))
pred <- predict.glm(m1, newdata = newdat, type="link", se=TRUE)
ymin <- m1$family$linkinv(pred$fit - 1.96 * pred$se.fit)
ymax <- m1$family$linkinv(pred$fit + 1.96 * pred$se.fit)
fit <- m1$family$linkinv(pred$fit)
z <- matrix(fit, length(age_grid))
ci.low <- matrix(ymin, length(age_grid))
ci.up <- matrix(ymax, length(age_grid))
x<-data.frame(pred = fit,
low = ymin,
high = ymax,
newdat) %>% mutate(category=cut(age, breaks=c(0, 69, 138, 206), labels =
c("0-69", "70-139", "139-206")))
x$age<-as.factor(x$age)
library(ggplot2)
finalgraph<-ggplot(data=x)+
geom_line(aes(x = weight, y = pred, color = age))+
geom_ribbon(aes(x = weight, ymin = low, ymax = high, fill = age), alpha = 0.1) +
facet_grid(category~sex) +theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
ylab(expression(bold(y = "Infection Probability"))) + xlab(expression(bold("Weight"))) +
theme(legend.position = "right",strip.text.x = element_text(face = "bold", size=12),
strip.text.y = element_text(size=10),
axis.text.y = element_text(size=10, face = "bold"), axis.text.x = element_text(size=10),
axis.title = element_text(size=12),
legend.text=element_text(size=10), legend.title = element_text(size=12, face="bold"))+
labs(linetype="Age (months)", colour="Age (months)", fill = "Age (months)")
finalgraph
Code notes:
Essentially I made a model, created a bunch of values from my predictors (age_grid, v_grid) and made all possible combinations of these values along with the categorical variable of sex using expand.grid.
Then I just used the predict.glm function to extract predicted values based off of expand.grid object. I also extracted std. errors and calculated confidence intervals (ci.up and ci. low). Then I used some dplyr functions to create a dataframe with all this information and also made a new column called category. Category breaks down one of my variables (age) into four distinct groups based of f of breaks I decided on and labelled as decided as well. Then I plotted all of this data using ggplot2.
r statistics data-visualization
r statistics data-visualization
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited 32 mins ago
200_success
127k15148411
127k15148411
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked 1 hour ago
Leo Ohyama
1062
1062
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Leo Ohyama is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Leo Ohyama is a new contributor. Be nice, and check out our Code of Conduct.
Leo Ohyama is a new contributor. Be nice, and check out our Code of Conduct.
Leo Ohyama is a new contributor. Be nice, and check out our Code of Conduct.
Leo Ohyama is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208355%2fusing-dplyr-and-pipes-for-logistic-regression-plotting%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown