Using dplyr and pipes for logistic regression plotting























I have successfully created a plot of a binomial GLM using example data: https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/125/2018/11/parasites.txt

The model has three predictors (one categorical, two continuous).

The code works fine, but I would like to incorporate more dplyr functions and pipes to streamline it. Ultimately, I want to turn this block of code into a function that works with any binomial GLM that has the same type and number of predictors. Are there better ways to write this with more tidyverse/dplyr code?



# Import the parasites file (linked above) as `parasites` before running this
library(dplyr)    # provides %>% and mutate()
library(ggplot2)

df <- parasites
m1 <- glm(infected ~ age + weight + sex, data = df, family = "binomial")
summary(m1)

# 15 evenly spaced (rounded) values across the range of each continuous predictor
age_grid <- round(seq(min(df$age), max(df$age), length.out = 15))
weight_grid <- round(seq(min(df$weight), max(df$weight), length.out = 15))

# All combinations of the grid values and both sexes
newdat <- expand.grid(weight = weight_grid,
                      age = age_grid,
                      sex = c("female", "male"))

# Predict on the link scale, then back-transform the fit and the 95% interval
pred <- predict(m1, newdata = newdat, type = "link", se.fit = TRUE)
ymin <- m1$family$linkinv(pred$fit - 1.96 * pred$se.fit)
ymax <- m1$family$linkinv(pred$fit + 1.96 * pred$se.fit)
fit <- m1$family$linkinv(pred$fit)

# Matrix versions of the predictions and interval bounds (not used by the plot below)
z <- matrix(fit, length(age_grid))
ci.low <- matrix(ymin, length(age_grid))
ci.up <- matrix(ymax, length(age_grid))

# Labels match the intervals (0, 69], (69, 138], (138, 206] produced by cut()
x <- data.frame(pred = fit, low = ymin, high = ymax, newdat) %>%
  mutate(category = cut(age, breaks = c(0, 69, 138, 206),
                        labels = c("0-69", "70-138", "139-206")))

x$age <- as.factor(x$age)

finalgraph <- ggplot(data = x) +
  geom_line(aes(x = weight, y = pred, color = age)) +
  geom_ribbon(aes(x = weight, ymin = low, ymax = high, fill = age), alpha = 0.1) +
  facet_grid(category ~ sex) +
  ylab(expression(bold("Infection Probability"))) +
  xlab(expression(bold("Weight"))) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position = "right",
        strip.text.x = element_text(face = "bold", size = 12),
        strip.text.y = element_text(size = 10),
        axis.text.y = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(size = 10),
        axis.title = element_text(size = 12),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 12, face = "bold")) +
  labs(colour = "Age (months)", fill = "Age (months)")
finalgraph


Code notes:
Essentially, I fit a model, created a sequence of values for each continuous predictor (age_grid, weight_grid), and used expand.grid to build all possible combinations of these values together with the categorical variable sex.
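To make the grid step reusable, it could be wrapped in a small helper. The following is only a sketch: `make_grid()` and its arguments are hypothetical names, and `toy` is simulated data standing in for the parasites file, not the real values.

```r
library(dplyr)

# Simulated stand-in for the parasites data (hypothetical values)
set.seed(1)
toy <- data.frame(
  age    = sample(1:206, 100, replace = TRUE),
  weight = runif(100, 1, 20),
  sex    = sample(c("female", "male"), 100, replace = TRUE)
)

# Hypothetical helper: n evenly spaced, rounded values for each of two
# continuous predictors, crossed with every observed level of one
# categorical predictor. Column names are passed as strings.
make_grid <- function(data, cont1, cont2, cat, n = 15) {
  grid_vals <- function(v) round(seq(min(v), max(v), length.out = n))
  args <- list(
    grid_vals(data[[cont1]]),
    grid_vals(data[[cont2]]),
    sort(unique(as.character(data[[cat]])))
  )
  names(args) <- c(cont1, cont2, cat)
  do.call(expand.grid, c(args, stringsAsFactors = FALSE)) %>%
    as_tibble()
}

newdat <- make_grid(toy, "weight", "age", "sex")
```

Passing the column names as strings keeps the helper independent of any particular data set, so the same call pattern should work for other models with two continuous predictors and one categorical predictor.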



Then I used the predict.glm function to get predicted values for the expand.grid object. I also extracted the standard errors and calculated confidence intervals (ymin and ymax, also stored as the matrices ci.low and ci.up). Next, I used dplyr to build a data frame with all of this information and added a new column called category. Category splits one of my variables (age) into three groups based on breaks and labels that I chose. Finally, I plotted the data using ggplot2.
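One way the prediction and interval steps might be folded into a single pipe-friendly function is sketched below. `add_ci()` is a hypothetical name, `toy` and `m_toy` are simulated stand-ins for the parasites data and model, and the multiplier comes from `qnorm()` rather than the hard-coded 1.96 used above.

```r
library(dplyr)

# Simulated stand-in for the parasites data (hypothetical values)
set.seed(42)
toy <- tibble(
  age    = sample(1:206, 200, replace = TRUE),
  weight = runif(200, 1, 20),
  sex    = sample(c("female", "male"), 200, replace = TRUE)
) %>%
  mutate(infected = rbinom(n(), 1, plogis(-2 + 0.01 * age + 0.05 * weight)))

m_toy <- glm(infected ~ age + weight + sex, data = toy, family = binomial)

# Hypothetical helper: predict on the link scale, then back-transform the
# fit and the Wald interval bounds with the model's own inverse link
add_ci <- function(model, newdata, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  linkinv <- family(model)$linkinv
  link <- predict(model, newdata = newdata, type = "link", se.fit = TRUE)
  newdata %>%
    mutate(
      pred = linkinv(link$fit),
      low  = linkinv(link$fit - z * link$se.fit),
      high = linkinv(link$fit + z * link$se.fit)
    )
}

newdat <- expand.grid(
  weight = round(seq(min(toy$weight), max(toy$weight), length.out = 15)),
  age    = round(seq(min(toy$age), max(toy$age), length.out = 15)),
  sex    = c("female", "male"),
  stringsAsFactors = FALSE
)

# category is computed from the numeric age before age becomes a factor
x <- add_ci(m_toy, newdat) %>%
  mutate(
    category = cut(age, breaks = c(0, 69, 138, 206),
                   labels = c("0-69", "70-138", "139-206")),
    age = factor(age)
  )
```

Because `add_ci()` reads the inverse link from the fitted model, the same function should also work for other GLM families; the resulting `x` has the same columns the ggplot call expects (`pred`, `low`, `high`, `weight`, `age`, `sex`, `category`).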










      r statistics data-visualization






      edited 32 mins ago









      200_success

      asked 1 hour ago









      Leo Ohyama
