Combining 2 data frames that together, create one social network into 1 larger dataframe in R












2














So, I currently have two data frames. One contains information on nodes for a given network and it looks something like this:



id age
01 14
02 23
03 52
04 41
05 32


The other data frame contains the information for the connections between the nodes and looks something like this:



id1 id2
01 02
01 05
03 04
05 02


So given these 2 data frames what I want to do is combine them in such a way that the resultant data frame looks like the first one but now has columns for every connection that the node has as well as a column that contains the number of connections that node has (assuming that it never exceeds 5). So it looks something like this:



id age  friend1 friend2  Connections
01 14 02 05 2
02 23 01 05 2
03 52 04 1
04 41 03 1
05 32 01 02 2









share|improve this question
























  • @TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
    – Jack
    Nov 21 at 1:14












  • My solution would work no matter how many connections any of the ids has.
    – iod
    Nov 21 at 1:36
















2














So, I currently have two data frames. One contains information on nodes for a given network and it looks something like this:



id age
01 14
02 23
03 52
04 41
05 32


The other data frame contains the information for the connections between the nodes and looks something like this:



id1 id2
01 02
01 05
03 04
05 02


So given these 2 data frames what I want to do is combine them in such a way that the resultant data frame looks like the first one but now has columns for every connection that the node has as well as a column that contains the number of connections that node has (assuming that it never exceeds 5). So it looks something like this:



id age  friend1 friend2  Connections
01 14 02 05 2
02 23 01 05 2
03 52 04 1
04 41 03 1
05 32 01 02 2









share|improve this question
























  • @TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
    – Jack
    Nov 21 at 1:14












  • My solution would work no matter how many connections any of the ids has.
    – iod
    Nov 21 at 1:36














2












2








2







So, I currently have two data frames. One contains information on nodes for a given network and it looks something like this:



id age
01 14
02 23
03 52
04 41
05 32


The other data frame contains the information for the connections between the nodes and looks something like this:



id1 id2
01 02
01 05
03 04
05 02


So given these 2 data frames what I want to do is combine them in such a way that the resultant data frame looks like the first one but now has columns for every connection that the node has as well as a column that contains the number of connections that node has (assuming that it never exceeds 5). So it looks something like this:



id age  friend1 friend2  Connections
01 14 02 05 2
02 23 01 05 2
03 52 04 1
04 41 03 1
05 32 01 02 2









share|improve this question















So, I currently have two data frames. One contains information on nodes for a given network and it looks something like this:



id age
01 14
02 23
03 52
04 41
05 32


The other data frame contains the information for the connections between the nodes and looks something like this:



id1 id2
01 02
01 05
03 04
05 02


So given these 2 data frames what I want to do is combine them in such a way that the resultant data frame looks like the first one but now has columns for every connection that the node has as well as a column that contains the number of connections that node has (assuming that it never exceeds 5). So it looks something like this:



id age  friend1 friend2  Connections
01 14 02 05 2
02 23 01 05 2
03 52 04 1
04 41 03 1
05 32 01 02 2






r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 at 1:13

























asked Nov 21 at 1:02









Jack

164




164












  • @TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
    – Jack
    Nov 21 at 1:14












  • My solution would work no matter how many connections any of the ids has.
    – iod
    Nov 21 at 1:36


















  • @TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
    – Jack
    Nov 21 at 1:14












  • My solution would work no matter how many connections any of the ids has.
    – iod
    Nov 21 at 1:36
















@TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
– Jack
Nov 21 at 1:14






@TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
– Jack
Nov 21 at 1:14














My solution would work no matter how many connections any of the ids has.
– iod
Nov 21 at 1:36




My solution would work no matter how many connections any of the ids has.
– iod
Nov 21 at 1:36












2 Answers
2






active

oldest

votes


















0














With dplyr and tidyr:



df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))

require(dplyr,tidyr)
dff<-left_join(df1,df2,by=c(id="id1")) %>%
left_join(df2,by=c(id="id2")) %>%
gather("friend","friend_id",id1,id2) %>%
filter(!is.na(friend_id)) %>%
group_by(id) %>%
mutate(friend=paste0("friend",row_number()),connections=n()) %>%
spread(friend,friend_id)

# A tibble: 5 x 5
# Groups: id [5]
id age connections friend1 friend2
<int> <int> <int> <int> <int>
1 1 14 2 2 5
2 2 23 2 1 5
3 3 52 1 4 NA
4 4 41 1 3 NA
5 5 32 2 1 2





share|improve this answer























  • If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
    – Jack
    Nov 21 at 1:41












  • That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
    – iod
    Nov 21 at 1:47





















2














It might be better to convert the second data.frame to adjacency matrix.



# DATA
df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))

df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
"05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))

library(igraph)
g = graph.data.frame(df2)
m = as.matrix(get.adjacency(g))
m
# 01 03 05 02 04
#01 0 0 1 1 0
#03 0 0 0 0 1
#05 0 0 0 1 0
#02 0 0 0 0 0
#04 0 0 0 0 0

pmax(rowSums(m), colSums(m)) #Total Connections
#01 03 05 02 04
# 2 1 1 2 1

#Add a column for age
mydat = as.data.frame(m)
mydat$id = row.names(mydat)
merge(mydat, df1, by = "id")
# id 01 03 05 02 04 age
#1 01 0 0 1 1 0 14
#2 02 0 0 0 0 0 23
#3 03 0 0 0 0 1 52
#4 04 0 0 0 0 0 41
#5 05 0 0 0 1 0 32





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403878%2fcombining-2-data-frames-that-together-create-one-social-network-into-1-larger-d%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    With dplyr and tidyr:



    df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
    df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))

    require(dplyr,tidyr)
    dff<-left_join(df1,df2,by=c(id="id1")) %>%
    left_join(df2,by=c(id="id2")) %>%
    gather("friend","friend_id",id1,id2) %>%
    filter(!is.na(friend_id)) %>%
    group_by(id) %>%
    mutate(friend=paste0("friend",row_number()),connections=n()) %>%
    spread(friend,friend_id)

    # A tibble: 5 x 5
    # Groups: id [5]
    id age connections friend1 friend2
    <int> <int> <int> <int> <int>
    1 1 14 2 2 5
    2 2 23 2 1 5
    3 3 52 1 4 NA
    4 4 41 1 3 NA
    5 5 32 2 1 2





    share|improve this answer























    • If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
      – Jack
      Nov 21 at 1:41












    • That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
      – iod
      Nov 21 at 1:47


















    0














    With dplyr and tidyr:



    df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
    df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))

    require(dplyr,tidyr)
    dff<-left_join(df1,df2,by=c(id="id1")) %>%
    left_join(df2,by=c(id="id2")) %>%
    gather("friend","friend_id",id1,id2) %>%
    filter(!is.na(friend_id)) %>%
    group_by(id) %>%
    mutate(friend=paste0("friend",row_number()),connections=n()) %>%
    spread(friend,friend_id)

    # A tibble: 5 x 5
    # Groups: id [5]
    id age connections friend1 friend2
    <int> <int> <int> <int> <int>
    1 1 14 2 2 5
    2 2 23 2 1 5
    3 3 52 1 4 NA
    4 4 41 1 3 NA
    5 5 32 2 1 2





    share|improve this answer























    • If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
      – Jack
      Nov 21 at 1:41












    • That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
      – iod
      Nov 21 at 1:47
















    0












    0








    0






    With dplyr and tidyr:



    df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
    df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))

    require(dplyr,tidyr)
    dff<-left_join(df1,df2,by=c(id="id1")) %>%
    left_join(df2,by=c(id="id2")) %>%
    gather("friend","friend_id",id1,id2) %>%
    filter(!is.na(friend_id)) %>%
    group_by(id) %>%
    mutate(friend=paste0("friend",row_number()),connections=n()) %>%
    spread(friend,friend_id)

    # A tibble: 5 x 5
    # Groups: id [5]
    id age connections friend1 friend2
    <int> <int> <int> <int> <int>
    1 1 14 2 2 5
    2 2 23 2 1 5
    3 3 52 1 4 NA
    4 4 41 1 3 NA
    5 5 32 2 1 2





    share|improve this answer














    With dplyr and tidyr:



    df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
    df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))

    require(dplyr,tidyr)
    dff<-left_join(df1,df2,by=c(id="id1")) %>%
    left_join(df2,by=c(id="id2")) %>%
    gather("friend","friend_id",id1,id2) %>%
    filter(!is.na(friend_id)) %>%
    group_by(id) %>%
    mutate(friend=paste0("friend",row_number()),connections=n()) %>%
    spread(friend,friend_id)

    # A tibble: 5 x 5
    # Groups: id [5]
    id age connections friend1 friend2
    <int> <int> <int> <int> <int>
    1 1 14 2 2 5
    2 2 23 2 1 5
    3 3 52 1 4 NA
    4 4 41 1 3 NA
    5 5 32 2 1 2






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 21 at 1:46

























    answered Nov 21 at 1:28









    iod

    3,4892721




    3,4892721












    • If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
      – Jack
      Nov 21 at 1:41












    • That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
      – iod
      Nov 21 at 1:47




















    • If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
      – Jack
      Nov 21 at 1:41












    • That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
      – iod
      Nov 21 at 1:47


















    If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
    – Jack
    Nov 21 at 1:41






    If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
    – Jack
    Nov 21 at 1:41














    That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
    – iod
    Nov 21 at 1:47






    That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
    – iod
    Nov 21 at 1:47















    2














    It might be better to convert the second data.frame to adjacency matrix.



    # DATA
    df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
    23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))

    df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
    "05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))

    library(igraph)
    g = graph.data.frame(df2)
    m = as.matrix(get.adjacency(g))
    m
    # 01 03 05 02 04
    #01 0 0 1 1 0
    #03 0 0 0 0 1
    #05 0 0 0 1 0
    #02 0 0 0 0 0
    #04 0 0 0 0 0

    pmax(rowSums(m), colSums(m)) #Total Connections
    #01 03 05 02 04
    # 2 1 1 2 1

    #Add a column for age
    mydat = as.data.frame(m)
    mydat$id = row.names(mydat)
    merge(mydat, df1, by = "id")
    # id 01 03 05 02 04 age
    #1 01 0 0 1 1 0 14
    #2 02 0 0 0 0 0 23
    #3 03 0 0 0 0 1 52
    #4 04 0 0 0 0 0 41
    #5 05 0 0 0 1 0 32





    share|improve this answer




























      2














      It might be better to convert the second data.frame to adjacency matrix.



      # DATA
      df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
      23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))

      df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
      "05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))

      library(igraph)
      g = graph.data.frame(df2)
      m = as.matrix(get.adjacency(g))
      m
      # 01 03 05 02 04
      #01 0 0 1 1 0
      #03 0 0 0 0 1
      #05 0 0 0 1 0
      #02 0 0 0 0 0
      #04 0 0 0 0 0

      pmax(rowSums(m), colSums(m)) #Total Connections
      #01 03 05 02 04
      # 2 1 1 2 1

      #Add a column for age
      mydat = as.data.frame(m)
      mydat$id = row.names(mydat)
      merge(mydat, df1, by = "id")
      # id 01 03 05 02 04 age
      #1 01 0 0 1 1 0 14
      #2 02 0 0 0 0 0 23
      #3 03 0 0 0 0 1 52
      #4 04 0 0 0 0 0 41
      #5 05 0 0 0 1 0 32





      share|improve this answer


























        2












        2








        2






        It might be better to convert the second data.frame to adjacency matrix.



        # DATA
        df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
        23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))

        df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
        "05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))

        library(igraph)
        g = graph.data.frame(df2)
        m = as.matrix(get.adjacency(g))
        m
        # 01 03 05 02 04
        #01 0 0 1 1 0
        #03 0 0 0 0 1
        #05 0 0 0 1 0
        #02 0 0 0 0 0
        #04 0 0 0 0 0

        pmax(rowSums(m), colSums(m)) #Total Connections
        #01 03 05 02 04
        # 2 1 1 2 1

        #Add a column for age
        mydat = as.data.frame(m)
        mydat$id = row.names(mydat)
        merge(mydat, df1, by = "id")
        # id 01 03 05 02 04 age
        #1 01 0 0 1 1 0 14
        #2 02 0 0 0 0 0 23
        #3 03 0 0 0 0 1 52
        #4 04 0 0 0 0 0 41
        #5 05 0 0 0 1 0 32





        share|improve this answer














        It might be better to convert the second data.frame to adjacency matrix.



        # DATA
        df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
        23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))

        df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
        "05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))

        library(igraph)
        g = graph.data.frame(df2)
        m = as.matrix(get.adjacency(g))
        m
        # 01 03 05 02 04
        #01 0 0 1 1 0
        #03 0 0 0 0 1
        #05 0 0 0 1 0
        #02 0 0 0 0 0
        #04 0 0 0 0 0

        pmax(rowSums(m), colSums(m)) #Total Connections
        #01 03 05 02 04
        # 2 1 1 2 1

        #Add a column for age
        mydat = as.data.frame(m)
        mydat$id = row.names(mydat)
        merge(mydat, df1, by = "id")
        # id 01 03 05 02 04 age
        #1 01 0 0 1 1 0 14
        #2 02 0 0 0 0 0 23
        #3 03 0 0 0 0 1 52
        #4 04 0 0 0 0 0 41
        #5 05 0 0 0 1 0 32






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 21 at 2:37

























        answered Nov 21 at 1:20









        d.b

        18.5k41846




        18.5k41846






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403878%2fcombining-2-data-frames-that-together-create-one-social-network-into-1-larger-d%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            TypeError: fit_transform() missing 1 required positional argument: 'X'