Combining two data frames that together describe one social network into one larger data frame in R

I currently have two data frames. One contains information on the nodes of a network and looks something like this:

id age
01  14
02  23
03  52
04  41
05  32

The other data frame contains the connections between the nodes and looks something like this:

id1 id2
01  02
01  05
03  04
05  02

Given these two data frames, I want to combine them so that the result looks like the first one but also has a column for each of the node's connections, plus a column with the number of connections the node has (assume it never exceeds 5). It should look something like this:

id age friend1 friend2 Connections
01  14  02      05      2
02  23  01      05      2
03  52  04              1
04  41  03              1
05  32  01      02      2
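For reference, the reshaping the question asks for can be sketched in base R without extra packages. This sketch assumes integer ids and reuses the sample values from the answers; the object names (`nodes`, `edges`, `both`, `friends`, `result`) are illustrative and not from the original post.

```r
# Sample data (same values as in the answers; integer ids assumed)
nodes <- data.frame(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L))
edges <- data.frame(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L))

# Friendships are undirected, so list every edge from both ends
both <- rbind(data.frame(id = edges$id1, friend = edges$id2),
              data.frame(id = edges$id2, friend = edges$id1))

# Sorted friend ids for each node, in the row order of `nodes`
friends <- lapply(nodes$id, function(i) sort(both$friend[both$id == i]))

# Pad each friend vector with NA up to the largest number of connections,
# then turn the padded vectors into friend1, friend2, ... columns
k <- max(lengths(friends))
padded <- lapply(friends, function(f) c(f, rep(NA_integer_, k - length(f))))
wide <- matrix(unlist(padded), ncol = k, byrow = TRUE,
               dimnames = list(NULL, paste0("friend", seq_len(k))))

result <- cbind(nodes, wide, Connections = lengths(friends))
result
```

Because `k` is computed from the data, the number of friend columns adapts to the largest degree rather than being fixed at 5.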

asked Nov 21 at 1:02 by Jack (edited Nov 21 at 1:13)

@TimBiegeleisen, thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
– Jack
Nov 21 at 1:14

My solution would work no matter how many connections any of the ids has.
– iod
Nov 21 at 1:36

2 Answers

With dplyr and tidyr:

# require(dplyr, tidyr) only loads dplyr; load each package separately
library(dplyr)
library(tidyr)

# Node table: one row per person
df1 <- data.frame(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L))

# Edge table: one row per friendship
df2 <- data.frame(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L))

dff <- left_join(df1, df2, by = c(id = "id1")) %>%  # adds id2: friends where the node is id1
  left_join(df2, by = c(id = "id2")) %>%             # adds id1: friends where the node is id2
  gather("friend", "friend_id", id1, id2) %>%        # stack both friend columns into one
  filter(!is.na(friend_id)) %>%                      # drop rows with no match
  group_by(id) %>%
  mutate(friend = paste0("friend", row_number()),    # friend1, friend2, ...
         connections = n()) %>%                      # number of connections per node
  spread(friend, friend_id)                          # one column per friend

dff



# A tibble: 5 x 5
# Groups:   id [5]
     id   age connections friend1 friend2
  <int> <int>       <int>   <int>   <int>
1     1    14           2       2       5
2     2    23           2       1       5
3     3    52           1       4      NA
4     4    41           1       3      NA
5     5    32           2       1       2
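The two `left_join()` calls give the right answer for this sample, but they can multiply rows when a node appears on both sides of several edges: a node that is `id1` in two edges and `id2` in one ends up with its `id2`-side friend listed twice and a count of 4 instead of 3. A sketch of one way around that, assuming the same `df1`/`df2` as above, stacks the edges in both directions with `bind_rows()` before reshaping. It uses `pivot_wider()` (tidyr 1.0.0 or later), the successor of `spread()`; `edges_both` and `dff2` are illustrative names.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L))
df2 <- data.frame(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L))

# Each friendship is undirected, so list it once from each side
edges_both <- bind_rows(
  df2 %>% select(id = id1, friend_id = id2),
  df2 %>% select(id = id2, friend_id = id1)
)

dff2 <- edges_both %>%
  arrange(id, friend_id) %>%                        # friends in ascending order
  group_by(id) %>%
  mutate(friend = paste0("friend", row_number()),
         connections = n()) %>%
  ungroup() %>%
  pivot_wider(names_from = friend, values_from = friend_id) %>%
  right_join(df1, by = "id") %>%                    # keep nodes with no friends
  mutate(connections = coalesce(connections, 0L)) %>%
  select(id, age, connections, starts_with("friend")) %>%
  arrange(id)

dff2
```

Each edge contributes exactly one row per endpoint, so no cross product can occur regardless of how many connections a node has.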

answered Nov 21 at 1:28 by iod (edited Nov 21 at 1:46)

If I run this code, I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
– Jack
Nov 21 at 1:41

That's probably because we didn't use the same names for the data frames. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
– iod
Nov 21 at 1:47

It might be better to convert the second data.frame to an adjacency matrix.

# DATA
df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14, 23, 52, 41, 32)),
                class = "data.frame", row.names = c(NA, -5L))

df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02", "05", "04", "02")),
                class = "data.frame", row.names = c(NA, -4L))

library(igraph)

g = graph_from_data_frame(df2)               # directed graph, edges id1 -> id2
m = as_adjacency_matrix(g, sparse = FALSE)   # dense 0/1 adjacency matrix
m
#   01 03 05 02 04
#01  0  0  1  1  0
#03  0  0  0  0  1
#05  0  0  0  1  0
#02  0  0  0  0  0
#04  0  0  0  0  0

# Each friendship is stored once, in the row of id1 and the column of id2,
# so a node's total number of connections is its row sum plus its column sum
rowSums(m) + colSums(m) #Total Connections
#01 03 05 02 04
# 2  1  2  2  1

#Add a column for age
mydat = as.data.frame(m)
mydat$id = row.names(mydat)
merge(mydat, df1, by = "id")
#  id 01 03 05 02 04 age
#1 01  0  0  1  1  0  14
#2 02  0  0  0  0  0  23
#3 03  0  0  0  0  1  52
#4 04  0  0  0  0  0  41
#5 05  0  0  0  1  0  32
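If the friendships are meant to be undirected, as the expected output in the question suggests (01 and 05 list each other), a sketch that builds an undirected graph gives a symmetric adjacency matrix, and `degree()` then returns the connection counts directly. Passing `df1` as `vertices` keeps every node, including any with no friends, in `df1`'s order. The friend1/friend2 reshaping at the end is an illustration added here, not part of the answer above; `friends`, `padded`, `wide` and `result` are illustrative names.

```r
library(igraph)

df1 <- data.frame(id = c("01", "02", "03", "04", "05"), age = c(14, 23, 52, 41, 32))
df2 <- data.frame(id1 = c("01", "01", "03", "05"), id2 = c("02", "05", "04", "02"))

# Undirected graph; df1 supplies the vertex names (and an age attribute)
g <- graph_from_data_frame(df2, directed = FALSE, vertices = df1)

# Symmetric 0/1 adjacency matrix, rows and columns in df1's order
m <- as_adjacency_matrix(g, sparse = FALSE)

# Friend ids of each node: the column names where its row has a 1
friends <- lapply(rownames(m), function(i) colnames(m)[m[i, ] > 0])

# Pad with NA to the largest number of connections, one column per friend
k <- max(lengths(friends))
padded <- lapply(friends, function(f) c(f, rep(NA_character_, k - length(f))))
wide <- matrix(unlist(padded), ncol = k, byrow = TRUE,
               dimnames = list(NULL, paste0("friend", seq_len(k))))

result <- data.frame(df1, wide, Connections = as.integer(degree(g)),
                     stringsAsFactors = FALSE)
result
```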

answered Nov 21 at 1:20 by d.b (edited Nov 21 at 2:37)
