Combining 2 data frames that together create one social network into 1 larger data frame in R
So, I currently have two data frames. One contains information on nodes for a given network and it looks something like this:
id age
01 14
02 23
03 52
04 41
05 32
The other data frame contains the information for the connections between the nodes and looks something like this:
id1 id2
01 02
01 05
03 04
05 02
So given these 2 data frames, what I want to do is combine them so that the resulting data frame looks like the first one but also has a column for every connection that the node has, as well as a column that contains the number of connections that node has (assuming that it never exceeds 5). So it looks something like this:
id  age  friend1  friend2  Connections
01  14   02       05       2
02  23   01       05       2
03  52   04                1
04  41   03                1
05  32   01       02       2
r
edited Nov 21 at 1:13
asked Nov 21 at 1:02
Jack
@TimBiegeleisen, Thank you for the pointer. I edited the post to clarify that a node will have no more than 5 connections at any given moment.
– Jack
Nov 21 at 1:14
My solution would work no matter how many connections any of the ids has.
– iod
Nov 21 at 1:36
2 Answers
With dplyr and tidyr:
# Example data (ids stored as integers)
df1<-structure(list(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L)), .Names = c("id", "age"), class = "data.frame", row.names = c(NA, -5L))
df2<-structure(list(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L)), .Names = c("id1", "id2"), class = "data.frame", row.names = c(NA, -4L))
# require() attaches only one package per call, so load each package separately
library(dplyr)
library(tidyr)
dff<-left_join(df1,df2,by=c(id="id1")) %>%        # friends recorded in id2
  left_join(df2,by=c(id="id2")) %>%               # friends recorded in id1
  gather("friend","friend_id",id1,id2) %>%        # stack both friend columns
  filter(!is.na(friend_id)) %>%                   # drop the unmatched joins
  group_by(id) %>%
  mutate(friend=paste0("friend",row_number()),connections=n()) %>%
  spread(friend,friend_id)                        # one column per friend
dff
# A tibble: 5 x 5
# Groups: id [5]
id age connections friend1 friend2
<int> <int> <int> <int> <int>
1 1 14 2 2 5
2 2 23 2 1 5
3 3 52 1 4 NA
4 4 41 1 3 NA
5 5 32 2 1 2
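For anyone who would rather avoid extra packages, here is a rough base R sketch of the same idea: list every edge in both directions, number the friends within each id, and spread them into columns. The names edges, slot, wide and result are made up for this illustration and do not come from the answer above.

```r
# Example data copied from the question (ids as integers, as in the answer above)
df1 <- data.frame(id = 1:5, age = c(14L, 23L, 52L, 41L, 32L))
df2 <- data.frame(id1 = c(1L, 1L, 3L, 5L), id2 = c(2L, 5L, 4L, 2L))

# List every edge in both directions so each node sees all of its friends
edges <- data.frame(id = c(df2$id1, df2$id2),
                    friend_id = c(df2$id2, df2$id1))
edges <- edges[order(edges$id, edges$friend_id), ]

# Number the friends within each id: 1, 2, ...
edges$slot <- ave(edges$friend_id, edges$id, FUN = seq_along)

# One row per id, one column per friend slot (friend_id.1, friend_id.2, ...)
wide <- reshape(edges, idvar = "id", timevar = "slot", direction = "wide")
names(wide) <- sub("^friend_id\\.?", "friend", names(wide))

# Count the non-missing friend columns, then attach the node attributes
wide$connections <- rowSums(!is.na(wide[names(wide) != "id"]))
result <- merge(df1, wide, by = "id", all.x = TRUE)

# Nodes that never appear in df2 get a count of 0 instead of NA
result$connections[is.na(result$connections)] <- 0
result
```

With the example data this gives connection counts of 2, 2, 1, 1, 2 for ids 1 to 5, matching the table in the question. It assumes each pair of nodes appears only once in df2; a pair listed twice would be counted twice.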
edited Nov 21 at 1:46
answered Nov 21 at 1:28
iod
If I run this code I get an error: in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "function"
– Jack
Nov 21 at 1:41
That's probably because we didn't use the same names for the dataframes. I've changed the df names to the same ones used in the other answer (df1 and df2). Now the error should be gone.
– iod
Nov 21 at 1:47
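A note on that error message, in case someone else hits it: one plausible reading (an assumption, since the earlier version of the code is not shown here) is that the code referred to a name such as df that had never been assigned as a data frame. In that case R finds the built-in stats::df function instead, which would explain the 'object of class "function"' part. A small sketch, where nodes is a hypothetical replacement name:

```r
# In a session where `df` was never assigned, the name resolves to stats::df,
# the density function of the F distribution, not to a data frame.
cls <- class(df)
cls

# A name that does not shadow a built-in avoids the confusion.
# `nodes` is a hypothetical name used only for this illustration.
nodes <- data.frame(id = 1:2, age = c(14L, 23L))
is.data.frame(nodes)
```

Passing a function like that to left_join() means there is no method to dispatch on, so checking class() of each input is a quick way to spot the problem.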
It might be better to convert the second data.frame to an adjacency matrix.
# DATA
df1 = structure(list(id = c("01", "02", "03", "04", "05"), age = c(14,
23, 52, 41, 32)), class = "data.frame", row.names = c(NA, -5L))
df2 = structure(list(id1 = c("01", "01", "03", "05"), id2 = c("02",
"05", "04", "02")), class = "data.frame", row.names = c(NA, -4L))
library(igraph)
g = graph_from_data_frame(df2) # current name for graph.data.frame()
m = as.matrix(as_adjacency_matrix(g)) # current name for get.adjacency()
m
# 01 03 05 02 04
#01 0 0 1 1 0
#03 0 0 0 0 1
#05 0 0 0 1 0
#02 0 0 0 0 0
#04 0 0 0 0 0
rowSums(m) + colSums(m) #Total connections: outgoing (rows) plus incoming (columns)
#01 03 05 02 04
# 2 1 2 2 1
#Add a column for age
mydat = as.data.frame(m)
mydat$id = row.names(mydat)
merge(mydat, df1, by = "id")
# id 01 03 05 02 04 age
#1 01 0 0 1 1 0 14
#2 02 0 0 0 0 0 23
#3 03 0 0 0 0 1 52
#4 04 0 0 0 0 0 41
#5 05 0 0 0 1 0 32
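Since the matrix above is directed, each connection shows up only in the row of the node listed in id1. If you want every row to list all of that node's connections, a symmetric matrix may be easier to work with. Here is a minimal base R sketch of that, without igraph; the names ids, out and friends are made up for this illustration.

```r
# Example data copied from the answer above (ids kept as character strings)
df1 <- data.frame(id = c("01", "02", "03", "04", "05"),
                  age = c(14, 23, 52, 41, 32), stringsAsFactors = FALSE)
df2 <- data.frame(id1 = c("01", "01", "03", "05"),
                  id2 = c("02", "05", "04", "02"), stringsAsFactors = FALSE)

# Empty square matrix with one row and one column per node
ids <- df1$id
m <- matrix(0L, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# Mark each edge in both directions so the matrix is symmetric
m[cbind(df2$id1, df2$id2)] <- 1L
m[cbind(df2$id2, df2$id1)] <- 1L

# In a symmetric matrix the row sums are the connection counts
out <- df1
out$connections <- unname(rowSums(m)[out$id])

# Collapse each row's neighbours into one string per node
friends <- apply(m, 1, function(row) paste(names(row)[row == 1], collapse = " "))
out$friends <- unname(friends[out$id])
out
```

Because the rows follow the order of df1$id, no merge is needed afterwards, and nodes without any edges simply get a count of 0.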
edited Nov 21 at 2:37
answered Nov 21 at 1:20
d.b