Pyspark join with different WHERE condition for each row
I have a table of events with some properties and an ID:
table1:
NUMBER; ARRAY; NUMBER; STRING
ID; color; size; description
1; [blue, green, yellow, red]; 40; 'very nice thing'
2; [blue, green, yellow]; 30; 'most beautiful'
And another table with certain properties and a corresponding ID:
table2:
NUMBER; ARRAY
ID; properties
1; [green, 40, nice]
1; [red, 40, nice]
I want to INNER JOIN these two tables on their IDs, with a WHERE condition that depends on the properties array in table2:
If the array contains [green, 40, nice], I want to join it with table1 only if:
- 'green' appears in table1.color
- 40 matches table1.size
- 'nice' is a part of table1.description
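To make the per-row condition concrete, here is a minimal plain-Python sketch of the rule described above. It assumes the properties array always has the fixed order [color, size, keyword] and that its elements are strings (a Spark array column holds a single element type); the function name `matches` and the list-of-dicts layout are illustrative, not part of the original tables.

```python
def matches(event, properties):
    """Return True if one table1 row satisfies one table2 properties array."""
    # Assumption: properties is always [color, size, keyword], all strings.
    color, size, keyword = properties
    return (
        color in event["color"]               # 'green' appears in table1.color
        and int(size) == event["size"]        # 40 matches table1.size
        and keyword in event["description"]   # 'nice' is a part of table1.description
    )


table1 = [
    {"ID": 1, "color": ["blue", "green", "yellow", "red"], "size": 40,
     "description": "very nice thing"},
    {"ID": 2, "color": ["blue", "green", "yellow"], "size": 30,
     "description": "most beautiful"},
]
table2 = [
    {"ID": 1, "properties": ["green", "40", "nice"]},
    {"ID": 1, "properties": ["red", "40", "nice"]},
]

# Inner join on ID, keeping only pairs where the row-specific condition holds.
joined = [
    {**event, "properties": props["properties"]}
    for event in table1
    for props in table2
    if event["ID"] == props["ID"] and matches(event, props["properties"])
]
```

Each row of table2 carries its own filter values, so the condition is evaluated per pair of rows rather than once for the whole join.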
For the two tables above, the expected result is:
ID; color; size; description; properties
1; [blue, green, yellow, red]; 40; 'very nice thing'; [green, 40, nice]
1; [blue, green, yellow, red]; 40; 'very nice thing'; [red, 40, nice]
sql apache-spark pyspark apache-spark-sql
edited yesterday by Ali AzG
asked yesterday by bry888