Merge different length dataframes, Join column in dataframe dont have unique values
I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.
enter code here
Df1
Ticket Fare Cabin Embarked
0 110152 86.50 B79 S
1 110152 92.50 B77 S
2 110413 79.65 E67 S
3 110413 79.65 E68 S
4 110465 52.00 C110 S
5 110465 52.00 A14 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
df2
Survived Ticket
PassengerId
1 0 A/5 21171
2 1 PC 17599
3 1 STON/O2. 3101282
4 1 113803
5 0 373450
6 0 330877
7 0 17463
8 0 349909
9 1 347742
10 1 237736
There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.
eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.
pass
engerID Survived Ticket Fare Cabin Embarked
0 0 110152 86.50 NaN S
0 1 110152 90.50 C85 C
1 1 STON/O2.3101 7.9250 NaN S
2 1 113803 53.1000 C123 S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
Here passenger 0 is having to records with different prices but it should have only one record after merge.
python pandas join merge left-join
add a comment |
I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.
enter code here
Df1
Ticket Fare Cabin Embarked
0 110152 86.50 B79 S
1 110152 92.50 B77 S
2 110413 79.65 E67 S
3 110413 79.65 E68 S
4 110465 52.00 C110 S
5 110465 52.00 A14 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
df2
Survived Ticket
PassengerId
1 0 A/5 21171
2 1 PC 17599
3 1 STON/O2. 3101282
4 1 113803
5 0 373450
6 0 330877
7 0 17463
8 0 349909
9 1 347742
10 1 237736
There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.
eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.
pass
engerID Survived Ticket Fare Cabin Embarked
0 0 110152 86.50 NaN S
0 1 110152 90.50 C85 C
1 1 STON/O2.3101 7.9250 NaN S
2 1 113803 53.1000 C123 S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
Here passenger 0 is having to records with different prices but it should have only one record after merge.
python pandas join merge left-join
add a comment |
I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.
enter code here
Df1
Ticket Fare Cabin Embarked
0 110152 86.50 B79 S
1 110152 92.50 B77 S
2 110413 79.65 E67 S
3 110413 79.65 E68 S
4 110465 52.00 C110 S
5 110465 52.00 A14 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
df2
Survived Ticket
PassengerId
1 0 A/5 21171
2 1 PC 17599
3 1 STON/O2. 3101282
4 1 113803
5 0 373450
6 0 330877
7 0 17463
8 0 349909
9 1 347742
10 1 237736
There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.
eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.
pass
engerID Survived Ticket Fare Cabin Embarked
0 0 110152 86.50 NaN S
0 1 110152 90.50 C85 C
1 1 STON/O2.3101 7.9250 NaN S
2 1 113803 53.1000 C123 S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
Here passenger 0 is having to records with different prices but it should have only one record after merge.
python pandas join merge left-join
I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.
enter code here
Df1
Ticket Fare Cabin Embarked
0 110152 86.50 B79 S
1 110152 92.50 B77 S
2 110413 79.65 E67 S
3 110413 79.65 E68 S
4 110465 52.00 C110 S
5 110465 52.00 A14 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
df2
Survived Ticket
PassengerId
1 0 A/5 21171
2 1 PC 17599
3 1 STON/O2. 3101282
4 1 113803
5 0 373450
6 0 330877
7 0 17463
8 0 349909
9 1 347742
10 1 237736
There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.
eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.
pass
engerID Survived Ticket Fare Cabin Embarked
0 0 110152 86.50 NaN S
0 1 110152 90.50 C85 C
1 1 STON/O2.3101 7.9250 NaN S
2 1 113803 53.1000 C123 S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
Here passenger 0 is having to records with different prices but it should have only one record after merge.
python pandas join merge left-join
python pandas join merge left-join
asked Nov 25 '18 at 10:58
punit kumar Sharmapunit kumar Sharma
419
419
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
If I understand correctly, the issue is with multiple records coming after the merge statement.
You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:
In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)
In [299]: df1
Out[299]:
Ticket Fare Cabin Embarked rank
0 110152 86.50 B79 S 2.0
1 110152 92.50 B77 S 1.0
2 110413 79.65 E67 S 1.0
3 110413 79.65 E68 S 2.0
4 110465 52.00 C110 S 1.0
5 110465 52.00 A14 S 2.0
6 110564 26.55 C52 S 1.0
7 110813 75.25 D37 C 1.0
8 111240 33.50 B19 S 1.0
9 111320 38.50 E63 S 1.0
In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)
In [304]: df1
Out[304]:
Ticket Fare Cabin Embarked
1 110152 92.50 B77 S
2 110413 79.65 E67 S
4 110465 52.00 C110 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
Now, if you see , df1
has only 1 record per ticket number. Now, you merge
statement will not produce duplicates.
Let me know if this helps.
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466778%2fmerge-different-length-dataframes-join-column-in-dataframe-dont-have-unique-val%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
If I understand correctly, the issue is with multiple records coming after the merge statement.
You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:
In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)
In [299]: df1
Out[299]:
Ticket Fare Cabin Embarked rank
0 110152 86.50 B79 S 2.0
1 110152 92.50 B77 S 1.0
2 110413 79.65 E67 S 1.0
3 110413 79.65 E68 S 2.0
4 110465 52.00 C110 S 1.0
5 110465 52.00 A14 S 2.0
6 110564 26.55 C52 S 1.0
7 110813 75.25 D37 C 1.0
8 111240 33.50 B19 S 1.0
9 111320 38.50 E63 S 1.0
In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)
In [304]: df1
Out[304]:
Ticket Fare Cabin Embarked
1 110152 92.50 B77 S
2 110413 79.65 E67 S
4 110465 52.00 C110 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
Now, if you see , df1
has only 1 record per ticket number. Now, you merge
statement will not produce duplicates.
Let me know if this helps.
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
add a comment |
If I understand correctly, the issue is with multiple records coming after the merge statement.
You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:
In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)
In [299]: df1
Out[299]:
Ticket Fare Cabin Embarked rank
0 110152 86.50 B79 S 2.0
1 110152 92.50 B77 S 1.0
2 110413 79.65 E67 S 1.0
3 110413 79.65 E68 S 2.0
4 110465 52.00 C110 S 1.0
5 110465 52.00 A14 S 2.0
6 110564 26.55 C52 S 1.0
7 110813 75.25 D37 C 1.0
8 111240 33.50 B19 S 1.0
9 111320 38.50 E63 S 1.0
In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)
In [304]: df1
Out[304]:
Ticket Fare Cabin Embarked
1 110152 92.50 B77 S
2 110413 79.65 E67 S
4 110465 52.00 C110 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
Now, if you see , df1
has only 1 record per ticket number. Now, you merge
statement will not produce duplicates.
Let me know if this helps.
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
add a comment |
If I understand correctly, the issue is with multiple records coming after the merge statement.
You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:
In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)
In [299]: df1
Out[299]:
Ticket Fare Cabin Embarked rank
0 110152 86.50 B79 S 2.0
1 110152 92.50 B77 S 1.0
2 110413 79.65 E67 S 1.0
3 110413 79.65 E68 S 2.0
4 110465 52.00 C110 S 1.0
5 110465 52.00 A14 S 2.0
6 110564 26.55 C52 S 1.0
7 110813 75.25 D37 C 1.0
8 111240 33.50 B19 S 1.0
9 111320 38.50 E63 S 1.0
In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)
In [304]: df1
Out[304]:
Ticket Fare Cabin Embarked
1 110152 92.50 B77 S
2 110413 79.65 E67 S
4 110465 52.00 C110 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
Now, if you see , df1
has only 1 record per ticket number. Now, you merge
statement will not produce duplicates.
Let me know if this helps.
If I understand correctly, the issue is with multiple records coming after the merge statement.
You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:
In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)
In [299]: df1
Out[299]:
Ticket Fare Cabin Embarked rank
0 110152 86.50 B79 S 2.0
1 110152 92.50 B77 S 1.0
2 110413 79.65 E67 S 1.0
3 110413 79.65 E68 S 2.0
4 110465 52.00 C110 S 1.0
5 110465 52.00 A14 S 2.0
6 110564 26.55 C52 S 1.0
7 110813 75.25 D37 C 1.0
8 111240 33.50 B19 S 1.0
9 111320 38.50 E63 S 1.0
In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)
In [304]: df1
Out[304]:
Ticket Fare Cabin Embarked
1 110152 92.50 B77 S
2 110413 79.65 E67 S
4 110465 52.00 C110 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S
Now, if you see , df1
has only 1 record per ticket number. Now, you merge
statement will not produce duplicates.
Let me know if this helps.
answered Nov 25 '18 at 11:58
Mayank PorwalMayank Porwal
4,9452724
4,9452724
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
add a comment |
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
@punitkumarsharma Let me know if the answer helped you?
– Mayank Porwal
Nov 26 '18 at 13:28
1
1
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
well that's a legit one. your solution worked for me thank you so much.
– punit kumar Sharma
Nov 27 '18 at 9:47
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466778%2fmerge-different-length-dataframes-join-column-in-dataframe-dont-have-unique-val%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown