Merge different length dataframes, Join column in dataframe dont have unique values

I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.

enter code here

Df1

    Ticket  Fare    Cabin   Embarked

0   110152  86.50   B79       S

1   110152  92.50   B77       S

2   110413  79.65   E67       S

3   110413  79.65   E68       S

4   110465  52.00   C110      S

5   110465  52.00   A14       S

6   110564  26.55   C52       S

7   110813  75.25   D37       C

8   111240  33.50   B19       S

9   111320  38.50   E63       S



df2 



        Survived    Ticket

PassengerId     

1         0         A/5 21171

2         1         PC 17599

3         1         STON/O2. 3101282

4         1         113803

5         0         373450

6         0         330877

7         0         17463

8         0         349909

9         1         347742

10        1         237736

There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.

eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.

 pass

engerID   Survived  Ticket   Fare     Cabin  Embarked

 0    0       110152        86.50      NaN      S

 0    1       110152        90.50      C85      C

 1    1     STON/O2.3101   7.9250      NaN      S

 2    1      113803        53.1000     C123     S

 3    0      113803        53.1000     C123     S

 4    0       373450       8.0500       NaN     S

Here passenger 0 is having to records with different prices but it should have only one record after merge.

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

add a comment |

enter code here

Df1

    Ticket  Fare    Cabin   Embarked

0   110152  86.50   B79       S

1   110152  92.50   B77       S

2   110413  79.65   E67       S

3   110413  79.65   E68       S

4   110465  52.00   C110      S

5   110465  52.00   A14       S

6   110564  26.55   C52       S

7   110813  75.25   D37       C

8   111240  33.50   B19       S

9   111320  38.50   E63       S



df2 



        Survived    Ticket

PassengerId     

1         0         A/5 21171

2         1         PC 17599

3         1         STON/O2. 3101282

4         1         113803

5         0         373450

6         0         330877

7         0         17463

8         0         349909

9         1         347742

10        1         237736

There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.

eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.

 pass

engerID   Survived  Ticket   Fare     Cabin  Embarked

 0    0       110152        86.50      NaN      S

 0    1       110152        90.50      C85      C

 1    1     STON/O2.3101   7.9250      NaN      S

 2    1      113803        53.1000     C123     S

 3    0      113803        53.1000     C123     S

 4    0       373450       8.0500       NaN     S

Here passenger 0 is having to records with different prices but it should have only one record after merge.

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

add a comment |

enter code here

Df1

    Ticket  Fare    Cabin   Embarked

0   110152  86.50   B79       S

1   110152  92.50   B77       S

2   110413  79.65   E67       S

3   110413  79.65   E68       S

4   110465  52.00   C110      S

5   110465  52.00   A14       S

6   110564  26.55   C52       S

7   110813  75.25   D37       C

8   111240  33.50   B19       S

9   111320  38.50   E63       S



df2 



        Survived    Ticket

PassengerId     

1         0         A/5 21171

2         1         PC 17599

3         1         STON/O2. 3101282

4         1         113803

5         0         373450

6         0         330877

7         0         17463

8         0         349909

9         1         347742

10        1         237736

There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.

eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.

 pass

engerID   Survived  Ticket   Fare     Cabin  Embarked

 0    0       110152        86.50      NaN      S

 0    1       110152        90.50      C85      C

 1    1     STON/O2.3101   7.9250      NaN      S

 2    1      113803        53.1000     C123     S

 3    0      113803        53.1000     C123     S

 4    0       373450       8.0500       NaN     S

Here passenger 0 is having to records with different prices but it should have only one record after merge.

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

enter code here

Df1

    Ticket  Fare    Cabin   Embarked

0   110152  86.50   B79       S

1   110152  92.50   B77       S

2   110413  79.65   E67       S

3   110413  79.65   E68       S

4   110465  52.00   C110      S

5   110465  52.00   A14       S

6   110564  26.55   C52       S

7   110813  75.25   D37       C

8   111240  33.50   B19       S

9   111320  38.50   E63       S



df2 



        Survived    Ticket

PassengerId     

1         0         A/5 21171

2         1         PC 17599

3         1         STON/O2. 3101282

4         1         113803

5         0         373450

6         0         330877

7         0         17463

8         0         349909

9         1         347742

10        1         237736

There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.

eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.

 pass

engerID   Survived  Ticket   Fare     Cabin  Embarked

 0    0       110152        86.50      NaN      S

 0    1       110152        90.50      C85      C

 1    1     STON/O2.3101   7.9250      NaN      S

 2    1      113803        53.1000     C123     S

 3    0      113803        53.1000     C123     S

 4    0       373450       8.0500       NaN     S

Here passenger 0 is having to records with different prices but it should have only one record after merge.

python pandas join merge left-join

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

asked Nov 25 '18 at 10:58

punit kumar Sharma

419

add a comment |

1 Answer
1

active

oldest

votes

If I understand correctly, the issue is with multiple records coming after the merge statement.

You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:

In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)



In [299]: df1

Out[299]: 

   Ticket   Fare Cabin Embarked  rank

0  110152  86.50   B79        S   2.0

1  110152  92.50   B77        S   1.0

2  110413  79.65   E67        S   1.0

3  110413  79.65   E68        S   2.0

4  110465  52.00  C110        S   1.0

5  110465  52.00   A14        S   2.0

6  110564  26.55   C52        S   1.0

7  110813  75.25   D37        C   1.0

8  111240  33.50   B19        S   1.0

9  111320  38.50   E63        S   1.0



In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)



In [304]: df1

Out[304]: 



   Ticket   Fare Cabin Embarked

1  110152  92.50   B77        S

2  110413  79.65   E67        S

4  110465  52.00  C110        S

6  110564  26.55   C52        S

7  110813  75.25   D37        C

8  111240  33.50   B19        S

9  111320  38.50   E63        S

Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.

Let me know if this helps.

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

1

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466778%2fmerge-different-length-dataframes-join-column-in-dataframe-dont-have-unique-val%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

If I understand correctly, the issue is with multiple records coming after the merge statement.

You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:

In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)



In [299]: df1

Out[299]: 

   Ticket   Fare Cabin Embarked  rank

0  110152  86.50   B79        S   2.0

1  110152  92.50   B77        S   1.0

2  110413  79.65   E67        S   1.0

3  110413  79.65   E68        S   2.0

4  110465  52.00  C110        S   1.0

5  110465  52.00   A14        S   2.0

6  110564  26.55   C52        S   1.0

7  110813  75.25   D37        C   1.0

8  111240  33.50   B19        S   1.0

9  111320  38.50   E63        S   1.0



In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)



In [304]: df1

Out[304]: 



   Ticket   Fare Cabin Embarked

1  110152  92.50   B77        S

2  110413  79.65   E67        S

4  110465  52.00  C110        S

6  110564  26.55   C52        S

7  110813  75.25   D37        C

8  111240  33.50   B19        S

9  111320  38.50   E63        S

Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.

Let me know if this helps.

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

1

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

add a comment |

If I understand correctly, the issue is with multiple records coming after the merge statement.

You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:

In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)



In [299]: df1

Out[299]: 

   Ticket   Fare Cabin Embarked  rank

0  110152  86.50   B79        S   2.0

1  110152  92.50   B77        S   1.0

2  110413  79.65   E67        S   1.0

3  110413  79.65   E68        S   2.0

4  110465  52.00  C110        S   1.0

5  110465  52.00   A14        S   2.0

6  110564  26.55   C52        S   1.0

7  110813  75.25   D37        C   1.0

8  111240  33.50   B19        S   1.0

9  111320  38.50   E63        S   1.0



In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)



In [304]: df1

Out[304]: 



   Ticket   Fare Cabin Embarked

1  110152  92.50   B77        S

2  110413  79.65   E67        S

4  110465  52.00  C110        S

6  110564  26.55   C52        S

7  110813  75.25   D37        C

8  111240  33.50   B19        S

9  111320  38.50   E63        S

Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.

Let me know if this helps.

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

1

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

add a comment |

If I understand correctly, the issue is with multiple records coming after the merge statement.

You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:

In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)



In [299]: df1

Out[299]: 

   Ticket   Fare Cabin Embarked  rank

0  110152  86.50   B79        S   2.0

1  110152  92.50   B77        S   1.0

2  110413  79.65   E67        S   1.0

3  110413  79.65   E68        S   2.0

4  110465  52.00  C110        S   1.0

5  110465  52.00   A14        S   2.0

6  110564  26.55   C52        S   1.0

7  110813  75.25   D37        C   1.0

8  111240  33.50   B19        S   1.0

9  111320  38.50   E63        S   1.0



In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)



In [304]: df1

Out[304]: 



   Ticket   Fare Cabin Embarked

1  110152  92.50   B77        S

2  110413  79.65   E67        S

4  110465  52.00  C110        S

6  110564  26.55   C52        S

7  110813  75.25   D37        C

8  111240  33.50   B19        S

9  111320  38.50   E63        S

Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.

Let me know if this helps.

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

If I understand correctly, the issue is with multiple records coming after the merge statement.

You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:

In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)



In [299]: df1

Out[299]: 

   Ticket   Fare Cabin Embarked  rank

0  110152  86.50   B79        S   2.0

1  110152  92.50   B77        S   1.0

2  110413  79.65   E67        S   1.0

3  110413  79.65   E68        S   2.0

4  110465  52.00  C110        S   1.0

5  110465  52.00   A14        S   2.0

6  110564  26.55   C52        S   1.0

7  110813  75.25   D37        C   1.0

8  111240  33.50   B19        S   1.0

9  111320  38.50   E63        S   1.0



In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)



In [304]: df1

Out[304]: 



   Ticket   Fare Cabin Embarked

1  110152  92.50   B77        S

2  110413  79.65   E67        S

4  110465  52.00  C110        S

6  110564  26.55   C52        S

7  110813  75.25   D37        C

8  111240  33.50   B19        S

9  111320  38.50   E63        S

Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.

Let me know if this helps.

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

answered Nov 25 '18 at 11:58

Mayank Porwal

4,9452724

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

1

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

add a comment |

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

1

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

@punitkumarsharma Let me know if the answer helped you?

– Mayank Porwal
Nov 26 '18 at 13:28

well that's a legit one. your solution worked for me thank you so much.

– punit kumar Sharma
Nov 27 '18 at 9:47

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk