Merge different length dataframes, Join column in dataframe dont have unique values












0















I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.



enter code here


Df1



    Ticket  Fare    Cabin   Embarked
0 110152 86.50 B79 S
1 110152 92.50 B77 S
2 110413 79.65 E67 S
3 110413 79.65 E68 S
4 110465 52.00 C110 S
5 110465 52.00 A14 S
6 110564 26.55 C52 S
7 110813 75.25 D37 C
8 111240 33.50 B19 S
9 111320 38.50 E63 S

df2

Survived Ticket
PassengerId
1 0 A/5 21171
2 1 PC 17599
3 1 STON/O2. 3101282
4 1 113803
5 0 373450
6 0 330877
7 0 17463
8 0 349909
9 1 347742
10 1 237736


There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.



eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.



 pass
engerID Survived Ticket Fare Cabin Embarked
0 0 110152 86.50 NaN S
0 1 110152 90.50 C85 C
1 1 STON/O2.3101 7.9250 NaN S
2 1 113803 53.1000 C123 S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S


Here passenger 0 is having to records with different prices but it should have only one record after merge.










share|improve this question



























    0















    I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.



    enter code here


    Df1



        Ticket  Fare    Cabin   Embarked
    0 110152 86.50 B79 S
    1 110152 92.50 B77 S
    2 110413 79.65 E67 S
    3 110413 79.65 E68 S
    4 110465 52.00 C110 S
    5 110465 52.00 A14 S
    6 110564 26.55 C52 S
    7 110813 75.25 D37 C
    8 111240 33.50 B19 S
    9 111320 38.50 E63 S

    df2

    Survived Ticket
    PassengerId
    1 0 A/5 21171
    2 1 PC 17599
    3 1 STON/O2. 3101282
    4 1 113803
    5 0 373450
    6 0 330877
    7 0 17463
    8 0 349909
    9 1 347742
    10 1 237736


    There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.



    eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.



     pass
    engerID Survived Ticket Fare Cabin Embarked
    0 0 110152 86.50 NaN S
    0 1 110152 90.50 C85 C
    1 1 STON/O2.3101 7.9250 NaN S
    2 1 113803 53.1000 C123 S
    3 0 113803 53.1000 C123 S
    4 0 373450 8.0500 NaN S


    Here passenger 0 is having to records with different prices but it should have only one record after merge.










    share|improve this question

























      0












      0








      0








      I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.



      enter code here


      Df1



          Ticket  Fare    Cabin   Embarked
      0 110152 86.50 B79 S
      1 110152 92.50 B77 S
      2 110413 79.65 E67 S
      3 110413 79.65 E68 S
      4 110465 52.00 C110 S
      5 110465 52.00 A14 S
      6 110564 26.55 C52 S
      7 110813 75.25 D37 C
      8 111240 33.50 B19 S
      9 111320 38.50 E63 S

      df2

      Survived Ticket
      PassengerId
      1 0 A/5 21171
      2 1 PC 17599
      3 1 STON/O2. 3101282
      4 1 113803
      5 0 373450
      6 0 330877
      7 0 17463
      8 0 349909
      9 1 347742
      10 1 237736


      There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.



      eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.



       pass
      engerID Survived Ticket Fare Cabin Embarked
      0 0 110152 86.50 NaN S
      0 1 110152 90.50 C85 C
      1 1 STON/O2.3101 7.9250 NaN S
      2 1 113803 53.1000 C123 S
      3 0 113803 53.1000 C123 S
      4 0 373450 8.0500 NaN S


      Here passenger 0 is having to records with different prices but it should have only one record after merge.










      share|improve this question














      I have Titanic dataset with data in different csv files. I need to combined all the files in one dataframe to use the data. But one of file dont not the any column which posses unique values. I am trying to merge the data using merge command but number of records increases.



      enter code here


      Df1



          Ticket  Fare    Cabin   Embarked
      0 110152 86.50 B79 S
      1 110152 92.50 B77 S
      2 110413 79.65 E67 S
      3 110413 79.65 E68 S
      4 110465 52.00 C110 S
      5 110465 52.00 A14 S
      6 110564 26.55 C52 S
      7 110813 75.25 D37 C
      8 111240 33.50 B19 S
      9 111320 38.50 E63 S

      df2

      Survived Ticket
      PassengerId
      1 0 A/5 21171
      2 1 PC 17599
      3 1 STON/O2. 3101282
      4 1 113803
      5 0 373450
      6 0 330877
      7 0 17463
      8 0 349909
      9 1 347742
      10 1 237736


      There are some tickets which are having different prices for the same ticket number. Which is adding two records for same ticket number for that passenger for the different price.



      eg. Ticket 110152 is having two prices. whichever customer buys this ticket is having two records after the merge with two different prices.



       pass
      engerID Survived Ticket Fare Cabin Embarked
      0 0 110152 86.50 NaN S
      0 1 110152 90.50 C85 C
      1 1 STON/O2.3101 7.9250 NaN S
      2 1 113803 53.1000 C123 S
      3 0 113803 53.1000 C123 S
      4 0 373450 8.0500 NaN S


      Here passenger 0 is having to records with different prices but it should have only one record after merge.







      python pandas join merge left-join






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 25 '18 at 10:58









      punit kumar Sharmapunit kumar Sharma

      419




      419
























          1 Answer
          1






          active

          oldest

          votes


















          1














          If I understand correctly, the issue is with multiple records coming after the merge statement.



          You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:



          In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)

          In [299]: df1
          Out[299]:
          Ticket Fare Cabin Embarked rank
          0 110152 86.50 B79 S 2.0
          1 110152 92.50 B77 S 1.0
          2 110413 79.65 E67 S 1.0
          3 110413 79.65 E68 S 2.0
          4 110465 52.00 C110 S 1.0
          5 110465 52.00 A14 S 2.0
          6 110564 26.55 C52 S 1.0
          7 110813 75.25 D37 C 1.0
          8 111240 33.50 B19 S 1.0
          9 111320 38.50 E63 S 1.0

          In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)

          In [304]: df1
          Out[304]:

          Ticket Fare Cabin Embarked
          1 110152 92.50 B77 S
          2 110413 79.65 E67 S
          4 110465 52.00 C110 S
          6 110564 26.55 C52 S
          7 110813 75.25 D37 C
          8 111240 33.50 B19 S
          9 111320 38.50 E63 S


          Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.



          Let me know if this helps.






          share|improve this answer
























          • @punitkumarsharma Let me know if the answer helped you?

            – Mayank Porwal
            Nov 26 '18 at 13:28






          • 1





            well that's a legit one. your solution worked for me thank you so much.

            – punit kumar Sharma
            Nov 27 '18 at 9:47











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466778%2fmerge-different-length-dataframes-join-column-in-dataframe-dont-have-unique-val%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          If I understand correctly, the issue is with multiple records coming after the merge statement.



          You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:



          In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)

          In [299]: df1
          Out[299]:
          Ticket Fare Cabin Embarked rank
          0 110152 86.50 B79 S 2.0
          1 110152 92.50 B77 S 1.0
          2 110413 79.65 E67 S 1.0
          3 110413 79.65 E68 S 2.0
          4 110465 52.00 C110 S 1.0
          5 110465 52.00 A14 S 2.0
          6 110564 26.55 C52 S 1.0
          7 110813 75.25 D37 C 1.0
          8 111240 33.50 B19 S 1.0
          9 111320 38.50 E63 S 1.0

          In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)

          In [304]: df1
          Out[304]:

          Ticket Fare Cabin Embarked
          1 110152 92.50 B77 S
          2 110413 79.65 E67 S
          4 110465 52.00 C110 S
          6 110564 26.55 C52 S
          7 110813 75.25 D37 C
          8 111240 33.50 B19 S
          9 111320 38.50 E63 S


          Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.



          Let me know if this helps.






          share|improve this answer
























          • @punitkumarsharma Let me know if the answer helped you?

            – Mayank Porwal
            Nov 26 '18 at 13:28






          • 1





            well that's a legit one. your solution worked for me thank you so much.

            – punit kumar Sharma
            Nov 27 '18 at 9:47
















          1














          If I understand correctly, the issue is with multiple records coming after the merge statement.



          You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:



          In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)

          In [299]: df1
          Out[299]:
          Ticket Fare Cabin Embarked rank
          0 110152 86.50 B79 S 2.0
          1 110152 92.50 B77 S 1.0
          2 110413 79.65 E67 S 1.0
          3 110413 79.65 E68 S 2.0
          4 110465 52.00 C110 S 1.0
          5 110465 52.00 A14 S 2.0
          6 110564 26.55 C52 S 1.0
          7 110813 75.25 D37 C 1.0
          8 111240 33.50 B19 S 1.0
          9 111320 38.50 E63 S 1.0

          In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)

          In [304]: df1
          Out[304]:

          Ticket Fare Cabin Embarked
          1 110152 92.50 B77 S
          2 110413 79.65 E67 S
          4 110465 52.00 C110 S
          6 110564 26.55 C52 S
          7 110813 75.25 D37 C
          8 111240 33.50 B19 S
          9 111320 38.50 E63 S


          Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.



          Let me know if this helps.






          share|improve this answer
























          • @punitkumarsharma Let me know if the answer helped you?

            – Mayank Porwal
            Nov 26 '18 at 13:28






          • 1





            well that's a legit one. your solution worked for me thank you so much.

            – punit kumar Sharma
            Nov 27 '18 at 9:47














          1












          1








          1







          If I understand correctly, the issue is with multiple records coming after the merge statement.



          You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:



          In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)

          In [299]: df1
          Out[299]:
          Ticket Fare Cabin Embarked rank
          0 110152 86.50 B79 S 2.0
          1 110152 92.50 B77 S 1.0
          2 110413 79.65 E67 S 1.0
          3 110413 79.65 E68 S 2.0
          4 110465 52.00 C110 S 1.0
          5 110465 52.00 A14 S 2.0
          6 110564 26.55 C52 S 1.0
          7 110813 75.25 D37 C 1.0
          8 111240 33.50 B19 S 1.0
          9 111320 38.50 E63 S 1.0

          In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)

          In [304]: df1
          Out[304]:

          Ticket Fare Cabin Embarked
          1 110152 92.50 B77 S
          2 110413 79.65 E67 S
          4 110465 52.00 C110 S
          6 110564 26.55 C52 S
          7 110813 75.25 D37 C
          8 111240 33.50 B19 S
          9 111320 38.50 E63 S


          Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.



          Let me know if this helps.






          share|improve this answer













          If I understand correctly, the issue is with multiple records coming after the merge statement.



          You can eliminate multiple records for the same Ticket number and keep only 1 record. Something like this:



          In [298]: df1['rank'] = df1.groupby('Ticket')['Fare'].rank('first',ascending=False)

          In [299]: df1
          Out[299]:
          Ticket Fare Cabin Embarked rank
          0 110152 86.50 B79 S 2.0
          1 110152 92.50 B77 S 1.0
          2 110413 79.65 E67 S 1.0
          3 110413 79.65 E68 S 2.0
          4 110465 52.00 C110 S 1.0
          5 110465 52.00 A14 S 2.0
          6 110564 26.55 C52 S 1.0
          7 110813 75.25 D37 C 1.0
          8 111240 33.50 B19 S 1.0
          9 111320 38.50 E63 S 1.0

          In [303]: df1 = df1.query('rank == 1.0').drop('rank',1)

          In [304]: df1
          Out[304]:

          Ticket Fare Cabin Embarked
          1 110152 92.50 B77 S
          2 110413 79.65 E67 S
          4 110465 52.00 C110 S
          6 110564 26.55 C52 S
          7 110813 75.25 D37 C
          8 111240 33.50 B19 S
          9 111320 38.50 E63 S


          Now, if you see , df1 has only 1 record per ticket number. Now, you merge statement will not produce duplicates.



          Let me know if this helps.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 25 '18 at 11:58









          Mayank PorwalMayank Porwal

          4,9452724




          4,9452724













          • @punitkumarsharma Let me know if the answer helped you?

            – Mayank Porwal
            Nov 26 '18 at 13:28






          • 1





            well that's a legit one. your solution worked for me thank you so much.

            – punit kumar Sharma
            Nov 27 '18 at 9:47



















          • @punitkumarsharma Let me know if the answer helped you?

            – Mayank Porwal
            Nov 26 '18 at 13:28






          • 1





            well that's a legit one. your solution worked for me thank you so much.

            – punit kumar Sharma
            Nov 27 '18 at 9:47

















          @punitkumarsharma Let me know if the answer helped you?

          – Mayank Porwal
          Nov 26 '18 at 13:28





          @punitkumarsharma Let me know if the answer helped you?

          – Mayank Porwal
          Nov 26 '18 at 13:28




          1




          1





          well that's a legit one. your solution worked for me thank you so much.

          – punit kumar Sharma
          Nov 27 '18 at 9:47





          well that's a legit one. your solution worked for me thank you so much.

          – punit kumar Sharma
          Nov 27 '18 at 9:47




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466778%2fmerge-different-length-dataframes-join-column-in-dataframe-dont-have-unique-val%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          404 Error Contact Form 7 ajax form submitting

          How to know if a Active Directory user can login interactively

          TypeError: fit_transform() missing 1 required positional argument: 'X'