Join 2 Pandas dataframes on specific columns accounting for nan












2















I have 2 pandas DataFrames dfA and dfB with X and Y columns and rows.



I need to append dfB on dfA, creating dfC. I need the appending to be on columns with column names in both dfA and dfB.



Moreover, if column name in dfA does not appear in dfB, then dfC should contain Nan values.





To simplify,



dfA looks like,



    c1   c2   c3
0 AX1 AX2 AX3
1 AY1 AY2 AY3
2 AZ1 AZ2 AZ3


while dfB looks like,



    c1   c2   c4
0 BX1 BX2 BX3
1 BY1 BY2 BY3
2 BZ1 BZ2 BZ3


such that dfC should look like,



    c1   c2   c3
0 AX1 AX2 AX3
1 AY1 AY2 AY3
2 AZ1 AZ2 AZ3
3 BX1 BX2 Nan
4 BY1 BY2 Nan
5 BZ1 BZ2 Nan


How do I do this?










share|improve this question



























    2















    I have 2 pandas DataFrames dfA and dfB with X and Y columns and rows.



    I need to append dfB on dfA, creating dfC. I need the appending to be on columns with column names in both dfA and dfB.



    Moreover, if column name in dfA does not appear in dfB, then dfC should contain Nan values.





    To simplify,



    dfA looks like,



        c1   c2   c3
    0 AX1 AX2 AX3
    1 AY1 AY2 AY3
    2 AZ1 AZ2 AZ3


    while dfB looks like,



        c1   c2   c4
    0 BX1 BX2 BX3
    1 BY1 BY2 BY3
    2 BZ1 BZ2 BZ3


    such that dfC should look like,



        c1   c2   c3
    0 AX1 AX2 AX3
    1 AY1 AY2 AY3
    2 AZ1 AZ2 AZ3
    3 BX1 BX2 Nan
    4 BY1 BY2 Nan
    5 BZ1 BZ2 Nan


    How do I do this?










    share|improve this question

























      2












      2








      2








      I have 2 pandas DataFrames dfA and dfB with X and Y columns and rows.



      I need to append dfB on dfA, creating dfC. I need the appending to be on columns with column names in both dfA and dfB.



      Moreover, if column name in dfA does not appear in dfB, then dfC should contain Nan values.





      To simplify,



      dfA looks like,



          c1   c2   c3
      0 AX1 AX2 AX3
      1 AY1 AY2 AY3
      2 AZ1 AZ2 AZ3


      while dfB looks like,



          c1   c2   c4
      0 BX1 BX2 BX3
      1 BY1 BY2 BY3
      2 BZ1 BZ2 BZ3


      such that dfC should look like,



          c1   c2   c3
      0 AX1 AX2 AX3
      1 AY1 AY2 AY3
      2 AZ1 AZ2 AZ3
      3 BX1 BX2 Nan
      4 BY1 BY2 Nan
      5 BZ1 BZ2 Nan


      How do I do this?










      share|improve this question














      I have 2 pandas DataFrames dfA and dfB with X and Y columns and rows.



      I need to append dfB on dfA, creating dfC. I need the appending to be on columns with column names in both dfA and dfB.



      Moreover, if column name in dfA does not appear in dfB, then dfC should contain Nan values.





      To simplify,



      dfA looks like,



          c1   c2   c3
      0 AX1 AX2 AX3
      1 AY1 AY2 AY3
      2 AZ1 AZ2 AZ3


      while dfB looks like,



          c1   c2   c4
      0 BX1 BX2 BX3
      1 BY1 BY2 BY3
      2 BZ1 BZ2 BZ3


      such that dfC should look like,



          c1   c2   c3
      0 AX1 AX2 AX3
      1 AY1 AY2 AY3
      2 AZ1 AZ2 AZ3
      3 BX1 BX2 Nan
      4 BY1 BY2 Nan
      5 BZ1 BZ2 Nan


      How do I do this?







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 23 '18 at 11:37









      LucSpanLucSpan

      678519




      678519
























          2 Answers
          2






          active

          oldest

          votes


















          1














          Use concat and filter columns of dfA:



          df = pd.concat([dfA, dfB])[dfA.columns]


          Or filter columns by intersection:



          cols = dfA.columns.intersection(dfB.columns)
          df = pd.concat([dfA, dfB[cols]])





          share|improve this answer

































            1














            try this,



            pd.concat([dfA,dfB[['c1','c2']]])


            To Avoid Hard coding columns



            try this,



            inter=list(set(dfA.columns)&set(dfB.columns))
            print pd.concat([dfA,dfB[inter]])





            share|improve this answer





















            • 1





              Thanks, but with @jezrael's answer I don't have to check column names.

              – LucSpan
              Nov 23 '18 at 11:43






            • 1





              @LucSpan - Solution updated

              – Mohamed Thasin ah
              Nov 23 '18 at 11:46











            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53445991%2fjoin-2-pandas-dataframes-on-specific-columns-accounting-for-nan%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            Use concat and filter columns of dfA:



            df = pd.concat([dfA, dfB])[dfA.columns]


            Or filter columns by intersection:



            cols = dfA.columns.intersection(dfB.columns)
            df = pd.concat([dfA, dfB[cols]])





            share|improve this answer






























              1














              Use concat and filter columns of dfA:



              df = pd.concat([dfA, dfB])[dfA.columns]


              Or filter columns by intersection:



              cols = dfA.columns.intersection(dfB.columns)
              df = pd.concat([dfA, dfB[cols]])





              share|improve this answer




























                1












                1








                1







                Use concat and filter columns of dfA:



                df = pd.concat([dfA, dfB])[dfA.columns]


                Or filter columns by intersection:



                cols = dfA.columns.intersection(dfB.columns)
                df = pd.concat([dfA, dfB[cols]])





                share|improve this answer















                Use concat and filter columns of dfA:



                df = pd.concat([dfA, dfB])[dfA.columns]


                Or filter columns by intersection:



                cols = dfA.columns.intersection(dfB.columns)
                df = pd.concat([dfA, dfB[cols]])






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 23 '18 at 11:42

























                answered Nov 23 '18 at 11:39









                jezraeljezrael

                331k24273351




                331k24273351

























                    1














                    try this,



                    pd.concat([dfA,dfB[['c1','c2']]])


                    To Avoid Hard coding columns



                    try this,



                    inter=list(set(dfA.columns)&set(dfB.columns))
                    print pd.concat([dfA,dfB[inter]])





                    share|improve this answer





















                    • 1





                      Thanks, but with @jezrael's answer I don't have to check column names.

                      – LucSpan
                      Nov 23 '18 at 11:43






                    • 1





                      @LucSpan - Solution updated

                      – Mohamed Thasin ah
                      Nov 23 '18 at 11:46
















                    1














                    try this,



                    pd.concat([dfA,dfB[['c1','c2']]])


                    To Avoid Hard coding columns



                    try this,



                    inter=list(set(dfA.columns)&set(dfB.columns))
                    print pd.concat([dfA,dfB[inter]])





                    share|improve this answer





















                    • 1





                      Thanks, but with @jezrael's answer I don't have to check column names.

                      – LucSpan
                      Nov 23 '18 at 11:43






                    • 1





                      @LucSpan - Solution updated

                      – Mohamed Thasin ah
                      Nov 23 '18 at 11:46














                    1












                    1








                    1







                    try this,



                    pd.concat([dfA,dfB[['c1','c2']]])


                    To Avoid Hard coding columns



                    try this,



                    inter=list(set(dfA.columns)&set(dfB.columns))
                    print pd.concat([dfA,dfB[inter]])





                    share|improve this answer















                    try this,



                    pd.concat([dfA,dfB[['c1','c2']]])


                    To Avoid Hard coding columns



                    try this,



                    inter=list(set(dfA.columns)&set(dfB.columns))
                    print pd.concat([dfA,dfB[inter]])






                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 23 '18 at 11:43

























                    answered Nov 23 '18 at 11:39









                    Mohamed Thasin ahMohamed Thasin ah

                    3,55131540




                    3,55131540








                    • 1





                      Thanks, but with @jezrael's answer I don't have to check column names.

                      – LucSpan
                      Nov 23 '18 at 11:43






                    • 1





                      @LucSpan - Solution updated

                      – Mohamed Thasin ah
                      Nov 23 '18 at 11:46














                    • 1





                      Thanks, but with @jezrael's answer I don't have to check column names.

                      – LucSpan
                      Nov 23 '18 at 11:43






                    • 1





                      @LucSpan - Solution updated

                      – Mohamed Thasin ah
                      Nov 23 '18 at 11:46








                    1




                    1





                    Thanks, but with @jezrael's answer I don't have to check column names.

                    – LucSpan
                    Nov 23 '18 at 11:43





                    Thanks, but with @jezrael's answer I don't have to check column names.

                    – LucSpan
                    Nov 23 '18 at 11:43




                    1




                    1





                    @LucSpan - Solution updated

                    – Mohamed Thasin ah
                    Nov 23 '18 at 11:46





                    @LucSpan - Solution updated

                    – Mohamed Thasin ah
                    Nov 23 '18 at 11:46


















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53445991%2fjoin-2-pandas-dataframes-on-specific-columns-accounting-for-nan%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    404 Error Contact Form 7 ajax form submitting

                    How to know if a Active Directory user can login interactively

                    Refactoring coordinates for Minecraft Pi buildings written in Python