Split string from a preset list of strings from pandas df column























I have a pandas DataFrame that looks like the one below. It has about a million rows.



name = ['Jake','Matt', 'Henry']

0 A
1 Jake Hill
2 Matt Dawn
3 Matt King
4 White Henry
5 Hyde Jake


I want to iterate over the list and the df['A'] column and return only the first names. For example, the final dataframe should look like this.



0   A
1 Jake
2 Matt
3 Matt
4 Henry
5 Jake


Thanks in advance. I am new to Python, so I am still figuring out the easiest way to do this.






























  • 2




    what if value of column A doesn't exist in list?
    – Sociopath
    Nov 20 at 5:31






  • 1




    What about first names that aren't Jake,Matt,Henry ? Do you want to filter them out?
    – CIsForCookies
    Nov 20 at 5:31










  • Then the original name should be retained. For example if the name is Dave Atkins then it should retain the name Dave Atkins but I have made sure that I have all the names. So that should not be a problem.
    – Matt
    Nov 20 at 5:33
























python python-3.x pandas python-2.7














edited Nov 20 at 5:45

























asked Nov 20 at 5:29









Matt

546


























7 Answers

















up vote
2
down vote



accepted










You need:



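import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']

df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde', 'Dwayne John']})

def func(x):
    # Return the first listed name found anywhere in x (substring match);
    # if none is found, keep the original value.
    for k in first_name:
        if k in x:
            return k
    return x

df['A'] = df['A'].apply(func)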


Output:



            A
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
5 Dwayne John
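
One caveat with the function above: `k in x` is a substring test, so a row such as 'Ann Jakes' would return 'Jake'. A minimal sketch of a whole-word variant follows, assuming the words in each row are separated by whitespace. The helper name `pick_name` and the extra sample rows are illustrative and not part of the original answer.

```python
import pandas as pd

# A set gives constant-time membership tests
first_name = {'Jake', 'Matt', 'Henry'}

def pick_name(full):
    # Compare whole whitespace-separated words, not substrings
    for word in full.split():
        if word in first_name:
            return word
    # No listed name found: keep the original value
    return full

df = pd.DataFrame({'A': ['Jake Hill', 'White Henry', 'Ann Jakes', 'Dwayne John']})

# A list comprehension avoids some of the per-row overhead of Series.apply
df['A'] = [pick_name(s) for s in df['A']]
print(df['A'].tolist())
```

This assumes column A holds only strings; missing values would need to be handled before calling `split`.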




























  • Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
    – Matt
    Nov 20 at 5:48


















up vote
3
down vote













You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.



df.A.str.extract(r'({})'.format('|'.join(name)))




       0
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
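
As written, the pattern matches substrings and returns NaN for rows without a listed name. A hedged sketch of one way to tighten it is shown below, assuming the unmatched rows should keep their original value as the asker described in the comments. The sample rows 'Dave Atkins' and 'Ann Jakes' are added here for illustration.

```python
import re
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King',
                         'White Henry', 'Hyde Jake', 'Dave Atkins', 'Ann Jakes']})

# \b restricts each alternative to whole words; re.escape guards against
# names that contain regex metacharacters.
pattern = r'\b(' + '|'.join(re.escape(n) for n in name) + r')\b'

# expand=False returns a Series; rows with no match come back as NaN,
# which fillna replaces with the original value from column A.
result = df['A'].str.extract(pattern, expand=False).fillna(df['A'])
print(result.tolist())
```

The result can be assigned back with `df['A'] = result` if an in-place replacement is wanted.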

































    up vote
    1
    down vote













    Here is one method to achieve this:



    first_name = ['Jake','Matt', 'Henry']

    df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})

    df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))


    and you get:



                 A      B
    0    Jake Hill   Jake
    1    Matt Dawn   Matt
    2    Matt King   Matt
    3  Henry White  Henry
    4    Jake Hyde   Jake


























    • Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
      – Matt
      Nov 20 at 5:46


















    up vote
    0
    down vote













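    import pandas as pd

    name = ['Jake', 'Matt', 'Henry']
    known = set(name)
    df = pd.read_csv("file.csv")

    # Keep the first word that is a listed name; fall back to the original
    # value when none matches, and to "Not Found" for missing (NaN) values
    df["First Name"] = df["A"].apply(
        lambda x: next((t for t in x.split() if t in known), x) if isinstance(x, str) else "Not Found"
    )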


    Output:



                 A First Name
    0    Jake Hill       Jake
    1    Matt Dawn       Matt
    2    Matt King       Matt
    3  Henry White      Henry
    4    Hyde Jake       Jake




























    • Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
      – Matt
      Nov 20 at 5:47


















    up vote
    0
    down vote













    Try using:



    A_final = df['A'].str.split(' ', n=1).str.get(0)
    A_final

    This returns the first word of each row, so it resolves the problem only when the first name comes first.





























    • What is this doing?
      – pygo
      Nov 20 at 6:03


















    up vote
    0
    down vote













    Following my earlier edit: I now understand that you want an in-place replacement. This can be done by splitting column A, taking the first element of each split, and checking it against the name list.



    DataFrame Structure:



    df
    A
    0 Jake Hill
    1 Matt Dawn
    2 Matt King
    3 Henry White
    4 Jake Hyde


    Your name variable:



    >>> name
    ['Jake', 'Matt', 'Henry']


    Your final desired dataset:



    Parameter n can be used to limit the number of splits in the output.



    # first word of each row; keep it only when it is a listed name, otherwise keep the original value
    first = df['A'].str.split(n=1, expand=True)[0]
    df['A'] = first.where(first.isin(name), df['A'])

    print(df)
    A
    0 Jake
    1 Matt
    2 Matt
    3 Henry
    4 Jake




    It is simpler if you do not need to take the names from a variable and the goal is just to get the first word of each row:



    >>> df
    A
    0 Jake Hill
    1 Matt Dawn
    2 Matt King
    3 Henry White
    4 Jake Hyde


    >>> df['A'].str.split(n=1, expand=True)[0]
    0 Jake
    1 Matt
    2 Matt
    3 Henry
    4 Jake
    Name: 0, dtype: object


    Or, in case you want to replace column A in place:



    df['A'] = df['A'].str.split(n=1, expand=True)[0]




























    • your input df is different from the user input. In this problem first name is customised.
      – Mohamed Thasin ah
      Nov 20 at 5:59










    • @MohamedThasinah, thnx for the feedback but did not get you, but intent is same.
      – pygo
      Nov 20 at 6:00












    • In your input df at 3 rd index, user provides as White Henry but you took it as Henry White.
      – Mohamed Thasin ah
      Nov 20 at 6:02




















    up vote
    0
    down vote













    This method won't be fooled by a last name containing one of the first name strings, such as "Matten" or "Jakes", and will combine a first and last name if they are both found in the first names list, such as "Matt Henry" (shows "MattHenry" in the output dataframe).



    import numpy as np
    import pandas as pd

    # split the name strings into columns as new dataframe
    df1 = df.A.str.split(' ', expand=True)
    # Keep the first names in the new dataframe and fill the rest with
    # empty strings, then sum the df1 column string values to make a new array
    names_result = np.where(df1.isin(name), df1, '').sum(axis=1)
    # find the array indexes where no first names were found
    no_match_idx = np.where(names_result == '')[0]
    # fill the no first name index locations with original dataframe values
    names_result[no_match_idx] = df.A.values[no_match_idx]
    # make a dataframe using the results
    df_out = pd.DataFrame(names_result, columns=['A'])

    # to find names with a first and last name that are both found in the
    # first names list:
    # df_out['dups'] = df1.isin(name).sum(axis=1) > 1
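
The behaviour described above (whole-word matches anywhere in the row, several matches concatenated, original value kept when nothing matches) can also be sketched with a regular expression. Note that this swaps in `str.findall` and `str.join` for the split and `np.where` steps of the answer, so it is an alternative rather than the answer's own method. The sample rows are made up for illustration.

```python
import re
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'White Henry', 'Matt Henry', 'Ann Jakes']})

# Non-capturing group so findall returns the whole-word matches themselves
pattern = r'\b(?:' + '|'.join(re.escape(n) for n in name) + r')\b'

# findall gives a list of matches per row; join concatenates them
joined = df['A'].str.findall(pattern).str.join('')

# Rows with no match produce an empty string: restore the original value there
df_out = pd.DataFrame({'A': joined.where(joined != '', df['A'])})
print(df_out['A'].tolist())
```

Here 'Matt Henry' becomes 'MattHenry' and 'Ann Jakes' is left unchanged, matching the description in the answer.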





























































      edited Nov 20 at 5:53

























      answered Nov 20 at 5:37









      Sociopath

      3,30971535














          answered Nov 20 at 5:59









          user3483203

          29.7k72353














              answered Nov 20 at 5:37









              Gerges Dib

              2,7331719












              edited Nov 20 at 5:51

























              answered Nov 20 at 5:40









              Chirag

              1,126311












              edited Nov 20 at 6:05

























              answered Nov 20 at 6:01









              Jeet Bhattachariya

              11




              11












              • What is this doing?
                – pygo
                Nov 20 at 6:03


















              • What is this doing?
                – pygo
                Nov 20 at 6:03
















              What is this doing?
              – pygo
              Nov 20 at 6:03




              What is this doing?
              – pygo
              Nov 20 at 6:03










              up vote
              0
              down vote













              In addition to earlier edit, Which i understood now you want to inplace replacement, Which can be done with list comprehension as follows with splitting the column A Fist and choose the First Index of of it and passing to lambda using apply method.



              DataFrame Structure:



              df
              A
              0 Jake Hill
              1 Matt Dawn
              2 Matt King
              3 Henry White
              4 Jake Hyde


              Your name Var..



              $ name
              ['Jake', 'Matt', 'Henry']


              Your Final desired Dataset:



              Parameter n can be used to limit the number of splits in the output.



              df['A'] = df['A'].str.split(n=1, expand=True)[0].apply(lambda x: x if x in name else ' '.join(x))

              print(df)
              A
              0 Jake
              1 Matt
              2 Matt
              3 Henry
              4 Jake




              If you do not need to check against the name list and the end goal is simply to get the first word of each row, it is simpler:



              >>> df
              A
              0 Jake Hill
              1 Matt Dawn
              2 Matt King
              3 Henry White
              4 Jake Hyde


              >>> df['A'].str.split(n=1, expand=True)[0]
              0 Jake
              1 Matt
              2 Matt
              3 Henry
              4 Jake
              Name: 0, dtype: object


              Or, if you want to replace column A in place:



              df['A'] = df['A'].str.split(n=1, expand=True)[0]
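Splitting and taking element 0 assumes the first name is always the first word. For rows such as White Henry or Hyde Jake in the question, one possible alternative is to search for any listed name anywhere in the string. The sketch below is not part of the answer above. It assumes the names contain only ordinary word characters, builds a whole-word regex from `name`, and falls back to the original value when nothing matches. The row Dave Atkins is added only to illustrate that fallback.

```python
import re
import pandas as pd

name = ["Jake", "Matt", "Henry"]
df = pd.DataFrame(
    {"A": ["Jake Hill", "Matt Dawn", "Matt King", "White Henry", "Hyde Jake", "Dave Atkins"]}
)

# Whole-word alternation, here \b(Jake|Matt|Henry)\b; re.escape guards special characters.
pattern = r"\b(" + "|".join(map(re.escape, name)) + r")\b"

# str.extract returns NaN where nothing matches; fillna restores the original value.
found = df["A"].str.extract(pattern, expand=False)
df["A"] = found.fillna(df["A"])

print(df["A"].tolist())
# ['Jake', 'Matt', 'Matt', 'Henry', 'Jake', 'Dave Atkins']
```

Because the pattern uses \b word boundaries, a surname such as Matten is not mistaken for Matt. If a row contains two listed names, extract keeps only the first match.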




























              • Your input df is different from the user's input. In this problem the first name is customised.
                – Mohamed Thasin ah
                Nov 20 at 5:59










              • @MohamedThasinah, thanks for the feedback. I did not quite follow you, but the intent is the same.
                – pygo
                Nov 20 at 6:00












              • At index 3 of your input df, the user has White Henry, but you used Henry White.
                – Mohamed Thasin ah
                Nov 20 at 6:02

























              edited Nov 20 at 6:55

























              answered Nov 20 at 5:44









              pygo

























              up vote
              0
              down vote













              This method won't be fooled by a last name containing one of the first name strings, such as "Matten" or "Jakes", and will combine a first and last name if they are both found in the first names list, such as "Matt Henry" (shows "MattHenry" in the output dataframe).



              import numpy as np
              import pandas as pd

              # split the name strings into columns of a new dataframe
              df1 = df.A.str.split(' ', expand=True)
              # Keep the first names in the new dataframe and fill the rest with
              # empty strings, then sum the df1 column string values to make a new array
              names_result = np.where(df1.isin(name), df1, '').sum(axis=1)
              # find the array indexes where no first names were found
              no_match_idx = np.where(names_result == '')[0]
              # fill the no first name index locations with original dataframe values
              names_result[no_match_idx] = df.A.values[no_match_idx]
              # make a dataframe using the results
              df_out = pd.DataFrame(names_result, columns=['A'])

              # to find names with a first and last name that are both found in the
              # first names list:
              # df_out['dups'] = df1.isin(name).sum(axis=1) > 1
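To see the edge cases described above, the same steps can be wrapped in a small function and run on sample rows. This is only a sketch: the function name `keep_listed_names` and the rows Bob Matten, Matt Henry and Dave Atkins are made up for illustration.

```python
import numpy as np
import pandas as pd


def keep_listed_names(df, name):
    """Return a new DataFrame whose column A keeps only words found in `name`."""
    # One column per space-separated word.
    tokens = df["A"].str.split(" ", expand=True)
    # Blank out words not in the list, then concatenate what is left in each row.
    kept = np.where(tokens.isin(name), tokens, "").sum(axis=1)
    # Rows with no listed name fall back to the original value.
    no_match = kept == ""
    kept[no_match] = df["A"].to_numpy()[no_match]
    return pd.DataFrame({"A": kept}, index=df.index)


sample = pd.DataFrame(
    {"A": ["Jake Hill", "White Henry", "Bob Matten", "Matt Henry", "Dave Atkins"]}
)
out = keep_listed_names(sample, ["Jake", "Matt", "Henry"])
print(out["A"].tolist())
# ['Jake', 'Henry', 'Bob Matten', 'MattHenry', 'Dave Atkins']
```

Matching on whole words is why Bob Matten is left untouched, and the row-wise concatenation is why Matt Henry comes out as MattHenry.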








































                  edited Nov 21 at 2:38

























                  answered Nov 21 at 2:00









                  b2002






























