Split string from a preset list of strings from pandas df column
3 votes
I have a pandas DataFrame that looks like the one below. It has about a million rows.
name = ['Jake','Matt', 'Henry']
0 A
1 Jake Hill
2 Matt Dawn
3 Matt King
4 White Henry
5 Hyde Jake
I want to iterate over the list and the df['A'] column and return only the first names. For example, the final dataframe should look like this.
0 A
1 Jake
2 Matt
3 Matt
4 Henry
5 Jake
Thanks in advance. I am new to Python, so I am still figuring out the easiest way to do this.
python python-3.x pandas python-2.7
edited Nov 20 at 5:45
asked Nov 20 at 5:29
Matt
What if the value of column A doesn't exist in the list?
– Sociopath
Nov 20 at 5:31
What about first names that aren't Jake, Matt, or Henry? Do you want to filter them out?
– CIsForCookies
Nov 20 at 5:31
Then the original name should be retained. For example, if the name is Dave Atkins, it should stay Dave Atkins. But I have made sure that I have all the names, so that should not be a problem.
– Matt
Nov 20 at 5:33
7 Answers
2 votes, accepted
You need:
import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde', 'Dwayne John']})

def func(x):
    # return the first listed name found anywhere in the string
    for k in first_name:
        if k in x:
            return k
    # no listed name found: keep the original value
    return x

df['A'] = df['A'].apply(func)
Output:
A
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
5 Dwayne John
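One caveat with the `k in x` test above is that it matches substrings, so a surname such as "Jakes" would also return "Jake". A minimal sketch of a whole-word variant follows; the helper name `pick_first_name` and the sample rows "Ann Jakes" and "White Henry" are illustrative assumptions, not part of the answer.

```python
import pandas as pd

first_name = ['Jake', 'Matt', 'Henry']
# Hypothetical sample rows chosen to exercise the edge cases.
df = pd.DataFrame({'A': ['Jake Hill', 'White Henry', 'Ann Jakes', 'Dwayne John']})

wanted = set(first_name)  # set membership keeps each word lookup cheap

def pick_first_name(full_name):
    # Compare whole words, so 'Jakes' does not match 'Jake'.
    for word in full_name.split():
        if word in wanted:
            return word
    # No listed name found: keep the original value.
    return full_name

result = df['A'].apply(pick_first_name)
print(result.tolist())
```

Because every word is checked, the position of the first name within the string does not matter.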
edited Nov 20 at 5:53
answered Nov 20 at 5:37
Sociopath
Hey. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
– Matt
Nov 20 at 5:48
3 votes
You have a list of names to match, and a Series of names to check against. Use a regular expression with str.extract here.
df.A.str.extract(r'({})'.format('|'.join(name)))
0
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
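As written, the pattern yields NaN for rows that contain none of the names, while the asker wants the original value kept in that case. A hedged sketch of one way to handle this, assuming word boundaries are wanted and using "Dave Atkins" (from the question's comments) as the non-matching row:

```python
import re
import pandas as pd

name = ['Jake', 'Matt', 'Henry']
# Sample rows; 'Dave Atkins' stands in for a row with no listed name.
df = pd.DataFrame({'A': ['Jake Hill', 'White Henry', 'Hyde Jake', 'Dave Atkins']})

# \b anchors each alternative to whole words; re.escape guards against
# regex metacharacters appearing in the names.
pattern = r'\b(' + '|'.join(re.escape(n) for n in name) + r')\b'

# expand=False returns a Series; rows without a match come back as NaN,
# which fillna replaces with the original value from column A.
extracted = df['A'].str.extract(pattern, expand=False).fillna(df['A'])
print(extracted.tolist())
```

The `fillna(df['A'])` step aligns on the index, so each unmatched row falls back to its own original string.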
answered Nov 20 at 5:59
user3483203
1 vote
Here is one method to achieve this:
first_name = ['Jake','Matt', 'Henry']
df = pd.DataFrame({'A': ['Jake Hill', 'Matt Dawn', 'Matt King', 'Henry White', 'Jake Hyde']})
df['B'] = df['A'].str.split().apply(lambda x: x[0] if x[0] in first_name else ' '.join(x))
and you get:
A B
0 Jake Hill Jake
1 Matt Dawn Matt
2 Matt King Matt
3 Henry White Henry
4 Jake Hyde Jake
answered Nov 20 at 5:37
Gerges Dib
Hey Gerges. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
– Matt
Nov 20 at 5:46
0 votes
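import pandas as pd

name = ['Jake', 'Matt', 'Henry']
df = pd.read_csv("file.csv")
# fill NaN values, in case there are any
df.fillna(0, inplace=True)
# intersect the words in each row with the name list and take one match;
# note: this raises IndexError for a row that contains none of the names
df["First Name"] = df.A.apply(lambda x: list(set(x.split(" ")) & set(name))[0] if x != 0 else "Not Found")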
Output:
A First Name
0 Jake Hill Jake
1 Matt Dawn Matt
2 Matt King Matt
3 Henry White Henry
4 Hyde Jake Jake
edited Nov 20 at 5:51
answered Nov 20 at 5:40
Chirag
Hey Chirag. This specifically takes the first string after the split but it does not work if you have to extract a specific string from the column rows. I edited the question a little bit. Please have a look.
– Matt
Nov 20 at 5:47
0 votes
Try the following, which takes the first word of each value in column A:
A_final = df['A'].str.split(' ', n=1).str.get(0)
A_final[0]
edited Nov 20 at 6:05
answered Nov 20 at 6:01
Jeet Bhattachariya
What is this doing?
– pygo
Nov 20 at 6:03
0 votes
Following up on my earlier edit: as I now understand it, you want an in-place replacement. This can be done by first splitting column A, taking the first element of each split, and passing it to a lambda with the apply method.
DataFrame structure:
>>> df
A
0 Jake Hill
1 Matt Dawn
2 Matt King
3 Henry White
4 Jake Hyde
Your name variable:
>>> name
['Jake', 'Matt', 'Henry']
Your final desired dataset (the parameter n limits the number of splits):
df['A'] = df['A'].str.split(n=1).apply(lambda x: x[0] if x[0] in name else ' '.join(x))
print(df)
A
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
It is simpler if you are not required to take the names from a variable and the end goal is just to get the first name from the DataFrame:
>>> df
A
0 Jake Hill
1 Matt Dawn
2 Matt King
3 Henry White
4 Jake Hyde
>>> df['A'].str.split(n=1, expand=True)[0]
0 Jake
1 Matt
2 Matt
3 Henry
4 Jake
Name: 0, dtype: object
Or, in case you want an in-place replacement for column A:
df['A'] = df['A'].str.split(n=1, expand=True)[0]
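To make the effect of the n parameter concrete, here is a small sketch. The three-word value "Mary Ann Smith" is a made-up example, not data from the question.

```python
import pandas as pd

# Hypothetical values; the second one has more than two words.
s = pd.Series(['Jake Hill', 'Mary Ann Smith'])

# n=1 performs at most one split, so everything after the first
# run of whitespace stays together in the second column.
parts = s.str.split(n=1, expand=True)

first_words = parts[0].tolist()
remainders = parts[1].tolist()
print(first_words)
print(remainders)
```

Note that this only ever returns the first word, so it does not cover rows such as "White Henry" where the listed name comes second.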
edited Nov 20 at 6:55
answered Nov 20 at 5:44
pygo
your input df is different from the user input. In this problem first name is customised.
– Mohamed Thasin ah
Nov 20 at 5:59
@MohamedThasinah, thnx for the feedback but did not get you, but intent is same.
– pygo
Nov 20 at 6:00
In your input df, at index 3, the user provides White Henry, but you took it as Henry White.
– Mohamed Thasin ah
Nov 20 at 6:02
0 votes
This method won't be fooled by a last name containing one of the first name strings, such as "Matten" or "Jakes", and will combine a first and last name if they are both found in the first names list, such as "Matt Henry" (shows "MattHenry" in the output dataframe).
# assumes df and name are defined as in the question
import numpy as np
import pandas as pd
# split the name strings into columns as new dataframe
df1 = df.A.str.split(' ', expand=True)
# Keep the first names in the new dataframe and fill the rest with
# empty strings, then sum the df1 column string values to make a new array
names_result = np.where(df1.isin(name), df1, '').sum(axis=1)
# find the array indexes where no first names were found
no_match_idx = np.where(names_result == '')[0]
# fill the no first name index locations with original dataframe values
names_result[no_match_idx] = df.A.values[no_match_idx]
# make a dataframe using the results
df_out = pd.DataFrame(names_result, columns=['A'])
# to find names with a first and last name that are both found in the
# first names list:
# df_out['dups'] = df1.isin(name).sum(axis=1) > 1
edited Nov 21 at 2:38
answered Nov 21 at 2:00
b2002