Merge two DataFrames based on a column condition and values of a specific column with Pandas in Python 3.x
up vote
0
down vote
favorite
refering to my a few days ago asked question i now have an additional problem with my data. I have following two DataFrames:
>>> df1
A B date
0 1 1 2015-02
1 1 1 2015-03
2 2 2 2017-01
3 2 2 2017-02
>>> df2
A B C 02-2015 03-2015 01-2017 02-2017
0 1 1 2013-07-01 0.10 0.22 0.55 0.77
1 1 1 2015-01-01 0.20 0.12 0.99 0.125
2 2 2 2016-12-01 0.13 0.15 0.15 0.245
3 2 2 2016-01-01 0.33 0.1 0.888 0.64
What i want is following DataFrame:
>>> df1
A B date value
0 1 1 2015-02 0.20
1 1 1 2015-03 0.12
2 2 2 2017-01 0.15
3 2 2 2017-02 0.245
My current code looks like following:
df1['value'] = df2.set_index('A', 'B').lookup(
df1.set_index('A', 'B').index, df1['date'])
This does not work and my df1 is a NoneType because in df2 are duplicate rows with condition A and B == 1. What I want is an additional condition where it first extracts the earliest date for each unqiue A and B, which would be for A and B == 1 the date 2015-02.
From df2 it should take row number 1 because the delta in months is only 1 instead of row 0 where the delta will be 18.
Many thanks in advance!
python pandas dataframe merge mapping
add a comment |
up vote
0
down vote
favorite
refering to my a few days ago asked question i now have an additional problem with my data. I have following two DataFrames:
>>> df1
A B date
0 1 1 2015-02
1 1 1 2015-03
2 2 2 2017-01
3 2 2 2017-02
>>> df2
A B C 02-2015 03-2015 01-2017 02-2017
0 1 1 2013-07-01 0.10 0.22 0.55 0.77
1 1 1 2015-01-01 0.20 0.12 0.99 0.125
2 2 2 2016-12-01 0.13 0.15 0.15 0.245
3 2 2 2016-01-01 0.33 0.1 0.888 0.64
What i want is following DataFrame:
>>> df1
A B date value
0 1 1 2015-02 0.20
1 1 1 2015-03 0.12
2 2 2 2017-01 0.15
3 2 2 2017-02 0.245
My current code looks like following:
df1['value'] = df2.set_index('A', 'B').lookup(
df1.set_index('A', 'B').index, df1['date'])
This does not work and my df1 is a NoneType because in df2 are duplicate rows with condition A and B == 1. What I want is an additional condition where it first extracts the earliest date for each unqiue A and B, which would be for A and B == 1 the date 2015-02.
From df2 it should take row number 1 because the delta in months is only 1 instead of row 0 where the delta will be 18.
Many thanks in advance!
python pandas dataframe merge mapping
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
refering to my a few days ago asked question i now have an additional problem with my data. I have following two DataFrames:
>>> df1
A B date
0 1 1 2015-02
1 1 1 2015-03
2 2 2 2017-01
3 2 2 2017-02
>>> df2
A B C 02-2015 03-2015 01-2017 02-2017
0 1 1 2013-07-01 0.10 0.22 0.55 0.77
1 1 1 2015-01-01 0.20 0.12 0.99 0.125
2 2 2 2016-12-01 0.13 0.15 0.15 0.245
3 2 2 2016-01-01 0.33 0.1 0.888 0.64
What i want is following DataFrame:
>>> df1
A B date value
0 1 1 2015-02 0.20
1 1 1 2015-03 0.12
2 2 2 2017-01 0.15
3 2 2 2017-02 0.245
My current code looks like following:
df1['value'] = df2.set_index('A', 'B').lookup(
df1.set_index('A', 'B').index, df1['date'])
This does not work and my df1 is a NoneType because in df2 are duplicate rows with condition A and B == 1. What I want is an additional condition where it first extracts the earliest date for each unqiue A and B, which would be for A and B == 1 the date 2015-02.
From df2 it should take row number 1 because the delta in months is only 1 instead of row 0 where the delta will be 18.
Many thanks in advance!
python pandas dataframe merge mapping
refering to my a few days ago asked question i now have an additional problem with my data. I have following two DataFrames:
>>> df1
A B date
0 1 1 2015-02
1 1 1 2015-03
2 2 2 2017-01
3 2 2 2017-02
>>> df2
A B C 02-2015 03-2015 01-2017 02-2017
0 1 1 2013-07-01 0.10 0.22 0.55 0.77
1 1 1 2015-01-01 0.20 0.12 0.99 0.125
2 2 2 2016-12-01 0.13 0.15 0.15 0.245
3 2 2 2016-01-01 0.33 0.1 0.888 0.64
What i want is following DataFrame:
>>> df1
A B date value
0 1 1 2015-02 0.20
1 1 1 2015-03 0.12
2 2 2 2017-01 0.15
3 2 2 2017-02 0.245
My current code looks like following:
df1['value'] = df2.set_index('A', 'B').lookup(
df1.set_index('A', 'B').index, df1['date'])
This does not work and my df1 is a NoneType because in df2 are duplicate rows with condition A and B == 1. What I want is an additional condition where it first extracts the earliest date for each unqiue A and B, which would be for A and B == 1 the date 2015-02.
From df2 it should take row number 1 because the delta in months is only 1 instead of row 0 where the delta will be 18.
Many thanks in advance!
python pandas dataframe merge mapping
python pandas dataframe merge mapping
edited Nov 20 at 10:27
asked Nov 20 at 10:12
Michael Gann
554
554
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
It can be achieved by using melt, lambda, sort_values, drop_dulicates
as below
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
output is
A B date value
0 1.0 1.0 2015-02 0.200
1 1.0 1.0 2015-03 0.120
2 2.0 2.0 2017-01 0.150
3 2.0 2.0 2017-02 0.245
complete example is as below.
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame(columns = ['A', 'B', 'date'])
df1.loc[len(df1)] = [1, 1, '2015-02']
df1.loc[len(df1)] = [1, 1, '2015-03']
df1.loc[len(df1)] = [2, 2, '2017-01']
df1.loc[len(df1)] = [2, 2, '2017-02']
df1[['A', 'B']] = df1[['A', 'B']].astype(float)
df2 = pd.DataFrame(columns = ['A', 'B', 'C', '2015-02', '2015-03', '2017-01', '2017-02'])
df2.loc[len(df2)] = [1, 1, '2013-07-01', 0.10, 0.22, 0.55, 0.77]
df2.loc[len(df2)] = [1, 1, '2015-01-01', 0.20, 0.12, 0.99, 0.125]
df2.loc[len(df2)] = [2, 2, '2016-12-01', 0.13, 0.15, 0.15, 0.245]
df2.loc[len(df2)] = [2, 2, '2016-01-01', 0.33, 0.1, 0.888, 0.64]
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
print(df4)
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
It can be achieved by using melt, lambda, sort_values, drop_dulicates
as below
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
output is
A B date value
0 1.0 1.0 2015-02 0.200
1 1.0 1.0 2015-03 0.120
2 2.0 2.0 2017-01 0.150
3 2.0 2.0 2017-02 0.245
complete example is as below.
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame(columns = ['A', 'B', 'date'])
df1.loc[len(df1)] = [1, 1, '2015-02']
df1.loc[len(df1)] = [1, 1, '2015-03']
df1.loc[len(df1)] = [2, 2, '2017-01']
df1.loc[len(df1)] = [2, 2, '2017-02']
df1[['A', 'B']] = df1[['A', 'B']].astype(float)
df2 = pd.DataFrame(columns = ['A', 'B', 'C', '2015-02', '2015-03', '2017-01', '2017-02'])
df2.loc[len(df2)] = [1, 1, '2013-07-01', 0.10, 0.22, 0.55, 0.77]
df2.loc[len(df2)] = [1, 1, '2015-01-01', 0.20, 0.12, 0.99, 0.125]
df2.loc[len(df2)] = [2, 2, '2016-12-01', 0.13, 0.15, 0.15, 0.245]
df2.loc[len(df2)] = [2, 2, '2016-01-01', 0.33, 0.1, 0.888, 0.64]
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
print(df4)
add a comment |
up vote
0
down vote
It can be achieved by using melt, lambda, sort_values, drop_dulicates
as below
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
output is
A B date value
0 1.0 1.0 2015-02 0.200
1 1.0 1.0 2015-03 0.120
2 2.0 2.0 2017-01 0.150
3 2.0 2.0 2017-02 0.245
complete example is as below.
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame(columns = ['A', 'B', 'date'])
df1.loc[len(df1)] = [1, 1, '2015-02']
df1.loc[len(df1)] = [1, 1, '2015-03']
df1.loc[len(df1)] = [2, 2, '2017-01']
df1.loc[len(df1)] = [2, 2, '2017-02']
df1[['A', 'B']] = df1[['A', 'B']].astype(float)
df2 = pd.DataFrame(columns = ['A', 'B', 'C', '2015-02', '2015-03', '2017-01', '2017-02'])
df2.loc[len(df2)] = [1, 1, '2013-07-01', 0.10, 0.22, 0.55, 0.77]
df2.loc[len(df2)] = [1, 1, '2015-01-01', 0.20, 0.12, 0.99, 0.125]
df2.loc[len(df2)] = [2, 2, '2016-12-01', 0.13, 0.15, 0.15, 0.245]
df2.loc[len(df2)] = [2, 2, '2016-01-01', 0.33, 0.1, 0.888, 0.64]
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
print(df4)
add a comment |
up vote
0
down vote
up vote
0
down vote
It can be achieved by using melt, lambda, sort_values, drop_dulicates
as below
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
output is
A B date value
0 1.0 1.0 2015-02 0.200
1 1.0 1.0 2015-03 0.120
2 2.0 2.0 2017-01 0.150
3 2.0 2.0 2017-02 0.245
complete example is as below.
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame(columns = ['A', 'B', 'date'])
df1.loc[len(df1)] = [1, 1, '2015-02']
df1.loc[len(df1)] = [1, 1, '2015-03']
df1.loc[len(df1)] = [2, 2, '2017-01']
df1.loc[len(df1)] = [2, 2, '2017-02']
df1[['A', 'B']] = df1[['A', 'B']].astype(float)
df2 = pd.DataFrame(columns = ['A', 'B', 'C', '2015-02', '2015-03', '2017-01', '2017-02'])
df2.loc[len(df2)] = [1, 1, '2013-07-01', 0.10, 0.22, 0.55, 0.77]
df2.loc[len(df2)] = [1, 1, '2015-01-01', 0.20, 0.12, 0.99, 0.125]
df2.loc[len(df2)] = [2, 2, '2016-12-01', 0.13, 0.15, 0.15, 0.245]
df2.loc[len(df2)] = [2, 2, '2016-01-01', 0.33, 0.1, 0.888, 0.64]
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
print(df4)
It can be achieved by using melt, lambda, sort_values, drop_dulicates
as below
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
output is
A B date value
0 1.0 1.0 2015-02 0.200
1 1.0 1.0 2015-03 0.120
2 2.0 2.0 2017-01 0.150
3 2.0 2.0 2017-02 0.245
complete example is as below.
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame(columns = ['A', 'B', 'date'])
df1.loc[len(df1)] = [1, 1, '2015-02']
df1.loc[len(df1)] = [1, 1, '2015-03']
df1.loc[len(df1)] = [2, 2, '2017-01']
df1.loc[len(df1)] = [2, 2, '2017-02']
df1[['A', 'B']] = df1[['A', 'B']].astype(float)
df2 = pd.DataFrame(columns = ['A', 'B', 'C', '2015-02', '2015-03', '2017-01', '2017-02'])
df2.loc[len(df2)] = [1, 1, '2013-07-01', 0.10, 0.22, 0.55, 0.77]
df2.loc[len(df2)] = [1, 1, '2015-01-01', 0.20, 0.12, 0.99, 0.125]
df2.loc[len(df2)] = [2, 2, '2016-12-01', 0.13, 0.15, 0.15, 0.245]
df2.loc[len(df2)] = [2, 2, '2016-01-01', 0.33, 0.1, 0.888, 0.64]
df3 = df2.melt(id_vars = ['A', 'B', 'C'], var_name='date')
df3[['A', 'B']] = df3[['A', 'B']].astype(float)
df3['Diff'] = df3.apply(lambda row: abs(datetime.strptime(row['date'], '%Y-%m') - datetime.strptime(row['C'], '%Y-%m-%d')), axis=1)
df3.sort_values(['Diff'], ascending=[True], inplace=True)
df3.drop_duplicates(subset=['A', 'B', 'date'], keep='first', inplace=True)
df3.drop(['C', 'Diff'], 1, inplace=True)
df4 = df1.merge(df3, on=['A', 'B', 'date'], how='left')
print(df4)
answered Nov 20 at 19:11
Prince Francis
24428
24428
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53390684%2fmerge-two-dataframes-based-on-a-column-condition-and-values-of-a-specific-column%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown