Pandas dataframe self-dependency in data to fill a column
I have a dataframe with data as follows (shown in a picture in the original post):
The value of "relation" is determined by the codeid. Leather has codeid = 11, which has already appeared against bag, so in "relation" we put the value bag.
The same happens for shoes.
To do: fill the "relation" column by checking codeid using dataframe operations. Any help would be appreciated.
Edit: The same codeid, e.g. 11, can appear more than twice. But "relation" can only have the value bag, because bag is the first one to have codeid = 11. I have updated the picture as well.
python pandas
edited Oct 30 '18 at 11:19
asked Oct 30 '18 at 11:08
frozen shine
3
Will the codes appear only twice? And should one take the name of the first appearance of the code only?
– Franco Piccolo
Oct 30 '18 at 11:13
3 Answers
If you want only the first duplicate's value copied to the last duplicated row, use transform with first, and then set NaN values with loc and duplicated:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5],
                   'name':list('brslp'),
                   'codeid':[11,12,13,11,13]})

# name of the first row in each codeid group, repeated for every row of the group
df['relation'] = df.groupby('codeid')['name'].transform('first')
print (df)
   id name  codeid relation
0   1    b      11        b
1   2    r      12        r
2   3    s      13        s
3   4    l      11        b
4   5    p      13        s

# mark every occurrence of a repeated codeid except the last one
print (df['codeid'].duplicated(keep='last'))
0     True
1    False
2     True
3    False
4    False
Name: codeid, dtype: bool

# mark rows whose codeid is unique (mask of all duplicated rows, inverted by ~)
print (~df['codeid'].duplicated(keep=False))
0    False
1     True
2    False
3    False
4    False
Name: codeid, dtype: bool

# combine the boolean masks with |
print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))
0     True
1     True
2     True
3    False
4    False
Name: codeid, dtype: bool

# set relation to NaN where the combined mask is True
df.loc[df['codeid'].duplicated(keep='last') |
       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan
print (df)
   id name  codeid relation
0   1    b      11      NaN
1   2    r      12      NaN
2   3    s      13      NaN
3   4    l      11        b
4   5    p      13        s
edited Oct 30 '18 at 12:03
answered Oct 30 '18 at 11:19
jezrael
Could you kindly explain the code? It seems to be working, but it's not working at my end.
– frozen shine
Oct 30 '18 at 11:39
@frozenshine - Can you explain more about why it is not working? Is the problem in the sample data or in the real data?
– jezrael
Oct 30 '18 at 11:40
I am testing your logic on real data. The last statement is making all values NaN, not just the first ones.
– frozen shine
Oct 30 '18 at 11:43
@frozenshine - Hmm, so the real data are different from the sample data. Is it possible to add more rows and create a minimal, complete, and verifiable example?
– jezrael
Oct 30 '18 at 11:46
No, the data follows exactly the same pattern that I showed. I only need to figure out why just the np.nan line is turning all rows into NaN.
– frozen shine
Oct 30 '18 at 11:57
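One case the masks above do not cover is the one from the question's edit, where a codeid appears more than twice: duplicated(keep='last') also flags the middle occurrences, so they would end up as NaN. Below is a minimal sketch of one possible variant, not the answerer's code. It assumes the same column names, and the full item names and the extra 'canvas' row are hypothetical sample data. It blanks only the first row of each codeid and gives every later row the first name.

```python
import pandas as pd

# Hypothetical sample data; codeid 11 appears three times.
df = pd.DataFrame({
    "name": ["bag", "shoes", "shopper", "leather", "plastic", "canvas"],
    "codeid": [11, 12, 13, 11, 13, 11],
})

# Name of the first row seen for each codeid, repeated for every row of the group.
first_name = df.groupby("codeid")["name"].transform("first")

# True for every row whose codeid has already appeared earlier
# (keep='first' is the default of duplicated()).
is_repeat = df["codeid"].duplicated()

# Keep the first name only on repeat rows; first occurrences become NaN.
df["relation"] = first_name.where(is_repeat)
print(df)
```

Because the mask depends only on whether a codeid was seen before, it should not matter how many times a code repeats.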
I think you want to do something like this:
import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'],
                   ['shoes', 12, 'null'],
                   ['shopper', 13, 'null'],
                   ['leather', 11, 'bag'],
                   ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])

def codeid_analysis(rows):
    # set relation from a hard-coded value for each codeid
    if rows['codeid'] == 11:
        rows['relation'] = 'bag'
    elif rows['codeid'] == 12:
        rows['relation'] = 'shirt' #for example. You should put what you want here
    elif rows['codeid'] == 13:
        rows['relation'] = 'pants' #for example. You should put what you want here
    return rows

# apply the function to every row (axis = 1)
result = df.apply(codeid_analysis, axis = 1)
print(result)
answered Oct 30 '18 at 11:30
Snedecor
Thanks, but unfortunately the question showed only sample data, and the real data is quite big. I can't use manual if/else. :(
– frozen shine
Oct 30 '18 at 11:58
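For data that is too large for hand-written branches, the same idea can be driven by a lookup built from the data itself. This is a minimal sketch, assuming the column names from the question; first_names is a hypothetical helper name, and the frame is the sample data without a pre-filled relation column.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["bag", "shoes", "shopper", "leather", "plastic"],
    "codeid": [11, 12, 13, 11, 13],
})

# One row per codeid (the first one seen), turned into a codeid -> name lookup.
first_names = df.drop_duplicates(subset="codeid").set_index("codeid")["name"]

# Look up every row's codeid, then blank out the rows that are
# themselves the first occurrence of their codeid.
df["relation"] = df["codeid"].map(first_names).where(df["codeid"].duplicated())
print(df)
```

The lookup plays the role of the if/elif chain, but it grows with the data instead of with the code.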
This is not the optimal solution, since it is costly in memory, but here is my try. df1 is created to hold the rows with null values in the relation column, since it seems that the nulls mark the first occurrence of each codeid. After some cleaning, the two dataframes are merged into one.
import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'],
                   ['shoes', 12, 'null'],
                   ['shopper', 13, 'null'],
                   ['leather', 11, 'bag'],
                   ['plastic', 13, 'shopper'],
                   ['something', 13, ""]], columns = ['name', 'codeid', 'relation'])

df1 = df.loc[df['relation'] == 'null'].copy()  # create a df with only the 'null' rows of relation
df1.drop_duplicates(subset=['name'], inplace=True)  # drop duplicate names and retain the first entry
df1 = df1.drop("relation", axis=1)  # drop the unneeded column
# merge the two dfs on codeid; the overlapping name column becomes name_x (original) and name_y (from df1)
final_df = pd.merge(df, df1, left_on='codeid', right_on='codeid')
answered Nov 22 '18 at 20:16
JoPapou13
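The merge idea can also be sketched without relying on a pre-filled 'null' marker, by taking the first row per codeid directly. This is only an illustration under that assumption; firsts is a hypothetical name and the frame repeats the sample rows above without the relation column.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["bag", "shoes", "shopper", "leather", "plastic", "something"],
    "codeid": [11, 12, 13, 11, 13, 13],
})

# First row per codeid, with its name relabelled as the value to copy into 'relation'.
firsts = df.drop_duplicates(subset="codeid").rename(columns={"name": "relation"})

# A left merge keeps every row of df, in order, and attaches the first name for its codeid.
final_df = df.merge(firsts, on="codeid", how="left")

# Rows that are the first occurrence of their codeid get no relation.
final_df["relation"] = final_df["relation"].where(final_df["codeid"].duplicated())
print(final_df)
```

Renaming before the merge avoids the name_x / name_y suffixes, so the result has just the name, codeid and relation columns.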