Pandas dataframe self-dependency in data to fill a column

-1

I have dataframe with data as:

enter image description here

The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.

ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.

Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

3

Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13

add a comment |

-1

I have dataframe with data as:

enter image description here

The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.

ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.

Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

3

Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13

add a comment |

-1

I have dataframe with data as:

enter image description here

The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.

ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.

Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

I have dataframe with data as:

enter image description here

The value of "relation" is determined from the codeid. Leather has "codeid"=11 which is already appeared against bag, so in relation we put the value bag.

Same happens for shoes.

ToDo: Fill the value of "relation", by putting check on codeid in terms of dataframes. Any help would be appreciated.

Edit: Same codeid e.g. 11 can appear > twice. But the "relation" can have only value as bag because bag is the first one to have codeid=11. i have updated the picture as well.

python pandas

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

edited Oct 30 '18 at 11:19

asked Oct 30 '18 at 11:08

frozen shine

5719

asked Oct 30 '18 at 11:08

frozen shine

5719

asked Oct 30 '18 at 11:08

frozen shine

5719

3

Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13

add a comment |

3

Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13

Will the codes appear only twice? And should one take the name of the first appearance of the code only?

– Franco Piccolo
Oct 30 '18 at 11:13

add a comment |

3 Answers
3

active

oldest

votes

If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:

df = pd.DataFrame({'id':[1,2,3,4,5],

                   'name':list('brslp'),

                   'codeid':[11,12,13,11,13]})



df['relation'] = df.groupby('codeid')['name'].transform('first')

print (df)

   id name  codeid relation

0   1    b      11        b

1   2    r      12        r

2   3    s      13        s

3   4    l      11        b

4   5    p      13        s

#get first duplicated values of codeid

print (df['codeid'].duplicated(keep='last'))

0     True

1    False

2     True

3    False

4    False

Name: codeid, dtype: bool



#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows   

print (~df['codeid'].duplicated(keep=False))

0    False

1     True

2    False

3    False

4    False

Name: codeid, dtype: bool



#chain boolen mask together 

print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))

0     True

1     True

2     True

3    False

4    False

Name: codeid, dtype: bool

#replace True values by mask by NaN 

df.loc[df['codeid'].duplicated(keep='last') | 

       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan

print (df)

   id name  codeid relation

0   1    b      11      NaN

1   2    r      12      NaN

2   3    s      13      NaN

3   4    l      11        b

4   5    p      13        s

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

|
show 3 more comments

I think you want to do something like this:

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])



def codeid_analysis(rows):

    if rows['codeid'] == 11:

        rows['relation'] = 'bag'

    elif rows['codeid'] == 12:

        rows['relation'] = 'shirt' #for example. You should put what you want here

    elif rows['codeid'] == 13:

        rows['relation'] = 'pants' #for example. You should put what you want here

    return rows



result = df.apply(codeid_analysis, axis = 1)

print(result)

answered Oct 30 '18 at 11:30

Snedecor

674

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

add a comment |

It is not the optimal solution since it is costly to your memory, but here is my try. df1 is created in order to hold the null values of the relation column, since it seems that nulls are the first occurrence. After some cleaning, the two dataframes are merged to provide into one.

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shopper'],

                  ['something',13,""]], columns = ['name', 'codeid', 'relation'])



df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation

df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry

df1=df1.drop("relation",axis=1)#drop the unneeded column



final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names

answered Nov 22 '18 at 20:16

JoPapou13

913

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53062984%2fpandas-dataframe-self-dependency-in-data-to-fill-a-column%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:

df = pd.DataFrame({'id':[1,2,3,4,5],

                   'name':list('brslp'),

                   'codeid':[11,12,13,11,13]})



df['relation'] = df.groupby('codeid')['name'].transform('first')

print (df)

   id name  codeid relation

0   1    b      11        b

1   2    r      12        r

2   3    s      13        s

3   4    l      11        b

4   5    p      13        s

#get first duplicated values of codeid

print (df['codeid'].duplicated(keep='last'))

0     True

1    False

2     True

3    False

4    False

Name: codeid, dtype: bool



#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows   

print (~df['codeid'].duplicated(keep=False))

0    False

1     True

2    False

3    False

4    False

Name: codeid, dtype: bool



#chain boolen mask together 

print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))

0     True

1     True

2     True

3    False

4    False

Name: codeid, dtype: bool

#replace True values by mask by NaN 

df.loc[df['codeid'].duplicated(keep='last') | 

       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan

print (df)

   id name  codeid relation

0   1    b      11      NaN

1   2    r      12      NaN

2   3    s      13      NaN

3   4    l      11        b

4   5    p      13        s

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

|
show 3 more comments

If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:

df = pd.DataFrame({'id':[1,2,3,4,5],

                   'name':list('brslp'),

                   'codeid':[11,12,13,11,13]})



df['relation'] = df.groupby('codeid')['name'].transform('first')

print (df)

   id name  codeid relation

0   1    b      11        b

1   2    r      12        r

2   3    s      13        s

3   4    l      11        b

4   5    p      13        s

#get first duplicated values of codeid

print (df['codeid'].duplicated(keep='last'))

0     True

1    False

2     True

3    False

4    False

Name: codeid, dtype: bool



#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows   

print (~df['codeid'].duplicated(keep=False))

0    False

1     True

2    False

3    False

4    False

Name: codeid, dtype: bool



#chain boolen mask together 

print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))

0     True

1     True

2     True

3    False

4    False

Name: codeid, dtype: bool

#replace True values by mask by NaN 

df.loc[df['codeid'].duplicated(keep='last') | 

       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan

print (df)

   id name  codeid relation

0   1    b      11      NaN

1   2    r      12      NaN

2   3    s      13      NaN

3   4    l      11        b

4   5    p      13        s

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

|
show 3 more comments

If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:

df = pd.DataFrame({'id':[1,2,3,4,5],

                   'name':list('brslp'),

                   'codeid':[11,12,13,11,13]})



df['relation'] = df.groupby('codeid')['name'].transform('first')

print (df)

   id name  codeid relation

0   1    b      11        b

1   2    r      12        r

2   3    s      13        s

3   4    l      11        b

4   5    p      13        s

#get first duplicated values of codeid

print (df['codeid'].duplicated(keep='last'))

0     True

1    False

2     True

3    False

4    False

Name: codeid, dtype: bool



#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows   

print (~df['codeid'].duplicated(keep=False))

0    False

1     True

2    False

3    False

4    False

Name: codeid, dtype: bool



#chain boolen mask together 

print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))

0     True

1     True

2     True

3    False

4    False

Name: codeid, dtype: bool

#replace True values by mask by NaN 

df.loc[df['codeid'].duplicated(keep='last') | 

       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan

print (df)

   id name  codeid relation

0   1    b      11      NaN

1   2    r      12      NaN

2   3    s      13      NaN

3   4    l      11        b

4   5    p      13        s

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

If want only first dupe value to last duplicated use transform with first and then set NaN values by loc with duplicated:

df = pd.DataFrame({'id':[1,2,3,4,5],

                   'name':list('brslp'),

                   'codeid':[11,12,13,11,13]})



df['relation'] = df.groupby('codeid')['name'].transform('first')

print (df)

   id name  codeid relation

0   1    b      11        b

1   2    r      12        r

2   3    s      13        s

3   4    l      11        b

4   5    p      13        s

#get first duplicated values of codeid

print (df['codeid'].duplicated(keep='last'))

0     True

1    False

2     True

3    False

4    False

Name: codeid, dtype: bool



#get all duplicated values of codeid with inverting boolenam mask by ~ for unique rows   

print (~df['codeid'].duplicated(keep=False))

0    False

1     True

2    False

3    False

4    False

Name: codeid, dtype: bool



#chain boolen mask together 

print (df['codeid'].duplicated(keep='last') | ~df['codeid'].duplicated(keep=False))

0     True

1     True

2     True

3    False

4    False

Name: codeid, dtype: bool

#replace True values by mask by NaN 

df.loc[df['codeid'].duplicated(keep='last') | 

       ~df['codeid'].duplicated(keep=False), 'relation'] = np.nan

print (df)

   id name  codeid relation

0   1    b      11      NaN

1   2    r      12      NaN

2   3    s      13      NaN

3   4    l      11        b

4   5    p      13        s

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

edited Oct 30 '18 at 12:03

answered Oct 30 '18 at 11:19

jezrael

329k23270349

answered Oct 30 '18 at 11:19

jezrael

329k23270349

answered Oct 30 '18 at 11:19

jezrael

329k23270349

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

|
show 3 more comments

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

Could you kindly explain the code, as it seems to be working but it's not working at my end

– frozen shine
Oct 30 '18 at 11:39

@frozenshine - Can you explain more why not working? Problem in sample data or in real?

– jezrael
Oct 30 '18 at 11:40

testing your logic on real data. the last statement is making all values NaN not just the first ones.

– frozen shine
Oct 30 '18 at 11:43

@frozenshine - hmmm, so real data are different like sample data, is possible add more rows, create minimal, complete, and verifiable example ?

– jezrael
Oct 30 '18 at 11:46

No the data is exactly following same pattern that I showed. I only need to figure out why only np.nan line is making all rows as "nan".

– frozen shine
Oct 30 '18 at 11:57

|
show 3 more comments

I think you want to do something like this:

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])



def codeid_analysis(rows):

    if rows['codeid'] == 11:

        rows['relation'] = 'bag'

    elif rows['codeid'] == 12:

        rows['relation'] = 'shirt' #for example. You should put what you want here

    elif rows['codeid'] == 13:

        rows['relation'] = 'pants' #for example. You should put what you want here

    return rows



result = df.apply(codeid_analysis, axis = 1)

print(result)

answered Oct 30 '18 at 11:30

Snedecor

674

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

add a comment |

I think you want to do something like this:

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])



def codeid_analysis(rows):

    if rows['codeid'] == 11:

        rows['relation'] = 'bag'

    elif rows['codeid'] == 12:

        rows['relation'] = 'shirt' #for example. You should put what you want here

    elif rows['codeid'] == 13:

        rows['relation'] = 'pants' #for example. You should put what you want here

    return rows



result = df.apply(codeid_analysis, axis = 1)

print(result)

answered Oct 30 '18 at 11:30

Snedecor

674

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

add a comment |

I think you want to do something like this:

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])



def codeid_analysis(rows):

    if rows['codeid'] == 11:

        rows['relation'] = 'bag'

    elif rows['codeid'] == 12:

        rows['relation'] = 'shirt' #for example. You should put what you want here

    elif rows['codeid'] == 13:

        rows['relation'] = 'pants' #for example. You should put what you want here

    return rows



result = df.apply(codeid_analysis, axis = 1)

print(result)

answered Oct 30 '18 at 11:30

Snedecor

674

I think you want to do something like this:

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shoes']], columns = ['name', 'codeid', 'relation'])



def codeid_analysis(rows):

    if rows['codeid'] == 11:

        rows['relation'] = 'bag'

    elif rows['codeid'] == 12:

        rows['relation'] = 'shirt' #for example. You should put what you want here

    elif rows['codeid'] == 13:

        rows['relation'] = 'pants' #for example. You should put what you want here

    return rows



result = df.apply(codeid_analysis, axis = 1)

print(result)

answered Oct 30 '18 at 11:30

Snedecor

674

answered Oct 30 '18 at 11:30

Snedecor

674

answered Oct 30 '18 at 11:30

Snedecor

674

answered Oct 30 '18 at 11:30

Snedecor

674

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

add a comment |

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

Thanks but unfortunately, the question showed only sample data, and real data is quiet big. Cant use manual if and else. :(

– frozen shine
Oct 30 '18 at 11:58

add a comment |

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shopper'],

                  ['something',13,""]], columns = ['name', 'codeid', 'relation'])



df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation

df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry

df1=df1.drop("relation",axis=1)#drop the unneeded column



final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names

answered Nov 22 '18 at 20:16

JoPapou13

913

add a comment |

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shopper'],

                  ['something',13,""]], columns = ['name', 'codeid', 'relation'])



df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation

df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry

df1=df1.drop("relation",axis=1)#drop the unneeded column



final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names

answered Nov 22 '18 at 20:16

JoPapou13

913

add a comment |

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shopper'],

                  ['something',13,""]], columns = ['name', 'codeid', 'relation'])



df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation

df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry

df1=df1.drop("relation",axis=1)#drop the unneeded column



final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names

answered Nov 22 '18 at 20:16

JoPapou13

913

import pandas as pd

df = pd.DataFrame([['bag', 11, 'null'], 

                  ['shoes', 12, 'null'], 

                  ['shopper', 13, 'null'], 

                  ['leather', 11, 'bag'], 

                  ['plastic', 13, 'shopper'],

                  ['something',13,""]], columns = ['name', 'codeid', 'relation'])



df1=df.loc[df['relation'] == 'null'].copy()#create a df with only null values in relation

df1.drop_duplicates(subset=['name'], inplace=True)#drops the duplicates and retains the first entry

df1=df1.drop("relation",axis=1)#drop the unneeded column



final_df=pd.merge(df, df1, left_on='codeid', right_on='codeid')#merge the two dfs on the columns names

answered Nov 22 '18 at 20:16

JoPapou13

913

answered Nov 22 '18 at 20:16

JoPapou13

913

answered Nov 22 '18 at 20:16

JoPapou13

913

answered Nov 22 '18 at 20:16

JoPapou13

913

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk