Filter a pandas df based on some rules in a yaml file

I have a configuration yaml file that is supposed to be used by customers and be easy to edit. In the yaml file there are some rules:

variables:
  used_often: ['good','bad', 3]

rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a = 5 and column_b in ${consequence} and column_c != 1'
    output: {'column_d': 1}
  - more rules like this

As you can see, I want to filter the df my_name on three columns (column_a = 5, etc.) and then add or change column_d in the result, so that every row that matched the query has column_d = 1.

My question is: how can I make the query easier to apply with pandas? As it stands, the query above should end up in pandas like this:

my_name[(my_name["column_a"] == 5) &
        (my_name["column_b"].isin(['good','bad', 3])) &
        (my_name["column_c"] != 1)]

So I have to do a lot of processing to interpret the query in the yaml. Are there any tools that could help me, or better ways to format the query? (I have complete freedom in building the yaml file, as long as it is reasonably easy for a client to write.)
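For reference, the filter-then-assign behaviour described above can be sketched with a boolean mask and .loc. This is only an illustration: the sample rows are made up, and only the column names and the used_often list come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
my_name = pd.DataFrame({
    "column_a": [5, 5, 2, 5],
    "column_b": ["good", "ugly", "bad", 3],
    "column_c": [0, 0, 0, 1],
    "column_d": [0, 0, 0, 0],
})

used_often = ["good", "bad", 3]

# Same three conditions as the hand-written filter in the question.
mask = (
    (my_name["column_a"] == 5)
    & my_name["column_b"].isin(used_often)
    & (my_name["column_c"] != 1)
)

# The "output" part of the rule: set column_d on the matching rows only.
my_name.loc[mask, "column_d"] = 1
```

Only the first sample row satisfies all three conditions, so it is the only one whose column_d changes.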

Thanks!

asked Nov 20 at 12:32

Claudiu Creanga


I don't understand the yaml file. Can you change the query statement in your yaml file? If you can, then you could make the query statement identical to the dataframe condition. For example, query: (my_name["column_a"] == 5) & (my_name["column_b"].isin(['good','bad', 3])) & (my_name["column_c"] != 1).
– Mohit Motwani
Nov 20 at 12:45

@MohitMotwani yes, but then it will be like asking the client to write code in yaml. I was looking for something easier for the client. He will have to know .isin() and other pandas functions...
– Claudiu Creanga
Nov 20 at 13:02

You want to process a yaml file into python code. As @MohitMotwani shows you, you could just have the client write the actual code. This won't work, you say, because your client needs something easier. This is the main problem with the question because at the end of the day, you know your client and we don't. We can assume that the easier the yaml file is for the client, the more preprocessing you will have to do. I flagged your question as primarily opinion-based because of this.
– user3471881
Nov 20 at 13:17

@user3471881 I would imagine yaml rules to filter dfs are not that uncommon, and maybe somebody has developed a tool that will help.
– Claudiu Creanga
Nov 20 at 13:29

Then the question becomes off-topic: Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. (stackoverflow.com/help/on-topic)
– user3471881
Nov 20 at 13:45

python pandas yaml


1 Answer

pandas has a DataFrame.query method that does all that: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.query.html

So my query in yaml becomes:

query: 'column_a == 5 and column_b in @consequence and column_c !=1'

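and then in python I can read the yaml and filter (the @ prefix refers to a Python variable, so consequence has to exist as a variable where query is called):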

df.query(my_query)
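A minimal end-to-end sketch of this approach, under some assumptions: the yaml is parsed with PyYAML's yaml.safe_load, the rule's variables are handed to query through its local_dict keyword rather than through real local variables, and the output part is applied with .loc. The helper name apply_rules, the frames dict and the sample rows are hypothetical and not part of the original setup.

```python
import pandas as pd
import yaml  # PyYAML, assumed to be installed

CONFIG = """
variables:
  used_often: ['good', 'bad', 3]
rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a == 5 and column_b in @consequence and column_c != 1'
    output: {'column_d': 1}
"""


def apply_rules(config, frames):
    """Apply each rule to the DataFrame it names, modifying that frame in place."""
    shared = config.get("variables", {})
    for rule in config["rules"]:
        df = frames[rule["dataframe_name"]]
        # Map each rule-local name (e.g. consequence) to the shared value it points at.
        local_vars = {name: shared[ref] for name, ref in rule.get("variables", {}).items()}
        # local_dict is forwarded to pandas.eval and supplies the @-prefixed names.
        matched = df.query(rule["query"], local_dict=local_vars).index
        # Assumes a unique index, so the labels identify exactly the matched rows.
        for column, value in rule["output"].items():
            df.loc[matched, column] = value
    return frames


# Hypothetical sample data; only the column names come from the question.
frames = {
    "my_name": pd.DataFrame({
        "column_a": [5, 5, 2, 5],
        "column_b": ["good", "ugly", "bad", 3],
        "column_c": [0, 0, 0, 1],
        "column_d": [0, 0, 0, 0],
    })
}

apply_rules(yaml.safe_load(CONFIG), frames)
```

Passing the variables through local_dict keeps the client's names out of the Python namespace, so a rule can introduce any variable name without the code having to define it by hand.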

answered Nov 20 at 14:57

Claudiu Creanga

