Filter a pandas df based on some rules in a yaml file
I have a configuration yaml file that is supposed to be used by customers and be easy to edit. In the yaml file there are some rules:
variables:
  used_often: ['good','bad', 3]
rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a = 5 and column_b in ${consequence} and column_c != 1'
    output: {'column_d': 1}
  - more rules like this
As you can see here, I want to filter 3 columns in the df my_name (where column_a = 5, etc.) and then, in the result of this filter, add/change column_d so that all rows that matched the query have column_d = 1.
My question is: how could I make the query easier to apply as a filter with pandas? As it stands, the query above should end up in pandas like this:
my_name[(my_name["column_a"] == 5) &
(my_name["column_b"].isin(['good','bad', 3])) &
(my_name["column_c"] != 1)]
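For reference, the filter-and-assign step described above can be sketched directly in pandas with a boolean mask and `.loc`. The sample data below is made up for illustration; only the column names and the condition come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
my_name = pd.DataFrame({
    "column_a": [5, 5, 4, 5, 5],
    "column_b": ["good", 3, "bad", "ugly", "bad"],
    "column_c": [0, 2, 0, 0, 1],
    "column_d": [0, 0, 0, 0, 0],
})

# Boolean mask equivalent to the rule's query.
mask = (
    (my_name["column_a"] == 5)
    & my_name["column_b"].isin(["good", "bad", 3])
    & (my_name["column_c"] != 1)
)

# Apply the rule's output: set column_d = 1 on every matching row.
my_name.loc[mask, "column_d"] = 1
print(my_name)
```

Only the first two rows satisfy all three conditions, so only they get column_d = 1.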
So I have to do a lot of processing to interpret the query in yaml. Are there any tools that could help me, or better ways to format the query? (I have complete freedom in building the yaml file as long as it is reasonably easy for a client to write.)
Thanks!
python pandas yaml
I don't understand the yaml file. Can you change the query statement in your yaml file? If you can, then you could set the query statement to be the same as the dataframe condition. For example: query: (my_name["column_a"] == 5) & (my_name["column_b"].isin(['good','bad', 3])) & (my_name["column_c"] != 1)
– Mohit Motwani
Nov 20 at 12:45
@MohitMotwani yes, but then it will be like asking the client to write code in yaml. I was looking for something easier for the client. He will have to know .isin() and other pandas functions...
– Claudiu Creanga
Nov 20 at 13:02
You want to process a yaml file into python code. As @MohitMotwani shows you, you could just have the client write the actual code. This won't work, you say, because your client needs something easier. This is the main problem with the question because at the end of the day, you know your client and we don't. We can assume that the easier the yaml file is for the client, the more preprocessing you will have to do. I flagged your question as primarily opinion-based because of this.
– user3471881
Nov 20 at 13:17
@user3471881 I would imagine yaml rules to filter dfs are not that uncommon, and maybe somebody has developed a tool that will help.
– Claudiu Creanga
Nov 20 at 13:29
Then the question becomes off-topic: "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." (stackoverflow.com/help/on-topic)
– user3471881
Nov 20 at 13:45
asked Nov 20 at 12:32
Claudiu Creanga
1 Answer
pandas has a DataFrame.query method that does all of this: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.query.html
So my query in yaml becomes:
query: 'column_a == 5 and column_b in @consequence and column_c != 1'
and then in Python I can read the yaml and filter (the name consequence must be defined in the calling scope so that @consequence resolves):
df.query(my_query)
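A minimal sketch of how the whole rule could be applied, assuming PyYAML (`yaml.safe_load`) is used to read the file and that each rule's `variables` mapping points an alias at an entry in the top-level `variables`. The `apply_rules` helper and the sample data are hypothetical, not part of pandas or of the original post. The alias values are handed to `DataFrame.query` through `local_dict`, a keyword that `query` forwards to `pandas.eval`, so `@consequence` resolves without needing a local variable of that name.

```python
import pandas as pd
import yaml  # PyYAML, assumed to be installed

# The YAML from the question, with the query written in DataFrame.query syntax.
CONFIG = """
variables:
  used_often: ['good', 'bad', 3]
rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a == 5 and column_b in @consequence and column_c != 1'
    output: {'column_d': 1}
"""


def apply_rules(config, frames):
    """Apply each rule in `config` to the DataFrame it names in `frames`, in place.

    Hypothetical helper; assumes every DataFrame has a unique index.
    """
    shared = config.get("variables", {})
    for rule in config["rules"]:
        df = frames[rule["dataframe_name"]]
        # Map each alias used in the query (e.g. @consequence) to its shared value.
        local_vars = {
            alias: shared[name]
            for alias, name in rule.get("variables", {}).items()
        }
        # engine="python" avoids depending on the optional numexpr package.
        matched = df.query(rule["query"], local_dict=local_vars, engine="python")
        # Write the rule's output values onto the rows that matched.
        for column, value in rule["output"].items():
            df.loc[matched.index, column] = value


# Hypothetical sample data; only the column names come from the question.
my_name = pd.DataFrame({
    "column_a": [5, 5, 4, 5, 5],
    "column_b": ["good", 3, "bad", "ugly", "bad"],
    "column_c": [0, 2, 0, 0, 1],
    "column_d": [0, 0, 0, 0, 0],
})

apply_rules(yaml.safe_load(CONFIG), {"my_name": my_name})
print(my_name)
```

With this layout the client only writes comparison expressions and variable names in the yaml file; the mapping from aliases to values and the assignment of `output` stay in Python.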
answered Nov 20 at 14:57
Claudiu Creanga