Filter a pandas df based on some rules in a yaml file











I have a YAML configuration file that customers will edit, so it needs to be easy to write. It contains rules like these:



variables:
  used_often: ['good','bad', 3]

rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a = 5 and column_b in ${consequence} and column_c != 1'
    output: {'column_d': 1}
  - more rules like this


Here I want to filter the DataFrame my_name on three columns (column_a = 5, and so on) and then add or change column_d so that every row that matched the query has column_d = 1.



My question is how to make the query easier to apply with pandas. As it stands, the query above should end up in pandas like this:



my_name[(my_name["column_a"] == 5) &
(my_name["column_b"].isin(['good','bad', 3])) &
(my_name["column_c"] != 1)]
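For reference, the full target behaviour (filter, then set column_d on the matching rows) can be sketched in plain pandas as below. The sample data is made up for illustration, and using .loc with a boolean mask for the assignment is one common way to do it, not necessarily the only one.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
my_name = pd.DataFrame({
    "column_a": [5, 5, 4, 5],
    "column_b": ["good", "ugly", "bad", "bad"],
    "column_c": [0, 0, 0, 1],
    "column_d": [0, 0, 0, 0],
})

# Boolean mask equivalent to the rule's query.
mask = (
    (my_name["column_a"] == 5)
    & (my_name["column_b"].isin(["good", "bad", 3]))
    & (my_name["column_c"] != 1)
)

# Apply the rule's output to the matching rows only.
my_name.loc[mask, "column_d"] = 1
print(my_name)
```

Only the first row satisfies all three conditions, so only that row gets column_d set to 1.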


So I have to do a lot of processing to interpret the query from the YAML. Are there any tools that could help, or better ways to format the query? (I have complete freedom in designing the YAML file, as long as it is reasonably easy for a client to write.)



Thanks!










  • I don't understand the yaml file. Can you change the query statement in it? If you can, you could write the query statement exactly as the dataframe condition. For example, query: (my_name["column_a"] == 5) & (my_name["column_b"].isin(['good','bad', 3])) & (my_name["column_c"] != 1).
    – Mohit Motwani
    Nov 20 at 12:45












  • @MohitMotwani Yes, but then it would be like asking the client to write code in yaml. I was looking for something easier for the client; otherwise they would have to know .isin() and other pandas functions.
    – Claudiu Creanga
    Nov 20 at 13:02










  • You want to process a yaml file into python code. As @MohitMotwani shows you, you could just have the client write the actual code. This won't work, you say, because your client needs something easier. This is the main problem with the question because at the end of the day, you know your client and we don't. We can assume that the easier the yaml file is for the client, the more preprocessing you will have to do. I flagged your question as primarily opinion-based because of this.
    – user3471881
    Nov 20 at 13:17












  • @user3471881 I would imagine yaml rules for filtering dfs are not that uncommon, and maybe somebody has developed a tool that would help.
    – Claudiu Creanga
    Nov 20 at 13:29










  • Then the question becomes off-topic: Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. (stackoverflow.com/help/on-topic)
    – user3471881
    Nov 20 at 13:45















python pandas yaml






asked Nov 20 at 12:32









Claudiu Creanga

1 Answer
pandas has a DataFrame.query method that does all of this: https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.query.html



So my query in yaml becomes:



query: 'column_a == 5 and column_b in @consequence and column_c != 1'


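Then in Python I can read the YAML and filter. The @consequence reference is resolved from a Python variable named consequence in the calling scope, so that variable has to be defined before the call: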



df.query(my_query)
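A rough end-to-end sketch of this approach follows, under a few assumptions: PyYAML is installed, the YAML layout is the one from the question with the query rewritten in DataFrame.query syntax, and the DataFrames are passed in a dict keyed by name. The apply_rules helper and the sample data are hypothetical. It also uses DataFrame.eval instead of DataFrame.query, which is my substitution rather than part of the answer: eval returns the boolean mask, which makes it possible to write the rule's output back to the matching rows.

```python
import pandas as pd
import yaml  # PyYAML, assumed to be installed

CONFIG = """
variables:
  used_often: ['good', 'bad', 3]

rules:
  - dataframe_name: my_name
    variables:
      consequence: used_often
    query: 'column_a == 5 and column_b in @consequence and column_c != 1'
    output: {'column_d': 1}
"""


def apply_rules(config, frames):
    """Apply each rule to the DataFrame it names, modifying it in place.

    Hypothetical helper: `frames` maps dataframe_name -> DataFrame.
    """
    shared = config.get("variables", {})
    for rule in config["rules"]:
        df = frames[rule["dataframe_name"]]
        # Map each rule-local name (used as @name in the query) to the
        # shared value it points at.
        local_vars = {
            local: shared[shared_name]
            for local, shared_name in rule.get("variables", {}).items()
        }
        # eval gives the boolean mask; query would give the filtered rows.
        mask = df.eval(rule["query"], local_dict=local_vars)
        for column, value in rule["output"].items():
            df.loc[mask, column] = value
    return frames


# Made-up sample data for illustration.
my_name = pd.DataFrame({
    "column_a": [5, 5, 4, 5],
    "column_b": ["good", "ugly", "bad", "bad"],
    "column_c": [0, 0, 0, 1],
    "column_d": [0, 0, 0, 0],
})

config = yaml.safe_load(CONFIG)
apply_rules(config, {"my_name": my_name})
print(my_name)
```

Passing the variables through local_dict keeps the rule's names out of the surrounding Python scope. Note that the query string is still evaluated as an expression, so this is only appropriate when the YAML comes from a trusted source.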





        answered Nov 20 at 14:57









        Claudiu Creanga
