Re: Combining Multiple Regex in Pandas Dataframe












I have a dataset that looks like this:



0         03/25/93 Total time of visit (in minutes):\n
1                       6/18/85 Primary Care Doctor:\n
2    sshe plans to move as of 7/8/71 In-Home Servic...


It contains dates in different formats, such as:



04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009


I need to extract the dates and sort them in ascending order, following these rules:




  • Assume dates in MM/DD/YY

  • Assume dates with the year encoded in two digits


I have to return a pandas Series with the dates in correct chronological order.



For example, if the series was like this:



0    1999
1    2010
2    1978


I need to return the following series:



0    2
1    4
2    0


where the first column is the index and the second is the number of times the year appears in the dataset. For instance, if the year 1999 appeared twice, the second column would contain 2.



I have been able to extract and match the date patterns. However, I am unable to combine them into a single expression that matches across the entire Series:



re1 = df.str.extract(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{2,4})')
re2 = df.str.extract(r'((?:\d{1,2})(?:(?:/|-)\d{1,2})(?:(?:/|-)\d{2,4}))')
re3 = df.str.extract(r'((?:\d{1,2}(?:-|/))?\d{4})')


How can I combine the above regular expressions into a single expression and return the result as a Series?










  • The first part of the question doesn't seem to match with the second half. Are you just after value_counts() ?

    – Dark
    Nov 24 '18 at 13:21











  • See regex101.com/r/HwMb0t/1

    – Wiktor Stribiżew
    Nov 24 '18 at 13:34











  • Did it work for you?

    – Wiktor Stribiżew
    Nov 25 '18 at 17:50













  • The site is useful. It helped me combine the regex.

    – Bhavin Patel
    Nov 27 '18 at 4:07
















python regex pandas text-mining






asked Nov 24 '18 at 13:14

Bhavin Patel

1 Answer
You may use



((?:\d{0,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[-.\s,]\s?\d{0,2}[a-z]*[-,\s]?\s?\d{2,4}|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|(?:\d{1,2}[-/])?\d{4})


See the regex demo



The point is to join the parts with the | alternation operator while keeping all inner groups non-capturing. The outer group must be capturing so that extract can output the match.
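As a rough sketch of how the combined pattern might be passed to Series.str.extract: the sample strings below are made up to resemble the question's data, and the names s, pattern and dates are purely illustrative. The trailing .str.strip() is an addition that is not part of the answer; it is there because the optional leading group (?:\d{0,2}\s)? can match a lone whitespace character in front of a month name.

```python
import pandas as pd

# Hypothetical sample rows modelled on the question; the real dataset is not shown in full.
s = pd.Series([
    "03/25/93 Total time of visit (in minutes):\n",
    "6/18/85 Primary Care Doctor:\n",
    "sshe plans to move as of 7/8/71 In-Home Services",
    "Seen on Mar 20th, 2009 for follow-up",
    "20 March 2009 first visit",
    "Symptoms since 6/1998",
])

# One outer capturing group; every inner group is non-capturing,
# and the three original patterns are joined with "|".
pattern = (
    r"((?:\d{0,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
    r"[-.\s,]\s?\d{0,2}[a-z]*[-,\s]?\s?\d{2,4}"  # month-name formats
    r"|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"            # numeric M/D/Y
    r"|(?:\d{1,2}[-/])?\d{4})"                   # M/YYYY or a bare four-digit year
)

# With a single capturing group and expand=False, extract returns a Series.
# strip() removes the stray leading space the first alternative may capture.
dates = s.str.extract(pattern, expand=False).str.strip()
print(dates)
```

Rows with no match come back as NaN, so a dropna() or a check with isna() may be worth adding before parsing and sorting the extracted strings.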






answered Dec 1 '18 at 21:29

Wiktor Stribiżew
