Re: Combining Multiple Regex in Pandas Dataframe
I have a dataset that looks like this:
0 03/25/93 Total time of visit (in minutes):\n
1 6/18/85 Primary Care Doctor:\n
2 sshe plans to move as of 7/8/71 In-Home Servic...
and contains dates in different formats, such as:
04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
I need to extract the dates and sort them in ascending order, keeping the following rules:
- Assume dates in MM/DD/YY
- Assume dates with the year encoded in two digits
I have to return a pandas Series with the dates in chronological order.
For example, if the series was like this:
0 1999
1 2010
2 1978
I need to return the following series:
0 2
1 4
2 0
where the first column is the index and the second is the count of years appearing in the dataset. For instance, if the year 1999 appeared twice, the second column would contain 2.
I have been able to extract and match the date patterns; however, I am unable to combine them into a single expression that matches across the entire dataframe:
re1 = df.str.extract(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{2,4})')
re2 = df.str.extract(r'((?:\d{1,2})(?:(?:/|-)\d{1,2})(?:(?:/|-)\d{2,4}))')
re3 = df.str.extract(r'((?:\d{1,2}(?:-|/))?\d{4})')
How can I combine the above regular expressions into a single expression and return the result as a Series?
python regex pandas text-mining
The first part of the question doesn't seem to match the second half. Are you just after value_counts()?
– Dark
Nov 24 '18 at 13:21
See regex101.com/r/HwMb0t/1
– Wiktor Stribiżew
Nov 24 '18 at 13:34
Did it work for you?
– Wiktor Stribiżew
Nov 25 '18 at 17:50
The site is useful. It helped me combine the regex.
– Bhavin Patel
Nov 27 '18 at 4:07
asked Nov 24 '18 at 13:14
Bhavin Patel
1 Answer
You may use
((?:\d{0,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[-.\s,]\s?\d{0,2}[a-z]*[-,\s]?\s?\d{2,4}|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|(?:\d{1,2}[-/])?\d{4})
See the regex demo.
The point is to join the parts using the | (alternation) operator while keeping all inner groups non-capturing. The outer group must be capturing so that extract can output the match.
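As a rough illustration of how the combined pattern could be used with pandas, here is a minimal sketch. The sample strings and the names `pattern`, `s`, and `dates` are made up for the example and are not taken from the actual dataset. The trailing `.str.strip()` is an addition of this sketch: the optional leading `(?:\d{0,2}\s)?` part can match a lone whitespace character in front of a month name, so the raw match may start with a space.

```python
import pandas as pd

# The three alternatives joined with | inside ONE outer capturing group;
# every inner group stays non-capturing so extract yields a single column.
pattern = (
    r"("
    r"(?:\d{0,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
    r"[-.\s,]\s?\d{0,2}[a-z]*[-,\s]?\s?\d{2,4}"
    r"|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"
    r"|(?:\d{1,2}[-/])?\d{4}"
    r")"
)

# Hypothetical sample rows, only loosely modelled on the question's data.
s = pd.Series([
    "03/25/93 Total time of visit (in minutes):",
    "she plans to move as of 7/8/71 In-Home Services",
    "Seen on 20 Mar 2009 for follow-up",
    "Admitted Mar 20th, 2009",
    "History since 6/2008",
    "no date here",
])

# expand=False returns a Series (not a DataFrame) for a single capture group.
# Rows without a match come back as NaN.
dates = s.str.extract(pattern, expand=False).str.strip()
print(dates)
```

With `expand=False` the result is already a Series, so no further reshaping is needed before parsing or sorting the extracted strings.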
answered Dec 1 '18 at 21:29
Wiktor Stribiżew