How to split multiple UPPERCASE/delimiter/text using regex? (VBA)
I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:
I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.
Which I'm trying to split as follows using regex:
Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.
So far I managed to do this by creating something like this
([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)
But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).
If I try something more elegant, like
([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)
It obviously matches only the word before the first hyphen into the first group, and everything else into next.
To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.
I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)
regex vba split uppercase
add a comment |
I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:
I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.
Which I'm trying to split as follows using regex:
Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.
So far I managed to do this by creating something like this
([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)
But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).
If I try something more elegant, like
([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)
It obviously matches only the word before the first hyphen into the first group, and everything else into next.
To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.
I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)
regex vba split uppercase
Something like this?([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?
– Wiktor Stribiżew
Nov 24 '18 at 0:29
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39
add a comment |
I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:
I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.
Which I'm trying to split as follows using regex:
Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.
So far I managed to do this by creating something like this
([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)
But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).
If I try something more elegant, like
([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)
It obviously matches only the word before the first hyphen into the first group, and everything else into next.
To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.
I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)
regex vba split uppercase
I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:
I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.
Which I'm trying to split as follows using regex:
Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.
So far I managed to do this by creating something like this
([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)
But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).
If I try something more elegant, like
([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)
It obviously matches only the word before the first hyphen into the first group, and everything else into next.
To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.
I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)
regex vba split uppercase
regex vba split uppercase
asked Nov 24 '18 at 0:19
Kuba WroneckiKuba Wronecki
1
1
Something like this?([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?
– Wiktor Stribiżew
Nov 24 '18 at 0:29
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39
add a comment |
Something like this?([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?
– Wiktor Stribiżew
Nov 24 '18 at 0:29
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39
Something like this?
([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?– Wiktor Stribiżew
Nov 24 '18 at 0:29
Something like this?
([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?– Wiktor Stribiżew
Nov 24 '18 at 0:29
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39
add a comment |
2 Answers
2
active
oldest
votes
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
add a comment |
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53454141%2fhow-to-split-multiple-uppercase-delimiter-text-using-regex-vba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
add a comment |
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
add a comment |
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
answered Nov 24 '18 at 8:47
DarXydeDarXyde
24026
24026
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
add a comment |
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.
– Kuba Wronecki
Nov 25 '18 at 10:39
add a comment |
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)
add a comment |
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)
add a comment |
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)
answered Nov 25 '18 at 10:46
Kuba WroneckiKuba Wronecki
1
1
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53454141%2fhow-to-split-multiple-uppercase-delimiter-text-using-regex-vba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Something like this?
([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)
?– Wiktor Stribiżew
Nov 24 '18 at 0:29
Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.
– Kuba Wronecki
Nov 24 '18 at 0:39