How to split multiple UPPERCASE/delimiter/text using regex? (VBA)












-1















I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:



I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.


Which I'm trying to split as follows using regex:



Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.


So far I managed to do this by creating something like this



([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)


But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).



If I try something more elegant, like



([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)


It obviously matches only the word before the first hyphen into the first group, and everything else into next.



To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.



I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)










share|improve this question























  • Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

    – Wiktor Stribiżew
    Nov 24 '18 at 0:29













  • Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

    – Kuba Wronecki
    Nov 24 '18 at 0:39


















-1















I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:



I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.


Which I'm trying to split as follows using regex:



Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.


So far I managed to do this by creating something like this



([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)


But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).



If I try something more elegant, like



([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)


It obviously matches only the word before the first hyphen into the first group, and everything else into next.



To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.



I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)










share|improve this question























  • Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

    – Wiktor Stribiżew
    Nov 24 '18 at 0:29













  • Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

    – Kuba Wronecki
    Nov 24 '18 at 0:39
















-1












-1








-1








I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:



I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.


Which I'm trying to split as follows using regex:



Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.


So far I managed to do this by creating something like this



([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)


But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).



If I try something more elegant, like



([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)


It obviously matches only the word before the first hyphen into the first group, and everything else into next.



To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.



I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)










share|improve this question














I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:



I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.


Which I'm trying to split as follows using regex:



Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.


So far I managed to do this by creating something like this



([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)


But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).



If I try something more elegant, like



([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)


It obviously matches only the word before the first hyphen into the first group, and everything else into next.



To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.



I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)







regex vba split uppercase






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 24 '18 at 0:19









Kuba WroneckiKuba Wronecki

1




1













  • Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

    – Wiktor Stribiżew
    Nov 24 '18 at 0:29













  • Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

    – Kuba Wronecki
    Nov 24 '18 at 0:39





















  • Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

    – Wiktor Stribiżew
    Nov 24 '18 at 0:29













  • Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

    – Kuba Wronecki
    Nov 24 '18 at 0:39



















Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29







Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29















Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39







Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39














2 Answers
2






active

oldest

votes


















0














I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.



However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.



If is not just an one off processing, you can always use VBA as well, something like:



Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long

lRow = ws.Cells(1, 1).End(xlDown).Row

For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R

End Sub





share|improve this answer
























  • Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

    – Kuba Wronecki
    Nov 25 '18 at 10:39



















0














Thank you for your input. I finally managed to do this using two subs:



Sub locfinder()

Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String

For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"

If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub


Then extracted diagnoses using



Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp

Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")

For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection

For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub


While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53454141%2fhow-to-split-multiple-uppercase-delimiter-text-using-regex-vba%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.



    However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.



    If is not just an one off processing, you can always use VBA as well, something like:



    Sub TextToColumns()
    Dim ws As Worksheet: Set ws = ActiveSheet
    Dim lRow As Long, sndHyphen As Long, R As Long

    lRow = ws.Cells(1, 1).End(xlDown).Row

    For R = 1 To lRow 'Iterate through all rows containing this data
    sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
    ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
    ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
    Next R

    End Sub





    share|improve this answer
























    • Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

      – Kuba Wronecki
      Nov 25 '18 at 10:39
















    0














    I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.



    However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.



    If is not just an one off processing, you can always use VBA as well, something like:



    Sub TextToColumns()
    Dim ws As Worksheet: Set ws = ActiveSheet
    Dim lRow As Long, sndHyphen As Long, R As Long

    lRow = ws.Cells(1, 1).End(xlDown).Row

    For R = 1 To lRow 'Iterate through all rows containing this data
    sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
    ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
    ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
    Next R

    End Sub





    share|improve this answer
























    • Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

      – Kuba Wronecki
      Nov 25 '18 at 10:39














    0












    0








    0







    I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.



    However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.



    If is not just an one off processing, you can always use VBA as well, something like:



    Sub TextToColumns()
    Dim ws As Worksheet: Set ws = ActiveSheet
    Dim lRow As Long, sndHyphen As Long, R As Long

    lRow = ws.Cells(1, 1).End(xlDown).Row

    For R = 1 To lRow 'Iterate through all rows containing this data
    sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
    ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
    ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
    Next R

    End Sub





    share|improve this answer













    I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.



    However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.



    If is not just an one off processing, you can always use VBA as well, something like:



    Sub TextToColumns()
    Dim ws As Worksheet: Set ws = ActiveSheet
    Dim lRow As Long, sndHyphen As Long, R As Long

    lRow = ws.Cells(1, 1).End(xlDown).Row

    For R = 1 To lRow 'Iterate through all rows containing this data
    sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
    ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
    ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
    Next R

    End Sub






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 24 '18 at 8:47









    DarXydeDarXyde

    24026




    24026













    • Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

      – Kuba Wronecki
      Nov 25 '18 at 10:39



















    • Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

      – Kuba Wronecki
      Nov 25 '18 at 10:39

















    Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

    – Kuba Wronecki
    Nov 25 '18 at 10:39





    Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

    – Kuba Wronecki
    Nov 25 '18 at 10:39













    0














    Thank you for your input. I finally managed to do this using two subs:



    Sub locfinder()

    Dim myregexp As RegExp
    Set myregexp = New RegExp
    Dim myMatches As Variant
    Dim myMatch As Variant
    Dim str As String
    Dim i, j As Integer
    Dim endrow As Integer
    Sheets("dane").Activate
    endrow = LastRow
    Dim rozp1, rozp2 As String

    For i = 1 To endrow
    str = Sheets("Dane").Cells(i, 10).Value
    myregexp.Global = True
    myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"

    If Not str = "" Then
    Set myMatches = myregexp.Execute(str)
    j = 1
    For Each myMatch In myMatches
    If myMatch.Value <> "" Then
    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
    j = j + 1
    End If
    Next
    End If
    Next i
    End Sub


    Then extracted diagnoses using



    Sub rozpfinder()
    Dim myregexp As RegExp
    Set myregexp = New RegExp

    Dim myMatches As Variant
    Dim myMatch As Variant
    Dim str As String
    Dim i, j As Integer
    Dim endrow As Integer
    Sheets("dane").Activate
    endrow = LastRow
    Dim rozp, loc As Collection
    Dim splitted() As String
    Dim rozpoznanie, lokalizacja
    Dim wb As Workbook
    Dim ws As Worksheet
    Set wb = ThisWorkbook
    Set ws = wb.Worksheets("Dane")

    For i = 1 To endrow
    str = ws.Cells(i, 10).Value
    Set loc = New Collection
    Set rozp = New Collection

    For j = 1 To 2
    If ws.Cells(i, 10 + j) <> "" Then
    loc.Add ws.Cells(i, 10 + j).Value
    End If
    Next j
    For Each lokalizacja In loc
    If lokalizacja <> "I" Then
    str = Replace(str, lokalizacja, "xxx")
    Else
    lokalizacja = "I-"
    str = Replace(str, lokalizacja, "xxx-")
    End If
    Next lokalizacja
    splitted = split(str, "xxx")
    For j = 0 To UBound(splitted)
    If splitted(j) <> "" Then
    myregexp.Pattern = "-[^w]"
    myMatch = myregexp.Replace(splitted(j), "")
    rozp.Add (Trim(myMatch))
    End If
    Next j
    j = 1
    For Each rozpoznanie In rozp
    ws.Cells(i, 12 + j).Value = rozpoznanie
    j = j + 1
    Next rozpoznanie
    Next i
    End Sub


    While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)






    share|improve this answer




























      0














      Thank you for your input. I finally managed to do this using two subs:



      Sub locfinder()

      Dim myregexp As RegExp
      Set myregexp = New RegExp
      Dim myMatches As Variant
      Dim myMatch As Variant
      Dim str As String
      Dim i, j As Integer
      Dim endrow As Integer
      Sheets("dane").Activate
      endrow = LastRow
      Dim rozp1, rozp2 As String

      For i = 1 To endrow
      str = Sheets("Dane").Cells(i, 10).Value
      myregexp.Global = True
      myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"

      If Not str = "" Then
      Set myMatches = myregexp.Execute(str)
      j = 1
      For Each myMatch In myMatches
      If myMatch.Value <> "" Then
      Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
      j = j + 1
      End If
      Next
      End If
      Next i
      End Sub


      Then extracted diagnoses using



      Sub rozpfinder()
      Dim myregexp As RegExp
      Set myregexp = New RegExp

      Dim myMatches As Variant
      Dim myMatch As Variant
      Dim str As String
      Dim i, j As Integer
      Dim endrow As Integer
      Sheets("dane").Activate
      endrow = LastRow
      Dim rozp, loc As Collection
      Dim splitted() As String
      Dim rozpoznanie, lokalizacja
      Dim wb As Workbook
      Dim ws As Worksheet
      Set wb = ThisWorkbook
      Set ws = wb.Worksheets("Dane")

      For i = 1 To endrow
      str = ws.Cells(i, 10).Value
      Set loc = New Collection
      Set rozp = New Collection

      For j = 1 To 2
      If ws.Cells(i, 10 + j) <> "" Then
      loc.Add ws.Cells(i, 10 + j).Value
      End If
      Next j
      For Each lokalizacja In loc
      If lokalizacja <> "I" Then
      str = Replace(str, lokalizacja, "xxx")
      Else
      lokalizacja = "I-"
      str = Replace(str, lokalizacja, "xxx-")
      End If
      Next lokalizacja
      splitted = split(str, "xxx")
      For j = 0 To UBound(splitted)
      If splitted(j) <> "" Then
      myregexp.Pattern = "-[^w]"
      myMatch = myregexp.Replace(splitted(j), "")
      rozp.Add (Trim(myMatch))
      End If
      Next j
      j = 1
      For Each rozpoznanie In rozp
      ws.Cells(i, 12 + j).Value = rozpoznanie
      j = j + 1
      Next rozpoznanie
      Next i
      End Sub


      While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)






      share|improve this answer


























        0












        0








        0







        Thank you for your input. I finally managed to do this using two subs:



        Sub locfinder()

        Dim myregexp As RegExp
        Set myregexp = New RegExp
        Dim myMatches As Variant
        Dim myMatch As Variant
        Dim str As String
        Dim i, j As Integer
        Dim endrow As Integer
        Sheets("dane").Activate
        endrow = LastRow
        Dim rozp1, rozp2 As String

        For i = 1 To endrow
        str = Sheets("Dane").Cells(i, 10).Value
        myregexp.Global = True
        myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"

        If Not str = "" Then
        Set myMatches = myregexp.Execute(str)
        j = 1
        For Each myMatch In myMatches
        If myMatch.Value <> "" Then
        Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
        j = j + 1
        End If
        Next
        End If
        Next i
        End Sub


        Then extracted diagnoses using



        Sub rozpfinder()
        Dim myregexp As RegExp
        Set myregexp = New RegExp

        Dim myMatches As Variant
        Dim myMatch As Variant
        Dim str As String
        Dim i, j As Integer
        Dim endrow As Integer
        Sheets("dane").Activate
        endrow = LastRow
        Dim rozp, loc As Collection
        Dim splitted() As String
        Dim rozpoznanie, lokalizacja
        Dim wb As Workbook
        Dim ws As Worksheet
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Dane")

        For i = 1 To endrow
        str = ws.Cells(i, 10).Value
        Set loc = New Collection
        Set rozp = New Collection

        For j = 1 To 2
        If ws.Cells(i, 10 + j) <> "" Then
        loc.Add ws.Cells(i, 10 + j).Value
        End If
        Next j
        For Each lokalizacja In loc
        If lokalizacja <> "I" Then
        str = Replace(str, lokalizacja, "xxx")
        Else
        lokalizacja = "I-"
        str = Replace(str, lokalizacja, "xxx-")
        End If
        Next lokalizacja
        splitted = split(str, "xxx")
        For j = 0 To UBound(splitted)
        If splitted(j) <> "" Then
        myregexp.Pattern = "-[^w]"
        myMatch = myregexp.Replace(splitted(j), "")
        rozp.Add (Trim(myMatch))
        End If
        Next j
        j = 1
        For Each rozpoznanie In rozp
        ws.Cells(i, 12 + j).Value = rozpoznanie
        j = j + 1
        Next rozpoznanie
        Next i
        End Sub


        While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)






        share|improve this answer













        Thank you for your input. I finally managed to do this using two subs:



        Sub locfinder()

        Dim myregexp As RegExp
        Set myregexp = New RegExp
        Dim myMatches As Variant
        Dim myMatch As Variant
        Dim str As String
        Dim i, j As Integer
        Dim endrow As Integer
        Sheets("dane").Activate
        endrow = LastRow
        Dim rozp1, rozp2 As String

        For i = 1 To endrow
        str = Sheets("Dane").Cells(i, 10).Value
        myregexp.Global = True
        myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"

        If Not str = "" Then
        Set myMatches = myregexp.Execute(str)
        j = 1
        For Each myMatch In myMatches
        If myMatch.Value <> "" Then
        Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
        j = j + 1
        End If
        Next
        End If
        Next i
        End Sub


        Then extracted diagnoses using



        Sub rozpfinder()
        Dim myregexp As RegExp
        Set myregexp = New RegExp

        Dim myMatches As Variant
        Dim myMatch As Variant
        Dim str As String
        Dim i, j As Integer
        Dim endrow As Integer
        Sheets("dane").Activate
        endrow = LastRow
        Dim rozp, loc As Collection
        Dim splitted() As String
        Dim rozpoznanie, lokalizacja
        Dim wb As Workbook
        Dim ws As Worksheet
        Set wb = ThisWorkbook
        Set ws = wb.Worksheets("Dane")

        For i = 1 To endrow
        str = ws.Cells(i, 10).Value
        Set loc = New Collection
        Set rozp = New Collection

        For j = 1 To 2
        If ws.Cells(i, 10 + j) <> "" Then
        loc.Add ws.Cells(i, 10 + j).Value
        End If
        Next j
        For Each lokalizacja In loc
        If lokalizacja <> "I" Then
        str = Replace(str, lokalizacja, "xxx")
        Else
        lokalizacja = "I-"
        str = Replace(str, lokalizacja, "xxx-")
        End If
        Next lokalizacja
        splitted = split(str, "xxx")
        For j = 0 To UBound(splitted)
        If splitted(j) <> "" Then
        myregexp.Pattern = "-[^w]"
        myMatch = myregexp.Replace(splitted(j), "")
        rozp.Add (Trim(myMatch))
        End If
        Next j
        j = 1
        For Each rozpoznanie In rozp
        ws.Cells(i, 12 + j).Value = rozpoznanie
        j = j + 1
        Next rozpoznanie
        Next i
        End Sub


        While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 25 '18 at 10:46









        Kuba WroneckiKuba Wronecki

        1




        1






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53454141%2fhow-to-split-multiple-uppercase-delimiter-text-using-regex-vba%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            Refactoring coordinates for Minecraft Pi buildings written in Python