How to split multiple UPPERCASE/delimiter/text using regex? (VBA)

-1

I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:

I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 

powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry 

nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM + 

KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie 

przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie 

stwierdza się bakterii odpowiadajacych Helicobacter pylori.

Which I'm trying to split as follows using regex:

Location - I- TRZON

Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.

Location II- ANTRUM + KĄT

Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.

So far I managed to do this by creating something like this

([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)

But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).

If I try something more elegant, like

([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)

It obviously matches only the word before the first hyphen into the first group, and everything else into next.

To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.

I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)

asked Nov 24 '18 at 0:19

Kuba Wronecki

Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29

Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39

add a comment |

-1

I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:

I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 

powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry 

nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM + 

KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie 

przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie 

stwierdza się bakterii odpowiadajacych Helicobacter pylori.

Which I'm trying to split as follows using regex:

Location - I- TRZON

Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.

Location II- ANTRUM + KĄT

Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.

So far I managed to do this by creating something like this

([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)

But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).

If I try something more elegant, like

([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)

It obviously matches only the word before the first hyphen into the first group, and everything else into next.

I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)

asked Nov 24 '18 at 0:19

Kuba Wronecki

Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29

Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39

add a comment |

-1

I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:

I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 

powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry 

nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM + 

KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie 

przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie 

stwierdza się bakterii odpowiadajacych Helicobacter pylori.

Which I'm trying to split as follows using regex:

Location - I- TRZON

Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.

Location II- ANTRUM + KĄT

Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.

So far I managed to do this by creating something like this

([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)

But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).

If I try something more elegant, like

([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)

It obviously matches only the word before the first hyphen into the first group, and everything else into next.

I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)

asked Nov 24 '18 at 0:19

Kuba Wronecki

I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:

I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego 

powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry 

nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM + 

KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie 

przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie 

stwierdza się bakterii odpowiadajacych Helicobacter pylori.

Which I'm trying to split as follows using regex:

Location - I- TRZON

Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.

Location II- ANTRUM + KĄT

Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.

So far I managed to do this by creating something like this

([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*)[s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]+)*[s]?-+?(.*)

But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).

If I try something more elegant, like

([A-ZŻŹĆŃĄŚŁĘÓ]+[s-+,]*?)-(.*)

It obviously matches only the word before the first hyphen into the first group, and everything else into next.

I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)

regex vba split uppercase

asked Nov 24 '18 at 0:19

Kuba Wronecki

asked Nov 24 '18 at 0:19

Kuba Wronecki

asked Nov 24 '18 at 0:19

Kuba Wronecki

asked Nov 24 '18 at 0:19

Kuba Wronecki

asked Nov 24 '18 at 0:19

Kuba Wronecki

Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29

Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39

add a comment |

Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29

Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39

Something like this? ([A-ZŻŹĆĄŚĘŁÓŃ]+-[A-ZŻŹĆĄŚĘŁÓŃs,+-0-9]*?)s*-s*([^]*?)(?=[A-ZŻŹĆĄŚĘŁÓŃ]+-|$)?

– Wiktor Stribiżew
Nov 24 '18 at 0:29

Thank you for your reply! While it works on the string I provided, It didn't work at all in vba (?), and when I tried different string in regexr, it didn't match at all... I tested: TRZON, ANTRUM - In excisionibus examinatis: Gastritis chronica gradus minoris. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobater pylori.

– Kuba Wronecki
Nov 24 '18 at 0:39

add a comment |

2 Answers
2

active

oldest

votes

I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.

However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.

If is not just an one off processing, you can always use VBA as well, something like:

Sub TextToColumns()

    Dim ws As Worksheet: Set ws = ActiveSheet

    Dim lRow As Long, sndHyphen As Long, R As Long



    lRow = ws.Cells(1, 1).End(xlDown).Row



    For R = 1 To lRow                                                               'Iterate through all rows containing this data

        sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-")      'Get the hyphens positions

        ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2)                        'Get the data before the second hyphen

        ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2)                         'Get the data after the second hyphen

    Next R



End Sub

answered Nov 24 '18 at 8:47

DarXyde

24026

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

add a comment |

Thank you for your input. I finally managed to do this using two subs:

Sub locfinder()



Dim myregexp As RegExp

Set myregexp = New RegExp

Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp1, rozp2 As String



For i = 1 To endrow

str = Sheets("Dane").Cells(i, 10).Value

myregexp.Global = True

myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"



If Not str = "" Then

Set myMatches = myregexp.Execute(str)

 j = 1

 For Each myMatch In myMatches

    If myMatch.Value <> "" Then

    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))

    j = j + 1

    End If

Next

End If

Next i

End Sub

Then extracted diagnoses using

Sub rozpfinder()

Dim myregexp As RegExp

Set myregexp = New RegExp



Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp, loc As Collection

Dim splitted() As String

Dim rozpoznanie, lokalizacja

Dim wb As Workbook

Dim ws As Worksheet

Set wb = ThisWorkbook

Set ws = wb.Worksheets("Dane")



For i = 1 To endrow

    str = ws.Cells(i, 10).Value

    Set loc = New Collection

    Set rozp = New Collection



    For j = 1 To 2

        If ws.Cells(i, 10 + j) <> "" Then

            loc.Add ws.Cells(i, 10 + j).Value

        End If

    Next j

    For Each lokalizacja In loc

        If lokalizacja <> "I" Then

        str = Replace(str, lokalizacja, "xxx")

        Else

        lokalizacja = "I-"

        str = Replace(str, lokalizacja, "xxx-")

        End If

    Next lokalizacja

    splitted = split(str, "xxx")

    For j = 0 To UBound(splitted)

        If splitted(j) <> "" Then

        myregexp.Pattern = "-[^w]"

        myMatch = myregexp.Replace(splitted(j), "")

        rozp.Add (Trim(myMatch))

        End If

    Next j

    j = 1

    For Each rozpoznanie In rozp

        ws.Cells(i, 12 + j).Value = rozpoznanie

        j = j + 1

    Next rozpoznanie

Next i

End Sub

While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

answered Nov 25 '18 at 10:46

Kuba Wronecki

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53454141%2fhow-to-split-multiple-uppercase-delimiter-text-using-regex-vba%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.

However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.

If is not just an one off processing, you can always use VBA as well, something like:

Sub TextToColumns()

    Dim ws As Worksheet: Set ws = ActiveSheet

    Dim lRow As Long, sndHyphen As Long, R As Long



    lRow = ws.Cells(1, 1).End(xlDown).Row



    For R = 1 To lRow                                                               'Iterate through all rows containing this data

        sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-")      'Get the hyphens positions

        ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2)                        'Get the data before the second hyphen

        ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2)                         'Get the data after the second hyphen

    Next R



End Sub

answered Nov 24 '18 at 8:47

DarXyde

24026

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

add a comment |

I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.

However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.

If is not just an one off processing, you can always use VBA as well, something like:

Sub TextToColumns()

    Dim ws As Worksheet: Set ws = ActiveSheet

    Dim lRow As Long, sndHyphen As Long, R As Long



    lRow = ws.Cells(1, 1).End(xlDown).Row



    For R = 1 To lRow                                                               'Iterate through all rows containing this data

        sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-")      'Get the hyphens positions

        ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2)                        'Get the data before the second hyphen

        ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2)                         'Get the data after the second hyphen

    Next R



End Sub

answered Nov 24 '18 at 8:47

DarXyde

24026

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

add a comment |

I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.

However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.

If is not just an one off processing, you can always use VBA as well, something like:

Sub TextToColumns()

    Dim ws As Worksheet: Set ws = ActiveSheet

    Dim lRow As Long, sndHyphen As Long, R As Long



    lRow = ws.Cells(1, 1).End(xlDown).Row



    For R = 1 To lRow                                                               'Iterate through all rows containing this data

        sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-")      'Get the hyphens positions

        ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2)                        'Get the data before the second hyphen

        ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2)                         'Get the data after the second hyphen

    Next R



End Sub

answered Nov 24 '18 at 8:47

DarXyde

24026

I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.

However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.

If is not just an one off processing, you can always use VBA as well, something like:

Sub TextToColumns()

    Dim ws As Worksheet: Set ws = ActiveSheet

    Dim lRow As Long, sndHyphen As Long, R As Long



    lRow = ws.Cells(1, 1).End(xlDown).Row



    For R = 1 To lRow                                                               'Iterate through all rows containing this data

        sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-")      'Get the hyphens positions

        ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2)                        'Get the data before the second hyphen

        ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2)                         'Get the data after the second hyphen

    Next R



End Sub

answered Nov 24 '18 at 8:47

DarXyde

24026

answered Nov 24 '18 at 8:47

DarXyde

24026

answered Nov 24 '18 at 8:47

DarXyde

24026

answered Nov 24 '18 at 8:47

DarXyde

24026

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

add a comment |

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

Manual concatenation would be possible, but a long process. Also I wanted to understand more regex, since I'm going to need to extract much more data from these strings later... :) I managed to do this using different sub, see my other solution.

– Kuba Wronecki
Nov 25 '18 at 10:39

add a comment |

Thank you for your input. I finally managed to do this using two subs:

Sub locfinder()



Dim myregexp As RegExp

Set myregexp = New RegExp

Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp1, rozp2 As String



For i = 1 To endrow

str = Sheets("Dane").Cells(i, 10).Value

myregexp.Global = True

myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"



If Not str = "" Then

Set myMatches = myregexp.Execute(str)

 j = 1

 For Each myMatch In myMatches

    If myMatch.Value <> "" Then

    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))

    j = j + 1

    End If

Next

End If

Next i

End Sub

Then extracted diagnoses using

Sub rozpfinder()

Dim myregexp As RegExp

Set myregexp = New RegExp



Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp, loc As Collection

Dim splitted() As String

Dim rozpoznanie, lokalizacja

Dim wb As Workbook

Dim ws As Worksheet

Set wb = ThisWorkbook

Set ws = wb.Worksheets("Dane")



For i = 1 To endrow

    str = ws.Cells(i, 10).Value

    Set loc = New Collection

    Set rozp = New Collection



    For j = 1 To 2

        If ws.Cells(i, 10 + j) <> "" Then

            loc.Add ws.Cells(i, 10 + j).Value

        End If

    Next j

    For Each lokalizacja In loc

        If lokalizacja <> "I" Then

        str = Replace(str, lokalizacja, "xxx")

        Else

        lokalizacja = "I-"

        str = Replace(str, lokalizacja, "xxx-")

        End If

    Next lokalizacja

    splitted = split(str, "xxx")

    For j = 0 To UBound(splitted)

        If splitted(j) <> "" Then

        myregexp.Pattern = "-[^w]"

        myMatch = myregexp.Replace(splitted(j), "")

        rozp.Add (Trim(myMatch))

        End If

    Next j

    j = 1

    For Each rozpoznanie In rozp

        ws.Cells(i, 12 + j).Value = rozpoznanie

        j = j + 1

    Next rozpoznanie

Next i

End Sub

While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

answered Nov 25 '18 at 10:46

Kuba Wronecki

add a comment |

Thank you for your input. I finally managed to do this using two subs:

Sub locfinder()



Dim myregexp As RegExp

Set myregexp = New RegExp

Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp1, rozp2 As String



For i = 1 To endrow

str = Sheets("Dane").Cells(i, 10).Value

myregexp.Global = True

myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"



If Not str = "" Then

Set myMatches = myregexp.Execute(str)

 j = 1

 For Each myMatch In myMatches

    If myMatch.Value <> "" Then

    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))

    j = j + 1

    End If

Next

End If

Next i

End Sub

Then extracted diagnoses using

Sub rozpfinder()

Dim myregexp As RegExp

Set myregexp = New RegExp



Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp, loc As Collection

Dim splitted() As String

Dim rozpoznanie, lokalizacja

Dim wb As Workbook

Dim ws As Worksheet

Set wb = ThisWorkbook

Set ws = wb.Worksheets("Dane")



For i = 1 To endrow

    str = ws.Cells(i, 10).Value

    Set loc = New Collection

    Set rozp = New Collection



    For j = 1 To 2

        If ws.Cells(i, 10 + j) <> "" Then

            loc.Add ws.Cells(i, 10 + j).Value

        End If

    Next j

    For Each lokalizacja In loc

        If lokalizacja <> "I" Then

        str = Replace(str, lokalizacja, "xxx")

        Else

        lokalizacja = "I-"

        str = Replace(str, lokalizacja, "xxx-")

        End If

    Next lokalizacja

    splitted = split(str, "xxx")

    For j = 0 To UBound(splitted)

        If splitted(j) <> "" Then

        myregexp.Pattern = "-[^w]"

        myMatch = myregexp.Replace(splitted(j), "")

        rozp.Add (Trim(myMatch))

        End If

    Next j

    j = 1

    For Each rozpoznanie In rozp

        ws.Cells(i, 12 + j).Value = rozpoznanie

        j = j + 1

    Next rozpoznanie

Next i

End Sub

While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

answered Nov 25 '18 at 10:46

Kuba Wronecki

add a comment |

Thank you for your input. I finally managed to do this using two subs:

Sub locfinder()



Dim myregexp As RegExp

Set myregexp = New RegExp

Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp1, rozp2 As String



For i = 1 To endrow

str = Sheets("Dane").Cells(i, 10).Value

myregexp.Global = True

myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"



If Not str = "" Then

Set myMatches = myregexp.Execute(str)

 j = 1

 For Each myMatch In myMatches

    If myMatch.Value <> "" Then

    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))

    j = j + 1

    End If

Next

End If

Next i

End Sub

Then extracted diagnoses using

Sub rozpfinder()

Dim myregexp As RegExp

Set myregexp = New RegExp



Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp, loc As Collection

Dim splitted() As String

Dim rozpoznanie, lokalizacja

Dim wb As Workbook

Dim ws As Worksheet

Set wb = ThisWorkbook

Set ws = wb.Worksheets("Dane")



For i = 1 To endrow

    str = ws.Cells(i, 10).Value

    Set loc = New Collection

    Set rozp = New Collection



    For j = 1 To 2

        If ws.Cells(i, 10 + j) <> "" Then

            loc.Add ws.Cells(i, 10 + j).Value

        End If

    Next j

    For Each lokalizacja In loc

        If lokalizacja <> "I" Then

        str = Replace(str, lokalizacja, "xxx")

        Else

        lokalizacja = "I-"

        str = Replace(str, lokalizacja, "xxx-")

        End If

    Next lokalizacja

    splitted = split(str, "xxx")

    For j = 0 To UBound(splitted)

        If splitted(j) <> "" Then

        myregexp.Pattern = "-[^w]"

        myMatch = myregexp.Replace(splitted(j), "")

        rozp.Add (Trim(myMatch))

        End If

    Next j

    j = 1

    For Each rozpoznanie In rozp

        ws.Cells(i, 12 + j).Value = rozpoznanie

        j = j + 1

    Next rozpoznanie

Next i

End Sub

While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

answered Nov 25 '18 at 10:46

Kuba Wronecki

Thank you for your input. I finally managed to do this using two subs:

Sub locfinder()



Dim myregexp As RegExp

Set myregexp = New RegExp

Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp1, rozp2 As String



For i = 1 To endrow

str = Sheets("Dane").Cells(i, 10).Value

myregexp.Global = True

myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[s,+-0-9]*|Trzon|Antrum)s?-"



If Not str = "" Then

Set myMatches = myregexp.Execute(str)

 j = 1

 For Each myMatch In myMatches

    If myMatch.Value <> "" Then

    Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))

    j = j + 1

    End If

Next

End If

Next i

End Sub

Then extracted diagnoses using

Sub rozpfinder()

Dim myregexp As RegExp

Set myregexp = New RegExp



Dim myMatches As Variant

Dim myMatch As Variant

Dim str As String

Dim i, j As Integer

Dim endrow As Integer

Sheets("dane").Activate

endrow = LastRow

Dim rozp, loc As Collection

Dim splitted() As String

Dim rozpoznanie, lokalizacja

Dim wb As Workbook

Dim ws As Worksheet

Set wb = ThisWorkbook

Set ws = wb.Worksheets("Dane")



For i = 1 To endrow

    str = ws.Cells(i, 10).Value

    Set loc = New Collection

    Set rozp = New Collection



    For j = 1 To 2

        If ws.Cells(i, 10 + j) <> "" Then

            loc.Add ws.Cells(i, 10 + j).Value

        End If

    Next j

    For Each lokalizacja In loc

        If lokalizacja <> "I" Then

        str = Replace(str, lokalizacja, "xxx")

        Else

        lokalizacja = "I-"

        str = Replace(str, lokalizacja, "xxx-")

        End If

    Next lokalizacja

    splitted = split(str, "xxx")

    For j = 0 To UBound(splitted)

        If splitted(j) <> "" Then

        myregexp.Pattern = "-[^w]"

        myMatch = myregexp.Replace(splitted(j), "")

        rozp.Add (Trim(myMatch))

        End If

    Next j

    j = 1

    For Each rozpoznanie In rozp

        ws.Cells(i, 12 + j).Value = rozpoznanie

        j = j + 1

    Next rozpoznanie

Next i

End Sub

While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

answered Nov 25 '18 at 10:46

Kuba Wronecki

answered Nov 25 '18 at 10:46

Kuba Wronecki

answered Nov 25 '18 at 10:46

Kuba Wronecki

answered Nov 25 '18 at 10:46

Kuba Wronecki

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Tukukkk