PHP function to convert a Portuguese word from plural to singular

I know, this sounds really difficult, but it is really easy.

I needed to convert a single Portuguese word in the plural into singular. I know there's a right name for that, but it is escaping me.

The rules are simple; I compiled them from http://www.easyportuguese.com/portuguese-lessons/plural/ (applying them in reverse):

If the word ends in a vowel, remove the s at the end

Words ending in ões, ães and ãos should end with ão

For words ending in is, remove the is and add l to the end
Special case: accents should be removed, if needed. The only cases I saw were anéis and pastéis, which have to become anel and pastel.

Words ending in ns get it replaced with m

Words ending with [rsz]es should lose the es
Special case: words ending in eses need the first e replaced with ê, like in meses => mês

Some words are always used in the plural, like óculos, parabéns and férias.

Below, here's the code:

function plural_to_singular($string)
{
    if(preg_match('/^(?:[oó]culos|parab[eé]ns|f[eé]rias)$/iu', $string))
    {
        return $string;
    }

    $regexes = array(
        '[õã]es' => 'ão',
        '[áó].*eis' => 'el',
        '[eé]is' => 'el',
        '([^eé])is' => '$1l',
        'ns' => 'm',
        'eses' => 'ês',
        '([rzs])es' => '$1',
        's' => ''
    );

    foreach($regexes as $fragment => $replace)
    {
        $regex = '/' . $fragment . '$/ui';
        if(preg_match($regex, $string))
        {
            return preg_replace($regex, $replace, $string);
        }
    }

    return $string;
}

You can try it on http://sandbox.onlinephpfunctions.com/code/7947a0efd16f361e89491e4a64f71b578d2278df with some test cases.

In your opinion, what can I improve?

Is there any obvious butchering or performance killer?

edited Dec 15 '16 at 18:44 by Mike Brant

asked Dec 15 '16 at 14:47 by Ismael Miguel

php strings regex i18n


2 Answers
Apart from the convenience of applying all the replacement rules uniformly, and the more maintainable code that results, I don't see an absolute need to use regex for this. Simple string manipulation should work here and may be better from a performance standpoint.
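To make the string-manipulation idea concrete, here is a minimal sketch. The function name singular_by_suffix() and the reduced rule list are hypothetical, not taken from the question; it covers only a few lowercase endings and assumes UTF-8 input. Because UTF-8 is self-synchronizing, a byte-wise suffix comparison with substr() is enough, so no regex or mbstring call is needed.

```php
<?php
// Hypothetical helper (not from the question): a few suffix rules applied
// with plain string functions instead of regular expressions.
// Assumes lowercase, UTF-8 encoded input.
function singular_by_suffix($word)
{
    // Ordered list: the first matching plural suffix wins.
    $suffixes = array(
        'ões' => 'ão',
        'ães' => 'ão',
        'ns'  => 'm',
        's'   => '',
    );

    foreach ($suffixes as $plural => $singular) {
        $length = strlen($plural); // length in bytes, not characters
        if (substr($word, -$length) === $plural) {
            // Cut the plural suffix off and append the singular ending.
            return substr($word, 0, -$length) . $singular;
        }
    }

    return $word;
}

echo singular_by_suffix('canções'), "\n";
echo singular_by_suffix('nuvens'), "\n";
```

Whether this is actually faster than the regex version would need measuring; the sketch only shows that the suffix rules do not strictly require a regex engine.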

There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.

So you could easily do something like:

preg_replace($pattern_array, $replacement_array, $string);

I don't like your approach of building the regex pattern in two places. Why not define the entire pattern in the regex array? You might have something like this:

$regex_config = array(
     'ão' => '/[õã]es$/iu',
    ...
);

$pattern_array = array_values($regex_config);
$replacement_array = array_keys($regex_config);
$result = preg_replace($pattern_array, $replacement_array, $string, 1);

You also have a potential edge case to address: what if the subject string is all caps? Because you use a case-insensitive match, an all-caps plural word could end up with lowercase letters replaced into it. Should you really be case-insensitive here?
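A small illustration of that edge case, using one of the question's patterns on a made-up all-caps input:

```php
<?php
// The /i flag lets the lowercase pattern match the uppercase ending "NS",
// but the replacement text is inserted exactly as written.
$result = preg_replace('/ns$/iu', 'm', 'NUVENS');

echo $result, "\n"; // mixed case: NUVEm
```

The match succeeds, but the output mixes cases, so the caller either has to normalise case afterwards or the patterns have to be case-sensitive.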

Should your function name indicate that the function only applies to Portuguese?

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?
– Ismael Miguel, Dec 15 '16 at 19:50

@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
– Mike Brant, Dec 16 '16 at 16:31

In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
– Ismael Miguel, Dec 16 '16 at 20:43

@IsmaelMiguel I see your meaning now. I updated my answer to use the limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where a second replacement would have been triggered based on an earlier replacement will not happen.
– Mike Brant, Dec 17 '16 at 14:20

Oh, yeah, the magical parameter that I always forget about! That's a great idea!
– Ismael Miguel, Dec 17 '16 at 17:26

Let me start by saying that I have respect for Mike Brant and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.

$regex_config cannot store the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the ... (yatta-yatta). The key clash would be on el.
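A quick sketch of that clash, filling the array with the two el patterns from the question (how the elided part of the other answer was meant to look is an assumption here):

```php
<?php
// Two different patterns share the replacement "el". With replacements as
// keys, the second entry silently overwrites the first.
$regex_config = array(
    'el' => '/[áó].*eis$/iu',
    'el' => '/[eé]is$/iu',
);

echo count($regex_config), "\n"; // 1
```

PHP raises no error for the duplicate key, so the lost rule would only show up as wrong output later.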

Simply throwing 1 at the end of preg_replace() is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê
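The chained replacement can be reproduced with just two of the patterns (a reduced sketch, not the full rule set):

```php
<?php
// The limit of 1 applies to each pattern separately, and every pattern
// runs on the result of the previous one.
$patterns     = array('/eses$/iu', '/s$/iu');
$replacements = array('ês', '');

$result = preg_replace($patterns, $replacements, 'meses', 1);

echo $result, "\n"; // mê: first meses became mês, then the trailing s was stripped
```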

Most trivially, array_values() doesn't need to be called because preg_replace() is "key ignorant" regarding the array inputs.

For this process to maintain accuracy, there needs to be a return as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs.

You can avoid using capture groups and shorten your replacement strings in a couple of places by using the \K metacharacter (which restarts the full-string match). This way you don't need to use $1 or rewrite literals from the pattern into the replacement.
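As a side-by-side sketch, here is the question's capture-group rule next to a \K version of it, run on one sample word:

```php
<?php
// Capture-group version: the character before "is" is captured and has to
// be written back into the replacement with $1.
$with_group = preg_replace('/([^eé])is$/iu', '$1l', 'cristais');

// \K version: everything matched before \K is kept out of the reported
// match, so only "is" is replaced.
$with_k = preg_replace('/[^eé]\Kis$/iu', 'l', 'cristais');

echo $with_group, ' ', $with_k, "\n"; // cristal cristal
```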

If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call mb_strtoupper().

I don't have a sample string to test against ~[áó].*eis$~iu, and my Portuguese is not too sharp, so I wonder whether this pattern is accurate.

After my implementation of \K you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: '~(?:[áó].*eis|[eé]is)$~iu' => 'el', and '~(?:[rzs]\Kes|s)$~iu' => ''

I am using the regex patterns as the keys because they will all logically be unique. The same cannot be said about the replacement values (not without merging, anyhow).

Code: (Demo)

function is_allcaps($string)
{
    // Only the last character is checked; if it is uppercase, the whole word is assumed to be in caps.
    $last_letter = mb_substr($string, -1, 1, 'UTF-8');
    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
    // otherwise use ctype_upper() and setlocale()
}

function plural_to_singular($string)
{
    // quick return of "untouchables"
    if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
    {
        return $string;
    }

    $regex_map = [
        '~[õã]es$~iu' => 'ão',
        '~(?:[áó].*e|[eé])is$~iu' => 'el',
        '~[^eé]\Kis$~iu' => 'l',
        '~ns$~iu' => 'm',
        '~eses$~iu' => 'ês',
        '~(?:[rzs]\Ke)?s$~iu' => ''
    ];

    foreach ($regex_map as $pattern => $replacement)
    {
        $singular = preg_replace($pattern, $replacement, $string, 1, $count);
        if ($count)
        {
            // stop at the first rule that made a replacement
            return is_allcaps($string) ? mb_strtoupper($singular) : $singular;
        }
    }
    return $string;
}

$words = array(
    'óculos' => 'óculos',
    'papéis' => 'papel',
    'anéis' => 'anel',
    'PASTEIS' => 'PASTEL',
    'CAMIÕES' => 'CAMIÃO',
    'rodas' => 'roda',
    'cães' => 'cão',
    'meses' => 'mês',
    'vezes' => 'vez',
    'luzes' => 'luz',
    'cristais' => 'cristal',
    'canções' => 'canção',
    'nuvens' => 'nuvem',
    'alemães' => 'alemão'
);

foreach($words as $plural => $singular)
{
    echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
}

answered 11 mins ago

mickmackusa

@MikeBrant ping
– mickmackusa, 9 mins ago

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f149991%2fphp-function-to-convert-a-portuguese-word-from-plural-to-singular%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.

So you could easily do something like:

preg_replace($pattern_array, $replacement_array, $string);

I don't like your approach of building the regex pattern in two places, why not define entire pattern in regex array? You might have something like this:

$regex_config = array(

     'ão' => '/[õã]es$/iu',

    ...

);

$pattern_array = array_values($regex_config);

$replacement_array = array_keys($regex_config);

$result = preg_replace($pattern_array, $replacement_array, $string, 1);

Should your function name indicate that the function is only applicable to Portugeuse?

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

$begingroup$
Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50

$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31

$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43

$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20

$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26

add a comment |

There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.

So you could easily do something like:

preg_replace($pattern_array, $replacement_array, $string);

I don't like your approach of building the regex pattern in two places, why not define entire pattern in regex array? You might have something like this:

$regex_config = array(

     'ão' => '/[õã]es$/iu',

    ...

);

$pattern_array = array_values($regex_config);

$replacement_array = array_keys($regex_config);

$result = preg_replace($pattern_array, $replacement_array, $string, 1);

Should your function name indicate that the function is only applicable to Portugeuse?

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

$begingroup$
Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50

$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31

$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43

$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20

$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26

add a comment |

There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.

So you could easily do something like:

preg_replace($pattern_array, $replacement_array, $string);

I don't like your approach of building the regex pattern in two places, why not define entire pattern in regex array? You might have something like this:

$regex_config = array(

     'ão' => '/[õã]es$/iu',

    ...

);

$pattern_array = array_values($regex_config);

$replacement_array = array_keys($regex_config);

$result = preg_replace($pattern_array, $replacement_array, $string, 1);

Should your function name indicate that the function is only applicable to Portugeuse?

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.

So you could easily do something like:

preg_replace($pattern_array, $replacement_array, $string);

I don't like your approach of building the regex pattern in two places, why not define entire pattern in regex array? You might have something like this:

$regex_config = array(

     'ão' => '/[õã]es$/iu',

    ...

);

$pattern_array = array_values($regex_config);

$replacement_array = array_keys($regex_config);

$result = preg_replace($pattern_array, $replacement_array, $string, 1);

Should your function name indicate that the function is only applicable to Portugeuse?

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

edited Dec 17 '16 at 14:14

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

answered Dec 15 '16 at 18:39

Mike Brant

8,813622

$begingroup$
Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50

$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31

$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43

$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20

$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26

add a comment |

$begingroup$
Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50

$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31

$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43

$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20

$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26

Won't your suggestion break for meses, which would return mê (since the s at the end is removed, as the last case, and is a required step)?

– Ismael Miguel
Dec 15 '16 at 19:50

@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.

– Mike Brant
Dec 16 '16 at 16:31

In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?

– Ismael Miguel
Dec 16 '16 at 20:43

@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.

– Mike Brant
Dec 17 '16 at 14:20

Oh, yeah, the magical parameter that I always forget about! That's a great idea!

– Ismael Miguel
Dec 17 '16 at 17:26

add a comment |

Let me start by saying, that I have respect for Mike Brant, and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.

$regex_config can not store the the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the ... (yatta-yatta). The key clash would be on el.

Simply throwing 1 at the end of preg_replace() is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê

Most trivially, array_values() doesn't need to be called because preg_replace() is "key ignorant" regarding the array inputs.

For this process to maintain accuracy, there needs to be a return as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs.

You can avoid using capture groups and shorten your replacement strings in a couple places by implementing the K metacharacter (restart fullstring match). This way you don't need to use $1 or rewrite a literals from the pattern into the replacement.

If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call mb_strtoupper().

I don't have a sample string to test against ~[áó].*eis$~iu, but I wonder if this is accurate/correct and my Portuguese is not too sharp.

After my implementation of K you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: '~(?:[áó].*eis|[eé]is)$~iu' => 'el', and '~(?:[rzs]Kes|s)$~iu' => ''

I am using the regex patterns as the keys because they will all logically be unique. the same cannot be said about the replacement values (not without merging anyhow).

Code: (Demo)

function is_allcaps($string)

{

    $last_letter = mb_substr($string, -1, 1, 'UTF-8');

    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');

    // otherwise use cytpe_upper() and setlocale()

}



function plural_to_singular($string)

{

    // quick return of "untouchables"

    if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))

    {

        return $string;

    }



    $regex_map = [

        '~[õã]es$~iu' => 'ão',

        '~(?:[áó].*e|[eé])is$~iu' => 'el',

        '~[^eé]Kis$~iu' => 'l',

        '~ns$~iu' => 'm',

        '~eses$~iu' => 'ês',

        '~(?:[rzs]Ke)?s$~iu' => ''

    ];



    foreach ($regex_map as $pattern => $replacement)

    {

        $singular = preg_replace($pattern, $replacement, $string, 1, $count);

        if ($count)

        {

            return is_allcaps($string) ? mb_strtoupper($singular) : $singular;



        }

    }

    return $string;

}



$words = array(

    'óculos' => 'óculos',

    'papéis' => 'papel',

    'anéis' => 'anel',

    'PASTEIS' => 'PASTEL',

    'CAMIÕES' => 'CAMIÃO',

    'rodas' => 'roda',

    'cães' => 'cão',

    'meses' => 'mês',

    'vezes' => 'vez',

    'luzes' => 'luz',

    'cristais' => 'cristal',

    'canções' => 'canção',

    'nuvens' => 'nuvem',

    'alemães' => 'alemão'

);



foreach($words as $plural => $singular)

{

    echo "$plural => $singular = " , plural_to_singular($plural) , "n";

}

answered 11 mins ago

mickmackusa

1,159213

$begingroup$
@MikeBrant ping
$endgroup$
– mickmackusa
9 mins ago

add a comment |

Let me start by saying, that I have respect for Mike Brant, and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.

$regex_config can not store the the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the ... (yatta-yatta). The key clash would be on el.

Simply throwing 1 at the end of preg_replace() is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê

Most trivially, array_values() doesn't need to be called because preg_replace() is "key ignorant" regarding the array inputs.

For this process to maintain accuracy, there needs to be a return as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs.

You can avoid using capture groups and shorten your replacement strings in a couple places by implementing the K metacharacter (restart fullstring match). This way you don't need to use $1 or rewrite a literals from the pattern into the replacement.

If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call mb_strtoupper().

I don't have a sample string to test against ~[áó].*eis$~iu, but I wonder if this is accurate/correct and my Portuguese is not too sharp.

After my implementation of K you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: '~(?:[áó].*eis|[eé]is)$~iu' => 'el', and '~(?:[rzs]Kes|s)$~iu' => ''

I am using the regex patterns as the keys because they will all logically be unique. the same cannot be said about the replacement values (not without merging anyhow).

Code: (Demo)

function is_allcaps($string)

{

    $last_letter = mb_substr($string, -1, 1, 'UTF-8');

    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');

    // otherwise use cytpe_upper() and setlocale()

}



function plural_to_singular($string)

{

    // quick return of "untouchables"

    if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))

    {

        return $string;

    }



    $regex_map = [

        '~[õã]es$~iu' => 'ão',

        '~(?:[áó].*e|[eé])is$~iu' => 'el',

        '~[^eé]Kis$~iu' => 'l',

        '~ns$~iu' => 'm',

        '~eses$~iu' => 'ês',

        '~(?:[rzs]Ke)?s$~iu' => ''

    ];



    foreach ($regex_map as $pattern => $replacement)

    {

        $singular = preg_replace($pattern, $replacement, $string, 1, $count);

        if ($count)

        {

            return is_allcaps($string) ? mb_strtoupper($singular) : $singular;



        }

    }

    return $string;

}



$words = array(

    'óculos' => 'óculos',

    'papéis' => 'papel',

    'anéis' => 'anel',

    'PASTEIS' => 'PASTEL',

    'CAMIÕES' => 'CAMIÃO',

    'rodas' => 'roda',

    'cães' => 'cão',

    'meses' => 'mês',

    'vezes' => 'vez',

    'luzes' => 'luz',

    'cristais' => 'cristal',

    'canções' => 'canção',

    'nuvens' => 'nuvem',

    'alemães' => 'alemão'

);



foreach($words as $plural => $singular)

{

    echo "$plural => $singular = " , plural_to_singular($plural) , "\n";

}

answered 11 mins ago

mickmackusa

1,159213

@MikeBrant ping
– mickmackusa
9 mins ago

