JS RegEx for matching a complete URL [duplicate]




























This question already has an answer here:




  • Extracting for URL from string using regex

    3 answers




I'm trying to match a URL in a string of text, and I'm using this regex to search for it:



/\b(https?:\/\/.*?\.[a-z]{2,4}\b)/g


The problem is, it only ever matches the protocol and domain, and nothing else that follows.



Example :



let regEx = /\b(https?:\/\/.*?\.[a-z]{2,4}\b)/g;
let str = 'some text https://website.com/sH6Sd2x some more text';

console.log(str.match(regEx));


Returns :



https://website.com


How would I alter the regex so it will return the full URL?



https://website.com/sH6Sd2x

















marked as duplicate by Wiktor Stribiżew javascript
Nov 25 '18 at 21:09


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • Your regexp ends with \.[a-z]{2,4}\b, so that will only match up to the top-level domain part of the URL.

    – Barmar
    Nov 25 '18 at 21:05











  • @Barmar, yes thanks, I'm aware of that. My question was how to alter the regex to include the rest?

    – spice
    Nov 25 '18 at 21:07






  • A usual URL extraction pattern assumes there is no whitespace after the protocol. Try just /\bhttps?:\/\/\S+\b/g.

    – Wiktor Stribiżew
    Nov 25 '18 at 21:07













  • @WiktorStribiżew yep that's it, thank you very much :)

    – spice
    Nov 25 '18 at 21:08






















javascript regex match














asked Nov 25 '18 at 21:01 by spice (edited Nov 25 '18 at 21:50)
















2 Answers














The reason it stops there is that your expression ends with \.[a-z]{2,4}\b, which I guess is intended to match the top-level domain (.com, .net, .uk, etc.). After that it stops matching.



The solution: add \/[^\s]* to the expression. This matches a further slash and zero or more non-whitespace characters.



Note that \S (with capital S) is equivalent to [^\s] (with lowercase s), so use whichever you like best.



Demo:






let regEx = /\b(https?:\/\/.*?\.[a-z]{2,4}\/[^\s]*\b)/g;
let str = 'some text https://website.com/sH6Sd2x some more text';

console.log(str.match(regEx));
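
One caveat with the pattern in the demo, shown here as a minimal sketch: because the slash after the TLD is required, a URL with no path is not matched at all. The findUrls helper and the second sample string are only for illustration and are not part of the original question.

```javascript
// Pattern from the demo above: a slash is required after the TLD.
const withSlash = /\b(https?:\/\/.*?\.[a-z]{2,4}\/[^\s]*\b)/g;

// Hypothetical helper: return all matches, or an empty array when there are none
// (String.prototype.match returns null when a global regex finds nothing).
function findUrls(text, pattern) {
  return text.match(pattern) || [];
}

// URL with a path: matched in full.
console.log(findUrls('some text https://website.com/sH6Sd2x some more text', withSlash));
// [ 'https://website.com/sH6Sd2x' ]

// URL without a path: nothing matches, because there is no slash after ".com".
console.log(findUrls('some text https://website.com some more text', withSlash));
// []
```

If bare hostnames should match too, the slash has to be optional, which is one reason the shorter whitespace-based pattern is attractive.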





You might even shorten it further if you realize that URLs never contain whitespace, so matching the domain explicitly is not needed. Worse, it may even cause trouble (e.g. .museum is also a valid TLD, but the {2,4} limit excludes it).



Enhanced version (shorter regex and more accurate):






let regEx = /\b(https?:\/\/\S*\b)/g;
let str = 'some text https://website.com/sH6Sd2x some more text';

console.log(str.match(regEx));
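
A small sketch of what the trailing \b does in the shorter pattern: the match has to end on a word character, so punctuation right after the URL is left out, but so is a trailing slash. The matchUrls helper and the sample sentences are made up for illustration.

```javascript
// Same idea as the enhanced version above, without the capture group.
const urlPattern = /\bhttps?:\/\/\S*\b/g;

// Hypothetical helper: all matches as an array (empty when nothing matches).
function matchUrls(text) {
  return text.match(urlPattern) || [];
}

// The comma after the URL is not a word character, so \b backs the match up to "x".
console.log(matchUrls('Visit https://website.com/sH6Sd2x, then come back.'));
// [ 'https://website.com/sH6Sd2x' ]

// Side effect: a trailing slash is dropped for the same reason.
console.log(matchUrls('Docs live at https://website.com/docs/ for now.'));
// [ 'https://website.com/docs' ]
```

Whether dropping the trailing slash matters depends on the use case; if it does, removing the final \b and trimming punctuation separately is one alternative.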


































  • Yep, this is exactly what I was looking for. Thank you so much @Peter!

    – spice
    Nov 25 '18 at 21:10



























answered Nov 25 '18 at 21:08 by Peter B (edited Nov 25 '18 at 21:12)


























Since the regexp ends with \.[a-z]{2,4}\b, it only matches up to the top-level domain part of the hostname in the URL. You need to match the rest of the URL as well. This matches any non-whitespace characters that follow:

let regEx = /\bhttps?:\/\/.*?\.[a-z]{2,4}\b\S*/g;

See Detect URLs in text with JavaScript for more complete solutions to matching URLs.
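
As a rough sketch of how this pattern behaves: the path is optional, so a bare hostname matches too, but because \S* runs up to the next whitespace it also picks up punctuation that directly follows the URL. The extractUrls helper and its punctuation-trimming step are assumptions added for illustration, not part of the answer.

```javascript
// Pattern from the answer above: TLD, word boundary, then any non-whitespace.
const urlWithRest = /\bhttps?:\/\/.*?\.[a-z]{2,4}\b\S*/g;

// Hypothetical helper: collect matches, then strip trailing punctuation
// such as "," or "." that \S* swallowed along with the URL.
function extractUrls(text) {
  const matches = text.match(urlWithRest) || [];
  return matches.map((m) => m.replace(/[.,;:!?)]+$/, ''));
}

console.log(extractUrls('some text https://website.com/sH6Sd2x some more text'));
// [ 'https://website.com/sH6Sd2x' ]

// The raw match here is "https://website.com," and the helper trims the comma.
console.log(extractUrls('bare host https://website.com, with a comma'));
// [ 'https://website.com' ]
```

The trimming step is a blunt heuristic: it would also cut a URL that legitimately ends in one of those characters.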

















answered Nov 25 '18 at 21:08 by Barmar














