MF, I'll test it a bit. One thing, though--if you don't plan on using any of the matches from the parenthesized sections, you should put ?: directly after each opening paren. This tells the parser not to store matches for that set of parens, which makes it run more efficiently.
/Some (text)/
/Some (?:text)/
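For anyone who wants to see the difference, here is a minimal sketch using the two patterns above. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample text, used only for illustration.
$subject = 'Some text';

// Capturing group: the parenthesized part is stored in $captured[1].
preg_match('/Some (text)/', $subject, $captured);

// Non-capturing group: only the full match is stored in $plain[0].
preg_match('/Some (?:text)/', $subject, $plain);

echo count($captured), "\n"; // 2 (full match plus one group)
echo count($plain), "\n";    // 1 (full match only)
```

Both patterns match the same text; only the contents of the matches array differ.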
Thanks for the feedback. Here is an updated regex.
preg_match('
/^                                     # Start at the beginning of the text
(?:ftp|http|https):\/\/                # Look for ftp, http, or https
(?:                                    # Username:password combinations (optional)
  [\w\.\-\+]+                          # A username
  :{0,1}                               # an optional colon to separate the username and password
  [\w\.\-\+]*@                         # A password
)?
(?:[a-z0-9\-\.]+)                      # The domain limiting it to just allowed characters
(?::[0-9]+)?                           # Server port number
(?:                                    # The path (optional)
  \/|                                  # a forward slash
  \/(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)|   # or a forward slash followed by a full path
  \?(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)    # or a question mark followed by key value pairs
)?$
/xi', $url);
Here's a few cents from me...
(?:ftp|http|https)
can be shortened to
(?:ftp|https?)
The same can be done for this:
:{0,1} # an optional colon to separate the username and password
which can be shortened to
:? # an optional colon to separate the username and password
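A quick way to convince yourself that the shorter forms behave the same is to run the long and short patterns over a few samples. This is only a sketch: patterns_agree() is a hypothetical helper, and the sample lists are my own, with each pattern anchored so it is tested in isolation. The colon check compares :{0,1} against its shorthand :?.

```php
<?php
// Hypothetical helper: true when both patterns give the same
// match/no-match answer for every sample string.
function patterns_agree(string $a, string $b, array $samples): bool {
    foreach ($samples as $s) {
        if (preg_match($a, $s) !== preg_match($b, $s)) {
            return false;
        }
    }
    return true;
}

// Scheme alternation: long form versus the shortened form.
$schemes = ['ftp', 'http', 'https', 'httpss', 'htt', ''];
$scheme_ok = patterns_agree('/^(?:ftp|http|https)$/', '/^(?:ftp|https?)$/', $schemes);

// Optional colon: {0,1} versus ?.
$colons = ['', ':', '::'];
$colon_ok = patterns_agree('/^:{0,1}$/', '/^:?$/', $colons);

var_dump($scheme_ok, $colon_ok);
```

A handful of samples is not a proof, but it catches the usual slip of shortening a pattern into something slightly different.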
A couple of resources I rely heavily on - http://www.regular-expressions.info/tutorial.html and https://addons.mozilla.org/en-US/firefox/addon/2077
Try this out...
preg_match('
/^
(?:ftp|https?):\/\/ #changed here
(?:
[\w\.\-\+]+
:*? #changed here
[\w\.\-\+]*@)?
(?:[a-z0-9\-\.]+)
(?::[0-9]+)?
(?:
(\/)+(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)*| #changed here
\?(?:[\w#!:\.\?\+=&i%@!\-\/\(\)]+))$
/xi', $url);
Thanks for the help and feedback. I'll take a look at this in the next couple of weeks when I'm not so busy. I've learned that there are more specs to take into account and more characters to allow... http://www.ietf.org/rfc/rfc3986.txt and maybe http://www.whatwg.org/specs/web-apps/current-work/...
It would be so nice if there were one spec to rule them all.
Here is an update to my valid_url regular expression.
preg_match("
/^                                                          # Start at the beginning of the text
(?:ftp|https?):\/\/                                         # Look for ftp, http, or https
(?:                                                         # Userinfo (optional)
  (?:[\w\.\-\+%!$&'\(\)*\+,;=]+:)*
  [\w\.\-\+%!$&'\(\)*\+,;=]+@
)?
(?:[a-z0-9\-\.%]+)                                          # The domain
(?::[0-9]+)?                                                # Server port number (optional)
(?:[\/|\?][\w#!:\.\?\+=&%@!$'~*,;\/\(\)\[\]\-]*)?           # The path (optional)
$/xi", $url);
This is for http://drupal.org/node/124492 and to fix an issue on this site. Does anyone see a way to improve on it? The goal is for it to conform to RFC 3986.
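If it helps with testing, here is a rough harness that runs the updated pattern over a few URLs. The sample URLs and the expected outcomes in the comments are my own assumptions about what should pass, not an official test suite. The pattern is held in a nowdoc so PHP leaves the "$" and backslashes alone.

```php
<?php
// The updated pattern from the post above, unchanged apart from layout.
$pattern = <<<'REGEX'
/^
(?:ftp|https?):\/\/
(?:
  (?:[\w\.\-\+%!$&'\(\)*\+,;=]+:)*
  [\w\.\-\+%!$&'\(\)*\+,;=]+@
)?
(?:[a-z0-9\-\.%]+)
(?::[0-9]+)?
(?:[\/|\?][\w#!:\.\?\+=&%@!$'~*,;\/\(\)\[\]\-]*)?
$/xi
REGEX;

// Hypothetical sample URLs, chosen only for illustration.
$samples = [
    'http://example.com',                           // expected to match
    'https://user:pass@example.com:8080/path?x=1',  // expected to match
    'ftp://example.com/file.txt',                   // expected to match
    'example.com',                                  // no scheme, expected to fail
    'http://exa mple.com',                          // space in host, expected to fail
    'mailto:someone@example.com',                   // unsupported scheme, expected to fail
];

$results = [];
foreach ($samples as $url) {
    $results[$url] = preg_match($pattern, $url) === 1;
    echo ($results[$url] ? 'match   ' : 'no match') . '  ' . $url . "\n";
}
```

Adding known edge cases from RFC 3986 to $samples would make it easier to see where the pattern is still too strict or too loose.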
Great work on this so far. I've been looking for a URL checker for something I'm currently working on.
I was testing this, and most of the issues I had with previous URL checkers were fine.
This fails on:
My understanding is that this should fail? If it doesn't, then OK. My URL checker function looks like this:
static function validateURL($url) {
    $url = trim($url);
    if ($url == "") { return false; }
    // Default to http:// when no supported scheme is given.
    if (!preg_match("!^(?:https?://|ftp://)!", $url)) { $url = "http://" . $url; }
    // Reject URLs that end in a host with a trailing dot.
    if (preg_match("!.*?//[a-z0-9\-\.%]+\.$!", $url)) { return false; }
    if (!preg_match("
        /^                                                  # Start at the beginning of the text
        (?:ftp|https?):\/\/                                 # Look for ftp, http, or https
        (?:                                                 # Userinfo (optional)
          (?:[\w\.\-\+%!$&'\(\)*\+,;=]+:)*
          [\w\.\-\+%!$&'\(\)*\+,;=]+@
        )?
        (?:[a-z0-9\-\.%]+)                                  # The domain
        (?::[0-9]+)?                                        # Server port number (optional)
        (?:[\/|\?][\w#!:\.\?\+=&%@!$'~*,;\/\(\)\[\]\-]*)?   # The path (optional)
        $/xi", $url)) {
        return false;
    }
    // Final check with PHP's built-in URL filter.
    if (!filter_var($url, FILTER_VALIDATE_URL)) { return false; }
    return true;
}
Granted, there are probably a lot of issues with what I've got; I'm more than happy to be told about them. The above seems to catch every case (that I can think of).
Just a note: HTML5 browsers will have URL validators built into them, so this will only be necessary for security double-checks on the server, and may not need to be perfect.
BTW, according to RFC1738, the "/" is required after the hostname if there is a query string. It's weird, but according to my reading of RFC1738
is legal, but
http://www.example.com?value="something"
is not.
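If someone wants to flag that case separately, a small check along these lines might do. It is only a sketch of my reading of the rule: query_without_slash() is a hypothetical helper, and the sample URLs in the usage lines are illustrative.

```php
<?php
// Hypothetical helper: true when the query string follows the host
// directly, with no "/" between them.
function query_without_slash(string $url): bool {
    // [^/?#]+ consumes the authority part; the next character must be "?".
    return preg_match('!^https?://[^/?#]+\?!i', $url) === 1;
}

// Illustrative samples (assumed, not taken from the RFC text).
$no_slash   = query_without_slash('http://www.example.com?value="something"');
$with_slash = query_without_slash('http://www.example.com/?value="something"');
$no_query   = query_without_slash('http://www.example.com');

var_dump($no_slash, $with_slash, $no_query);
```

Whether to reject such URLs or quietly insert the missing "/" is a design choice; the helper only detects the case.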
Hi,
I need a regular expression (ER) to validate all possible URLs. I want to reject malformed URLs, for example: htp://, http:/www., ww.domain ...
It has to validate FTP(S) and HTTP(S) access. HTTP URLs can contain a query string, which should also be validated.
So far I have the following:
// JavaScript has no free-spacing mode, so the pattern is built from commented string pieces.
var regexp = new RegExp(
  '^' +                                          // match the start of the URL
  '(((f|ht)tp(s)?):\\/\\/)?' +                   // protocols ftp, ftps, http and https - optional
  '(www\\.)?' +                                  // www. - optional
  '([a-zA-Z0-9\\-]{1,}\\.){1,}?' +               // subdomains - optional
  '(' +
    '[a-zA-Z0-9\\-]{2,}\\.[a-zA-Z0-9\\-]{2,4}' + // first-level domain
    '(\\.[a-zA-Z0-9\\-]{2,4})?' +                // second level - optional
  ')' +
  '(\\/|\\?)?' +                                 // / or ? to start directories and query strings - optional
  '$'
);
I'm not optimizing the size of the regular expression yet; the idea is for it to be functional first. Once it is working, I want to optimize it.
Still missing: validating FTP logins and checking IPs... any suggestions?
Greetings,
GuttoSP
@Arlen - There are 3 reasons why relying on html5 will fail for the near to mid future.
I'm working on a regular expression for URL validation. The goal is to provide a better validator that will end up in Drupal and be used to validate URLs (like the fields we fill out for profiles and getting them to work with Flickr). If anyone has a minute, here is my proposed regex.
Any suggestions for improvement? I'm looking for something at spec or close to spec. See http://www.ietf.org/rfc/rfc2396.txt, http://tools.ietf.org/html/rfc3305, http://tools.ietf.org/html/rfc1738, and http://www.w3.org/Addressing/URL/url-spec.txt for all the gory spec details.
Matt Farina
Geeks and God Former Co-Host
www.mattfarina.com