Thank you for putting the time and effort into this introduction. I think it will help quite a few people that might not otherwise feel comfortable with regular expressions.
A typo: When you introduce spaces to your character class example "[a-zA-Z0-9-_ ]+", the '-' needs to either be the last member of the character class, or slash escaped. Otherwise it sits between 9 and _, where it reads as the range from 9 to _ in ASCII. How a hyphen in that position is actually parsed varies between regex engines (Python's re module, for one, takes a hyphen directly after a range as a literal), so it's better not to leave it ambiguous.
The regex for the intended character class could be: [a-zA-Z0-9_ -]+ or [a-zA-Z0-9\-_ ]+
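A quick way to check the two safe spellings, sketched in Python (the sample string is made up for illustration). The last pattern is a deliberately unambiguous 9-to-_ range, included only to show which characters such a range would cover:

```python
import re

# Both spellings put the hyphen where it can only be a literal.
HYPHEN_LAST = re.compile(r"[a-zA-Z0-9_ -]+")
HYPHEN_ESCAPED = re.compile(r"[a-zA-Z0-9\-_ ]+")

sample = "my-file_name 01"  # hypothetical input
print(bool(HYPHEN_LAST.fullmatch(sample)))
print(bool(HYPHEN_ESCAPED.fullmatch(sample)))

# An explicit range from '9' (0x39) to '_' (0x5F) covers punctuation
# such as ':' ';' '<' '=' '>' '?' '@' and all uppercase letters,
# but not '-' itself (0x2D).
NINE_TO_UNDERSCORE = re.compile(r"[9-_]")
print(bool(NINE_TO_UNDERSCORE.fullmatch(";")))  # inside the range
print(bool(NINE_TO_UNDERSCORE.fullmatch("-")))  # outside the range
```

Putting the hyphen last (or escaping it) gives the same result in every engine, which is the main reason to prefer it.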
jane: Yes.. That example may have been inspired by a certain web-comic :P
Scott: Thanks for the correction - I have updated my local version of the article - I will update the web-article/PDF when there are a few more corrections to be made.
I had the same correction as the user above me. Also a few more:
Introduction, last paragraph:
"you start of learning some basics, then learn some more advanced bits"
Should be: (grammar correction and same tense)
"you start by learning some basics, then learning some more advanced bits"
Practical Examples opening paragraph has an extra line break in it. There is no reason for it and it should be removed. In fact this seems to happen all over the place. I would suggest keeping it consistent. Either have no line breaks or have the double line breaks where there is whitespace in between paragraphs.
Validating Form Input, last paragraph:
"Secondly we need to"
Should be:
"Secondly, we need to"
What evil people want: SQL Injection, first sentence: "There are two big problems when it comes to user-submitted data, which is going to be displayed on the site"
This sounds weird, maybe it should be: "There are two big problems when it comes to user-submitted data that is going to be displayed on (a|your|the) site"
"the site" doesn't seem to make much sense, and the comma before "which" may be correct but it breaks the flow of the sentence. I donno.
From: What evil people want: XSS you have: (makes no sense)
s/</</g s/>/>/g
You probably want something like:
s/</&lt;/ s/>/&gt;/
Which is probably what you have in your code, but to display it on the page you have to go one step further and write: &amp;lt; and &amp;gt;
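For anyone following along, here is a minimal Python sketch of the escaping substitutions. The function name is hypothetical, and the extra & step is an assumption that goes beyond the two substitutions quoted above; it is there so that entities already present in the text get escaped too.

```python
import re

def escape_angle_brackets(text: str) -> str:
    """Rough equivalent of s/</&lt;/g and s/>/&gt;/g (hypothetical helper)."""
    # Assumption: escape & first, so entities written by the next two
    # steps are not escaped a second time.
    text = re.sub(r"&", "&amp;", text)
    text = re.sub(r"<", "&lt;", text)  # re.sub replaces every match, like /g
    text = re.sub(r">", "&gt;", text)
    return text

print(escape_angle_brackets("<script>alert(1)</script>"))
# Escaping an already-escaped entity gives the form needed to show
# the entity itself on a page.
print(escape_angle_brackets("&lt;"))
```

In real code Python's standard library html.escape() covers the same ground; the regex version is only here to mirror the s/// notation.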
Your trim regex only trims trailing whitespace:
s/\s+$//
For a complete trim there are a few alternatives:
s/\s+$// s/^\s+//
Or (notice I have to use the /g so in the case of leading and trailing whitespace it gets both):
s/^\s+|\s+$//g
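The same two approaches as a small Python sketch (function names are made up for illustration). Python's re.sub replaces every match by default, which plays the role of /g here; passing count=1 mimics leaving /g off.

```python
import re

def trim(text: str) -> str:
    # Single pattern with alternation; both ends are handled because
    # re.sub replaces all matches, not just the first.
    return re.sub(r"^\s+|\s+$", "", text)

def trim_two_step(text: str) -> str:
    # Two separate substitutions, trailing then leading.
    text = re.sub(r"\s+$", "", text)
    return re.sub(r"^\s+", "", text)

def trim_first_match_only(text: str) -> str:
    # count=1 stops after the first match, so only one end is trimmed.
    return re.sub(r"^\s+|\s+$", "", text, count=1)

print(repr(trim("  hello world \n")))
print(repr(trim_first_match_only("  hi  ")))
```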
Your multiple spaces regex:
s/\s{2,}/ /
I'm not sure if it's worth mentioning, especially given that you're teaching {N,M} at that point in the article, but you could do: (this replaces 1 or more with just 1, which may be overkill)
s/\s+/ /
Or: (this is functionally the same as your original)
s/\s\s+/ /
Therefore it might be worth mentioning that there is more than one way to do it.
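A short Python comparison of the three variants, using a made-up sample string. The difference only shows up on a lone whitespace character that is not a plain space, such as a single tab:

```python
import re

text = "a  b\tc   d"  # two spaces, one tab, three spaces

two_or_more = re.sub(r"\s{2,}", " ", text)  # leaves the single tab alone
same_as_above = re.sub(r"\s\s+", " ", text)  # equivalent to \s{2,}
one_or_more = re.sub(r"\s+", " ", text)      # also turns the tab into a space

print(repr(two_or_more))
print(repr(same_as_above))
print(repr(one_or_more))
```

Note that re.sub replaces every run, so these behave like the s/// forms with /g added.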
Your image matching regex:
<img.*? src=["']{1}([^"']+)["']{1}[^>]+>
You might want to come back to this after you discuss grouping and introduce a back reference. That way, if the user opened with a single quote, the back reference makes sure that you're matching a closing single quote (not a double quote). Something like: (I removed the superfluous {1}s)
<img.*? src=(["'])([^"']+)(\1)[^>]+>
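To see what the back reference buys you, here is a rough Python sketch comparing the two patterns. The helper name and the sample tags are invented for illustration, and this is only a demonstration of the regex, not a robust way to parse HTML.

```python
import re

# Original style: any quote may open, any quote may close.
LOOSE = re.compile(r'''<img.*? src=["']([^"']+)["'][^>]+>''')
# With a back reference: \1 must repeat whichever quote group 1 captured.
STRICT = re.compile(r'''<img.*? src=(["'])([^"']+)(\1)[^>]+>''')

def extract_src(html: str):
    """Return the src value matched by STRICT, or None (hypothetical helper)."""
    match = STRICT.search(html)
    return match.group(2) if match else None

double_quoted = '<img alt="logo" src="logo.png" width="10">'
single_quoted = "<img class='x' src='logo.png' width='10'>"
mismatched = "<img src='logo.png\" width='10'>"

print(extract_src(double_quoted))
print(extract_src(single_quoted))
print(extract_src(mismatched))              # strict pattern rejects it
print(LOOSE.search(mismatched) is not None)  # loose pattern accepts it
```

Because the quote now occupies group 1, the URL moves to group 2, which is worth pointing out wherever the article reads the capture.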
Language examples at the end: my suggestion would be to change the "replacement" to something more obvious like "[replacement]", so when someone runs the examples they can see immediately where the replacement was. Also, it would be nice to have whitespace in the blocks for Perl, PHP, and JavaScript. I see you already have it in Ruby and Python.
That seems like enough for you to update the article. It's a great reference that I can point people to. I've been trying to get some co-workers to learn regex for a while!
Thanks, Joseph Pecoraro - http://blog.bogojoker.com
Hmm, your Robert SQL injection example reminded me of http://xkcd.com/327/ :)