Here's a regular expression gotcha. I was writing a regex pattern this morning to capture the zip code from a US address. A zip code can be a 5-digit number followed by an optional 3-4 digit zip4 extension, so here's the regex I wrote:
"\\b\\d{5}(?:[ -+]?\\d{3,4})?\\b"
As it turned out, this regex will NOT match a full zip code like "12345-1234"; at best it stops after the first five digits. Can you spot the problem?
Well, the title of this post actually gave away the answer. This regex fails to match '12345-1234' because the dash character ('-') in the character class does not mean a literal dash. With the character class [ -+] I meant to say the character ' ' or '-' or '+', but the regex engine interprets it as the range from ' ' to '+'. That range covers character codes 0x20 through 0x2B, and '-' is 0x2D, so the dash is not included.
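To see the range behavior directly, here is a minimal sketch in Python (the pattern in this post is written as an escaped string literal from another language, but Python's `re` module treats a dash between two characters in a class the same way). The names `range_class` and `accepted` are just illustrative.

```python
import re

# The character class from the original pattern.
# The engine reads it as the range ' ' (0x20) through '+' (0x2B).
range_class = re.compile(r"[ -+]")

# Collect every printable ASCII character the class accepts.
accepted = [chr(c) for c in range(0x20, 0x7F) if range_class.fullmatch(chr(c))]
print("".join(accepted))  # prints:  !"#$%&'()*+

# The dash (0x2D) sits just past '+' (0x2B), so it falls outside the range.
print(range_class.fullmatch("-") is None)  # prints: True
```

The class accepts twelve characters, from the space through the plus sign, and the dash is not among them.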
To fix the problem, simply put the '-' character at the very end of the character class, where it is interpreted as a literal dash:
"\\b\\d{5}(?:[ +-]?\\d{3,4})?\\b"
The moral of the story is that you need to be careful when putting the '-' character in a character class, because it can mean 'range to' or just a literal dash depending on its position.
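As a quick check, the sketch below compares the original and corrected patterns in Python, using raw strings so the backslashes are not doubled. The sample address and the helper `first_match` are made up for illustration. It also includes a third variant that escapes the dash (`\-`), an alternative not mentioned above that many regex flavors, Python's included, accept.

```python
import re

# Original pattern: '-' between ' ' and '+' is read as a range operator.
broken = re.compile(r"\b\d{5}(?:[ -+]?\d{3,4})?\b")
# Corrected pattern: '-' placed last in the class is a literal dash.
fixed = re.compile(r"\b\d{5}(?:[ +-]?\d{3,4})?\b")
# Alternative (an assumption, not from the post): escape the dash explicitly.
escaped = re.compile(r"\b\d{5}(?:[ \-+]?\d{3,4})?\b")

# Hypothetical sample address.
address = "123 Main St, Springfield, IL 12345-1234"


def first_match(pattern, text):
    """Return the first substring the pattern finds, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(broken, address))   # prints: 12345
print(first_match(fixed, address))    # prints: 12345-1234
print(first_match(escaped, address))  # prints: 12345-1234
```

The original pattern still finds the five-digit part, which can make the bug easy to miss: it silently drops the extension instead of failing outright.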