Saturday, July 30, 2011
Money and Name RegExp Testing
Blog Entry #222
Today we continue our analysis of HTML Goodies' "Checking Your Work: Validating Input -- Getting Started" and "Checking Your Work: Validating Input -- The Cool Stuff" tutorials. At this point our remaining task is to check over the "Cool Stuff" tutorial's regular expression (regexp)-based validation of user input for the text fields of the tutorial guestbook form.
We have previously carried out regexp validations with the test( ) method of the RegExp object and with the search( ) method of the String object (see Blog Entry #49, e.g.) - methods recommended by Mozilla for a simple comparison between a string and a regexp pattern. In contrast, all of the "Cool Stuff" regexp validations employ the match( ) method of the String object, a method we have not used before and which we should discuss before getting under way.
The match( ) game
When a string is successfully matched against a regexp pattern containing parentheses, the string itself and the parts of the string matching the parenthesized parts of the regexp pattern are "remembered" by the JavaScript engine, more specifically, they are organized as an array. There are two JavaScript methods that allow access to this array:
(1) the match( ) method of the String object, and
(2) the exec( ) method of the RegExp object.
(Strictly speaking, the array doesn't exist beforehand: each match( ) or exec( ) call builds and returns a fresh array, or returns null if there is no match. For the purposes of this discussion it doesn't really make a difference.)
There are two "Cool Stuff" regexp patterns that contain parentheses:
var telephoneExpression = /^((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}$/;
var moneyExpression = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{2})?$/;
The telephoneExpression pattern is for vetting the Telephone field. The moneyExpression pattern is for validating a monetary number and is unrelated to the guestbook form - we won't be dealing with this pattern later so we'll use it to illustrate the match( ) method.
With respect to the numbers it accommodates, the moneyExpression pattern can be divided into the following parts:
• [0-9]+ matches a number comprising an unbroken string of one or more digits, e.g., 10000.
• [0-9]{1,3}(,[0-9]{3})* matches one, two, or three digits followed by zero or more comma-delimited triads of digits, e.g., 2, 20,000, or 200,000,000.
(These subpatterns allow for numbers with leading zeroes, e.g., 0030; we'll sort out this issue when we discuss the numbersOnlyExpression regexp pattern.)
• (\.[0-9]{2})? is for numbers with a ¢/p part: it matches a decimal point followed by two digits. The ? quantifier signifies that this part of the match is optional.
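A quick way to see what these subpatterns accept and reject is to run a handful of strings through the pattern. The following is only a sketch that can be run outside the browser; the sample strings and the moneySamples and moneyResults names are my own and are not part of the tutorial.

```javascript
// The tutorial's pattern, copied verbatim.
var moneyExpression = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{2})?$/;

// Hypothetical sample inputs (not taken from the tutorial).
var moneySamples = ["10000", "2", "20,000", "200,000,000", "0030",
                    "1,000.07", "1,00", "12.5", "1000,000"];

// Record a true/false verdict for each sample and print it.
var moneyResults = {};
for (var i = 0; i < moneySamples.length; i++) {
    moneyResults[moneySamples[i]] = moneyExpression.test(moneySamples[i]);
    console.log(moneySamples[i] + " : " + moneyResults[moneySamples[i]]);
}
```

The first six samples pass and the last three fail: 1,00 has an incomplete triad, 12.5 has only one digit after the decimal point, and 1000,000 has four digits in front of its first comma.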
So, suppose you are a Web auctioneer who wants to validate user input for a
What is your bid on Item X? $<input type="text" id="userbidID" name="userbidName" />
field with the moneyExpression pattern. The user enters 1,000.07 into the field and submits the value, triggering a function containing a
var moneyArray = document.getElementById("userbidID").value.match(moneyExpression);
statement. What's in the moneyArray return?
Per the preceding discourse, moneyArray is an array (FYI: typeof moneyArray merely returns object, but moneyArray.constructor does return function Array( ) { [native code] }), and it holds four elements:
moneyArray[0] = 1,000.07 - for the user's original string;
moneyArray[1] = 1,000 - for the parenthesized ([0-9]+|[0-9]{1,3}(,[0-9]{3})*) part of the moneyExpression pattern;
moneyArray[2] = ,000 - for the parenthesized (,[0-9]{3}) part of the moneyExpression pattern; and
moneyArray[3] = .07 - for the parenthesized (\.[0-9]{2}) part of the moneyExpression pattern.
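The four-element breakdown can be checked without a form by feeding the pattern a plain string. In this sketch the userBid variable stands in for document.getElementById("userbidID").value, and the execArray and wholeNumberArray names are my own additions for illustration.

```javascript
var moneyExpression = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{2})?$/;

// Stand-in for the userbidID field's value (assumption: no DOM available here).
var userBid = "1,000.07";

// match( ) and exec( ) return equivalent arrays for a regexp without the g flag.
var moneyArray = userBid.match(moneyExpression);
var execArray = moneyExpression.exec(userBid);

console.log(Array.isArray(moneyArray)); // a more direct array check than typeof
for (var i = 0; i < moneyArray.length; i++) {
    console.log("moneyArray[" + i + "] = " + moneyArray[i]);
}
console.log(execArray.join(" | ") === moneyArray.join(" | "));

// Parenthesized parts that take no part in a match are left as undefined.
var wholeNumberArray = "500".match(moneyExpression);
console.log(wholeNumberArray.length, wholeNumberArray[2], wholeNumberArray[3]);
```

Note that the 500 input still gives a four-element array: the comma-triad and decimal slots are present but undefined.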
Got all that? Now, let's look at the code that the tutorial author offers for validating the userbidID field:
// Validate for money
if (document.getElementById("userbidID").value.match(moneyExpression)) {
/* Valid - do nothing */ }
else {
validationMessage = validationMessage + " - Input not a valid money amount\r\n";
valid = false; }
If the userbidID field's value matches the moneyExpression pattern, then the if clause is operative because the returned match( ) array converts to true in a boolean context; in this situation we do nothing. If the userbidID field's value doesn't match the moneyExpression pattern, then the match( ) operation returns null, which is one of those values (along with undefined, 0, and the empty string) that converts to false in a boolean context, and therefore the else clause is operative; in this case, a new segment is appended to the validationMessage string and the valid flag is set or reset to false.
Do we really need to be using the match( ) method here? You can see for yourself that the preceding conditional only checks if the match( ) return is null or not, and makes no use of the parenthesized parts of the moneyExpression pattern. If all we are interested in is a one-off comparison between the user's input and the moneyExpression pattern, then we can and should trade in this code for a corresponding test( )-based conditional:
if (!moneyExpression.test(document.getElementById("userbidID").value)) {
validationMessage += " - Input not a valid money amount\n";
valid = false; }
At least for a successful match between a string and a parentheses-containing regexp pattern, match( ) execution is slower than test( ) execution - see the Description section of Mozilla's test( ) method page. Is there a match( )/test( ) speed difference if the regexp pattern does not contain parentheses or for an unsuccessful match? You wouldn't think so, but I don't know. Even if we were to stick with the match( ) conditional, however, we should still transfer its else commands to the if clause, toggle the if condition with the ! operator, and then ditch the else clause, à la the test( ) conditional.
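Here is a small sketch of the two points above: match( ) and test( ) agree on pass/fail for this pattern, and the match( ) conditional can be inverted with ! so that the else clause disappears. The goodBid and badBid strings are hypothetical inputs of mine, standing in for the userbidID field's value.

```javascript
var moneyExpression = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{2})?$/;

// Hypothetical inputs in place of document.getElementById("userbidID").value.
var goodBid = "1,000.07";
var badBid = "1,00.7";

// match( ) gives an array or null; test( ) gives true or false directly.
console.log(goodBid.match(moneyExpression) !== null, moneyExpression.test(goodBid));
console.log(badBid.match(moneyExpression), moneyExpression.test(badBid));

// The match( ) conditional, inverted so that no empty if clause is needed.
var validationMessage = "";
var valid = true;
if (!badBid.match(moneyExpression)) {
    validationMessage += " - Input not a valid money amount\n";
    valid = false;
}
console.log(valid, JSON.stringify(validationMessage));
```

Because moneyExpression carries no g flag, repeated test( ) calls on it don't depend on where an earlier match left off.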
What's my name?
Getting back to the guestbook form, the "Cool Stuff" tutorial uses a noNumbersExpression regexp pattern and match( ) conditionals analogous to that given above to validate the form's First Name and Last Name fields:
var noNumbersExpression = /^[a-zA-Z]+$/;
// Validate first name and last name are all letters
if (document.getElementById("TextFirstName").value.match(noNumbersExpression)) { /* Valid - do nothing */ }
else {
validationMessage = validationMessage + " - First name must contain letters only\r\n";
valid = false; }
if (document.getElementById("TextLastName").value.match(noNumbersExpression)) { /* Valid - do nothing */ }
else {
validationMessage = validationMessage + " - Last name must contain letters only\r\n";
valid = false; }
The noNumbersExpression pattern was briefly explained in the closing part of the previous entry; it flags user inputs that do not consist solely of alphabetic characters (its identifier is thus a bit off in that it excludes symbols (e.g., @, #, $) as well as numbers).
It is of course a Webmaster's prerogative to limit a name input to alphabetic characters, but I myself would be less restrictive in this regard. We most recently addressed name validation in Blog Entry #206, in which I noted,
[N]ames can contain white space, hyphens, apostrophes, etc. (Within reason, they can even legitimately contain digits and symbolic characters.)
Accordingly, I would subtract noNumbersExpression's ^ start-of-string anchor and $ end-of-string anchor and just use a /[a-zA-Z]+/ or /[a-z]+/i pattern.
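To get a feel for what the unanchored pattern lets through, we can compare it with the original on a few names. This is just a sketch: the looseNameExpression identifier, the nameSamples strings, and the nameResults object are my own and don't appear in the tutorial.

```javascript
var noNumbersExpression = /^[a-zA-Z]+$/;   // the tutorial's pattern
var looseNameExpression = /[a-z]+/i;       // hypothetical name for the unanchored version

// Hypothetical sample inputs.
var nameSamples = ["Maria", "Mary-Jane", "O'Brien", "van der Berg", "1234", "@#$", ""];

var nameResults = {};
for (var i = 0; i < nameSamples.length; i++) {
    var name = nameSamples[i];
    nameResults[name] = {
        strict: noNumbersExpression.test(name),
        loose: looseNameExpression.test(name)
    };
    console.log(JSON.stringify(name) + " strict: " + nameResults[name].strict +
                ", loose: " + nameResults[name].loose);
}
```

The unanchored pattern only demands at least one letter somewhere in the input, so Mary-Jane, O'Brien, and van der Berg get through, whereas 1234, @#$, and the empty string are still flagged.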
You may be wondering, "What about the other extreme, can a name legally be a number or symbol(s), without any letters at all?" LegalZoom.com's Name Change Education Center sends out conflicting signals on this score, in effect saying, "Technically yes, but good luck with trying to change your name to such a name" - LegalZoom cites the example of Prince during his unpronounceable symbol days but gives the impression that it's a much tougher row to hoe for an average Joe. Anyway, I feel no need to cater to these oddball cases given their rarity.
Besides circumscribing the composition of a user's name inputs, you might also want to bound the lengths of those inputs - after all, someone could enter into the First Name field supercalifragilisticexpialidocious, which would pass validation, and you probably wouldn't want that. Towards this end, the noNumbersExpression tests can be augmented with
fieldObject.value.length > number
tests that flag names with more than number characters.
if (!noNumbersExpression.test(document.getElementById("TextFirstName").value)) {
validationMessage += " - First name must contain one or more letters\n";
valid = false; }
else if (document.getElementById("TextFirstName").value.length > number) {
validationMessage += " - First name cannot contain more than number characters\n";
valid = false; }
If the user's inputs are limited to alphabetic characters, then these tests can be combined by replacing noNumbersExpression's + quantifier with a {1,number} quantifier, with number written out as an actual integer in the regexp literal.
if (!/^[a-z]{1,number}$/i.test(document.getElementById("TextFirstName").value)) {
validationMessage += " - First name must contain letters only, and cannot contain more than number characters\n";
valid = false; }
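A regexp literal can't pick up the value of a variable, so if the length limit is held in a variable, one option is to assemble the pattern with the RegExp constructor. The sketch below assumes a limit of 20; the maxNameLength and boundedNameExpression identifiers and the sample names are hypothetical.

```javascript
// Hypothetical upper bound on the name length.
var maxNameLength = 20;

// Build /^[a-z]{1,20}$/i from the variable; the "i" flag makes it case-insensitive.
var boundedNameExpression = new RegExp("^[a-z]{1," + maxNameLength + "}$", "i");
console.log(boundedNameExpression.source);

// Plain strings stand in for the TextFirstName field's value.
console.log(boundedNameExpression.test("Christopher"));
console.log(boundedNameExpression.test("supercalifragilisticexpialidocious"));
console.log(boundedNameExpression.test(""));
```

With a limit of 20, Christopher passes, whereas the Mary Poppins word and the empty string are both rejected.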
Alternatively, we can equip the TextFirstName/TextLastName input elements with a
maxlength="number"
attribute that will simply cut off their values at number characters, but that doesn't really count as validation, does it?
Vis-à-vis the people I've known, the longest name I've encountered is Pranatharthiharan (17 letters), so I would set number to 20 - seems reasonable, eh?
We'll go through the numbersOnlyExpression, telephoneExpression, and emailExpression regexp validations, and roll out a demo, in the next entry.
reptile7