Wednesday, November 03, 2010
All in the Query String Family
Blog Entry #195
I've had a change of heart about moving on from the 'carrying values across Web pages' topic. Besides "A Quick Tutorial on JavaScript Variable Passing", HTML Goodies' Beyond HTML : JavaScript sector features a related, three-page "How to Use a JavaScript Query String Parser" tutorial, authored by Joseph K. Myers and originally appearing at WebReference.com, that I thought we might discuss today.
The "How to Use a JavaScript Query String Parser" tutorial's second page links to a param.js script that, not unlike the "Passing Data Between Pages via URLs" code detailed in the previous post, extracts and objectizes the data in a query string. The tutorial's third page provides a script that prints out the param.js output in the form of a dl (definition list) element. The latter script also calls on an entify.js script that escapes any HTML metacharacters in the data before they are written to the page. No deconstruction is offered for any of this code.
The tutorial's "Format of the query string" section gives the structure of a "basic" query string (the string below appears in the WebReference.com version of the tutorial but not in the HTML Goodies version)
name1=Joseph&name2=Myers&website=http://www.codelib.net/
and notes the connection between a query string and an associative array - so far, so good. The section then unloads on the reader a series of "not standard" query strings without discussing the HTML contexts that might give rise to those strings:
(1) A given name can have more than one value:
website=http://www.codelib.net/&website=http://math111.com/
(2) A boolean query string name can be specified in 'minimized' form:
website=http://www.codelib.net/&secure
(3) Query string name/value pairs are sometimes delimited with semicolons and not ampersands:
website=http://www.codelib.net;secure
Actually, the first of these string types is as standard as standard can be. There are two HTML form situations that produce such a string:
(1) If a user checks two or more checkboxes in a group of checkboxes that have the same name; and
(2) If a user selects two or more options from a <select multiple> menu.
Playing the devil's advocate, I guess you could also give a common name to two or more text inputs (for that matter, there's nothing stopping you from giving the same name to every control, including the submit button) in a form, but this strikes me as really bad coding practice.
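To make the checkbox case concrete, here is a rough sketch of how same-named controls end up as a repeated-name query string. The color name, its values, and the toQueryString( ) helper are all hypothetical, and the encoding only approximates what a browser does on submission (encodeURIComponent( ) leaves a few characters alone that form encoding would escape).

```javascript
// Hypothetical name/value pairs for three checked checkboxes that share the name "color".
var checked = [["color", "red"], ["color", "light blue"], ["color", "green"]];

// Hypothetical helper: joins the pairs in the name=value&name=value... format.
function toQueryString(pairs) {
  var parts = [];
  for (var i = 0; i < pairs.length; i++) {
    // encodeURIComponent( ) writes a space as %20; form encoding writes it as +.
    var name = encodeURIComponent(pairs[i][0]).replace(/%20/g, "+");
    var value = encodeURIComponent(pairs[i][1]).replace(/%20/g, "+");
    parts.push(name + "=" + value);
  }
  return parts.join("&");
}

console.log(toQueryString(checked));
```

Each checked box contributes its own name=value pair, so the shared name simply appears once per selection.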
As for the other two string types, it's true that they shouldn't be generated from an HTML form - the W3C declares,
Forms submitted with the [application/x-www-form-urlencoded] content type must be encoded [in the name=value&name=value... format]
(emphasis added) - but with respect to cobbling these strings together from scratch, the query, pchar, unreserved, and sub-delims ABNF productions in RFC 3986, the current "URI: Generic Syntax" specification, make clear that there's nothing invalid about them, either:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
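As a quick sanity check, the four productions above can be transcribed into a regular expression and run against the tutorial's "not standard" strings. This is only a sketch: the isValidQuery( ) name is my own, and the pct-encoded production (a % followed by two hex digits) is taken from RFC 3986 rather than from the excerpt above.

```javascript
// unreserved, sub-delims, ":" and "@" (pchar minus pct-encoded), plus "/" and "?"
var queryChar = "[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]";
// pct-encoded = "%" HEXDIG HEXDIG (from RFC 3986, not quoted above)
var pctEncoded = "%[0-9A-Fa-f]{2}";
var queryPattern = new RegExp("^(?:" + queryChar + "|" + pctEncoded + ")*$");

// Hypothetical helper: true if s matches the query production.
function isValidQuery(s) {
  return queryPattern.test(s);
}

console.log(isValidQuery("website=http://www.codelib.net/&website=http://math111.com/"));
console.log(isValidQuery("website=http://www.codelib.net/&secure"));
console.log(isValidQuery("website=http://www.codelib.net;secure"));
console.log(isValidQuery("name=Joseph Myers")); // a raw space is not a pchar
```

The first three calls should report true and the last false, which is all the grammar has to say on the matter: it constrains the characters, not the name=value structure.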
Moreover, the HTML 4.01 Specification's "Ampersands in URI attribute values" section notes that a query string can be appended to the URL href value of an anchor element and recognizes the use of a semicolon to delimit the name/value pairs of such a string.
<a href="http://hostname/[pathname/]filename.html?x=1;y=2">Link text</a>
For an example of an =value-less query string, again as part of a href URL, see the "Tracking" section of Wikipedia's "Query string" entry.
Example/deconstruction
Let's suppose we're at http://www.whatever.com/page1.html, which contains the following form:
<form action="page2.html">
Which browser(s) does your family use to surf the Web? (Check all that apply.)<br>
<input type="checkbox" name="Browser" value="Internet Explorer"> Internet Explorer<br>
<input type="checkbox" name="Browser" value="Firefox"> Firefox<br>
<input type="checkbox" name="Browser" value="Safari"> Safari<br>
<input type="checkbox" name="Browser" value="Opera"> Opera<br>
<input type="checkbox" name="Browser" value="Chrome"> Chrome<br>
<button>Submit</button>
</form>
Let's say your better half surfs with Internet Explorer, your young 'un is a Firefox purist, and that you yourself are an Opera user. You check the appropriate checkboxes and click the button, which takes you to the page2.html action page, at which you see the following URL in the browser window's address bar:
http://www.whatever.com/page2.html?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera
The page2.html source holds the code in the first textarea field of the tutorial's third page:
<!-- Import the query-string parser function. -->
<script type="text/javascript" src="http://myjs.us/param.js"></script>
<!-- Import the script which converts unsafe characters to HTML entities. -->
<script type="text/javascript" src="http://myjs.us/entify.js"></script>
<script type="text/javascript">
d = document;
/* Obtain query string parameters. */
q = param( );
...
We first import the param.js and entify.js scripts; the tutorial does not provide a link to the entify.js script but the http://myjs.us/entify.js URL is live as of this writing. We give the document object a d identifier and then call the param( ) function in the param.js script.
function param( ) {
return ptq(location.search.substring(1).replace(/\+/g, " ")); }
The param( ) function in turn calls the ptq( ) function that makes up the rest of the param.js script, and passes thereto the post-? query string in which any plus signs have been replaced by spaces. (N.B. The literalizing backslash of the /\+/g regular expression in the param.js textarea field at the beginning of the tutorial's "Code" section is missing in the HTML Goodies version of the tutorial but is properly present in the WebReference.com version.) In our case, Browser=Internet Explorer&Browser=Firefox&Browser=Opera is passed to ptq( ), which gives it a q identifier.
function ptq(q) {
/* Parse the query. */
/* Semicolons are nonstandard but we accept them. */
var x = q.replace(/;/g, "&").split("&"), i, name, t;
... }
The var x = q.replace(/;/g, "&").split("&") operation
(1) replaces any semicolon name/value pair delimiters in q with ampersands (unnecessary for our q), and then
(2) splits the resulting string at the & separator to give an array of name/value pairs, which is assigned to an x variable, as though we had written:
var x = new Array( );
x[0] = "Browser=Internet Explorer";
x[1] = "Browser=Firefox";
x[2] = "Browser=Opera";
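The string operations described so far (param( )'s plus-sign replacement and ptq( )'s semicolon replacement and split) can be tried outside a browser with a sketch like the following. The search variable is a hypothetical stand-in for location.search, which isn't available outside a page.

```javascript
// Hypothetical stand-in for location.search on page2.html.
var search = "?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera";

// param( ): drop the leading ? and turn each + into a space.
var query = search.substring(1).replace(/\+/g, " ");

// ptq( ): normalize ; delimiters to & and split into name=value pairs.
var pairs = query.replace(/;/g, "&").split("&");
console.log(pairs);

// The tutorial's semicolon-delimited string goes through the same two calls.
var semicolonPairs = "website=http://www.codelib.net;secure".replace(/;/g, "&").split("&");
console.log(semicolonPairs);
```

Note that the semicolon replacement is unconditional, so a semicolon inside a value would also be treated as a delimiter unless it had been percent-encoded.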
We next recast the x array as the following q two-dimensional array
q["Browser"][0] = "Internet Explorer";
q["Browser"][1] = "Firefox";
q["Browser"][2] = "Opera";
via a three-iteration for loop. Note that the loop's initial-expression resets q to an empty object using the object initializer syntax.
/* q changes from string version of query to object. */
for (q={ }, i = 0; i < x.length; i++) { ... }
Here's what happens in the first loop iteration:
(1)
t = x[i].split('=', 2);
This statement splits x[0], Browser=Internet Explorer, at the = separator to generate a
t[0] = "Browser"; t[1] = "Internet Explorer"
array.
Somewhat oddly, a 2 limit parameter has been supplied for the split( ) operation; this ensures that the t array will have at most two elements. More specifically, if x[i] had an a=b=c form - e.g., website=http://www.codelib.net/index.html?name1=Joseph - then
t = x[i].split('=', 2);
would give a
t[0] = "a"; t[1] = "b"
array and discard the data after the second = separator - see this Mozilla example. We could create an a=b=c name/value pair from scratch but not via a form, for which the browser will percent-encode any = characters in the data.
(2)
name = unescape(t[0]);
This statement gives t[0], Browser, a name identifier. Browser does not need to be unescaped and thus the unescape( ) function has no effect.
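Steps (1) and (2) can be tried in isolation with the a=b=c pair mentioned above. The encodedPair string is a hypothetical form-style pair in which, for brevity, only the embedded = has been percent-encoded (a real browser would encode the colon, slashes, and question mark as well).

```javascript
// The a=b=c pair from the discussion above.
var pair = "website=http://www.codelib.net/index.html?name1=Joseph";

// With a limit of 2, the piece after the second = is dropped.
var limited = pair.split("=", 2);
// Without the limit, all three pieces come back as separate elements.
var unlimited = pair.split("=");

// Hypothetical form-style pair: the embedded = arrives as %3D,
// survives the split, and is restored by unescape( ).
var encodedPair = "website=http://www.codelib.net/index.html?name1%3DJoseph";
var t = encodedPair.split("=", 2);
var decodedValue = unescape(t[1]);

console.log(limited);
console.log(unlimited.length);
console.log(decodedValue);
```

So the limit costs nothing for form-generated strings; it only bites for hand-built strings that carry a literal = in a value.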
(3)
if (!q[name]) q[name] = [ ];
This conditional checks if
q["Browser"]
doesn't exist - true for this iteration but false for the second and third iterations - and, if so, sets q["Browser"]
to an empty array using the array literal syntax.
(4)
if (t.length > 1) q[name][q[name].length] = unescape(t[1]);
This conditional checks if t.length is greater than 1 - true (t.length is 2) for all three iterations - and, if so, assigns t[1], Internet Explorer, to
q["Browser"][0]
Internet Explorer does not need to be unescaped and thus the unescape( ) function has no effect. In addition, this assignment effectively increments q[name].length from 0 to 1.
In the second iteration, x[1], Browser=Firefox, is split( ) to give a
t[0] = "Browser"; t[1] = "Firefox"
array and Firefox is assigned to q["Browser"][1]. In the third iteration, x[2], Browser=Opera, is split( ) to give a
t[0] = "Browser"; t[1] = "Opera"
array and Opera is assigned to q["Browser"][2]. Had our original query string contained any =value-less names (for which t.length would be 1), the ptq( ) function's
else q[name][q[name].length] = true;
clause would have appended a true value to the arrays for those names.
Finally, q is returned first by ptq( ) to the param( ) function and then by param( ) to the page2.html
q = param( );
statement, which gives the return a q identifier (ptq( )'s q is a local variable and is not available outside ptq( )).
We are ready to write q to the page - we'll do that, and also discuss the tutorial's concluding "Another Example", in the following entry.
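As a recap of the deconstruction above, here is ptq( ) pieced back together from the fragments quoted in this entry, so that it can be run as a whole. The paramFrom( ) function is my own hypothetical variant of param( ): it takes the search string as an argument instead of reading location.search, which lets the sketch run outside a browser.

```javascript
/* ptq( ), reassembled from the fragments quoted in this entry. */
function ptq(q) {
  /* Semicolons are nonstandard but we accept them. */
  var x = q.replace(/;/g, "&").split("&"), i, name, t;
  /* q changes from string version of query to object. */
  for (q = {}, i = 0; i < x.length; i++) {
    t = x[i].split("=", 2);
    name = unescape(t[0]);
    if (!q[name]) q[name] = [];
    if (t.length > 1) q[name][q[name].length] = unescape(t[1]);
    else q[name][q[name].length] = true;
  }
  return q;
}

/* Hypothetical stand-in for param( ): same body, but the search
   string is passed in rather than read from location.search. */
function paramFrom(search) {
  return ptq(search.substring(1).replace(/\+/g, " "));
}

var result = paramFrom("?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera");
console.log(result.Browser);

var mixed = paramFrom("?website=http://www.codelib.net;secure");
console.log(mixed.website, mixed.secure);
```

Every name maps to an array, even when it appears only once, so code that reads the result should index into the array rather than expect a bare string.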
reptile7