reptile7's JavaScript blog
Monday, November 22, 2010
 
Query, EarthLink, Future
Blog Entry #197

Query string coda

Two last comments on the query string business...

(1) Q: "In the previous post, we printed out the q["Browser"] = ["Internet Explorer", "Firefox", "Opera"]; array as a definition list. What would happen with just a document.write(q); command?"

A: document.write(q); would write [object Object] to the page, and that's it (document.write(q["Browser"]); would give us an Internet Explorer,Firefox,Opera display, however).

(2) Although I felt that the "How to Use a JavaScript Query String Parser" tutorial's text could have used some (OK, a lot of) editing help, I would nonetheless say that its query string-parsing code is clearly superior to that offered by "A Quick Tutorial on JavaScript Variable Passing" or by Danny Goodman's "Passing Data Between Pages via URLs" article. In particular, neither the "JavaScript Variable Passing" tutorial nor the Goodman article anticipates the a-name-can-have-more-than-one-value situation that Joseph Myers makes a big deal out of (and rightfully so, in retrospect):

(a) Even with the modifications specified by the Walking our way across the URL subsection of Blog Entry #194, the "JavaScript Variable Passing" tutorial's query string-parsing code expects a set number of name/value pairs and is thus obviously incompatible with a uninamed group of checkboxes and/or a <select multiple> menu, which allow a given name to be paired with a variable number of values. (That said, the "JavaScript Variable Passing" code should be formulable as a loop that would take care of this problem, a task I leave to others.)

(b) Applying the Goodman article's "Listing 1" code to the Browser=Internet+Explorer&Browser=Firefox&Browser=Opera query string would produce a one-element results["Browser"] = "Opera"; array. In the first iteration of the for (i = 0; i < srchArray.length; i++) { ... } loop, results["Browser"] is set to Internet+Explorer by the results[tempArray[0]] = tempArray[1]; line, but this same line overwrites Internet+Explorer with Firefox in the loop's second iteration and overwrites Firefox with Opera in the loop's third iteration.
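Both points can be checked outside a browser. In the Node-runnable sketch below, String( ) stands in for the string conversion that document.write( ) applies to its argument. parseOverwrite( ) is a hypothetical, simplified loop in the style described in (b): only the srchArray, tempArray, and results names come from the Goodman article, and the rest is my assumption. It is set against an accumulating variant, also hypothetical, that keeps every value for a given name.

```javascript
// (1) What document.write(q) would print: String() applies the same conversion.
const q = { Browser: ["Internet Explorer", "Firefox", "Opera"] };
const whole = String(q);            // "[object Object]"
const browsers = String(q.Browser); // "Internet Explorer,Firefox,Opera"

// (2) Hypothetical assign-per-pair loop: each pair overwrites whatever was
// previously stored under the same name.
function parseOverwrite(query) {
  const results = {};
  const srchArray = query.split("&");
  for (let i = 0; i < srchArray.length; i++) {
    const tempArray = srchArray[i].split("=");
    results[tempArray[0]] = tempArray[1];
  }
  return results;
}

// Hypothetical accumulating variant: every name maps to an array of all its values.
function parseAccumulate(query) {
  const results = {};
  for (const pair of query.split("&")) {
    const [name, value] = pair.split("=");
    if (!Object.prototype.hasOwnProperty.call(results, name)) results[name] = [];
    results[name].push(value);
  }
  return results;
}

const qs = "Browser=Internet+Explorer&Browser=Firefox&Browser=Opera";
const last = parseOverwrite(qs).Browser;  // only "Opera" survives
const all = parseAccumulate(qs).Browser;  // all three values are kept
console.log(whole, browsers, last, all);
```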

Another detour

In my very first blog entry, I briefly related some back-and-forth that I had had with EarthLink about an HTML reference that EarthLink had written up but was no longer available at EarthLink's Web site. In that discussion, I parenthetically commented, I’ll have more to say about EarthLink, my ISP, and instructional material relating to the Web at a later time. Well, something has happened recently that gives me the excuse to finally follow through on this promise.

Since at least 1998, EarthLink has published an "eLink" newsletter comprising a grab bag of various Web-related features. The most recent issue of eLink is posted here; its "Web Watch" section sports a "Learn Web Tech from the Best" feature whose copy reads:
Learning the Web? Take a refresher or start from scratch
W3C are the people who decide the standards for Web pages and browsers (like HTML and CSS). For free tutorials and info, they're the ones to beat, and their "school" is built for everyone from beginners to pros who want to fill in the gaps.
There is a not-so-little problem here: the feature's Learn Web Tech from the Best heading link and its concluding Learn more link point to http://www.w3schools.com/, the homepage of W3Schools. A one-minute visit to Wikipedia reveals that W3Schools has no affiliation with the W3C; more specifically, the W3C is co-based in the U.S., France, Japan, and China, whereas W3Schools is run by a Norwegian company. (I'm a bit surprised that the W3C hasn't taken legal action against W3Schools to force it to change its name, but perhaps when you're the W3C you don't feel the need to stoop to that sort of thing.)

Now, I can see how an organization like the American Chemical Society might confuse the W3C and W3Schools, but EarthLink? Oh dear. But let me tell you what really gets my goat about this.

Once upon a time EarthLink promoted the creation of Web pages as a Web community activity. The "Using the Internet" sector of EarthLink's Web site contained a "Web Site Workshop" devoted to this end; relatedly, EarthLink's bLink magazine offered HTML-related tutorials on a semi-regular basis. EarthLink even ran for its members an annual homepage contest whose grand prize was a trip for two to a tropical paradise.

Sadly, all of this has fallen by the wayside. The Using the Internet sector and its Web Site Workshop were jettisoned at the end of 2002 (EarthLink's current customer support site is a mere shadow of these materials); bLink and the homepage contest were terminated at the end of 2001. EarthLink does at least still offer a Web site builder* whose Flash demo reassuringly informs us - hip hip hooray! - "No need to learn HTML or graphic design."
*At present, EarthLink's "Site Builder" widget states: The EarthLink Sitebuilder feature has been discontinued for new users as of August 22, 2011.

Why would anyone cook at home when they could just eat out at a restaurant? Why would anyone take up gardening as a hobby when they could just buy vegetables at a grocery store? Why would anyone learn HTML when they could just use an HTML editor? Give me convenience, or give me death! But seriously, folks, human nature being what it is, I'm sure there are some EarthLink members out there who, in a spirit of DIY craftsmanship, are ready, willing, and able to get their fingernails dirty with the details of hand-coding a Web page, and it bugs me that EarthLink has apparently lost interest in catering to them.

Let me get off my soapbox and give you some relevant references:

• Go here to check out a representative bLink issue.

• The "Quick HTML Tag Reference" that triggered all of this can be viewed here; its first-page links to .pdf versions of its pages are live as of this writing. I no longer have a need for this reference, nor would I make use of it if I were teaching an 'HTML 101' course, but there you have it.

• My first blog entry also name-checks a "The ABCs of HTML" bLink article, which can be read here.

• Go here for a 2001 taste of the EarthLink homepage contest.

• The Using the Internet sector and its Web Site Workshop in mostly intact form can still be explored here.

• Adding insult to injury, the April/May 1997 issue of bLink contained an "HTML: It's Not Rocket Science" article - read it here.

• BTW, there are even a few JavaScript-themed articles lurking in the bLink archives - check this out.

All of the above links come courtesy of the Internet Archive and its "Wayback Machine", a resource I was unfamiliar with when I began blogging.

Finally, I wouldn't want this section to imply that there isn't a lot of good instructional material at W3Schools - there certainly is (indeed, the W3C links to W3Schools on its CSS articles and tutorials page [update: nope, that link isn't there anymore]) - but for bona fide W3C HTML/CSS tutorials, see:
(1) "Getting started with HTML"
(2) "More advanced [HTML] features"
(3) "Adding a touch of style"
(4) "Starting with HTML + CSS"

What's next?

At this point in time, we have discussed most of the HTML Goodies Beyond HTML : JavaScript sector tutorials that Joe Burns put together during 1999-2002. Some thirty-odd tutorials have been added to the sector in the last few years (as far as I am aware, Joe left HTML Goodies in 2002 - the Internet Archive shows no change in the sector from August 2002 to mid-February 2005); henceforth we will cover some but not all of these newer tutorials. Let's get on with it, then: in the following entry we will check over the sector's "How to Populate Fields from New Windows Using JavaScript" tutorial.

reptile7

Saturday, November 13, 2010
 
Accessing Query String Data, Continued
Blog Entry #196

We're currently working through an example that applies the scripts of HTML Goodies' "How to Use a JavaScript Query String Parser" tutorial to a uninamed set of checkboxes. Picking up where we left off last time, we now have in hand the array below

q["Browser"] = ["Internet Explorer", "Firefox", "Opera"];

and we're ready to print this baby out. Towards this end, we will load the q data into a dl (definition list) element

<dl>
<dt>Browser</dt>
<dd>Internet Explorer</dd>
<dd>Firefox</dd>
<dd>Opera</dd>
</dl>


so as to give the following display:
Browser
Internet Explorer
Firefox
Opera
The browsers on my computer uniformly indent dd element content by half a tab (as they do for li element content of both ul and ol lists), i.e., by four spaces in a monospace font.

Returning to the script in the first textarea field on the tutorial's third page, here's the code that dl-izes the q data:
d.write("<dl>");
for (k in q) {
	d.write("<dt>", entify(k), "</dt>");
	for (i = 0; i < q[k].length; i++)
		d.write("<dd>", entify(q[k][i]), "</dd>"); }
d.write("</dl>");
Because q is a mixed associative-ordinal array, we will use two types of loops to access its data:
(1) An outer, one-iteration for...in loop will load q's first-dimension index value - Browser - into a dt (definition term) element. (IMO, Microsoft does a better job of explaining the for...in statement than does Mozilla - I recommend the former link if for...in loops are new to you.)
(2) A nested, three-iteration for loop will respectively load q's second-dimension element values - Internet Explorer, Firefox, and Opera - into successive dd (definition description) elements.
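For readers who want to try the two-loop structure outside a browser, here is a Node-runnable sketch. It assumes that collecting the pieces in an array and joining them is an acceptable stand-in for the d.write( ) calls; toDefinitionList( ) is a hypothetical name, and entify( ) is copied from the entify.js listing discussed next.

```javascript
// Copy of the entify() function from entify.js.
function entify(s) {
  return ("" + s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: builds the <dl> markup as a string instead of
// writing it piece by piece with d.write().
function toDefinitionList(q) {
  const out = ["<dl>"];
  for (const k in q) {                        // outer loop: one pass per name
    out.push("<dt>", entify(k), "</dt>");
    for (let i = 0; i < q[k].length; i++) {   // inner loop: one pass per value
      out.push("<dd>", entify(q[k][i]), "</dd>");
    }
  }
  out.push("</dl>");
  return out.join("");
}

const q = { Browser: ["Internet Explorer", "Firefox", "Opera"] };
const html = toDefinitionList(q);
console.log(html);
```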

Before these values are written to the page, however, they are run through the entify.js script's entify( ) function:
/* Convert unsafe characters to HTML entities: */
function entify(s) {
	return ("" + s).
	replace(/&/g, "&amp;").
	replace(/</g, "&lt;").
	replace(/"/g, "&quot;").
	replace(/>/g, "&gt;"); }
Its appearance notwithstanding, the entify( ) function comprises a single statement that carries out two main tasks:

(1) &, <, ", and > characters in the data are replace( )d by their corresponding character entity references to ensure that the browser prints them out as literal characters and does not act on them as metacharacters.

(2) Any non-string (e.g., boolean or number) values are stringified via concatenation with an empty string, as calling the replace( ) method on a non-string will throw an "s.replace is not a function" error.
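The two tasks can be seen side by side in the Node-runnable sketch below. entify( ) is copied from the listing above, while entifyNoStringify( ) is a hypothetical stripped-down variant that skips the ("" + s) step so that the error it guards against actually shows up.

```javascript
function entify(s) {
  return ("" + s)                 // stringify first, so numbers and booleans work
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/>/g, "&gt;");
}

// Hypothetical variant without the ("" + s) step.
function entifyNoStringify(s) {
  return s.replace(/&/g, "&amp;");
}

const escaped = entify('a<b & "c">');  // every metacharacter becomes an entity
const fromBoolean = entify(true);      // "true"
const fromNumber = entify(42);         // "42"

let error = null;
try {
  entifyNoStringify(42);               // a number has no replace() method
} catch (e) {
  error = e;                           // a TypeError
}
console.log(escaped, fromBoolean, fromNumber, error instanceof TypeError);
```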

• The first replace( ) operation returns a string that is acted on by the second replace( ) operation, which returns a string that is acted on by the third replace( ) operation, etc. - daisy-chaining String commands in this way is very much analogous to forming a document.objectA.objectB.property expression. We saw back in Blog Entry #3 that JavaScript's 'dot operator' can be followed by a line break without throwing an error.

&, <, ", and > are not regular expression metacharacters and do not need to be literalized with backslashes for their respective replace( ) operations as was necessary for the conversion of plus signs to spaces by the param.js param( ) function.

• Are the replace( ) operations really necessary? You definitely need the replace(/</g, "&lt;") operation to see a literal < because an unescaped < is always seen as a start-tag open delimiter by the browser (at least this is my experience). In contrast, the browser will interpret &, ", and > as literal characters if it finds them in non-metacharacter contexts; the &/"/> replace( )ments are therefore less crucial - they're a "let's be on the safe side" thing, shall we say, for isolated cases that might require them.

• Is the ("" + s) stringification of non-string s values really necessary? Not if your query string is generated via an HTTP get transaction: across the board, all HTML control values have a string data type. But if your query string is created from scratch and contains =value-less names, and if you assign boolean or number values to those names on the parsing side (recall the else q[name][q[name].length] = true; clause in the param.js ptq( ) function, e.g.), then yeah, you're gonna need it.

Getting back to our example, the entify( ) function has no effect on
(a) Browser, the lone k value for the for...in loop, or on
(b-d) Internet Explorer, Firefox, or Opera, the q[k][i] values of the nested for loop;
these values are all strings and don't contain any &/</"/> characters, and thus they are returned unchanged to the d.write( ) commands that load them into dt or dd elements as shown above.

A curious note concerning the implementation of the param.js/entify.js scripts appears at the top of the tutorial's third page:
Note: The script probably won't work on your local computer if you have a recent browser. Due to cross-site scripting vulnerabilities some browsers won't load the files from myjs.us when they're used in a local file, due to the possibility of malicious scripts accessing local files.
Why would anyone use the param.js/entify.js scripts at the myjs.us site rather than local copies of them? Anyway, upon copying the tutorial code in its entirety to my desktop, I find that it works very nicely with all of the regular expression-supporting browsers on my computer.

Another example

In the tutorial's "Introduction" section, the author states:
Using a JavaScript query string parser is one of my favorite techniques. I've used it for online photo galleries, tracking coupon codes, writing online order forms, creating some of the original browser-based Rich Text editors and many other things.
There's nothing in the tutorial that relates to any of these applications, I'm sorry to report, but the tutorial does at least conclude with a "JavaScript Form Processor" example that, all at the same page,
(a) solicits a first name from the user,
(b) puts the user's input in a query string, and then
(c) extracts the user's input from the query string to print out a greeting to the user.
The example's demo page is here and its code is given below:

<title>JavaScript Form Processor</title>
<script src="http://myjs.us/param.js"></script>
<script>
q = param( );
firstname = q.name1;
document.write("<h1>Hello ", firstname || "Anonymous", "!</h1>");
</script>
<form>
Type your first name: <input type="text" name="name1" />
<input type="submit" value="Submit" />
</form>


So, we click the JavaScript Form Processor link in the tutorial's "Another Example" section to go to the http://www.codelib.net/home/jkm/2008/189.html demo page, at which we are greeted by a Hello Anonymous! h1 heading (more on this below). We type a first name - say, Joe - into the name1 field and click the button.

The controls' containing form does not have an action attribute, even though the HTML 4.01 DTD designates action as #REQUIRED, i.e., it has no default value; the W3C caveats, User agent behavior for a[n action] value other than an HTTP URI is undefined. What happens? In practice, the missing action URL 'resolves' to that of the current document, and the browser requests http://www.codelib.net/home/jkm/2008/189.html?name1=Joe via an HTTP get transaction. The demo page reloads and we now see http://www.codelib.net/home/jkm/2008/189.html?name1=Joe in the browser window's address bar.

Next, the param( ) and ptq( ) param.js functions process the ?name1=Joe location.search value to give a one-element array:

q["name1"][0] = "Joe";

The firstname = q.name1; assignment gives the q["name1"] array a firstname identifier (q["name1"] and q.name1 are equivalent expressions). Finally, the

document.write("<h1>Hello ", firstname || "Anonymous", "!</h1>");

line concatenates <h1>Hello , q's Joe value, and !</h1> to give a Hello Joe! h1 heading and writes it to the page.

firstname is not a string but an object reference for an array; its use in the document.write( ) command entails a call to firstname.toString( ), which returns the string Joe.

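• The || logical operator [r]eturns [its first operand] if it can be converted to true; otherwise, returns [its second operand]. Here the first operand is firstname itself, i.e., the q["name1"] array, and any object, arrays included, converts to true in a Boolean context; || therefore returns firstname, which is stringified to Joe as described above, and we see Joe in the h1 greeting.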

When we first access the demo page, the page URL doesn't have a query string; as a result, location.search returns an empty string, which is processed by the param( ) and ptq( ) functions to give a q[""][0] = true; array. In this case, q.name1 (firstname) is undefined, a primitive value that converts to false in a Boolean context, and therefore we see Anonymous in the h1 greeting.
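A small sketch of the greeting logic follows, assuming a hypothetical greeting( ) helper that returns the string the demo page would write. The q objects are written by hand to match what the text says param( ) returns. The third case, an empty name1 field (?name1=), is my own extrapolation from ptq( )'s t.length > 1 branch: q.name1 would be [""], which is an array and thus truthy, so || would not fall back to Anonymous.

```javascript
// Hypothetical stand-in for the demo's document.write() line.
function greeting(q) {
  const firstname = q.name1;  // an array such as ["Joe"], or undefined
  // || tests firstname itself: any array (even [""]) is truthy, while
  // undefined is falsy and lets "Anonymous" through. The + operator then
  // stringifies the array via its toString() method.
  return "<h1>Hello " + (firstname || "Anonymous") + "!</h1>";
}

const named = greeting({ name1: ["Joe"] });    // query string ?name1=Joe
const noQuery = greeting({ "": [true] });      // no query string at all
const blankField = greeting({ name1: [""] });  // assumed result for ?name1=
console.log(named, noQuery, blankField);
```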

The astute reader will have noticed that the example code imports the param.js script but not the entify.js script. Per the above discussion of entify.js's entify( ) function, I find that a Jo&e, Jo"e, or Jo>e input into the name1 field can be incorporated into the h1 greeting without incident, but a Jo<e input gives a Hello Jo heading, i.e., the browser sees the < as the open delimiter of a start-tag - a start-tag that is not completed and thus the <e is cut off - and not as a literal character. (BTW, an <em>Joe</em> input will give an italicized Joe in the greeting.)
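If the example page also imported entify.js, the user's input could be escaped before it is written. The sketch below assumes a hypothetical safeGreeting( ) helper built on a copy of the entify( ) function listed earlier in this entry; with it, the Jo<e and <em>Joe</em> inputs come out as literal text.

```javascript
function entify(s) {
  return ("" + s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/>/g, "&gt;");
}

// Hypothetical variant of the greeting that escapes the input first.
function safeGreeting(name) {
  return "<h1>Hello " + entify(name) + "!</h1>";
}

const lessThan = safeGreeting("Jo<e");        // "<" becomes "&lt;"
const markup = safeGreeting("<em>Joe</em>");  // tags become literal text
console.log(lessThan, markup);
```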

There are some other remarks I could make about the tutorial (e.g., regarding the tutorial's "URL Encoding" section, no, it is not a JavaScript "bug" that different interfaces are required to deal with different types of form control objects) but I think I've had my say at this point. We'll get back to our "here's where we're at in the Beyond HTML : JavaScript sector, and here's where we're going" discussion in the following entry.

reptile7

Wednesday, November 03, 2010
 
All in the Query String Family
Blog Entry #195

I've had a change of heart about moving on from the 'carrying values across Web pages' topic. Besides "A Quick Tutorial on JavaScript Variable Passing", HTML Goodies' Beyond HTML : JavaScript sector features a related, three-page "How to Use a JavaScript Query String Parser" tutorial, authored by Joseph K. Myers and originally appearing at WebReference.com, that I thought we might discuss today.

The "How to Use a JavaScript Query String Parser" tutorial's second page links to a param.js script that, not unlike the "Passing Data Between Pages via URLs" code detailed in the previous post, extracts and objectizes the data in a query string. The tutorial's third page provides a script that prints out the param.js output in the form of a dl (definition list) element. The latter script also calls on an entify.js script that escapes any HTML metacharacters in the data before they are written to the page. No deconstruction is offered for any of this code.

The tutorial's "Format of the query string" section gives the structure of a "basic" query string (the string below appears in the WebReference.com version of the tutorial but not in the HTML Goodies version)

name1=Joseph&name2=Myers&website=http://www.codelib.net/

and notes the connection between a query string and an associative array - so far, so good. The section then unloads on the reader a series of "not standard" query strings without discussing the HTML contexts that might give rise to those strings:

(1) A given name can have more than one value:
website=http://www.codelib.net/&website=http://math111.com/
(2) A boolean query string name can be specified in 'minimized' form:
website=http://www.codelib.net/&secure
(3) Query string name/value pairs are sometimes delimited with semicolons and not ampersands:
website=http://www.codelib.net;secure

Actually, the first of these string types is as standard as standard can be. There are two HTML form situations that produce such a string:
(1) If a user checks two or more checkboxes in a group of checkboxes that have the same name; and
(2) If a user selects two or more options from a <select multiple> menu.

Playing the devil's advocate, I guess you could also give a common name to two or more text inputs (for that matter, there's nothing stopping you from giving the same name to every control, including the submit button) in a form, but this strikes me as really bad coding practice.

As for the other two string types, it's true that they shouldn't be generated from an HTML form - the W3C declares, Forms submitted with the [application/x-www-form-urlencoded] content type must be encoded [in the name=value&name=value... format] (emphasis added) - but with respect to cobbling these strings together from scratch, the query, pchar, unreserved, and sub-delims ABNF productions in RFC 3986, the current "URI: Generic Syntax" specification, make clear that there's nothing invalid about them, either:
query		= *( pchar / "/" / "?" )
pchar		= unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved	= ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims	= "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Moreover, the HTML 4.01 Specification's "Ampersands in URI attribute values" section notes that a query string can be appended to the URL href value of an anchor element and recognizes the use of a semicolon to delimit the name/value pairs of such a string.

<a href="http://hostname/[pathname/]filename.html?x=1;y=2">Link text</a>

For an example of an =value-less query string, again as part of a href URL, see the "Tracking" section of Wikipedia's "Query string" entry.
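To make the ABNF argument concrete, here is a hand translation of the query production into a JavaScript regular expression. It is a sketch: ALPHA and DIGIT are taken to be the ASCII letters and digits, and pct-encoded to be % followed by two hex digits. The multi-valued, =value-less, and semicolon-delimited strings above all pass, whereas a raw space or a malformed % escape does not.

```javascript
// Zero or more of: unreserved, sub-delims, ":", "@", "/", "?",
// or a %XX pct-encoded triplet.
const QUERY = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

function isValidQuery(s) {
  return QUERY.test(s);
}

const samples = [
  "website=http://www.codelib.net/&website=http://math111.com/",
  "website=http://www.codelib.net/&secure",
  "website=http://www.codelib.net;secure",
  "x=1;y=2",
];
console.log(samples.map(isValidQuery));
console.log(isValidQuery("name=Joe Myers")); // a raw space is not allowed
```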

Example/deconstruction

Let's suppose we're at http://www.whatever.com/page1.html, which contains the following form:

<form action="page2.html">
Which browser(s) does your family use to surf the Web? (Check all that apply.)<br>
<input type="checkbox" name="Browser" value="Internet Explorer"> Internet Explorer<br>
<input type="checkbox" name="Browser" value="Firefox"> Firefox<br>
<input type="checkbox" name="Browser" value="Safari"> Safari<br>
<input type="checkbox" name="Browser" value="Opera"> Opera<br>
<input type="checkbox" name="Browser" value="Chrome"> Chrome<br>
<button>Submit</button>
</form>


Which browser(s) does your family use to surf the Web? (Check all that apply.)
Internet Explorer
Firefox
Safari
Opera
Chrome


Let's say your better half surfs with Internet Explorer, your young 'un is a Firefox purist, and you yourself are an Opera user. You check the appropriate checkboxes and click the Submit button, which takes you to the page2.html action page, at which you see the following URL in the browser window's address bar:

http://www.whatever.com/page2.html?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera
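For comparison, current browsers and Node expose a URLSearchParams interface (from the WHATWG URL Standard, which the tutorial predates) that serializes name/value pairs with the same application/x-www-form-urlencoded rules. The sketch below rebuilds the query string above from the three checked boxes, on the assumption that the pairs are listed in document order, and then reads the values back with getAll( ).

```javascript
// The checked boxes, in document order, as name/value pairs.
const checked = [
  ["Browser", "Internet Explorer"],
  ["Browser", "Firefox"],
  ["Browser", "Opera"],
];

// Spaces become "+", and a repeated name stays as separate pairs.
const query = new URLSearchParams(checked).toString();
const url = "http://www.whatever.com/page2.html?" + query;

// Parsing turns "+" back into spaces and collects every value for a name.
const values = new URLSearchParams(query).getAll("Browser");
console.log(url, values);
```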

The page2.html source holds the code in the first textarea field of the tutorial's third page:

<!-- Import the query-string parser function. -->
<script type="text/javascript" src="http://myjs.us/param.js"></script>
<!-- Import the script which converts unsafe characters to HTML entities. -->
<script type="text/javascript" src="http://myjs.us/entify.js"></script>
<script type="text/javascript">
d = document;
/* Obtain query string parameters. */
q = param( );
...


We first import the param.js and entify.js scripts; the tutorial does not provide a link to the entify.js script but the http://myjs.us/entify.js URL is live as of this writing. We give the document object a d identifier and then call the param( ) function in the param.js script.
function param( ) {
	return ptq(location.search.substring(1).replace(/\+/g, " ")); }
The param( ) function in turn calls the ptq( ) function that makes up the rest of the param.js script, and passes thereto the post-? query string in which any plus signs have been replaced by spaces. (N.B. The literalizing backslash of the /\+/g regular expression in the param.js textarea field at the beginning of the tutorial's "Code" section is missing in the HTML Goodies version of the tutorial but is properly present in the WebReference.com version.) In our case, Browser=Internet Explorer&Browser=Firefox&Browser=Opera is passed to ptq( ), which gives it a q identifier.
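The sketch below isolates param( )'s preprocessing step. preprocess( ) is a hypothetical name, and the search string is passed in explicitly because location.search exists only in a browser. It also shows why the backslash matters: without it the pattern is /+/g, and a lone + quantifier has nothing to repeat, so the regular expression is a syntax error. A /+/g literal would stop the whole script at parse time, which is why the sketch builds it with the RegExp constructor inside try...catch.

```javascript
// Hypothetical stand-in for param()'s preprocessing.
function preprocess(search) {
  // Drop the leading "?" and turn form-encoded "+" signs back into spaces.
  return search.substring(1).replace(/\+/g, " ");
}

const prepared = preprocess("?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera");

// A literal "+" typed by the user arrives as %2B, so it survives the
// replace() and is restored by the later unescape() call.
const encodedPlus = unescape(preprocess("?lang=C%2B%2B").split("=")[1]);

// Without the backslash, "+" is a quantifier with nothing to repeat.
let unescapedPlusError = null;
try {
  new RegExp("+", "g");
} catch (e) {
  unescapedPlusError = e; // a SyntaxError
}
console.log(prepared, encodedPlus, unescapedPlusError instanceof SyntaxError);
```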
function ptq(q) {
	/* Parse the query. */
	/* Semicolons are nonstandard but we accept them. */
	var x = q.replace(/;/g, "&").split("&"), i, name, t;
	... }
The var x = q.replace(/;/g, "&").split("&") operation
(1) replaces any semicolon name/value pair delimiters in q with ampersands (unnecessary for our q), and then
(2) splits the resulting string at the & separator to give an array of name/value pairs, which is assigned to an x variable, as though we had written:

var x = new Array( );
x[0] = "Browser=Internet Explorer";
x[1] = "Browser=Firefox";
x[2] = "Browser=Opera";
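Both steps of the var x = ... operation can be tried directly. This sketch runs them on our q and also on the semicolon-delimited string from the "Format of the query string" section, where the semicolon-to-ampersand replacement actually does something.

```javascript
const q = "Browser=Internet Explorer&Browser=Firefox&Browser=Opera";

// Normalize any ";" delimiters to "&", then break into name=value pairs.
const x = q.replace(/;/g, "&").split("&");

// The semicolon-delimited form ends up in the same shape.
const y = "website=http://www.codelib.net;secure".replace(/;/g, "&").split("&");
console.log(x, y);
```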


We next recast the x array as the following q two-dimensional array

q["Browser"][0] = "Internet Explorer";
q["Browser"][1] = "Firefox";
q["Browser"][2] = "Opera";


via a three-iteration for loop. Note that the loop's initial-expression resets q to an empty object using the object initializer syntax.

/* q changes from string version of query to object. */
for (q={ }, i = 0; i < x.length; i++) { ... }


Here's what happens in the first loop iteration:

(1) t = x[i].split('=', 2);
This statement splits x[0], Browser=Internet Explorer, at the = separator to generate a t[0] = "Browser"; t[1] = "Internet Explorer" array.

Somewhat oddly, a 2 limit parameter has been supplied for the split( ) operation; this ensures that the t array will have at most two elements. More specifically, if x[i] had an a=b=c form - e.g., website=http://www.codelib.net/index.html?name1=Joseph - then t = x[i].split('=', 2); would give a t[0] = "a"; t[1] = "b" array and discard the data after the second = separator - see this Mozilla example. We could create an a=b=c name/value pair from scratch but not via a form, for which the browser will percent-encode any = characters in the data.

(2) name = unescape(t[0]);
This statement gives t[0], Browser, a name identifier. Browser does not need to be unescaped and thus the unescape( ) function has no effect.

(3) if (!q[name]) q[name] = [ ];
This conditional checks if q["Browser"] doesn't exist - true for this iteration but false for the second and third iterations - and, if so, sets q["Browser"] to an empty array using the array literal syntax.

(4) if (t.length > 1) q[name][q[name].length] = unescape(t[1]);
This conditional checks if t.length is greater than 1 - true (t.length is 2) for all three iterations - and, if so, assigns t[1], Internet Explorer, to q["Browser"][0]; Internet Explorer does not need to be unescaped and thus the unescape( ) function has no effect. In addition, this assignment effectively increments q[name].length from 0 to 1.
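Returning to step (1)'s limit parameter: the sketch below shows the truncation on the a=b=c example, alongside a hypothetical splitFirst( ) helper (my own alternative, not part of param.js) that splits only at the first = and so keeps the whole value. Like split( ), it returns a one-element array for an =value-less name, so a t.length > 1 test would still work.

```javascript
const pair = "website=http://www.codelib.net/index.html?name1=Joseph";

// With a limit of 2, everything after the second "=" is discarded.
const limited = pair.split("=", 2);

// Hypothetical alternative: split only at the first "=".
function splitFirst(s) {
  const at = s.indexOf("=");
  return at === -1 ? [s] : [s.slice(0, at), s.slice(at + 1)];
}

const whole = splitFirst(pair);
const bare = splitFirst("secure");  // =value-less name
console.log(limited, whole, bare);
```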

In the second iteration, x[1], Browser=Firefox, is split( ) to give a t[0] = "Browser"; t[1] = "Firefox" array and Firefox is assigned to q["Browser"][1]. In the third iteration, x[2], Browser=Opera, is split( ) to give a t[0] = "Browser"; t[1] = "Opera" array and Opera is assigned to q["Browser"][2].

Had our original query string contained any =value-less names (for which t.length would be 1), the ptq( ) function's else q[name][q[name].length] = true; clause would have set those names to true.

Finally, q is returned first by ptq( ) to the param( ) function and then by param( ) to the page2.html q = param( ); statement, which gives the return a q identifier (ptq( )'s q is a local variable and is not available outside ptq( )).
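Pulling the quoted fragments together gives the runnable reassembly below. It is a sketch, not a copy of the param.js file: the statements are the ones quoted in this entry, but their layout may differ from the original, and param( ) takes the search string as an argument (an assumption) because location.search exists only in a browser.

```javascript
function ptq(q) {
  /* Parse the query. */
  /* Semicolons are nonstandard but we accept them. */
  var x = q.replace(/;/g, "&").split("&"), i, name, t;
  /* q changes from string version of query to object. */
  // Caveat kept from the original: q is a plain {}, so a name that matches
  // an inherited property (e.g. toString) would not get its own array.
  for (q = {}, i = 0; i < x.length; i++) {
    t = x[i].split("=", 2);
    name = unescape(t[0]);
    if (!q[name]) q[name] = [];
    if (t.length > 1) q[name][q[name].length] = unescape(t[1]);
    else q[name][q[name].length] = true;   // =value-less name
  }
  return q;
}

// search stands in for location.search
function param(search) {
  return ptq(search.substring(1).replace(/\+/g, " "));
}

const q1 = param("?Browser=Internet+Explorer&Browser=Firefox&Browser=Opera");
const q2 = param("?website=http://www.codelib.net;secure");
const q3 = param("");  // no query string, as on a first visit
console.log(q1, q2, q3);
```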

We are ready to write q to the page - we'll do that, and also discuss the tutorial's concluding "Another Example", in the following entry.

reptile7


Powered by Blogger

Actually, reptile7's JavaScript blog is powered by Café La Llave. ;-)