Thursday, March 01, 2007
Google It, Part 3
Blog Entry #68
We continue today with our discussion of HTML Goodies' JavaScript Script Tips #42-44 and their multiple search engine script. Having detailed the inner workings of the Script Tips #42-44 Script in our last session, we are ready to ask, "And just how does it all play out in practice?"
[Demo: the Script Tips #42-44 search form, with Yahoo!, AltaVista, WebCrawler, Excite, and Lycos checkboxes]
As indicated, the Script Tips #42-44 Script searches the Yahoo!, AltaVista, WebCrawler, Excite, and/or Lycos search engines per the user's checkbox choices. Later in the post, we'll extend the script to other search engines; indeed, Joe notes in Script Tip #42 that the script in its original form searched up to 10 search engines.
In trial runs with the Script Tips #42-44 Script, I find that
(1) Yahoo! and WebCrawler searches work OK, but
(2) AltaVista, Excite, and Lycos searches do not.
An attempted AltaVista search first leads to a "Yahoo! - Incorrect URL" page (Wikipedia notes here that Yahoo! has owned AltaVista since March 2004); 60 seconds later, I am routed to http://www.altavista.com/*.
An attempted Excite search leads directly to http://www.excite.com/*.
An attempted Lycos search does lead to a page whose URL is http://www.lycos.com/cgi-bin/pursuit?query=search+term* but that is otherwise indistinguishable from http://www.lycos.com/.
(*To determine this information, I added a location feature to the third, windowFeatures parameter of the window.open( ) commands of the AltaVista, Excite, and Lycos if statements in the search( ) function, e.g.:
wind=window.open(search2,"newwindow2","location,width=700,height=200,scrollbars=yes");.)
Paralleling the Primer #18 Script and its faulty FullSearchUrl variable, the problem here lies in the partial URLs assigned to the value attributes of the checkbox input elements. The Script Tips #42-44 Script (like the rest of the HTML Goodies JavaScript Primers and Script Tips material) was written in the late 1990s, so it's not such a surprise that the AltaVista, Excite, and Lycos partial URLs are now outdated; perhaps what's surprising is that the Yahoo! and WebCrawler partial URLs are still functional.
And where do these partial URLs come from, anyway? In the "Checkboxes" section of Script Tip #42, Joe alleges,
The code is pretty easy to find. You just go to the search page of the engine and look at the source code. I'm sure that's where the author got it.
In response, I wonder, "Did Joe actually try this?" To be sure, a search engine's source code usually provides clues as to what the form of the search-results URL for that search engine will be, and sometimes you can put two and two together and come up with a functional URL. For example, consider the following form used by Lycos searchers:
<!--Line-breaks have been added for clarity.-->
<form method="GET" action="http://search.lycos.com" class="hpFrm" name="searchform">
<strong>SEARCH:</strong>
&#160;
<!--For user input:-->
<input type="text" style="width:450px; vertical-align:middle;" name="query" value="" id="hpSrchValue" />
&#160;
<!--The "GO GET IT!" submit button is coded by:-->
<input type=image src="http://ly.lygo.com/ly/hp/ggiBut.gif" style="vertical-align:middle;" width="86px" height="24px" border="0" />
&#160;
<!--The "powered by Ask™.com" image is coded by:-->
<img src="http://ly.lygo.com/ly/srch/ask/askSrch.gif" width="40" height="32" style="vertical-align:top;" />
</form>
<!--&#160; is a numeric character reference for a non-breaking space.-->
The user types a search term into the query text box and then clicks the "GO GET IT!" submit button (the <input type="image"> element is discussed here in the HTML 4.01 Specification), whereupon the searchform form is sent off to http://search.lycos.com via the HTTP GET method. Before the fact, then, we would expect the URL for a page of Lycos search results to be a variant of http://search.lycos.com?query=search+term, in which a query=search+term query string is appended to the search.lycos.com hostname. With respect to the Script Tips #42-44 Script, we would similarly expect http://search.lycos.com?query= to be a workable partial-URL value for the Lycos checkbox. Both expectations prove correct, more or less:
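To make the GET-method expectation concrete, here is a minimal sketch of what such a form submission produces: the action URL plus an urlencoded query string built from the named fields. It uses the standard URL and URLSearchParams interfaces of modern browsers and Node.js (not available in the browsers tested in this post), and formGetUrl is a hypothetical helper name, not anything in Lycos's page. Note that the URL parser supplies a slash after the hostname, the detail that matters for MSIE 5.1.6 in (B)(ii).

```javascript
// Sketch: the URL a method="GET" form submission would produce.
// formGetUrl is a hypothetical helper, not part of any library.
function formGetUrl(action, fields) {
  // Parsing the action adds "/" as the path when none is given.
  const url = new URL(action);
  // URLSearchParams serializes as application/x-www-form-urlencoded,
  // so the space in "Gila monster" becomes "+".
  url.search = new URLSearchParams(fields).toString();
  return url.href;
}

console.log(formGetUrl("http://search.lycos.com", { query: "Gila monster" }));
// → http://search.lycos.com/?query=Gila+monster
```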
(A) An actual Lycos search using Gila monster as a search term returns a "Lycos Search Results: results for Gila monster" page whose URL is http://search.lycos.com/?query=Gila+monster&x=27&y=8 (you will probably get different values of x and y in the query string if you try this yourself).
(B) When I enter http://search.lycos.com?query=Gila+monster in the browser window's address bar field and hit the return key, the results are browser-dependent:
(i) Netscape 7.02 takes me to the same page as for (A) above - more specifically, it's the same page contentwise but the URL is http://search.lycos.com/?query=Gila+monster;
(ii) MSIE 5.1.6 takes me to a "The web site address you entered could not be found" page; however, if a slash is inserted after the search.lycos.com hostname, then an http://search.lycos.com/?query=Gila+monster input gives a successful search (à la (B)(i)).
I've used Lycos as an example here because the searchform form's action attribute value can be used 'as is' in piecing together a "Lycos Search Results" URL; this is not the case for the corresponding forms in the Yahoo!, AltaVista, and WebCrawler sources, and good luck fishing anything meaningful out of Excite's insanely complicated source. I nonetheless encourage you, as an exercise, to rummage through AltaVista's source and see if you can craft a working "AltaVista Search" URL; then check your URL's validity via your browser's address bar.
Looking over those checkbox value attribute values, however, I'd bet the farm (or at least a venti Frappuccino) that the Script Tips #42-44 Script's author ("Mujib" or whoever it was) obtained them not from anyone's source code but via the approach used by Joe in his answer to the Primer #18 Assignment:
All I did was go to a search engine, do a search and copy the address from the Location bar. Ta Da!
Consider the AltaVista checkbox value:
value="http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q="
I am highly skeptical that the pg=q, what=web, and fmt=. name=value pairs in the query string above were obtained from AltaVista's source way back when. In any case, we can easily apply (or reapply) Primer #18's do-a-search-and-grab-the-URL method to the Script Tips #42-44 Script and thus bring it up to date:
AltaVista
An actual search gives:
http://www.altavista.com/web/results?itag=ody&q=Gila+monster&kgs=1&kls=0
In the Script Tips #42-44 Script, I find that I can get away with using:
value="http://www.altavista.com/web/results?q="
(FYI: the itag=ody name=value pair is for an <input type="hidden"> element, whereas the kgs=1 and kls=0 name=value pairs are for http://www.altavista.com/'s "SEARCH: USA" and "RESULTS IN: All languages" radio buttons, respectively. Might some users need these name=value pairs to be in the partial URL? Perhaps - I don't know.)
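The trimming done by hand above (keep the results path and the search-term parameter, drop everything else) can be sketched with the standard URL interface. partialUrlFrom is a hypothetical helper; you still have to identify the search-term parameter name (q for AltaVista) yourself, and the result is only a candidate partial URL that should be tried out in the script, as was done above.

```javascript
// Sketch: turn a results URL copied from the address bar into a partial
// URL ending in "name=". partialUrlFrom is hypothetical; termParam is
// supplied by the reader after inspecting the URL, not guessed.
function partialUrlFrom(resultsUrl, termParam) {
  const url = new URL(resultsUrl);
  if (!url.searchParams.has(termParam)) {
    throw new Error(`No "${termParam}" parameter in ${resultsUrl}`);
  }
  // origin = scheme + host; pathname = "/web/results" for AltaVista.
  // All other name=value pairs (itag, kgs, kls) are dropped.
  return `${url.origin}${url.pathname}?${termParam}=`;
}

console.log(partialUrlFrom(
  "http://www.altavista.com/web/results?itag=ody&q=Gila+monster&kgs=1&kls=0",
  "q"
));
// → http://www.altavista.com/web/results?q=
```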
Excite
An actual search gives:
http://msxml.excite.com/info.xcite/search/web/Gila%2Bmonster
In the Script Tips #42-44 Script, you can use:
value="http://msxml.excite.com/info.xcite/search/web/"
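Unlike the other engines, Excite carries the search term in the URL path rather than in a query string, and the %2B in the observed URL is a percent-encoded "+". The sketch below reproduces that exact URL under an assumption: that spaces are first turned into "+" and the result is then percent-encoded with encodeURIComponent. exciteUrl is a hypothetical helper, and whether Excite also accepts other encodings of the term has not been checked here.

```javascript
// Sketch: rebuild the path-style Excite URL seen above.
// Assumption: spaces -> "+", then the segment is percent-encoded,
// which turns "+" into "%2B". exciteUrl is a hypothetical helper.
function exciteUrl(term) {
  const partial = "http://msxml.excite.com/info.xcite/search/web/";
  const segment = encodeURIComponent(term.trim().replace(/\s+/g, "+"));
  return partial + segment;
}

console.log(exciteUrl("Gila monster"));
// → http://msxml.excite.com/info.xcite/search/web/Gila%2Bmonster
```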
Lycos
Results for Lycos are given above, but let's repeat them anyway.
An actual search gives:
http://search.lycos.com/?query=Gila+monster&x=27&y=8
In the Script Tips #42-44 Script, you can use:
value="http://search.lycos.com/?query="
(I have no idea what the x=27 and y=8 name=value pairs are for, but they are evidently unnecessary vis-à-vis the partial URL.)
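One quick way to confirm that dropping x and y loses nothing is to compare the query parameter of the actual results URL with that of the partial URL plus the search term, using the standard URL interface. As for x and y themselves, one plausible explanation (an assumption, not verified against Lycos) is that they are the click coordinates of the unnamed <input type=image> "GO GET IT!" button: an image submit button with no name attribute submits its click position as x and y, which would also account for the varying values mentioned in (A).

```javascript
// Sketch: does the partial URL + search term carry the same term as an
// actual Lycos results URL? Assumption (unverified): x and y are the
// click coordinates of the unnamed image submit button.
const actual = new URL("http://search.lycos.com/?query=Gila+monster&x=27&y=8");
const built = new URL("http://search.lycos.com/?query=" + "Gila+monster");

// searchParams decodes "+" as a space, as in form submissions.
console.log(actual.searchParams.get("query")); // Gila monster
console.log(built.searchParams.get("query") === actual.searchParams.get("query")); // true
console.log([...actual.searchParams.keys()].join(", ")); // query, x, y
```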
Other search engines
In Blog Entry #34, we searched Google using an http://www.google.com/search?q= partial URL. What other search engines might we want to search?
According to the Nielsen NetRatings Search Engine Ratings and the comScore Media Metrix Search Engine Ratings provided by the Reviews, Ratings & Tests section of Search Engine Watch, the top five search engines as of July 2006 were:
1. Google
2. Yahoo!
3. MSN
4. AOL
5. Ask
To add, say, Ask.com to the Script Tips #42-44 Script's stable of search engines, we can augment the script with the following blocks of code:
(1) In the searching form:
<input type="checkbox" name="ask" value="http://www.ask.com/web?q=" />Ask.com<br />
<!--The value value was obtained by the Primer #18 method.-->
(2) In the search( ) function:
if (document.searching.ask.checked) {
    search6 = document.searching.ask.value;
    search6 += key;
    window.open(search6,"newwindow6","width=700,height=200,scrollbars,location,resizable");
}
// It always annoys me when I can't resize a window.
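Since every engine gets its own nearly identical if block, the search( ) function grows by one block per engine. As a design alternative (not the Script Tips #42-44 author's approach), the per-engine logic can be collapsed into one loop over the checkboxes. In the minimal sketch below, buildSearchUrls is a hypothetical helper; each engine object mimics a checkbox element's checked and value properties, and key is assumed to be the search term already in the form the engines expect (spaces as "+"). The arrow functions and array methods require a modern browser.

```javascript
// Hypothetical sketch: one loop instead of one if block per engine.
// Each engine object mimics a checkbox: `checked` and `value` (the
// partial URL). `key` is assumed to be the prepared search term.
function buildSearchUrls(engines, key) {
  return engines
    .filter((engine) => engine.checked)   // keep only the ticked boxes
    .map((engine) => engine.value + key); // partial URL + search term
}

const engines = [
  { name: "ask", checked: true, value: "http://www.ask.com/web?q=" },
  { name: "lycos", checked: false, value: "http://search.lycos.com/?query=" },
  { name: "altavista", checked: true, value: "http://www.altavista.com/web/results?q=" },
];

const urls = buildSearchUrls(engines, "Gila+monster");
urls.forEach((url) => console.log(url));
// http://www.ask.com/web?q=Gila+monster
// http://www.altavista.com/web/results?q=Gila+monster
```

In a page, the engines list could come straight from the form, e.g. Array.from(document.searching.querySelectorAll('input[type="checkbox"]')), because checkbox elements expose checked and value; each resulting URL would then be handed to window.open with its own window name, as the original blocks do with "newwindow6" and the like. Adding an engine then means adding only a checkbox.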
Your Assignment
Add MSN Search (which is now Live Search) and AOL Search to the Script Tips #42-44 Script. For each search engine,
(a) carry out a normal search and
(b) determine a suitable searching checkbox value; then,
(c) write out and
(d) insert into the script
blocks of code corresponding to those for Ask.com above. Finally,
(e) test your code by carrying out MSN/Live Search and AOL Search searches using the Script Tips #42-44 Script.
We'll switch gears in the next entry and check over a script, spanning Script Tips #45-48, that proportionally scales the width and height of an image.
reptile7
Actually, reptile7's JavaScript blog is powered by Café La Llave. ;-)