A Primer on Internet Content
<html>
  <head>
    <title>Welcome Page</title>
  </head>
  <body>
    <p>
      <center>Hello there</center>
    </p>
  </body>
</html>
Figure 1 - Web Page HTML
The browser is an application that can do a few things essential to providing
access to Internet sites. It can navigate to a Uniform Resource Locator
(URL) address. It can decipher an HTTP header. More importantly, it knows
how to use the graphical services of the computer on which it is running.
That is, it knows how to draw underlined text, create margins, and render subscripts.
So when it receives a transmission containing <sub>2</sub>, instead
of displaying the tags, it displays a subscripted "2."
With these abilities it can display a basic web page, such as the one coded in Figure 1.
Figure 2 shows an example of the formatting capabilities of HTML, but was it enough? For teachers, students and scientists, the original users, the answer was "yes." Then commerce hit the Internet and things were never going to be the same; it was a quick lesson that the suitability of a tool is only as fixed as its intended use!
Figure 2 - Formatting Capabilities of HTML
The requests for more capability came in fast and furious. "Tables are fine, but we need to be able to give cells background images." "How about changing the color of the text?" "Why can't tables be side by side?" "Text needs to be able to wrap around an image." "We need a way to show more than one page simultaneously within frames."
With such outspoken commercial demand, the browser makers hurried to respond. They had two viable options: take the long-term route of lobbying to get all the enhancements into the HTML standard, or introduce the enhancements as extensions beyond the standard as quickly as they could turn them out.
The former alternative might seem to be the logical choice since a standard would be maintained - but the Internet was now out of the hands of academia and in the hands of software vendors whose primary interest is market share. Enhancements buy market share, and out they came - a slew of them. Figure 3 details some of the additional capabilities of HTML following this explosion.
Figure 3 - Frames using enhanced HTML
There was a downside to this approach. Remember that HTML does nothing by itself; it is a method of encoding formatting requests to the browser, which carries them out. Since each browser developer was determining the enhancements for its own product, a competing browser would not necessarily recognize the HTML that requested those enhancements, nor be able to provide the formatting. By default, a browser ignores any tag it doesn't understand instead of raising an error.
For example, if a browser understands that <RBL> indicates subsequent text should be formatted as a list item with rectangular bullets, as seen in Figure 4a, another browser that does not understand that tag would format the same code as seen in Figure 4b.
Figure 4a - A Browser Supporting Rectangular Bullets
Item 1 Item 2 Item 3
Figure 4b - A Browser Not Supporting the Bullet List
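The "ignore what you don't understand" rule is easy to model. The following is a minimal Python sketch, not how any real browser is built: a toy renderer on top of the standard html.parser module, where ToyBrowser and render are hypothetical names and <RBL> is the hypothetical bullet tag from the example above. A renderer that knows the tag puts each item on its own bulleted line; one that doesn't skips the tag without complaint and runs the text together, much as in Figure 4b.

```python
from html.parser import HTMLParser

RECT_BULLET = "\u25a0"  # a filled square standing in for a rectangular bullet


class ToyBrowser(HTMLParser):
    """A toy renderer: formats the tags it knows and silently ignores the rest."""

    def __init__(self, known_tags):
        super().__init__()
        self.known_tags = set(known_tags)
        self.lines = [""]

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag not in self.known_tags:
            return  # unknown tag: no error, just carry on with the text
        if tag == "rbl":
            # Hypothetical <RBL> extension: start a new line with a bullet.
            self.lines.append(RECT_BULLET + " ")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self.lines[-1] and not self.lines[-1].endswith(" "):
            self.lines[-1] += " "
        self.lines[-1] += text

    def rendered(self):
        return "\n".join(line for line in self.lines if line)


def render(html, known_tags):
    browser = ToyBrowser(known_tags)
    browser.feed(html)
    browser.close()  # flush any text still buffered by the parser
    return browser.rendered()


page = "<RBL>Item 1<RBL>Item 2<RBL>Item 3"
print(render(page, {"rbl"}))  # supporting browser: one bulleted item per line
print(render(page, set()))    # non-supporting browser: Item 1 Item 2 Item 3
```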
A question, then, is how to create a web page that can be displayed the same way by either browser. There are two possibilities: avoid the HTML extensions altogether, or, at the point when the page is to be displayed, determine which browser is displaying it and send it HTML it understands. Neither solution is elegant, and the latter means maintaining multiple versions of each page, but those remain the choices. Luckily, the common core of functionality that every browser supports, apart from the diverging extensions, continues to grow.
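The second approach is commonly called browser sniffing: the server inspects the User-Agent header that each request carries and picks a version of the page. A minimal Python sketch follows; the file names are hypothetical, and the string tests are deliberately simplified (real User-Agent strings vary far more than this).

```python
import os

# Hypothetical file names for the per-browser versions of one page.
PAGE_VARIANTS = {
    "ie": "welcome_ie.htm",
    "netscape": "welcome_ns.htm",
    "plain": "welcome_plain.htm",
}


def choose_variant(user_agent):
    """Pick a page version from the browser's User-Agent string."""
    # Internet Explorer of this era announces itself as
    # "Mozilla/4.0 (compatible; MSIE 4.01; ...)", so the MSIE test
    # must come before the generic Mozilla test.
    if "MSIE" in user_agent:
        return PAGE_VARIANTS["ie"]
    if user_agent.startswith("Mozilla/"):
        return PAGE_VARIANTS["netscape"]
    return PAGE_VARIANTS["plain"]  # unknown browser: extension-free HTML


if __name__ == "__main__":
    # Under CGI, the server passes the User-Agent header to the
    # program in the HTTP_USER_AGENT environment variable.
    print(choose_variant(os.environ.get("HTTP_USER_AGENT", "")))
```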
There is another downside to extended HTML. Even when a tag is understood by more than one browser, each browser might implement it differently. Similarly, each browser might support the same kind of formatting but request it with a different tag or different tag options, such as one browser supporting a text color option and another a typeface option.
The largest complaints:
These issues have led to an explosion of technologies bolted onto HTML.
CGI (the Common Gateway Interface) is one of the most misunderstood areas of Internet content. CGI is not a programming language, and not a mysterious black hole. It's simply a gateway that allows a web page to transfer control outside the HTML environment. Application developers can think of it as a user exit.
To understand CGI, an understanding of the architecture of a web site is needed. When someone surfing the web enters a URL such as http://www.puddleduck.com, what happens? A web page appears, or a screen reporting a 404 error because the page doesn't exist, but what is happening under the covers? Following is a step-by-step look.
Figure 5 shows a typical directory tree for a web site. One of the directories is cgi-bin. It is in this directory that CGI binaries will reside. A CGI binary can be an application written in any language such as Java, C or C++. It can also be a script, typically written in PERL on UNIX systems, or PERL5 on PCs.
Figure 5 - Web Site Directory Tree
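The server's side of those steps can be sketched in a few lines of Python. This is a simplified, hypothetical model rather than any particular server's logic: the set of file names stands in for the directory tree, and the default document name index.htm is an assumption (servers let the administrator configure it).

```python
from urllib.parse import urlsplit

# Hypothetical files in a site's directory tree (compare Figure 5).
SITE_FILES = {
    "/index.htm",
    "/images/logo.gif",
    "/cgi-bin/my-app.exe",
}


def dispatch(url):
    """Return (action, path) describing how a server would handle the URL."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.htm"  # assumed default document for a directory
    if path not in SITE_FILES:
        return ("404", path)  # nothing there: the familiar "not found" error
    if path.startswith("/cgi-bin/"):
        return ("run", path)  # transfer control to the CGI program
    return ("send", path)  # static file: send its contents unchanged


for url in ("http://www.puddleduck.com",
            "http://www.puddleduck.com/cgi-bin/my-app.exe",
            "http://www.puddleduck.com/nothere.htm"):
    print(url, "->", dispatch(url))
```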
Why transfer control to an application? Typically to manipulate data. One use would be sorting through the mess transmitted when a user fills in a form and then submits it. A form can be coded to email the data the user entered, and you would think that the email would be meaningful and readable, but Figure 6 shows what you actually receive from, for example, a name and address form.
prefix=&fname=j.r.&lname=greenberg&street1=123 Any Street&street2=Apt.456&city=Atlanta&state=GA&zip=12345
Figure 6 - Data Posted From A Web Form
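Turning that string into something readable is mostly a matter of splitting on & and =, and undoing the URL encoding. Here is a minimal Python sketch using the standard urllib.parse module, assuming the field names shown in Figure 6. Note that on the wire, spaces in a form submission normally arrive encoded (for example as +) rather than as the literal spaces shown in the figure.

```python
from urllib.parse import parse_qsl

# A form body as it might actually arrive; compare Figure 6.
RAW = ("prefix=&fname=j.r.&lname=greenberg&street1=123+Any+Street"
       "&street2=Apt.456&city=Atlanta&state=GA&zip=12345")


def parse_form(body):
    """Decode a form submission into a {field: value} dict."""
    # keep_blank_values=True preserves empty fields such as "prefix".
    return dict(parse_qsl(body, keep_blank_values=True))


def format_address(fields):
    """Lay the fields out the way a person expects to read an address."""
    name_parts = (fields.get(k, "") for k in ("prefix", "fname", "lname"))
    name = " ".join(part for part in name_parts if part).title()
    lines = [name, fields.get("street1", "")]
    if fields.get("street2"):
        lines.append(fields["street2"])
    lines.append(f'{fields.get("city", "")}, {fields.get("state", "")} {fields.get("zip", "")}')
    return "\n".join(lines)


print(format_address(parse_form(RAW)))
```

The capitalization step is deliberately crude: str.title would, for instance, turn McDonald into Mcdonald, so a real application would need a gentler rule.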
If coded to do so, the submission invokes an application and makes the data from the form available to it. Invoking a CGI application is as simple as naming it in the ACTION attribute of a form tag, like the one below.
<FORM ACTION="/cgi-bin/my-app.exe" METHOD="POST">
This tag will invoke the application my-app.exe and pass it the data from the form. The response from the CGI application is fairly simple as well. The application writes the HTTP header entries to its output stream, followed by a blank line and then the equivalent of an HTML file, which results in a page being displayed to the user (Figure 7).
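A minimal CGI program along those lines, sketched in Python (any language will do, as noted earlier). It assumes a POST from a form with an fname field, as in Figure 6. CONTENT_LENGTH is the standard CGI variable giving the size of the posted data; the wording of the response page is made up for illustration.

```python
import html
import os
import sys
from urllib.parse import parse_qsl


def cgi_response(form_body):
    """Build the complete CGI output: header lines, a blank line, then HTML."""
    fields = dict(parse_qsl(form_body, keep_blank_values=True))
    # Escape user input so it cannot inject markup into the page.
    name = html.escape(fields.get("fname") or "(no name given)")
    page = (
        "<html><head><title>Thank You</title></head><body>"
        f"<p>Thank you. Information received for {name}</p>"
        "</body></html>"
    )
    # The empty line tells the server where the headers end.
    return "Content-Type: text/html\n\n" + page


if __name__ == "__main__":
    # For a POST, the server supplies the form data on standard input
    # and its size in the CONTENT_LENGTH environment variable.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    sys.stdout.write(cgi_response(sys.stdin.read(length)))
```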
Figure 7 - A Response Page Generated via CGI
That data stream is far from user-friendly, though, since it looks like the email message in Figure 6. Formatting it would be a good start for the application. Once the data is formatted and usable, the application might perform acceptance edits and then provide feedback to the user by transmitting a new page to the user's browser. A CGI application can do more than just reformat data. It can access other structures on the server, such as a database. This opens up several possibilities. One would be to accept an order from a web page, post that order to a database, and respond to the user with a confirmation number. Another would be to provide information from a database, such as the prices of items that fluctuate too often for hard-coding them into a web page to be practical; bullion prices are one example.
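The order-taking idea can be sketched with Python's built-in sqlite3 module standing in for the server's database. The table and column names, the acceptance rules, and the use of the row ID as a confirmation number are all illustrative assumptions, not a recommended design.

```python
import sqlite3


def open_db():
    """An in-memory database standing in for the server's real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER)"
    )
    return conn


def post_order(conn, item, quantity_text):
    """Apply acceptance edits, store the order, and return a confirmation number."""
    # Acceptance edits: reject bad input before touching the database.
    if not item.strip():
        raise ValueError("an item is required")
    if not quantity_text.isdigit() or int(quantity_text) < 1:
        raise ValueError("quantity must be a whole number of at least 1")
    with conn:  # commits if the insert succeeds, rolls back if it fails
        cur = conn.execute(
            "INSERT INTO orders (item, quantity) VALUES (?, ?)",
            (item.strip(), int(quantity_text)),
        )
    # Hypothetical scheme: the new row's ID doubles as the confirmation number.
    return cur.lastrowid


db = open_db()
print(post_order(db, "gold coin", "2"))  # 1, the first order in a fresh table
```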
CGI is available on both UNIX and NT servers, although on NT platforms, depending on the programming language used, accessing the data stream can be a bit tricky. A CGI program doesn't need to be a compiled application; it can be a script, or command file, written in a scripting language like PERL, which makes the program easier to port between dissimilar platforms than one written in, for example, Visual Basic.
If the server on which the web site resides is an NT server, another method of having the server perform background processing is by using Active Server Pages.
Active Server Page technology is a repackaging of Microsoft's OLE technology. An Active Server Page is a web page with an extension of .ASP instead of .HTM.
In addition to the HTML, if any, contained in the page, the file contains script code written in either JavaScript or VBScript (discussed later). The server executes the script when the page is requested. Typically, the script will perform some processing and generate HTML statements. What makes ASP great is that the only thing passed to the user is the resulting HTML; there is no visibility into the script code, not even if the user views the page source in the browser. The reason is that although the user requests, for example, myfile.asp, the server never sends that file itself; it runs the script and sends only the HTML the script produces.
An example of the usefulness of ASP is the processing of a form. Instead
of invoking a CGI program, the form data can be provided to an ASP file.
That file can then process the data, mail it in a formatted message, and
generate an acknowledgement page much like that in Figure 7.
The largest drawback to ASP currently is that it's supported only on
Microsoft NT or Windows 95 servers. Any client on any browser can access
Active Server Pages, because the only thing to reach the client is HTML,
but most Internet Service Providers (ISPs) use servers that don't run
Microsoft operating systems.
The Java programming language was developed by Sun Microsystems. What makes Java distinctive is that instead of being compiled into machine-dependent code, it is compiled into an intermediate "P code" (Sun calls it bytecode), which is identical from platform to platform. Each platform has a machine-dependent Java engine, the Java Virtual Machine, that runs this "P code."
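The idea of compiling once to a platform-neutral code and letting each platform's engine run it can be shown with a toy. The Python sketch below is not Java bytecode: it is a made-up stack machine with three opcodes. The instruction list plays the role of the "P code," identical wherever it goes, and the run function plays the engine that each platform would supply. Real Java engines are far more elaborate, and many also compile the bytecode to native code as it runs.

```python
# A made-up "P code" program: push 6, push 7, multiply, print the result.
PROGRAM = [
    ("PUSH", 6),
    ("PUSH", 7),
    ("MUL", None),
    ("PRINT", None),
]


def run(program):
    """A toy engine: executes the instructions on a stack, returns printed values."""
    stack, output = [], []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output


print(run(PROGRAM))  # [42]
```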
Java resembles C++. It is an object-oriented programming language with many of the confusing aspects of C++ removed. There's nothing specific about Java that makes it better for developing CGI applications than another language, but its portability makes it possible to use it to provide web pages with rich content.
There are still concerns that the architecture of Java brings with it potential security breaches. It is interesting, then, that at the other end of the spectrum, the Internet application technology considered to be very secure is the Java applet.
Microsoft has its own implementation of the Java language, Visual J++. Microsoft recently announced that it planned to move in its own direction with regard to the language's capabilities and would no longer follow the Sun standard.
Applet is a cute name for a small application - and a small application is precisely what a Java applet is. Applets are secure because their architecture prevents them from touching anything outside themselves, such as files and memory, except for the creation of a "cookie," a controlled data element managed by the user's browser that is stored in a directory on the user's system.
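A cookie is just a small named value that the browser stores and sends back to the site that set it. Cookies are not specific to applets; servers and scripts set them too. To see what one looks like on the wire, here is a Python sketch using the standard http.cookies module; the cookie name and value are hypothetical.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie remembering which banner ad was shown last.
outgoing = SimpleCookie()
outgoing["last_ad"] = "3"
outgoing["last_ad"]["path"] = "/"
# The header a server sends to ask the browser to store the value.
print(outgoing.output())  # Set-Cookie: last_ad=3; Path=/

# On later requests the browser returns it in a Cookie header.
incoming = SimpleCookie()
incoming.load("last_ad=3")
print(incoming["last_ad"].value)  # 3
```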
The typical use of an applet is for a small self-contained graphical element on the web page. The most familiar example of such an element is the ad banner, that rectangle of cycling advertisements like the one shown in Figure 8.
Figure 8 - An Ad Banner
JavaScript and VBScript are scripting languages related to Java and Visual Basic, respectively. VBScript is a subset of Visual Basic; JavaScript, despite its name, is a separate language that borrows much of its syntax from Java. Scripting languages allow programming statements to be executed without a separate compile step; a script engine built into the browser interprets them. Because the instructions are executed by the browser, the script runs on the client, unlike CGI and ASP, which run on the server. This is a good thing, because code running on the client can respond without a round trip to the server; this is a bad thing, because the user has to sit and wait while the client code downloads along with the web page.
Why would you want to execute code on the client? If there's a database to access, it will be on the server. If a new web page is to be accessed or generated, it will come from the server. What does having code execute on the client buy you? Local processing, with no need to go to the server. An example of the utility of this is performing edits on a form. If there is a field that should have a date, and the user enters the date incorrectly, it would be nice to let the user know without his having to wait until the form is submitted.
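In a browser this check would be written in JavaScript or VBScript; the logic is shown here as a short Python sketch for illustration only. The MM/DD/YYYY format and the wording of the message are assumptions.

```python
from datetime import datetime


def check_date_field(text):
    """Return an error message for a bad MM/DD/YYYY date, or None if it is valid."""
    try:
        # strptime rejects both wrong formats and impossible dates like 02/30.
        datetime.strptime(text.strip(), "%m/%d/%Y")
    except ValueError:
        return "Please enter the date as MM/DD/YYYY, for example 07/04/1998."
    return None


for entry in ("07/04/1998", "13/04/1998", "02/30/1998"):
    print(entry, "->", check_date_field(entry) or "OK")
```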
JavaScript comes in two flavors, Netscape's and Microsoft's (called JScript). The two have many syntactical differences, different capabilities, and even conflicting bugs. Also, each behaves differently on different versions of the same browser, because bug fixes were applied only to the version current at the time. Because the code runs on the client, the type of server is immaterial. VBScript is currently supported only by Microsoft's Internet Explorer.
Luckily, script code can be enclosed in an HTML comment block, so browsers that don't recognize script code will ignore it, treating it as a comment. Testing is also more immediate, because the code can be executed on the client without first being uploaded to the server.
Cascading Style Sheets (CSS) introduce the ability to apply more style control than HTML provides. Style sheets can be applied to a page element, a class of elements, or elements on multiple pages in a web site, without having to code the style instructions repetitively.
It used to be tedious to add formatting to similar elements in a page. For example, if after coding a page of paragraph headings a decision is made to make the heading text blue, the old way to accomplish this was to change every heading individually. With CSS, styling can be applied to the tag element itself, so that with a single rule such as H2 { color: blue }, every heading of type H2 will appear with blue text. The same result can be obtained through JavaScript by treating the H2 tag as an object.
Styling can also be applied to a class: once a style is defined for the class cool (written .cool in a style sheet), any tag that specifies CLASS="cool" will have that formatting. A global style sheet can be set up so that every page on the site shares similar formatting. CSS provides the ability to format margins, typeface, spacing and other facets of layout. CSS is currently supported primarily by Internet Explorer 4.
Dynamic HTML (DHTML) takes CSS a step further. DHTML is made up of three parts: style sheets (CSS), positioning, and downloadable fonts.
The CSS portion of DHTML is as discussed in the section on CSS. By using styles, the visual aspects (color, size, spacing) of the web page can be defined and controlled.
Positioning is the next part of DHTML. Page elements can be positioned absolutely, at a specific location on the page, or relatively, with respect to other elements.
The third part of DHTML is downloadable fonts, which removes the need
to depend on Times Roman, Helvetica and Courier as the typefaces used
on a page. Although a font would only need to be downloaded if it were
not already present on the client, it can certainly add significant download
time to a page when that happens. Therefore, this facility might only
be appropriate for intranets, where groups within a company need the font
to be able to view documents, forms, etc.
DHTML has a very significant drawback: Netscape
and Microsoft have gone in very different directions in implementing it.
Microsoft has taken the route of opening the entire web page to reference via the object model. Every item on the page, and every aspect of each item, is available to be referenced. This finally provides an ability that users have been clamoring for: the ability to change text dynamically without having to resend the entire page. Before this, presentation aspects of the text could be changed, but not the text itself. Now, even the URLs behind the hyperlinks on a page can be altered.
Netscape has chosen not to open up the page elements to the same degree. Instead, Netscape provides the ability to specify page and element layers. By altering the Z-order of layers, the order in which they are stacked, a different version of page elements can be displayed. This method does not easily lend itself to the dynamic alteration of text.
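The layer approach can be modeled in a few lines of Python. This sketch is an abstraction, not Netscape's actual layer syntax or object model: it assumes opaque layers that overlap completely, so whichever has the highest stacking order is the one the user sees. Note that no text is edited; changing what is shown means choosing among versions prepared in advance.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    z: int        # stacking order: higher values sit on top
    content: str


def visible(layers):
    """Assuming opaque, fully overlapping layers, only the top one shows."""
    return max(layers, key=lambda layer: layer.z).content


# Two hypothetical versions of the same page element.
layers = [
    Layer("summer", z=2, content="Summer catalog"),
    Layer("winter", z=1, content="Winter catalog"),
]
print(visible(layers))  # Summer catalog
layers[1].z = 3         # restack: raise the winter version to the top
print(visible(layers))  # Winter catalog
```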
The two methods are so different that it will be difficult to define a standard. Until a standard is defined and embraced, if ever, using DHTML can require two different versions of the same web page.
The eXtensible Markup Language (XML) is just beginning to become available.
It is based upon the Standard Generalized Markup Language (SGML), and
can be viewed as being SGML-lite.
XML will be invaluable for the presentation of documents at web sites.
It allows for the wholesale declaration of tags that define any and all
portions of a document, much like styles in Microsoft Word or WordPerfect.
Once the tag and the presentation are defined, such as longquote to define
a quotation that is more than a couple of lines, that tag will ensure that
all such portions of text are presented in the same fashion in any document.
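A rough Python sketch of the idea, using the standard xml.etree.ElementTree module. The document and its <para> and <longquote> tags are hypothetical, and in practice the presentation would come from a style sheet rather than program code; here a small function stands in for that style sheet, applying one rule per tag wherever the tag appears.

```python
import textwrap
import xml.etree.ElementTree as ET

# A hypothetical document marked up with custom tags.
DOC = """<article>
  <para>Commerce changed the web quickly.</para>
  <longquote>Tables are fine, but we need to be able to give cells
  background images.</longquote>
  <para>The browser makers hurried to respond.</para>
</article>"""


def present(element):
    """Apply the presentation rule defined for the element's tag."""
    text = " ".join("".join(element.itertext()).split())
    if element.tag == "longquote":
        # Every long quotation gets the same narrow, indented block.
        return textwrap.indent(textwrap.fill(text, width=40), "    ")
    return text


def render_document(xml_text):
    root = ET.fromstring(xml_text)
    return "\n\n".join(present(child) for child in root)


print(render_document(DOC))
```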
XML will be a client-based feature. It will have a robust standard, but count on a lengthy burn-in period until every browser treats it the same and the bugs are removed.
Hopefully, this compressed look at the major technologies available for developing Internet presentation content has piqued your curiosity about the opportunities for more advanced presentation on the web than just pages of text and cute graphics. Online magazines, newsletters, documents, brochures: all are now viable, albeit with some degree of difficulty.
On an intranet, the difficulty is minimized because the users can be corralled into using the same clients, servers and browsers. On the Internet, to use these advanced features a decision will need to be made to either maintain various versions of a page, or to exclude segments of the population that are not using the client or browser you will be supporting.
For further reading, I suggest searching the Internet for the standards and extended implementations of the technologies presented here. This information will provide much more detail on each technology as well as the full syntax and capabilities. If you want to roll up your sleeves and dive in, I'd suggest visiting your local bookstore as there are numerous books on each of these technologies.
J.R. Greenberg is a senior project manager at the Hewlett-Packard Company managing their Telecom Software Services & Processes team. He has been with Hewlett-Packard for twelve years, and a computer consultant for 21 years.
J.R. is the co-author of A Methodology for Developing and Deploying Internet and Intranet Solutions by HP Press and Prentice-Hall PTR. He and his family make their home in the Atlanta area. J.R. can be reached at j_r_greenberg@hp.com.