EIW Fall 2004 Lecture Notes
In 1945 Vannevar Bush published an essay titled "As We May Think" in The Atlantic Monthly that described the idea of linking documents together to make it easier to keep track of the relationships between them. Although Bush described primarily a "personal system" (rather than a global system linking documents from many sources), the essay is often credited as the earliest description of what we now call hypertext. The term "hypertext" was coined in 1965 by Ted Nelson, who went on to provide (along with Douglas Engelbart, the inventor of the mouse) a crude implementation of hypertext. Nelson also described and designed a system called "Xanadu" that was meant to put the entire literary content of the world online. Work on Xanadu was started in 1979 and still continues...
In 1989, while working at CERN, the European Particle Physics Laboratory, Tim Berners-Lee designed a system that would allow scientists to easily share scientific findings over the Internet. The initial implementation included a text-mode browser and a browser written for the NeXTSTEP operating system that provided access to hypertext files based on HTML and to USENET newsgroups. HTML (Hypertext Markup Language) was developed by Berners-Lee as an application of SGML (Standard Generalized Markup Language). The protocol for retrieval of HTML documents was named HTTP (HyperText Transfer Protocol).
In 1993 the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign developed a browser named Mosaic that ran on Unix systems running the X Window System (they also developed a version for the Macintosh). The staff at NCSA also provided an extended version of HTML that included support for images (the IMG tag). Although the IMG tag was eventually incorporated into HTML, the programmers at NCSA didn't wait for the HTML standards organization (headed by Berners-Lee) to incorporate the change into the standard. Netscape (which was co-founded by the principals at NCSA) and Microsoft continue this practice of supporting additions to HTML before standards committees get around to formalizing updates.
From the book HTML: The Definitive Guide (page 8):
HTML is a document-layout and hyperlink-specification language. It defines the syntax and placement of special, embedded directions that aren't displayed by the browser, but tell it how to display the contents of the document, including text, images, and other support media. The language also tells you how to make a document interactive through special hypertext links, which connect your document with other documents -- on either your computer or someone else's, as well as with other Internet resources, like FTP.
It is important to notice that HTML is not a word processing tool; it is a means of describing the structure of documents. By defining the structure of a document it is possible to have a browser decide the most appropriate way to render the document contents. Although HTML includes features that are related to page layout (frames, tables, etc.), you should keep in mind that HTML is meant to reflect document structure more than appearance. It is often possible to tune an HTML document to a particular browser to get a desired effect, only to find out that the page looks completely different when rendered by a different browser (or with a different window width, etc.).
HTML documents contain content that is to be displayed and tags that define the structure of the document (and, in a few cases, specify formatting instructions). These tags are used by a browser to decide how to display the content; they are not themselves displayed by the browser. HTML documents are simple text files that can be created with any text editor, and the tags are just special character sequences that are interpreted by the browser. If you want to create a document that includes some content that looks like HTML tags you need to do something special! (more on this later).
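As a minimal sketch of that "something special" (assuming the standard character entity references &lt; and &gt;, which these notes have not introduced at this point), the following markup displays a literal tag instead of having the browser interpret it:

```html
<P>To start a paragraph, type &lt;P&gt; in your text editor.</P>
```

A browser would show the characters < and > in place of the two entity references, so the reader sees the tag name as ordinary text.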
HTML tags are always bracketed between a less-than
(<) and a greater-than (>)
character. Every tag has a name that indicates to the browser some
information about document structure, and some tags can have
attributes that provide additional information to the browser.
Most HTML tags are used to mark the beginning and end of a region
of a document (for example the beginning and end of a paragraph). These
tags always come in pairs in a document - the leading tag is called the
start tag and the trailing tag is called the end tag.
Both tags have the same name; the only syntactic difference is that the
end tag includes a "/" before the tag name. Here is an example that uses
the P tag to mark the beginning and end of a paragraph:
|
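The sketch below shows what such a paragraph could look like; the sentence itself is a placeholder assumed for illustration, not the original example text:

```html
<P>This text is a single paragraph. The start tag comes
before it and the end tag comes after it.</P>
```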
NOTE: In these lecture notes boxes like the one shown above contain HTML code as you would type it using a text editor to create a web page. Boxes inside a narrow dark border show what the HTML code will look like when the browser renders the HTML document.
When rendered by a browser, the above HTML will look like this:
|
HTML Document Structure

Every HTML document should start with the tag <HTML> and end with the tag </HTML>; these tags tell the browser that this is an HTML document. Although these tags are required, most browsers work fine without them (although this may change!).

Each HTML document includes a head and a body. The head includes information about the document (possibly the title, author, date of creation, software used to create the document) and the body contains the content of the document. There are tags used to identify these sections:

<HEAD> </HEAD>
these tags surround the head of the document and come first (before the body tags).

<BODY> </BODY>
these tags surround the content of the document.

The head and body tags are actually required by the latest version of HTML, although most browsers will work fine without them (they interpret everything as the body of the document). It is possible that in the future all documents will need to have an explicit head, so it's best to use the head and body tags on anything you create.

Within the document head there is one required header tag - the <TITLE> </TITLE> pair. Within the title tags the document should contain a document title - this title is typically shown in the title bar of the browser window. Document titles should convey something useful about the content of the document.

These required tags now give us the following document structure:

<HTML>
<HEAD>
<TITLE> Document Title Goes Here </TITLE>
</HEAD>
<BODY>
document body goes here
</BODY>
</HTML>

Most HTML tags can include modifiers called attributes that provide the browser with additional information about how to render the document. These attributes are included as name=value pairs within the start tag. For example, a very useful attribute to the <BODY> tag is the attribute BGCOLOR. When found in a BODY tag, the BGCOLOR attribute specifies the background color for the entire document. For example, the tag <BODY BGCOLOR=PINK> tells the browser that we are starting the body of the document and that when rendered it should be done with a pink background. There are lots of different attributes that can be used in the BODY tag, including ones to tell the browser about document margins, default text color, link color and actions to take when various mouse events are detected. See any HTML reference for details (for example: The HTML 4.0 specification).

Formatting tags: B and I

Although HTML is a document structuring language, there are a few tags that convey specific formatting information to the browser. For example, the tag <B> is used to turn on boldface. To turn off boldface you use the end tag </B>. These tags are embedded within the document content and used by the browser as control information. Here is part of an HTML document that includes a word in boldface:

Hello <B>World</B>

When rendered by a browser this document might look like this (notice the word "world" is shown in boldface):

Hello World

Similarly, the <I> and </I> tags are used to indicate italics:

Hello <I>Cruel</I> <B>World</B>

would look like this:

Hello Cruel World

Many tags are applied to some region of a document; the tags for bold and italics only affect the document content found between the start tag (<B> or <I>) and the end tag (</B> or </I>). The start tag turns on some attribute and the end tag turns it off. In general HTML end tags look just like the corresponding start tags with the addition of the "/".

The bold and italics tags are examples of tags that tell the browser how to render part of a document. Most of the tags supported by HTML tell the browser about the contents of a region and let the browser decide what attribute(s) to apply when rendering text. For example, the <EM> and </EM> tags tell the browser to emphasize a region, although they don't explicitly tell the browser how to do this. Typically a browser will put everything between <EM> and </EM> tags in italics, so (for now) the result is usually the same as using an italics tag. However, using the italics tag takes control away from the browser (and the browser user). Browsers allow users to establish how they would like to view emphasized text, but anything marked to be rendered in italics always means use a slanted font.

To understand the difference between using an italics tag and an emphasis tag, consider a blind computer user who has voice generation software that can read web pages out loud. When this software reads a section of text that is marked to be in italics, the best (correct) thing the software could do is to state that the next sentence or word is in italics. For those sections that are tagged as emphasized, the software could change the pitch of the speaking voice, change to a different voice, or use any other option that the user desires. Keep in mind that HTML was originally designed to communicate the structure of documents - using an <EM> tag indicates something about structure (a section is important and should be emphasized), while the <I> tag indicates something about the author's personal preference as to how a section should look.

One final thought about structuring tags (like <EM>) vs. formatting tags (like <I>): It has always been envisioned that the tags within HTML documents can be used to group and search documents. For example, a search might look for all emphasized passages in a collection of web pages - in this case the search would involve only sections of text between the <EM> and </EM> tags. Since individual authors might have different preferences as to how emphasized text should look, if they each use different formatting tags the search would be impossible (some might use italics, some might use bold, etc.).

HTML and text

Typically the body of an HTML document will include a number of text elements such as paragraphs, tables and lists. When rendering a paragraph a browser will wrap each line so that no word is split between lines - this means that the entire width of the browser window is used. White space within an HTML document, including spaces, tabs and linefeeds (return characters), is used to delimit words (or tags) but is not rendered as typed. If you put 2 spaces between words in an HTML document the browser will ignore this and put a single space (you can actually put 10 blank lines between words and the browser will still put a single space between them).

To separate individual paragraphs within a document you use the <P> and </P> tags to surround each paragraph. You can also use the <BR> tag (line break) to tell the browser to start a new line (without starting a new paragraph). NOTE: Unlike the other tags we've seen - the <BR> tag has no corresponding end tag.

HTML also supports a <DIV> tag that can be used to divide the document into discrete, named sections. The major benefits of doing this are that it provides an organizational tool for authors, and that a style can be applied to an entire division (section), including text styles, margins, colors, etc.

According to the HTML standard, you should use <DIV> and <P> tags with their corresponding end tags. Here is what a correct document might look like:

<DIV id="section">
<P>This is the first paragraph in this document.
This is the first paragraph in this document.</P>
<P>This is the second paragraph in this document.
This is the second paragraph in this document.</P>
<P>This is the third paragraph in this document.
By now you aren't even reading the sentences, are you?</P>
</DIV>

However, some people used to put a single <P> between paragraphs, and browsers understood what to do (and still do). For example, consider the following document body with 3 paragraphs:

This is the first paragraph in this document.
This is the first paragraph in this document.<P>
This is the second paragraph in this document.
This is the second paragraph in this document.<P>
This is the third paragraph in this document.
By now you aren't even reading the sentences, are you?

and here is what this looks like when rendered by the browser:

This is the first paragraph in this document. This is the first paragraph in this document.

This is the second paragraph in this document. This is the second paragraph in this document.

This is the third paragraph in this document. By now you aren't even reading the sentences, are you?

According to the standard each paragraph should start with <P> and end with </P>, but the above style is used in lots of (mostly older) documents that are on the WWW. However, I'd suggest using both start and end tags to make sure your documents will look right in the future.

HTML Headings

A number of tags are defined to be used to indicate section headings within a document. Typically a document contains a number of sections (chapters), and within each section are subsections, and within subsections are sub-subsections, and so on. The heading tags surround some text that is rendered by a browser, typically a section name (or subsection, etc.). The heading tags are <H1>, <H2>, <H3>, ... <H6>, with H1 being the highest level heading (usually rendered the largest) and H6 the lowest level heading. For example, the following HTML:

<H1>Section 1: The Meaning of Life</H1>
<H2>Section 1.1: Nirvana and RPI</H2>
<H2>Section 1.2: Searching for Truth </H2>
<H3>Section 1.2.1: Using AltaVista in the search for truth</H3>
<H4>Section 1.2.1.1: Combining search terms</H4>
<H3>Section 1.2.2: Using GoTo.com in the search for truth</H3>
<H1>Section 2: The Meaning of HTML</H1>

might look like this when rendered:

Section 1: The Meaning of Life
Section 1.1: Nirvana and RPI
Section 1.2: Searching for Truth
Section 1.2.1: Using AltaVista in the search for truth
Section 1.2.1.1: Combining search terms
Section 1.2.2: Using GoTo.com in the search for truth
Section 2: The Meaning of HTML

The <H1>, ... <H6> tags can include a number of attributes, including the ALIGN attribute that tells the browser how to align the heading. The valid choices are ALIGN="CENTER", ALIGN="LEFT", ALIGN="RIGHT" and ALIGN="JUSTIFY". The justify value is not widely supported by browsers, but centering, left and right alignment work fine.

HTML Lists

HTML supports ordered (numbered) and unordered lists. Each list can include a number of list items; the browser renders these list items in a way that (hopefully) appears as a list. Unordered lists are contained within the tags <UL> and </UL>. Ordered lists are contained within the <OL> and </OL> tags. In both cases each individual list item is contained within the <LI> and </LI> tags. Below are a few examples:

Dave's favorite cookies:
<UL>
<LI> Chocolate Chip </LI>
<LI> Chocolate Chocolate Chip </LI>
<LI> Chunky Chocolate Chip </LI>
<LI> Oatmeal </LI>
<LI> Oreo </LI>
</UL>

Which looks like this:

Dave's favorite cookies:
  * Chocolate Chip
  * Chocolate Chocolate Chip
  * Chunky Chocolate Chip
  * Oatmeal
  * Oreo

Here is an ordered list:

Top 5 reasons to come to class:
<OL>
<LI> Dave might bring cookies </LI>
<LI> You might learn how to make an HTML list </LI>
<LI> There is nothing on TV from 2:00-4:00 PM</LI>
<LI> You can hide behind a pillar and sleep </LI>
<LI> There might be a test</LI>
</OL>

Which might look like this:

Top 5 reasons to come to class:
  1. Dave might bring cookies
  2. You might learn how to make an HTML list
  3. There is nothing on TV from 2:00-4:00 PM
  4. You can hide behind a pillar and sleep
  5. There might be a test

HTML Tables

HTML supports the display of tabular data using tables. Tables are also used to manage document layout (probably more often than to display tabular data). The HTML table model includes three basic elements - the table (<TABLE> and </TABLE> tags), the table row (<TR> and </TR> tags) and the table cell (using either <TH> </TH> or <TD> </TD> tags). The general structure supported by HTML is shown below; the idea is that you build a table from table rows, and that you build table rows from table cells.

<TABLE>
  <TR>
    <TD> This is the first cell </TD>
    <TD> This is the second cell (still on the first row) </TD>
  </TR>
  <TR>
    <TD> New row! </TD>
    <TD> Another cell in the second row </TD>
  </TR>
</TABLE>

which will look like this:

| This is the first cell | This is the second cell (still on the first row) |
| New row! | Another cell in the second row |

The <TH> tag is used for table headings (TD stands for table data, TH for table heading) and simply changes the default text style used to display the contents of the cell. The contents of a <TH> cell are usually rendered in boldface.

Below is a table that includes some attributes to alter the display of the table, including borders, background colors and multicolumn cells. There are other useful attributes that can be used to alter the display of a table - you can change the spacing between cells, the alignment of text in the cells, etc. Check any HTML reference for the details.

<TABLE BORDER=2 BGCOLOR=wheat>
  <TR BGCOLOR=WHITE>
    <TH colspan=3>Table Attributes</TH>
  </TR>
  <TR>
    <TH>Attribute Name</TH>
    <TH>Values</TH>
    <TH>Use</TH>
  </TR>
  <TR>
    <TD>BGCOLOR</TD>
    <TD><EM>any color name</EM></TD>
    <TD>Sets background color</TD>
  </TR>
  <TR>
    <TD>BORDER</TD>
    <TD><EM>border width in pixels</EM></TD>
    <TD>width of grid lines between cells</TD>
  </TR>
  <TR>
    <TD>CELLPADDING</TD>
    <TD><EM>Distances (1pt, 1in)</EM></TD>
    <TD>set space between cell edge and cell contents</TD>
  </TR>
</TABLE>

which looks like this:

| Table Attributes |
| Attribute Name | Values | Use |
| BGCOLOR | any color name | Sets background color |
| BORDER | border width in pixels | width of grid lines between cells |
| CELLPADDING | Distances (1pt, 1in) | set space between cell edge and cell contents |

Using Tables for Page Layout

Tables are often used to establish the layout for an entire page, for example to provide a menu on one side of the page and text on the other. The table below shows an example of this, but you can easily find better examples by viewing the HTML source of most web pages.

<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=10>
  <TR BGCOLOR="#808080"><TD COLSPAN=3>&nbsp;</TD></TR>
  <TR>
    <TD BGCOLOR=WHEAT VALIGN=TOP>
      <TABLE BORDER=0>
        <TR ALIGN=CENTER>
          <TH BGCOLOR=WHITE>Sites with stock quotes</TH>
        </TR>
        <TR ALIGN=CENTER>
          <TD BGCOLOR="#8080FF">
            <A HREF="http://finance.yahoo.com">Yahoo finance</A>
          </TD>
        </TR>
        <TR ALIGN=CENTER>
          <TD BGCOLOR="#80FFFF">
            <A HREF="http://www.ragingbull.com">Raging Bull</A>
          </TD>
        </TR>
        <TR ALIGN=CENTER>
          <TD BGCOLOR="#FF80FF">
            <A HREF="http://www.etrade.com">eTrade</A>
          </TD>
        </TR>
        <TR ALIGN=CENTER>
          <TD BGCOLOR="#80FF80">
            <A HREF="http://www.eschwab.com">Charles Schwab</A>
          </TD>
        </TR>
      </TABLE>
    </TD>
    <TD>
      <P>The sites shown in the menu all contain information about
      stocks and provide stock quotes. Some of these sites also
      support on-line trading of stocks, although you need to
      establish an account before you can start losing money.</P>
      <P>This is really just filler text to show that you can use a
      table to do page layout. The text in this cell is treated just
      as if it was itself a page. You can include anything in a cell
      you would put anywhere in the body of an HTML document.</P>
      <H3>Here is a heading!</H3>
      <P>Did you notice that the other cell of this table contains
      a table within the cell? </P>
    </TD>
    <TD BGCOLOR="#808080">&nbsp;</TD>
  </TR>
  <TR BGCOLOR="#808080"><TD COLSPAN=3>&nbsp;</TD></TR>
</TABLE>

This rather complicated table will end up looking like this:

| Sites with stock quotes: Yahoo finance, Raging Bull, eTrade, Charles Schwab | The sites shown in the menu all contain information about stocks and provide stock quotes. Some of these sites also support on-line trading of stocks, although you need to establish an account before you can start losing money. This is really just filler text to show that you can use a table to do page layout. The text in this cell is treated just as if it was itself a page. You can include anything in a cell you would put anywhere in the body of an HTML document. Here is a heading! Did you notice that the other cell of this table contains a table within the cell? |
A hyperlink is created with the <A> and </A>
tags. The text between the <A> and </A> tags becomes the
link - when a user clicks on this text the browser opens a new
document. The location and name of the new document (the destination
of the link) is included in the <A> tag as the value of the HREF
attribute. This value is specified as a URL. A simple
example:
|
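A minimal sketch of such a link, using the URL discussed in these notes; the link text "EIW home page" is an assumption chosen for illustration:

```html
<A HREF="http://www.cs.rpi.edu/~hollingd/eiw">EIW home page</A>
```

Only the words between the two tags are shown to the reader; the HREF value is where the browser goes when the link is clicked.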
When rendered by a browser:
|
In the example above the destination of the link is the
URL
http://www.cs.rpi.edu/~hollingd/eiw. This
URL is a fully specified URL, since it
includes the specification of a protocol (http), a
hostname (www.cs.rpi.edu) and a resource
(/~hollingd/eiw). If you want to create a link to another
document that is on the same web server as the one that provided the
page containing the link, you can skip the protocol and hostname parts
and use a relative URL. For example, you could use
the relative URL /~hollingd/eiw on any page that is stored on the http
server running on www.cs.rpi.edu.
|
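For instance, a page served from www.cs.rpi.edu could link to the same resource as follows (a sketch; the link text is an assumption chosen for illustration):

```html
<A HREF="/~hollingd/eiw">EIW home page</A>
```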
If you want to specify the name of a file that is in the same
directory as the current page, you can just use the file name
itself (without any leading "/").
|
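A sketch of this form, assuming a hypothetical file named syllabus.html stored in the same directory as the current page:

```html
<A HREF="syllabus.html">Course syllabus</A>
```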
In general it is a good idea to use relative URLs whenever possible. This makes it easy to move an entire web site - all the URLs refer to files in the same directory, and are independent of the specific server hosting the web site. We will talk more about this when we look at building web sites.