
The Internet is a worldwide network of networks that connects devices.
These devices, called hosts or end systems, were always computers until recently, and most still are, but increasingly they are devices such as PDAs, GPS systems, environmental sensing devices, cell phones, webcams, even refrigerators. No one has a good idea of how many such devices are on the Internet at any given time, because devices are added and removed continually, and there is no central authority to register or control them. Current estimates are well over 230 million.
We can talk loosely of three tiers of networks. The outermost tier, tier 3, is the network of a company, a college, a government agency or other organization which has hosts connected to it. These networks have one or more connections to Internet Service Providers (ISPs), which constitute tier 2. ISPs sometimes have direct connections to other ISPs, but they always connect to one or more tier 1 providers, the Internet backbone. Tier 1 providers are large telecommunications companies such as Sprint, MCI, Qwest, Cable and Wireless, or AT&T.
All tier 1 providers connect to all other tier 1 providers, and each is connected to a number of tier 2 ISPs.
Suppose an RPI student wants to order a pizza on the web. He sends a request from his computer in his dorm room. This request is routed through the RPI network, eventually reaching RPI's Edge Router or Gateway, which passes the request on to RPI's ISP, Broadwing. The request may take several hops within the Broadwing network before it is passed on to an Internet backbone provider. It may take several hops inside that network before being passed on to another backbone provider, which in turn passes it on to one of its tier 2 ISP customers. That ISP passes the request on to the pizza web server. The pizza web server sends a reply back to the student's computer in the same fashion, although the reply may not necessarily take the same route through the network.
A typical message may take 15 or more hops to get from its source to its destination, regardless of whether its destination is just down the street or halfway around the world.
Devices are connected to the Internet in many different ways. Many computers are on a local area network (LAN) running Ethernet; some can be connected through a wireless hub; home users can connect to their ISPs with a dial-up connection, a cable connection or DSL, and newer devices connect via satellite.
There has to be agreement between a sender and a receiver on how to communicate. Such an agreement is called a Protocol. A protocol defines the format and order of messages exchanged between communicating entities and actions taken in response to various messages.
Much of this course will be a discussion of the various protocols used on the Internet.
Internet protocols are developed by the Internet Engineering Task Force (IETF). The standards documents are called Requests for Comments (RFCs). The first RFC was published in 1969. As of December 2005, there were 4333 RFCs.
All end user systems and all of the intermediate nodes on the Internet communicate using the Internet Protocol (IP). IP is a packet switching protocol. This concept is fundamental to understanding the Internet.
We can divide networking protocols into two broad categories, connection-oriented and connectionless. In a connection-oriented protocol, the two ends of the communication contact each other, agree on parameters, and sometimes determine a path through the network and reserve resources before any actual data is transmitted. After the communication is done, the reserved resources are freed up. The analogy is a traditional phone call.
A connectionless network is the opposite; messages are sent to the network, and each intermediate node simply passes the message on until it reaches its destination. The analogy here is the post office.
IP is a connectionless protocol. Another name for this is packet switching. Each intermediate node is called an IP Router. A router receives a packet from some other router, looks at the destination address, and passes it on. There are no acknowledgments (at this level).
A sender breaks a large message down into small chunks, called packets. Of course the packets are sent in order, but there is no guarantee that they will arrive at their destination in the same order that they were sent. Packets can get dropped, and different packets to the same destination can take different routes through the network, so it is the responsibility of the receiver to reassemble the packets into a message. (Different packets of a stream taking different routes is a theoretical possibility, but in reality it is extremely rare.)
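The splitting and reassembly described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-packet sequence number is a hypothetical field invented for the example (on the Internet, ordering is handled above IP, by TCP), and the names `packetize` and `reassemble` are made up here.

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put the chunks back in sequence-number order and join them."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)

message = b"One large pepperoni pizza, please."
packets = packetize(message, 8)
random.shuffle(packets)                 # simulate out-of-order arrival
assert reassemble(packets) == message   # the receiver restores the original
```

Note that this sketch does not handle dropped packets; detecting and resending those is the job of a reliable protocol layered on top.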
IP is an unreliable protocol; this does not mean that it loses a lot of information (in fact the Internet is now extremely reliable). The term unreliable when applied to a protocol means that it does not guarantee delivery; packets are forwarded on a best effort basis. A reliable protocol is one which acknowledges receipt of messages in some fashion.
There are tradeoffs to using a packet switching network architecture vs a connection-oriented design. A packet switching network can be much more efficient, where efficiency is defined as the percentage of the available bandwidth that is actually used. With a connection-oriented protocol that reserves bandwidth prior to any communication, not only is there overhead associated with setting up and tearing down the communication channel, but during most communication sessions there is a significant amount of down time when no data is being transferred, yet the resources are still reserved and thus are not available for other sessions.
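A small calculation may make the efficiency argument concrete. All of the numbers below are illustrative assumptions, not measurements: a 1 Mbps link, users who each need 100 kbps while sending, and users who are sending only 10% of the time. With reserved circuits the link supports a fixed number of users; with packet switching we can ask how often more users than that are sending at the same moment.

```python
from math import comb

LINK_KBPS = 1000   # assumed link capacity
USER_KBPS = 100    # assumed rate of one user while sending
P_ACTIVE = 0.10    # assumed fraction of time a user is sending

def circuit_switched_users() -> int:
    """Users supported when every user gets a permanently reserved share."""
    return LINK_KBPS // USER_KBPS

def overload_probability(n_users: int) -> float:
    """Probability that more users are sending at once than the link can carry,
    treating each user as independently active with probability P_ACTIVE."""
    capacity = LINK_KBPS // USER_KBPS
    return sum(comb(n_users, k) * P_ACTIVE**k * (1 - P_ACTIVE)**(n_users - k)
               for k in range(capacity + 1, n_users + 1))

print(circuit_switched_users())
print(overload_probability(35))
```

Under these assumptions, a packet-switched link shared by 35 users is overloaded well under one time in a thousand, while a circuit-switched design on the same link could admit only 10 users.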
Packet switching uses store and forward transmission. Each link on a router has a set of buffers. The router reads a packet from a link, stores it in one of the buffers, figures out where to send it, and passes it on. Under heavy usage, packets can arrive at a node faster than the router can process them, resulting in buffer overflow and lost packets.
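Here is a minimal sketch of store and forward with a finite buffer. The class name `StoreAndForwardRouter` and the buffer size are hypothetical, and a real router has separate buffers per link; the point is only to show how a burst that outruns the router leads to dropped packets.

```python
from collections import deque

class StoreAndForwardRouter:
    """Toy router with one bounded buffer (an assumption made for brevity)."""

    def __init__(self, buffer_size: int):
        self.buffer = deque()
        self.buffer_size = buffer_size
        self.dropped = 0

    def receive(self, packet) -> bool:
        """Store an arriving packet, or drop it if the buffer is full."""
        if len(self.buffer) >= self.buffer_size:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def forward(self):
        """Send the oldest stored packet onward; None if nothing is waiting."""
        return self.buffer.popleft() if self.buffer else None

router = StoreAndForwardRouter(buffer_size=3)
for number in range(5):          # a burst of 5 packets arrives before any is forwarded
    router.receive(f"packet {number}")
print(router.dropped)            # packets lost to buffer overflow
print(router.forward())          # the oldest stored packet goes out first
```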
With a virtual circuit system, the actual routing process can be done more quickly than with datagrams.
The concept of a protocol stack underlies much of the study of networking. A protocol stack is a set of protocols that work together to transmit information from one computer to another. The protocols are layered, and each layer has the illusion that it is communicating with the equivalent layer on the other computer, but in fact it is communicating with the layers above it and below it in the stack.
An analogy might be President Bush talking to Vladimir Putin, the President of Russia. President Bush speaks English (sort of), and President Putin speaks Russian (He might speak English as well, but let's pretend that he doesn't). Bush says something in English, looking at President Putin, but in fact he is speaking to a translator. The translator translates what he says into Russian. President Putin replies in Russian, looking at President Bush, but in fact he is speaking to a translator.
The archetype protocol stack is the ISO (International Organization for Standardization) OSI (Open Systems Interconnection) Reference Model. This is a seven layer protocol stack on which all other protocol stacks are based. No "real world" communication systems actually use this model in its entirety. Here is the OSI-ISO protocol stack.
There are so many good descriptions of this model that I am not going to try to describe the seven layers. Here is a good link. You are responsible for this material (including the jokes).
The Wikipedia description of the OSI model
The Internet runs a four level protocol stack.
This diagram shows two computers, labeled Host A and Host B, and an Intermediate Router. In practice there would be many intermediate switching elements, but only one is shown. An application on Host A wants to communicate with a peer application on Host B. The two applications must speak the same protocol. A typical example would be a web browser which wishes to request a document from a web server in a distant city. The protocol that web browsers and web servers use to communicate is HTTP, which stands for HyperText Transfer Protocol.
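To give a feel for what "speaking the same protocol" means at the application layer, here is a sketch of the text a browser might send and how it might read the first line of the reply. The host name `pizza.example`, the path `/order`, and the canned response are made up for illustration; the helper names are hypothetical, and a real browser sends many more header lines.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as the bytes sent to the server."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # header naming the server being addressed
        "Connection: close",      # ask the server to close after replying
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into version, status code, reason."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason

request = build_get_request("pizza.example", "/order")
canned_reply = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"   # made-up reply
print(request.decode("ascii"))
print(parse_status_line(canned_reply))
```

Both sides agree in advance on this format and on the order of messages (request first, then response), which is exactly what a protocol specifies.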
The Application Layer (the web browser) on Host A has the illusion that it is communicating directly with the server on Host B (pardon the anthropomorphism), but in reality it sends its message to the Transport Layer software on the same computer. In the diagram, the dotted arrow represents the illusion, the solid arrow represents reality.
The Transport Layer is responsible for making sure that complete messages are delivered end to end. This may sound like a trivial problem but it is not, because messages are often broken up into chunks as they are sent over the Internet, and it is possible for these chunks to get lost. Also, they may not arrive in the same order that they are sent.
The Transport Layer Protocol that is generally used on the Internet is TCP, the Transmission Control Protocol. This will be described in more detail below. For the moment, you need to know that the TCP layer on the sending computer establishes a connection with the TCP layer on the receiving computer, and they talk TCP to make sure that the message is received in its entirety and free of errors.
TCP may break a large message into smaller segments. The Segment is the unit of transmission.
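From a programmer's point of view, TCP shows up as a connection that must be set up before data flows. The following sketch uses Python's standard `socket` module on the loopback address, so both "hosts" are the same machine; the function names and the upper-casing server are invented for the example.

```python
import socket
import threading

def serve_one_client(server_sock: socket.socket) -> None:
    """Accept one TCP connection, read a message, reply with it upper-cased."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def tcp_round_trip(payload: bytes) -> bytes:
    """Open a TCP connection over loopback, send payload, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_one_client, args=(server,))
        worker.start()
        # create_connection performs the TCP connection setup with the server
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(payload)
            reply = client.recv(1024)
        worker.join()
        return reply

print(tcp_round_trip(b"one pizza, please"))
```

The application never sees segments, acknowledgments, or retransmissions; the TCP layer handles those and presents a reliable byte stream.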
The two TCP layers have the illusion that they are talking to each other, but in reality, they communicate with the Network Layer. The only network layer protocol used on the Internet is the Internet Protocol (IP).
There is a second transport layer protocol called User Datagram Protocol (UDP), which is also widely used on the Internet. In contrast to TCP, which is connection-oriented and reliable, UDP is connectionless and unreliable. UDP is used when speed is of the essence, such as with a file server.
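For comparison with TCP, here is a sketch of sending a single UDP datagram using Python's standard `socket` module. There is no connection setup and no acknowledgment, so the receiver uses a timeout in case the datagram never arrives. The function name and the two-second timeout are arbitrary choices for this example, and both sockets live on the loopback address.

```python
import socket

def udp_send_receive(payload: bytes):
    """Send one datagram over loopback; return it, or None if it never arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        receiver.settimeout(2.0)           # UDP gives no delivery guarantee
        sender.sendto(payload, receiver.getsockname())   # no connection needed
        try:
            data, _sender_address = receiver.recvfrom(2048)
            return data
        except socket.timeout:
            return None

print(udp_send_receive(b"ping"))
```

On a single machine the datagram will almost always arrive, but nothing in UDP itself promises that; an application that cares must add its own checks.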
The Network Layer is responsible for routing messages from one place to another. All routers on the Internet run the IP protocol. Each has several possible output lines and it has to figure out which output line to send each packet in order to get it to its destination.
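One way a router can pick an output line is to keep a table of address prefixes and choose the most specific prefix that matches the destination (longest-prefix matching). The sketch below uses Python's standard `ipaddress` module; the table entries and line names are invented for illustration and are not taken from any real router.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, output line)
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "line 0 (default route)"),
    (ipaddress.ip_network("10.0.0.0/8"), "line 1"),
    (ipaddress.ip_network("10.1.0.0/16"), "line 2"),
]

def choose_output_line(destination: str) -> str:
    """Return the output line of the most specific prefix containing the address."""
    address = ipaddress.ip_address(destination)
    matches = [(network.prefixlen, line)
               for network, line in FORWARDING_TABLE
               if address in network]
    # The default route matches everything, so `matches` is never empty.
    return max(matches, key=lambda match: match[0])[1]

print(choose_output_line("10.1.2.3"))
print(choose_output_line("10.9.9.9"))
print(choose_output_line("192.0.2.1"))
```

The router looks only at the destination address; it does not need to know anything about the rest of the path.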
In the Protocol stack diagram above, there is an Intermediate Router. The top two layers, the Application Layer and the Transport Layer, run only on the two end computers but the lower two layers, the network layer and the Physical Layer, run on each intermediate node as well. Recall that although there is only one Intermediate Router shown, there may be many such switching elements between the two hosts.
The bottom layer is the Physical Layer. This is responsible for actually translating the software message into a physical representation and putting it on the wire (or through the air in a wireless network). This is an enormously complex undertaking, but is primarily in the realm of computer engineering rather than computer science, so for this course, we can just assume that there is a physical layer without going into too much detail.
The unit of transmission on the physical layer is the frame.
There are numerous different physical layer protocols, and a message which takes a number of hops on the Internet to get from one host to another will be translated into a number of different physical representations. A typical physical layer protocol is IEEE 802.3, commonly known as Ethernet.
Each layer of the protocol stack on the sender side does its work by attaching a header (and sometimes trailing information as well) to the message which is passed down from the next higher layer in the stack. The Transport Layer receives a message from the application; it attaches a TCP header onto the front and passes this down to the network layer. The network layer attaches an IP header onto the front of this and passes it on to the physical layer. The physical layer (Ethernet for example) attaches a header (and a checksum trailer) to this message and sends it to the next switching element.
In theory, the message received at each layer is identical to that sent by the corresponding peer at the other end. (Nit picking readers can find instances where this is not the case, such as the TTL field in the IP header.)
The physical layer of the receiver reads the header information, strips the header (and trailer) off and passes the remainder to the network layer. The network layer reads the IP header (ignoring the rest of the message). If this is the final destination of the message, the network layer strips off the IP header and passes the remainder of the message up the stack to the TCP layer. Otherwise, the network layer determines where to send the message for its next hop and passes the message back down to the physical layer for its next journey.
The Transport Layer on the receiving host reads the TCP header, strips it off, and passes the message up to the appropriate application process.
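The wrapping and unwrapping described above can be sketched with toy headers. This is a deliberately simplified illustration: real TCP, IP, and Ethernet headers are binary structures with many fields, whereas the short text tags, the function names, and the use of CRC-32 from Python's `zlib` module as the trailer are assumptions made for this example.

```python
import zlib

def transport_send(message: bytes) -> bytes:
    return b"TCP|" + message                  # toy stand-in for a TCP header

def network_send(segment: bytes) -> bytes:
    return b"IP|" + segment                   # toy stand-in for an IP header

def link_send(packet: bytes) -> bytes:
    trailer = zlib.crc32(packet).to_bytes(4, "big")   # checksum trailer
    return b"ETH|" + packet + trailer

def link_receive(frame: bytes) -> bytes:
    """Check the header and checksum, then strip both."""
    packet, trailer = frame[len(b"ETH|"):-4], frame[-4:]
    if not frame.startswith(b"ETH|") or zlib.crc32(packet).to_bytes(4, "big") != trailer:
        raise ValueError("damaged frame")
    return packet

def network_receive(packet: bytes) -> bytes:
    return packet[len(b"IP|"):]               # strip the toy IP header

def transport_receive(segment: bytes) -> bytes:
    return segment[len(b"TCP|"):]             # strip the toy TCP header

# Sender: each layer wraps what it gets from the layer above.
frame = link_send(network_send(transport_send(b"GET /menu")))
# Receiver: each layer strips its own header and passes the rest up.
message = transport_receive(network_receive(link_receive(frame)))
assert message == b"GET /menu"
```

Each function touches only its own header, which mirrors how a layer ignores the contents it is carrying for the layers above.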
There are many different protocols at each level. Here are some representative protocols for the Internet.
| Application Layer | HTTP, telnet, ftp, email, VoIP |
| Transport Layer | TCP, UDP |
| Network Layer | IP |
| Physical Layer | Ethernet, WiFi, ATM, X.25, Frame Relay |
Robert X. Cringely has written an excellent and very amusing history of computing, with wonderful coverage of the development of the Internet. It is called Nerds 2.0.1, and you can read it here.
The concept of a packet switching network was developed by Leonard Kleinrock while he was a graduate student at MIT in the early 1960s. Kleinrock demonstrated that a packet switching network would be feasible for computers to share data. At that time there was no such thing as a computer network (in fact the computers of the time did not have monitors or keyboards either); the only network that existed at the time was the phone system, which was completely connection-oriented.
This was the height of the cold war, and the Department of Defense had created the Advanced Research Projects Agency (ARPA, now DARPA). DoD was concerned that a nuclear attack would wipe out the phone system, because it did not have much redundancy built in; if a few key switching centers were destroyed, a large part of the phone system would cease to function.
ARPA wanted to develop a highly redundant, decentralized data network that would not be vulnerable to nuclear attack, and they adopted Kleinrock's notion of a packet switching network. In 1968 they issued a Request for Proposals to build such a network. The contract was won by Bolt Beranek and Newman (BBN), a consulting firm in Boston.
The major players at the time, AT&T, IBM, and Control Data Corporation (CDC), did not bid on the project. AT&T saw this as a threat to their monopoly; they wanted computers to use the phone lines for communication (in spite of the overhead of establishing connections for each data transfer). IBM and CDC thought that it was impossible, or at least too expensive, to implement. Others simply did not like the idea of computers sharing data.
BBN designed and built the first computer network based on packet switching. The first site was UCLA.
Here is how Bob Cringely describes the first network communication.
A month later, the second IMP was installed at the Stanford Research Institute (SRI). They had a SDS-940 mainframe computer connected to the IMP, so a different interface was written by the graduate students at Stanford. When it was working they were ready to test the first connection in the ARPANET, so they got on the phone with UCLA and coordinated the login. "Did you get the L?" Charlie Klein, an undergraduate at UCLA, asked. "Yes," came the answer from Stanford. "Did you get the O?" asked UCLA. "Yes," answered Stanford. When Klein typed 'G' another first occurred - the network crashed.
There were originally four sites: UCLA, SRI, UC Santa Barbara, and the University of Utah. Other sites were added later. By 1980 there were about 200 sites. At that time what is now the Internet was called the ARPAnet. The term Internet as a network of networks was coined by Vinton Cerf and Robert Kahn in 1974. The original routers were called Interface Message Processors (IMPs). Initially each host had its own IMP, since an individual computer could not run the protocols of the time.
The initial networking protocol was the Network Control Protocol (NCP), but by the late 1970s the major protocols, IP, TCP, and UDP, had been developed. TCP/IP received a big boost when it was incorporated into Berkeley Unix. This was a widely distributed operating system developed primarily by Bill Joy, one of the founders of Sun Microsystems. Berkeley Unix was also the first operating system to use the now standard socket interface.
On January 1, 1983 TCP/IP became the standard; all hosts and routers on the Internet had to use it.
The original uses of the Internet were remote login through the rlogin and telnet protocols, and file transfer through the ftp protocol.
In 1972, email, the first Internet killer app, was developed by Ray Tomlinson (RPI '63), also at BBN. He is credited with choosing the @ sign to delimit the user name from the network name.
The early inklings of the web were developed during the 1980s. A protocol called gopher was developed at the University of Minnesota. What we now know as the World Wide Web was invented at CERN by Tim Berners-Lee in 1989. He developed HTTP and HTML, and wrote the first web server and web browser. In 1992 there were about 200 web servers deployed.
In 1994 Marc Andreessen and Jim Clark formed Mosaic Communications, which later became Netscape. The Mosaic browser was distributed freely, and this kicked off the WWW revolution. In 1996 Microsoft developed their own browser, Internet Explorer, and bundled it with their operating systems. This triggered the infamous browser wars and a major court fight. By the time it was over, the issue was moot because Netscape was out of business.
The Internet is one of the most successful technologies in history. Other revolutionary technologies, such as the printing press, the telephone, or the automobile, took much longer to become widely established than the Internet.
Any new computer technology needs a killer application, an application which is so vital that everyone has to use it. Spreadsheets and word processing were the killer apps of the personal computer. There were four killer apps which led to the current ubiquity of the Internet.
However, the Internet is beginning to show its age. By far the most serious problem with the current technology is security. The web was developed to be used by academic researchers, and security and privacy issues were ignored in the first generation protocols. Security is now a major concern of course, but measures to address it have to be added on to insecure protocols.
A second issue is Quality of Service (QoS). The Internet was not developed to serve real time applications. If a file download or an email is delayed for a second in the network, no one will know or care. But if a video is being downloaded and viewed in real time, or if two people are using the Internet to talk to each other on the phone, a one second delay is fatal. There needs to be a mechanism by which routers can give priority to such real time applications. Although the IP header has a quality of service field, so this is theoretically possible, this field is ignored by most Internet routers.
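One possible mechanism for giving priority to real time traffic is a priority queue inside the router: packets marked as real time are forwarded before best-effort packets, regardless of arrival order. The sketch below is only an illustration of that idea; the two traffic classes and the class name `PriorityScheduler` are hypothetical and do not describe how any particular router works.

```python
import heapq
from itertools import count

REAL_TIME, BEST_EFFORT = 0, 1      # hypothetical classes; lower number is served first

class PriorityScheduler:
    """Toy output queue that always forwards real time packets first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()    # tie-breaker: first come, first served within a class

    def enqueue(self, packet: str, traffic_class: int) -> None:
        heapq.heappush(self._heap, (traffic_class, next(self._arrival), packet))

    def dequeue(self):
        """Return the next packet to forward, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None

queue = PriorityScheduler()
queue.enqueue("email part 1", BEST_EFFORT)
queue.enqueue("voice sample 1", REAL_TIME)
queue.enqueue("email part 2", BEST_EFFORT)
queue.enqueue("voice sample 2", REAL_TIME)
forwarding_order = [queue.dequeue() for _ in range(4)]
print(forwarding_order)
```

A scheme like this only helps if routers along the whole path honor the marking, which is the practical obstacle the paragraph above describes.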
Most observers agree that real time applications like Voice over IP (VoIP or Internet Telephone) or streaming video will be the next killer app for the Internet, but this can only happen if the QoS problem is addressed.
There are several ongoing political issues around the Internet. The Internet Corporation for Assigned Names and Numbers (ICANN), the nonprofit Internet naming service, is dominated by the United States, and other countries resent this. The European Union has called for an International governing body to assume this function.
The Internet has always been free. This has rankled some ISPs, particularly those that also provide telephone service. Madison River, a small telecommunications company that also provided Internet service, blocked its Internet users from using VoIP. They were quickly ordered to cease this practice by the FCC, although it was not clear that they were breaking any laws, but there are hints that such practices may re-occur.
Introduction to Internet Architecture and Institutions
The Wikipedia description of the OSI model
Pioneers of the Internet: How Rensselaer Alumni helped change the way the world communicates