The traditional networking model is client-server. A particular host is either a server, waiting for connections, or a client, which connects to a server, generally to download documents or other information.
In the past few years, a new model of network services has developed, called Peer-to-Peer (P2P), in which every host is both a client and a server. A particular host may be a client at one point in time, connecting with another host to download information, and a moment later it may be a server, dispensing information (perhaps the same information that it just obtained) to another host.
By far the most commonly used and most widely publicized use of P2P is with file sharing, typically music and video files. Currently such programs use more Internet bandwidth than any other application, including the Web. These protocols were generally developed by individual entrepreneurs, and implementations were widely distributed as freeware. The original P2P file sharing system was Napster, but it was followed by a number of others, including Gnutella, Kazaa, and BitTorrent. Each of these uses a different model.
There are two problems with a client-server model. First, the server is potentially a bottleneck and a failure point. When the server is down or off the network, no one can get service. Second, if the server is dispensing potentially illegal information, it is easy for the authorities to shut it down.
However, P2P has its own problems to overcome. The biggest is locating content. How does a user who wants to download a particular file locate another user who has the file available. This is complicated by the fact that since individual hosts are simply the personal computers of ordinary people, they are often shut down; and if their ISP uses DHCP, their IP address may change.
Napster
Napster was the first widely used P2P application for file sharing. Napster solved the problem of locating content by running large centralized server farms. Whenever a peer running Napster connected to the Internet, the computer would contact the server with a list of files that it was able to share. The central server would store the files and IP addresses. The central servers would periodically send out an "Are you there?" query to each peer to make sure that they were still running and still connected to the Internet.
Napster was set up primarily to distribute MP3s. When a peer wanted to locate a particular song, the peer would contact the server, find out the IP address of a server that had it, and send an HTTP GET request to that peer. That peer would in turn send the MP3 file to the requester.
Note that to a purist, this is not a true P2P application, since it relied on centralized servers. As such, the problems with traditional client server protocols were still there; the servers represented a failure point; they could be a bottleneck; and they could be shut down by the authorities. However, most of the bandwidth is used in distributing the large MP3 files, so in that sense Napster was able to avoid network congestion and bottlenecks, by widely distributing the actual file distribution.
The fact that Napster used centralized servers was its downfall. Because most of the content advertised by Napster was copyrighted material, they were shut down by the justice department. Napster has since been reincarnated as a music distribution system in which users pay, and so they have avoided their legal problems.
Gnutella
The next version of P2P file sharing was Gnutella, originally developed by Justin Frankel. The original Gnutella used a pure P2P model with no centralization at all. The freely downloaded Gnutella client had to know about the existence of at least one other Gnutella client. When a Gnutella client was booted, it would contact one or more other Gnutella clients to advertise its existence and the files that it was offering. Each of these clients in turn knew about the existence of other client.
When a particular peer A wanted to download a particular song S, it would send a query to each of the other Gnutella peers to whom it was directly connected. If one of them had that song available, it would reply. Otherwise, each of these would contact each of its neighbors with a message "Peer A is looking for song S". If one of the second level neighbors had song S, it would reply to A with a message saying "I have song S". At that point A would send an HTTP GET request to the peer that had the song. If none of the neighbors of the neighbors had song S available, they would send a query to each of its neighbors, and so on. This is called Query Flooding
In its simple form, as described above, query flooding would quickly flood the entire network with queries. If you think of the network as a graph, and each node has, say, 10 neighbors, there are roughly 10,000 queries being sent out by the fourth round. If there was no mechanism to squelch queries, the network would quickly be overwhelmed. To solve this, each query had a Time To Live (TTL) field, set by the initial requester, and which was decremented each time it was forwarded. When the TTL field of a query got to zero, the query would no longer be forwarded. This had the effect of limiting traffic, but it also increased the probability that a song would not be found, particularly if it was obscure.
Gnutella also had a bootstrap problem. When a new Gnutella peer joined the Internet, it needed some way of locating other peers. Gnutella maintained sites that would provide a list of peers that would be likely to be up and running. The new peer would then send a ping message to each of these. If a peer received such a ping message it would reply with some information, generally including the number of files it had available, its IP address, and the list of all of its direct neighbors. This reply is called a Pong. The initial peer could then send a Ping to each of the neighbors. Very quickly, it would know about many Gnutella peers and could connect to as many of these as it chose.
Gnutella is an open standard, and so a number of freeware clients are available, for a variety of platforms, each with its own user interface.
Because there is no central server, there is no way that the Gnutella network can easily be shut down, or even monitored. As a result, hosts using the Gnutella protocol are in widespread use.
An anonymous implementation of the pure P2P model called Freenet is under development. In Freenet, everything is encrypted so that even an individual peer does not know what files he/she is sharing, and when a peer downloads a file, he/she does not necessarily know where it is coming from. Files are replicated and continually moving around. It uses key based routing.
It is similar to Gnutella in that a peer initiates a search and sends a query to its neighbors. If its neighbors cannot satisfy the request, they query each of their neighbors, etc until either the file is found or TTL is exceeded. Peers can insert files into the net anonymously as well.
Kazaa
The problem with Gnutella is that, even with the TTL field, query flooding generates a huge volume of network traffic, but in spite of this, many queries fail, even though there are hosts with that file available. A newer protocol, Kazaa, is an attempt to be the best of both Gnutella and Napster. Kazaa's software is proprietary.
Kazaa does not use dedicated server farms, but not all peers are equal. Some peers, presumably the ones with more powerful Computers and faster Internet connections, are designated as group leaders. Ordinary peers connect to a group leader. A typical group leader may have several hundred such children. The group leader maintains a database of all of the files available to each of its children.
Group leaders also maintain connections to other group leaders, and thus they form a graph. When an ordinary peer wants to download a particular song, it sends a query to its group leader. If one of the children of the group leader has that song available, it sends a reply back to the requester, and, like the above two protocols, the requester sends an HTTP GET request directly to the peer which has that file.
This structure solves many of the problems of both a fully centralized database like Napster and a fully distributed system like Gnutella. Because only the group leaders propagate queries, there is far less network traffic; but despite this, a query is more likely to be successful than on Gnutella. On the other hand, there are no central servers; the group leaders are simply ordinary peers that have volunteered to take on group leader responsibilities.
Kazaa also has a mechanism for distributed hash tables, so that the availability of files can be widely distributed. KaZaA is now the most widely used P2P file sharing system on the Internet.
A newer protocol, called Free Net, still under development, focuses primarily on anonymity at the expense of performance.
BitTorrent
Napster, Gnutella, KaZaA, and their clones have been wildly successful for the sharing of MP3s. A new trend is the sharing of Movies; this introduces new technical problems (new legal issues too, since many of the movies available on the web have been pirated, but that is for a different course). The typical MP3 is 3-8 MB while a full length movie is might be 1000MB. Downloads would be slow, even on a fast machine.
BitTorrent is a protocol developed by Bram Cohen to facilitate the sharing of very large files (i.e videos). Suppose someone has a video to share, and ten peers would like to download it. The server would be a major bottleneck. With any of the above protocols other than BitTorrent, the server would send the file first to one peer, then to another, and this would be very slow. To solve this problem, the BitTorrent node breaks the file down into many small pieces (typically each piece is about 250KB). Peer1 might get chunk 32, Peer2 might get chunk 20, Peer3 might get chunk 14, Peer 4 might get chunk 41 etc. After the first round, each peer becomes both a downloader and an uploader. Peer 1 could get chunk 41 directly from Peer 4, Peer 2 could get chunk 32 from Peer 1, etc. At the same time, the server would continue to distribute new pieces to the peers. This is a dramatically faster way of getting large files out to many peers.
This process continues until each peer has the complete file.
The many different peers which are downloading (and uploading) the pieces of the file more or less simultaneously is called a swarm, and the original distributor of the file is called the seed.
Paradoxically, the more popular a file is, the faster each peer can download it because there are more peers in the swarm, and each has some pieces of the file to share.
BitTorrent gives the best download performance to the people who upload the most, a property known as "leech resistance", since it discourages "leeches" from trying to download the file without uploading it to anyone.
There is still the problem of locating files. BitTorrent maintains central servers (called trackers) which coordinate the action of the peers. A tracker only manages connections, it does not have any knowledge of the contents of the files being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth.
To share a file using BitTorrent, a user creates a .torrent file, a small "pointer" file that contains:
BitTorrent is open source; there are a number of implementations. There are a number of BitTorrent search engines on the web.
Grid Computing
Grid computing is a variant of P2P. Instead of sharing files, what is shared is computing power. It allows many volunteer computers all over the planet to share in solving extremely computationally intensive problems.
The original project was SETI@home. In this project, volunteers kept their computers on the network, and when they were not in use, they did computation to look for evidence of extra terrestial life.
Other example inlcude protein folding, financial modeling, earthquake and climate simulation, and decryption.
Required Reading
Here is the Wikipedia blurb on Kazaa. It focuses more on Kazaa's legal troubles than on its technology, but makes good reading.
There used to be a very readable description of the bit torrent protocol on the web, but it seems to have disappeared, so you can read the Wikipedia description, which is the best that I can find