Network Programming, Class 10
The Domain Name System (DNS), email

Name Servers

The Domain Name System is a mapping between hostnames and IP addresses (used by gethostbyname)

A hostname can be a simple name or a fully qualified domain name (mary-kate, or mary-kate.cs.rpi.edu)

There are two ways to identify a host: by hostname and by IP address.

Essentially all application layer protocols rely on DNS to translate the first into the second.

When the internet was first created, a hosts file was maintained centrally and distributed to everyone on the internet, but this approach did not scale, and required a lot of human intervention.

Every client knows the IP address of at least one name server.

Many DNS servers run BIND (Berkeley Internet Name Domain). DNS servers listen on port 53, normally over UDP.

The function gethostbyname sends a request to the local name server and gets a record back.

Caching - Name servers cache names (this is called nonauthoritative name binding)

Recall that gethostbyname() returns a pointer to a struct hostent.

struct hostent {
    char  *h_name;       /* official (canonical) name of host */
    char **h_aliases;    /* NULL-terminated list of alias names */
    int    h_addrtype;   /* host address type (AF_INET) */
    int    h_length;     /* length of each address in bytes (4) */
    char **h_addr_list;  /* NULL-terminated list of IP addresses */
};

Every host has an official name, the canonical name, and may have one or more aliases.

Some names can have more than one IP address. For example, www.google.com or www.yahoo.com have multiple IP addresses because if all queries to google or yahoo went to the same address, it could not handle the load. If gethostbyname is called several times for such a host, it will return a complete list of IP addresses each time, but the order will be different, on the assumption that most requestors will start with the first address on the list.

Here is a short program that displays a canonical name and all aliases for its argument.

When this program is run with the argument yahoo, it gets

h_name is www.yahoo.akadns.net
Alias: www.yahoo.com
IPv4 addr 216.109.118.76
IPv4 addr 216.109.118.79
IPv4 addr 216.109.117.204
IPv4 addr 216.109.117.205
IPv4 addr 216.109.118.65
IPv4 addr 216.109.118.70
IPv4 addr 216.109.118.74
IPv4 addr 216.109.118.75
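
A program of this kind could be sketched as follows. This is a minimal sketch, not the original program: the function name print_hostent is made up for illustration, and it assumes a POSIX system that provides inet_ntop. It only walks the two NULL-terminated lists inside a struct hostent and prints them in the format shown above.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: write the canonical name, aliases, and IPv4
 * addresses of h to out. Returns the number of addresses written. */
int print_hostent(FILE *out, const struct hostent *h)
{
    char text[INET_ADDRSTRLEN];
    int count = 0;

    assert(out != NULL && h != NULL);
    fprintf(out, "h_name is %s\n", h->h_name);

    /* h_aliases is a NULL-terminated array of strings. */
    for (char **alias = h->h_aliases; alias != NULL && *alias != NULL; alias++)
        fprintf(out, "Alias: %s\n", *alias);

    /* h_addr_list is a NULL-terminated array of binary addresses. */
    if (h->h_addrtype == AF_INET) {
        for (char **addr = h->h_addr_list; addr != NULL && *addr != NULL; addr++) {
            if (inet_ntop(AF_INET, *addr, text, sizeof text) != NULL) {
                fprintf(out, "IPv4 addr %s\n", text);
                count++;
            }
        }
    }
    return count;
}
```

A main function would call gethostbyname(argv[1]), check the result for NULL, and pass it to print_hostent(stdout, h).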

There is also a function

struct hostent * gethostbyaddr(char *addr, socklen_t len, int family);

The first argument is not really a char *; it is a pointer to a struct in_addr (for AF_INET addresses).

The .NET help page for gethostbyname says that this function has been deprecated, and that you should use getaddrinfo().
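
For comparison, here is a rough sketch of the same kind of lookup using getaddrinfo(). The function name first_ipv4 is hypothetical, and the sketch assumes we only want the first IPv4 address; getaddrinfo() itself can also return IPv6 addresses if the hints allow them.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host to its first IPv4 address, written
 * as dotted decimal into buf. Returns 0 on success, or a getaddrinfo()
 * error code that can be passed to gai_strerror(). */
int first_ipv4(const char *host, char *buf, socklen_t len)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && buf != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to mirror gethostbyname */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* The results form a linked list; this sketch uses only the head. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, buf, len) == NULL)
        rc = EAI_SYSTEM;

    freeaddrinfo(res);  /* the caller owns the list and must free it */
    return rc;
}
```

Unlike gethostbyname(), which returns a pointer to static storage, getaddrinfo() allocates its result, so the list has to be released with freeaddrinfo().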

Hierarchical name space

A fully qualified domain name is broken down into fields separated by dots. Consider this:
mary-kate.cs.rpi.edu.

mary-kate is the name of an individual host; all the other fields are domains.

You can think of names as a tree with a root. The root node has a number of top level domains (.edu, .com, .us, .net, .mil, .cn, .fr). There are more than 250 top level domains (including the recently approved .xxx domain for pornography). Each of these TLDs has a number of children. The .edu domain has children with names like .rpi, .cmu, .mit, .hvcc, .williams. Each of these has its own children. The .rpi server has children such as .cs, .pde, .sis or .poly. These could have their own children. There can be as many as 127 levels, but in practice, a name rarely has more than about five levels.

The Domain Name System is a distributed database. There is no one single point of failure, and there is no one node that knows all of the names.

Suppose a program calls gethostbyname with the argument mary-kate.cs.rpi.edu. This request is sent to the local name server. Theoretically, the local name server contacts a root name server. There are 13 root name servers, with the names A through M. You can see a listing of them at the root server organization home page. Most of these are run by organizations based in the US, but the majority of root name server sites are not in the US. In fact, the exact location of many root name servers is kept secret for security reasons.

The root name server returns the IP address of one or more name servers for the top level domain; in our example, these are the .edu name servers. The local name server contacts one of these, which returns the IP address of a server for the rpi domain. The local name server contacts this server, which returns the IP address of a name server that knows about the cs domain. The local name server contacts this server, which returns the IP address of mary-kate.
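
The order in which the domains are consulted can be sketched in a few lines of C. This is only an illustration and does no network I/O: the function name zone_suffixes is made up, and it simply lists the suffixes of a name from the top level domain downward, which is the order in which the walk described above moves through the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 127  /* upper bound on levels in a name */

/* Hypothetical helper: fill out[] with pointers to the suffixes of name,
 * shortest first ("edu", then "rpi.edu", ...). The pointers refer into
 * name itself. Returns the number of entries written (at most max). */
size_t zone_suffixes(const char *name, const char **out, size_t max)
{
    const char *starts[MAX_LABELS];
    size_t n = 0, written = 0;

    assert(name != NULL && out != NULL);

    /* Record where each non-empty label begins, left to right. */
    for (const char *p = name; *p != '\0' && n < MAX_LABELS; ) {
        if (*p != '.')
            starts[n++] = p;
        const char *dot = strchr(p, '.');
        if (dot == NULL)
            break;
        p = dot + 1;
    }

    /* Emit them right to left, so the top level domain comes first. */
    while (n > 0 && written < max)
        out[written++] = starts[--n];
    return written;
}
```

For mary-kate.cs.rpi.edu this yields edu, rpi.edu, cs.rpi.edu, and finally the full name, matching the sequence of name servers contacted in the example.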

Names are delegated. Each organization has to get permission for a new domain from the level above. A new college that wanted a .edu domain would have to get the permission of EDUCAUSE, the non-profit organization that handles .edu domain registration.

A company called Network Solutions was originally in charge of maintaining the .com name list; the .com registry is now operated by VeriSign. When you register a domain name, it goes through one of several dozen registrars who work with the registry to add names to the list. The registry, in turn, keeps a central database known as the whois database that contains information about the owner and name servers for each domain. If you go to the whois form, you can find information about any domain currently in existence.

Each of these name servers is actually a cluster of replicated servers. For example, the F root server actually has 37 replicated sites all over the world. Each site has to know the IP address of every name server in its domain (a .edu name server has to know the IP address of the .rpi name server, the .mit name server etc.). Even relatively low level name servers like the rpi name server are replicated, so that if one goes down, hosts at rpi can still be accessed by the Internet.

Every organization with publicly accessible hosts must provide publicly accessible DNS records that map the names of their hosts to IP addresses.

When I described how this system worked, I used the word "theoretically". In fact, all levels do extensive caching. If I want to send a request to Google from my laptop, the browser sends a request to the local name server. If another host on the same network has recently made a request to Google, Google's IP address would be cached on the local name server and it would return the value immediately. Similarly, if someone else had recently requested the IP address of a host in the .edu domain, the local name server would have cached the address of a name server for .edu, and so it would not need to send a query to a root name server to get this.

This means that changes in the database take a while to reach every client, and in the interim, caches can return wrong answers. ("Propagation" is not quite the right term; unlike routing updates, changes are not pushed out to other servers. Stale entries simply remain in caches until they expire.)

This introduces the concept of an authoritative reply vs a nonauthoritative reply. Each domain has one or more authoritative name servers, which hold the master records for that domain; a reply that comes directly from one of them is authoritative. A reply from some other name server that had cached the name is not authoritative. RPI operates several authoritative name servers for rpi names. The data is replicated across all of them and is supposed to be identical.

Typically an application might maintain its own cache. IE caches DNS records for 30 minutes. This means that if I make two requests to google within 30 minutes, IE does not need to even contact the local name server.

Each record has a time to live (TTL) field associated with it, which says how long the record may be kept in a cache before it must be removed to prevent it from becoming stale. This is typically set by the authoritative name server.
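
The TTL check itself is simple. The sketch below shows one way a cache might decide whether an entry is still usable; the struct layout and the name cache_entry_is_fresh are assumptions made for illustration, not how any particular name server stores its cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry: one cached answer plus its expiry data. */
struct cache_entry {
    const char *name;     /* hostname that was looked up */
    const char *address;  /* cached answer, as text */
    time_t fetched_at;    /* when the answer was received */
    long ttl;             /* time to live in seconds, taken from the reply */
};

/* An entry may be served from the cache only while its TTL has not run out. */
bool cache_entry_is_fresh(const struct cache_entry *e, time_t now)
{
    assert(e != NULL);
    return difftime(now, e->fetched_at) < (double)e->ttl;
}
```

An entry with a TTL of 7200, for example, can be reused for two hours after it was fetched; after that the cache has to ask again.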

There are two ways that requests can be done, iteratively or recursively. In an iterative request, the root level name server sends the address of the top level domain name server back to the requestor, and the requestor (the local name server) sends a query to the TLD name server. This sends the address of the next level name server back to the requestor, who sends a query to the next level name server.

In a recursive request, the root level name server contacts the top level name server directly, the top level name server contacts the next level and so on. Each level passes the answer up the tree and the root returns the answer to the local requestor.

An iterative request generates more network traffic, but puts less burden on the name servers.

DNS records and messages

DNS servers store resource records (RRs), each with four fields (a four-tuple):

(Name, Value, Type, TTL)

The meaning of Name and Value depends on Type.

If Type=A, Name is a hostname and Value is its IPv4 address.

If Type=NS, Name is a domain and Value is the hostname of an authoritative name server for that domain.

If Type=CNAME, Name is an alias hostname and Value is the canonical hostname for it.

For example, the record
(foo.com, relay1.bar.foo.com, CNAME, ...)
signifies that there is a host with the canonical name relay1.bar.foo.com, and that foo.com is an alias for it.

If Type=MX, Value is the canonical name of a mail server for the domain Name.

(foo.com, mail.bar.foo.com, MX, ...)

MX records also have priorities associated with them, so that one organization can run multiple mail servers.

If Type=AAAA, Value is a 128 bit IPv6 address.

If Type=PTR, the record maps an IP address to a hostname, the reverse of an A record.

If Type=SOA (Start of Authority), the record names the DNS server that provides authoritative information about an Internet domain.

If Type=TXT, the record allows the admin to insert arbitrary text.
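
The four-tuple and the record types above could be represented in C roughly as follows. The struct and function names are invented for this sketch; the numeric codes are the type values assigned to these record types in the DNS protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one resource record (the four-tuple). */
struct resource_record {
    const char *name;
    const char *value;
    const char *type;   /* mnemonic such as "A" or "MX" */
    unsigned long ttl;  /* seconds */
};

/* How to interpret Value for each record type discussed above. */
struct rr_type_info {
    int code;                   /* numeric type code used on the wire */
    const char *mnemonic;
    const char *value_meaning;
};

static const struct rr_type_info rr_types[] = {
    {  1, "A",     "IPv4 address of the host" },
    {  2, "NS",    "hostname of an authoritative name server" },
    {  5, "CNAME", "canonical hostname for the alias" },
    {  6, "SOA",   "start of authority for the domain" },
    { 12, "PTR",   "hostname for an IP address" },
    { 15, "MX",    "canonical name of a mail server" },
    { 16, "TXT",   "arbitrary text" },
    { 28, "AAAA",  "128 bit IPv6 address of the host" },
};

/* Look up a type by mnemonic; returns NULL if it is not in the table. */
const struct rr_type_info *rr_type_lookup(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof rr_types / sizeof rr_types[0]; i++)
        if (strcmp(rr_types[i].mnemonic, mnemonic) == 0)
            return &rr_types[i];
    return NULL;
}
```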

DNS messages

There are only two types of DNS messages, queries and replies, and they have the same format.

There is a 12 byte header section, made up of six 16 bit fields, followed by four variable length sections.

Identification          | Parameter
Number of Questions     | Number of Answers
Number of Authority     | Number of Additional
Question Section
Answer Section
Authority Section
Additional Info Section

The Question section contains the queries for which answers are desired.

The Answer section contains the answers to those queries, as resource records. A reply can carry multiple resource records, since there can be multiple answers to a query.

The Authority section contains resource records that identify authoritative name servers.

The Additional Info section contains other helpful resource records, such as the addresses of the name servers listed in the Authority section.
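
Since the header is just six 16 bit fields in network byte order, decoding it is straightforward. The following is a sketch; the field names (qdcount, ancount, and so on) follow the usual RFC 1035 naming rather than the labels used in these notes, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dns_header {
    uint16_t id;       /* Identification */
    uint16_t flags;    /* Parameter */
    uint16_t qdcount;  /* Number of Questions */
    uint16_t ancount;  /* Number of Answers */
    uint16_t nscount;  /* Number of Authority */
    uint16_t arcount;  /* Number of Additional */
};

/* Read one big-endian (network order) 16 bit value. */
static uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the 12 byte header at the start of msg.
 * Returns 0 on success, -1 if the message is too short. */
int dns_parse_header(const unsigned char *msg, size_t len, struct dns_header *h)
{
    assert(msg != NULL && h != NULL);
    if (len < 12)
        return -1;
    h->id      = read_u16(msg);
    h->flags   = read_u16(msg + 2);
    h->qdcount = read_u16(msg + 4);
    h->ancount = read_u16(msg + 6);
    h->nscount = read_u16(msg + 8);
    h->arcount = read_u16(msg + 10);
    return 0;
}

/* The top bit of the Parameter field distinguishes replies from queries. */
int dns_is_reply(const struct dns_header *h)
{
    return (h->flags & 0x8000) != 0;
}
```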

There is a utility called dig (the domain information groper) on FreeBSD which can be used to trace DNS queries. Here is what it returned when I entered dig +trace +norecurse www.on.lt (This is a Lithuanian address; I chose it because it was not likely to be in any local cache.) The local name server supplied the list of root servers. There was then a query to a root server, a second query to a name server for the .lt domain, and a third query to a name server for on.lt, which returned the actual address.


 <<>> DiG 9.3.0 <<>> +trace +norecurse www.on.lt
;; global options:  printcmd
.                       8843    IN      NS      F.ROOT-SERVERS.NET.
.                       8843    IN      NS      G.ROOT-SERVERS.NET.
.                       8843    IN      NS      H.ROOT-SERVERS.NET.
.                       8843    IN      NS      I.ROOT-SERVERS.NET.
.                       8843    IN      NS      J.ROOT-SERVERS.NET.
.                       8843    IN      NS      K.ROOT-SERVERS.NET.
.                       8843    IN      NS      L.ROOT-SERVERS.NET.
.                       8843    IN      NS      M.ROOT-SERVERS.NET.
.                       8843    IN      NS      A.ROOT-SERVERS.NET.
.                       8843    IN      NS      B.ROOT-SERVERS.NET.
.                       8843    IN      NS      C.ROOT-SERVERS.NET.
.                       8843    IN      NS      D.ROOT-SERVERS.NET.
.                       8843    IN      NS      E.ROOT-SERVERS.NET.
;; Received 292 bytes from 128.213.1.1#53(128.213.1.1) in 3 ms

lt.                     172800  IN      NS      HYDRA.HELSINKI.FI.
lt.                     172800  IN      NS      NS-LT.RIPE.NET.
lt.                     172800  IN      NS      SUNIC.SUNET.SE.
lt.                     172800  IN      NS      NEMUNAS.SC-UNI.KTU.lt.
lt.                     172800  IN      NS      NN.UNINETT.NO.
lt.                     172800  IN      NS      NS.UU.NET.
lt.                     172800  IN      NS      NS3.OMNITEL.NET.
;; Received 332 bytes from 192.5.5.241#53(F.ROOT-SERVERS.NET) in 80 ms

on.lt.                  7200    IN      NS      ns2.zoneedit.com.
on.lt.                  7200    IN      NS      ns3.zoneedit.com.
on.lt.                  7200    IN      NS      nemunas.sc-uni.ktu.lt.
;; Received 124 bytes from 128.214.4.29#53(HYDRA.HELSINKI.FI) in 109 ms

www.on.lt.              7200    IN      A       212.59.2.76
on.lt.                  7200    IN      NS      ns2.zoneedit.com.
on.lt.                  7200    IN      NS      ns3.zoneedit.com.
on.lt.                  7200    IN      NS      nemunas.sc-uni.ktu.lt.
;; Received 124 bytes from 69.72.158.226#53(ns2.zoneedit.com) in 18 ms

Security Issues

Cache poisoning: Redirecting a query to a site other than the requested site by sending erroneous DNS responses. This can be done by taking over a name server, or beating a legitimate name server in replying to a query.

Cybersquatting: buying names that you have no legitimate right to and then selling them at exorbitant prices to the legitimate businesses. Domain name disputes are typically resolved using the Uniform Domain-Name Dispute-Resolution Policy (UDRP) process developed by the Internet Corporation for Assigned Names and Numbers (ICANN). Critics claim that the UDRP process favors large corporations and that its decisions often go beyond the rules and intent of the dispute resolution policy.

A domain hack is an unconventional domain name that uses parts other than the SLD (second level domain) or third level domain to create the full title of the domain name. An example is examp.le, where "examp" is the SLD and "le" is the TLD (however this is an impossible example as "le" is not an existing TLD). inter.net was the first. Other examples include http://del.icio.us/

The National Academy of Sciences did a study of DNS, Signposts in Cyberspace: The Domain Name System and Internet Navigation (2005). Here are some conclusions.

Overall, the DNS technical system and institutional framework have performed reliably and effectively during the two decades of the DNS's existence. However, the continued successful operation of the DNS is not assured; many forces, driven by a variety of factors, are challenging the DNS's future.

Required Reading

The Internet Society has an informative and amusing FAQ on root name servers. Read it

Here is a short explanation of DNS poisoning

Here is an article from the Sept 30 NY Times about ICANN politics

email

There are several kinds of email software. Individual users use a User Agent (UA), or mail reader, such as Eudora, Pine, Outlook, or Netscape Messenger.

These connect with a Mail Transfer Agent (MTA). An MTA server has a mailbox for each user. When a user agent connects to the MTA, it authenticates the user and then sends the mail.

The mail server connects over the internet with other mail servers using SMTP (Simple Mail Transfer Protocol). Note that the mail server is both a client and a server. It has to have a mechanism for temporarily storing mail for unreachable servers (it retries every 30 minutes or so for several days).

Here is RFC 821 SMTP (August, 1982)

At the time, it did not occur to anyone that email would be used for anything other than English text, so SMTP only understands 7 bit ASCII. Other files, such as Word documents or images, have to be encoded.

One such mechanism for converting binary data to text is base64 encoding. Each 24 bit (3 byte) chunk of binary data is split into four 6-bit values, and each value is written as one of the 64 characters A-Z, a-z, 0-9, +, /. This means that the resulting file is larger by a factor of 4:3.
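
The conversion can be sketched in C as follows. The function name base64_encode is chosen for illustration. Two details go beyond the description above: when the input length is not a multiple of 3, the last group is padded with '=' characters, and this sketch does not break the output into lines the way a mail program does.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from in as NUL-terminated base64 text in out.
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    assert(out != NULL && (in != NULL || len == 0));
    for (size_t i = 0; i < len; i += 3) {
        size_t remaining = len - i;

        /* Pack up to 3 bytes into one 24 bit chunk. */
        unsigned long chunk = (unsigned long)in[i] << 16;
        if (remaining > 1)
            chunk |= (unsigned long)in[i + 1] << 8;
        if (remaining > 2)
            chunk |= in[i + 2];

        /* Emit four 6-bit values; pad with '=' where input ran out. */
        out[o++] = b64_alphabet[(chunk >> 18) & 0x3F];
        out[o++] = b64_alphabet[(chunk >> 12) & 0x3F];
        out[o++] = remaining > 1 ? b64_alphabet[(chunk >> 6) & 0x3F] : '=';
        out[o++] = remaining > 2 ? b64_alphabet[chunk & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

As a check, the three bytes FF D8 FF, which begin a JPEG file, encode to /9j/, the same four characters that open the encoded attachment shown in these notes.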

------=_NextPart_000_7103_2e33_71fc
Content-Type: image/jpeg; name="P9250077.JPG"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="P9250077.JPG"

/9j/4AAQSkZJRgABAAEBOgE6AAD//gAfTEVBRCBUZWNobm9sb2dpZXMgSW5j
LiBWMS4wMQD/2wCEAAwICQsJCAwLCgsODQwPEyAUExEREyccHRcgLigwMC0o
LCwzOUk+MzZFNywsQFdARUxOUlNSMT1aYFlQYElQUk8BDQ4OExATJRQUJU80

Here are some SMTP commands

HELO - Identify the SMTP sender to the SMTP receiver. (RFC 2821 prefers EHLO, but servers must still accept HELO.)

EHLO - Identify the SMTP sender to the SMTP receiver under Extended SMTP.

MAIL - Set the envelope return path (sender) and clear the list of envelope recipient addresses.

RCPT - Add one address to the list of envelope recipient addresses.

DATA - Consider the lines following the command to be e-mail from the sender.

RSET - Reset the envelope.

NOOP - Ask the receiver to send a valid reply (but specify no other action).

QUIT - Ask the receiver to send a valid reply, and then close the transmission channel.

It is possible to use telnet to connect to an SMTP server and enter some SMTP commands. Here is such a session.

freebsd4.8 >telnet cliffclavin.cs.rpi.edu 25
Trying 128.213.1.9...
Connected to cliffclavin.cs.rpi.edu (128.213.1.9).
Escape character is '^]'.
220 cliffclavin.cs.rpi.edu ESMTP Sendmail 8.12.10/8.12.10; Mon, 3 Oct 2005 09:04:27 -0400 (EDT)
HELO cs.rpi.edu
250 cliffclavin.cs.rpi.edu Hello ingallsr@ashley.cs.rpi.edu [128.213.7.3], pleased to meet you
MAIL FROM:haydent@cs.rpi.edu
250 2.1.0 haydent@cs.rpi.edu... Sender ok
RCPT TO: ingalr@rpi.edu
250 2.1.5 ingalr@rpi.edu... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
Subject: This is a test

No message

.
250 2.0.0 j93D4RaA036490 Message accepted for delivery
QUIT
221 2.0.0 cliffclavin.cs.rpi.edu closing connection
Connection closed by foreign host.

When a User Agent wants to send an email, it can use SMTP to connect to the MTA, and MTAs use SMTP to connect to other MTAs when they want to transfer an email message from a sender to a receiver.

Blocking Spam

Here are some rules for blocking spam.

If mail comes from an IP address that has no domain name attached to it, just a number, then don't accept the mail. All legitimate ISP mail servers have domain names attached.

The simplest and by far most widely deployed authentication scheme begins with a reverse DNS lookup of the connecting IP. If there is no answer, it's a safe bet that the IP is not a legitimate sender. If there is an answer, a forward DNS lookup of that answer authenticates the sender if it returns the connecting IP. In other words, we look up the name of the connecting IP, and look up the IP of that name, and they must match.
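
The reverse-then-forward check just described could be written with the standard resolver calls roughly like this. It is a sketch under several assumptions: IPv4 only, blocking lookups, and a made-up function name (forward_confirmed). A real mail server would also need timeouts and IPv6 support.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if ip passes the check, 0 if it fails,
 * and -1 if ip is not a valid dotted decimal IPv4 address. */
int forward_confirmed(const char *ip)
{
    struct sockaddr_in sa;
    struct addrinfo hints, *res, *p;
    char name[1025];  /* large enough for any hostname */
    int match = 0;

    assert(ip != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    /* Step 1: reverse lookup. NI_NAMEREQD makes a missing name an error. */
    if (getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                    name, sizeof name, NULL, 0, NI_NAMEREQD) != 0)
        return 0;

    /* Step 2: forward lookup of the name we were given. */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return 0;

    /* Step 3: the connecting IP must be among the forward answers. */
    for (p = res; p != NULL; p = p->ai_next) {
        const struct sockaddr_in *cand = (const struct sockaddr_in *)p->ai_addr;
        if (cand->sin_addr.s_addr == sa.sin_addr.s_addr) {
            match = 1;
            break;
        }
    }
    freeaddrinfo(res);
    return match;
}
```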

If mail comes from a mail server which is listed in one of the RBLs (Realtime Blackhole Lists), then we block it with an error message that indicates this. The error will say something like "We do not accept mail from spam friendly ISP's such as China Telecom" or "We can not accept your mail as your mail server is in the published list of open relays."
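
These lists are usually consulted through DNS itself. By the common convention for DNS-based lists (not spelled out in these notes), the mail server reverses the four octets of the connecting IP address, appends the list's zone name, and looks up the resulting name; if an A record comes back, the address is listed. The sketch below only builds the name to query, and the function name dnsbl_query_name is made up.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the DNS name to look up for ipv4 in the given list zone:
 * the octets in reverse order, followed by the zone name.
 * Returns 0 on success, -1 on a bad address, empty zone, or short buffer. */
int dnsbl_query_name(const char *ipv4, const char *zone, char *out, size_t outlen)
{
    unsigned char o[4];  /* the address in network order, one octet each */
    int n;

    assert(ipv4 != NULL && zone != NULL && out != NULL);
    if (strlen(zone) == 0 || inet_pton(AF_INET, ipv4, o) != 1)
        return -1;

    n = snprintf(out, outlen, "%u.%u.%u.%u.%s",
                 (unsigned)o[3], (unsigned)o[2], (unsigned)o[1], (unsigned)o[0],
                 zone);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the address 86.107.16.2 and the zone sbl-xbl.spamhaus.org, the name to look up would be 2.16.107.86.sbl-xbl.spamhaus.org.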

Look for key words (Viagra, Mortgage)

Vipul's Razor is a distributed, collaborative, spam detection and filtering network. Through user contribution, Razor establishes a distributed and constantly updating catalogue of spam in propagation that is consulted by email clients to filter out known spam. Detection is done with statistical and randomized signatures that efficiently spot mutating spam content. User input is validated through reputation assignments based on consensus on report and revoke assertions which in turn is used for computing confidence values associated with individual signatures.

SpamAssassin is a widely used spam blocking tool, used by the Computer Science Dept. It performs a number of tests on each message and totals a score. If the score is above a certain threshold (set by the user), it is considered to be spam. Here is some output.

X-Spam-Score: 11.497 (***********) HTML_MESSAGE,MIME_HTML_ONLY,PYZOR_CHECK,RCVD_HELO_IP_MISMATCH,
                                   RCVD_IN_XBL,RCVD_NUMERIC_HELO,URIBL_SBL
X-Spam-Report: Spam Report from cliffclavin.cs.rpi.edu:
 3.2 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should
 1.3 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 2.8 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 3.1 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                                  [86.107.16.2 listed in sbl-xbl.spamhaus.org]
 1.1 URIBL_SBL              Contains an URL listed in the SBL blocklist
	                          [URIs: allsoftoem.net]
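
The scoring idea can be sketched in a few lines. This is not SpamAssassin's code; the struct and function names are invented, and the threshold passed in is whatever the user has configured. The per-rule values printed in the report above add up to 11.5, slightly different from the 11.497 in the header, presumably because the report rounds each value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one test that matched a message. */
struct spam_test_hit {
    const char *rule;  /* e.g. "RCVD_IN_XBL" */
    double score;      /* points this rule contributes */
};

/* Add up the scores of all the tests that matched. */
double total_spam_score(const struct spam_test_hit *hits, size_t n)
{
    double total = 0.0;

    assert(hits != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += hits[i].score;
    return total;
}

/* A message is considered spam when its total is above the threshold. */
bool is_spam(double total, double threshold)
{
    return total > threshold;
}
```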

Here are some other spam blocking messages

 0.6 NO_REAL_NAME           From: does not include a real name
 3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                           [58.9.68.10 listed in dnsbl.sorbs.net]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
	                  [Blocked - see ]
 3.4 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
	                  [URIs: 457416.com]
 2.5 MIME_CHARSET_FARAWAY   MIME character set indicates foreign language
 1.7 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
 0.9 DATE_IN_PAST_12_24     Date: is 12 to 24 hours before Received: date
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.8 RCVD_IN_DSBL           RBL: Received via a relay in list.dsbl.org
                            []
 0.3 RCVD_IN_NJABL_PROXY    RBL: NJABL: sender is an open proxy
			    [219.83.100.19 listed in combined.njabl.org]
 3.1 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
	                    [219.83.100.19 listed in sbl-xbl.spamhaus.org]
 1.1 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: ranchasa.com]
 3.4 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: ranchasa.com]
 3.6 URIBL_SC_SURBL         Contains an URL listed in the SC SURBL blocklist
	                    [URIs: ranchasa.com]
 0.8 DRUGS_ERECTILE_OBFU    Obfuscated reference to an erectile drug
 0.0 DRUGS_ERECTILE         Refers to an erectile drug
 0.0 DRUGS_ANXIETY          Refers to an anxiety control drug
 0.2 DIGEST_MULTIPLE        Message hits more than one network digest check
 0.0 FORGED_OUTLOOK_HTML    Outlook can't send HTML message only
 0.0 FORGED_OUTLOOK_TAGS    Outlook can't send HTML in this format
 0.4 DRUGS_DIET             Refers to a diet drug
 1.3 HG_HORMONE             Talks about hormones for human growth
 0.0 DRUGS_ANXIETY_EREC     Refers to both an erectile and an anxiety drug
 0.6 ADVANCE_FEE_2          Appears to be advance fee fraud (Nigerian 419)

Required Reading

Here is a good reference for email and SMTP. Follow each of the links on this page. Make sure you read his page on SMTP.