Remote Procedure Calls
The idea behind a remote procedure call is that it looks like a regular function call to the calling program, but it is executed on a different machine.
There are several reasons why a programmer might want to use RPCs
There are some obvious disadvantages to using RPCs.
There are a number of RPC frameworks; these include Sun RPC, Java RMI, CORBA, and XML-RPC, each of which is discussed below.
Sun RPCs
There are RPC servers and clients. The server is the machine on which the remote procedure runs, and the RPC client is the process, generally on a different machine, which makes the RPC call. Like a thread start routine, a remote procedure can take only one argument and return only one value. So, as with threads, if you want to pass multiple arguments to an RPC, you have to create a struct and pass the struct as the argument.
Each remote program is identified by a unique 32-bit number, and each procedure within the program is identified by a 32-bit number that is unique within that program. Sun RPC also supports versioning, so you can assign a version number as well, and one host can run multiple versions of a program simultaneously. A specific remote procedure is therefore identified by a triple (prog, vers, proc).
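To make the triple concrete, here is a minimal Python sketch of a dispatch table keyed by (prog, vers, proc). The program number, the version numbers, and the two procedures are made up for illustration; they do not come from any real Sun RPC service.

```python
# Hypothetical dispatch table: (program, version, procedure) -> callable.
PROG_DEMO = 0x20000001  # made-up 32-bit program number


def add_v1(args):
    # Version 1 takes a single argument: a pair of numbers.
    a, b = args
    return a + b


def add_v2(args):
    # Version 2 of the same procedure accepts any number of operands.
    return sum(args)


PROCEDURES = {
    (PROG_DEMO, 1, 1): add_v1,
    (PROG_DEMO, 2, 1): add_v2,
}


def dispatch(prog, vers, proc, args):
    """Find the procedure named by the triple and call it with its one argument."""
    try:
        handler = PROCEDURES[(prog, vers, proc)]
    except KeyError:
        raise LookupError(f"no procedure registered for {(prog, vers, proc)}")
    return handler(args)


result_v1 = dispatch(PROG_DEMO, 1, 1, (2, 3))
result_v2 = dispatch(PROG_DEMO, 2, 1, (2, 3, 4))
```

Both versions are registered at once, which mirrors the point that one host can serve several versions of the same program at the same time.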
The RPC mechanism on the remote machine enforces mutual exclusion to make sure that only one instance of each procedure is running at a time. This is particularly important for RPCs that update databases, because it would be easy to corrupt a database if several procedures were permitted to update the same records simultaneously.
RPCs can use either UDP or TCP. UDP is substantially faster, but it does not guarantee delivery, and this can present problems. Suppose a procedure is called and no response is received, so it is called again. The request may in fact have arrived and only the reply been lost, which means that the same procedure may be executed twice. The Sun RPC libraries have a simple timeout and retransmission strategy, but it is not reliable in the strict sense. In practice, most RPCs are done on local area networks, which tend to be highly reliable, so this is rarely a serious issue.
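The duplicate-execution problem can be shown with a small simulation. This is only a sketch: the CounterServer class, the deposit procedure, and the lose_reply list are hypothetical, and no real network or Sun RPC library is involved.

```python
class CounterServer:
    """Toy server whose procedure is not idempotent: each call adds to a total."""

    def __init__(self):
        self.total = 0
        self.executions = 0

    def deposit(self, amount):
        self.executions += 1
        self.total += amount
        return self.total


def call_with_retry(server, amount, lose_reply, max_tries=3):
    """Call server.deposit, retrying whenever the reply is 'lost'.

    lose_reply holds one boolean per attempt. True means the request reached
    the server and ran, but the reply never made it back to the client.
    """
    for attempt in range(max_tries):
        reply = server.deposit(amount)  # the request arrives and executes
        if lose_reply[attempt]:
            continue                    # timeout: the client cannot tell, so it retries
        return reply
    raise TimeoutError("no reply after retries")


server = CounterServer()
balance = call_with_retry(server, 100, lose_reply=[True, False, False])
executions = server.executions
```

The client asked for one deposit of 100, but because the first reply was lost the procedure ran twice and the balance is 200.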
A particular remote program is not at a well-known port. Rather, the client first connects to a port mapper, which is at a well-known port (111) on the server, and this tells it which port to use for that program. Each remote program has to register itself with the port mapper, which records the port number for it. The Sun RPC mechanism hides this from users, who do not need to worry about the port mapper.
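The lookup step can be sketched as a small in-memory table. This is an illustration only: the PortMapper class, the program number, and the port 40123 are hypothetical, the program supplies its own port when it registers, and a result of 0 is used here to mean "not registered".

```python
class PortMapper:
    """Toy port mapper: maps (program, version) to the port a server listens on."""

    WELL_KNOWN_PORT = 111  # where clients find the port mapper itself

    def __init__(self):
        self._table = {}

    def register(self, prog, vers, port):
        # Called by a remote program when it starts up.
        self._table[(prog, vers)] = port

    def lookup(self, prog, vers):
        # Called by a client before its first call to the program.
        return self._table.get((prog, vers), 0)


PROG_DEMO = 0x20000001  # made-up program number

mapper = PortMapper()
mapper.register(PROG_DEMO, 1, 40123)

port = mapper.lookup(PROG_DEMO, 1)
missing = mapper.lookup(PROG_DEMO, 2)
```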
The RPC system has several levels of authentication. The purpose of these is to prevent unauthorized users from accessing remote procedure calls.
The default is none, which means that anyone can access a remote procedure if they know its address. There is also Unix authentication, which checks for Unix style permissions, but this can be easily subverted. There are higher levels of authentication as well.
Sun RPC and Java RMI work by automatically generating stub routines and skeleton routines. The developer generally starts by writing the client and server (caller and callee) as if they were using ordinary function calls. Once the code is written, tested, and debugged, a stub routine is created for the caller. The stub has the same signature as the called function, so no changes are required in the client.
The stub routine marshals the arguments and deals with the networking. It makes the call to the server, gets a response back, and passes the result back to the caller.
The server is the mirror image. A skeleton function deals with all of the networking, unmarshals the arguments, and calls the function.
Much of the stub and skeleton code is automatically generated by the Sun RPC system. Users do not need to deal with the port mapper, or with converting data to standard forms with XDR, or with any socket code at all. This is all handled by run-time libraries.
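The division of labor between stub and skeleton can be sketched in a few lines of Python. This is a simplified illustration, not generated Sun RPC code: JSON stands in for XDR, the transport function stands in for the network, and all of the names are made up.

```python
import json


def square(x):
    """The 'remote' procedure; it lives on the server."""
    return x * x


def skeleton(request_bytes):
    """Server side: unmarshal the request, call the real function, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = square(request["arg"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Stand-in for the network: hands the bytes straight to the skeleton."""
    return skeleton(request_bytes)


def square_stub(x):
    """Client side: same signature as square, but marshals and 'sends' the call."""
    request_bytes = json.dumps({"arg": x}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


answer = square_stub(12)
```

Because square_stub has the same signature as square, client code that used to call square directly does not need to change.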
Distributed File Systems
One of the main uses of RPCs today is for distributed file systems. The files themselves are on file servers, powerful computers with huge disk farms attached. This allows users to sit down at any computer on the network and get access to their files, with the illusion that the files reside on their local machine. A typical user cannot easily tell whether the file system is distributed or local, and usually doesn't care (until the file server crashes). All of the file system calls are identical, but the implementation of the file system calls in the kernel has to use RPCs for a distributed file system.
The CS lab runs one version of this, NFS (the Network File System, developed by Sun Microsystems). The RPI computer system runs a different protocol, AFS (the Andrew File System, developed at CMU).
NFS
NFS was developed by Sun Microsystems, originally for diskless clients, but now a standard. It is a protocol, not a product. This means that anyone can implement it if they wish. Many computer manufacturers use code licensed from Sun, but people are free to write their own implementation of NFS.
A key component of NFS is the concept of a mount. The mount protocol allows the file server to hand out remote access privileges to clients. The file server runs a mount daemon mountd. When a new client is booted, it calls the mount system call, which attaches a specific directory tree to a mount point, which is a node on the client's local file directory tree.
An example will clarify this. Suppose the client has a directory that looks like this:
and the server has a directory that looks like this:
When the client is booted, it makes a call to mount, requesting the server to mount file system D to /C. After this is done, the file system on the client would look like this to a user.
Note that some of the files are local and others are remote, but this is transparent to the user; it looks like a single file system. The user does not need to know where a file actually resides.
NFS servers are dumb and NFS clients are smart. It is the clients that do the work required to convert the generalized file access that servers provide into a file access method that is useful to applications and users.
The server is stateless. A stateless protocol means that each call is independent of every other call. A server should not need to maintain any protocol state information about any of its clients in order to function correctly. Stateless servers have a distinct advantage over stateful servers in the event of a failure. With a stateless server, a client need only retry a request until the server responds; it does not even need to know that the server has crashed, or that the network temporarily went down. The client of a stateful server, on the other hand, needs to either detect a server failure and rebuild the server's state when it comes back up, or cause client operations to fail.
When a user calls open, the call has to figure out if the file is local or remote, and if it is remote, the NFS client on that machine has to contact the appropriate server.
Since NFS has to accommodate heterogeneous file systems (e.g. DOS), the client is the only one that interprets full path names. This may mean multiple NFS queries to resolve a single request. For example, the file /a/b/c/d might take four requests to resolve. But this means that the server doesn't need to know anything about the client's naming system or directory structure.
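The component-by-component lookup can be sketched as follows. This is a toy model, not the NFS protocol itself: the NameServer class is hypothetical, nested dictionaries stand in for directories, and a "handle" is simply the object found.

```python
class NameServer:
    """Toy server: it resolves one name inside one directory per request."""

    def __init__(self, root):
        self.root = root      # nested dicts stand in for directories
        self.requests = 0

    def lookup(self, dir_handle, name):
        self.requests += 1
        return dir_handle[name]   # the 'handle' here is just the child object


def resolve(server, path):
    """Client side: split the full path and look up one component at a time."""
    handle = server.root
    for component in path.strip("/").split("/"):
        handle = server.lookup(handle, component)
    return handle


server = NameServer({"a": {"b": {"c": {"d": "contents of d"}}}})
handle = resolve(server, "/a/b/c/d")
request_count = server.requests
```

Only resolve knows that "/" separates path components; the server never sees a full path name.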
Once a client has identified and opened a file, the server gives the client a handle for subsequent reads and writes. This is an opaque data structure that the client uses for future reads and writes to that file.
Note that since the server is stateless, the client has to store file offset info. This means that a call to lseek is completely local.
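Here is a minimal sketch of what this looks like, assuming a server whose read call takes the handle, the offset, and a byte count on every request. The class and method names are hypothetical and the "server" is just an in-memory dictionary.

```python
class StatelessServer:
    """Toy file server: every read names the handle, offset, and count explicitly."""

    def __init__(self, files):
        self._files = files   # handle -> bytes
        self.requests = 0

    def read(self, handle, offset, count):
        self.requests += 1
        return self._files[handle][offset:offset + count]


class ClientFile:
    """Client-side open file: the current offset is stored here, not on the server."""

    def __init__(self, server, handle):
        self.server = server
        self.handle = handle
        self.offset = 0

    def lseek(self, offset):
        self.offset = offset  # purely local, no request is sent
        return self.offset

    def read(self, count):
        data = self.server.read(self.handle, self.offset, count)
        self.offset += len(data)
        return data


server = StatelessServer({"h1": b"hello, world"})
f = ClientFile(server, "h1")
first = f.read(5)
f.lseek(7)
second = f.read(5)
requests_sent = server.requests
```

Two reads produce two requests; the lseek in between produces none.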
Performance is an extremely important concern. For this reason, communication between clients and servers uses UDP. Also, when the server is started, it forks off a number of processes so that it does not need to call fork or create a thread for each request.
NFS has function calls to do almost anything that a user might want to do with a file. Here are some examples.
AFS
AFS is a distributed filesystem that enables co-operating hosts (clients and servers) to efficiently share filesystem resources across both local area and wide area networks. It is far more robust and scalable than NFS.
AFS is marketed, maintained, and extended by Transarc Corporation (now owned by IBM), but it is based on a distributed file system originally developed at Carnegie-Mellon University. This was called the Andrew File System, named after both Andrew Carnegie and Andrew Mellon. CMU has developed a newer version called CODA.
Recall that with NFS, different clients could mount the same file
system in different places. AFS has gone to the opposite extreme:
there is one AFS file system for the planet. If you are on an AFS
system such as RCS, the root is /afs. This provides access
to every (or at least many) of the sites running AFS. (Hint: Don't
type ls -l /afs, because it will need to contact each of
the sites in the world to get the information, and this takes a while.
You might want to go to lunch while you wait for this. But you should
try this command:
ls /afs)
This means that if you are on the RPI computer system, you can type
cd /afs/cs.wisc.edu
and it will look as though you are on the University of Wisconsin file system.
AFS files are grouped together in cells. An AFS cell is a collection of servers grouped together administratively and presenting a single, cohesive filesystem. Typically, an AFS cell is a set of hosts that use the same Internet domain name. For example, all of the files in the rpi.edu domain constitute a cell. AFS cells can range from the small (1 server/client) to the massive (with tens of servers and thousands of clients).
AFS client machines run a very efficient Cache Manager process. The Cache Manager maintains information about the identities of the users logged into the machine, finds and requests data on their behalf, and keeps chunks of retrieved files on local disk.
The effect of this is that as soon as a remote file is accessed, a chunk of that file (often the whole file) gets copied to local disk. Subsequent accesses are almost as fast as accesses to local disk, and considerably faster than a read across the network. Local caching also significantly reduces the amount of network traffic.
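The basic caching behavior can be sketched like this. It is a deliberately simplified model with made-up names: whole files are cached in a dictionary rather than in chunks on disk, and nothing is done about keeping cached copies up to date.

```python
class FileServer:
    """Toy file server that counts how many times it is asked for a file."""

    def __init__(self, files):
        self.files = files
        self.fetches = 0

    def fetch(self, path):
        self.fetches += 1
        return self.files[path]


class CacheManager:
    """Toy cache manager: fetch from the server on a miss, serve locally on a hit."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # stands in for the cache on local disk

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path)
        return self.cache[path]


server = FileServer({"/afs/example/notes.txt": b"lecture notes"})
manager = CacheManager(server)
first = manager.read("/afs/example/notes.txt")
second = manager.read("/afs/example/notes.txt")
fetch_count = server.fetches
```

The second read is served from the local copy, so the server sees only one fetch.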
Unlike NFS, which makes use of /etc/filesystems (on a client) to map (mount) between a local directory name and a remote filesystem, AFS does its mapping (filename to location) at the server. This has the tremendous advantage of making the served file space location independent.
Location independence means that a user does not need to know which file server holds a file; the user only needs to know the pathname of the file.
To understand why such location independence is useful, consider having 20 clients and two servers. Let's say you had to move a filesystem "/home" from server a to server b.
Using NFS, you would have to change the /etc/filesystems file on 20 clients and take "/home" off-line while you moved it between servers.
With AFS, you simply move the AFS volume(s) which constitute "/home" between the servers. You do this "on-line" while users are actively using files in "/home" with no disruption to their work.
With location independence comes scalability. An architectural goal of the AFS designers was client/server ratios of 200:1 which has been successfully exceeded at some sites.
AFS files are stored in structures called Volumes. These volumes reside on the disks of the AFS file server machines. Volumes containing frequently accessed data can be read-only replicated on several servers. For example, if there are many users using the C compiler gcc, there can be several instances of it on different servers. Note that a given user does not know anything about this. He or she just types gcc, and the AFS server finds an instance.
AFS (and thus RCS) does not use the standard Unix permission system. This is a source of confusion, because the Unix file permission bits are settable and visible, but ignored. On a typical Unix system, permissions are set on a per-file basis, but on AFS, permissions are set on a per-directory basis.
The AFS permission system allows the owner of a directory to set four types of permission for the directory itself: lookup, insert, delete, and administer. There are also three types of permission for the files in it: read, write, and lock. Unlike normal Unix, these can be set for specific users; you can give Suzy read privileges for files in a directory, for example. You can even give read privileges to everyone except Suzy.
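One way to picture per-directory permissions with exceptions is an access list that holds both granted and denied rights. The sketch below is only a model of the idea: the DirectoryACL class, the "everyone" name, and the user names are hypothetical, and this is not the AFS command set.

```python
class DirectoryACL:
    """Toy access list for one directory, with granted and denied rights per user."""

    RIGHTS = {"lookup", "insert", "delete", "administer", "read", "write", "lock"}

    def __init__(self):
        self.allow = {}   # principal -> set of granted rights
        self.deny = {}    # principal -> set of explicitly denied rights

    def grant(self, who, *rights):
        assert set(rights) <= self.RIGHTS
        self.allow.setdefault(who, set()).update(rights)

    def refuse(self, who, *rights):
        assert set(rights) <= self.RIGHTS
        self.deny.setdefault(who, set()).update(rights)

    def check(self, user, right):
        granted = self.allow.get(user, set()) | self.allow.get("everyone", set())
        denied = self.deny.get(user, set())
        return right in granted and right not in denied


acl = DirectoryACL()
acl.grant("everyone", "lookup", "read")   # anyone may list and read
acl.refuse("suzy", "read")                # ...except Suzy, who may not read

bob_can_read = acl.check("bob", "read")
suzy_can_read = acl.check("suzy", "read")
suzy_can_lookup = acl.check("suzy", "lookup")
```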
AFS is far more secure than NFS. It uses the Kerberos authentication system, which will be discussed in detail in a later lesson.
Java Remote Method Invocation (RMI)
RMI is the object equivalent of RPC. It allows a client to access the methods of an object on a remote server as if the object were local.
The underlying transport mechanism is hidden from the user. The method can return any primitive Java type or serializable Java object.
Advantage: It can use the full power of Java; you can pass Java objects as parameters and get Java objects returned. It can use subclasses.
Disadvantage: It is Java specific.
Java-RMI uses a network-based registry program called RMIRegistry to keep track of the distributed objects. (Note: The RMI Registry is an RMI server itself). It is a naming service; it does not handle the actual invocation.
The client needs a stub, and the server needs a skeleton. These are generated by rmic.
CORBA (Common Object Request Broker Architecture)
Developed by the Object Management Group (OMG), it allows for clients to access remote objects. Unlike RMI, it is not limited to one language.
The Object Request Broker is the key component. It manages all communication between its components. It allows objects to interact in a heterogeneous, distributed environment, independent of the platforms on which these objects reside and techniques used to implement them.
IIOP, the Internet Inter-ORB Protocol, is a protocol for communication between CORBA ORBs.
For each object type, you define an interface in OMG IDL. The interface is the syntax part of the contract that the server object offers to the clients that invoke it. Any client that wants to invoke an operation on the object must use this IDL interface to specify the operation it wants to perform, and to marshal the arguments that it sends. When the invocation reaches the target object, the same interface definition is used there to unmarshal the arguments so that the object can perform the requested operation with them. The interface definition is then used to marshal the results for their trip back, and to unmarshal them when they reach their destination.
The IDL interface definition is independent of programming language, but maps to all of the popular programming languages via OMG standards: OMG has standardized mappings from IDL to C, C++, Java, COBOL, Smalltalk, Ada, Lisp, Python, and IDLscript.
The functions of the ORB are as follows:
There are many implementations, such as Orbix from Iona
RPCs on the web (XML-RPC)
The problem with RPCs on the web has been encoding data (interoperability), but XML solves that problem.
XML-RPC is a Remote Procedure Calling protocol that works over the Internet.
An XML-RPC message is an HTTP-POST request. The body of the request is in XML. A procedure executes on the server and the value it returns is also formatted in XML.
Procedure parameters can be scalars, numbers, strings, dates, etc.; and can also be complex record and list structures.
Here is a trivial example (from the XML-RPC web site):
<?xml version="1.0"?>
<methodCall>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</methodCall>
The payload is in XML, a single methodCall structure.
If the procedure call has parameters, the methodCall must
contain a params child. This can contain any
number of param elements.
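Python's standard library can build and parse these payloads, which is a convenient way to see the structure. The sketch below only encodes and decodes messages locally with xmlrpc.client; it does not contact any server. The method name and the values 41 and "South Dakota" are the ones used in this section's examples.

```python
import xmlrpc.client

# Build a methodCall payload with one integer parameter.
request_xml = xmlrpc.client.dumps((41,), methodname="examples.getStateName")

# Parse it back, as a server would: we get the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

# Build a methodResponse payload carrying a single return value.
response_xml = xmlrpc.client.dumps(("South Dakota",), methodresponse=True)

# Parse it back, as a client would: a response has no method name.
reply_params, reply_method = xmlrpc.client.loads(response_xml)
```

Printing request_xml shows a methodCall element containing methodName and params, the same shape as the hand-written example.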
A response to this might look like this
XML-RPC defines some basic data types
Scalar values like int, boolean, double, string, base64
Also structs and arrays
Note that this is not really anything new. The protocol is HTTP.
Web services are a package of protocols which allow businesses to
locate web services and use web services independent of platform or
language.
Everything is based on XML.
The three protocols are SOAP, WSDL, and UDDI.
SOAP (Simple Object Access Protocol)
A SOAP message is an ordinary XML document containing the following elements:
All the elements above are declared in the default namespace for the
SOAP envelope:
Here is a skeleton SOAP document
The required SOAP Body element contains the actual SOAP message
intended for the ultimate endpoint of the message.
Immediate child elements of the SOAP Body element may be
namespace-qualified. SOAP defines one element inside the Body element
in the default namespace
("http://www.w3.org/2001/12/soap-envelope"). This is the SOAP Fault
element, which is used to indicate error messages.
Here is a SOAP message to get the price of apples
The body may have a fault element.
Sub Element Description
Holds application specific error information related to the Body element
Here is a complete example
Here is what a response might look like
WSDL Web Services Description Language
WSDL is an XML document used to describe web services.
It has these major elements
It defines a web service, the operations that can be performed, and
the messages that are involved.
Here is a simple example that gets a definition of a term
Compared to traditional programming, glossaryTerms is a function
library, "getTerm" is a function with "getTermRequest" as the
input parameter and getTermResponse as the return parameter.
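The analogy can be written out directly. In this sketch, which is only an illustration, a Python class plays the role of the glossaryTerms portType and a method plays the role of the getTerm operation. The glossary contents are made up.

```python
class GlossaryTerms:
    """Plays the role of the portType: a named collection of operations."""

    def __init__(self, definitions):
        self._definitions = definitions

    def get_term(self, term: str) -> str:
        """Plays the role of the getTerm operation.

        The argument corresponds to the getTermRequest message (one string part
        named 'term'); the return value corresponds to getTermResponse (one
        string part named 'value').
        """
        return self._definitions[term]


glossary = GlossaryTerms({"WSDL": "Web Services Description Language"})
value = glossary.get_term("WSDL")
```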
Operation Types
The request-response type is the most common operation type, but WSDL defines four types:
One-way: The operation can receive a message but will not return a response
The relationship between WSDL and SOAP
For each operation the corresponding SOAP action has to be
defined. You must also specify how the input and output are
encoded. In this case we use "literal".
UDDI (Universal Description, Discovery and Integration) is a
directory service where businesses can register and search for Web
services.
Problems the UDDI specification can help to solve:
UDDI is used for describing, discovering, and integrating web
services.
At its core, UDDI consists of two parts.
White pages - general info about a specific company
Yellow pages - classification data for the company or service
Green pages - technical info
There are three components to the data architecture
The UDDI data model describes business and web services
The UDDI API is SOAP based and is used for searching UDDI data
The UDDI cloud services (registries)
(Microsoft, IBM)
UDDI includes an XML Schema that describes four core types of information:
There are well established business taxonomies to assist this
An example registry is XMethods
Many companies distribute their own SOAP specs.
One example is the Google Web APIs. These consist of
Java classes which can be used to perform Google searches
using SOAP.
The API defines a class GoogleSearch, with methods such as
setQueryString(). The method doSearch() performs
the actual search by sending a SOAP message to Google. The
return type is a GoogleSearchResult,
which contains an array of GoogleSearchResultElements.
Here is the XML file
Required Reading:
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value><string>South Dakota</string></value>
</param>
</params>
</methodResponse>
Web Services (SOAP, WSDL, UDDI)
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
...
...
</soap:Header>
<soap:Body>
...
...
<soap:Fault>
...
...
</soap:Fault>
</soap:Body>
</soap:Envelope>
<m:GetPrice xmlns:m="http://www.w3schools.com/prices">
<m:Item>Apples</m:Item>
</m:GetPrice>
<faultcode> A code for identifying the fault
<faultstring> A human readable explanation of the fault
<faultactor> Information about who caused the fault to happen
<detail>
POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: nnn
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.stock.org/stock">
<m:GetStockPrice>
<m:StockName>IBM</m:StockName>
</m:GetStockPrice>
</soap:Body>
</soap:Envelope>
HTTP/1.1 200 OK
Content-Type: application/soap+xml; charset=utf-8
Content-Length: nnn
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.stock.org/stock">
<m:GetStockPriceResponse>
<m:Price>34.5</m:Price>
</m:GetStockPriceResponse>
</soap:Body>
</soap:Envelope>
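A client receiving this response has to pull the price out of the Body. Here is a minimal sketch using Python's standard xml.etree.ElementTree, with the envelope above pasted in as a string; the variable names are made up, and no HTTP request is made.

```python
import xml.etree.ElementTree as ET

response_body = """<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.stock.org/stock">
<m:GetStockPriceResponse>
<m:Price>34.5</m:Price>
</m:GetStockPriceResponse>
</soap:Body>
</soap:Envelope>"""

# Map prefixes to the namespace URIs used in the envelope.
namespaces = {
    "soap": "http://www.w3.org/2001/12/soap-envelope",
    "m": "http://www.stock.org/stock",
}

envelope = ET.fromstring(response_body)

# Walk Envelope -> Body -> GetStockPriceResponse -> Price.
price_element = envelope.find("soap:Body/m:GetStockPriceResponse/m:Price", namespaces)
price = float(price_element.text)

# A Fault child of Body would signal an error; this response has none.
fault = envelope.find("soap:Body/soap:Fault", namespaces)
```

The lookup is done by namespace URI, not by prefix, so it would still work if the server chose different prefixes for the same namespaces.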
<portType> The operations performed by the web service
<message> The messages used by the web service
<types> The data types used by the web service
<binding> The communication protocols used by the web service
The <portType> element is the most important WSDL element.
The <portType> element can be compared to a function
library (or a module, or a class) in a traditional programming language.
<message name="getTermRequest">
<part name="term" type="xs:string"/>
</message>
<message name="getTermResponse">
<part name="value" type="xs:string"/>
</message>
<portType name="glossaryTerms">
<operation name="getTerm">
<input message="getTermRequest"/>
<output message="getTermResponse"/>
</operation>
</portType>
Request-response: The operation can receive a request and will return a response
Solicit-response: The operation can send a request and will wait for a response
Notification: The operation can send a message but will not wait for a response
<message name="getTermRequest">
<part name="term" type="xs:string"/>
</message>
<message name="getTermResponse">
<part name="value" type="xs:string"/>
</message>
<portType name="glossaryTerms">
<operation name="getTerm">
<input message="getTermRequest"/>
<output message="getTermResponse"/>
</operation>
</portType>
<binding type="glossaryTerms" name="b1">
<soap:binding style="document"
transport="http://schemas.xmlsoap.org/soap/http" />
<operation>
<soap:operation
soapAction="http://example.com/getTerm"/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
</binding>
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<key xsi:type="xsd:string">00000000000000000000000000000000</key>
<q xsi:type="xsd:string">shrdlu winograd maclisp teletype</q>
<start xsi:type="xsd:int">0</start>
<maxResults xsi:type="xsd:int">10</maxResults>
<filter xsi:type="xsd:boolean">true</filter>
<restrict xsi:type="xsd:string"></restrict>
<safeSearch xsi:type="xsd:boolean">false</safeSearch>
<lr xsi:type="xsd:string"></lr>
<ie xsi:type="xsd:string">latin1</ie>
<oe xsi:type="xsd:string">latin1</oe>
</ns1:doGoogleSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
The W3School Tutorial on SOAP
The W3School Tutorial on WSDL
The W3School Tutorial on Web Services