CSCI.4210 Operating Systems
Unix system calls and kernel structures for files

This lesson will discuss the Unix system calls that deal with files and some of the kernel structures that implement them. We have already seen three of the most important file system calls.

int open(const char *pathname, int flags, int mode)

ssize_t read(int fildes, void *buf, size_t nbyte)

ssize_t write(int fildes, const void *buf, size_t nbyte)

The open system call returns a file descriptor, a low positive integer which is used as an index to the open file table associated with each process. This file descriptor is then used as the first argument for the read and write operations.

Each open file has associated with it a file offset. This is a pointer to the place in the file where the next read or write will start. When a file is first opened, the file offset is set to the beginning of the file, byte zero. Each time that bytes are read, the file offset is automatically advanced to the next unread byte. For example, if a file contains 1000 bytes, and reads occur in chunks of 100 bytes (i.e. the variable nbyte has the value 100) the first call to read will read in bytes 0 - 99. When this call returns, the file offset will be set to byte 100 so the next read will read bytes 100 - 199, and so on.

The file offset can be changed with the lseek system call. Here is the function prototype.

off_t lseek(int fildes, off_t offset, int whence);
The first argument is the file descriptor, which must be a previously opened file, the second argument is the new offset value, and the third argument is one of three symbolic names. If whence is SEEK_SET, the offset is from the beginning of the file; If whence is SEEK_CUR, the offset is from the current offset, and if whence is SEEK_END, the offset is from the end of the file. Note that the value of offset can be negative, so the last is not a crazy as it seems.

Here are some examples. In all cases assume that the file offset is initially at byte 100.

lseek(fd, 30, SEEK_SET) The next read would start at byte 30

lseek(fd, 30, SEEK_CUR) The next read would start at byte 130

lseek(fd, -30, SEEK_END) The next read would start at the byte 30 from the end of the file.

Note that this applies to writes as well as to reads, and it is possible to have a file with a hole in it, by writing past the end. For example

lseek(fd, 30, SEEK_END)
write(fd, buffer,10)

If lseek is successful, it returns the current file offset, so the following call can be used to get the current offset.

CurrOffset = lseek(fd,0,SEEK_CUR);

The Unix kernel maintains a table of all open files. Each entry contains

A successful call to open creates a new entry in this table. The process file descriptor table has a pointer to this kernel structure.

This can be pictured like this.

Recall that when a fork occurs, the file descriptor table of the parent is copied to the child. This means that two (or more processes) have access to the same open file. This can be diagrammed like this.

Since the file offset pointer is stored in the kernel data structure, either the child or the parent can change this, either with a read or write system call or with an lseek system call. Here is a simple program to demonstrate this. It assumes the existence of a file called Alphabet with contains the 26 uppercase letters in order.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
     int fd;
     pid_t pid;
     size_t n;
     char buffer[32];
   
     if ((fd=open("Alphabet",O_RDONLY)) < 0) {
         perror("error opening Alphabet");
         exit(0);
     }
     if ((pid=fork()) < 0) {
         perror("error on fork");
         exit(0);
     }
     if (pid==0) {  /* child */
        sleep(2);
        n=read(fd,buffer,5);
        buffer[n]='\0';
        printf("Child reads %s\n",buffer);
        exit(0);
     }
     else { /* parent */
        n = read(fd,buffer,5);
        buffer[n] = '\0';
        printf("Parent reads %s\n",buffer);
        wait(NULL);
        n = read(fd,buffer,5);
        buffer[n]='\0';
        printf("Parent reads %s\n",buffer); 
    }
     return 0;
}
Note that there are two different processes reading from the same opened file, and that each read from either process updates the file offset in the kernel file structure. Here is the output.

Parent reads ABCDE
Child reads FGHIJ
Parent reads KLMNO

If two processes each open the same file independently, there will be two entries in the kernel file table, each with its own offset value. This would be diagrammed like this.

Here is a short sample program.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
     int fd;
     pid_t pid;
     size_t n;
     char buffer[32];
   
     if ((pid=fork()) < 0) {
         perror("error on fork");
         exit(0);
     }
     if (pid==0) {  /* child */
         if ((fd=open("Alphabet",O_RDONLY)) < 0) {
             perror("error opening Alphabet");
             exit(0);
	 }
         sleep(2);
         n=read(fd,buffer,5);
         buffer[n]='\0';
         printf("Child reads %s\n",buffer);
         exit(0);
     }
     else { /* parent */
         if ((fd=open("Alphabet",O_RDONLY)) < 0) {
             perror("error opening Alphabet");
             exit(0);
	 }
         n = read(fd,buffer,5);
         buffer[n] = '\0';
         printf("Parent reads %s\n",buffer);
         wait(NULL);
         n = read(fd,buffer,5);
         buffer[n]='\0';
         printf("Parent reads %s\n",buffer); 
     }
     return 0;
}
The output of this program would be
Parent reads ABCDE
Child reads ABCDE
Parent reads FGHIJ

Links

Recall that you can have multiple hard links to a file; which means that one file (inode) can have two or more different directory entries which refer to it. The command
link existingfile newfile
will do this.

There is a also a system call
int link(char *existingfile, char *newfile);
so that you can do the same thing from inside a program.

You can create a symbolic link with the system call
int symlink(const char *name1, const char *name2);
One important difference between hard links and symbolic links is that hard links must be in the same file system (on the same server) while symbolic links can cross file servers (more on this shortly).

Getting information about a file The Unix system call stat can be used to get information about a file. Here is the function prototype.

    #include <sys/types.h>
    #include <sys/stat.h>

    int stat(const char *path, struct stat *buf);

The first argument is the pathname of a file, the second is a pointer to (address of) a stat structure. Here are the fields of a stat structure.

     mode_t   st_mode;     /* File mode (see mknod(2)) */
     ino_t    st_ino;      /* Inode number */
     dev_t    st_dev;      /* ID of device containing */
                           /* a directory entry for this file */
     dev_t    st_rdev;     /* ID of device */
                           /* This entry is defined only for */
                           /* char special or block special files */
     nlink_t  st_nlink;    /* Number of links */
     uid_t    st_uid;      /* User ID of the file's owner */
     gid_t    st_gid;      /* Group ID of the file's group */
     off_t    st_size;     /* File size in bytes */
     time_t   st_atime;    /* Time of last access */
     time_t   st_mtime;    /* Time of last data modification */
     time_t   st_ctime;    /* Time of last file status change */
                           /* Times measured in seconds since */
                           /* 00:00:00 UTC, Jan. 1, 1970 */
     long     st_blksize;  /* Preferred I/O block size */
     blkcnt_t st_blocks;   /* Number of 512 byte blocks allocated*/

Here is a short program which displays the number of bytes in a file, the name of which is passed in as an argument.

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  struct stat buf;
  int retval;

  retval = stat(argv[1],&buf);
  if (retval == 0)
    printf("The file %s has %d bytes\n",argv[1],buf.st_size);
  else
    printf("Error on stat\n");
  return 0;
}

Unix Directory System Calls

Unix allows a program to open a directory and read the names of all of the files in it. The call to open a directory is opendir. Here is the function prototype.

     #include <sys/types.h>
     #include <dirent.h>

     DIR *opendir(const char *dirname);
It takes one argument, the pathname of a directory, and it returns a pointer to a DIR. This is an opaque structure similar to a windows handle. The system call readdir reads an entry in a directory. Here is its prototype.
    #include <sys/types.h>
    #include <dirent.h>

    struct dirent *readdir(DIR *dirp);
It takes a pointer to a DIR as an argument, and returns a pointer to a dirent structure. There is only one useful field in a dirent, d_name which is the name of the file. Once you have the name of the file, you can use stat to get whatever other information you may need about that file.

Each time that readdir is called, it returns the dirent of the next file, and when there are no more files, it returns NULL.

Windows File system implementation and APIs

FAT

Modern Microsoft operating systems support several different file systems. The simplest of these is the FAT file system. Recall that FAT stands for File Allocation Table, and the idea behind this is that the system maintains a table for each file which contains the disk addresses of each block.

FAT comes in several flavors, including FAT12, FAT16 and FAT32. A pure file allocation table would have an entry in the table for each sector (512 bytes), but for performance reasons, the disk is broken up into larger units called clusters or allocation units, which contain a number of contiguous sectors, usually in the range of 4 to 64 (2,048 to 32,767 bytes). Each file is allocated a number of clusters; one cluster cannot contain data for more than one file.

A portion of each disk is allocated for the FAT, but this is usually kept in memory for faster access.

FAT12 was a component of early DOS computers. The FAT12 file system used a 12 bit field to hold the cluster number. This meant that a volume using FAT12 could only have 4,086 clusters (12K minus a few for housekeeping). This is hardly ever used today.

FAT16 is still used on many older systems; you might have been able to guess that it uses a 16 bit field for cluster numbers. A FAT16 file system could have 65,526 clusters.

FAT32 is used on the more modern Windows operating systems. It uses a 28 bit field to store cluster numbers (that's not a typo, apparently they thought that the name FAT32 was more aethetically pleasing than FAT28). Since the maximum cluster size that it can support is 32KB, the FAT32 file system can support disks of up to 2^41 bytes, which is more than enough.

There is one FAT for the file system. Directory entries contain that FAT entry of the first cluster of the file. This entry contains the cluster number of the first cluster and the FAT entry for the second cluster of the file. The FAT entry for the second cluster contains the FAT entry for the third cluster, and so on. This allows the operating system to obtain the cluster numbers of all of the clusters for a file without any time consuming disk reads (assuming that the FAT is in memory).

Note that deleting a file on a FAT file system (or on most other file systems) does not mean that the contents are erased. It simply means that the entry in the FAT is deleted. In fact, often the file system does not even delete entries in the FAT table; it simply marks them as deleted. This makes it possible to recover a deleted file (often third party software will have such a utility). Of course over time, the clusters get overwritten and so recovery is no longer possible.

The clusters of a file can be distributed anywhere on the disk. As we saw last week, it is more efficient if the clusters of a file are stored contiguously. The defrag utility goes over the entire disk and stores logically contiguous clusters of each file on physically contiguous clusters. This permits much faster file access.

NTFS

The New Technology File System (NTFS) was introduced as a part of the Windows NT operating system. It incorporated a number of improvements over the various FAT systems, particularly FAT16, which was the standard at the time that NTFS was first developed. These improvements included:

NTFS also uses clusters, but unlike FAT, everything is configurable. Your text describes the architecture of NTFS fairly well (pp 834-844), so I will not cover the implementation details here.

WIN32 APIs for files

There are two WIN32 APIs that you need to traverse a directory.

#include <windows.h>
HANDLE FindFirstFile(LPCTSTR lpFilename, LPWIN32_FIND_DATA lpFindFileData)
This call takes a filename as its first argument. This file name can include wildcards (the asterisk can be used to represent zero or more characters). If there is at least one such file, it returns a search handle, and the information about the first file found is stored in the second argument. The second argument is a pointer to a structure of type WIN32_FIND_DATA. This structure contains a number of fields which have information about a file. The field cFileName contains the name of the file. Here is a link to the online help page for WIN32_FIND_DATA.

To see if there is more than one file which satisfies the condition, use the API

BOOL FindNextFile(HANDLE hFindFile, LPWIN32_FIND_DATA lpFindFileData)

The first argument is the handle returned by FindFirstFile. The second argument has been discussed above. If there is another file which meets the criteria, this function returns true, and the values of the file are stored in the second argument. If there are no more files, it returns false and the values of its second argument are indeterminate.

This is tricky stuff, so here is a short example. This piece of code displays the name and size of all of the .h files in the directory C:\samples\proj1\

#include <windows.h>
#include <string.h>
#include <stdio.h>
int main()
{
     WIN32_FIND_DATA fileinfo;
     char pathname[255];
     HANDLE searchhandle;
     BOOL retval;
     strcpy(pathname,"c:\\samples\\proj1\\*.h");
     searchhandle = FindFirstFile(pathname,&fileinfo);
     if (searchhandle != INVALID_HANDLE_VALUE) {
          printf("%s %d\n",fileinfo.cFileName,fileinfo.nFileSizeLow);
          retval = FindNextFile(searchhandle,&fileinfo);
          while (retval == TRUE) {
               printf("%s %d\n",fileinfo.cFileName,fileinfo.nFileSizeLow);
               retval = FindNextFile(searchhandle,&fileinfo);
          }
     }
     return 0;
}