OpSys Fall 2005 - HW1
- Unix C Program for playing with environment variables
The objectives of this assignment are:
You are to write a program that interacts with the user via a command prompt (your program prompts for a command, reads a command as a line of text and prints out any results). The actual commands you need to support are rather simple:
Below is a sample session. The user starts the program from the shell with ./hw1; the text after each "prompt>" was typed by the user, and every other line is output from the program.
> ./hw1
prompt> hello fred
I don't understand that command
prompt> set fred = blah
prompt> print blah
blah = <empty>
prompt> print fred
fred = blah
prompt> print SHELL
SHELL = /bin/bash
prompt> delete fred
prompt> print fred
fred = <empty>
prompt> set fred = hi there fred
I don't understand that command
You can include a quit command if you want; alternatively, from Unix you can simply hit ^D (Ctrl-d) to indicate EOF to the program (readline() will then return NULL).
IMPORTANT!: You must be able to print out the value of any existing environment variable, not just the new ones created by the user. Some environment variables you can expect to already have values include: SHELL, PATH, HOME and HOSTNAME. From the Unix shell you can use the command "env" (or "printenv") to print out all of your environment variables; in bash, "set" lists them too, along with shell variables that have not been exported.
- The readline library
Your program must use the GNU readline library to get input from the user. The readline library provides functions that let the user scroll back through previous commands, search through previous commands, edit previous commands, etc. (readline is used by many programs, including the bash shell, to handle user input; this is why you can hit up-arrow to recall the previous command entered.)
You can get the details of the functions provided by the readline library by issuing the command "man readline" at the unix prompt. If you don't know how to use the "man" command - try "man man".
Basically, the readline library provides a function named (oddly
enough) readline() that reads a line from standard input and lets
the user browse any history that has been given to the readline
library. readline() returns a char * pointing to the user's input,
or a NULL pointer if it reaches EOF. The returned string is null
terminated, has the trailing newline removed, and has been allocated
from the heap, so you need to free() it when you are done with it.
To use the readline library you must tell the linker to include the
readline and curses libraries; this means you need to add
-lreadline -lcurses to your compile line (see below for
an example). On the CS FreeBSD machines you also need to tell the
compiler where to find the readline includes and library, so you need
to add -I/usr/local/include -L/usr/local/lib to the compile line.
The code shown below is a simple example of using the readline
library, including the insertion of each line entered into the
readline history. This program doesn't do anything with each line it
gets, it just shows how to use readline(). This code is
also available here: simprl.c.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <readline/readline.h>
#include <readline/history.h>
/* Simple example of using gnu readline to get lines of input from
a user. Needs to be linked with -lreadline -lcurses
add_history tells the readline library to add the line to its
internal history, so that using up-arrow (or ^p) allows the user
to see/edit previous lines.
*/
int main(int argc, char **argv) {
char *s;
while ((s = readline("prompt> ")) != NULL) { /* NULL means EOF (^D) */
add_history(s); /* adds the line to the readline history buffer */
free(s); /* clean up! */
}
return(0);
}
How to compile and test this code:
To build an executable on a CS FreeBSD machine:
gcc -Wall -o simprl -I/usr/local/include -L/usr/local/lib simprl.c -lreadline -lcurses
On many other OSs (including Linux) you don't need the -I or -L options, because readline is installed in the standard locations the compiler already searches:
gcc -Wall -o simprl simprl.c -lreadline -lcurses
Run ./simprl, type in lines of text, and verify that you can scroll through previous lines with the arrow keys (or ^P and ^N). Hit ^D on a line by itself to quit.
- Environment Variables
Every Unix process includes an array of strings called the 'environment'. Each of these strings is of the form "name=value" (this is only a convention; nothing requires every string in the environment to have this form). There are various functions for accessing these strings by the "name" contained in each one. These name/value pairs are referred to as environment variables. Below is a list of some of the functions you can use to access/manipulate these environment variables:
setenv: change/set the value of a variable given the name and the value.
putenv: change/set a variable from a single "name=value" string.
getenv: get the value of an environment variable given the name.
unsetenv: remove an environment variable.
There is also a global variable named environ that can be used to access the environment strings.
For the details on any of the above, use the unix man command ("man environ" or "man setenv", ...)
- Regular Expressions and the POSIX regex library
Part of this assignment also involves using regular expressions to parse the user input. The idea is to get exposure to a non-trivial library (as well as to learn how to use some very useful functions!).
Regular expressions are used by Unix in lots of places. The shell
matches file names with wildcard patterns (ls *.c is a simple
example; these "glob" patterns are a simpler relative of regular
expressions), and many Unix commands are based on regular expressions
(commands like sed, grep, awk, perl and many more).
There are a number of flavors of regular expressions, that is, different languages for specifying complex patterns to be matched. The purpose of this assignment is not to force you to learn all about the languages used to express regular expressions, but rather to use them. I am providing sample code that parses something very similar to the most complex command required for your homework (the set command you need to support). You are expected to modify it to handle your set command, and to come up with regular expressions for the other commands based on the example I've provided. Note that there is lots of information on the web about regular expressions that can help you understand the language of regular expressions.
The sample code below uses the POSIX regular expression handling functions (there are others, including the GNU functions and BSD functions; feel free to use any that you find useful). The code includes lots of comments to get you started; for the assignment you will need to customize it to parse the commands entered by the user.
NOTE: For this assignment it is certainly not difficult to parse the user input using more traditional means, and we are certainly not claiming that using regular expressions is the best way (in fact this is a rather "heavyweight" approach!). You are required to use regular expressions simply to expose you to them and to get you used to figuring out how/where to get information about C libraries.
You can get lots of information about the regular expression
handling functions by looking at the following man pages:
regex and re_format.
The sample program shown below uses a regular expression to parse
argv[1], the first command line parameter specified by
the user when the code is run. The regular expression will match
any string that looks like "var = value", where var consists only of
alphabetic characters, value consists only of alphanumeric characters,
and the whitespace around the = can be missing or any amount. If the
program finds that the string entered as argv[1] is matched by
the regular expression, it prints out the variable name and value
(extracted by the regular expression). This code is also available
here: testregex.c, and you can build this
program with the following command line:
gcc -Wall -o testregex testregex.c
and then test it by running ./testregex. Note that the
regular expression functions used in the code below are part of
standard libraries, so you don't need to do anything special to tell
the compiler you want to include them. Dave will be
going over this code in some detail during class.
#include <stdio.h>
#include <stdlib.h> /* exit() */
#include <string.h> /* for strncpy() */
#include <sys/types.h> /* needed by regex */
#include <regex.h> /* regular expression library */
/* Sample of using POSIX regular expression library.
This attempts to match a regular expression to the first command line argument.
*/
/* this function extracts the part of a string that was matched by a
regular expression as indicated by the regmatch_t argument. New memory
is allocated for a copy of the matched substring and the new copy is
null terminated. This function returns NULL if the regmatch_t
indicates that no match was made */
char * get_match(regmatch_t m,const char *input) {
char *match=NULL;
int len;
/* if no match specified, return NULL */
if (m.rm_so==-1) {
return(NULL);
}
/* len is the length of the substring that was matched */
len = m.rm_eo-m.rm_so;
/* allocate enough memory for a copy of the resulting substring */
match = (char *) malloc(len + 1);
if (match==NULL) {
fprintf(stderr,"Error allocating memory in get_match\n");
exit(1);
}
/* copy the substring */
strncpy(match,input+m.rm_so,len);
/* null terminate the copy of the substring! */
match[len]=0;
return(match);
}
/* Example of using regular expression library from C.
For details on the regular expression library, try
"man regex"
*/
int main(int argc, char **argv) {
char *s;
int i;
/* here the sample regular expression is defined. For more information
about POSIX regular expressions you can use "man 7 regex" for a
complete description, or google for POSIX regex and get more than
you want...
This regular expression will match any string that looks roughly like
this: "name = value", where name consists of one or more alphabetic
characters and value of one or more alphanumeric characters. There can be
any amount of whitespace (or none) between the name and the '=', and between
the '=' and the value. There must be nothing else in the string (or it won't
match!). Here are some strings that will match: "PROMPT = Hello" "Count = 22" "fred=1234joe"
strings that won't match: " noleadingspace = allowed" "123=456"
Here is the breakdown of this regular expression:
^ matches the beginning of the string. This simply forces the
next part of the regular expression to match the first character
(otherwise there could be anything before the first alphabetic char).
[[:alpha:]]+ this matches any sequence of alphabetic characters.
the [[:alpha:]] actually says match one alphabetic character, and the +
means match at least one.
The [[:space:]]* means match any sequence of 0 or more spaces (whitespace).
the * actually means "0 or more".
The = matches '=' (only one).
[[:alnum:]]+ this matches any sequence of alphanumeric characters.
(+ means one or more).
The $ matches the end of the string. This means the string must end in
something that matches the [[:alnum:]]+ right before the $.
The parentheses are special, they don't actually match any characters
in the string, instead they tell the regular expression to "remember" the
part of the string that matched the part of the regular expression that is
in parentheses. This is actually the main reason we are using the
regular expression, we want to know what part of the string matches each
parenthesized section of the regular expression. The first parenthesized
part will be the "name" and the second will be the "value" in "name = value".
We also use the regular expression to find out if the entire string is of
the right form (if not then there will be no matches - we can say the string
is not legal).
*/
const char *regular_expression = "^([[:alpha:]]+)[[:space:]]*=[[:space:]]*([[:alnum:]]+)$";
regex_t pattbuf; /* where the 'compiled' regular expression is stored */
regmatch_t matches[10]; /* where we will get the offsets of all matches */
/* make sure we got a command line argument! */
if (argc<2) {
printf("You must supply an argument (the string to be matched).\n");
printf("For example: %s \"path = hello123\"\n",argv[0]);
exit(1);
}
/* compile the regular expression (POSIX extended regular expression syntax) */
if (regcomp(&pattbuf, regular_expression,REG_EXTENDED)) {
/* some problem with the regular expression - this is fatal... */
fprintf(stderr,"Error - pattern won't compile\n");
exit(1);
}
if (REG_NOMATCH == regexec(&pattbuf,argv[1],10,matches,0)) {
printf("No match found - illegal input\n");
} else {
/* some matches found - print them out */
/* first match is for the whole string, we don't care about that one!
remaining matches are for the parts of the regular expression
that are in parentheses */
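i=1;
/* stop at the first group that did not match, and never read past
the end of the 10-element matches array */
while (i < 10 && (s = get_match(matches[i],argv[1])) != NULL) {
printf("Match %d: <%s>\n",i,s);
free(s);
i++;
}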
}
/* free up the compiled regular expression */
regfree(&pattbuf);
return(0);
}
- Project Requirements
The following are the requirements for the project:
Your program must compile and run on the CS department FreeBSD
machines (freebsd.remote.cs.rpi.edu).
Your program must use the readline library to get input from the user. The user must be able to scroll back through the previous commands they have entered (since the program was started - no persistence between runs is expected).
Your program must correctly set and display the value of environment variables, including those inherited by your program from the shell (variables like PATH and HOME).
Your submission must include a file named README that includes the following information:
Grading: Grades will be based on the formula below. Note that to get full credit we must be able to understand your code (it must be commented!)
| Weight | Criterion |
|---|---|
| 30% | Proper handling of environment variables (can be set, deleted and printed) |
| 30% | Use readline properly |
| 20% | Use regular expressions to do at least some parsing (you don't have to use regular expressions for everything, just prove you can use them to do anything useful.) |
| 20% | Code quality (comments, organization, how hard is it to understand?). |
You can get partial credit for any part (for example if you don't get all the commands working properly).
If your code does not compile and run under FreeBSD on the CS machines, you will lose at least 50% (the remaining 50% partial credit will be awarded based on visual inspection of the code).
- How to Submit
Log in to WebCT at webct.rpi.edu using your RCS id and password. Once you get to MyWebCT, click on "Operating Systems", and from there go to the homework drop boxes. Submit your files (individually, zipped or tarred) to the drop box labeled HW1.
- Resources
General Unix information (commands, etc):
Unix and C programming:
Libraries
Make