EIW Fall 2004 Lecture Notes

Perl Introduction


Our Focus

We will explore a small subset of the perl language, specifically the parts we need to write CGI programs. For anyone who wants to learn about the rest of perl there are many books and WWW resources that describe the entire language.

Perl History

Perl was developed by Larry Wall as a replacement for AWK (a text processing language available on Unix systems since the beginning of time). Although Perl has been around since 1987, widespread use of the language coincided with the growth of the Internet. Perl has changed over the years: not just the base of available code and libraries, but the language itself has changed to keep up with the needs of programmers.

Perl was originally developed as a tool for report generation; the name PERL stands for "Practical Extraction and Report Language". Perl can now do much more: the addition of systems programming facilities means that perl can be used to develop applications that require access to system services (including network applications).

Compiled vs. Interpreted Languages

With an interpreted language such as (traditional) Basic, a program (a text file with a series of commands) is read by an interpreter program, and the interpreter parses each command and acts accordingly. C and C++ are compiled languages: a program called a compiler converts text files containing C/C++ programs into a form that the processor can execute directly (machine language).

Perl is actually somewhere between an interpreted and a compiled language: a perl script is completely parsed and compiled into an internal form before any statements are executed. This means that there is no compiled machine code that must be stored in a file (as with a compiled language), although it also means that there is some delay while the compilation takes place. Once the compilation phase is complete the resulting code is executed, so the script runs fast (there is no need to parse each expression again once the compilation is done).

Perl also supports the evaluation of code that is created at run time (statements in the perl script build new chunks of perl code and ask for them to be evaluated), so the perl system includes facilities for compiling new expressions once execution of the script has started.
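A minimal sketch of the run-time compilation described above: the script builds a string of Perl source code while it is running and hands it to eval, which compiles and runs it on the spot. The variable names are made up for illustration, and the sketch declares variables with my because it runs under use strict.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

# Build a fragment of Perl source code at run time...
my $op   = '*';
my $code = "6 $op 7";          # the string "6 * 7"

# ...then ask perl to compile and evaluate it now.
my $result = eval $code;
die "eval failed: $@" if $@;   # $@ holds any compile or run-time error

print "$code = $result\n";     # prints "6 * 7 = 42"
```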

Although perl is not a true interpreted language, the perl executable is often referred to as "the perl interpreter".

A simple Perl Script

Below is a simple perl script:

print("Hello World\n");

The perl print function sends a string to STDOUT. Although the above example shows the argument inside parentheses, perl lets you omit the parentheses when calling functions, so this is also acceptable:

print "Hello World\n";

Running a Perl Script

To run a perl script you need to get and install the perl distribution on your PC. The ActiveState Perl distribution for Windows is the latest and most complete port of perl to Windows (there are other ports). You need to know where the perl program itself is installed; chances are it is something like C:\PERL\BIN\PERL. You may want to add the directory containing the perl executable to your PATH so you don't have to type the full name each time. I did this by adding C:\PERL\BIN to the PATH set in the file C:\autoexec.bat. Windows NT/2000 users can add perl to the path using the System settings (environment variables) in the Control Panel.

To run a perl script that you've saved in the file "foo.pl" you would type the following at the DOS command line: perl foo.pl (or c:\perl\bin\perl foo.pl if you didn't change your DOS PATH). This will start the perl interpreter (the program perl.exe) and give it your script as input. If there are syntax problems with your script, perl will print out some error messages and quit. If your script is OK, perl will go ahead and run it, and any print statements will result in output sent to the screen. Below is an example of running the perl script shown above (at a DOS prompt):

C:\DLH\IT\units\5-Perl>perl foo.pl
Hello World

Perl Variables - Scalars

The simplest type of perl variable is a scalar. A scalar variable holds a single value, but that value can be of any kind. In other programming languages there are different kinds of variables for holding integers, floating point numbers, and strings; in perl there is just one kind of scalar variable and it can hold any of these.

Every variable has a name made up of alphanumeric characters and the underscore character '_'; the first character of the name should be a letter or an underscore (names that start with a digit, such as $1, are reserved for special variables). Additionally, scalar variable names all start with the '$' character. The following are valid perl scalar variables: $foo, $foo_blah, $foofoofoofoofoofoofoofoofoo.

Assigning a value to a scalar variable in perl looks just like C/C++. Here are some examples:

$pi = 3.141593;
$foo = "foo";
$Foo = 27;

Like C/C++, each statement ends with a semicolon. Unlike C/C++, you can assign a string constant to a variable. Remember that perl variables are not associated with any specific data type: you can assign numbers or strings to any scalar variable. Variable names are also case sensitive, so $foo and $Foo above are two different variables.
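A small sketch of a single scalar holding a number and then a string, plus two variables whose names differ only in case (perl treats them as different variables). The name $thing is illustrative; my and use strict are not covered in these notes but keep the sketch tidy.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $thing = 27;          # holds a number...
$thing = $thing + 1;     # ...which we can do arithmetic on (now 28)
$thing = "foo";          # the same variable now holds a string

my $foo = "foo";
my $Foo = 27;            # a different variable: names are case sensitive

print "$thing $foo $Foo\n";   # prints "foo foo 27"
```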

There are other kinds of perl variables; for example, there are list variables that are like arrays in C/C++. While scalar variables hold a single value, list variables hold multiple values. Whenever we refer to a perl scalar we are talking about a simple constant or variable that can hold a single value (numeric or string).

Constants (Literals)

We've already seen some Perl constants; for example, the number 3.141593 and the string "foo" are scalar constants. Perl supports the same notation for floating point constants as C/C++, so you can have constants that look like 6.02E23 and -1.5E-3. Integer constants are kinda obvious, although you have to make sure you don't start an integer constant with a 0, as perl takes this as a signal that the constant is written in octal (base 8), or in hexadecimal (base 16) if the 0 is followed by an x. If you don't know about octal or hexadecimal representation don't worry; just don't ever start an integer constant with a 0.
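A short sketch of the numeric constant forms mentioned above, including the leading-0 trap. The last line shows that the octal rule applies only to constants written in the script itself: a string such as "010" converts to ten.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $decimal  = 10;        # ordinary decimal integer
my $octal    = 010;       # leading 0 means octal, so this is 8
my $hex      = 0x1F;      # leading 0x means hexadecimal, so this is 31
my $avogadro = 6.02E23;   # scientific notation, as in C
my $small    = -1.5E-3;   # -0.0015

my $from_string = "010" + 0;   # 10: strings are not treated as octal

print "$decimal $octal $hex $small $from_string\n";   # prints "10 8 31 -0.0015 10"
```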

String constants can be enclosed in either single or double quotes, so these are both string constants: 'I am a string' and "I am a string in double quotes". However, if you are a C/C++ programmer used to embedding newlines \n or tabs \t in string constants, you need to use double quotes, since perl doesn't interpret backslash escapes such as \n inside a singly quoted string (the only sequences it recognizes there are \' for a single quote and \\ for a backslash). For example, the following string constants are different:

'Hello\n' "Hello\n"

The first string (in single quotes) has 7 characters; the last two characters are '\' and 'n'. The second string has 6 characters, the last being a newline represented by \n. The following table shows some of the special backslash-escaped characters recognized by perl (in double quoted strings):

Sequence   Meaning
\n         Newline
\r         Carriage return
\t         Tab
\a         Bell
\\         Backslash
\"         Double quote
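A quick way to check the character counts given above is perl's built-in length function, which returns the number of characters in a string. The last line exercises a few of the escapes from the table.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $single = 'Hello\n';   # backslash and n are two ordinary characters
my $double = "Hello\n";   # \n is a single newline character

print length($single), "\n";   # prints 7
print length($double), "\n";   # prints 6

print "Tab:\there, quote: \", backslash: \\\n";
```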

Perl Mathematical Operators

Perl supports the usual set of mathematical operators so you can do stuff like this:

$y = $m * $x + $b;
$radians = $degrees * (3.141593/180.0);
$seconds_per_year = 365 * 24 * 60 * 60;

In addition to the operators +, -, / and *, perl supports the ** exponentiation operator (just like Fortran), so the expression $y**2 is $y to the 2nd power and 10**1.87 is 10 to the power 1.87.

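Perl also supports the % (modulo) operator, just like C/C++. With negative operands the result can differ from C: in perl the result takes the sign of the right operand, so -7 % 3 is 2.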
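A small sketch exercising the exponentiation and modulo operators, including the negative-operand case. The variable names are illustrative.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $square = 3 ** 2;         # exponentiation: 9
my $rem    = 17 % 5;         # modulo: 2
my $negrem = -7 % 3;         # 2 in perl: sign follows the right operand
my $seconds_per_year = 365 * 24 * 60 * 60;   # 31536000

print "$square $rem $negrem $seconds_per_year\n";   # prints "9 2 2 31536000"
```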

Perl String Operators

Concatenation

Perl supports a string concatenation operator that combines two strings. The symbol used for this operator is a single period (.). Here are some examples that show this operator in action:

$myname = "Dave" . " " . "Hollinger";
$myname = $first . $blank . $last;
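The second example above uses variables the notes never set. A self-contained version, with $first, $blank and $last given illustrative values, also shows that a number is converted to a string when it is concatenated.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $first = "Dave";
my $blank = " ";
my $last  = "Hollinger";

my $myname = $first . $blank . $last;   # "Dave Hollinger"
my $label  = "Room " . 101;             # the number becomes a string: "Room 101"

print "$myname\n$label\n";
```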

Repetition

Perl also supports a string repetition operator. The symbol used for the repetition operator is x (the letter x). The string to be repeated is on the left of the operator and an integer repetition count is on the right, as in the following examples:

Expression        Value
"M" x 4           "MMMM"
"Hello" x 2       "HelloHello"
"joe" x (5-2)     "joejoejoe"
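One handy use of repetition is drawing a line of dashes as long as a title; the repetition count can be any numeric expression, including a function result such as length. The names here are illustrative.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $title     = "Results";
my $underline = "-" x length($title);   # one dash per character: "-------"
my $count     = 3;
my $cheer     = "hip " x $count;        # "hip hip hip "

print "$title\n$underline\n";
print $cheer . "hooray\n";              # prints "hip hip hip hooray"
```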

Perl Comparison Operators

Perl supports the typical set of comparison operators, in two versions: one set for numeric comparisons and one for string comparisons. Since scalar data can be either string or numeric, you have to tell perl whether to use a numeric or a string comparison operator! All the comparison operators result in a value of True or False (more on how perl represents this later). The following table shows both sets of comparison operators:

Comparison                 Numeric Operator   String Operator
Equal                      ==                 eq
Not equal                  !=                 ne
Less than                  <                  lt
Greater than               >                  gt
Less than or equal to      <=                 le
Greater than or equal to   >=                 ge
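A sketch showing why the choice of operator matters: the same two values can be equal as numbers but different as strings, and their ordering can flip. The ? : conditional operator works as in C and is used here only to print a readable answer.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $ten     = 10;
my $ten_str = "10.0";

print "== says ", ($ten == $ten_str ? "equal" : "different"), "\n";   # equal: 10 == 10.0
print "eq says ", ($ten eq $ten_str ? "equal" : "different"), "\n";   # different: "10" ne "10.0"

print "9 < 10 is ",  (9 < 10  ? "true" : "false"), "\n";   # true
print "9 lt 10 is ", (9 lt 10 ? "true" : "false"), "\n";   # false: "9" comes after "1"
```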

String Comparison Operators

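When comparing strings, perl uses the ASCII value of each character as the basis of comparison. The first characters of the two strings are compared; if they are different, the string whose first character has the greater ASCII value is "greater than" the other. If they are the same, perl moves on to the second characters, and so on. If one string runs out of characters first (it is a prefix of the other), the shorter string is "less than" the longer one. You don't really need to know the ASCII value of each character to understand this, since 'a' is less than 'b' is less than 'c' (and so on). Keep in mind, though, that in ASCII all the uppercase letters come before all the lowercase letters, so "Zebra" is less than "apple".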

NOTE: '0' is less than '1' is less than '2', ... This means that "1876" is less than "4".
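A sketch that checks each of the string ordering rules described above. Every string comparison below is true, while the numeric comparison of the same digits is false.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $t1 = ("apple" lt "banana");   # true: 'a' comes before 'b'
my $t2 = ("abc"   lt "abd");      # true: first difference is 'c' vs 'd'
my $t3 = ("abc"   lt "abcd");     # true: a prefix is less than the longer string
my $t4 = ("Zebra" lt "apple");    # true: uppercase letters come before lowercase
my $t5 = ("1876"  lt "4");        # true: '1' comes before '4'
my $t6 = (1876    <  4);          # false: compared as numbers

print "string test: ",  ($t5 ? "true" : "false"), "\n";   # true
print "numeric test: ", ($t6 ? "true" : "false"), "\n";   # false
```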

Operator Precedence and Associativity

Perl numeric and comparison operators follow the same precedence rules as in C/C++. The repetition operator x has higher precedence than the string concatenation operator . (x binds like *, and . binds like +), and both are left associative. Parentheses can be used to force the order of evaluation (as in C/C++).
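A sketch of the precedence rules in action. The ** operator, which C does not have, is right associative, so a chain of powers is evaluated from the right.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $s1 = "a" . "b" x 3;      # x binds tighter: "a" . "bbb" gives "abbb"
my $s2 = ("a" . "b") x 3;    # parentheses force the concatenation first: "ababab"
my $n  = 2 + 3 * 4;          # 14, as in C
my $p  = 2 ** 3 ** 2;        # right associative: 2 ** 9 = 512

print "$s1 $s2 $n $p\n";     # prints "abbb ababab 14 512"
```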

The Assignment Operator

We have already seen how to assign a value to a scalar variable using the = assignment operator. In addition to changing the value of a variable, an expression involving the assignment operator itself has a value. So, just like the expression 2+3 has the value 5, the expression $x = 2 + 3 has a value. The value of an assignment expression is the variable that was just assigned (which now holds the new value). This makes it possible to do things like this:

$x = $y = $z = 0.0;
$x = ($y = $z + 2);

Assignment is right-associative, so the first example above is the same as $x = ($y = ($z = 0.0)).
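A sketch that runs the two examples above and prints the results. The last statement borrows a form shown in the perl documentation: because an assignment yields the variable itself, another operator (here *=, multiply-and-assign, as in C) can be applied to it directly.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my ($x, $y, $z);
$x = $y = $z = 0.0;      # right associative: $z gets 0, then $y, then $x
$x = ($y = $z + 2);      # $y becomes 2, and that value is also stored in $x

($z = 5) *= 2;           # assign 5 to $z, then double $z itself: 10

print "$x $y $z\n";      # prints "2 2 10"
```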

Data conversion

Perl automatically converts values (variables or constants) between numeric and string depending on the context. For example, if an expression tries to apply the numeric addition operator + to a string the string will first be converted to a number. The following example expressions involve automatic data conversions:

2 * "3.141593"   The string is converted to a number before the multiplication.
(117 lt 23)      Both numbers are converted to strings (the string comparison operator forces this). The result is true, since "117" lt "23" when compared as strings.

When converting from a string to a number, perl ignores any leading whitespace and any trailing non-numeric characters. For example, the string " 0.35HiJoe" would be converted to the number 0.35. If nothing at the start of the string looks like a number, the conversion results in the value 0. For example, the expression 12 * "HiDave" has the value 0, since "HiDave" converts to 0.
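A sketch of the conversions described above. The numeric warnings category is switched off so the output stays clean; delete that line to see perl report each string that "isn't numeric".

```perl
use strict;    # require variables to be declared with "my"
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demo

my $a1  = 2 * "3.141593";       # 6.283186
my $a2  = " 0.35HiJoe" + 0;     # leading space skipped, "HiJoe" ignored: 0.35
my $a3  = 12 * "HiDave";        # no number at the start, so 0
my $a4  = "3 apples" + "4 pears";   # 7
my $cmp = (117 lt 23);          # true: "117" lt "23" as strings

print "$a1 $a2 $a3 $a4\n";      # prints "6.283186 0.35 0 7"
```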

Potential Troubles

Since perl does this data conversion automatically, and doesn't warn you that it's doing anything unless you turn on warnings (run perl with the -w switch, or put use warnings at the top of the script), you can easily make simple mistakes that are very hard to find. Here is an illustration of a common mistake:

 
$x = "GET";
if ($x == "POST") {
	# handle a CGI POST method here...
}

Since the numeric comparison operator == is used in the above expression, each of the strings is converted to a number, and both have the numeric value 0, so they are (numerically) equal. The author of the above code probably wanted the following:

 
$x = "GET";
if ($x eq "POST") {
	# handle a CGI POST method here...
}

This code uses the string comparison operator eq.
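A sketch putting the buggy and the corrected comparison side by side. With use warnings turned on, perl reports that "GET" and "POST" aren't numeric when it reaches the == test (on STDERR), which is a good reason to enable warnings while developing CGI scripts.

```perl
use strict;    # require variables to be declared with "my"
use warnings;  # makes perl report the bad numeric comparison

my $x = "GET";

if ($x == "POST") {          # WRONG: both strings become 0, so this is true
    print "== thinks GET and POST are equal\n";
}

if ($x eq "POST") {          # RIGHT: string comparison, false here
    print "eq thinks GET and POST are equal\n";
} else {
    print "eq knows GET is not POST\n";
}
```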

Exercises

What is the value of each of the following expressions?

  1. "HelloWorld" x "1"
  2. 17 + "13"
  3. 17 + "thirteen"
  4. ("Senator" . " " . "Hillary") == "Senator Hillary"
  5. ("Senator" . " " . "Hillary") eq "Senator Hillary"
  6. "987654321" gt "9871654321"
  7. 10 x 2
  8. 10 . 2

Variable Interpolation in Double Quoted Strings

We've already seen that inside double quoted strings perl interprets some character sequences as special; for example, the sequence "\n" means a newline. Perl also does variable interpolation of scalar variables inside doubly quoted strings (but not inside singly quoted strings). Interpolation means that the occurrence of the variable name is replaced by the value of the variable. For example, suppose we have a variable named $college that currently has the value "RPI". The string "I go to $college" would become "I go to RPI" since perl finds the variable $college inside a doubly quoted string.
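A sketch of the $college example, contrasting double quotes (interpolated) with single quotes (taken literally, including the backslash and n).

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $college = "RPI";
my $double  = "I go to $college\n";   # interpolated: "I go to RPI" plus a newline
my $single  = 'I go to $college\n';   # literal: the characters $college\n

print $double;
print $single, "\n";
```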

Potential Problem with variable interpolation

Suppose I have the following variables and corresponding values:

Variable Name   Current Value
$num            17
$number         23

What does perl do with the string "The result is $number"?

Perl uses the longest variable name it can match - so in the above case it would use the variable $number and not the variable $num. The resulting string would therefore be "The result is 23". We could force perl to use the variable $num by putting curly braces around the variable name: "The result is ${num}ber", in this case the resulting string would be "The result is 17ber".
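A sketch reproducing both strings from the paragraph above: without braces perl picks the longest matching name, and the braces end the name early.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $num    = 17;
my $number = 23;

my $s1 = "The result is $number";    # longest name wins: uses $number
my $s2 = "The result is ${num}ber";  # braces end the name: uses $num

print "$s1\n$s2\n";   # prints "The result is 23" then "The result is 17ber"
```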

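If you want to create a string that has a $ before something that might be a variable name, you must do something to tell perl not to do variable interpolation. For example, suppose you want to print out the string "The variable name is $foo" and want exactly that string, not the string that would result from variable interpolation. There are a couple of ways to deal with this problem:

  1. Use single quotes, since perl does not interpolate inside singly quoted strings: 'The variable name is $foo'
  2. Put a backslash in front of the $ in a double quoted string: "The variable name is \$foo"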
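A short sketch comparing the two common ways to keep a $ literal: single quotes, and a backslash before the $ inside double quotes. Both produce exactly the same string.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

# Single quotes: no interpolation happens at all.
my $s1 = 'The variable name is $foo';

# Double quotes with a backslash in front of the $.
my $s2 = "The variable name is \$foo";

print "$s1\n$s2\n";   # both lines read: The variable name is $foo
```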

Reading from STDIN

You can read a scalar value (one line of text) from standard input (typically the terminal in which you are running your perl program) using the cryptic notation <STDIN>. You can use <STDIN> anywhere in an expression where it would be valid to put a scalar constant or variable reference. Some examples:

$foo = <STDIN>;
$x = 17 + <STDIN>;

Perl reads an entire line of input (stops reading once it hits a newline) each time it sees <STDIN> in a script. If there is no input available the perl program will wait for a complete line (it will also stop trying to read once it sees an End of File marker). If you have a perl script like this:

print "Enter your age\n";
$age = <STDIN>;
print "Enter your weight\n";
$weight = <STDIN>;

and the user types the line 17 200, the variable $age will get the value "17 200" (followed by a newline, as explained below) and the program will wait for another line to assign a value to the variable $weight.

<STDIN> newlines and chop

The <STDIN> input operator returns an entire line including the newline character. If you are reading a number this doesn't matter since when interpreting the string as a number perl will ignore any trailing stuff in the string that is not numeric. So this will work:

print "Enter your age\n";
$age = <STDIN>;
print "In 20 years you will be ";
print $age+20;
print " years old\n";

Assuming the user types a single numeric value followed by a newline (by pressing the Enter key), this script will do what we want. The reason that $age+20 works is that perl converts the string assigned to the variable $age to a number when it sees the mathematical plus operator.

When reading strings we often need to get rid of the newline that perl leaves for us. The chop function will chop the last character of a string off and throw it away - leaving everything except the last character. If we don't use chop in the following example we won't get what we want:

print "Enter your first name\n";
$fname = <STDIN>;
print "Enter your last name\n";
$lname = <STDIN>;
print "You are $fname $lname\n";

If the user types Dave and presses Enter, then types Hollinger and presses Enter, the output of the program will look like this (Dave and Hollinger are the lines typed by the user):

Enter your first name
Dave
Enter your last name
Hollinger
You are Dave
Hollinger

We can use chop to get rid of the newlines and get what we want:

print "Enter your first name\n";
$fname = <STDIN>;
chop $fname;           # remove the newline from the first name
print "Enter your last name\n";
$lname = <STDIN>;
chop($lname);          # remove the newline from the last name
print "You are $fname $lname\n";

This would produce the following:

Enter your first name
Dave
Enter your last name
Hollinger
You are Dave Hollinger

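You can also combine the two steps with something like chop($name=<STDIN>). This works because the value of an assignment is the variable that was assigned, so chop removes the newline from $name itself.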
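A sketch of chop on some sample strings (string constants stand in for <STDIN> so the sketch runs without typing anything). It also shows chomp, a related perl built-in not covered in these notes, which removes a trailing newline only if one is there; chop removes the last character no matter what it is.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

my $line = "Dave\n";
chop($line);              # removes the last character, the newline: "Dave"

my $no_newline = "Dave";  # e.g. a last line typed without pressing Enter
chop($no_newline);        # chop removes the "e" anyway: "Dav"

my $safe = "Dave";
chomp($safe);             # chomp only removes a trailing newline: still "Dave"

my $name;
chomp($name = "Hollinger\n");   # assign and strip in one step: "Hollinger"

print "$line $no_newline $safe $name\n";   # prints "Dave Dav Dave Hollinger"
```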

Perl Printf (for C programmers)

The perl printf function is very similar to the C printf function: it creates an output string and sends it to STDOUT. Nearly all the printf formatting options from C are available in perl, so you can do stuff like this:

printf("Hello %s World %d %f\n", "cruel", 11, 1.5);
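A sketch of a few more C-style format options, plus perl's sprintf, which takes the same arguments as printf but returns the formatted string instead of printing it. The values are illustrative.

```perl
use strict;    # require variables to be declared with "my"
use warnings;

printf("Hello %s World %d %f\n", "cruel", 11, 1.5);
# prints "Hello cruel World 11 1.500000"

# Left alignment, width, precision and zero padding work as in C.
printf("%-8s|%6.2f|%03d\n", "pi", 3.14159, 7);
# prints "pi      |  3.14|007"

# sprintf returns the formatted string instead of printing it.
my $price = sprintf("\$%.2f", 4.5);   # "$4.50"
print "$price\n";
```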