The ABCs of Perl Statements

I chose option two.
The script
Listing A shows the script I wrote. Rather than hammer on
the algorithm and design of the program here, I’m going to walk through the
script as an introduction to the major statements in Perl 5. I wrote the script
on a Windows 98 machine using ActivePerl
from ActiveState. The script took
roughly three hours to write and test, requiring about an hour to get the basics
and the rough algorithm laid out and another two hours to tweak until it
successfully converted the data.
This is a quick-and-dirty script, but it does illustrate the key functions and
constructs of Perl.
$input = "Inventory.txt"; #set $input to be the name of the delimited file
$output = ">inv.txt"; #set $output to the name of the output file
This first snippet shows how to define a string variable in Perl (e.g.,
$variable_name = "string";). Perl statements are terminated by a
semicolon (;). Comments begin with a pound sign (#).
open (INV, $input); #open $input for reading
open (OUTFILE, $output); #open Inv.txt for writing
The open statement is used to open files for reading, writing, or
appending. The first argument in an open statement is the name of the
filehandle that is going to be used as an argument for future statements. (A
filehandle is a data structure through which a Perl script can access a file.)
The second argument is the variable that contains the string of the filename.
When I defined $output, I used > to tell the interpreter to open Inv.txt for
writing. To open a file for appending, use >>.
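The three modes can be sketched as follows. This is a minimal illustration, not the script's own code: it uses the three-argument form of open with lexical filehandles (a common alternative to the two-argument form above), and the scratch file demo.txt is a hypothetical name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file; the directory is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";

# ">" creates or truncates the file and opens it for writing.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close($out);

# ">>" opens the file for appending, keeping what is already there.
open(my $app, '>>', $path) or die "Cannot append to $path: $!";
print $app "second line\n";
close($app);

# "<" opens the file for reading.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close($in);

print scalar(@lines), " lines\n";   # the file now holds both lines
```

Checking the return value of open with "or die" reports a missing or locked file immediately instead of failing silently later.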
while(<INV>) { #process the input file
The while statement tests for a condition before executing the code in
braces. It will loop until the condition is false. In this case, the loop is
executed until the end of the file is reached.
$offset =0; #the first 30-character field offset
$TheLine = $_; #assign the current line to $TheLine
The first of these lines initializes $offset, an integer we will need when formatting the output string. The second line assigns the current line of the input file to $TheLine. Perl uses $_ as the default input and pattern-matching variable. In this case, it was set by the while (<INV>) condition, which reads one line into $_ on each pass through the loop.
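To see the while loop and $_ working together in isolation, here is a small sketch. It assumes made-up sample data and reads it through an in-memory filehandle rather than a real Inventory.txt.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for Inventory.txt.
my $data = "widget\t4\ngadget\t7\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $count = 0;
while (<$fh>) {          # each pass reads one line into $_
    my $line = $_;       # copy it, as the script does with $TheLine
    $count++;
    print "line $count: $line";
}
close($fh);              # the loop ended because no lines were left
```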
chomp($TheLine); #remove the newline character
The chomp statement is useful for removing line-terminating characters,
such as newlines.
@cols = split("\t",$TheLine); #split on tabs
This statement introduces two indispensable facets of Perl: arrays and the
split function. Array names begin with @, and an array is assigned a list of
elements. Here, split takes two arguments: the pattern to split on (a tab) and
the string to be split. It returns the resulting list of fields.
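A short sketch of chomp and split on a single record, assuming a hypothetical three-field line:

```perl
use strict;
use warnings;

# Hypothetical tab-delimited record ending in a newline.
my $line = "1001\tBolt\t25\n";

chomp($line);                      # strips the trailing newline, if any
my @cols = split(/\t/, $line);     # the first argument is a pattern

print scalar(@cols), " fields\n";  # number of fields found
print "$cols[1]\n";                # the second field
```

Without the chomp, the newline would remain attached to the last field.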
$splitme = $cols[6]; #assign the string in the 7th column to $splitme (the item description)
This statement demonstrates how to point to an element in an array. The array is indexed from 0 to (N-1), where N is the number of elements.
@splitup = split('',$splitme); #turn string into an array of characters
if ($#splitup >= 30) { #if more than 30 characters
The second line here introduces the if statement, our second conditional
construct after while. The code in the braces following the if statement is
executed only if the condition in the parentheses is met. Perl provides
$#array_name as the index of the last element in the array, so $#splitup >= 30 is true when the description has more than 30 characters.
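The difference between the last index and the element count is easy to get wrong, so here is a small sketch using a hypothetical four-character string:

```perl
use strict;
use warnings;

my $text  = "Perl";                 # hypothetical description string
my @chars = split('', $text);       # one element per character

my $last  = $#chars;                # index of the last element
my $count = scalar(@chars);         # number of elements

if ($last >= 30) {
    print "more than 30 characters\n";
}
print "last index $last, count $count\n";
```

Because indexing starts at 0, the last index is always one less than the count.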
for($i = 0; $i <= 10; $i++) { #find the first space within the last 10 characters
Now I have introduced the for loop. The structure of for loops in
Perl is similar to that of for loops in C, C++, or Java. They follow
this syntax: for (initialization; condition; increment) { ... }. The loop
keeps running as long as the condition is true.
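As an illustration of the three parts of a for loop, the sketch below walks backward through an array of characters looking for a space. The sample string and the variable names are hypothetical, and the search starts from the end of the string rather than from a fixed offset.

```perl
use strict;
use warnings;

my @chars = split('', "steel hex bolt");   # hypothetical description
my $found = -1;                             # -1 means no space found

# Walk backward from the last index, up to 11 steps, looking for a space.
for (my $i = 0; $i <= 10; $i++) {
    my $pos = $#chars - $i;
    last if $pos < 0;                       # ran off the front of the array
    if ($chars[$pos] eq ' ') {
        $found = $pos;
        last;                               # stop at the first space seen
    }
}
print "space at index $found\n";
```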
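if ($splitup[(30 - $i)]=~ /\W/) { #check for a non-word character such as a space
The condition in this if statement is the next new concept we’ll focus on. Here, I have used a regular expression for pattern matching. This expression returns true if the array element, a single character, is a non-word character: \W matches anything other than a letter, digit, or underscore, which includes whitespace but also punctuation. (To match whitespace only, use \s.) Pattern-matching conditions are written with the following syntax: $string =~ /pattern/.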
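The following sketch applies the =~ operator to three hypothetical one-character strings and records which of them match \W (any non-word character) and which match \s (whitespace only):

```perl
use strict;
use warnings;

my @samples = (' ', '-', 'a');    # space, punctuation, word character
my %result;

for my $ch (@samples) {
    $result{$ch} = {
        nonword => ($ch =~ /\W/ ? 1 : 0),   # not a letter, digit, or _
        space   => ($ch =~ /\s/ ? 1 : 0),   # whitespace only
    };
    print "'$ch': \\W=$result{$ch}{nonword} \\s=$result{$ch}{space}\n";
}
```

The hyphen matches \W but not \s, which is why the choice of character class matters when a description contains punctuation.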
The last line in Listing B introduces the join function. Join is basically the opposite of split. Join takes two arguments, the separator and the array of elements. Join returns a string that is constructed by joining the elements of the array and placing the separator between each element.
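A quick sketch of join, with split applied afterward to show the round trip. The field values are hypothetical.

```perl
use strict;
use warnings;

my @cols = ('1001', 'Bolt', '25');     # hypothetical fields
my $line = join("\t", @cols);          # separator goes between elements

my @back = split(/\t/, $line);         # splitting recovers the same fields
print scalar(@back), " fields after the round trip\n";
```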
If you look toward the end of Listing C, you'll find the first else statement in the script. The block following else is executed only if the condition of the if statement that precedes it was not met.
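In its simplest form, the if/else pairing looks like the sketch below. The description string and the $verdict variable are hypothetical and do not come from the listings.

```perl
use strict;
use warnings;

my $desc = "steel hex bolt";          # hypothetical item description
my $verdict;

if (length($desc) > 30) {
    $verdict = "needs splitting";     # runs only when the condition is true
} else {
    $verdict = "fits in one field";   # runs only when the condition is false
}
print "$verdict\n";
```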
if ($#splitup <= 0){ $newguy = "\t\t\t";} #if no description is given make 3 empty fields
$cols[6] = $newguy; #insert 3 field description into array
$newline = join ("\t",@cols); #create output string
print OUTFILE $newline . "\n"; #write output string to file.
Now we are finally writing data. The print function is used to print a
string. If no filehandle is given, it prints to standard output (STDOUT). So in
this statement, I'm writing to my output file one line at a time as each line is
created. You can concatenate strings in Perl using a period.
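The sketch below shows print with and without a filehandle, along with string concatenation. To stay self-contained it writes to an in-memory buffer instead of a real output file, which is an assumption made only for this example.

```perl
use strict;
use warnings;

my $buffer = '';                                  # stands in for the output file
open(my $out, '>', \$buffer) or die "Cannot open buffer: $!";

my $newline = join("\t", 'a', 'b');               # hypothetical output fields
print $out $newline . "\n";                       # goes to the filehandle
close($out);

print "wrote: " . $buffer;                        # no filehandle: goes to STDOUT
```

Note that there is no comma between the filehandle and the string being printed.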
}#end while
close OUTFILE; #close the output file
Here, the end of the file was reached and we close our output file so that it gets written from memory to disk. The close statement takes one argument, the filehandle it is to close.
Summary
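To recap, here are the elements I used in writing the script: string variables and comments, open and close, the while loop and $_, chomp, arrays and split, if and else, the for loop, pattern matching, join, and print.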
Perl is easy to learn, and there are many places on the Web that offer useful tutorials and documentation. Two great places to start are learn.perl.org and O’Reilly’s perl.com.