The ABCs of Perl Statements

What should you do when you are tasked with formatting a horrendous amount of data from one application to another? You write a Perl script to do it for you, of course! Learn about the basics of writing Perl statements with this script walk-through.
Written by Howard Shaw, Contributor
Being a hired-gun developer, I inherit all kinds of client problems. Recently, I was tasked with exporting a delimited flat file from one application and formatting it so it could be imported into another application. The data had 3,000 inventory items, and the item description was in a variable-length field that needed to be turned into three 30-character fields. I had two options: use a text editor and edit 3,000 entries or write a Perl script to do it for me.

I chose option two.

The script
Listing A shows the script I wrote. Rather than hammer on the algorithm and design of the program here, I’m going to walk through the script as an introduction to the major statements in Perl 5. I wrote the script on a Windows 98 machine using ActivePerl from ActiveState. The script took roughly three hours to write and test, requiring about an hour to get the basics and the rough algorithm laid out and another two hours to tweak until it successfully converted the data.

This is a quick-and-dirty script, but it does illustrate the key functions and constructs of Perl.
$input = "Inventory.txt"; #set $input to be the name of the delimited file
$output = ">inv.txt"; #set $output to the name of the output file

This first snippet shows how to define a string variable in Perl (e.g., $variable_name = "string";). Perl statements are terminated by a semicolon (;). Comments begin with a pound sign (#).
open (INV, $input); #open $input for reading
open (OUTFILE, $output); #open inv.txt for writing

The open statement is used to open files for reading, writing, or appending. The first argument in an open statement is the name of the filehandle that is going to be used as an argument for future statements. (A filehandle is a data structure through which a Perl script can access a file.) The second argument is the variable that contains the string of the filename. When I defined $output, I used > to tell the interpreter to open inv.txt for writing. To open a file for appending, use >>. With no prefix, as with $input, the file is opened for reading.
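As a minimal sketch of the three modes, the example below uses the three-argument form of open with lexical filehandles and an "or die" check. None of that is in the original script: it assumes Perl 5.6 or later, and the scratch file path is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch location so the sketch does not touch real data.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/inv.txt";

# '>' creates the file, or truncates it if it already exists.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close($out);

# '>>' appends; the existing content is kept.
open(my $app, '>>', $path) or die "Cannot append to $path: $!";
print $app "second line\n";
close($app);

# '<' opens the file for reading.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close($in);

my $count = @lines;
print "$count lines read\n";
```

Checking the return value of open matters in a conversion script like this one: without it, a mistyped filename produces an empty output file rather than an error.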
while(<INV>) { #process the input file

The while statement tests for a condition before executing the code in braces. It will loop until the condition is false. In this case, the loop is executed until the end of the file is reached.
$offset =0; #the first 30-character field offset
$TheLine = $_;  #assign the current line to $TheLine

The first of these lines initializes a numeric variable, $offset, that we will need in formatting an output string. The second line assigns the current line of the input file to $TheLine. Perl uses $_ as the default input and pattern-matching variable. In this case, it was assigned by the while statement.
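The way while fills $_ can be sketched as follows. To stay self-contained, this example reads from an in-memory filehandle (assuming Perl 5.8 or later) rather than a real file, and the sample data and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the input file.
my $data = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @seen;
while (<$fh>) {            # each pass reads one line into $_
    my $line = $_;         # copy it, as the script does with $TheLine
    push @seen, length($line);
}
close($fh);                # the loop ended because <$fh> hit end of file

print "line lengths: @seen\n";
```

Each length includes the trailing newline, which is why the script removes it in the next step.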

chomp($TheLine); #remove the newline character

The chomp statement is useful for removing line-terminating characters, such as newlines.
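A short sketch of chomp's behavior, using hypothetical sample strings: it modifies its argument in place and returns the number of characters it removed.

```perl
use strict;
use warnings;

my $line    = "Widget\n";
my $removed = chomp($line);      # strips the trailing newline in place

my $clean     = "Widget";
my $unchanged = chomp($clean);   # nothing to strip, string is left alone

print "line='$line' removed=$removed\n";
print "clean='$clean' removed=$unchanged\n";
```

Because chomp only removes a trailing line terminator, it is safe to call on a string that has none.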
@cols = split("\t",$TheLine); #split on tabs

This statement introduces two indispensable facets of Perl: arrays and the split function. Array names in Perl begin with @, and an array holds a list of elements. As used here, split takes two arguments: the pattern to split on (a tab, in this case) and the string to be split. Split returns a list of elements.
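One detail worth knowing when splitting delimited records is that split drops trailing empty fields unless it is given a negative limit as a third argument. The sketch below uses a hypothetical record with two empty trailing columns.

```perl
use strict;
use warnings;

# Hypothetical record: two populated columns followed by two empty ones.
my $line = "1001\tWidget\t\t";

my @cols = split(/\t/, $line);       # trailing empty fields are dropped
my @all  = split(/\t/, $line, -1);   # a negative limit keeps them

my $short = @cols;
my $full  = @all;
print "default: $short fields, with -1 limit: $full fields\n";
```

If the importing application expects a fixed number of columns, the -1 form is the safer choice.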
$splitme = $cols[6]; #assign the string in the 7th column to $splitme (the item description)

This statement demonstrates how to point to an element in an array. The array is indexed from 0 to (N-1), where N is the number of elements.
@splitup = split('',$splitme); #turn string into an array of characters
if ($#splitup >= 30) { #if more than 30 characters

The second line here introduces our second conditional statement, the if statement. The code in the braces following the if statement is executed if the condition contained in the if statement's parentheses is met. Perl uses $#array_name to store the index of the last element in the array.
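The relationship between indexing, $#array_name, and the element count can be sketched like this, with a hypothetical sample string:

```perl
use strict;
use warnings;

my $text  = "hex bolt";
my @chars = split(//, $text);        # one element per character

my $first      = $chars[0];          # indexing starts at 0
my $last_index = $#chars;            # index of the last element
my $count      = scalar(@chars);     # number of elements
my $last       = $chars[$#chars];    # the last element itself

print "first=$first last=$last last_index=$last_index count=$count\n";
```

Since $#chars is always one less than the element count, the test $#splitup >= 30 in the script is true only when the description has at least 31 characters.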
for($i = 0; $i <= 10; $i++) { #find the first space within the last 10 characters

Now I have introduced the for loop. The structure of for loops in Perl is similar to that of the for loops found in C, C++, or Java. For loops follow this syntax: for(initialization; loop condition; increment statement). The loop keeps running as long as the condition is true.
if ($splitup[(30 - $i)]=~ /\W/) { #check for a non-word character, such as a space

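The condition in this if statement is the next new concept we'll focus on. Here, I have used a regular expression, specifically pattern matching. This expression will return true if the character being tested is a non-word character: \W matches anything other than a letter, digit, or underscore, which includes whitespace but also punctuation. To match whitespace only, use \s. Pattern-matching conditions are written with the following syntax: $string =~ /pattern/.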
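The backward search can be sketched as a small function. This is an illustration under stated assumptions, not the author's listing: the name find_break is hypothetical, the sample description is made up, and \s is used in place of \W so that only whitespace counts as a break point.

```perl
use strict;
use warnings;

# Hypothetical helper: look backward from index 30, at most 10 steps,
# for a whitespace character. Returns its index, or -1 if none is found.
sub find_break {
    my ($text) = @_;
    my @chars = split(//, $text);
    return -1 if $#chars < 30;       # 30 characters or fewer: nothing to split

    for (my $i = 0; $i <= 10; $i++) {
        my $pos = 30 - $i;
        return $pos if $chars[$pos] =~ /\s/;
    }
    return -1;
}

my $desc  = "Stainless steel hex bolt, galvanized, 50 mm";
my $break = find_break($desc);
print "break at index $break\n";

# \W and \s differ on punctuation: a comma is a non-word character
# but not whitespace.
my $comma_is_nonword    = ("," =~ /\W/) ? 1 : 0;
my $comma_is_whitespace = ("," =~ /\s/) ? 1 : 0;
print "comma: \\W=$comma_is_nonword \\s=$comma_is_whitespace\n";
```

For the sample description, the search starts on a letter at index 30 and stops at the space at index 25.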

The last line in Listing B introduces the join function. Join is basically the opposite of split. Join takes two arguments, the separator and the array of elements. Join returns a string that is constructed by joining the elements of the array and placing the separator between each element.
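As a minimal sketch of join, assuming hypothetical sample values, the example below rebuilds a string from part of a character array and then assembles a tab-delimited record.

```perl
use strict;
use warnings;

my @chars = split(//, "Stainless steel hex bolt");

# An array slice picks out a range of elements; join glues them together.
my $word = join('', @chars[0 .. 8]);

# Joining with a tab rebuilds a delimited record.
my $record = join("\t", "1001", $word, "12.50");

print "$record\n";
```

An empty separator concatenates the elements directly, which is how an array of characters becomes a string again.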

If you look toward the end of Listing C, you'll find my first else statement in the script. The code in an else block is executed only if the condition of the if statement that came before it was not met.
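The if/else pairing can be sketched with a small function. The name fits_one_field and the sample descriptions are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a description fits in one
# 30-character field.
sub fits_one_field {
    my ($desc) = @_;
    if (length($desc) <= 30) {
        return "fits";            # runs when the condition is true
    }
    else {
        return "needs split";     # runs only when the condition is false
    }
}

my $short_result = fits_one_field("Hex bolt");
my $long_result  = fits_one_field("Stainless steel hex bolt, galvanized, 50 mm");
print "$short_result / $long_result\n";
```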

if ($#splitup <= 0) { $newguy = "\t\t\t"; } #if no description is given, make 3 empty fields
$cols[6] = $newguy; #insert 3-field description into array
$newline = join ("\t",@cols); #create output string
print OUTFILE $newline . "\n"; #write output string to file.

Now we are finally writing data. The print function is used to print a string. If no filehandle is given, it prints to standard output (STDOUT). So in this statement, I'm writing to my output file one line at a time as it is created. You can concatenate strings in Perl using a period.
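The difference between printing to a filehandle and printing to standard output can be sketched as follows. To keep the example self-contained, the filehandle writes to an in-memory buffer (assuming Perl 5.8 or later) instead of a file, and the column values are hypothetical.

```perl
use strict;
use warnings;

# In-memory buffer standing in for the output file.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory file: $!";

my @cols    = ("1001", "Widget", "12.50");
my $newline = join("\t", @cols);

print $out $newline . "\n";    # goes to the filehandle; no comma after $out
close($out);

print "Wrote: " . $buffer;     # no filehandle, so this goes to STDOUT
```

Note that there is no comma between the filehandle and the string. With a comma, Perl would treat the filehandle as one more item to print.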
} #end while
close OUTFILE; #close the output file

Here, the end of the input file has been reached, and we close our output file so that it gets written from memory to disk. The close statement takes one argument, the filehandle it is to close.

To recap, here are the elements I used in writing the script:

  • Variables (scalars and arrays)
  • Flow control (while and for loops, and if/else statements)
  • Regular expressions
  • Functions (chomp, join, and split)
  • File I/O

Perl is easy to learn, and there are many places on the Web that offer useful tutorials and documentation. Two great places to start are learn.perl.org and O’Reilly’s perl.com.
