Ali.as


Perl Guide

1. The Perl language

Among the many programming languages in existence, Perl is a rather unusual one: it comes quite close to the wishful dream of the ``egg-laying wool-milk sow'' (the proverbial German animal that provides everything at once). Nature, at least in the case of sheep, does not tolerate that much optimization; as they say in a faraway land, a sheep can put what it eats into either wool or meat, but not both at the same time. While many programming languages are highly specialized for particular purposes, Perl was designed expressly to solve many common tasks from a wide variety of fields in a simple, straightforward, casual, and elegant manner.

Perl is popular for complex applications that process large amounts of text, such as the automated generation of HTML files or the management of user data in a large computer network. Because of its runtime characteristics, Perl is less likely to be used for time-critical applications such as managing a rocket launch, nor is it the obvious choice for programming the 3D graphics of the latest racing simulation. An almost prototypical application is connecting databases to a WWW interface via CGI, a task that is disproportionately often implemented in Perl.

Example programs for database access and WWW-based user interfaces can be found in the chapter User Interfaces. Since visitors to such web sites rarely know which language generates the pages they see, Perl is possibly the least known of the programming languages with the largest user exposure. In this respect it stands in direct contrast to what is perhaps the best-known programming language with relatively little user exposure: BASIC.

Perl supports structuring programs into reusable modules, which programmers worldwide exchange and which can be obtained from a publicly accessible archive.

1.1 What is Perl?

The explanations in this section are partly based on Tom Christiansen's Perl course (CPAN:/doc/perl_slides.tex; see also the chapter on the Perl archive).

Perl is an interpreted language that borrows from C in many respects, but also integrates elements of sed, awk, sh, Pascal, and other languages. Perl offers powerful functions for manipulating textual data, but is also capable of handling binary data. Because it borrows from well-known languages, Perl's basic features can be learned quickly. Programs are easy to develop with Perl, since there is no separate, time-consuming compilation step: Perl simply reads the script and either executes it or, if the script contains errors, rejects it before execution begins.
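As a small illustration of the two kinds of data mentioned above, the following sketch applies a regular-expression substitution to a piece of text and uses the built-in pack and unpack functions on binary data. It assumes Perl 5.6 or later (for the warnings pragma); the sample name and numbers are arbitrary.

```perl
use strict;
use warnings;

# Text: turn "Last, First" into "First Last" with a regular expression substitution.
my $name = 'Wall, Larry';
(my $swapped = $name) =~ s/^(\w+),\s*(\w+)$/$2 $1/;
print "$swapped\n";    # prints "Larry Wall"

# Binary: pack a 16-bit and a 32-bit unsigned integer in network
# (big-endian) byte order into a 6-byte string, then unpack them again.
my $record = pack('nN', 0x1234, 0xDEADBEEF);
my ($short, $long) = unpack('nN', $record);
printf "%d bytes: %04x %08x\n", length($record), $short, $long;    # prints "6 bytes: 1234 deadbeef"
```

The template letters n and N select fixed byte orders, so the packed string looks the same on every platform; this is the usual way to read and write binary file formats and network protocols portably.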

The distinction between programming language and scripting language is hard to draw in the case of Perl, especially since the boundary is not clearly defined in the first place. In his article ``Difficult Demarcation: From Job Control Languages to Perl and Python'' (iX 12/1999, p. 60 ff.), Rainer Fischbach compares about 17 different scripting languages (among which he counts Perl) and also discusses the difficulties of classifying them. In their article ``Fast, not dirty: Checking form entries in different scripting languages'' (iX 12/1999, p. 72 ff.), Tobias Himstedt, Kristian Köhntopp, Frank Pilhofer, Holger Schwichtenberg, Henning Behme and Christian Kirsch use JavaScript, Perl, PHP, Python, Tcl and VBScript to write a WWW program with the same functionality in each language. They conclude that the choice of language is often made on the basis of the developer's prior knowledge rather than the language's design.

Perl lends itself to tasks that were previously solved with shell scripts, awk or sed, and often handles them faster and more efficiently, partly because many of the arbitrary limits of these tools (on string length, etc.) do not exist in Perl. Perl adapts dynamically to the amount of data and can hold an entire file in a single string if enough memory is available. Furthermore, many solutions developed with Perl are highly portable, since Perl is available for a large number of hardware platforms and operating systems. One example is the sgml-tools package, which was also used to write this text.
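Holding a whole file in one string is commonly called ``slurping''. A minimal sketch, assuming Perl 5.6.1 or later (three-argument open with a lexical filehandle, and the core module File::Temp, used here only to create a throwaway test file); slurp_file is a hypothetical helper name, not a built-in function:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read an entire file into a single scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                # undefine the input record separator ...
    my $content = <$fh>;     # ... so a single read returns the whole file
    close $fh;
    return $content;
}

# Demonstration: write a small temporary file, then read it back in one piece.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\nsecond line\n";
close $tmp_fh;

my $text = slurp_file($tmp_name);
my $line_count = ($text =~ tr/\n//);    # count newline characters
print "Read ", length($text), " characters in $line_count lines\n";
```

Because $/ is changed with local, the normal line-by-line behaviour of the readline operator is restored automatically when slurp_file returns.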

1.2 Perl as a hybrid of interpreter and compiler

Perl is an interpreted language, yet its programs run surprisingly fast. This is because Perl reads in the entire script and converts it into an internal form (usually called bytecode, strictly speaking a tree of opcodes); this bytecode is executed only after the whole script has been processed without errors. This procedure has several advantages. Unlike a shell script, whose execution begins even if it contains errors, a syntactically incorrect Perl program is not executed at all. However, this does not protect against damage caused by semantic errors! In addition, Perl can issue detailed error messages that allow errors to be located quickly.

The compiled bytecode runs fast enough that even large amounts of data are no excuse for a coffee break, because two of the main bottlenecks of classical interpreters are avoided: no line of source code has to be re-read before each (possibly repeated) program step (file access always takes time, since it goes through the operating system and its unpredictable load), and no re-read line has to be translated into an internal form again. The bytecode Perl generates for a program is normally not accessible, but there are programs and, since version 5.005, Perl modules with which it can be saved and later loaded again for execution.
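The compile-before-run behaviour can be observed directly. In the following sketch (assuming Perl 5.6 or later for the warnings pragma), a string eval compiles a two-statement program whose second statement contains a syntax error. Because the whole string is compiled before anything in it runs, the first statement, which would set $ran, is never executed. Code in BEGIN blocks is an exception, since it runs during compilation. For a script file, perl -c script.pl performs the same syntax check without running the program.

```perl
use strict;
use warnings;

my $ran = 0;

# The first statement is valid; the second ("my $x = ;") is a syntax error.
my $program = q{ $ran = 1; my $x = ; };

eval $program;          # compile the whole string first, run it only if that succeeds
my $error = $@;         # the compiler's error message, if any

print "first statement executed: ", ($ran ? 'yes' : 'no'), "\n";   # prints "no"
print "compiler said: $error" if $error;
```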

1.3 Availability and Modularity

Since Perl is freely available for a large number of operating systems, there are hundreds of sample programs and modules contributed by programmers worldwide and kept available in the Perl archive (see CPAN). It is therefore always worthwhile to search the archive before tackling larger projects. It is very likely that you will find a script or a module there that either solves the posed problem exactly or provides the necessary tools for a simple and elegant solution. The modules available for Perl are presented in the chapter Modules.

1.4 Current Version

The current stable Perl release is available as the archive stable.tar.gz in the source code directory of CPAN. At the end of November 1999 the stable version was 5.005_03; the development versions, which are not yet considered stable, were already at or close to 5.006.

1.5 Ports

Judging by the current contents of CPAN and its CD-ROM editions (see Sources of supply), Perl is available for at least the following architectures and operating systems:

UNIX variants: AIX, Altos, Apollo, A/UX, BSD/OS, ConvexOS, CX/UX, DC/OS, SINIX, DEC OSF/1, DGUX, DYNIX, EP/IX, ESIX, FreeBSD, HP-UX 9, IRIX, Interactive Unix, Linux, LynxOS, MPE/IX, NetBSD, NeXT, SCO, Solaris, SunOS, Ultrix, Unicos, etc.

Ports to other platforms: Acorn Archimedes (RISCOS), Amiga, AOS, AS/400, Atari, BeOS, Guardian, LynxOS, Mac, HP MPE/ix, MSDOS, IBM MVS (=OS/390), Netware, Plan 9, QNX, VMS, Stratus VOS, Windows 3.1, Windows NT and Windows 95.

Limitations of these ports

Because operating systems differ considerably in byte ordering, file systems, function calls, networking capabilities, interprocess communication, process creation, memory management and default data type sizes, not every feature of the original UNIX version is available on every platform.

Perl can be compiled without restrictions on the following platforms: Amiga, Plan 9, QNX, VMS and, of course, modern UNIX variants. On other non-UNIX platforms the full feature set of Perl is not necessarily available, and some Perl ports come with libraries providing platform-specific support, such as the Win32 port ActivePerl with its accompanying library.

The discussion and examples in this text always implicitly refer to Perl 5.005 on Linux. As of November 1999, Perl version 5.004 was also still in wide use; however, even older versions should be avoided.
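A script that relies on a particular interpreter version can state this explicitly: use followed by a version number aborts compilation under any older Perl, and the special variable $] holds the running interpreter's version as a decimal number. A minimal sketch, with 5.005 chosen only to match the version this text assumes:

```perl
# Abort at compile time if the interpreter is older than Perl 5.005.
use 5.005;
use strict;

# $] is the running interpreter's version as a decimal number.
my $version = $];
print "Running under Perl version $version\n";

# Version numbers of this form compare numerically; for example, the
# warnings pragma only exists from Perl 5.6 (5.006) onwards.
if ($version < 5.006) {
    print "Old interpreter: use the -w switch instead of 'use warnings'\n";
}
```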

1.6 Sources of supply

Perl may be distributed under the terms of either the GNU General Public License or the Artistic License and is therefore freely available. Modified source code and binaries may only be distributed if the original sources are included. The exact terms can be found in the files README and Artistic in the root directory of the Perl source code.

Central Perl Archive CPAN

The central Perl archive on the Internet is called CPAN (Comprehensive Perl Archive Network, by analogy with CTAN, the Comprehensive TeX Archive Network) and consists of a network of mirror servers with a uniform directory structure. When CPAN is accessed through its central address, the system automatically determines the nearest server and redirects the user to it.

CPAN on CD-ROM

As with CTAN, CPAN is also regularly published on CD-ROM. A well-known product is the CD-ROM Perl, which is available in bookstores and published annually by Walnut Creek CDROM. The drawback of this publication rhythm is, of course, that the newest modules may be missing.

Perl bundled with other software

Besides the quasi-canonical CD-ROM edition of CPAN, Perl is included in virtually every Linux distribution as a fully compiled and configured system with a large selection of modules. Anyone who sets up Linux on their computer can assume that Perl is installed. A simple test is to type perl -v at the command prompt; if Perl is installed, it prints its version number together with copyright and license information.

It can usually be assumed that the Perl installation included with these distributions is complete, i.e. equivalent to the Perl source distribution on CPAN. However, the author of these lines is aware that the Perl version included with the Microsoft Windows NT service pack consists of only a few elementary components.

1.7 Sources of information about Perl

The information available on Perl is very diverse and can be divided into the online documentation shipped with the system and the documentation that accompanies it in book form. Both the electronic documentation and the books are of extraordinarily high quality, so that as a rule no question remains unanswered.

The overview manpage (man perl) recommends working through this mountain of information in the order given there. This sounds as if reading all of these manpages were necessary to work successfully with Perl. That is not the case: once you have acquired a basic understanding of Perl syntax (e.g. after studying the perlsyn manpage and some sample programs), perlfunc will probably be the most important manpage for your daily work. It documents all of Perl's built-in functions; in addition to the general syntax, there are almost always detailed examples and references to other functions with related purposes. Other frequently consulted manpages are likely to be perlre (regular expressions), perlrun (command-line options) and perlform (declaration of output formats). The content of the remaining manpages clearly goes beyond an introduction to Perl and in part requires longer familiarization.

The Perl documentation is also available in pod format. The abbreviation stands for ``Plain Old Documentation'' and denotes a simple markup language that allows program code and documentation to be stored in the same file. The manpage perlpod contains more detailed information. Many modules are shipped with pod documentation; every working Perl installation provides the commands pod2html, pod2latex and pod2man, which convert pod documents into HTML, LaTeX files or manpages. CPAN:/doc/pod2x/ also offers pod2fm, pod2texinfo and pod2text, which create FrameMaker, texinfo and ASCII versions of pod documents. All of these programs are, of course, written in Perl.
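The following sketch shows how code and pod share one file; the file name greet.pl and the function greet are hypothetical. The compiler skips everything from a line beginning with a pod command such as =head1 up to the next =cut line, while pod tools read only the documentation part. It assumes Perl 5.6 or later for the warnings pragma.

```perl
use strict;
use warnings;

=head1 NAME

greet.pl - hypothetical example mixing Perl code and pod in one file

=head1 SYNOPSIS

  perl greet.pl

=head1 DESCRIPTION

Prints a greeting. The interpreter ignores this whole block,
which ends at the cut command below.

=cut

# Ordinary code resumes after =cut.
sub greet {
    my ($name) = @_;
    return "Hello, $name!\n";
}

print greet('world');    # prints "Hello, world!"
```

Saved as greet.pl, the file runs normally with perl greet.pl, while pod2man greet.pl > greet.1 (or simply perldoc greet.pl) renders only the documentation.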

Perl's FAQ family

The Perl FAQ proper (Frequently A(sk|nswer)ed Questions), last revised in 1997, is available on USENET (see below), at CPAN (doc/FAQs/FAQ/PerlFAQ.html), or split into nine parts as the manpages perlfaq1 to perlfaq9. In addition, there are further topic-specific FAQs covering a number of common questions, e.g. on CGI programming or on various ports of Perl (DOS, Mac, and several variants of Windows); all of these texts can be found in CPAN:/doc/FAQs/. The manpage perlfaq2 gives further references to documentation in the CPAN hierarchy.

Canonical Perl books

There are now several dozen books on Perl programming (see the Bibliography chapter), but only three of them are recognized as canonical texts in the Perl community: ``Programming Perl'' (by Larry Wall, the creator of Perl, together with Tom Christiansen and Randal L. Schwartz), ``Learning Perl'' by Randal L. Schwartz, and ``The Perl Cookbook'' by Tom Christiansen and Nathan Torkington.

Programming Perl, Second Edition

Following the release of Perl 5.000, a new edition of ``Programming Perl'' was published. The book is the most comprehensive and complete reference to the language and was co-authored by Larry Wall himself. The perlfunc manpage largely corresponds to the book's section on Perl functions. The book is indispensable for everyday work, because it contains not only the complete function reference and a list of the most important Perl modules, but also detailed introductions to particular topics (such as nested data structures).

Because of the camel on its cover, the book is also known as the Camel Book. To distinguish it from its predecessor (for Perl 4), it is sometimes called the Blue Camel Book; the first edition had a pink cover.

Learning Perl, Second Edition

This introductory book presents the major constructs of Perl without scaring beginners with the detail of the standard reference. In Perl newsgroups it is often referred to simply as the Llama Book.

The Perl Cookbook

The Perl Cookbook is excellent for practical work, since it discusses a large number of recurring problems and presents sample solutions in Perl. For example, it contains numerous examples of string handling, regular expressions, and date and time processing, as well as separate chapters devoted to the most important data types and application areas of Perl.

Other Perl books

The distinction between canonical and other texts says nothing about the quality of the further literature; it reflects the close connection between Perl itself and its canonical documentation, which in part come from the same authors. Other books are therefore not necessarily of lower quality.

Perl in a Nutshell

By Ellen Siever, Stephen Spainhour and Nathan Patwardhan. O'Reilly, 1999. ca. 656 pages. The Nutshell book is a good introduction to a whole range of typical Perl applications, devoting entire sections to Perl's Win32 support, CGI programming and graphical user interfaces. If you are learning Perl only because you want to write a CGI program or a demonstration program with a graphical user interface, without delving into the depths of X or the Windows API, you may well be happy with this book alone, since it also contains a compact reference covering everything from the installation of Perl to the basics of the language.

Advanced Perl Programming

By Sriram Srinivasan. O'Reilly, 1997. ca. 404 pages. This book is definitely not for beginners looking for an initial orientation in Perl. Rather, several of its chapters continue topics that the other books cover only in introductory fashion, or not at all. Programmers who want to extend Perl themselves can hardly do without this book.

Mastering Algorithms with Perl

By Jon Orwant, Jarkko Hietaniemi, and John Macdonald. O'Reilly, 1999. ca. 684 pages. This is not a Perl book in the strict sense, but rather a concise computer science compendium that uses Perl as its notation. Many books on computer science and programming use a metalanguage resembling a programming language, which, however, cannot be executed on a real computer. Much in the spirit of the pedagogical intent with which N. Wirth designed Pascal, the authors use Perl as their basis instead, with the great advantage that all examples can be executed immediately, without any further implementation work (and the problems that come with it).

Although its subject focus differs from Perl itself, this book is recommended to everyone who has discovered Perl as a powerful language for many applications, since its examples show idiomatic Perl solutions of sometimes startling simplicity.



Content copyright 2002 Ali.as
All rights reserved.