=head1 NAME

interperl - Intermediate Perl for Sysadmins

=head1 DESCRIPTION

This is an intermediate level training document on Perl that describes
perl constructs and algorithms to improve programmer efficiency.

=head1 Introduction

Perl is a free programming language created by Larry Wall and maintained 
by a global group of thousands of open source volunteers.

Perl has been called I<the duct tape of the Internet> and will likely
forever be so. In the words of its creator, perl 
makes I<easy things easy, and hard things possible>. It is a rich
language that helps you program all manner of sysadmin tasks quickly,
scale/grow them and maintain them well through their lifetime.

=head2 Objective

The goal of this document is to introduce some intermediate level concepts
in perl for working system administrators. By practicing the concepts
described in this document, you will be able to B<start thinking in perl>,
and will be I<able to design the right datastructures and algorithms to
keep your programs as simple as possible>.

NOTE: B<this is not a document about efficiency of programs. It is a
document about how to write programs that are easily maintained>.


=head2 Organization of this document

This material assumes the reader is familiar with I<basics of perl> which
is available in the companion document B<Perl Basics for Sysadmins>.

Readers are I<assumed to have a basic understanding of perl datastructures,
basic unix system calls, and rudimentary exposure to regular expressions>.

Readers are also assumed to have at least 1 year of programming experience in
at least 2 different languages, one of which is an interpreted language
like B<perl> or B<shell>.

=head2 Additional pointers for learning

This document is not a substitute for programming, nor is it a
substitute for the documentation that comes with perl! The more you code, 
the better you can program. What is not obvious is that the more 
you read, the smaller your programs need to be to get the same work done.

It is B<mandatory> that you read the perl documentation
available on your system. At the very least, you should try to read all
the manual pages mentioned in this document. Reasonably competent system
administrators can implement 90% of their regular tasks with minor
modifications to the program snippets available in the core documentation
that comes with perl. The L<"Resources"> section gives you details on 
where to look for complete and authoritative information.

=head2 Why Perl?

Perl is designed to be like C: flexible and powerful enough
to manipulate the machine's capabilities directly. Perl is also designed
to be like B<sh, sed, awk>: I<creating complex datastructures with ease,
and prototyping solutions very quickly>.

=head2 TMTOWTDI

Most programming languages have a minimalist set of constructs 
(in pursuit of I<orthogonality> of design). There is usually
I<one way> to do a particular task in such languages.
Perl differs from such languages. It has been designed with 
redundancy in mind: multiple constructs abound that do very 
similar things. 

If programming in other languages can be equated to a walk through
a maze with orthogonal turns, programming in perl feels more like a walk 
through the grass in a park. This has led to the
perl motto B<There's more than one way to do it>, abbreviated to
TMTOWTDI or I<Tim Toady>. 

=head2 Extensible

At the time of writing, the most current version of perl was 5.8.3.
Version 5 has been built with extensibility in mind. This has resulted
in the largest collection of perl extensions (called Modules) and a 
worldwide group of volunteers who actively maintain the Comprehensive
Perl Archive Network, CPAN (http://www.cpan.org/).

=head2 History

The first version of perl was released in 1987. After successive
refinements version 4 of perl was released in 1991, which also coincided
with the first release of I<The Camel> book, I<Programming Perl>.

Perl version 4 quickly became very popular. As many people started
using perl for more than a few simple tasks, the limitations of the language
made it difficult for people to add new features. To prevent perl from
forking into many versions, a complete rewrite of perl was done and
released as version 5. Perl version 5 was more extensible than 
version 4. It contained large-scale-programming features, 
added completely new features like lexical variables, references and
closures, overhauled the regular expression engine, and made it possible
to pretty much extend perl infinitely. Version 5 supports more 
operating systems, and extensions such as a clean abstraction for
database support (DBI) and a Tk port to perl (Perl/Tk) are available
from CPAN. Perl also boasts a Win32 port for PCs running Microsoft
operating systems (this port has since been integrated into the core
perl distribution in source form).

For the most current updates and feature list for perl, you should
see the distribution, which is always available at 
http://www.perl.com/CPAN-local/

=head1 Perl Data Types

Perl provides you with three basic, but powerful data types. Unlike most
languages, perl allows you to grow/shrink them dynamically
without you ever having to worry about memory allocation/de-allocation; perl
does it all for you. The three fundamental data types in perl are called
I<Scalars>, I<Lists> (stored in arrays) and I<Hashes>.

=head2 Scalars

A scalar is the fundamental data type in perl. A scalar can hold a
single value. This value may be a string, number, a file-handle,
a typeglob, or a reference to another perl data type. 

Here is a translation table from C to perl:

	int, float, double   => scalar (numeric value)
	char *               => scalar (string value)
	FILE *fp             => filehandle (*STDIN)
	symbol table entry   => typeglob (*FOO{THING})
	struct foo *ptr      => reference to ANY{THING}


Here are some examples:

	$a = 'this';   print "String   = $a\n"; #stores 'this' in $a
	$answer = 42;  print "Number   = $answer\n"; 
	$ref = \$a;    print "Reference= ",ref($ref)," => $ref \n";
	$r = *STDIN;   print "Typeglob = $r\n";
	#
	#prints:
	#
	#------------------- output start---
	# String   = this
	# Number   = 42
	# Reference= SCALAR => SCALAR(0x90c8170) 
	# Typeglob = *main::STDIN
	#
	#------------------- output end  ---

You can build a scalar from other scalars through numeric and string
operations. The following examples show interpolation at work:

	$x = "2.00"; $y = 4; $z = "abc";
	print "Numeric interpolation: $x+$y gives =>", ($x+$y), "\n";
	print "String interpolation : \$z gives $z\n";
	print "Concatenation of $x . $y gives ", ($x . $y ) , "\n";
	print "String repetition \$z x $y, gives [", $z x $y, "]\n";
	#
	#prints:
	#
	#------------------- output start---
	# Numeric interpolation: 2.00+4 gives =>6
	# String interpolation : $z gives abc
	# Concatenation of 2.00 . 4 gives 2.004
	# String repetition $z x 4, gives [abcabcabcabc]
	#
	#------------------- output end  ---

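The `+' operator is the familiar numeric addition. The `.' operator is the
string I<concatenation> operator that concatenates its left and right
operands and returns the result. The `x' operator is the string
I<repetition> operator: it returns its left operand repeated as many
times as its right operand says.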

As you can see, scalar values can be built dynamically, and can grow
or shrink at the programmer's will.

=head2 Lists and Arrays

B<A literal list is a collection of scalar values>. 

When a list of values needs to be stored somewhere, you will usually use
B<arrays>. Thus, an array is a list, each of whose elements really contains a
B<scalar> value. This is the most important thing you need to know about lists.

As with scalars, lists can be built dynamically, and their size can
be increased or decreased by adding, deleting or splicing elements at
will. Arrays act like the English word `these'. You prefix an array with
the B<@> character. However, to get at a single I<scalar> element of an array,
you change the prefix to match the B<type of the value you expect to be
stored at that location>. Typically, you will store scalar B<VALUES> in an
array, so you would write C<$array[0]> rather than C<@array[0]>, like this:

	#
	#direct definition, literal list
	#
	#
	$"=", ";
	@replicators = ('rna-strand', 'dna-strand', 'exon', 'intron', 'prion');
	print "Replicators: @replicators\n";
	#
	#
	#Assign to an element
	#
	$description[0] = 'Kingdom'; print "Description[0] = $description[0]\n";
	#
	#Push multiple elements dynamically (runtime)
	#
	push @description, split(/:/, 'Phylum:Class:Order:Family:Genus:Species');
	print "Description now: @description\n";
	#
	#Split words using the quoting operator 'qw'
	#
	@woman = qw(Animalia Chordata Mammalia Primates Hominidae Homo Sapiens);
	#
	#PRINT in 3 ways:
	#
	#
	print "using a 'for' modifier and 'print'\n";
	print "$_\n" for @replicators;
	#
	print "\nusing a 'for' iterator and 'printf':\n";
	for (0..$#description) {
		printf "Linnaeus says, %-20s => %s\n", $description[$_], $woman[$_];
	}
	#
	print "\nUsing 'map', 'sprintf' to transform a list:\n";
	print map { sprintf("%02d %s\n", $_, $woman[$_]) } 0..$#woman;


	#prints:
	#
	#------------------- output start---
	# Replicators: rna-strand, dna-strand, exon, intron, prion
	# Description[0] = Kingdom
	# Description now: Kingdom, Phylum, Class, Order, Family, Genus, Species
	# using a 'for' modifier and 'print'
	# rna-strand
	# dna-strand
	# exon
	# intron
	# prion
	# 
	# using a 'for' iterator and 'printf':
	# Linnaeus says, Kingdom              => Animalia
	# Linnaeus says, Phylum               => Chordata
	# Linnaeus says, Class                => Mammalia
	# Linnaeus says, Order                => Primates
	# Linnaeus says, Family               => Hominidae
	# Linnaeus says, Genus                => Homo
	# Linnaeus says, Species              => Sapiens
	# 
	# Using 'map', 'sprintf' to transform a list:
	# 00 Animalia
	# 01 Chordata
	# 02 Mammalia
	# 03 Primates
	# 04 Hominidae
	# 05 Homo
	# 06 Sapiens
	#
	#------------------- output end  ---


There are various other operations you can perform on arrays. Here are some
examples:

	#
	$"=", ";
	#
	push @a, 1, 'two'; print "A = (@a)\n";
	pop @a; print "A is now popped to: (@a)\n";
	unshift @a, 'two'; print "A unshifted to: (@a)\n";
	shift @a; print "after a Shift, A is: (@a)\n";
	#
	#prints:
	#
	#------------------- output start---
	# A = (1, two)
	# A is now popped to: (1)
	# A unshifted to: (two, 1)
	# after a Shift, A is: (1)
	#
	#------------------- output end  ---
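
The prefix rule described above also explains slices and negative indices. Here is a minimal sketch (the host names are made up). It shows that a single element is read with C<$>, while a slice, which yields a list, is read with C<@>:

```perl

	#!/usr/bin/perl -w
	use strict;

	my @hosts = qw(alpha beta gamma delta);   # made-up host names
	my $first = $hosts[0];      # one element => a scalar => $ prefix
	my $last  = $hosts[-1];     # negative index counts back from the end
	my @pair  = @hosts[1, 2];   # a slice is a list => @ prefix
	my $count = @hosts;         # array in scalar context => element count
	print "first=$first last=$last pair=@pair count=$count last index=$#hosts\n";
	# prints: first=alpha last=delta pair=beta gamma count=4 last index=3

```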

=head2 Hashes

The final perl data structure we will see is a hash. A hash is very much
like a list, but it is indexed by strings (a list is indexed by
number). A hash is like a database indexed by a single key field. Hashes
are initialized by specifying the key and value in pairs. For example:

	%colors = ( 'red' => '#FF0000',   'green' => '#00FF00');
	%passwd = ( 'root' => 'ez2Krack', 'mysql' => 'se1ect!');

Hash keys are strings and hash values are scalars, so you can
refer to them in any place where you would need a scalar value. The
individual key is enclosed within curly braces to specify that we
are referring to a hash.

Here is an example of adding another element to one of the above hashes
by using a value stored in it: 

	$colors{'blue'} = $colors{'red'};


Here is how it works. C<%colors> is the hash. Its name is I<colors>. The 
key for which we want to create a value is I<blue>. So the actual value is 
at key 'blue', which is a scalar:

	Key          => 'blue'       =>         'blue'
	hash         => curly braces =>        {'blue'}
	Scalar value => $            => $colors{'blue'}

Here's another:

	print("Root password is too $passwd{root}\n");
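
Sysadmins often use a hash to count things. This minimal sketch counts login shells in a few made-up passwd-style lines instead of the real /etc/passwd. It relies on the fact that a missing key starts out undefined, which C<++> treats as 0. The sketch also uses C<exists> to test for a key and C<delete> to remove one:

```perl

	#!/usr/bin/perl -w
	use strict;

	my @lines = (               # made-up sample entries
		'root:x:0:0:root:/root:/bin/bash',
		'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin',
		'alice:x:1000:1000:Alice:/home/alice:/bin/bash',
	);
	my %shells;
	for my $line (@lines) {
		my $shell = (split /:/, $line)[6];   # 7th field: the login shell
		$shells{$shell}++;                   # undef + 1 => 1 on first sight
	}
	print "bash users: $shells{'/bin/bash'}\n";
	print "nobody uses csh\n" unless exists $shells{'/bin/csh'};
	delete $shells{'/usr/sbin/nologin'};     # forget one key entirely
	print "shells left: ", join(', ', sort keys %shells), "\n";

```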


=head1 Refresher on Operations on perl variables

Perl provides many basic operations to manipulate variables. These
operations are I<more powerful> than in most other languages, and
there are groups of operations that do I<similar things>, so you
have a choice of programming styles.

=head2 Scalar Ops: length, substr, tr, s, chomp, lc, uc, int, sprintf

Try each of the statements below and see if the result matches the
comments (you can ignore anything following a '#', because those are
comments):

	$dozens = int( 97/12 );	# gets 8
	print "97/12 = $dozens\n";
	
	$_ = 'A single sentence.';
	$l = length($_);
	print "Length of '$_' = ($l)\n";
	
	$is = substr($_, 9, 4);	#$is is now 'sent'
	print "Substr('$_',9,4) = $is\n";
	
	print "\$_ is now: '$_'\n";
	$_ =~ tr/st/tp/;	#$_ is now 'A tingle tenpence.';
	
	
	print "\$_ after tr/st/tp/: '$_'\n";
	$_ =~ s/t/s/;		#$_ is now 'A single tenpence.';

	print "\$_ after another s/t/s: '$_'\n";
	print "All upper case, '$_' is ", uc($_), "\n";

	$pi = sprintf("%.12f", atan2(1, 1)*4);
	print "PI = $pi\n";
	
	#prints:
	#
	#------------------- output start---
	# 97/12 = 8
	# Length of 'A single sentence.' = (18)
	# Substr('A single sentence.',9,4) = sent
	# $_ is now: 'A single sentence.'
	# $_ after tr/st/tp/: 'A tingle tenpence.'
	# $_ after another s/t/s: 'A single tenpence.'
	# All upper case, 'A single tenpence.' is A SINGLE TENPENCE.
	# PI = 3.141592653590
	#
	#------------------- output end  ---
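
The heading lists C<chomp> and C<lc>, but the example above does not use them. Here is a short sketch with a made-up host name. It also shows that C<tr> returns the number of characters it matched, which makes it a handy counter:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $line = "WWW.Example.COM\n";   # as if just read from a file
	chomp $line;                       # remove the trailing newline ($/)
	my $host = lc $line;               # lower-case: "www.example.com"
	my $dots = ($host =~ tr/.//);      # empty replacement list: just count
	printf "%s has %d dots\n", $host, $dots;   # www.example.com has 2 dots

```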

=head2 List Ops: push, pop, shift, unshift, sort, splice

	@a = (1, 2, 3); print "A is: @a\n";
	$last = pop @a; print "last element of A is: $last\n";
	#
	@sorted = sort('jack', 'jill', 'fred', 'barney');
	print "@sorted\n";	#prints `barney fred jack jill'
	#
	#
	splice @sorted, 2, 2, 'wilma', 'betty';
	print "Spliced: @sorted\n";	#prints  `barney fred wilma betty'
	#
	#prints:
	#
	#------------------- output start---
	# A is: 1 2 3
	# last element of A is: 3
	# barney fred jack jill
	# Spliced: barney fred wilma betty
	#
	#------------------- output end  ---
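
One C<sort> gotcha deserves a sketch of its own. By default C<sort> compares strings, so numbers come out in dictionary order. To sort numbers, pass a comparison block that uses the numeric C<< <=> >> operator (inside the block, C<$a> and C<$b> are the two values being compared):

```perl

	#!/usr/bin/perl -w
	use strict;

	my @sizes   = (100, 9, 25, 1000);
	my @as_text = sort @sizes;                 # string order: 100 1000 25 9
	my @numeric = sort { $a <=> $b } @sizes;   # numeric order: 9 25 100 1000
	my @biggest = sort { $b <=> $a } @sizes;   # swap $a and $b to reverse
	print "text: @as_text\nnumeric: @numeric\nlargest: $biggest[0]\n";

```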


=head2 Hashes: keys, values, each

	%h = (
		'linux' => 'Linus Benedict Torvalds', 
		'perl' => 'Larry Wall',  
		'hurd' => 'Richard M. Stallman',
		'unix' => 'Dennis and Ken',
		'TAOCP' => 'Don Knuth',
		);
	#
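	@software = keys %h;	#the keys, in no particular order
	@authors  = values %h;	#the values, in the same order as keys
	#
	while ( ($k, $v) = each %h) {
		printf "%-10s was the brainchild of %s\n", $k, $v;
	}
	#
	#prints (hash order is not defined, so yours may differ):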
	#
	#------------------- output start---
	# perl       was the brainchild of Larry Wall
	# TAOCP      was the brainchild of Don Knuth
	# hurd       was the brainchild of Richard M. Stallman
	# unix       was the brainchild of Dennis and Ken
	# linux      was the brainchild of Linus Benedict Torvalds
	#
	#------------------- output end  ---
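
A hash keeps its entries in no particular order. When you need repeatable output, sort the keys yourself. Here is a minimal sketch using part of the same hash:

```perl

	#!/usr/bin/perl -w
	use strict;

	my %h = (
		'linux' => 'Linus Benedict Torvalds',
		'perl'  => 'Larry Wall',
		'hurd'  => 'Richard M. Stallman',
	);
	my @order = sort keys %h;   # hurd, linux, perl
	for my $k (@order) {
		printf "%-10s was the brainchild of %s\n", $k, $h{$k};
	}

```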

=head1 Perl Expressions, Statements and Context 

=head2 Expressions form Statements

Almost everything in perl is an I<expression>. An expression is a basic unit
of a perl program that returns a result. For example, the C<print>
statement in perl is actually an expression that returns a value (true when
the print succeeds).

	$result = print("this is the statement that prints 'Foo'\n");
	print "Result of previous stmt = $result\n";
	#
	#prints:
	#
	#------------------- output start---
	# this is the statement that prints 'Foo'
	# Result of previous stmt = 1
	#
	#------------------- output end  ---

A perl statement is merely an expression evaluated for side effects.

Expressions can not only B<return> results, but can also be B<assigned to>
under appropriate conditions.  When the return value of an expression is
merely used to assign it to something else, it is said to be used as an
B<rvalue>. In contrast, when you assign B<to> an expression, it is said
to be used in an B<lvalue> context. Some perl functions, such as C<substr>,
can act as I<lvalues>, which is nice:

	$_ = "ABC\n";
	$\="\n";
	print substr($_,1,1);	#prints 'B'
	substr($_, 1, 1) = 'C'; 
	print;                  #prints 'ACC'
	#
	#prints:
	#
	#------------------- output start---
	# B
	# ACC
	# 
	#
	#------------------- output end  ---

Expressions can also return different things based on the B<context> in
which they are called! The two major types of context are described below.
We will not discuss the I<void> context which is a special case.

=head2 Scalar context

A scalar context expects/returns a single scalar value. If you use an
expression in a scalar context, the expression I<or> its return value(s)
are coerced into a scalar. For example:

  $count = @lines;

Here, @lines is an expression that returns a list of all elements contained
in the array @lines. This expression is forced into a scalar context by the
assignment statement. In a scalar context, this gives the number of elements
of the array @lines. Thus, $count will really contain the I<number of
elements> in the array @lines.

=head2 List context

A list context expects/returns a list of scalars. If you use an expression
in a list context, the expression I<or> its return value(s) is/are coerced
into a list. For example:

  @lines = <STDIN>;

Here, @lines provides a list context to the expression <STDIN>. This in turn
makes the expression <STDIN> slurp the entire STDIN (until end-of-file: CTRL-D on Unix, CTRL-Z on Windows)
and return it as a list of lines. Thus, if you were to type 10 lines in the
terminal followed by a CTRL-D after this statement, @lines will contain 10
elements, each of which will contain the respective line you entered.

This works for lists in general, but there is a special case of a
B<literal list> that you should be aware of. In a scalar context, a literal
list behaves like the C comma operator: it evaluates each element and
returns the last one. Here is an example to illustrate this important
distinction:

	@a = (12, 0, 32, -23);
	$b = @a;
	print "b = $b\n";
	$c = (12, 0, 32, -23);
	print "c = $c\n";
	
	#this prints:
	#b = 4
	#c = -23
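
Context also decides what many builtins return. In this sketch (the file names are made up), the same array and the same C<reverse> call give different answers depending on what is on the left of the assignment. C<scalar()> forces scalar context explicitly:

```perl

	#!/usr/bin/perl -w
	use strict;

	my @files   = ('a.log', 'b.log', 'c.log');   # made-up names
	my $n       = @files;          # scalar context: number of elements (3)
	my ($first) = @files;          # list context: first element ('a.log')
	my $str     = reverse 'abc';   # scalar context: reversed string ('cba')
	my @rev     = reverse 'abc', 'x';   # list context: reversed list ('x', 'abc')
	print "n=$n first=$first str=$str rev=@rev\n";
	print "count via scalar(): ", scalar(@files), "\n";

```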

For more on context, see L<perldata>.

=head1 Loops in perl: for, foreach, while

Most common tasks are repetitive. Like most languages, perl allows you
to repeat a set of statements using I<looping> constructs. The two
most common looping constructs are I<foreach> and I<while>. 

=head2 Example for `foreach'

	#perl style
	foreach my $number (1..10) {
		print "foreach $number\n";
	}

=head2 Example loop using `for' 

	#C style
	for (my $number = 1; $number <= 10; $number++ ) {
		print "for     $number\n";
	}

=head2 Example loop using `while'

	my $number = 1;
	while ( $number <= 10 ) {
		print "while   $number\n";
		$number++;
	}

The looping constructs are actually far more versatile. For the full
details, you should start at L<perlsyn>.
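
The loop-control statements C<next> and C<last> cover most of the remaining everyday needs. Here is a minimal sketch over some made-up configuration lines:

```perl

	#!/usr/bin/perl -w
	use strict;

	my @lines = ('# comment', '', 'host1', 'host2', '__STOP__', 'host3');
	my @hosts;
	foreach my $line (@lines) {
		next if $line =~ /^#/ or $line eq '';   # skip to the next element
		last if $line eq '__STOP__';            # leave the loop entirely
		push @hosts, $line;
	}
	print "hosts: @hosts\n";   # hosts: host1 host2

```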

=head1 Perl builtin variables

Perl has builtin variables that take on certain B<`sensible'> values at runtime.  

As we noted before, I<statements are expressions that return value(s)>. 
In the absence of an explicit assignment, some of the expressions take
default arguments. 

I<In some expressions, perl may return the results into certain default 
variables if you don't explicitly specify where they should be stored>. 
In other cases, changing the settings of some internal variables will
make the succeeding lines in the perl program snippet behave
differently (like B<pragma> or I<hints>).

Here are some examples without explanations:

=head2  @ARGV

	#!/usr/bin/perl -w
	use strict;
	my $arg;
	foreach (@ARGV) {
		$arg++;
		print "Argument $arg: $_\n";
	}

=head2  %ENV

	while (($key, $value) = each %ENV) {
		print "$key=$value\n";
	}

=head2  @INC

This is the include path for perl libraries. 

	foreach (@INC) {
		print "$_\n";
	}

	#
	$file = 'CPAN.pm';
	foreach (@INC) {
		print "Found $file under $_/$file\n" if ( -f "$_/$file");
	}
	#
	#prints:
	#
	#------------------- output start---
	# Found CPAN.pm under /usr/lib/perl5/5.8.3/CPAN.pm
	#
	#------------------- output end  ---

You can add directories to the front of this variable within your program. 
For example, if you have installed the latest cool whiz-bang version
of Foo::Bar under your $HOME/lib directory, here is what you
would do:

		use lib '/my/home/dir/lib';	#prepended to @INC at compile time
		use Foo::Bar;
		#...whatever...

=head2 $_ = default input and pattern search space

Example: 

	while ( <FH> ) {
		chomp;
		@fields = split;
	}

Is the same as the more elaborate:

	while ( defined($_ = <FH>) ) {
		chomp($_);
		@fields = split(' ', $_);
	}

=head2 @_ = default arguments for subroutines

In the context of a subroutine call, @_ contains all the arguments to
the subroutine. Note that perl subroutines can have a variable number
of arguments on I<each> invocation. @_ will automatically be sized
accordingly. Perl gives every subroutine call its own @_, so the
I<old> value of @_ is restored as soon as the subroutine call ends!

Very old perls also used @_ as the implicit destination of C<split> when
it was called in scalar or void context. That behaviour was deprecated and
has been removed (as of perl 5.12), so always assign the result of
C<split> explicitly.
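
Here is a minimal sketch of a subroutine that accepts any number of arguments through C<@_> (C<max_of> is just an illustrative name). Inside a subroutine, C<shift> with no argument shifts C<@_>:

```perl

	#!/usr/bin/perl -w
	use strict;

	sub max_of {
		my $max = shift;            # first argument, removed from @_
		for my $n (@_) {            # the remaining arguments
			$max = $n if $n > $max;
		}
		return $max;
	}
	my $two  = max_of(3, 7);
	my $many = max_of(4, 19, 2, 11);
	print "max_of(3, 7) = $two; max_of(4, 19, 2, 11) = $many\n";

```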


=head2 $. $/ $\ : File I/O counter, record separators

When you use the E<lt>E<gt> operator to read data from a file, perl
automatically stores the I<current line number> in a variable named B<$.>.
How does perl know where a line ends and the next one begins? Well, that
is what the record separator variable, B<$/>, is for! As with most perl
predefined variables, this takes on a default value and $/ defaults to
"\n".

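Setting C<$/> to C<undef> puts perl into I<slurp> mode, where a single read
returns everything up to the end of the file. Setting it to the empty string
C<''> is different: that is I<paragraph> mode, where records are separated
by blank lines. Here is a way to read in a whole file to a single scalar,
if you have lots of memory to burn:

	{
		local $/;	#undef $/ => slurp mode, restored at the end of the block
		open(INPUT, 'tail -5 /var/log/messages|') 
			|| die "can't run tail on /var/log/messages: $!\n";
		$slurp = <INPUT>;
		close INPUT;
	}
	print $slurp;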
	#
	#prints:
	#
	#------------------- output start---
	# May  9 19:26:42 mithya.sarvam.com root: Test 1
	# May  9 19:26:50 mithya.sarvam.com root: Test 2
	# May  9 19:26:52 mithya.sarvam.com root: Test 3
	# May  9 19:26:54 mithya.sarvam.com root: Test 4
	# May  9 19:26:57 mithya.sarvam.com root: Test 5
	#
	#------------------- output end  ---
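
Here is a small sketch of C<$.> at work. To keep it self-contained, it reads from a string through an in-memory filehandle (a perl 5.8 feature) instead of a real file:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $text = "alpha\nbeta\ngamma\n";
	open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
	my @seen;
	while (my $line = <$fh>) {
		chomp $line;                      # strip the "\n" that $/ matched
		push @seen, join(':', $., $line); # $. is the number of the line just read
	}
	close $fh;
	print "@seen\n";                      # 1:alpha 2:beta 3:gamma

```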

Similarly, every B<print> statement will tack the value of the builtin
variable B<$\> (the I<output record separator>) onto every line/record you
write. This variable is undefined by default, but you can change it if you
want to. See the B<-l> and B<-p> switches in L<perlrun> for more usage information.
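
Here is a short sketch of C<$\> together with its companion C<$,>, the output field separator that C<print> puts between its arguments. Using C<local> inside a block restores the old values when the block ends. Output goes to a string here so the result is easy to inspect:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $out = '';
	open(my $fh, '>', \$out) or die "can't open in-memory file: $!";
	{
		local $\ = "\n";     # appended after every print
		local $, = ', ';     # placed between print's arguments
		print $fh 'a', 'b', 'c';
	}
	print $fh 'done';        # $\ and $, are back to their defaults here
	close $fh;
	print "$out\n";          # "a, b, c", a newline, then "done"

```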


=head2 $0, $$ : program name, PID

Type the following example into a test program and run it:

	#!/usr/bin/perl -w
	print "I am called as $0\n";
	print "My PID is      $$\n";
	#
	#prints:
	#
	#------------------- output start---
	# I am called as /tmp/codeliver.out
	# My PID is      27485
	#
	#------------------- output end  ---

=head2  $! : O/S Error string or Errno

	for (1..10) {
		$! = $_;
		print "$_ => $!\n";
	}
	print STDERR "File /etc/nosuchfile: $!\n" unless -f '/etc/nosuchfile';
	#
	#prints:
	#
	#------------------- output start---
	# 1 => Operation not permitted
	# 2 => No such file or directory
	# 3 => No such process
	# 4 => Interrupted system call
	# 5 => Input/output error
	# 6 => No such device or address
	# 7 => Argument list too long
	# 8 => Exec format error
	# 9 => Bad file descriptor
	# 10 => No child processes
	# File /etc/nosuchfile: No such file or directory
	#
	#------------------- output end  ---
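
C<$!> is a dual-valued variable. In numeric context it gives the C C<errno>; in string context it gives the matching message. This minimal sketch captures both right after a failed C<open>, assuming that F</etc/nosuchfile> really does not exist. It then compares the number with the C<ENOENT> constant from the core C<Errno> module:

```perl

	#!/usr/bin/perl -w
	use strict;
	use Errno qw(ENOENT);

	my $path = '/etc/nosuchfile';   # assumed not to exist
	my ($code, $text) = (0, '');
	unless (open(my $fh, '<', $path)) {
		$code = $! + 0;             # numeric context: errno
		$text = "$!";               # string context: the message
	}
	print "open $path failed: $text (errno $code)\n" if $code;
	print "the file just isn't there\n" if $code == ENOENT;

```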

=head2 $?, $@ - Errors from child/pipe/eval 

Example:

	`/etc/nowhere/hostname`;
	print "\$? = $?\n";
	#q{} rather than qq{}, so that $! is interpolated when the eval runs
	eval q{open(F, '/tmp/nosuchfile') or die "nosuchfile: $!"};
	print "\$@ =\n$@\n";
	#
	#prints:
	#
	#------------------- output start---
	# $? = -1
	# $@ =
	# nosuchfile: No such file or directory at (eval 1) line 1.
	# 
	#
	#------------------- output end  ---
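
C<$?> packs two numbers: the child's exit status sits in the high byte, and the number of any signal that killed the child sits in the low 7 bits. For trapping errors, the block form of C<eval> is usually preferred over a string, since perl checks it for syntax errors at compile time. Here is a minimal sketch, assuming a POSIX F</bin/sh> is available:

```perl

	#!/usr/bin/perl -w
	use strict;

	system('/bin/sh', '-c', 'exit 3');   # a child that exits with status 3
	my $exit   = $? >> 8;                # high byte: the exit status
	my $signal = $? & 127;               # low 7 bits: the killing signal, if any
	print "exit=$exit signal=$signal\n";   # exit=3 signal=0

	my $ok = eval { die "disk full\n"; 1 };   # the block form traps die
	print "caught: $@" unless $ok;             # caught: disk full

```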


=head2 $<, $>, $(, $) : real, effective uid/gid

	print "Real uid: $<, Effective uid: $>\n";
	print "Real gid(s): ", $(, " Effective gid(s): ", $), "\n";

See L<perlvar> for more information.

=head1 Commonly used Operators in perl

=head2 Logical Operators

Logical operators return true or false. Perl has all standard logical
operators.  However, the meaning of true and false is different in
perl, because perl considers strings and numbers
to be the same data-type: Scalar. Here is a quick overview of truth
as it applies to perl scalars:

The empty string "" and the string "0" are false. Any number that evaluates
to 0 is false. Any I<undefined> value is false. All else is true, including
strings such as "0.0" or "00". Sometimes, this is surprising:

	print "Yes,  string '0.0' is ''\n" if ( "0.0" == '');
	print "What, string '0.0' is 'true'??\n" if ( "0.0" );

In line 1, we see that the string "0.0" is converted to 0 in the
numeric context of the I<==> operator. The empty string on the
right side is similarly converted to 0 (perl warns about this under
B<-w>), so the comparison is true. However, in line 2, the string "0.0"
is I<TRUE> according to the rules. Thus, the second print statement
does get executed.

Logical operators available in perl are I<&&>, I<||> and I<!>. The
logical I<&&> and I<||> operators are I<short circuit> operators,
like in C. This means that the second operand is evaluated only when
it's necessary. Here are some examples:

	$home = $ENV{HOME} || (getpwuid($<))[7] || die "No home directory!\n";

	print "Your machine is wide open!\n" 
		if ( $> && $< && -r "/etc/shadow");

For more on this, see L<perlop>.
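
The short-circuit C<||> is the usual way to supply a default. However, the truth rules above mean it also replaces a legitimate 0 or empty string. This minimal sketch uses a made-up options hash to show the trap, and shows that testing with C<exists> (or C<defined>) avoids it:

```perl

	#!/usr/bin/perl -w
	use strict;

	my %opt = (verbose => 0);              # made-up options: verbose is off
	my $port    = $opt{port}    || 22;     # key missing => default 22
	my $verbose = $opt{verbose} || 1;      # trap: 0 is false, so this is 1!
	my $v_ok    = exists $opt{verbose} ? $opt{verbose} : 1;   # keeps the 0
	print "port=$port verbose=$verbose v_ok=$v_ok\n";   # port=22 verbose=1 v_ok=0

```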

=head2 Binding operators

When you need to match a string with a pattern or make changes to it using
a I<regular expression> match and replace, you use the I<binding>
operator, B<=~>. To negate the logical sense of a match, you use the B<!~>
operator. Here are some examples:

	for my $host (qw(www.google.com samba.net.au)) {
		if ( $host =~ /\./ ) {
			print "$host seems to be fully qualified!\n";

			if ($host !~ /\.(com|org|edu|mil|gov|net)$/ ) {
				$country = $host;
				$country =~ s#.*\.##;	#remove everything except the TLD marker
				print "Its country of origin is: $country\n";
			}
			else {
				($tld = $host) =~ s/.+\.//;
				print "Host is in a generic TLD [.$tld]\n";
			}
		}
	}
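
A match in list context returns whatever the parentheses captured. This is the everyday way to pull fields out of log lines. Here is a minimal sketch on a made-up syslog line; the last lines show the common idiom for counting matches:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $line = 'May  9 19:26:42 mithya sshd[123]: Failed password for root from 10.0.0.5';
	my ($user, $ip) = $line =~ /Failed password for (\S+) from (\S+)/;
	if (defined $ip) {
		print "failed login for $user from $ip\n";   # failed login for root from 10.0.0.5
	}
	my $digits = () = $line =~ /\d/g;   # list assignment in scalar context counts matches
	print "the line contains $digits digits\n";

```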

=head2 Additional logical operators not found in C

In addition to && and || for logical operations, perl provides B<and> and
B<or>. These behave identically to the I<&&> and I<||> except that they
have very low precedence. I<Precedence> determines the order of evaluation
within a single statement. Here is an example where not knowing the
precedence might bite you (in fact, the perl and/or operators were
designed just so that people don't make this mistake). Perl allows you to
call functions without using parentheses around the arguments. If you need
to open a file, here is how you'd do it I<with> parentheses around the
arguments, without checking the return values:

	open(FOO, '/etc/passwd');

This can also be written conveniently as:

	open FOO, '/etc/passwd';

These two function calls work exactly the same way. Now, if you need to
add some error checking of the return value of the I<open> call, you would
do something like this:

	open(FOO, 'bar') || die "bar: $!\n";

The equivalent

	open FOO, 'bar'  || die "bar: $!\n";

parses as:

	open(FOO, 'bar' || die "bar: $!\n");

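This is not what we want: the string C<'bar'> is always true, so I<die> is
never reached, even when the open fails. In this situation the
low-precedence I<or> operator comes to the rescue, so it is better written as:

	open FOO, 'bar' or die "bar: $!\n";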
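
The precedence difference is easy to see without any files. In this minimal sketch, C<||> binds tighter than C<=>, while C<or> binds looser, so the two assignments end up with different values:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $zero = 0;
	my ($x, $y, $fallback) = (undef, undef, 0);
	$x = $zero || 5;                # parsed as $x = ($zero || 5), so $x is 5
	$y = $zero or $fallback = 1;    # parsed as ($y = $zero) or ($fallback = 1)
	print "x=$x y=$y fallback=$fallback\n";   # x=5 y=0 fallback=1

```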

=head2 Variables and Quoting operators

Variable names can contain B<letters, digits and underscores>.
The first character cannot be a digit. To store a value
within a variable, you B<quote> the value if it is a string, or use
a B<literal number>.

In addition to the standard quoting characters, perl provides additional
syntax to allow you to simplify creation of strings with embedded quotes.
These are the B<q{}>, B<qq{}>, B<qx{}> and B<qr{}> operators. These
operators are flexible in that you can use I<ANY> character as the quoting
character. For example, instead of the curly braces, you can use the B<#>
character as quoting character:

	$something =  q#Single quoted#; 
	$nother    = qq#Not '$something'#;
	$crazy = 'Please don\'t use \'\' within this string';
	$ok    = q{Please don't use '' within this string};
	$address = 'webmaster@example.com';	#hypothetical address
	$foo = "<A HREF=\"mailto:$address\">Mail us</A>";
	$foobetter = qq{<A HREF="mailto:$address">Mail us</A>};
	for (qw(something nother crazy ok foo foobetter)) {
		eval qq{print "$_ = \$$_\n";}
	}
	$ip_patt = qr{^\d+\.\d+\.\d+\.\d+};
	print "127.0.0.1 matches $ip_patt!\n" if ( '127.0.0.1' =~ /$ip_patt/ );


For more on perl operators, see L<perlop>.

=head2 I/O Operations: Standard Filehandles

Following the Unix convention, perl provides three default Filehandles
that are direct analogues to C: I<STDIN>, I<STDOUT> and I<STDERR>. In
the absence of an explicit Filehandle (and of file names in @ARGV), the
magical diamond operator (E<lt>E<gt>) reads from STDIN. In the absence
of an explicit Filehandle your I<print> statements automatically
print to STDOUT (you can override this with the L<select|perlfunc/select>
function). Some perl functions (namely I<warn> and I<die>) will
print automatically to STDERR with no need for a Filehandle argument
(pun intended). You can I<close> the standard file handles if needed
(say, in a daemon process) or redirect them I<within> perl. Here are
some examples where these Filehandles figure, even though you don't
see them:

	print "This prints to your standard output!\n";
	unlink("/") or warn "Can't unlink /: $!\n";
	warn("Please run manually!\n") unless ( -t STDIN );

=head2 I/O Operations: Opening and Closing files

Perl's I<open> function wears many hats. 

Here are some examples without commentary:

  open(PASSWD, '/etc/passwd');
  open(LOG, "> $logfile");
  open(RCMD, "rsh $host uname -a 2>&1 |");
  open(MAIL, "|/usr/lib/sendmail -oi -t");
  open(RW, "+< /read/and/write/later");

  close(ANYHANDLE);

Filehandles can also be stored in scalars, using some of the standard perl
modules available with the perl distribution. Here is a simple fragment
that uses the perl module B<IO::File> (see L<perlmod> and L<perlobj> for more
explanation of modules, classes and objects in perl):


  #!/usr/bin/perl -w
  use strict;
  use IO::File;
  my $fh = IO::File->new('/etc/resolv.conf', 'r')
      or die "/etc/resolv.conf: $!\n";
  print STDOUT <$fh>;
  $fh->close;
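
Since perl 5.6, the core C<open> can also put a filehandle straight into a lexical scalar. The three-argument form keeps the mode separate from the file name, so odd characters in a name cannot change what C<open> does. Here is a minimal sketch; the file name under F</tmp> is hypothetical, and the sketch removes the file again at the end:

```perl

	#!/usr/bin/perl -w
	use strict;

	my $logfile = '/tmp/interperl-demo-' . $$ . '.log';   # hypothetical scratch file
	open(my $log, '>', $logfile) or die "can't write $logfile: $!";
	print $log "started\n";
	close($log) or die "can't close $logfile: $!";

	open(my $in, '<', $logfile) or die "can't read $logfile: $!";
	my @lines = <$in>;                     # list context: all lines at once
	close($in);
	unlink($logfile) or warn "can't remove $logfile: $!";
	print "read back: @lines";             # read back: started

```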

=head2 I/O Operations: magical filehandles

B<There are certain file handles that perl will make available for you
without an explicit open>. If you run a perl program with some arguments,
perl makes the arguments that follow the program name available to your
program as I<@ARGV>. Now, if your program leaves @ARGV untouched and you
use the diamond operator (<>) for reading in data, perl will consider each
of those arguments as a file to be opened, open them in order, and supply
their contents when you use the <> operator (with no arguments at all, <>
reads STDIN instead)! Here is a simple example that emulates the Unix I<cat>
command in some ways:

	#!/usr/bin/perl -w
	while ( <> ) {
		print;
	}


What is the name of the file currently being read? Perl keeps it in
$ARGV (the Filehandle itself is named ARGV). Here is how you test this:

	#!/usr/bin/perl -w
	while ( <> ) {
		next unless eof;	#true on the last line of each file
		print "File is $ARGV\n";
	}
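
C<$.> is not reset between the files that C<< <> >> reads, so counting lines per file needs an explicit C<close ARGV> at the end of each one (an idiom described in L<perlfunc/eof>). This self-contained sketch creates two small scratch files under F</tmp> (the names are hypothetical). It sets C<@ARGV> itself instead of taking file names from the command line:

```perl

	#!/usr/bin/perl -w
	use strict;

	my @files = ('/tmp/interperl-a-' . $$, '/tmp/interperl-b-' . $$);   # hypothetical
	for my $i (0, 1) {
		open(my $fh, '>', $files[$i]) or die "$files[$i]: $!";
		print $fh "line\n" x ($i + 2);     # 2 lines, then 3 lines
		close $fh;
	}
	my %count;
	@ARGV = @files;                        # <> opens the files named in @ARGV
	while (<>) {
		next unless eof;                   # eof without () tests the current file
		$count{$ARGV} = $.;                # last line number of this file
		close ARGV;                        # resets $. before the next file
	}
	unlink @files;
	print "$_: $count{$_} lines\n" for @files;

```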


There are occasions when your program needs some small amount of input
that you'd rather have in a file, but you don't want the script to
hard code the name of the file or you don't want to carry the file around
with the program. The Filehandle DATA is what you need in such cases. Perl
normally reads your program up to the end of the file, but if it reads a
line which says B<__END__> (without any other characters), it stops reading
the program right there. Anything that follows is available to your program
through the I<DATA> Filehandle. Here is an example:

	#!/usr/bin/perl -w
	print <DATA>;
	__END__
	This line three erros.
	This line ends input.

The I<open> and I<close> on the above Filehandles happen automatically,
so you don't need to do them explicitly.

For more on these topics, see L<perlfunc>.

=head1 System Interaction and perl shortcuts

=head2 Hostname

	#OLD:
	chomp( $hostname = qx{ hostname });
	print "Host = $hostname\n";
	#SHORTCUT:
	use Sys::Hostname;	#core module, exports hostname()
	print "Host = ", hostname, "\n";

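Benchmark results for hostname:

	                          Rate perl_hostname_1000times    system_hostname_once
	perl_hostname_1000times 1020/s                      --                    -63%
	system_hostname_once    2778/s                    172%                      --

Each iteration of C<perl_hostname_1000times> makes 1000 calls, so per call
the net speedup is roughly 1020 * 1000 / 2778, or about B<367 times>.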


=head2 Remove or rename a file:

	#OLD:
	system("rm $file"); 
	system("mv $file1 $file2");
	#
	#SHORTCUT:
	#
	unlink($file) || warn "can't remove $file: $!\n";
	rename($file1, $file2) || die "can't rename: $!\n";

=head2 Daemonize

A daemon is different from normal programs: it should not have a controlling
terminal, and it should be immune to signals that the launching
shell/program is sent. If you close all standard Filehandles, the process
will still have a controlling terminal. It will also inherit a working
directory which you want to set to B</>. Here is one way to do it:

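	use POSIX qw(setsid);
	chdir('/')                  || die "can't chdir to /: $!";
	open(STDIN,  '< /dev/null') || die "can't read /dev/null: $!";
	open(STDOUT, '> /dev/null') || die "can't write to /dev/null: $!";
	defined(my $pid = fork)     || die "can't fork: $!";
	exit if $pid;               #the parent exits; the child carries on
	(setsid() != -1)            || die "can't start a new session: $!";
	open(STDERR, '>&STDOUT')    || die "can't dup stdout: $!";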

The setsid call is imported from the POSIX module (may not be fully
implemented in all O/S). C<setsid()> will make the program it's own
process group leader. The program will also have no controlling terminal.
For more on unix system programming, see L<References>.

=head1 Some standard library shortcuts in Perl

=head2 getpwnam, getpwent, getpwuid 

These functions give you the password file/NIS entries from within perl.
You can look up a single entry by login name with C<getpwnam> or by UID
with C<getpwuid>, or cycle through the entire list with C<getpwent>. In list
context each returns ($name, $passwd, $uid, $gid, $quota, $comment, $gcos,
$dir, $shell, ...), so the login shell is element 8:

	$root_shell = (getpwuid(0))[8]; 
	print "Blech!\n" unless $root_shell =~ /bash/;

=head2 stat, lstat

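These functions give you a file's metadata (owner, permissions, size,
timestamps and so on). They have the same semantics as the unix system calls
of the same name. C<lstat> differs from C<stat> only when given a symbolic
link: it then describes the link itself rather than the file it points to.
The File::stat module replaces the list that C<stat> normally returns with
an object that has named accessor methods:

	use File::stat;
	$s = stat("/etc/passwd") or die "can't stat /etc/passwd: $!\n";
	print "/etc/passwd last modified at: ", scalar(localtime $s->mtime), "\n";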

=head2 chown $uid, $gid, @files;

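C<chown> changes the owner and group of a list of files. The first two
arguments must be I<numeric> UID and GID values, so look names up with
C<getpwnam> and C<getgrnam>. Most systems treat a value of -1 as "leave
unchanged". C<chmod> likewise takes a numeric mode, usually written in
octal, before the list of files. Both return the number of files they
successfully changed. Example: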

	chown 0, 0, '/etc/passwd', '/etc/shadow';
	chmod 0600, '/etc/shadow';
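A runnable sketch of both calls on scratch files. To keep it working without root privileges, it assumes it may "re-chown" the files to the current effective UID (C<$E<gt>>) and effective GID (the first number in C<$)>). A real script would look the ids up with C<getpwnam>/C<getgrnam> instead:

```perl

	use strict;
	use warnings;
	use File::Temp qw(tempdir);

	my $dir   = tempdir(CLEANUP => 1);
	my @files = ("$dir/a.conf", "$dir/b.conf");
	for my $f (@files) {
		open(my $fh, '>', $f) or die "can't create $f: $!\n";
		close($fh);
	}

	# chown needs NUMERIC ids. In real code: my $uid = getpwnam('root');
	# Here: our own effective UID and effective GID, which an ordinary
	# user may set on files they own.
	my $uid   = $>;
	my ($gid) = split ' ', $);

	my $changed = chown($uid, $gid, @files);
	warn "only $changed of ", scalar(@files), " files changed\n"
		if $changed != @files;

	# chmod also returns how many files it changed.
	my $count = chmod(0640, @files);
	printf "%s mode %04o\n", $_, (stat $_)[2] & 07777 for @files;

```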

=head2 directory operations: opendir, readdir

Here is an example that finds all text files in the current directory:

	opendir(DIR, '.') or die "can't opendir .: $!\n";
	while (defined($file = readdir(DIR)) ) {
		next unless -f $file && -T _;	#plain files that look like text
		print "text file: $file\n";
	}
	closedir(DIR);
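C<readdir> returns bare names, not paths. When you scan a directory other than the current one, you must prefix the directory before applying a file test such as C<-T>. The sketch below also calls C<readdir> in list context to get every name at once. The scratch directory and file names are assumptions for illustration:

```perl

	use strict;
	use warnings;
	use File::Temp qw(tempdir);

	# A scratch directory with one text file and one binary file.
	my $dir = tempdir(CLEANUP => 1);
	open(my $txt, '>', "$dir/notes.txt") or die "can't create: $!\n";
	print $txt "plain text\n";
	close($txt);
	open(my $bin, '>:raw', "$dir/blob.bin") or die "can't create: $!\n";
	print $bin pack('C*', 0 .. 255);	# contains NUL bytes, so not text
	close($bin);

	# Build the full path before testing, or -f/-T would look for the
	# name in the current directory instead of $dir.
	opendir(my $dh, $dir) or die "can't opendir $dir: $!\n";
	my @text = grep { -f "$dir/$_" && -T "$dir/$_" } readdir($dh);
	closedir($dh);
	@text = sort @text;

	print "text file: $_\n" for @text;

```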


=head1 Regular Expression Fundamentals

Regular expressions are B<mini languages> that allow us to capture
our own custom B<grammar> to B<match/fail> specified I<patterns in
input space>. Now that the theory is out of the way for the moment,
here's another try:

Regular expressions are powerful tools that match a pattern in inputs
given to them. Some kinds of pattern matches allow us to extract parts
of information that are most relevant to us within the input data, and
also allow us to transform them into any other form we need.

Perl's support for regular expressions is built into the core
language, so it is fast and flexible. Regular expressions (I<regexes>) are
abstractions of general patterns you are looking for, so they can get a
bit terse and hairy to read. Perl's regex I<syntax> is rich, however, and
supports extensions that let you write perfectly readable I<regexes>.

=head2 Perl REGEX Metacharacters

The following metacharacters let you match different types and amounts
of text:

	.	match ANY character (except a newline)
	\s, \S	whitespace, non-whitespace
	\w, \W	word, non-word character (word = a-zA-Z_0-9)
	\d, \D	digit, non-digit
	\b	word boundary (between a \w and a \W character)
	^, $	beginning/end of string (of each line with the /m modifier)
	*	match zero or more of preceding expression
	+	match one or more of preceding expression
	?	match zero or one time
	{n,m}	match from n to m repetitions of preceding expression
	()	grouping (and capturing)
	[]	character class (eg. a thru z is [a-z])
	|	alternation
	\1, \2 ...	back-references to groups, inside the pattern

After a successful match, the text captured by each () group is available
in the variables $1, $2 and so on.

For exact descriptions see L<perlre>. For now, we will explain a few of
these Metacharacters with examples in the following sections.

=head2 Theory of regex engines: DFA, NFA

When you are matching a text against some pattern, there are two
ways to go about it:

	1. Let the TEXT drive the search: read it one character at a
	time, keep track of every way the pattern could still match,
	and report the longest match that starts at the leftmost
	possible position. Each character is examined once, so the
	time taken is predictable: it grows linearly with the length
	of the text. These are called DFA (Deterministic Finite
	Automaton) engines.

	2. Let the PATTERN drive the search: walk through the text
	trying one alternative at a time, and when it fails,
	BACKTRACK to the last choice point and try the next one.
	Failure is reported only after every possibility has been
	tried. Because the engine always knows which path it is on,
	it can REMEMBER what each sub-pattern matched. However, the
	number of paths can grow exponentially, so there is no
	GUARANTEE on how long a match will take. These engines belong
	to the (backtracking) NFA (Nondeterministic Finite Automaton)
	class.

B<A classic DFA> is fast, but I<it cannot remember which part of the text
each sub-pattern matched>.

B<NFA> is slower, and on some patterns it may run so long that it
effectively never returns, but I<it is more flexible and can easily be
enhanced to remember all matched sub-patterns and return them at the end>.

B<Perl's regex engine is a backtracking NFA, with support for
back-references, non-greedy matching, and lookahead>.
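A quick way to see that perl is a backtracking NFA rather than a leftmost-longest DFA is alternation. Perl takes the FIRST alternative that lets the whole pattern succeed, while a leftmost-longest engine would always report the longest one. A minimal sketch:

```perl

	use strict;
	use warnings;

	my $text = 'tonight';

	# The first alternative that works wins, even though a longer one exists.
	my ($first) = $text =~ /(to|tonight)/;

	# Reordering the alternatives changes perl's answer.
	my ($reordered) = $text =~ /(tonight|to)/;

	# Backtracking: 'to' matches, but then $ fails at offset 2, so perl
	# backs up into the alternation and tries 'tonight' instead.
	my ($anchored) = $text =~ /(to|tonight)$/;

	print "first: $first, reordered: $reordered, anchored: $anchored\n";
	# prints "first: to, reordered: tonight, anchored: tonight"

```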

Here are the fundamental rules of REGEX matching in perl:

=head2 Simplest regex is a plain string

The simplest regex is a plain string. If you use it to I<match> something,
it will succeed only if your input data contains the exact same string as
the regex. However, within your pattern (regex) you can use
B<Metacharacters> to match huge amounts of data in a few characters of
the regex. Here is a simple example of some entries in a logfile:

	Jun 14 22:06:31 indus.fell.com in.ftpd[492]: connect from 146.223.45.6
	Jul 13 12:30:07 indus.fell.com in.telnetd[570]: connect from 10.0.15.21

Here is one way to find the client IP address in the second line.

	/connect from 10.0.15.21/

Unfortunately, this will only match connections originating from
10.0.15.21. Actually, because an unescaped B<.> matches any character,
it will also match 1000115021; writing B<\.> fixes that. What if you want
to match ANY ip address? This is where metacharacters come to the rescue.
The metacharacter B<\d> signifies a I<digit>. The next regular expression
will match any IP address:

	/connect from ([\d\.]+)/

The square brackets allow us to match a B<class> of characters. In our
case, the class consists of a digit (\d) and a I<literal dot (.)>
character. The plus (B<+>) following this character class asks the
expression to match a digit or a literal dot I<one or more times>.
Unfortunately, our expression not only matches valid IP addresses but
spurious values as well (example: 345.567.890111.11)! In our case, we are
sure the logfile will not contain such bogus matches. In the general case,
however, we have to specify the pattern I<as exactly> as possible. You
also see the entire IP address pattern enclosed within brackets. Why?
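Here is a sketch of a stricter check, assuming plain dotted-quad IPv4 addresses. It requires exactly four dot-separated groups of one to three digits, and each group must then be no larger than 255. The helper name C<looks_like_ipv4> is hypothetical, not a standard function:

```perl

	use strict;
	use warnings;

	# Hypothetical helper: true only for a dotted quad with each part 0..255.
	sub looks_like_ipv4 {
		my ($s) = @_;
		my @parts = $s =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
			or return 0;
		return (grep { $_ <= 255 } @parts) == 4 ? 1 : 0;
	}

	for my $candidate ('10.0.15.21', '146.223.45.6', '345.567.890111.11', '1000115021') {
		printf "%-18s %s\n", $candidate,
			looks_like_ipv4($candidate) ? 'valid' : 'rejected';
	}

```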

=head2 Perl regex is non-regular: supports back-references

Regular expressions just match. In practice, however, you often need only
a part of the matched text for further processing. I<Capturing groups>
let you store I<parts> of a match and retrieve them I<after> the match:
perl stores the text matched by each sub-pattern enclosed within brackets
B<()> in the variables I<$1>, I<$2>, and so on. Inside the pattern itself
you can refer back to those groups as B<\1>, B<\2> ... These are true
I<back-references>, and they make perl's "regular" expressions more
powerful than regular expressions in the strict, theoretical sense.

Captures allow substitution and data reduction. In the above example of
matching an IP address, the bracketed sub-pattern contains the IP address
I<when the whole pattern matches>! Thus, here is one way to make a list of
all unique IP addresses that connected to your machine:

	my($ip, %connections);

	open(MESSAGES, '/var/log/secure') or die("can't open logfile: $!\n");
	while ( <MESSAGES> ) {
		next unless /in\.telnetd.+connect from ([\d\.]+)/;
		$connections{ $1 }++;		#count connections per client IP
	}
	close(MESSAGES);
	foreach $ip ( sort keys %connections ) {
		printf("%-15s connected %5d times\n", $ip, $connections{$ip});
	}
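A true back-reference appears inside the pattern itself: B<\1> means "whatever group 1 matched, again, here". A classic use is finding accidentally doubled words. A small sketch; the sample sentence is made up:

```perl

	use strict;
	use warnings;

	my $text = "Please restart the the NFS server on on indus";

	# \b(\w+) captures a word, \s+ skips the gap, \1 demands the same word.
	my @doubled;
	while ($text =~ /\b(\w+)\s+\1\b/g) {
		push @doubled, $1;
	}
	print "doubled: @doubled\n";	# prints "doubled: the on"

```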


=head2 Perl regex: tries all possibilities for match to SUCCEED

The important concept with perl regular expressions is that perl tries
I<ALL> possibilities for a match to succeed. This is done through
back-tracking and bumping-along, which is very similar to what we do when
we solve a maze: if we hit a wall, we backtrack to the last place where we
had a choice of paths, abandon the failed path, and continue along another.
In our example, when we match the subexpression "in\.telnetd", perl does
something like the following:

Perl first bumps along through the date and time, trying (and failing) at
each position. At the hostname "indus.fell.com", the first two characters
match the first two characters of our pattern. However, the next character,
B<d>, does I<NOT> match the literal "." in our pattern (B<\.>). Perl does
not declare failure at this point! Instead, it bumps along: it moves the
starting point one character to the right (to the 'n') and tries the
pattern again. This fails immediately, since I<n> does not match the
pattern's first character, I<i>. This continues until the starting point
reaches "in.telnetd", where the subexpression I<in\.telnetd> matches
exactly. The rest of the regex then matches too, so the match succeeds
for this line.
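You can see where the bumping along ended with the C<@-> array. It holds the start offsets of the last successful match, and C<$-[0]> is the start of the whole match. A minimal sketch using the logfile line from above. For a full trace of every attempt, running a script with C<perl -Mre=debug> prints the engine's steps to STDERR:

```perl

	use strict;
	use warnings;

	my $line = 'Jul 13 12:30:07 indus.fell.com in.telnetd[570]: connect from 10.0.15.21';

	if ($line =~ /in\.telnetd/) {
		# Every earlier starting point was tried and abandoned,
		# including the "in" at the beginning of "indus".
		printf "first 'in' is at offset %d\n", index($line, 'in');
		printf "match starts at offset %d\n", $-[0];
	}

```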

=head2 Match stops at the FIRST/earliest successful match 

Unless you ask for every match with the B</g> modifier, perl stops at the
very first match. In addition, even if the pattern could match in several
places, perl matches at the earliest point in the target string. Here is
an example:

	Writing c-shell scripts is a sure way to go to hell!

If we try to match /hell/ in this example, it will NOT match the last
word. It matches right in the middle of "c-shell", because that is the
B<earliest> place where the match succeeds! Keeping this in mind will help
you avoid spurious matches. How do we match the word "hell" in the above
example? The pattern /\bhell/ will do. B<\b> is a zero-width assertion
that matches at a I<word boundary>: a position with a \w character on one
side and a \W character (or the start or end of the string) on the other.
The position between "s" and "h" in "c-shell" has word characters on both
sides, so \b fails there and the regex engine bumps along until it finds
the standalone word `hell'. (Use /\bhell\b/ if you also want to avoid
matching "hello".)
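The sketch below uses C<$-[0]>, the offset at which the last successful match started, to show where each pattern matches. It also uses B</g> in list context to collect every occurrence rather than just the first:

```perl

	use strict;
	use warnings;

	my $s = 'Writing c-shell scripts is a sure way to go to hell!';

	# Without \b the earliest match is inside "c-shell".
	printf "/hell/      starts at offset %d\n", $-[0] if $s =~ /hell/;

	# With word boundaries only the standalone word qualifies.
	printf "/\\bhell\\b/ starts at offset %d\n", $-[0] if $s =~ /\bhell\b/;

	# /g in list context returns every match, not just the first.
	my @all = $s =~ /hell/g;
	print "occurrences of 'hell': ", scalar(@all), "\n";

```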

=head2 Matches can be GREEDY or non greedy: backtracking

When you specify a B<+> to match multiple characters, perl will match I<as
many characters as it can> in the beginning. If later parts of the pattern
cause the match to fail, perl will B<backtrack> into the submatch by one
character and retry the failed match from the same point. This is best
described by an example string and pattern:

	STRING: All that is gold does not grow old
	PATTERN1: /old/
	PATTERN2: /.+old/

Pattern 1 will match the "old" within the word "gold" in the string. This
follows from the explanation in the previous section. Pattern 2 will
however match the sub-pattern "old" at the very last word! This is because
the B<+> character is greedy. Thus, B<.+> gobbles up the entire string at
the beginning. The sub-pattern "old" now fails, so perl backtracks the
B<.+> to contain all but the last character. This fails too. Perl
backtracks again, and fails. The next backtracking places the start of
match before the "o" in "old". This matches with the sub-pattern "old" and
perl reports success. In this case, the "old" in the regex matches the
last word. 
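Both behaviours side by side, as a minimal sketch. The captured text shows exactly how much C<.+> and C<.+?> consumed:

```perl

	use strict;
	use warnings;

	my $s = 'All that is gold does not grow old';

	# Greedy: .+ takes everything, then gives characters back until
	# "old" can match, so it stops before the LAST "old".
	my ($greedy) = $s =~ /(.+)old/;

	# Non-greedy: .+? takes as little as possible, so it stops before
	# the FIRST "old" (the one inside "gold").
	my ($lazy) = $s =~ /(.+?)old/;

	print "greedy    : [$greedy]\n";	# [All that is gold does not grow ]
	print "non-greedy: [$lazy]\n";		# [All that is g]

```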


=head2 Results depend on context

As with other things, a regex match in perl returns different values
depending on the context in which you match. Here are the general rules:

	scalar context: true (1) if the pattern matched, false ("") if not
	list context: the list of captured groups ($1, $2, ...),
	              or (1) if the pattern has no groups

When we introduce brackets in our regex, perl B<groups> the subtext that
matched each bracketed sub-expression and stores it in the variables $1,
$2 etc. This happens after every successful match, whatever the context.
In list context, the bracketed matches are I<also> returned as the value
of the match. Here is an example:

	$_ = 'All that is gold does not grow old';

	print "SCALAR: $1\n" if /(.+)old/;
	@foo = /(old)(.+old)/;
	print "LIST: @foo\n";

prints the following. The elements of @foo are joined with a single space,
and the second element itself begins with a space:

	SCALAR: All that is gold does not grow 
	LIST: old  does not grow old
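A few more context cases side by side, as a sketch. The last one is a well-known idiom. Assigning a /g match to an empty list, and that assignment to a scalar, yields the number of matches. This works because a list assignment in scalar context returns the number of elements on its right-hand side:

```perl

	use strict;
	use warnings;

	$_ = 'All that is gold does not grow old';

	my $matched = /old/;		# scalar context: true (1) or false ("")
	my @groups  = /(\w+) (\w+)/;	# list context: the captured groups
	my @every   = /old/g;		# list context with /g: every match
	my $count   = () = /old/g;	# count the matches

	print "matched=$matched groups=@groups every=@every count=$count\n";
	# prints "matched=1 groups=All that every=old old count=2"

```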


=head1 Regular Expressions - Basic Examples

Here are some basic examples that use some simple patterns to match
various things you would commonly extract from input data:

=head2 Match a word: \w+

	if ( 'One word' =~ /\w+/ ) {
		print "Matched $&\n";
	}
	#Matched One

=head2 Match an integer: [-+]?\d+


	$_ = 'One value: +23.45';
	if ( /[-+]?\d+/ ) {
		print "Matched $&\n";
	}
	#"Matched +23"

=head2 Match a number that has 3 to 5 digits: \d{3,5}

	if ( 12345 =~ /^\d{3,5}$/ ) {
		print "Number within range\n";
	}


=head2 Match everything between foo and bar: greedy version

	$_ = 'brave fools embark on travel through bare desert';
	print $& if /foo.*bar/;

	#prints "fools embark on travel through bar"

=head2 Match everything between foo and bar: non greedy version

	$_ = 'brave fools embark on travel through bare desert';
	print $& if /foo.*?bar/;

	#prints "fools embar"

=head2 Match the host in /NFS server floozey not responding/:

	/NFS server\s+(\S+)\s+not responding/
	#if the match succeeds, the hostname is in $1
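Applied to a few log lines, the same pattern can tally which NFS servers stopped responding. The lines are made up for this sketch, in the usual syslog layout:

```perl

	use strict;
	use warnings;

	# Hypothetical sample lines; in practice read them from your messages file.
	my @log = (
		'Jun 14 22:06:31 indus kernel: NFS server floozey not responding, still trying',
		'Jun 14 22:07:02 indus kernel: NFS server floozey OK',
		'Jun 14 22:09:45 indus kernel: NFS server duckie not responding, still trying',
		'Jun 14 22:10:13 indus kernel: NFS server floozey not responding, still trying',
	);

	my %down;
	for (@log) {
		$down{$1}++ if /NFS server\s+(\S+)\s+not responding/;
	}
	printf "%-10s failed to respond %d time(s)\n", $_, $down{$_} for sort keys %down;

```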

=head2 Surprise 1: '*' matches ZERO or more!

With greedy quantifiers in previous subexpressions, a later '*' will match 
zero times and still report success:

	$_ = 'Has a long number 12437';

	if ( /(.*)(\d*)/ ) { print "String: $1, number: $2\n"; }
	#gives  "String: Has a long number 12437, number: "

=head2 Surprise 2: greediness results in backtracking

Greediness, backtracking and 'first successful match' combine to produce
non-intuitive results, if you're not careful.

	$_ = 'Has a long number 12437';
	if ( /(.*)(\d+)$/ ) { print "String: $1, number: $2\n"; }
	#gives  "String: Has a long number 1243, number: 7" !!

The above expression is better written as

	if ( /(.*?)(\d+)$/ ) { print "String: $1, number: $2\n"; }
	#gives  "String: Has a long number , number: 12437"

=head2 Surprise 3: greediness is the default

	$_ = 'your food is in the bar under the barn';
	if ( /foo(.*)bar/ ) { print "matched: $1\n";}
	#gives "matched: d is in the bar under the"



=head1 Perl Regular Expressions - More details


Here is the complete specification for a perl regex match operation:

	m/expr/gsimox; 

You can leave out the I<m> (which stands for I<match>, by the way) and just
use /pattern/, which is what you normally do. If you do write the I<m>,
however, perl lets you use almost any non-whitespace character as the
pattern delimiter, so you can write the regex in a more readable manner.
Here are some regexes, all of which match the same pattern: finding the
directory name of a file.

	1.   /(\/[^\s]+)\/[^\/\s]+/;

	2.  m,(/[^\s]+)/[^/\s]+,;

	3.  m{
		(/[^\s]+)	#a slash followed by any non space character
		/		#start of filename part
		[^/\s]+		#a filename (assume no spaces in the filename)
	}x;

As we see from regex 1, match patterns can be very hairy. The reason why
we had all those leaning toothpicks(B<\/>) was due to the fact that the
pattern was delimited by a B</>. In such cases, if you want to match a
literal forward slash, you need to quote/escape it with the B<\>
character. Regex 2 is clearer because it uses commas to delimit the
pattern, so you don't have to quote the B</>. Even after this substantial
improvement in readability, the pattern looks difficult.
Regex 3 is probably the easiest for I<humans> to parse. We don't offer any
explanation, as it is self-evident. See below for more details on the
B</x> modifier. With such powerful constructs perl allows you to match
almost any type of pattern (I<nested> patterns are one exception).
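A quick way to convince yourself that the three forms are equivalent is to run them on the same input. The sample line below is made up for this sketch:

```perl

	use strict;
	use warnings;

	my $line = 'config lives in /etc/ssh/sshd_config today';
	my @dirs;

	push @dirs, $1 if $line =~ /(\/[^\s]+)\/[^\/\s]+/;
	push @dirs, $1 if $line =~ m,(/[^\s]+)/[^/\s]+,;
	push @dirs, $1 if $line =~ m{
		(/[^\s]+)	# a slash followed by non-space characters
		/		# start of filename part
		[^/\s]+		# a filename (assume no spaces in the filename)
	}x;

	print "$_\n" for @dirs;		# "/etc/ssh", three times

```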

However, matching is not the only reason to use a regex. Once you perform a
match, you can substitute whatever you matched with anything else you
want. Here is the spec for the regex B<Substitution> operator:

	s{expr}{replacement}egsimox;

The modifiers B<g,i,m,o,s,x> direct the match in the same ways as for
B<m//>; for example, B</g> replaces every occurrence instead of just the
first. The one additional modifier is B</e>. Here are examples that
illustrate some of them:

=head2 i: Case insensitive

	$_ = "The path to my magic scripting language is /usr/bin/awk\n";

	s{/(awk|sed|sh|csh|bash|ed)\b}{/perl}i;	#/i: /usr/bin/AWK would match too

	print;

This prints "The path to my magic scripting language is /usr/bin/perl".

=head2 o: optimize (variables interpolated only ONCE)

	$val = 'something';
	$new = 'somthinels';
	while ( <> ) {
		print if s/$val/$new/o;
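	}

With B</o>, perl interpolates $val into the pattern the first time the
statement runs and never again, even if $val changes later. Modern perls
already avoid recompiling a pattern whose interpolated text has not
changed, so B</o> is rarely needed today. Precompiling the pattern with
B<qr//> is a clearer way to get the same benefit.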
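The danger of B</o> is that the frozen pattern outlives the variable's value. Below is a minimal sketch with two hypothetical helper subs that differ only in B</o>, plus the B<qr//> alternative:

```perl

	use strict;
	use warnings;

	# Identical except for /o: the first freezes $pat the first time it runs.
	sub count_frozen { my ($pat, @list) = @_; return scalar grep { /$pat/o } @list }
	sub count_fresh  { my ($pat, @list) = @_; return scalar grep { /$pat/ } @list }

	my @fruit = qw(apple banana cherry);

	count_frozen('an', @fruit);			# first call: pattern becomes /an/
	my $frozen = count_frozen('e', @fruit);		# still /an/: banana only
	my $fresh  = count_fresh('e', @fruit);		# /e/: apple and cherry

	# The explicit alternative: compile once with qr// and reuse it.
	my $re = qr/e/;
	my $with_qr = grep { $_ =~ $re } @fruit;

	print "frozen=$frozen fresh=$fresh qr=$with_qr\n";	# frozen=1 fresh=2 qr=2

```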

=head2 x: use extended regular expressions (allow comments!)

Perl version 5 introduced the ability to include arbitrary comments
I<within> a regex by specifying the I<x> modifier. This allows you to
write crystal clear regexes that you would otherwise have a hard time
understanding on second glance. We have seen this in an example above.
Here is another, more hairy example:

	/^\w+\s+\d+\s+[\d:]+\s+.+?(in\.\w+)\[\d+\]:\s+connect\s+from\s+([\d\.]+)$/

Better written as:

	m{
		^\w+\s+\d+	#Month and day
		\s+
		[\d:]+		#Time
		\s+.+?		#hostname (skipped, non-greedy)

		(in\.\w+)	#get the service daemon that was connected to

		\[\d+\]		#the PID within []

		:\s+connect\s+from\s+

		([\d\.]+)$	#the originating client IP..
	}x;

The clarity that you get with the /x modifier is well worth the effort of
increasing your lines of code.
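Here is the /x version at work on the two sample log lines shown earlier, as a sketch; the variable names are assumptions. Under /x, literal whitespace in the pattern is ignored, which is why the spaces in the log are matched with \s+:

```perl

	use strict;
	use warnings;

	my @lines = (
		'Jun 14 22:06:31 indus.fell.com in.ftpd[492]: connect from 146.223.45.6',
		'Jul 13 12:30:07 indus.fell.com in.telnetd[570]: connect from 10.0.15.21',
	);

	my %seen;	# service daemon => client IP
	for (@lines) {
		next unless m{
			^\w+\s+\d+	# month and day
			\s+[\d:]+	# time
			\s+.+?		# hostname (non-greedy)
			(in\.\w+)	# $1: the service daemon
			\[\d+\]		# the PID within []
			:\s+connect\s+from\s+
			([\d.]+)$	# $2: the originating client IP
		}x;
		print "$1 <- $2\n";
		$seen{$1} = $2;
	}

```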


=head2 $`, $&, $' = pre match, entire match and post-match strings

Example:

	if ( 'Pre match Post' =~ /\s+match\s+/ ) {
		print "Pre  match: $`\n";
		print "Match     : $&\n";
		print "Post match: $'\n";
	}
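The same three pieces can be recovered from the offset arrays C<@-> and C<@+> with C<substr>. Before perl 5.20, using $`, $& or $' anywhere in a program slowed down every successful match, which is why this form is often preferred in older code. A minimal sketch:

```perl

	use strict;
	use warnings;

	my $s = 'Pre match Post';
	my ($pre, $match, $post);
	if ($s =~ /\s+match\s+/) {
		# $-[0] and $+[0] are the offsets where the whole match starts and ends.
		$pre   = substr($s, 0, $-[0]);
		$match = substr($s, $-[0], $+[0] - $-[0]);
		$post  = substr($s, $+[0]);
		print "[$pre] [$match] [$post]\n";	# [Pre] [ match ] [Post]
	}

```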

=head2 e: evaluate the replacement as a PERL expression!

The B</e> modifier allows you to substitute a matched pattern with the
B<results> of perl code within the substitution string! This is very
powerful. Here is a simple example:

	$_ = '2 candies at 35 cents = ';
	s{
		(\d+)\D+(\d+)	#get numbers
		.*$
	}{
		$& . 	#append to end
		($1 * $2) . ' cents'
	}ex;
	print;	#prints "2 candies at 35 cents = 70 cents";

Here is another example: if you want to change the IP address of a host,
and you have a table of the new IP addresses for each old IP, here is a
simple way to change it:

	%new_ip = ( '10.0.0.1' => '10.1.1.1', '192.168.100.2' => '172.16.45.2');

	@old = ('10.0.0.1', '10.3.14.3', '192.168.100.2', '192.168.100.3');
	@new = @old;

	foreach ( @new ) {
		s/([\d\.]+)/$new_ip{$1} ? $new_ip{$1} . ' <--- ' : $1 /e;
	}
	print join("\n", @new), "\n";

This code snippet prints:

	10.1.1.1 <---
	10.3.14.3
	172.16.45.2 <---
	192.168.100.3

We have crafted the replacement expression to append " <---" to each
changed address, so you can clearly see where the changes have taken place.
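The same idea scales to a whole file's worth of text in one statement. B</g> visits every IP-looking token, and B</e> computes its replacement. The host names and addresses below are invented for this sketch, and C<exists()> leaves unknown addresses untouched:

```perl

	use strict;
	use warnings;

	my %new_ip = ('10.0.0.1' => '10.1.1.1', '192.168.100.2' => '172.16.45.2');
	my $hosts  = "10.0.0.1\tindus\n10.3.14.3\tganga\n192.168.100.2\tkaveri\n";

	# Copy $hosts and edit the copy; (?: ) groups without capturing.
	(my $updated = $hosts) =~
		s/\b(\d{1,3}(?:\.\d{1,3}){3})\b/exists $new_ip{$1} ? $new_ip{$1} : $1/ge;

	print $updated;

```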

This brief introduction to regular expressions should help you craft
simple regular expressions. For more details, consult L<perlre> or the
regular expressions book listed in L<"Books">.

=head1 Subroutines 

Perl allows you to write free form code, just like any other language.
However, when you write large programs, or programs that I<behave> in a
variety of ways, you will want to bunch similar tasks together and re-use
the same code fragments over and over again. Perl subroutines are designed
for this type of abstraction.

Subroutines are a way of dividing a perl programming task into manageable
chunks of abstraction.  They are analogous to I<functions> in
C. 

=head2 Defining a Subroutine

You define a subroutine in perl as follows:

	sub my_sub {
		my(@arguments) = @_;
		#statements;
		local $" = ", ";	#separator used when interpolating arrays
		print "My arguments are: @arguments\n";
	}
	my_sub("foo", 6.023e-3, 42);
	#
	#prints:
	#
	#'My arguments are: foo, 0.006023, 42'

=head2 Subroutine Arguments 

Subroutine arguments in perl differ from their equivalents in most
commonly used languages in some important aspects:

B<subroutines in perl take a variable number of arguments by default>

B<subroutine arguments are I<NOT> named or type-checked> (I<perl 5
prototypes only change how a call is parsed at compile time and do not
check argument types; perl 5.20 and later also offer subroutine
signatures, experimental until 5.36, which name the parameters. Perl 6,
now called Raku, has typed parameters>).

=head2 Subroutine return value

A perl subroutine's return value is typically the values returned
using an explicit B<return> statement.

If a subroutine does not explicitly return a value, and the calling
statement/expression uses the subroutine in a context requiring a return
value, the subroutine's B<LAST evaluated expression> becomes the return
value. Here is an example:

	sub sum_two_numbers {
		$_[0] + $_[1];
	}
	print "sum: ", sum_two_numbers(2, 3), "\n";
	#
	#prints:
	#'sum: 5'
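A subroutine can also find out which context it was called in with the builtin C<wantarray>, and return something different in each. A minimal sketch; C<group_members> and its data are hypothetical:

```perl

	use strict;
	use warnings;

	# wantarray is true in list context, false (but defined) in scalar
	# context, and undef in void context.
	sub group_members {
		my @members = ('root', 'bin', 'daemon');	# made-up data
		return wantarray ? @members : scalar(@members);
	}

	my @who   = group_members();	# list context: the names
	my $count = group_members();	# scalar context: how many
	print "members: @who (count $count)\n";	# members: root bin daemon (count 3)

```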

=head2 Subroutine arguments are references

All parameters are passed to the subroutine through the B<@_> array.
However, these parameters are passed B<by reference>: the elements of @_
are I<aliases> for the caller's variables, not copies of them. Any
modification to an element of @_ therefore directly changes the original
variable in the calling code.

If you need to change the argument values inside the subroutine without
affecting the caller, B<copy the arguments> into B<my> variables within
your subroutine, as follows:

	sub hypotenuse_bad {
		my $sum = 0;
		for (@_ ) {
			$_ *= $_;
			$sum += $_;
		}
		return sqrt($sum);
	}
	sub hypotenuse {
		my($arg1, $arg2) = @_;
		$arg1 *= $arg1;
		$arg2 *= $arg2;
		return sqrt($arg1 + $arg2);
	}
	my($vert, $horiz) = (5,12);
	print "BEFORE             : Vertical = $vert, Horizontal =  $horiz\n";
	print "\nHypotenuse 1: ", hypotenuse($vert, $horiz), "\n";
	print "AFTER sub-with-copy: Vertical = $vert, Horizontal =  $horiz\n";
	print "\nHypotenuse 2: ", hypotenuse_bad($vert, $horiz), "\n";
	print "AFTER sub-mangling : Vertical = $vert, Horizontal =  $horiz\n";
	#
	#prints:
	#
	#------------------- output start---
	# BEFORE             : Vertical = 5, Horizontal =  12
	# 
	# Hypotenuse 1: 13
	# AFTER sub-with-copy: Vertical = 5, Horizontal =  12
	# 
	# Hypotenuse 2: 13
	# AFTER sub-mangling : Vertical = 25, Horizontal =  144
	#
	#------------------- output end  ---
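Aliasing is occasionally useful on purpose. The sketch below strips leading and trailing whitespace from each argument directly in the caller's variables. C<trim_in_place> is a hypothetical name:

```perl

	use strict;
	use warnings;

	# Each $_ in the loop aliases an element of @_, which in turn
	# aliases the caller's variable, so s/// edits the original.
	sub trim_in_place {
		s/^\s+|\s+$//g for @_;
	}

	my ($user, $host) = ("  root ", "\tindus.fell.com\n");
	trim_in_place($user, $host);
	print "[$user\@$host]\n";	# [root@indus.fell.com]

```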

=head1 Variables and Scope

Variables are placeholders for values. You may want these stored
values to be accessed in certain places throughout your program, or
at certain times during the execution of your program. This is called
the B<Scope> of a variable. The B<Scope> of a perl variable is 
specified by declaring it to be a B<lexical> or B<dynamic> variable.

=head2 Scope

The following scopes are available in perl (as of 5.8):

	1. my: limited to the nearest enclosing block, subroutine,
	eval or file. Cannot be seen in called subroutines within
	the current scope unless passed explicitly.

	2. local: limited to the current block and any subroutines
	called within this block. The old value of the variable is
	automatically restored after exiting from the current scope.

	3. our: limited to the current block, but the name refers to
	a package variable, so it may be visible across package
	boundaries.

=head2 Dynamically Scoped variables (local)

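Variables that you do not declare with B<my> are package (global)
variables, visible to the entire running program. The C<local> modifier
gives such a variable I<dynamic scope>: it saves the current value,
installs a temporary one, and restores the original value when the
enclosing block exits. While the block runs, the temporary value is seen
everywhere, including in subroutines called from within that block.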

Subroutines are free to overwrite dynamic variables, causing values
to be changed in unpredictable ways.  

Example of B<local> variables:

	local $person = 'King';
	print "Person in outer block, begin: $person\n";
	{
		print "Person in inner block, BEFORE overwriting: $person\n";
		local $person = 'Beggar';
		who_isthis();
		print "Person in inner block, after overwriting: $person\n";
	}
	sub who_isthis {
		print "Inside a subroutine: $person\n";
	}
	print "Person in outer block, END  : $person\n";
	#
	#prints:
	#
	#------------------- output start---
	# Person in outer block, begin: King
	# Person in inner block, BEFORE overwriting: King
	# Inside a subroutine: Beggar
	# Person in inner block, after overwriting: Beggar
	# Person in outer block, END  : King
	#
	#------------------- output end  ---


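A more complex example: B<local> on a single hash element. Only that
element is restored when the block exits; the new 'tomato' key added
inside the block survives: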

	%hash = ('apple' => 'fruit');
	print_hash("before");
	{
		local $hash{'apple'} = 'red';
		$hash{'tomato'} = 'vegetable';
		print_hash("local ");
	}
	print_hash("after");
	sub print_hash {
		my $prefix = shift;
		print "\n", "-" x 40, "\n";
		for (sort keys %hash) {
			printf "$prefix: %10s: %s\n", $_, $hash{$_};
		}
	}
	#
	#prints:
	#
	#------------------- output start---
	# 
	# ----------------------------------------
	# before:      apple: fruit
	# 
	# ----------------------------------------
	# local :      apple: red
	# local :     tomato: vegetable
	# 
	# ----------------------------------------
	# after:      apple: fruit
	# after:     tomato: vegetable
	#
	#------------------- output end  ---


=head2 Lexically scoped variables (my)

Lexical scoping is the first variety of scope described at the beginning
of this section. A lexical variable is declared using the B<my> keyword,
and its scope is fixed at compile time by where that declaration appears
in the source.

Example:

	my $a = 'this';
	my $c = "that";
	{
		my $c = "all";
		print "inside block Scope, A=$a, C=$c\n";
	}
	print "inside file  Scope, A=$a, C=$c\n";
	#
	#prints:
	#
	#------------------- output start---
	# inside block Scope, A=this, C=all
	# inside file  Scope, A=this, C=that
	#
	#------------------- output end  ---

More complete example:

	my $a = 'this';
	my $c = "that";
	{
		my $c = "all";
		print "inside block Scope, A=$a, C=$c\n";
		print_ac();
	}
	print "inside file  Scope, A=$a, C=$c\n";
	print_ac();
	sub print_ac {
		print "Within sub, A=$a, C=$c\n";
	}
	#
	#prints:
	#
	#------------------- output start---
	# inside block Scope, A=this, C=all
	# Within sub, A=this, C=that
	# inside file  Scope, A=this, C=that
	# Within sub, A=this, C=that
	#
	#------------------- output end  ---


I<Lexical scope increases data privacy>. When you declare a variable
using the I<my> keyword, it creates a variable and grants it a scope of
the closest enclosing block, file, eval or subroutine.

A B<my> variable goes I<out of scope> as soon as execution leaves the
nearest enclosing scope. No other part of your program can access it
directly (I<the exception: a reference or a closure created inside the
scope keeps the variable alive and reachable>).
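The closure exception in practice: an anonymous subroutine that uses a B<my> variable from its enclosing scope keeps that variable alive after the scope has ended. A minimal sketch; C<make_counter> is a hypothetical name:

```perl

	use strict;
	use warnings;

	# Each call creates a fresh lexical $count, captured by the returned sub.
	sub make_counter {
		my $count = 0;
		return sub { return ++$count };
	}

	my $tick = make_counter();
	my $tock = make_counter();
	$tick->() for 1 .. 3;		# advance only the first counter
	my $t = $tick->();
	my $u = $tock->();
	print "tick: $t, tock: $u\n";	# tick: 4, tock: 1

```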

=head2 Deciding between local/my

When do you use I<my> as opposed to I<local>?

Almost always. There are very few situations in which I<local> makes
more sense; reserve B<local> for the following circumstances:

=over 4

=item *

When you want a temporary value to be held in
a variable that is predefined in perl ($_, @_, $/ etc.)

=item *

When you want temporary semantics within your
runtime scope of the current block and all called
subroutines:

	{ local $SIG{'INT'} = \&do_something;.... }

=item *

When you want to alias other variables

	@array = ('foo', 'bar');
	call_it( \@array );
	sub call_it {
		local *a  = $_[0];
		print "A = @a\n";
	}
	#
	#prints:
	#
	#------------------- output start---
	# A = foo bar
	#
	#------------------- output end  ---

=item *

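When you need to temporarily change a package (global) variable that
code in other packages also reads, such as C<$Data::Dumper::Indent>.

A I<my> variable cannot play this role. It is invisible outside its
lexical scope and cannot be exported, because perl's I<Module export>
mechanism works by aliasing package variables through typeglobs.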

=back


For most other cases, I<my> is better than B<local>.
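The most common use of B<local> in sysadmin scripts is the first case above: temporarily changing a predefined variable. The sketch below slurps a whole file by undefining the input record separator C<$/> inside a C<do> block. The temporary file and its contents are made up:

```perl

	use strict;
	use warnings;
	use File::Temp qw(tempfile);

	# A small sample file (hypothetical contents).
	my ($fh, $path) = tempfile(UNLINK => 1);
	print $fh "line one\nline two\nline three\n";
	close($fh);

	# local gives $/ a temporary value for this block only;
	# undef means "read the whole file at once".
	my $content = do {
		local $/;
		open(my $in, '<', $path) or die "can't read $path: $!\n";
		<$in>;
	};

	# Outside the block $/ is "\n" again, so <$in> reads one line.
	open(my $in, '<', $path) or die "can't read $path: $!\n";
	my $first_line = <$in>;
	close($in);

	printf "slurped %d bytes; first line: %s", length($content), $first_line;

```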

=head1 Perl variable References 

To make complex datastructures possible, perl 5 contains a mechanism
to store B<references> to variables as values. I<These are similar
to pointers in C>.

Perl allows two types of references: B<symbolic> references and B<hard>
references. A symbolic reference is simply a string containing the name
of another variable. Perl treats a value as one whenever you dereference
something that is not a hard reference, unless C<use strict 'refs'> is in
effect, in which case perl dies instead.

=head2 Symbolic Reference

Here is an example of a symbolic reference:

	$name = "foo";
	$$name = 1;
	print "variable \$name = $name\n";
	print "variable \$foo  = $foo\n";
	print "Foo is really: ", *{foo}, "\n";
	#
	#prints:
	#
	#------------------- output start---
	# variable $name = foo
	# variable $foo  = 1
	# Foo is really: *main::foo
	#
	#------------------- output end  ---

Hard references are available from perl 5 onwards. Hard references
can be used to refer to other variables and values I<without a need to 
know what the other name is>. These are very similar to hard links in
file systems.

B<use hard references only>. Symbolic (soft) references are powerful, but
dangerous to code clarity. C<use strict>, which includes C<strict 'refs'>,
makes any use of them a runtime error.
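What C<use strict> does with the symbolic reference from the example above, and the usual cure for "variable variables", as a minimal sketch:

```perl

	use strict;
	use warnings;

	# Under "use strict" a symbolic reference is a fatal runtime error.
	my $name = 'foo';
	my $ok   = eval { my $value = ${$name}; 1 };
	my $err  = $@;
	print "refused: $err" unless $ok;

	# Keep "variable variables" in a hash instead.
	my %config;
	$config{$name} = 1;
	print "config{$name} = $config{$name}\n";

```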

=head2 Hard references

=over 4

=item * Take a reference to an existing variable, then dereference it

	my $bar = 'Grill';
	my $foo = \$bar;
	my @toys = qw(tigger pooh barnie);
	my $animalref = \@toys;
	print "\$foo  = $foo\n";
	print "\$\$foo = $$foo\n";
	$"= ", ";
	print "\$animalref = $animalref\n";
	print "array \$animalref = @{ $animalref }\n";
	#
	#prints:
	#
	#------------------- output start---
	# $foo  = SCALAR(0x9c92158)
	# $$foo = Grill
	# $animalref = ARRAY(0x9c921ac)
	# array $animalref = tigger, pooh, barnie
	#
	#------------------- output end  ---

=item * Create an ANONYMOUS array:

	$array_ref = [ 'a', 'b', 'c' ];
	$"= ", ";
	print "\$array_ref = $array_ref\n";
	print "array \$array_ref = @{ $array_ref }\n";
	#
	#prints:
	#
	#------------------- output start---
	# $array_ref = ARRAY(0x99e5c18)
	# array $array_ref = a, b, c
	#
	#------------------- output end  ---

=item * Create an ANONYMOUS hash:

	use Data::Dumper;
	#
	$hash_ref = { 'shiva' => 'parvati', 'bitter-half' => 'better-half' };
	#
	$"= ", ";
	print "\$hash_ref = $hash_ref\n";
	print "HASH \$hash_ref = ", 
		Data::Dumper->Dump([ $hash_ref ], [ '*hash_ref' ]), "\n";
	#
	#prints:
	#
	#------------------- output start---
	# $hash_ref = HASH(0x97ecc18)
	# HASH $hash_ref = %hash_ref = (
	#               'bitter-half' => 'better-half',
	#               'shiva' => 'parvati'
	#             );
	#------------------- output end  ---


=item * Create an ANONYMOUS subroutine:

	$sub_ref = sub { 
		print "'anon sub says Hello!'\n";
		for (0..$#_) {
			print(" ->arg $_ => $_[$_]\n");
		}
	};
	print "SUBroutine \$sub_ref = $sub_ref\n";
	print "SUB executing.. output follows...\n";
	$sub_ref->();
	$sub_ref->("With", "some", "arguments");
		
	#
	#prints:
	#
	#------------------- output start---
	# SUBroutine $sub_ref = CODE(0x88ee17c)
	# SUB executing.. output follows...
	# 'anon sub says Hello!'
	# 'anon sub says Hello!'
	#  ->arg 0 => With
	#  ->arg 1 => some
	#  ->arg 2 => arguments
	#
	#------------------- output end  ---

=item * Automatic creation of references (I<autovivification>, a perl builtin feature):

	use Data::Dumper;
	$a = [ ];
	$a->[3]{'foo'} = 'bar';
	print Data::Dumper->Dump( [ $a ], [ '*a' ]);
	#
	#prints:
	#
	#------------------- output start---
	# @a = (
	#        undef,
	#        undef,
	#        undef,
	#        {
	#          'foo' => 'bar'
	#        }
	#      );
	#
	#------------------- output end  ---


=item * Calling an Object constructor (an object is a reference I<blessed> into a class).

	use IO::File;
	my $p = IO::File->new();
	print "\$p = $p\n";
	#
	#prints:
	#
	#------------------- output start---
	# $p = IO::File=GLOB(0x8ed44c4)
	#
	#------------------- output end  ---

=back
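When code receives a reference it did not create, the builtin C<ref()> tells you what it points to. It returns SCALAR, ARRAY, HASH, CODE and so on, the class name for an object, or an empty string for a plain value. A minimal sketch:

```perl

	use strict;
	use warnings;
	use IO::File;

	my %samples = (
		scalar => \'Grill',
		array  => [ 'a', 'b' ],
		hash   => { shiva => 'parvati' },
		code   => sub { 1 },
		object => IO::File->new(),
		plain  => 'just a string',
	);

	for my $name (sort keys %samples) {
		my $type = ref($samples{$name});
		printf "%-7s => %s\n", $name, length($type) ? $type : '(not a reference)';
	}

```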


=head1 Complex/multi-dimensional data-structures in perl

B<All perl datastructures are really 1-dimensional>. All perl
multi-dimensional datastructures are generated with B<References>.
As explained in L<"Perl variable References">, perl automatically creates
(I<autovivifies>) the intermediate references you need, so you can build
a set of nested references that emulate multi-dimensionality.

Here are some examples of various multi-dimensional datastructures
and their use, without much comment:

=head2 Array of Arrays

	use Data::Dumper;
	@a = ( 
		[ 'unix', 'OS', '1970'], 
		[ 'dos', 'emulator', 1980] 
	);
	#
	print "Here is an Array of Arrays:\n";
	print Data::Dumper->Dump( [ \@a ], [ '*a' ]);
	print "\nHere's how you would use it:\n";
	my $code = qq{
	for (\@a) {
		print "\$_->[0] is an \$_->[1] which was created circa \$_->[2]\\n";
	}
	};
	print "CODE: $code\bRESULT:\n";
	eval $code;
	
	#
	#prints:
	#
	#------------------- output start---
	# Here is an Array of Arrays:
	# @a = (
	#        [
	#          'unix',
	#          'OS',
	#          '1970'
	#        ],
	#        [
	#          'dos',
	#          'emulator',
	#          1980
	#        ]
	#      );
	# 
	# Here's how you would use it:
	# CODE: 
	# 	for (@a) {
	# 		print "$_->[0] is an $_->[1] which was created circa $_->[2]\n";
	# 	}
	# 	RESULT:
	# unix is an OS which was created circa 1970
	# dos is an emulator which was created circa 1980
	#
	#------------------- output end  ---

=head2 A Hash of Arrays

	use Data::Dumper;
	open(G, '/etc/group') or die "can't open /etc/group: $!\n";
	while ( <G> ) {
		chomp;
		next unless /bin|daemon|lp/o;
		my($g, @rest) = split /:/;
		$grent{$g} = [ @rest ];
	} 
	close G;
	#
	#print Data::Dumper->Dump( [ \%grent ], [ '*grent' ]);
	#
	for (sort keys %grent) {
		printf "Group: $_ has GID [%s], Members [%s]\n",
			$grent{$_}->[1],
			$grent{$_}->[2];
	}
	#
	#prints:
	#
	#------------------- output start---
	# Group: adm has GID [4], Members [root,adm,daemon]
	# Group: bin has GID [1], Members [root,bin,daemon]
	# Group: daemon has GID [2], Members [root,bin,daemon]
	# Group: lp has GID [7], Members [daemon,lp]
	# Group: sys has GID [3], Members [root,bin,adm]
	#
	#------------------- output end  ---
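A hash of arrays is often easier to work with when the members are stored as a real array instead of a comma-separated string. It can then be inverted into another hash of arrays that maps each user to their groups. A sketch using made-up lines in /etc/group format, so it does not depend on the local system:

```perl

	use strict;
	use warnings;

	# Hypothetical sample lines in /etc/group format.
	my @group_lines = (
		'bin:x:1:root,bin,daemon',
		'daemon:x:2:root,bin,daemon',
		'lp:x:7:daemon,lp',
		'wheel:x:10:',
	);

	# Hash of Arrays: group name => [ member, member, ... ]
	my %members;
	for (@group_lines) {
		my ($group, undef, undef, $list) = split /:/, $_, 4;
		$members{$group} = [ split /,/, $list ];
	}

	# Invert it into another Hash of Arrays: user => [ groups ]
	my %groups_of;
	for my $group (sort keys %members) {
		push @{ $groups_of{$_} }, $group for @{ $members{$group} };
	}

	print "$_: @{ $groups_of{$_} }\n" for sort keys %groups_of;

```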


=head2 More complex datastructures

These are limited only by your needs. Here's an example of
a self-referential datastructure which is quite complex:


	use Data::Dumper;
	$Data::Dumper::Purity = 0;
	$ft{'GP'}{'name'} = 'grandpa';
	$ft{'GM'}{'name'} = 'grandma';
	$ft{'GP'}{'kids'} = [ \$ft{'Son'}, \$ft{'Daughter'} ];
	$ft{'Son'}{'mom'} = \$ft{'GM'};
	$ft{'Son'}{'name'} ='sunny';
	$ft{'Son'}{'friends'}  = [ 'shiva', 'mark', 'qiongling'];
	$ft{'Daughter'} = {
		'dad' => \$ft{'GP'},
		'dob' => '1/1/1970',
		'name' => 'Unix :-)'
	};
	print Data::Dumper->Dump( [ \%ft ], [ '*family_tree' ] );

	#
	#prints:
	#
	#------------------- output start---
	# %family_tree = (
	#      'Son' => {
	#           'mom' => \{
	#            'name' => 'grandma'
	#          },
	#           'friends' => [
	#              'shiva',
	#              'mark',
	#              'qiongling'
	#            ],
	#           'name' => 'sunny'
	#         },
	#      'GM' => ${$family_tree{'Son'}{'mom'}},
	#      'Daughter' => {
	#          'dob' => '1/1/1970',
	#          'dad' => \{
	#                 'name' => 'grandpa',
	#                 'kids' => [
	#                 \$family_tree{'Son'},
	#                 \$family_tree{'Daughter'}
	#               ]
	#               },
	#          'name' => 'Unix :-)'
	#        },
	#      'GP' => ${$family_tree{'Daughter'}{'dad'}}
	#    );
	#
	#------------------- output end  ---
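Because B<kids>, B<mom> and B<dad> hold references to hash I<elements> (for example B<\$ft{'Son'}>) rather than to the hashes themselves, each link is a reference to a reference; that is why Data::Dumper prints them as B<\{ ... }>. Following such a link therefore needs one extra dereference. The sketch below rebuilds part of the same tree under B<use strict> and walks a few links.

```perl
	use strict;
	use warnings;

	my %ft;
	$ft{'GP'}{'name'} = 'grandpa';
	$ft{'GM'}{'name'} = 'grandma';
	$ft{'GP'}{'kids'} = [ \$ft{'Son'}, \$ft{'Daughter'} ];
	$ft{'Son'}{'mom'}  = \$ft{'GM'};
	$ft{'Son'}{'name'} = 'sunny';
	$ft{'Daughter'} = { 'dad' => \$ft{'GP'}, 'name' => 'Unix :-)' };

	# 'mom' refers to the scalar that holds grandma's hashref, so
	# dereference it once with ${ } before using ->{...}.
	my $mom = ${ $ft{'Son'}{'mom'} }->{'name'};

	# Each entry in 'kids' is also a reference to a hashref.
	my @kids = map { ${$_}->{'name'} } @{ $ft{'GP'}{'kids'} };

	print "Sunny's mom: $mom\n";         # Sunny's mom: grandma
	print "Grandpa's kids: @kids\n";     # Grandpa's kids: sunny Unix :-)
```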


For more details, refer to L<perldsc> and L<perllol>.


=head1 Powerful perl builtins - grep

Perl's B<grep> is a powerful tool for selecting subsets of your data.
B<NOTE: the name grep comes from the 'ed' command g/re/p, which globally
searches for a regular expression and prints the matching lines>:

	ed <<EOF /etc/passwd
	g/re/p
	EOF

Perl's grep is more flexible than the external B<grep> programs that
you may use, in the following ways:

	1. It is builtin. It runs inside your program, so there is no
	extra process to start and no output to read back and parse.

	2. You can utilize the full power of perl regular expressions.
	Perl's regex engine grew out of Henry Spencer's original regex
	package, with a lot of feature additions and speed improvements.

	3. It is not limited to regular expressions: any expression that
	evaluates to true or false can be used as the filter.

=head2 Perl manpage for 'grep'

     grep BLOCK LIST
     grep EXPR,LIST
             This is similar in spirit to, but not the same as,
             grep(1) and its relatives.  In particular, it is not
             limited to using regular expressions.

             Evaluates the BLOCK or EXPR for each element of LIST
             (locally setting $_ to each element) and returns the
             list value consisting of those elements for which
             the expression evaluated to true.  In scalar
             context, returns the number of times the expression
             was true.

                 @foo = grep(!/^#/, @bar);    # weed out comments

             or equivalently,

                 @foo = grep {!/^#/} @bar;    # weed out comments

             Note that $_ is an alias to the list value, so it
             can be used to modify the elements of the LIST.

=head2 Example usage:

	@array = ('This is a test', 'Testing times', 'Times Square');
	@subset = grep /test/i, @array;
	@word = grep /\btest\b/i, @array;
	
	$" = ", ";
	print "Array           : @array\n";
	print "'Test' anywhere : @subset\n";
	print "'Test' as a word: @word\n";
	#
	#prints:
	#
	#------------------- output start---
	# Array           : This is a test, Testing times, Times Square
	# 'Test' anywhere : This is a test, Testing times
	# 'Test' as a word: This is a test
	#
	#------------------- output end  ---
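As the manpage says, grep is not limited to regular expressions, and in scalar context it returns a count. A short sketch with made-up UID values:

```perl
	use strict;
	use warnings;

	my @uids = (0, 1, 2, 500, 501, 65534);

	# Any expression can be the filter, not just a regex.
	my @regular = grep { $_ >= 500 && $_ < 65534 } @uids;    # 500 501

	# In scalar context grep returns the number of matches.
	my $system = grep { $_ < 100 } @uids;                      # 3

	# File tests work too; the result depends on your system.
	my @dirs = grep { -d } ('/tmp', '/etc/passwd');
```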


=head1 Powerful perl builtins - map

B<map> is a powerful tool to create B<filters/transforms> of your data.

=head2 Perl manpage for map

     map BLOCK LIST
     map EXPR,LIST
             Evaluates the BLOCK or EXPR for each element of LIST
             (locally setting $_ to each element) and returns the
             list value composed of the results of each such
             evaluation.  In scalar context, returns the total
             number of elements so generated.  Evaluates BLOCK or
             EXPR in list context, so each element of LIST may
             produce zero, one, or more elements in the returned
             value.

                 @chars = map(chr, @nums);

             translates a list of numbers to the corresponding
             characters.  And

                 %hash = map { getkey($_) => $_ } @array;

             is just a funny way to write

                 %hash = ();
                 foreach $_ (@array) {
                     $hash{getkey($_)} = $_;
                 }

=head2 Example for map:

	#URL-ifying a username
	(%users) = ('linus' => 'Linux', 'lwall' => 'Perl', 'rms' => 'Emacs');
	@users = sort keys %users;	
	@urls = map { qq{<A HREF="mailto:$_\@opensource.org">$_</A>} }
		@users;
	
	for (0..$#users) {
		print "Contact $urls[$_] if there are questions in $users{ $users[$_] }\n";
	}
	#
	#prints:
	#
	#------------------- output start---
	# Contact <A HREF="mailto:linus@opensource.org">linus</A> if there are questions in Linux
	# Contact <A HREF="mailto:lwall@opensource.org">lwall</A> if there are questions in Perl
	# Contact <A HREF="mailto:rms@opensource.org">rms</A> if there are questions in Emacs
	#
	#------------------- output end  ---
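The manpage notes that each element may produce zero, one, or more results. This sketch (sample lines are hypothetical) uses that to drop non-matching lines, to build a lookup hash from key/value pairs, and to emit two values per input element.

```perl
	use strict;
	use warnings;

	my @lines = ("root:x:0:0", "# a comment", "bin:x:1:1");

	# Zero or one result per element: return () to drop an element.
	my @names = map { /^(\w+):/ ? $1 : () } @lines;        # root bin

	# Two results per element: key/value pairs for a lookup hash.
	my %is_user = map { $_ => 1 } @names;

	# One element in, several out.
	my @doubled = map { ($_, $_ * 2) } 1 .. 3;             # 1 2 2 4 3 6

	print "root is a user\n" if $is_user{'root'};
```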

=head1 Powerful perl builtins - sort

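B<sort>, as its name implies, is the perl builtin for sorting a list of values.
Perl 5.6 and earlier used a quicksort; starting with the 5.7 development series
(and so in perl 5.8 and later), the default is a stable mergesort whose worst
case is O(N log N), and the B<sort> pragma gives limited control over the choice.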

=head2 Syntax of 'sort' function

     sort SUBNAME LIST
     sort BLOCK LIST
     sort LIST
             In list context, this sorts the LIST and returns the
             sorted list value.  In scalar context, the behaviour
             of "sort()" is undefined.

=head2 Examples of sort

	@alpha = ('a'..'z');
	print "Alphabets = @alpha\n";
	@reversed = sort { $b cmp $a } @alpha;
	print "Reversed = @reversed\n";
	#
	#prints:
	#
	#------------------- output start---
	# Alphabets = a b c d e f g h i j k l m n o p q r s t u v w x y z
	# Reversed = z y x w v u t s r q p o n m l k j i h g f e d c b a
	#
	#------------------- output end  ---


	@decimals = (0..9);
	print "Decimal digits: @decimals\n";
	@rev = sort { $b <=> $a } @decimals;
	print "Reversed      : @rev\n";
	#
	#prints:
	#
	#------------------- output start---
	# Decimal digits: 0 1 2 3 4 5 6 7 8 9
	# Reversed      : 9 8 7 6 5 4 3 2 1 0
	#
	#------------------- output end  ---
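A comparison block can also combine several keys with B<or>, which is handy for ordering hash keys by their values with a tie-breaker. The usage figures below are hypothetical.

```perl
	use strict;
	use warnings;

	# Hypothetical disk usage in MB per user.
	my %usage = (alice => 120, bob => 450, carol => 120, dave => 30);

	# Largest first; users with equal usage are ordered by name.
	my @by_usage = sort { $usage{$b} <=> $usage{$a} or $a cmp $b } keys %usage;

	print "$_: $usage{$_}\n" for @by_usage;
	# bob: 450
	# alice: 120
	# carol: 120
	# dave: 30
```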

=head2 Advanced example, using custom subroutine

	@old_ip = ('100.10.0.1', '19.168.0.1', '12.127.0.1', '172.0.0.2', '224.0.0.1');
	sub by_ip {
		my(@a) = split /\./, $a;
		my(@b) = split /\./, $b;
		$a[0] <=> $b[0] or
		$a[1] <=> $b[1] or
		$a[2] <=> $b[2] or
		$a[3] <=> $b[3]
	}
	$, = " : ";
	print "List of IPs          ", @old_ip, "\n";
	print "Sorted (default order", (sort @old_ip), "\n";
	print "\n";
	print "Sorted (using 'by_ip'", (sort by_ip @old_ip), "\n";
	#
	#prints:
	#
	#------------------- output start---
	# List of IPs           : 100.10.0.1 : 19.168.0.1 : 12.127.0.1 : 172.0.0.2 : 224.0.0.1 : 
	# Sorted (default order : 100.10.0.1 : 12.127.0.1 : 172.0.0.2 : 19.168.0.1 : 224.0.0.1 : 
	# 
	# Sorted (using 'by_ip' : 12.127.0.1 : 19.168.0.1 : 100.10.0.1 : 172.0.0.2 : 224.0.0.1 : 
	#
	#------------------- output end  ---

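B<NOTE>: In any sort routine, the number of I<comparisons> depends on the
number of data entries AND on how they are ordered to begin with.

A typical sort needs on the order of N*log2(N) comparisons, but the worst
case for a B<quicksort> (as used by perl 5.6 and earlier) is O(N^2)
(quadratic). Thus, if the I<number of IP addresses> is 1000, you may be making
roughly 10,000 calls to the B<by_ip> function in the typical case, and up to
about 500,000 in the worst case. Every one of those calls splits both of its
arguments all over again.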
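You can measure this yourself by counting the calls. The sketch below sorts 1000 pseudo-random addresses (generated with a fixed B<srand> seed so runs are repeatable) using a B<by_ip> routine like the one above, with a counter added. The exact count depends on your perl version and its sort algorithm, so no expected output is given.

```perl
	use strict;
	use warnings;

	my $calls = 0;    # how many times by_ip has been invoked

	sub by_ip {
		$calls++;
		my @x = split /\./, $a;
		my @y = split /\./, $b;
		$x[0] <=> $y[0] or $x[1] <=> $y[1] or $x[2] <=> $y[2] or $x[3] <=> $y[3];
	}

	# 1000 pseudo-random addresses; srand makes the run repeatable.
	srand(42);
	my @ips = map { join '.', map { int rand 256 } 1 .. 4 } 1 .. 1000;

	my @sorted = sort by_ip @ips;
	printf "%d addresses, %d calls to by_ip (%d calls to split)\n",
		scalar @ips, $calls, 2 * $calls;
```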

How do we reduce this overhead? Perl's datastructures and builtins
give you a few options.

=head1 Fundamental perl transforms for sorting

You can create powerful data transformations using the perl builtin 
B<sort>, B<map> and B<grep> functions. The following major canonical
transformations are essential knowledge for advanced programming tasks
using complex sorting/grouping/filtering.

=head2 The Orcish Maneuver

If you use a custom B<comparison> routine that does heavy computation
to derive the keys it compares, then I<the first optimization to make
is to compute each key only once>.

The B<Orcish> maneuver (a pun on I<or-cache>, after the B<||=> operator
it relies on) is an easy way to do this for I<any kind of comparison
routine>. Here is a sequence of transforms that leads to an orcish
maneuver:

B<Step 1: Cache the computation>:

	%cache = ();
	@webpages = ('Mark', 'Seth', 'Shiva', 'Qiongling');
	@myranks{ @webpages } = (10, 7, 5, 4);
	
	for (@webpages) {
		$cache{$_} = get_pagerank($_);
	}
	@ranked = sort { $cache{$a} <=> $cache{$b} } @webpages;
	print "User websites   : @webpages\n";
	print "Websites by rank: @ranked\n";
	#
	sub get_pagerank { return $myranks{$_[0]} };
	

	#prints:
	#
	#------------------- output start---
	# User websites   : Mark Seth Shiva Qiongling
	# Websites by rank: Qiongling Shiva Seth Mark
	#
	#------------------- output end  ---

B<Step 2: move the computation into the comparison routine>

	%cached = ();
	@webpages = ('Mark', 'Seth', 'Shiva', 'Qiongling');
	@myranks{ @webpages } = (10, 7, 5, 4);
	sub pagerank { return $myranks{$_[0]} };
	#
	@ranked = sort { 
		($cached{$a} ||= pagerank($a))
			<=> 
		($cached{$b} ||= pagerank($b))
	} @webpages;
	print "User websites   : @webpages\n";
	print "Websites by rank: @ranked\n";
	#
	#prints:
	#
	#------------------- output start---
	# User websites   : Mark Seth Shiva Qiongling
	# Websites by rank: Qiongling Shiva Seth Mark
	#
	#------------------- output end  ---
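One caveat with B<||=>: if the computed key is false (0 or an empty string), it is recomputed on every comparison. On perl 5.10 or later, the defined-or operator B<//=> avoids this. The sketch below uses hypothetical ranks with one rank of 0 and counts how often the stand-in B<pagerank()> really runs.

```perl
	use strict;
	use warnings;

	# Hypothetical ranks; note that Qiongling's rank is 0, a false value.
	my %myranks = (Mark => 10, Seth => 7, Shiva => 5, Qiongling => 0);
	my $computed = 0;

	# Stand-in for an expensive lookup; counts how often it is called.
	sub pagerank { $computed++; return $myranks{$_[0]} }

	my %cached;
	my @ranked = sort {
		($cached{$a} //= pagerank($a))
			<=>
		($cached{$b} //= pagerank($b))
	} keys %myranks;

	print "Websites by rank: @ranked\n";          # Websites by rank: Qiongling Shiva Seth Mark
	print "pagerank() called $computed times\n";  # pagerank() called 4 times
```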

=head2 The Schwartzian Transform (ST)

The B<ST>, or I<Schwartzian Transform>, is named after Randal
Schwartz, who wrote the first Usenet posting documenting
this approach.

This approach can be illustrated by the following series of steps:

=head3 Optimized code with temporary variables

	@values = (some_values_function());
	#
	@transform = ();
	for (@values) {
		push @transform, compute($_);
	}
	@results_index = sort 
		{ $transform[$a] cmp $transform[$b] }
		0..$#values;
	for (@results_index) {
		push @result, $values[ $_ ];
	}
	#
	#
	#now you have the result;

=head3 refactoring step 1: consolidate datastructures

	@values = (some_values_function());
	#
	for (@values) {
		push @transform, [ $_, compute($_) ];
	}
	@results_index = sort 
		{ $transform[$a]->[1] cmp $transform[$b]->[1] }
		0..$#values;
	for (@results_index) {
		push @result, $transform[$_]->[0];
	}


=head3 refactoring step 2: Use perl builtins 

	@values = (some_values_function());
	@transform = map { [ $_, compute($_) ] } @values;
	@result_xform = sort { $a->[1] cmp $b->[1] } @transform;
	@result = map { $_->[0] } @result_xform;


=head3 Final refactoring: remove all temporary stores

	@result = 
		map { $_->[0] }
		sort { $a->[1] cmp $b->[1] }
		map { [ $_, compute($_) ] }
			@values;

B<The above canonical form of sorting is known as the Schwartzian
Transform>. It eliminates temporary variables and makes the code
cleaner; like the first version, it calls B<compute()> only once per
element, and it also results in minor speed improvements.
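Here is the same canonical form applied to a concrete problem. As an illustration, B<compute()> is replaced by extracting the numeric UID (third field) of some hypothetical passwd-style lines, and the comparison is numeric.

```perl
	use strict;
	use warnings;

	my @lines = ("root:x:0:0", "nobody:x:65534:65534", "lp:x:7:7", "bin:x:1:1");

	my @by_uid =
		map  { $_->[0] }                    # 3. keep only the original line
		sort { $a->[1] <=> $b->[1] }        # 2. sort on the cached key
		map  { [ $_, (split /:/)[2] ] }     # 1. pair each line with its UID
		@lines;

	print "$_\n" for @by_uid;
	# root:x:0:0
	# bin:x:1:1
	# lp:x:7:7
	# nobody:x:65534:65534
```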

=head2 GRT: Guttman-Rosler Transform

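B<ST> is very efficient: like the Orcish maneuver, it reduces the
number of key computations to O(N), one per element, while the sort
itself still makes O(N log N) comparisons. In addition, it eliminates
intermediate variables. However, it still requires a B<custom sort
block>, which perl must call for every comparison.

B<GRT> replaces the custom sort block with perl's builtin default
string sort, which compares without calling back into perl code, and
this improves the speed of execution considerably.

The key to GRT is to I<find a transform that converts the original
value into a FIXED length string whose default (string) sort order is
the order you want, and that can be prepended TO the original value>.
After sorting, the prefix is stripped off again.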

Here's the GRT variation of the previous B<ST> example:

	# compute() must return a key exactly $transform_bytes bytes long
	$transform_bytes = get_bytes();
	@result = 
		map { substr($_, $transform_bytes) }
		sort
		map { compute($_) . $_ }
		@values;

This is very powerful, and fast. For more information, see
L<"Web links">.

=head1 Code Refactoring Example 1: Sort IP-s by subnet

To illustrate that B<there's more than one way to do it> in perl, we
will take a very simple example: given some IP addresses, sort them by
network and host number. The approaches described here are not the
only ones; they were chosen for their gradation in algorithmic
complexity, and to show how easily you can grow an algorithm as you go.

We will use the following list as an example list:

	@ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');

After sorting them in numeric IP address order, the output should look like:

	127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4

Where do we start?

The perl B<sort> function accepts an optional subroutine name (or
reference) or BLOCK of code, which it calls every time it needs to
compare two elements of the input list. The subroutine or BLOCK may
contain anything you like, but it must follow two rules: the two
elements being compared are available to it as the I<package (global)>
variables B<$a> and B<$b>, and it must return a negative number, zero,
or a positive number when B<$a> sorts before, equal to, or after B<$b>,
which is exactly what B<< <=> >> and B<cmp> return.

=head2 Using numeric sorting:

This method uses the standard B<split> function to extract the individual
numbers of the IP address, then compares the corresponding bytes
numerically. The short-circuit nature of the B<or> operator ensures that
each comparison stops at the very first byte that differs.


	sub numeric {
		my($a1, $a2, $a3, $a4) = split /\./, $a;
		my($b1, $b2, $b3, $b4) = split /\./, $b;
		$a1 <=> $b1  or  $a2 <=> $b2 or $a3 <=> $b3 or $a4 <=> $b4;
	}
	@ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');
	@result = sort numeric @ip;
	print "Sorted: @result\n";
	#
	#prints:
	#
	#------------------- output start---
	# Sorted: 127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
	#
	#------------------- output end  ---

=head2 Using pack:

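The B<pack> function in perl compacts values into a tight binary
structure which you can unpack later. Here, B<pack('C4', ...)> turns the
four numbers of an IP address into a 4-byte string, one byte per number,
so a plain string comparison (B<cmp>) of two packed addresses gives the
same order as comparing them numerically, byte by byte.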

	sub packed {
	   pack('C4', split(/\./, $a)) cmp pack('C4', split(/\./, $b));
	}
	@ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');
	@result = sort packed @ip;
	print "Sorted: @result\n";
	#
	#prints:
	#
	#------------------- output start---
	# Sorted: 127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
	#
	#------------------- output end  ---
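To see why this works, look at what B<pack('C4', ...)> produces and compare text order with packed order. This is a small illustrative sketch.

```perl
	use strict;
	use warnings;

	my $packed = pack('C4', split /\./, '192.168.100.1');
	print length($packed), " bytes: ", unpack('H*', $packed), "\n";   # 4 bytes: c0a86401

	# As text, '10.0.0.1' sorts before '9.0.0.1'; packed, it sorts after.
	my $text_order   = '10.0.0.1' cmp '9.0.0.1';                           # -1
	my $packed_order = pack('C4', 10, 0, 0, 1) cmp pack('C4', 9, 0, 0, 1);  # 1
```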

=head2 Using the Orcish Maneuver

This is the same idea as above, but builds a I<cache> of already packed IP
addresses, so each address is packed only once no matter how often it is
compared. This saves I<computation> time when you have large sets of
elements to sort.

	{
	   my %cache;
	   sub cached {
	      ($cache{$a} ||= pack('C4', split /\./, $a))
	         cmp
	      ($cache{$b} ||= pack('C4', split /\./, $b));
	   }
	}
	@ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');
	@result = sort cached @ip;
	print "Sorted: @result\n";
	#
	#prints:
	#
	#------------------- output start---
	# Sorted: 127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
	#
	#------------------- output end  ---
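The benchmarks below also include the ST and GRT forms, which are not spelled out for IP addresses above. This sketch applies both to the same list, reusing B<pack('C4', ...)> as the sort key. It is one plausible way to write them, not necessarily the code that was benchmarked.

```perl
	use strict;
	use warnings;

	my @ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');

	# ST: sort [address, packed-key] pairs on the key, then unwrap.
	my @st = map  { $_->[0] }
	         sort { $a->[1] cmp $b->[1] }
	         map  { [ $_, pack('C4', split /\./) ] } @ip;

	# GRT: prepend the 4-byte key, use the default sort, strip the key.
	my @grt = map  { substr($_, 4) }
	          sort
	          map  { pack('C4', split /\./) . $_ } @ip;

	print "ST : @st\n";    # ST : 127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
	print "GRT: @grt\n";   # GRT: 127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
```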


=head2 Benchmark results for IP sorting methods

Let's compare the five methods, labelled in the tables below as shown:

	1. Normal:    do the computation inside the comparison subroutine
	2. Pre-cache: use a precomputed cache
	3. Orcish:    use the Orcish maneuver
	4. ST:        Schwartzian transform
	5. GRT:       Guttman-Rosler transform

=head3 Comparison of algorithms on Intel32/Redhat-7.2, 5.6Ghz/4GB

	------------------------------------------------------------------
	          Rate      Normal    Orcish        ST Pre-cache       GRT
	------------------------------------------------------------------
	Normal     195/s        --      -81%      -81%      -81%      -87%
	Orcish    1010/s      418%        --       -2%       -2%      -32%
	ST        1031/s      429%        2%        --        0%      -31%
	Pre-cache 1031/s      429%        2%        0%        --      -31%
	GRT       1493/s      666%       48%       45%       45%        --
	------------------------------------------------------------------

=head3 Comparison of algorithms on Intel32/Redhat-AS, 5.6Ghz/4GB

	------------------------------------------------------------------
	          Rate      Normal Pre-cache    Orcish        ST       GRT
	------------------------------------------------------------------
	Normal     199/s        --      -81%      -82%      -82%      -88%
	Pre-cache 1053/s      429%        --       -5%       -6%      -36%
	Orcish    1111/s      459%        6%        --       -1%      -32%
	ST        1124/s      465%        7%        1%        --      -31%
	GRT       1639/s      725%       56%       48%       46%        --
	------------------------------------------------------------------

=head3 Comparison of algorithms on Opteron/RH-AS3.0, 3.6Ghz/4GB

	------------------------------------------------------------------
	          Rate      Normal        ST Pre-cache    Orcish       GRT
	------------------------------------------------------------------
	Normal     204/s        --      -82%      -82%      -83%      -90%
	ST        1136/s      456%        --       -2%       -3%      -42%
	Pre-cache 1163/s      469%        2%        --       -1%      -41%
	Orcish    1176/s      475%        4%        1%        --      -40%
	GRT       1961/s      859%       73%       69%       67%        --
	------------------------------------------------------------------

=head3 Comparison of algorithms on Itanium/Linux, 1.8Ghz/8GB

	------------------------------------------------------------------
	          Rate      Normal        ST    Orcish Pre-cache       GRT
	------------------------------------------------------------------
	Normal    64.7/s        --      -83%      -84%      -84%      -89%
	ST         372/s      475%        --       -6%       -8%      -38%
	Orcish     394/s      509%        6%        --       -3%      -34%
	Pre-cache  406/s      527%        9%        3%        --      -32%
	GRT        596/s      821%       60%       51%       47%        --
	------------------------------------------------------------------

=head3 Comparison of algorithms on Solaris, 3.6Ghz/32GB

	------------------------------------------------------------------
	          Rate      Normal        ST Pre-cache    Orcish       GRT
	------------------------------------------------------------------
	Normal    51.2/s        --      -83%      -86%      -86%      -89%
	ST         306/s      498%        --      -14%      -14%      -35%
	Pre-cache  355/s      593%       16%        --       -1%      -25%
	Orcish     357/s      598%       17%        1%        --      -24%
	GRT        472/s      822%       54%       33%       32%        --
	------------------------------------------------------------------
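The script that produced these tables is not part of this document. As a hedged sketch, comparable tables can be generated with B<cmpthese> from the core Benchmark module. The data set (1000 pseudo-random addresses) is an assumption, and your numbers will differ by machine and perl version. Each method returns an array reference, because the behaviour of B<sort> outside list context is undefined.

```perl
	use strict;
	use warnings;
	use Benchmark qw(cmpthese);

	# 1000 pseudo-random addresses (assumed; the original data set is unknown).
	srand(1);
	my @ips = map { join '.', map { int rand 256 } 1 .. 4 } 1 .. 1000;

	sub by_ip {
		my @x = split /\./, $a;
		my @y = split /\./, $b;
		$x[0] <=> $y[0] or $x[1] <=> $y[1] or $x[2] <=> $y[2] or $x[3] <=> $y[3];
	}

	my %methods = (
		'Normal'    => sub { [ sort by_ip @ips ] },
		'Pre-cache' => sub {
			my %key = map { $_ => pack('C4', split /\./) } @ips;
			[ sort { $key{$a} cmp $key{$b} } @ips ];
		},
		'Orcish'    => sub {
			my %key;
			[ sort { ($key{$a} ||= pack('C4', split /\./, $a))
			         cmp
			         ($key{$b} ||= pack('C4', split /\./, $b)) } @ips ];
		},
		'ST'        => sub {
			[ map  { $_->[0] }
			  sort { $a->[1] cmp $b->[1] }
			  map  { [ $_, pack('C4', split /\./) ] } @ips ];
		},
		'GRT'       => sub {
			[ map { substr($_, 4) } sort map { pack('C4', split /\./) . $_ } @ips ];
		},
	);

	# Sanity check: every method must produce the same order.
	my $expected = join ' ', @{ $methods{'Normal'}->() };
	for my $name (sort keys %methods) {
		die "$name gives a different order\n"
			unless join(' ', @{ $methods{$name}->() }) eq $expected;
	}

	cmpthese(-1, \%methods);    # run each method for at least 1 CPU second
```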

=head1 Code refactoring Example 2: Histogram of Spreadsheet column data

System information summaries contain patterns that are useful to IT and
engineering staff. For this problem, we can assume that the following
data is available in a spreadsheet format:

	Machine-name	OSname	OSversion	Patchlevel	IP-address	Network Users

Here is a set of reports someone may need:

	number of linux/solaris/other-os machines
	number of machines per network
	number of machines at each patch-level

B<In short, this is a frequency histogram> of the data. So here are the
specifications:

	1. Input is a file of Tab-delimited columns per line
	2. Parameter is "field number" (starting at 0)
	3. Output required: frequency distribution by unique values in field

=head2 example 2 using arrays

Here is the pseudocode:

	Copy contents into array
	foreach element of array
		split fields by TAB, find required field value
		push this output value into an array
	find all unique values	
	foreach unique value
		find total input lines matching unique value, PRINT


Here is the code:

   sub colprint_using_array {
      my(@contents) = @input;
      my(@output, %count, @unique);   # fresh on every call
      @devnull = ();
      foreach (@contents) {
         chomp;
         $var = (split /\t/)[$field];
         if ( defined $var ) {
            push(@output, $var);
         }
      }
      foreach $out (@output) {
         $count{$out}++;
         next if $count{$out} > 1;
         push @unique, $out;
      }
      foreach $out (@unique) {
         # count again with an exact string match, so that regex
         # metacharacters in the value (such as '.') are not special
         $num = grep { $_ eq $out } @output;
         push @devnull, "$out\t$num\n";
      }
   }


=head2 example2, using hashes 

The above code is a good starting point, but it does the counting twice:
once in the B<%count> hash, and again with B<grep> for every unique value.
In the next iteration, we would like to use the hash itself as a counter,
and just extract the unique values from its keys.

Here's the pseudocode:

	foreach line
		split fields by TAB, find required field value
		increment value of hash with this key
	foreach key in hash
		print key and counter

Here is the actual code:

   sub hash_optimized {
      my(@contents) = @input;
      @devnull = ();
      %Uniq = ();
      foreach (@contents){
         chomp;
         $var = (split /\t/)[$field];
         $Uniq{ $var }++ if defined $var;
      }
      for my $value (sort keys %Uniq) {
         push @devnull, "$value\t$Uniq{$value}\n";
      }
   }

=head2 example 2, using hash and map

In the final version, we will take out a loop and replace it with map:

   
   sub perlish {
      chomp(my @contents = @input);
      %Uniq = ();
      $Uniq{$_}++ for grep { defined } map { (split /\t/)[$field] } @contents;
      @devnull = map { "$_\t$Uniq{$_}\n" } sort keys %Uniq;
   }

The code is more I<idiomatic> perl, and is easy to read, since we
have eliminated the temporary variable used to store the field value.
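The three subs above rely on globals (B<@input>, B<$field>, B<@devnull>) set up by a benchmark harness that is not shown. For everyday use, a self-contained version that takes its arguments explicitly may be easier to reuse. This is a sketch; the name B<histogram> and the inventory lines are hypothetical.

```perl
	use strict;
	use warnings;

	# Count how often each value occurs in column $field of tab-separated lines.
	sub histogram {
		my ($field, @lines) = @_;
		my %count;
		for my $line (@lines) {
			chomp(my $copy = $line);
			my $value = (split /\t/, $copy)[$field];
			$count{$value}++ if defined $value;
		}
		return %count;
	}

	# Hypothetical inventory: machine name, OS name, OS version.
	my @inventory = ("web1\tLinux\t2.4\n", "web2\tLinux\t2.6\n", "db1\tSolaris\t9\n");

	my %by_os = histogram(1, @inventory);

	# Most common value first; ties broken alphabetically.
	for my $os (sort { $by_os{$b} <=> $by_os{$a} or $a cmp $b } keys %by_os) {
		print "$os\t$by_os{$os}\n";
	}
```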

=head2 Benchmark results for example 2

	                    Rate original_Array Hash_optimized Perl_idiomatic
	original_Array    7.06/s             --           -89%           -89%
	Hash_optimized    61.7/s           774%            --             -2%
	Perl_idiomatic    63.3/s           796%             3%             --

=head1 CGI.pm Examples

=head2 CGI.pm hello world

	use CGI::Pretty qw/:standard -no_xhtml/;
	print header,
	      start_html("Test"),
	      h1("Hello world!"),
	      end_html;

=head2 CGI.pm tables

	use CGI::Pretty qw/:standard -no_xhtml *table/;
	open(F, '/etc/passwd') or die "passwd: $!\n";
	my(@rows);
	while ( <F> ) {
		chomp;
		next unless /sys|root|server/i;
		push @rows, [ split /:/ ];	# one array ref per line
	}

	#just the table, no header/title
	print start_table;
	for (@rows) {
		@columns =  @{ $_ };
		print TR( 
				td(
					{ bgcolor=>'#99CCFF'},  
					[ @columns ] 
				)
				) . "\n";
	}
	print end_table();

=head2 CGI.pm tables with map

	use CGI::Pretty qw/:standard -no_xhtml *table/;
	open(F, '/etc/passwd') or die "passwd: $!\n";
	print table(
	TR( [
		map { td( [ split /:/ ] ) }
		grep /sys|root|server/i, <F>
		]
	)
	);


=head1 Resources

As mentioned before, this document is merely a primer. If you need a
deeper and more thorough understanding, throw this away and turn to the
following resources. Give yourself at least a year. It may save
you decades of grunt work! It might change your career. You may end
up saving the world!

=head2 Web links

Schwartzian transform: http://www.perl.com/doc/FMTEYEWTK/sort.html

Perl documentation: http://www.perl.com/


=head2 Documents

	perl manual pages
	perlfaq (perldoc perlfaq)
	perldoc perlstyle (for style issues)

=head2 Books

	Programming Perl (Larry Wall, Tom Christiansen, Randal Schwartz)
	Learning Perl (Randal Schwartz, Tom Christiansen)
	Perl Cookbook (Tom Christiansen, Nathan Torkington)
	Mastering Regular Expressions (Jeffrey Friedl)
	Perl: The Programmer's Companion (Nigel S. Chapman)
	Effective Perl Programming (Joseph N. Hall, Randal L. Schwartz)
	Cross Platform Perl (Eric F. Johnson)
	Programming with CGI.pm (Lincoln D. Stein)
	Object Oriented Perl (Damian Conway)
	Unix Network Programming (W. Richard Stevens)
	Advanced Perl Programming (Sriram Srinivasan)

=head2 Newsgroups/mailing lists

	comp.lang.perl.misc, comp.lang.perl.moderated

=head2 Links

	Perl home page:   http://www.perl.com/
	CPAN multiplexer: http://www.perl.com/CPAN/
	The Perl Journal: http://tpj.com/
	Perl Month      : http://perlmonth.com/
	Apache perl     : http://perl.apache.org/
	Perl Mongers    : http://www.pm.org/
	Perl testers    : http://testers.cpan.org/
	Perl History    : http://history.perl.org/

=head1 Author, Copyright and Credits

E<copy> Ramki Balasubramanian (ramki@pinjax.com), 2004. All rights reserved.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with L<"Author, Copyright and Credits"> section being the Invariant Sections, 
no Front-Cover Texts, and no Back-Cover Texts.

Irrespective of the copyright, all the code contained in this
document is in the public domain. You may use it as such.

=head2 Credits

Thanks to: Larry Wall for perl; Tom Christiansen and Randal Schwartz for
making perl accessible in the form of documentation, japhs and articles;
The perl porters, CPAN maintainers, and the perl community for valuable
information available through books, articles and CPAN modules.

=head2 Disclaimer

This information is offered in the hope that it may be of use, but
is not guaranteed to be correct, up to date, or suitable for any
particular purpose whatsoever.  I accept no liability in respect of
the correctness of this information or its use.

$Id: interperl.txt,v 1.15 2004/05/14 05:49:01 ramki Exp $