I’m evaluating Python and Ruby as replacements for Perl. I’ve been using Perl for several years and am very comfortable with it, although I’m definitely not an expert. Perl is a powerful language, but I think it’s ugly and encourages writing bad code, so I want to get rid of it. Python and Ruby both come with Mac OS X 10.2, both have BBEdit language modules, and both promise a cleaner approach to scripting. Over the past few weeks I read the Python Tutorial and the non-reference parts of Programming Ruby, but as of this afternoon I hadn’t yet written any Python or Ruby code.
Here’s a toy problem I wanted to solve. eSellerate gives me a tab-delimited file containing information about the people who bought my shareware. I wanted a script to extract from this file the e-mail addresses of the people who asked to be contacted when I release new versions of my products.
I decided to solve this problem in each language and then compare the resulting programs. The algorithm I chose was just the first one that came to mind. I coded it first in Ruby, and then ported the code to Python and Perl, changing it as little as possible. Thus, the style is perhaps not canonical Python or Perl, although since I’m new to Ruby it’s probably not canonical Ruby either. If I were just writing this in Perl, I might have tried to avoid Perl’s messy syntax for nested arrays and instead used an array of strings.
Here’s the basic algorithm:
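1. Read each line of standard input, split it into fields at the tabs, and strip the quotation marks from each field.
2. Store the fields of the line in an array called record.
3. Create an array called records and fill it with all the records.
4. Make a new array, contactRecords, that contains arrays of just the fields we care about: SKUTITLE, CONTACTME, EMAIL.
5. Sort contactRecords by SKUTITLE.
6. Remove the entries of contactRecords where CONTACTME is not 1.
7. Print contactRecords to standard output, with the fields separated by tabs and the records separated by newlines.

And here’s the code: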
```perl
#!/usr/bin/perl -w
use strict;

my @records = ();
foreach my $line ( <> ) {
    my @record = map {s/"//g; $_} split("\t", $line);
    push(@records, \@record);
}

my $EMAIL = 17;
my $CONTACTME = 27;
my $SKUTITLE = 34;

my @contactRecords = ();
foreach my $r ( @records ) {
    push(@contactRecords, [$$r[$SKUTITLE], $$r[$CONTACTME], $$r[$EMAIL]]);
}

@contactRecords = sort {$$a[0] cmp $$b[0]} @contactRecords;
@contactRecords = grep($$_[1] eq "1", @contactRecords);

foreach my $r ( @contactRecords ) {
    print join("\t", @$r), "\n";
}
```
The punctuation and the my’s make this harder to read than it should be.
```python
#!/usr/bin/python
import fileinput

records = []
for line in fileinput.input():
    record = [field.replace('"', '') for field in line.split("\t")]
    records.append(record)

EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords = [[r[SKUTITLE], r[CONTACTME], r[EMAIL]] for r in records]
contactRecords.sort() # default sort will group by sku title
contactRecords = filter(lambda r: r[1] == "1", contactRecords)

for r in contactRecords:
    print("\t".join(r))
```
I think the Python version is generally the cleanest to read, in the sense that it’s the most English-like. I had to look up how join and filter worked, because they weren’t methods of list as I had guessed: join is a method of the separator string, and filter is a built-in function.
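A minimal sketch of the same extraction, restated as a function so it can be tested. It is not the script above: extract_contacts is a hypothetical name, the field indices are copied from the scripts, and it assumes every line has at least 35 tab-separated fields. It also differs in three small ways: it strips the trailing newline before splitting, it filters before sorting, and it sorts on SKUTITLE alone with a key function, which is closer to the Perl comparison block than to the default list sort.

```python
# Sketch only: extract_contacts is a hypothetical helper, and the field
# indices below are copied from the scripts, not from eSellerate documentation.
EMAIL = 17
CONTACTME = 27
SKUTITLE = 34


def extract_contacts(lines):
    """Return tab-joined SKUTITLE, CONTACTME, EMAIL rows where CONTACTME is "1"."""
    rows = []
    for line in lines:
        # Drop the newline, split at the tabs, and strip the quotation marks.
        record = [field.replace('"', '') for field in line.rstrip("\n").split("\t")]
        rows.append([record[SKUTITLE], record[CONTACTME], record[EMAIL]])
    # filter() is a built-in function, not a list method; this comprehension
    # keeps the same rows as filter(lambda r: r[1] == "1", rows).
    wanted = [r for r in rows if r[1] == "1"]
    # Sort on SKUTITLE only; list.sort is stable, so rows with the same
    # title stay in input order.
    wanted.sort(key=lambda r: r[0])
    # join() is a method of the separator string, not of the list.
    return ["\t".join(r) for r in wanted]
```

To run it as a script, pass it fileinput.input() and print each returned string.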
```ruby
#!/usr/bin/ruby
records = []
while gets
  record = $_.split("\t").collect! {|field| field.gsub('"', '') }
  records << record
end

EMAIL = 17
CONTACTME = 27
SKUTITLE = 34

contactRecords = records.collect {|r| [r[SKUTITLE], r[CONTACTME], r[EMAIL]] }
contactRecords.sort! # default sort will group by sku title
contactRecords.reject! {|a| a[1] != "1"}
contactRecords.each {|r| print r.join("\t"), "\n" }
```
This is actually the shortest version, and I think it’s the easiest to read if you aren’t put off by the block syntax. I like how the sequence of operations in the first line of the while loop reads left to right (split the line, then strip the quotes from each field) instead of “backwards” as it does in the Perl and Python versions. Also, I correctly guessed which classes “owned” the methods and whether they were mutators.