Question

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).

However, the code works so I'm loath to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.

I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).

You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't) be hired. I'm interested in your arguments but this question is primarily for a critique of the code.

The spec was to successfully process a CSV file as follows and output the individual fields:

User ID,Name , Level,Numeric ID
pax, Pax Morgan ,admin,0
gt,"  Turner, George  " rubbish,user,1
ms,"Mark \"X-Men\" Spencer","guest user",2
ab,, "user","3"

The output was to be something like this (the potential hire's code actually output this):

User ID,Name , Level,Numeric ID:
   [User ID]
   [Name]
   [Level]
   [Numeric ID]
pax, Pax Morgan ,admin,0:
   [pax]
   [Pax Morgan]
   [admin]
   [0]
gt,"  Turner, George  " rubbish,user,1:
   [gt]
   [  Turner, George  ]
   [user]
   [1]
ms,"Mark \"X-Men\" Spencer","guest user",2:
   [ms]
   [Mark "X-Men" Spencer]
   [guest user]
   [2]
ab,, "user","3":
   [ab]
   []
   [user]
   [3]

Here is the code they submitted:

#!/usr/bin/perl

# Open file.

open (IN, "qq.in") || die "Cannot open qq.in";

# Process every line.

while (<IN>) {
    chomp;
    $line = $_;
    print "$line:\n";

    # Process every field in line.

    while ($line ne "") {
        # Skip spaces and start with empty field.

        if (substr ($line,0,1) eq " ") {
            $line = substr ($line,1);
            next;
        }

        $field = "";
        $minlen = 0;

        # Detect quoted field or otherwise.

        if (substr ($line,0,1) eq "\"") {
            $line = substr ($line,1);
            $pastquote = 0;
            while ($line ne "") {
                # Special handling for quotes (\\ and \").

                if (length ($line) >= 2) {
                    if (substr ($line,0,2) eq "\\\"") {
                        $field = $field . "\"";
                        $line = substr ($line,2);
                        next;
                    }
                    if (substr ($line,0,2) eq "\\\\") {
                        $field = $field . "\\";
                        $line = substr ($line,2);
                        next;
                    }
                }

                # Detect closing quote.

                if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) {
                    $pastquote = 1;
                    $line = substr ($line,1);
                    $minlen = length ($field);
                    next;
                }

                # Only worry about comma if past closing quote.

                if (($pastquote == 1) && (substr ($line,0,1) eq ",")) {
                    $line = substr ($line,1);
                    last;
                }
                $field = $field . substr ($line,0,1);
                $line = substr ($line,1);
            }
        } else {
            while ($line ne "") {
                if (substr ($line,0,1) eq ",") {
                    $line = substr ($line,1);
                    last;
                }
                if ($pastquote == 0) {
                    $field = $field . substr ($line,0,1);
                }
                $line = substr ($line,1);
            }
        }

        # Strip trailing space.

        while ($field ne "") {
            if (length ($field) == $minlen) {
                last;
            }
            if (substr ($field,length ($field)-1,1) eq " ") {
                $field = substr ($field,0, length ($field)-1);
                next;
            }
            last;
        }

        print "   [$field]\n";
    }
}
close (IN);

Answer 1

I advise people to never hire Perl programmers, or C programmers, or Java programmers, and so on. Just hire good people. The programmers who I've hired to write Perl were also skilled in various other languages. I hired them because they were good programmers, and good programmers can deal with multiple languages.

Now, that code does look a lot like C, but I think it's fine Perl too. If you're hiring a good programmer, with a little Perl practice under his belt he'll catch up just fine. People are complaining about the lack of regexes, which would make things simpler in ancillary areas, but I wouldn't wish on anyone a regex solution on parsing that dirty CSV data. I wouldn't want to read it or maintain it.

I often find that the reverse problem is more troublesome: hire a good programmer who writes good Perl code, but the rest of the team only knows the basics of Perl and can't keep up. This has nothing to do with poor formatting or bad structure, just a level of skill with advanced topics (e.g. closures).

Things are getting a bit heated in this debate, so I think I should explain more about how I deal with this sort of thing. I don't see this as a regex / no-regex problem. I wouldn't have written the code the way the candidate did, but that doesn't really matter.

I write quite a bit of crappy code too. On the first pass, I'm usually thinking more about structure and process than syntax. I go back later to tighten that up. That doesn't mean that the candidate's code is any good, but for a first pass done in an interview I don't judge it too harshly. I don't know how much time he had to write it and so on, so I don't judge it based on something I would have had a long time to work on. Interview questions are always weird because you can't do what you'd really do for real work. I'd probably fail a question about writing a CSV parser too if I had to start from scratch and do it in 15 minutes. Indeed, I wasted more than that today being a total bonehead with some code.

I went to look at the code for Text::CSV_PP ^[1], the Pure Perl cousin to Text::CSV_XS ^[2]. It uses regular expressions, but a lot of regular expressions that handle special cases, and in structure isn't that different from the code presented here. It's a lot of code, and it's complicated code that I hope I never have to look at again.

What I tend to disfavor are interview answers that only address the given input. That's almost always the wrong thing to do in the real world where you have to handle cases that you may not have discovered yet and you need the flexibility to deal with future issues. I find that missing from a lot of answers on Stack Overflow too. The thought process of the solution is more telling to me. People become skilled at a language more easily than they change how they think about things. I can teach people how to write better Perl, but I can't change their wetware for the most part. That comes from scars and experience.

Since I wasn't there to see the candidate code the solution or ask him follow-up questions, I won't speculate on why he wrote it the way he did. For some of the other solutions I've seen here, I could be equally harsh in an interview.

A career is a journey. I don't expect everyone to be a guru or to have the same experiences. If I write off people because they don't know some trick or idiom, I'm not giving them the chance to continue their journey. The candidate's code won't win any prizes, but apparently it was enough to get him into the final three for consideration for an offer. The guy got up there and tried, did much better than a lot of code I've seen in my life, and that's good enough for me.

[1] http://search.cpan.org/dist/Text-CSV
[2] http://search.cpan.org/dist/Text-CSV_XS

Answer 2

His code is a little verbose. Perl is all about modules, and avoiding them makes your life hard. Here is an equivalent to what you posted that I wrote in about two minutes:

 #!/usr/bin/env perl

 use strict;
 use warnings;

 use Text::CSV;

 my $parser = Text::CSV->new({
     allow_whitespace   => 1,
     escape_char        => '\\',
     allow_loose_quotes => 1,
 });

 while(my $line = <>){
     $parser->parse($line) or die "Parse error: ". $parser->error_diag;
     my @row = $parser->fields;
     print $line;
     print "\t[$_]\n" for @row;
 }

Answer 3

I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them and buy a copy of Perl Best Practices ^[1] for them and you will be set. After Perl Best Practices, follow up with a copy of Intermediate Perl ^[2] and they could work out.

[1] https://rads.stackoverflow.com/amzn/click/com/0596001738
[2] https://rads.stackoverflow.com/amzn/click/com/0596102062

Answer 4

It isn't dreadfully idiomatic Perl, but it isn't completely dreadful Perl either (though it could be much more compact).

Two warning bells - the shebang line doesn't include '-w' and there is neither 'use strict;' nor 'use warnings;'. This is very old-style Perl; good Perl code uses both warnings and strict.

The use of old-style file handles is no longer recommended, but it isn't automatically bad (it could be code written more than 10 years ago, perhaps).

The non-use of regular expressions is a bit more surprising. For example:

# Process every field in line.
while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That could be written:

while ($line ne "") {
    $line =~ s/^\s+//;

This chops off all leading spaces using a regex, without making the code iterate around the loop. A good deal of the rest of the code would benefit from carefully written regular expressions too. These are a characteristically Perl idiom; it is surprising to see that they are not being used.

If efficiency was the proclaimed concern (reason for not using regexes), then the questions should be "did you measure it" and "what sort of efficiency are you discussing - machine, or programmer"?
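If the efficiency question does come up, the core Benchmark module gives a quick way to measure rather than guess. The following is only a minimal sketch comparing the two ways of skipping leading spaces shown above. The sub names and the test string are invented for illustration, and the numbers will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Character-at-a-time stripping, in the style of the candidate's code.
sub strip_with_substr {
    my ($line) = @_;
    $line = substr($line, 1) while substr($line, 0, 1) eq " ";
    return $line;
}

# Single regex substitution (spaces only, to keep the comparison fair).
sub strip_with_regex {
    my ($line) = @_;
    $line =~ s/^ +//;
    return $line;
}

# Made-up input: a long run of leading spaces before a sample record.
my $input = (" " x 200) . "pax, Pax Morgan ,admin,0";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { strip_with_substr($input) },
    regex  => sub { strip_with_regex($input) },
});
```

Whichever variant wins, the table gives the interviewer something concrete to discuss instead of an assertion about speed.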

Working code counts. More or less idiomatic code is better.

Also, of course, there are modules Text::CSV and Text::CSV_XS that could be used to handle CSV parsing. It would be interesting to enquire whether they are aware of Perl modules.

There are also multiple notations for handling quotes within quoted fields. The code appears to assume that backslash-quote is appropriate; I believe Excel uses doubled up quotes:

"He said, ""Don't do it"", but they didn't listen"

This could be matched by:

$line =~ /^"([^"]|"")*"/;

With a bit of care, you could capture just the text between the enclosing quotes. You'd still have to post-process the captured text to remove the embedded doubled up quotes.
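As a rough sketch of that capture-and-post-process step: the sub name `parse_doubled_quote_field` is made up for illustration, and it only handles a single quoted field at the start of the line.

```perl
use strict;
use warnings;

# Hypothetical helper: extract one leading quoted field that uses
# doubled-up quotes ("") as the escape, and return its unescaped text.
sub parse_doubled_quote_field {
    my ($line) = @_;

    # Capture everything between the enclosing quotes; a doubled quote
    # inside the field is consumed as a pair so it cannot end the match.
    return unless $line =~ /^"((?:[^"]|"")*)"/;

    my $field = $1;
    $field =~ s/""/"/g;    # collapse each doubled quote to a single one
    return $field;
}

my $field = parse_doubled_quote_field(
    q{"He said, ""Don't do it"", but they didn't listen",next}
);
print "[$field]\n";
```

The only change to the pattern is the capturing group around the repeated part. The inner group is non-capturing so that `$1` holds the whole field rather than its last character.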

A non-quoted field would be matched by:

$line =~ /^([^,]*)(?:,|$)/;

This is enormously shorter than the looping and substringing shown.

Here's a version of the code, using the backslash-double quote escape mechanism used in the code in the question, that does the same job.

#!/usr/bin/perl -w

use strict;

open (IN, "qq.in") || die "Cannot open qq.in";

while (my $line = <IN>) {
    chomp $line;
    print "$line:\n";

    while ($line ne "") {
        $line =~ s/^\s+//;
        my $field = "";
        if ($line =~ s/^"((?:\\.|[^"\\])*)"[^,]*(?:,|$)//) {
            # Quoted field: drop anything after the closing quote,
            # then unescape \" and \\.
            $field = $1;
            $field =~ s/\\(.)/$1/g;
        }
        elsif ($line =~ s/^([^,]*)(?:,|$)//) {
            # Unquoted field: strip trailing whitespace.
            $field = $1;
            $field =~ s/\s+$//;
        }
        else {
            print "WTF?? ($line)\n";
            last;
        }
        print "   [$field]\n";
    }
}
close (IN);

It's under 30 non-blank, non-comment lines, compared with about 70 in the original. The original version is bigger than it needs to be by some margin. And I've not gone out of my way to reduce this code to the minimum possible.

Answer 5

No use strict/use warnings, systematic use of substr instead of regexp, no use of modules. This is definitely not someone who has "very good Perl experience". At least not for real-life Perl projects. Like you, I suspect that it's probably a C programmer with a basic knowledge of Perl.

That doesn't mean that they can't learn, especially as there are other Perl people around. It does seem to mean that they overstated their qualification for the job though. A few more questions about how exactly they acquired that very good Perl experience would be in order.

Answer 6

I don't care whether he used regular expressions or not. I also don't care whether his Perl looks like C or not. The question that really matters is: is this good Perl? And I'd say it's not:

He didn't use "use strict".
He didn't enable warnings.
He's using the old-fashioned two-argument version of open.
The "open file" comment hurts and gives me the impression that code he usually writes doesn't contain any comments.
The code is hard to maintain.
Was he allowed to use CPAN modules? A good Perl programmer would look at that option first.
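For reference, the first three points can be addressed in a handful of lines. This is only a sketch: the sub name `read_lines` is made up, and the file name `qq.in` is taken from the candidate's script.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file, without trailing newlines.
sub read_lines {
    my ($path) = @_;

    # Three-argument open with a lexical filehandle; $! explains any failure.
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$in>);
    close($in);

    return @lines;
}

# Only try the candidate's input file if it is actually present.
if (-e 'qq.in') {
    print "$_:\n" for read_lines('qq.in');
}
```

With strict enabled, every variable in the candidate's script would need a `my` declaration, which is also where the uninitialised `$pastquote` in the unquoted branch would have drawn attention.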

Answer 7

I have to (sort of) disagree with most views expressed here.

Since the code in question could be expressed much more compactly and maintainably in idiomatic Perl, you really need to ask how much time the candidate spent developing this solution, and how much time someone halfway proficient in idiomatic Perl would have spent.

I think you'll find that this coding style may be a huge waste of time (and thus the company's money).

I don't argue that every Perl programmer needs to grok ^[1] the language – that, unfortunately, would be far-fetched – but they should know enough to not spend ages re-implementing core language features in their code over and over again.

EDIT Looking at the code again, I've got to be more drastic: although the code looks very clean, it's actually horrible. Sorry. This isn't Perl. Do you know the saying “you can program Fortran in any language”? Yes, you can. But you shouldn't.

[1] http://en.wikipedia.org/wiki/Grok

Answer 8

This is a case where you need to follow up with the programmer. Ask him why he wrote it this way.

There may be a very good reason. Perhaps this needed to follow the same behavior as existing code and therefore he did a line-by-line translation on purpose for full compatibility. If so, give him points for a decent explanation.

Or perhaps he doesn't know Perl, so he learned it that afternoon to answer the question. If so, give him points for fast and nimble learning skills.

The only disqualifying comment may be "I always program Perl this way. I don't understand that regexp stuff."

Answer 9

Does it work? Did he write it in an acceptable period of time? Do you think it's maintainable?

If you can answer me these questions three, then you may pass the bridge of death ( * ^[1]).

[1] http://www.youtube.com/watch?v=4b4bGAoVR7g

Answer 10

I'd say his code is an adequate solution. It works, doesn't it? And there's an advantage to maintainability by writing "longhand" instead of in as few characters of code as you can.

The motto of Perl is "There's More Than One Way To Do It ^[1]." Perl doesn't really get on your case about coding style, as some languages do (I like Python too, but you've got to admit that people can get kind of snobbish when evaluating whether code is "pythonic" or not).

[1] http://catb.org/~esr/jargon/html/T/TMTOWTDI.html

Answer 11

"One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience."

If this person thinks he has very good Perl experience and he writes Perl like this, he is probably a victim of the Dunning-Kruger effect ^[1].

So, that's a no-hire.

[1] http://en.wikipedia.org/wiki/Dunning-Kruger_effect

Answer 12

I think the biggest problem is that he or she didn't show any knowledge of regex. And that is key in Perl.

The question is, can they learn? There is so much to look for in a candidate past this piece of code.

Answer 13

I wouldn't accept the candidate. He or she isn't comfortable with Perl's idioms, which will result in suboptimal code, less work efficiency (all those unnecessary lines have to be written!) and an inability to read code written by an experienced Perl coder (who of course uses regexes etc. at large).

But it works... ^[1]

[1] http://thedailywtf.com

Answer 14

Just the initial block indicates that he has missed the fundamentals about Perl.

while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That should at least be written using a regular expression to remove leading white space. I like the answer from jrockway best ^[1]; modules rock. If I were not using a module, though, I would have used regular expressions to do it, something like:

#!/usr/bin/perl -w
#
# $Id$
#

use strict;

open(FD, "< qq.in") || die "Failed to open file.";
while (my $line = <FD>) {
    # Don't like chomp.
    $line =~ s/(\r|\n)//g;
    # ".*?[^\\\\]"  = Match everything between quotations that doesn't end with
    # an escaped quotation, match lazy so we will match the shortest possible.
    # [^",]*?       = Match strings that don't have any quotations.
    # If we combine the two above we can match strings that contain quotations
    # anywhere in the string (or don't contain quotations at all).
    # Put them together and match lazy again so we can match white-spaces
    # and don't include them in the result.
    my $match_field = '\s*((".*?[^\\\\]"|[^",]*?)*)\s*';
    if (not $line =~ /^$match_field,$match_field,$match_field,$match_field$/) {
        die "Invalid line: $line";
    }
    # Put values in nice variables so we don't have to deal with cryptic $N
    # (and can use $1 in replace).
    my ($user_id, $name, $level, $numeric_id) = ($1, $3, $5, $7);
    print "$line\n";
    for my $field ($user_id, $name, $level, $numeric_id) {
        # If the field starts with a quotation,
        # strip everything after the first unescaped quotation.
        $field =~ s/^"(.*?[^\\\\])".*/$1/g;
        # Now fix all escaped variables (not only quotations).
        $field =~ s/\\(.)/$1/g;
        print "   [$field]\n";
    }
}
close FD;

[1] https://stackoverflow.com/questions/968441/should-we-hire-someone-who-writes-c-in-perl/969775#969775

Answer 15

Forgive this guy. I would not have dared to parse CSV with a regex even though it can be done.

The DFA in structured code is more obvious than the regex here and DFA -> regex translation is nontrivial and prone to stupid mistakes.

Answer 16

Maybe ask him to write more versions of the same code? When in doubt about hiring, ask the candidate more questions.

Answer 17

The fact that he hasn't used a single regex in the code should make you ask him a lot of questions about why he wrote it like that.

Maybe he's Jamie Zawinski or a fan and he didn't want to have more problems?

I'm not necessarily saying that the whole parsing should be one huge, unreadable CSV-parsing regex like ("([^"]*|"{2})*"(,|$))|"[^"]*"(,|$)|[^,]+(,|$)|(,) or one of the many similar regexes around, but regexes could at least be used to traverse the lines instead of substr().

Answer 18

Not only does the code suggest that the candidate doesn't really know Perl, but all those lines that say $line = substr ($line,1) are dreadful in any language. Try parsing a long line (say a few thousand fields) using that type of approach and you will see why. It just goes to show the sort of problem that Joel Spolsky discussed in this post ^[1].
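One alternative is to leave the line intact and walk through it with `\G` and the `/gc` flags, which make each match continue from where the previous one stopped. The sketch below assumes the same backslash-escape rules as the candidate's code. The sub name `split_fields` is invented, and text after a closing quote (such as " rubbish") is simply discarded. Whether this is actually faster on a given perl is something to measure, not assume.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields without ever
# reassigning the line; pos($line) tracks how far we have scanned.
sub split_fields {
    my ($line) = @_;
    my @fields;

    while (1) {
        $line =~ /\G\s+/gc;    # skip leading whitespace, if any
        my $field = "";
        if ($line =~ /\G"((?:\\.|[^"\\])*)"[^,]*/gc) {
            # Quoted field: unescape \" and \\.
            ($field = $1) =~ s/\\(.)/$1/g;
        }
        elsif ($line =~ /\G([^,]+)/gc) {
            # Non-empty unquoted field: strip trailing whitespace.
            ($field = $1) =~ s/\s+$//;
        }
        push @fields, $field;
        last unless $line =~ /\G,/gc;    # no comma left: the line is done
    }
    return @fields;
}

print "   [$_]\n"
    for split_fields(q{ms,"Mark \"X-Men\" Spencer","guest user",2});
```

Every pattern here consumes at least one character when it matches. That is deliberate: an empty match under `/g` changes how the next match at the same position behaves, so empty fields are handled by falling through both branches instead.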

[1] http://www.joelonsoftware.com/articles/fog0000000319.html

Answer 19

An obvious question might be, if you don't use Perl at your company in the first place, does it matter how pretty his Perl code is?

I'm not sure the elegance of his Perl code says much about his skills in whatever language you're actually using.

Answer 20

Code looks clean and readable. For that size, it does not require that many comments (perhaps none at all). It's not just about good comments, but also good code, and the latter is more important than the former.

If we were looking at a more complex or larger piece of code, I would say that comments are needed. But for this one (especially given how well it was written), I don't think so.

I think it is unfair and vain to cast doubt on the applicant, given that the piece of code he or she submitted is quite acceptable and did the job.

Answer 21

As a non Perl (?programmer?), I have to say, that is probably the most legible Perl I have ever read! :)

Hiring someone over something like a scripting language that can be learned in days to weeks (if it's a worthwhile scripting language!) seems highly flawed in the first place.

Personally I would probably hire this person for different reasons. The code is well structured and reasonably well commented. Language specifics can easily be taught later.

Answer 22

The crucial point here is - naturally after assuring that it works as expected - whether the code is maintainable.

Did you understand it?
Would you feel comfortable fixing a bug in it?

Perl programs have a tendency to look like what a cat types by accident when walking on the keyboard. If this person knows how to write readable Perl code that fits the team, this is actually a good thing.

Then again, you may want to teach him about regular expressions, but only carefully :-)

Answer 23

Hmm, I don't see anything in the request saying that quotes should be removed, or that words should be removed. The input file had the word " rubbish", and it's not in the output.

I've seen CSV files exported with quotes, and I would expect those same quotes back. If your specification had been to remove quotes and extraneous words past quotes, maybe this work would be required.

I'd watch that, and the verbosity. Look for somebody lazier (a compliment in Perl).

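open (IN, "csv.csv") || die "Cannot open csv.csv: $!";
while (<IN>) {
    chomp;
    # Naive split: commas inside quoted fields are treated as separators too.
    @array = split(/,/,$_);
    print "[User Id] =  $array[0]  [Name] = $array[1]  [Level] =  $array[2] [Numeric ID] = $array[3]\n";
}
close (IN);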