share
Stack OverflowShould we hire someone who writes C in Perl?
[+57] [23] paxdiablo
[2009-06-09 06:20:42]
[ perl ]
[ https://stackoverflow.com/questions/968441/should-we-hire-someone-who-writes-c-in-perl ]

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).

However, the code works so I'm loath to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.

I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).

You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't be hired). I'm interested in your arguments but this question is primarily for a critique of the code.

The spec was to successfully process a CSV file as follows and output the individual fields:

User ID,Name , Level,Numeric ID
pax, Pax Morgan ,admin,0
gt,"  Turner, George" rubbish,user,1
ms,"Mark \"X-Men\" Spencer","guest user",2
ab,, "user","3"

The output was to be something like this (the potential hire's code actually output this):

User ID,Name , Level,Numeric ID:
   [User ID]
   [Name]
   [Level]
   [Numeric ID]
pax, Pax Morgan ,admin,0:
   [pax]
   [Pax Morgan]
   [admin]
   [0]
gt,"  Turner, George  " rubbish,user,1:
   [gt]
   [  Turner, George  ]
   [user]
   [1]
ms,"Mark \"X-Men\" Spencer","guest user",2:
   [ms]
   [Mark "X-Men" Spencer]
   [guest user]
   [2]
ab,, "user","3":
   [ab]
   []
   [user]
   [3]

Here is the code they submitted:

#!/usr/bin/perl

# Open file.

open (IN, "qq.in") || die "Cannot open qq.in";

# Process every line.

while (<IN>) {
    chomp;
    $line = $_;
    print "$line:\n";

    # Process every field in line.

    while ($line ne "") {
        # Skip spaces and start with empty field.

        if (substr ($line,0,1) eq " ") {
            $line = substr ($line,1);
            next;
        }

        $field = "";
        $minlen = 0;

        # Detect quoted field or otherwise.

        if (substr ($line,0,1) eq "\"") {
            $line = substr ($line,1);
            $pastquote = 0;
            while ($line ne "") {
                # Special handling for quotes (\\ and \").

                if (length ($line) >= 2) {
                    if (substr ($line,0,2) eq "\\\"") {
                        $field = $field . "\"";
                        $line = substr ($line,2);
                        next;
                    }
                    if (substr ($line,0,2) eq "\\\\") {
                        $field = $field . "\\";
                        $line = substr ($line,2);
                        next;
                    }
                }

                # Detect closing quote.

                if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) {
                    $pastquote = 1;
                    $line = substr ($line,1);
                    $minlen = length ($field);
                    next;
                }

                # Only worry about comma if past closing quote.

                if (($pastquote == 1) && (substr ($line,0,1) eq ",")) {
                    $line = substr ($line,1);
                    last;
                }
                $field = $field . substr ($line,0,1);
                $line = substr ($line,1);
            }
        } else {
            while ($line ne "") {
                if (substr ($line,0,1) eq ",") {
                    $line = substr ($line,1);
                    last;
                }
                if ($pastquote == 0) {
                    $field = $field . substr ($line,0,1);
                }
                $line = substr ($line,1);
            }
        }

        # Strip trailing space.

        while ($field ne "") {
            if (length ($field) == $minlen) {
                last;
            }
            if (substr ($field,length ($field)-1,1) eq " ") {
                $field = substr ($field,0, length ($field)-1);
                next;
            }
            last;
        }

        print "   [$field]\n";
    }
}
close (IN);
(22) It looks clean and structured to me. These are good things. - quant_dev
(46) This code barely qualifies as Perl code. - Brad Gilbert
(14) Having written a CSV parser myself before, and swore never to do again and just use libraries, I'm almost certain his parser will die if it sees a linefeed in the middle of a text field. You can't read CSV linewise.It only passes trivial examples of CSV, try it on a large dataset and watch it explode. - Kent Fredric
(8) LOL - writing C in Perl is like writing Shakespeare in Piglatin. - Steven A. Lowe
Thanks heaps, guys. My colleague certainly has a bit of reading to do. I've accepted Brian's answer (since I believe the community should decide rather than me - I've never really liked the accepted-by-user/best-by community dichotomy that much though I can see why it's needed). Everyone else that provided a useful answer will get a vote. - paxdiablo
(1) Whoops, out of votes, I'll have to come back tomorrow for the rest of you. - paxdiablo
(56) Readable Perl? I never thought I'd see the day. :-) - Thomas
(4) It does not look like Perl, and that is a good thing! - Escualo
(1) @Brad, You say that like thats a bad thing ;-) - Martin Beckett
(1) This guy's code made me laugh so hard! Who can write Perl like that and honestly say they know Perl? Saint Larry Wall would be disappointed. - amphetamachine
@Brad Gilbert which is great :) - dvhh
(1) Apparently @Thomas, @Arrieta, and @dvhh, don't know very much Perl. Every part of that piece of code could be written to be more legible to people who don't program in Perl. Not only only that, it would be significantly shorter if it was rewritten by someone that actually knows how to write Perl. - Brad Gilbert
This is a good question, but belongs on Programmers SE, not Stack Overflow. - Andrew Grimm
The most important aspect of this code is that it has comments. - Will Sheppard
(2) This person knows how to program, and that's a good thing! They don't know what tools they have available to them in Perl, but that can be learned. - ikegami
I came back across this, and decided to write a Perl 6 program that has the same output for practice. gist.github.com/b2gills/6ef4b37353fb10a99a955fabd843c210 - Brad Gilbert
[+167] [2009-06-09 06:52:57] brian d foy [ACCEPTED]

I advise people to never hire Perl programmers, or C programmers, or Java programmers, and so on. Just hire good people. The programmers who I've hired to write Perl were also skilled in various other languages. I hired them because they were good programmers, and good programmers can deal with multiple languages.

Now, that code does look a lot like C, but I think it's fine Perl too. If you're hiring a good programmer, with a little Perl practice under his belt he'll catch up just fine. People are complaining about the lack of regexes, which would make things simpler in ancillary areas, but I wouldn't wish on anyone a regex solution on parsing that dirty CSV data. I wouldn't want to read it or maintain it.

I often find that the reverse problem is more troublesome: hire a good programmer who writes good Perl code, but the rest of the team only knows the basics of Perl and can't keep up. This has nothing to do with poor formatting or bad structure, just a level of skill with advanced topics (e.g. closures).


Things are getting a bit heated in this debate, so I think I should explain more about how I deal with this sort of thing. I don't see this as a regex / no-regex problem. I wouldn't have written the code the way the candidate did, but that doesn't really matter.

I write quite a bit of crappy code too. On the first pass, I'm usually thinking more about structure and process than syntax. I go back later to tighten that up. That doesn't mean that the candidate's code is any good, but for a first pass done in an interview I don't judge it too harshly. I don't know how much time he had to write it and so on, so I don't judge it based on something I would have had a long time to work on. Interview questions are always weird because you can't do what you'd really do for real work. I'd probably fail a question about writing a CSV parser too if I had to start from scratch and do it in 15 minutes. Indeed, I wasted more than that today being a total bonehead with some code.

I went to look at the code for Text::CSV_PP [1], the Pure Perl cousin to Text::CSV_XS [2]. It uses regular expressions, but a lot of regular expressions that handle special cases, and in structure isn't that different from the code presented here. It's a lot of code, and it's complicated code that I hope I never have to look at again.

What I tend to disfavor are interview answers that only address the given input. That's almost always the wrong thing to do in the real world where you have to handle cases that you may not have discovered yet and you need the flexibility to deal with future issues. I find that missing from a lot of answers on Stackoverflow too. The thought process of the solution is more telling to me. People become skilled at a language more easily than they change how they think about things. I can teach people how to write better Perl, but I can't change their wetware for the most part. That comes from scars and experience.

Since I wasn't there to see the candidate code the solution or ask him follow-up questions, I won't speculate on why he wrote it the way he did. For some of the other solutions I've seen here, I could be equally harsh in an interview.

A career is a journey. I don't expect everyone to be a guru or to have the same experiences. If I write-off people because they don't know some trick or idiom, I'm not giving them the chance to continue their journey. The candidate's code won't win any prizes, but apparently it was enough to get him into the final three for consideration for an offer. The guy got up there and tried, did much better than a lot of code I've seen in my life, and that's good enough for me.

[1] http://search.cpan.org/dist/Text-CSV
[2] http://search.cpan.org/dist/Text-CSV_XS

(2) Your last point contradicts your initial argument. If you have a good Perl programmer and you hire good programmers to your team, they should have no problems becoming good Perl programmers, and having a rock star one to lead them can only produce faster and better results. - Artem Russakovskii
(27) My last point contradicts nothing. A rock star doesn't mean they are effective communicators or mentors, or that they aren't assholes to people who they think are inferior. A good programmer isn't the same thing as a good team member. - brian d foy
(7) Brian: there's a problem here: namely, your opinion that the mess above is easier to maintain than a clear regular expression – just because it's ostensibly clean code. Clean code isn't everything: the above is 100% pure visual clutter that hides the semantics in a lot of syntax noise. - Konrad Rudolph
(9) True CSV almost certainly can't be parsed by a regex that anyone would ever be able to understand after writing it. Although it is tempting to believe that every problem solved with Perl needs a big regex, there really is more than one way to do it. brian d foy (note no caps) is right on point here. - RBerteig
I agree with Konrad - imho a regex, even a highly complex one, is an outright better solution to this, the readability problem (if it is indeed a problem, which I doubt: I find the if/else logic much more jarring) is solved by a comment block explaining it. - annakata
(3) If someone as respected as brian is within the Perl community says that that's decent Perl code I'm inclined to agree. With that said, I wouldn't rate the developer as "very good." While there's nothing wrong with the code (particularly the types of things novices tend to do) it isn't idiomatic and it contains some outdated constructs. The applicant is competent at Perl but clearly not an advanced user. - Michael Carman
RBerteig: sorry but I completely disagree. CSV formats, even complicated ones, are still very easy to parse, almost naturally using regular expressions. Also, regular expressions in Perl can span multiple lines, contain formatting and comments. All in all, this allows for concise, efficient and well-maintainable code. - Konrad Rudolph
(16) Konrad, you're just wrong. Properly parsing CSV requires keeping track of the parse state as you go (am I in a quoted field, did I just see an escape). That sort of stuff makes for messy or (near-)impossible regexes. I say near-impossible because Perl does in fact have features for maintaining state inside regexes. Nonetheless, the proper way to parse CSV in Perl is to do it a character at a time. That said, I've asked similar questions in an interview, and by far the best answer is "I would use Text::CSV(_XS) from CPAN." - Dave Rolsky
(1) Dave: sorry, it's the other way round: you are wrong. Parsing quoted strings with escaped quote marks in regex is trivial, no extra state has to be maintained: /"(\\"|[^"])*"/ (= either we've got an escaped quote, or any character except a quote; multiple times). It really is that easy. By the way, nothing against using a module here. But someone who doesn't even use Perl features for stripping the input off whitespace is clearly nowhere near professional in Perl. - Konrad Rudolph
(2) Everything in that script is a Perl feature. - brian d foy
(2) @Konrad, it sounds simple, but if you've ever written one, you'll see quickly regex become impossibly complicated, especially when you discover quoted fields can have line feeds in them. For instance, the OP's parser will have problems with these because it works linewise, and will consider \n an end of a line. So you cant process it linewise, so you have 2 options: a regex on a possibly huge string representing the whole file, or a state machine. - Kent Fredric
Kent: aside from the fact that the Perl code in question also doesn't represent a state machine: there's a 1:1 correlation between regex and state machines. In general, there's no reason to prefer the latter to the former. Additionally, I don't even say that everything should be written in one huge regex, only that for the current case it's rather trivial. I really want to stress this, it's not complicated at all. And certainly adapting a manually implemented state machine isn't any easier than adapting a regex. - Konrad Rudolph
(20) Konrad: since regexes are so easy to use to parse CSV, why not just post one? - jrockway
(4) This answer is a non sequitur: The OP didn't ask a philosophical question about who can blossom into being a great Perl programmer, the OP was talking about a hiring decision for a current Perl programmer. This code is not the code of someone who is "very good" at Perl. Granted, it does show a basic understanding of Perl's substr function; it does not demonstrate understanding of modularization, data structures or even Perl idiom. For code written "off site" it shows only the bare minimum of producing an answer and thus should be a NO HIRE for someone who claims "real" experience. - jaredor
(2) @jaredor: even if the person claims "real experience", they might just not have the actual experience to judge their experience. That doesn't mean that they can't quickly learn or jump to the next level. The NO HIRE police seem to be motivated more on revenge at some past wrong than anything else. My answer, although not directly aimed at the asked question, was apparently the answer needed. Notice that it was accepted. I try to answer the questions people have, not the questions they ask. The need to ask the question means the people don't have the grasp of the situation to ask the right one. - brian d foy
(2) @'brian d foy' Notice that "Accepted" is not a synonym for "Correct"; per the pop-up that appeared when I downvoted this "best answer", it can still benefit from criticism. I do think you've been given too much of a pass on how relevant your observations are to this hiring situation. In general, I mostly agree with you on hiring programmers, but the OP was talking about input NOW for hiring NOW. In particular, I think that this answer didn't honor the technical reviewer's instincts, which were clearly that something was amiss. Reframing the question was not a service in this instance. - jaredor
(5) @jaredor: think what you like, but it's still the answer the OP liked the most. Maybe you didn't like it, and that's fine. You aren't everyone, you're not the OP, and you aren't most people who voted on this. Your minority opinion is noted and filed. - brian d foy
@Dave Rolsky: The problem is not really the concept of regexes, which in principle can operate on any stream, but rather their implementation in Perl (and most other languages), which artificially limits them to operating on in-memory strings. OTOH the Boost Regex implementation in C++ requires just a pair of bidirectional iterators, which could be implemented directly on top of a file. - j_random_hacker
(2) @brian - just stumbled upon this specific Q, and I must add my respectful disagreement. The OP explicitly stated "very good Perl experience" coupled with "off-site" (e.g. not under super-time pressure) code. This code is not awful, it's downright not-too-damn-bad. BUT... it's nowhere NEAR the quality of code that a real Perl developer with good experience produces nearly automatically (e.g. adding $! at the end of IO-related die). So this gentleman likely mis-represented himself) - DVK
@DVK don't assume malice in this person, especially since you don't know him and have never spoken to him. As I said in an earlier comment, people without experience often can't properly and absolutely rate their own skill level. He might be a Perl god in his circles and just hasn't seen the bigger world out there. I don't know the situation, so I'm not going to make accusations, like you and so many other people are quick to make. - brian d foy
(4) @brian - even if your generous interpretation (misinformed big fish in small pond with overinflated self opinion) is correct, do YOU want to hire a guy who's so bad at his profession that he failed to continue learning to the point he's not aware there's "bigger world out there"? I sure would be very cautious - again, it's NOT the quality of his code that would give me pause but the dissonance, whatever the reason/excuse for that dissonance is. - DVK
1
[+85] [2009-06-09 12:25:12] jrockway

His code is a little verbose. Perl is all about modules, and avoiding them makes your life hard. Here is an equivalent to what you posted that I wrote in about two minutes:

 #!/usr/bin/env perl

 use strict;
 use warnings;

 use Text::CSV;

 my $parser = Text::CSV->new({
     allow_whitespace   => 1,
     escape_char        => '\\',
     allow_loose_quotes => 1,
 });

 while(my $line = <>){
     $parser->parse($line) or die "Parse error: ". $parser->error_diag;
     my @row = $parser->fields;
     print $line;
     print "\t[$_]\n" for @row;
 }

(26) The only correct answer for parsing CSV in Perl is to use a module. CSV is nasty, and it's easy to make little mistakes. Let someone else deal with it (they already have). - Dave Rolsky
(12) This is imho the only correct answer. Wastes less time, and will actually work. And strangers reading your code wont die from sheer confusion when they have to debug your CSV parser later ( because they will ). - Kent Fredric
(1) Anybody who claimed to know Perl well, should know about Text::CSV and regex's, that would be my only concern. Did he ask if he could use a library? - Chris Huang-Leaver
(5) Perhaps the exercise was esp. to implement it from scratch? - Albert
(25) To be fair to the poor sucker who got this sprung on him at an interview. Was the test set in such a way that it was clear using modules was OK? Was the environment set up well enough that he could find and use modules like Text::cvs? (This is a common perl problem -- you get told perl is installed and then you find out that only the interpeter is installed, all the "standard" modules are missing and cpan can't get past the corporate firewall). - James Anderson
2
[+43] [2009-06-09 06:51:04] Copas

I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them and buy a copy of Perl Best Practices [1] for them and you will be set. After best practices a copy of Intermediate Perl [2] and they could work out.

[1] https://rads.stackoverflow.com/amzn/click/com/0596001738
[2] https://rads.stackoverflow.com/amzn/click/com/0596102062

(4) I wish i could +2 or something. this is sound advice. - Ape-inago
(2) I agree that C is a part of a good foundation for any developer. - Akers
3
[+42] [2009-06-09 06:45:57] Jonathan Leffler

It isn't dreadfully idiomatic Perl, but it isn't completely dreadful Perl either (though it could be much more compact).

Two warning bells - the shebang line doesn't include '-w' and there is neither 'use strict;' nor 'use warnings;'. This is very old-style Perl; good Perl code uses both warnings and strict.

The use of old-style file handles is no longer recommended, but it isn't automatically bad (it could be code written more than 10 years ago, perhaps).

The non-use of regular expressions is a bit more surprising. For example:

# Process every field in line.
while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That could be written:

while ($line ne "") {
    $line =~ s/^\s+//;

This chops off all leading spaces using a regex, without making the code iterate around the loop. A good deal of the rest of the code would benefit from carefully written regular expressions too. These are a characteristically Perl idiom; it is surprising to see that they are not being used.

If efficiency was the proclaimed concern (reason for not using regexes), then the questions should be "did you measure it" and "what sort of efficiency are you discussing - machine, or programmer"?

Working code counts. More or less idiomatic code is better.

Also, of course, there are modules Text::CSV and Text::CSV_XS that could be used to handle CSV parsing. It would be interesting to enquire whether they are aware of Perl modules.


There are also multiple notations for handling quotes within quoted fields. The code appears to assume that backslash-quote is appropriate; I believe Excel uses doubled up quotes:

"He said, ""Don't do it"", but they didn't listen"

This could be matched by:

$line =~ /^"([^"]|"")*"/;

With a bit of care, you could capture just the text between the enclosing quotes. You'd still have to post-process the captured text to remove the embedded doubled up quotes.

A non-quoted field would be matched by:

$line =~ /^([^,]*)(?:,|$)/;

This is enormously shorter than the looping and substringing shown.


Here's a version of the code, using the backslash-double quote escape mechanism used in the code in the question, that does the same job.

#!/usr/bin/perl -w

use strict;

open (IN, "qq.in") || die "Cannot open qq.in";

while (my $line = <IN>) {
    chomp $line;
    print "$line\n";

    while ($line ne "") {
        $line =~ s/^\s+//;
        my $field = "";
        if ($line =~ m/^"((?:[^"]|\\.)*)"([^,]*)(?:,|$)/) {
            # Quoted field
            $field = "$1$2";
            $line = substr($line, length($field)+2);
            $field =~ s/""/"/g;
        }
        elsif ($line =~ m/^([^,]*)(?:,|$)/) {
            # Unquoted field
            $field = "$1";
            $line = substr($line, length($field));
        }
        else {
            print "WTF?? ($line)\n";
        }
        $line =~ s/^,//;
        print "   [$field]\n";
    }
}
close (IN);

It's under 30 non-blank, non-comment lines, compared with about 70 in the original. The original version is bigger than it needs to be by some margin. And I've not gone out of my way to reduce this code to the minimum possible.


(6) Well, it's under 30 now, but when you have to go back to add /x after the team review it will balloon again. Then, after /x makes it look messy, you move the regexes out of the way by putting them into scalars with qr//, you add a bit more. But, maybe you get some of that back when you use \G so you don't have to modify $line, but then nobody remembers how \G works. :) - brian d foy
(3) I'd be worried if someone made a 30-character regex extend over 30 lines with /x -- I'm sure it could be done, but it wouldn't be more readable (not at that extreme). But I agree - the compactness is a variable quantity (or quality). - Jonathan Leffler
(2) now make it work on CSV with line feeds in quoted fields, go on, dare you to. ;). And it has to work with 2G csv files. - Kent Fredric
(1) Not using regexes - at that point, it becomes a task for one of the CSV modules. - Jonathan Leffler
4
[+32] [2009-06-09 09:16:31] mirod

No use strict/use warnings, systematic use of substr instead of regexp, no use of modules. This is definitely not someone who has "very good Perl experience". At least not for real-life Perl projects. Like you, I suspect that it's probably a C programmer with a basic knowledge of Perl.

That doesn't mean that they can't learn, especially as there are other Perl people around. It does seem to mean that they overstated their qualification for the job though. A few more questions about how exactly they acquired that very good Perl experience would be in order.


5
[+28] [2009-06-09 09:25:38] innaM

I don't care whether he used regular expressions or not. I also don't care whether his Perl looks like C or not. The question that really matters is: is this good Perl? And I'd say it's not:

  1. He didn't use use strict
  2. He didn't enable warnings.
  3. He's using the old-fashioned two-argument version of open.
  4. The "open file" comment hurts and gives me the impression that code he usually writes doesn't contain any comments.
  5. The code is hard to maintain
  6. Was he allowed to use CPAN modules? A good Perl programmer would look at that option first.

(7) I think the answer to #6 is (or was perceived to be) "no." The spec is so simple and pointless that I think this is a more advanced version of the FizzBuzz question. Personally, I would have submitted two versions: One that showed that I had the necessary knowledge to solve the problem myself (hand-rolled CSV parsing) and one that showed how I would really do it in a production environment (leveraging CPAN). - Michael Carman
(4) you conclude too much from the "open file" comment. I often outline what I want to write by putting those sorts of comments in place first, then filling in the code. I get the steps down, then I code. - brian d foy
and 8. He didn't use any modules. Period. Seriously? Perl user? - Kent Fredric
(1) Pardon the noobish question, but what is the alternative to the two-argument open? Everywhere I look that's the only opening command being used (aside from the diamond operator, although that's intended for command-line use). - Cooper
@Cooper: The problem with the two-argument open() is that it will interpret special characters in the second arg which tell it how to open the file (for reading, for writing etc.). It's modeled after how a Unix shell works, but is dangerous if you don't control the second parameter (e.g. if it's user-supplied). Thus the 3-arg-form is recommended, where no such "magic parsing" happens to the filename. See e.g. perldoc.perl.org/perlopentut.html for details. - sleske
6
[+22] [2009-06-09 07:10:55] Konrad Rudolph

I have to (sort of) disagree with most views expressed here.

Since the code in question could be expressed much more compact and maintainable in idiomatic Perl, you really need to pose the question how much time the candidate spend developing this solution and how much time would have been spent by someone halfway proficient using idiomatic Perl.

I think you'll find that this coding style may be a huge waste of time (and thus the company's money).

I don't argue that every Perl programmer needs to grok [1] the language – that, unfortunately, would be far-fetched – but they should know enough to not spend ages re-implementing core language features in their code over and over again.

EDIT Looking at the code again, I've got to be more drastic: although the code looks very clean, it's actually horrible. Sorry. This isn't Perl. Do you know the saying “you can program Fortran in any language”? Yes, you can. But you shouldn't.

[1] http://en.wikipedia.org/wiki/Grok

You can't do anything great in a language like this until you are ready to let go of the familiar and embrace the constructs fully - Tom Leys
You can pretty much stop reading the original code after the first line (the 'open', not the shebang). Why open a file when it will do just fine to use <>? Why use a bare file-handle? Why die without $! in the error message? This is not the code of an experienced perl programmer. - William Pursell
7
[+14] [2009-06-09 06:51:59] SPWorley

This is a case where you need to follow up with the programmer. Ask him why he wrote it this way.

There may be a very good reason.. perhaps this needed to follow the same behavior as existing code and therefore he did a line by line translation on purpose for full compatability. If so, give him points for a decent explaination.

Or perhaps he doesn't know Perl, so he learned it that afternoon to answer the question. If so, give him points for fast and nimble learning skills.

The only disqualifying comment may be "I always program Perl this way. I don't understand that regexp stuff."


8
[+9] [2009-06-09 06:41:08] thijs

Does it work? Did he write it in an acceptable period of time? Do you think it's maintainable?

If you can answer me these questions three, they you may pass the bridge of death ( * [1]).

[1] http://www.youtube.com/watch?v=4b4bGAoVR7g

(6) Laden or unladen swallow? ... Yes, yes and borderline. - paxdiablo
(3) aaaaaaaaaaaaaarghh.. (you just fell into the ravine) - thijs
(7) Python fans (Monty, not Guido, style) should never be allowed to communicate with each other - it always devolves into quoting entire movies :-) - paxdiablo
9
[+9] [2009-06-09 06:45:16] Bill Karwin

I'd say his code is an adequate solution. It works, doesn't it? And there's an advantage to maintainability by writing "longhand" instead of in as few characters of code as you can.

The motto of Perl is " There's More Than One Way To Do It [1]." Perl doesn't really get on your case about coding style, as some languages do (I like Python too, but you've got to admit that people can get kind of snobbish when evaluating whether code is "pythonic" or not).

[1] http://catb.org/~esr/jargon/html/T/TMTOWTDI.html

there is a difference between having maintainable code and having code that spells out every little bit. If you are maintaining perl code, you should know perl enough not to need it written out like that. I'd have a hard time maintaining it as it is, because, well, it doesn't look like normal perl. - Ape-inago
10
[+9] [2009-06-10 00:40:00] Unknown

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

If this person thinks he has very good Perl experience and he writes Perl like this, he is probably a victim of the Dunning-Kruger effect [1].

So, that's a no-hire.

[1] http://en.wikipedia.org/wiki/Dunning-Kruger_effect

11
[+8] [2009-06-09 06:44:50] Artem Russakovskii

I think the biggest problem is that he or she didn't show any knowledge of regex. And that is key in Perl.

The question is, can they learn? There is so much to look for in a candidate past this piece of code.


It's actually down to three people, all who met the core requirements (including keenness and ability to learn). We're just looking for differentiators between them now. That's a good point about regex. - paxdiablo
12
[+5] [2009-06-09 06:55:37] Erich Kitzmueller

I wouldn't accept the candidate. He or she isn't comfortable with Perl's idioms, which will result in suboptimal code, less work efficieny (all those unnecessary lines have to be written!) and a inablilty to read code written by an experienced Perl coder (who of course uses regexes etc. at large).

But it works... [1]

[1] http://thedailywtf.com

Those are several unwarranted assumptions based on such little evidence. - brian d foy
(2) brian: If you hire someone, you must base your judgement on whatever information you have. I might be wrong, but it's better to dismiss a good programmer than to hire a bad one. - Erich Kitzmueller
(2) +1 for your "But it works..." link. So true. - jrockway
@ammoQ: you don't have to hire someone with limited information. That's why you should have good interviewers and follow up on references. In this case, apparently both are lacking. - brian d foy
brian: Well, you shouldn't hire someone with limited information, but on the other hand, you probably don't want to spend lots of money to thoroughly investigate a candidate who already looks bad at the initial screening. But of course it depends on the number of candidates for the opening. The more you have, the more you dismiss. - Erich Kitzmueller
13
[+5] [2009-06-09 15:09:39] Johan Soderberg

Just the initial block indicates that he has missed the fundamentals about Perl.

    while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That should at least be written using a regular expression to remove leading white space. I like the answer from jrockway best [1], modules rock. Though I would have used regular expressions to do it, something like.

#!/usr/bin/perl -w
#
# $Id$
#

use strict;

open(FD, "< qq.in") || die "Failed to open file.";
while (my $line = <FD>) {
    # Don't like chomp.
    $line =~ s/(\r|\n)//g;
    # ".*?[^\\\\]"  = Match everything between quotations that doesn't end with
    # an escaped quotation, match lazy so we will match the shortest possible.
    # [^",]*?       = Match strings that doesn't have any quotations.
    # If we combine the two above we can match strings that contains quotations
    # anywhere in the string (or doesn't contain quotations at all).
    # Put them together and match lazy again so we can match white-spaces
    # and don't include them in the result.
    my $match_field = '\s*((".*?[^\\\\]"|[^",]*?)*)\s*';
    if (not $line =~ /^$match_field,$match_field,$match_field,$match_field$/) {
        die "Invalid line: $line";
    }
    # Put values in nice variables so we don't have to deal with cryptic $N
    # (and can use $1 in replace).
    my ($user_id, $name, $level, $numeric_id) = ($1, $3, $5, $7);
    print "$line\n";
    for my $field ($user_id, $name, $level, $numeric_id) {
        # If the field starts with a quotation,
        # strip everything after the first unescaped quotation.
        $field =~ s/^"(.*?[^\\\\])".*/$1/g;
        # Now fix all escaped variables (not only quotations).
        $field =~ s/\\(.)/$1/g;
        print "   [$field]\n";
    }
}
close FD;
[1] https://stackoverflow.com/questions/968441/should-we-hire-someone-who-writes-c-in-perl/969775#969775

(1) From a lot of the posts it seems many are against using regexp, but looking at the sample data and the result I think regexp are good. Since the sample isn't a valid csv-file Text::CSV will break on 'gt," Turner, George " rubbish,user,1'. It will display: [gt] [" Turner] [George " rubbish] [user] [1] It should however strip the ' rubbish' text from the string. Go with regexp, live a little! - Johan Soderberg
14
[+5] [2009-06-09 15:19:21] Joshua

Forgive this guy. I would not have dared to parse CSV with a regex even though it can be done.

The DFA in structured code is more obvious than the regex here and DFA -> regex translation is nontrivial and prone to stupid mistakes.


15
[+3] [2009-06-09 06:51:44] sbidwai

Maybe ask him to write more versions of the same code? When in doubt about hiring, ask more questions to candidate.


16
[+3] [2009-06-09 06:52:32] Vinko Vrsalovic

The fact that he hasn't used a single piece of regex in the code should make you ask him a lot of questions about why he did write like that.

Maybe he's Jamie Zawinski or a fan and he didn't want to have more problems?

I'm not necessarily saying that the whole parsing should be a huge amount of unreadable CSV parsing regex like ("([^"]*|"{2})*"(,|$))|"[^"]*"(,|$)|[^,]+(,|$)|(,) or one of the many similar regex around, but at least to traverse the lines or instead of using substring().


17
[+3] [2010-03-01 22:43:15] itub

Not only does the code suggest that the candidate doesn't really know Perl, but all those lines that say $line = substr ($line,1) are dreadful in any language. Try parsing a long line (say a few thousand fields) using that type of approach and you will see why. It just goes to show the sort of problem that Joel Spolsky discussed in this post [1].

[1] http://www.joelonsoftware.com/articles/fog0000000319.html

18
[+1] [2009-06-09 07:00:54] jalf

An obvious question might be, if you don't use Perl at your company in the first place, does it matter how pretty his Perl code is?

I'm not sure the elegance of his Perl code says much about his skills in whatever language you're actually using.


(1) That's a good point, except for the fact that we are using Perl. This position is a backfill for someone who had to, shall we say, leave in a hurry. I may have confused you. I haven't done Perl for a while, but my colleagues group does use it. - paxdiablo
Ah, fair enough then. :) Since you and the interviewer apparently don't use perl, I assumed it just wasn't relevant for the job. - jalf
19
[+1] [2009-06-10 00:26:03] Serapth

As a non Perl (?programmer?), I have to say, that is probably the most legible Perl I have ever read! :)

Hiring someone over something like a scripting language that can be learned in days to weeks (if it's a worthwhile scripting language!) seems highly flawed in the first place.

Personally I would probably hire this person for different reasons. The code is well structured and reasonably well commented. Language specifics can easily be taught later.


p3rl.org/Moose , makes Perl really readable IMO. - Kent Fredric
(4) -1 for being wrong about scripting languages and language idioms. - Sean McMillan
20
[+1] [2009-12-23 14:00:49] Thorbjørn Ravn Andersen

The crucial point here is - naturally after assuring that it works as expected - whether the code is maintainable.

  • Did you understand it?
  • Would you feel comfortable fixing a bug in it?

Perl programs have a tendency for looking like what a cat types by accident when walking on the keyboard. If this person knows how to write readable Perl code that fits to the team, this is actually a good thing.

Then again, you may want to teach him about regular expressions, but only carefully :-)


21
[+1] [2010-10-26 14:36:53] luis.espinal

Code looks clean and readable. For that size, it does not require that much comments (perhaps none at all.) It's not just about good comments, but also good code, and the later is more important than the former.

If we were looking at a more complex/larger piece of code, I would say that comments are needed. But for that (specially the way it was written - well written), I don't think so.

I think it is unfair and vain to put doubt on the applicant given the piece of code submitted by him/her is quite acceptable and did the job.


22
[0] [2009-10-04 05:49:28] chopppen5

Hmm, I don't see anything in the request that quotes should be removed, and words should be removed. The input file had the word " rubbish", and it's not in the output.

I've seen CSV files, exported with quotes, would expect those same quotes back. If your specification had been to remove quotes and extraneous words past quotes, maybe this work would be required.

I'd watch that, and the verbosity. Look for somebody lazier (compliment in Perl).

open (IN, "csv.csv");
while (<IN>) {
    #print $_;
    chomp;
    @array = split(/,/,$_);
    print "[User Id] =  $array[0]  [Name] = $array[1]  [Level] =  $array[2] [Numeric ID] = $array[3]\n";    
}

The word 'rubbish' in the data appeared in a stretch of data like: ,"George Turner" rubbish,. This is ill-formed CSV data. CSV is supposed to have either plain text between commas or quoted text between commas; this has some of both. One legitimate way to treat malformed data is to ignore the invalid part (hence 'George Turner' as chosen by the answer); another is to consume it and treat the result as legitimate (hence 'George Turner rubbish', as you seem to prefer). The tag 'rubbish' suggests that the former was appropriate; the issue can be discussed with the interviewer. - Jonathan Leffler
23