share
Stack OverflowCriticize my code, please
[+32] [15] Micky
[2010-03-14 01:17:52]
[ c self-improvement career-development ]
[ http://stackoverflow.com/questions/2440803] [DELETED]

Hey, I was applying for a position, and they asked me to complete a coding problem for them. I did so and submitted it, but I later found out I was rejected from the position. Anyways, I have an eclectic programming background so I'm not sure if my code is grossly wrong or if I just didn't have the best solution out there. I would like to post my code and get some feedback about it. Before I do, here's a description of a problem:

You are given a sorted array of integers, say, {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13 }. Now you are supposed to write a program (in C or C++, but I chose C) that prompts the user for an element to search for. The program will then search for the element. If it is found, then it should return the first index the entry was found at and the number of instances of that element. If the element is not found, then it should return "not found" or something similar. Here's a simple run of it (with the array I just put up):

Enter a number to search for: 4

4 was found at index 2.
There are 2 instances for 4 in the array.

Enter a number to search for: -4.

-4 is not in the array.

They made a comment that my code should scale well with large arrays (so I wrote up a binary search). Anyways, my code basically runs as follows:

EX: 4, 4, 4, 4, 4

The bold 4 is the value the binary search landed on. One loop will check to the left of it, and another loop will check to the right of it. Their sum will be the total number of instances of the the number four.

Anyways, I don't know if there are any advanced techniques that I am missing or if I just don't have the CS background and made a big error. Any constructive critiques would be appreciated!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* function prototype */

int get_num_of_ints(
            const int* arr,
            size_t r,
            int N,
            size_t* first,
            size_t* count
    );

int main()
{   
    int N;                                 /* input variable */
    int arr[]={1,1,2,3,3,4,4,4,4,5,5,7,7,7,7,8,8,8,9,11,12,12}; /* sorted */
    size_t r = sizeof(arr)/sizeof(arr[0]); /* right bound */
    size_t first;                          /* first match index */
    size_t count;                          /* total number of matches */

    /* prompts the user to enter input */

    printf( "\nPlease input the integer you would like to find.\n" );
    scanf( "%d", &N );

    int a = get_num_of_ints( arr, r, N, &first, &count );

    /* If the function returns -1 then the value is not found.
        Else it is returned */ 

    if( a == -1)    
        printf( "%d has not been found.\n", N );

    else if(a >= 0){
        printf( "The first matching index is %d.\n", first );
        printf( "The total number of instances is %d.\n", count );
    }

    return 0;
}

/* function definition */

int get_num_of_ints(
        const int* arr,
        size_t r,
        int N,
        size_t* first,
        size_t* count)
{
    int lo=0;    /* lower bound for search */
    int m=0;     /* middle value obtained */
    int hi=r-1;  /* upper bound for search */

    int w=r-1;   /* used as a fixed upper bound to calculate the number of
                    right instances of a particular value. */


    /* binary search to find if a value exists */

    /* first check if the element is out of bounds */

    if( N < arr[0] || arr[hi] < N ){
        m = -1;
    }
    else{ /* binary search to find value if it exists within given params */

        while(lo <= hi){
            m = (hi + lo)/2;

            if(arr[m] < N)
                lo = m+1;
            else if(arr[m] > N)
                hi = m-1;
            else if(arr[m]==N){
                m=m;
                break;
            }
        }
        if (lo > hi)    /* if it doesn't we assign it -1 */
         m = -1;
    }   


    /* If the value is found, then we compute left and right instances of it */ 

    if( m >= 0 ){   

        int j = m-1;    /* starting with the first term to the left */
        int L = 0;  /* total number of left instances */

        /* while loop computes total number of left instances */

        while( j >= 0 && arr[j] == arr[m] ){
            L++;
            j--;
        }


        /* There are six possible outcomes of this. Depending on the outcome,
           we must assign the first index variable accordingly */

        if( j > 0 && L > 0 )
            *first=j+1;
        else if( j==0 && L==0)
            *first=m;
        else if( j > 0 && L==0 )
            *first=m;
        else if(j < 0 && L==0 )
            *first=m; 
        else if( j < 0 && L > 0 )
            *first=0;
        else if( j=0 && L > 0 )
            *first=j+1;

        int h = m + 1;  /* starting with the first term to the right */
        int R = 0;      /* total number of right instances */

        /* while loop computes total number of right instances */
        /* we fixed w earlier so that it's value does not change */ 

        while( arr[h]==arr[m] && h <= w ){
            R++;
            h++;
        }

        *count = (R + L + 1); /* total num of instances stored into count */
        return *first;        /* first instance index stored here */
    }

    /* if value does not exist, then we return a negative value */

    else if( m==-1)
        return -1;
}   
(20) As a starting advice: "commnent less" - Andrei Ciobanu
(1) Too bad you picked C. std::equal_range makes this one really easy. - Billy ONeal
(2) plus for aspiring to improve. - Martin
Couldn't three of the cases in that six-way bit be refactored into if( L == 0 ) *first = m;? And the first and last cases look like they could be collapsed also, so that leaves only three cases -- much nicer. - Michael Myers
(1) This code is fine, absolutely fine. Rest assured you didn't do anything godawful. The only worthwhile suggestion I've seen on this page is that the linear-time searches for the start and endpoints can be made log-time, but even that I would call nitpicking. You had exactly the right idea. Take heart, there are a lot of "programmers" who get FizzBuzz wrong in interviews (google it) and you're clearly not one of them. :) - j_random_hacker
[+16] [2010-03-14 02:40:24] Todd Gardner

I would have a few concerns about hiring someone who submitted this for a code sample. Here is what I see.

First, addressing overall design, the algorithm is suboptimal, and is worst case linear instead of worst case logarithmic, because it doesn't use a binary search to find the amount of elements, but a linear one.

Second, (and this is what would have really killed it for me) the variable names. Most of these are either one or two letters, and the code is very unreadable because of it. Giving your variables descriptive names is important for maintainability.

Third, ignoring standard libraries. Unless instructed not to use them, you should prefer standard libraries which have implementations of binary search (e.g. the stl or bsearch [1])

Fourth, why have get_num_of_ints return -1 has a magic value feeling to me; better to just set count = 0 and check that.

Fifth, get_num_of_ints is just far too long, and tries to do way too much. It badly needs to be broken up.

Sixth (and this is a personal choice), I think C++, with the STL, is a far better choice in this instance.

In the spirit of "show, don't tell", here is how I would have written the assignment (untested, uncompiled) (edited to match the required function signature):

#include <iostream>
#include <algorithm>

using namespace std;

// This function signature is required:
int get_num_of_ints(const int* searchBegin, size_t searchSize, int input,
    size_t* first, size_t* count) {
  const int* searchEnd = searchBegin + searchSize;
  const int* result = lower_bound(searchBegin, searchEnd, input);

  if (searchEnd == result || *result != input)
    return -1;

  *first = result - searchBegin;
  *count = upper_bound(result, searchEnd, input) - result;
  return 0;
}

void print_search_results(const int* searchBegin, size_t searchSize, int input) {
  size_t first;
  size_t count; 

  if (get_num_of_ints(searchBegin, searchSize, input, &first, &count) < 0) {
    cout << input << " is not in the array." << endl;
    return;
  }

  cout << input << " was found at index " << first << ". "
       << "There are " << count << " instances for " << input << " in the array."
       << endl;
}

bool read_input(int* input) {
  cout << "Enter a number to search for: ";
  bool succeeded = cin >> *input;
  cout << endl;    
  return succeeded;
}

int main (int argc, char** argv) {
  const int searchNumbers[] = {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13};
  const int searchNumbersSize = sizeof(searchNumbers)/sizeof(searchNumbers[0]);

  while(1) {
     int input;
     if(!read_input(&input)) {
       count << "Bad input, exiting" << endl;
       return 1;
     }

     print_search_results(searchNumbers, searchNumbersSize, input);
  }
}
[1] http://www.cplusplus.com/reference/clibrary/cstdlib/bsearch/

"Fifth, get_num_of_ints is just far too long, and tries to do way too much. It badly needs to be broken up" Hah! They told me I was supposed to write my solution with that function prototype. :) - Micky
That does put some restriction on the interface, but you could still break up the internals; e.g., have a binary_search function instead putting those implementations inside of get_num_of_ints. - Todd Gardner
(7) A lot of these points can be found in code complete by Steve McConnell, it covers a lot of ground on coding style and has lots of references to dive deeper if needed. - Emile Vrijdags
(1) Despite being right about the computational complexity, I have to -1 this. Complaining about variable names in a short standalone piece of code is pretentious IMHO. I have no trouble keeping a small handful of short variable names in my head, and in fact find the reduced visual clutter makes it much easier to read and understand. Also while I agree with the idea of using standard library functions where possible in real code, it's not clear whether they were trying to test engineering knowledge or algorithmic ability here and it's safer to bet on the latter IMHO. - j_random_hacker
1
[+15] [2010-03-14 12:00:20] Tomislav Nakic-Alfirevic

In my experience, the

if (condition)
    consequence;
statementx;
...

style is a land mine just waiting for another (or even the same) developer to extend it to:

if (condition)
    consequence1;
    consequence2;
statementx;
...

Some of you might see what the problem is, but for most programmers, it is a virtually invisible bug because developers tend to interpret code by indentation even though the curly braces are missing, making consequence2 unconditional.


(1) +1 - Not sure why this got a down vote, I have seen way too many instances of exactly this happening to never personally not use brackets on an if statement. I agree, as developers we generally read code by indentation and not the presence of brackets. - mynameiscoffey
(3) If only there was a language that used indentation to represent a nested block, then "what the code looks like" would always match "what the code means", as far as this answer goes. - community_owned
Nice one, Roger. ;) - Tomislav Nakic-Alfirevic
I agree with your advice, but I don't think this is important enough to justify not hiring someone. +0. - j_random_hacker
@random_hacker I agree, by itself, it's no reason not to hire someone, but the question was a request to criticize the code and so I did: I described what I thought was bad practice and suggested a way to improve. It was quite a long time since I wrote some C code so I scan-read the code and provided the comments which sprang to mind. - Tomislav Nakic-Alfirevic
2
[+8] [2010-03-14 06:25:33] Alok

In addition to many of the other comments, the following:

m = (hi + lo)/2;

is the wrong way to find the middle index. This can overflow. You should do:

m = lo + (hi - lo) / 2;

Second, the line:

m=m;

has no effect, and can be eliminated.


(1) True, but I doubt that anyone would fail to get a job because they forgot this bug. - Edan Maor
@Edan: I am not so sure. A "simple" question like this has multiple layers complexity. The interviewer may have asked this question to figure out the level of the applicant. And even if it wasn't the intension, writing correct code helps anyway. Maybe with a comment to the effect of "do this instead of (hi + lo)/2 to avoid overflow" - and the interviewer would then realize that the applicant thought of something that the interviewer didn't! :-) - Alok
(1) I'm sure that bringing this up will score you huge points. But isn't this the bug that was discovered about 4 years ago to have been a bug in pretty much every implementation of binary search for the last 20 years? I'm just saying, if this was a bug in the Java library for 15 years before it was discovered, I wouldn't hold it against anyone not to think of it themselves. - Edan Maor
On the other hand, I did use this bug to show off to a University prof once... mentioning it will definitely get you points :) - Edan Maor
(2) @Edan Maor: It's even better than that -- according to googleresearch.blogspot.com/2006/06/…, Jon Bentley, a famous computer scientist, implemented a binary search in Programming Pearls that contained this bug. And if you wouldn't hire Bentley for a programming position, who would you hire? - j_random_hacker
3
[+5] [2010-03-14 01:29:29] small_duck

I think the last part of your algorithm is sub-optimal. You could carry on your binary search to find the lowest and highest elements equal to the one you are looking for, and then subtract the addresses to find how many there are.

This would scale better if there are many identical elements, but, more importantly for the interview question, it would make your algorithm simpler. For example, the "6 outcomes" code is pretty hairy, and having such a bunch of if-else is often considered a code smell.


(1) Yes, for example if the array consists of only O(1) distinct elements, the algorithm degenerates to O(n) performance. - GregS
@small_duck. Thanks. I felt the same way about the multiple if..else if statements I was making, but being pressed for time, I just did it the messy fashion I wrote up. Can you clarify your "carry on your binary search.." comment a bit more? - Micky
(1) If you did binary_search(requested_item-1) and binary_search(requested_item+1) and recorded the positions where those are (or the location where the search fails if they do not exist) then you could scan backwards/forwards from each until you located the element that you were actually looking for. You'd then have the first and last positions of it, which can be used to trivially count the number of instances. This is O(nlogn). There are a few sub-optimal cases, but I don't think they significantly effect the runtime. - SoapBox
@SoapBox I might be missing something, but why would you go to the trouble of finding a different element and then linearly scanning until the end of that sequence, rather than just scan the original sequence? - David Kanarek
(2) +1 SoapBox (although complexity would be O(log(n)), right?). Micky, the trick is to change your binary search in finding not just one element of the section, but the lowest of the range. Then you can logarithmically find the first and the last element the sequence, a simple subtraction giving you the length. In C++, the standard library has "equal_range", which does exactly this. - small_duck
4
[+5] [2010-03-14 02:25:47] Norman Ramsey

Random observations:

  • The binary search should be decoupled from "find the longest run of identical elements containing this index."

  • It's possible they want time logarithmic in the length of the run, rather than linear.

  • I'd reject any candidate who submitted a six-way case analysis for a simple problem like finding a run. If you split that out as its own routine, you probably find a simpler way to do it like

    for (first = m; first > arr && *first == arr[m]; first--)
        ;
    for (last = m; last < arr+r-1 && *last == arr[m]; last++)
        ;
    

    This code costs linear in the length of the run; if they want logarithmic it becomes a tricky little problem—but in my opinion, over the top as an interview problem.


Thanks, I knew the multiple statements were not the best way to write it up, but I couldn't figure out how to do it any better with the time I was allotted. Btw, can you suggest a practical resource I could use to help me write better code? - Micky
@Micky: There are lots of good books out there. For you I might recommend Jon Bentley's Programming Pearls since it is pretty focused on code and on thinking about code. Get the second edition. - Norman Ramsey
5
[+4] [2010-03-14 01:28:29] fabrizioM

Your code is too complex for what it should do. There are too many comments, variables poor named, and not a clear definitions of functions roles. Some code to show what I would expect as a response:

#include <stdio.h>

int binary_search( const int value, const int *arr, size_t start, size_t end ){
    if( value < arr[start] || value > arr[end] ){
        return -1;
    }

    while(start <= end){
        int pivot = (start+end) >> 1; 
        if(arr[pivot] == value){
            return pivot;
        }else if(arr[pivot] < value){
            start = pivot+1;
        } else if(arr[pivot] > value){
            end = pivot-1;
        } 
    }
    return -1;
}

int get_occurences( int begin, const int *arr, size_t max){
    int counter = 1;
    int cursor = begin;
    while ( (cursor+1) < max && arr[cursor] == arr[cursor+1]) {
        counter++;
        cursor++;
    }
    cursor = begin;
    while ( (cursor-1) > 0 && arr[cursor] == arr[cursor-1]) {
        counter++;
        cursor--;
    }
    return counter;
}


#define MAX 22
int main()
{   
    int value;
    int arr_sorted []={1,1,2,3,3,
                       4,4,4,4,5,
                       5,7,7,7,7,
                       8,8,8,9,11,
                       12,12};    
    size_t arr_size = MAX; // works also the other way               

    printf( "\nPlease input the integer you would like to find.\n" );
    scanf( "%d", &value );
    printf("Searching %d\n", value);
    int pos = binary_search( value, arr_sorted, 0, arr_size-1);

    if( pos == -1) {    
        printf( "%d has not been found.\n", value );
    } else{
        int howmany = get_occurences( pos, arr_sorted, arr_size);
        printf( "The first matching index is %d.\n", pos );
        printf( "The total number of instances is %d.\n", howmany );
    }

    return 0;
}

(1) While you can write a recursive binary search algorithm an iterative one is fine and if C is being used probably better. - Yacoby
@fabrizioM: Can you suggest a (teaching) book, material or lecture note to correct my mistake? - Micky
The code looks reasonably concise and understandable, so I doubt lack of recursion was a factor. - GregS
@GregS is kind of personal preference. As a reviewer I would not accept that code. As a second thought I agree, recursion wasn't a factor. - fabrizioM
Your occurrences function does not work. binary search will not always return the first occurrence in the array, so you will need to search backward as well as forward. - SoapBox
@SoapBox, thanks, I think I fixed that. (forgive me its 3:31 am here) - fabrizioM
@fabrizioM: Thank you taking the time to write up some more detailed code. Not a fun learning lesson but necessary. Just to recap. You described my problems with my code as follows: commenting, poor variable name selection, and the binary search algorithm wasn't written as cleanly as I should have? - Micky
Recursion would make the code worse as it burns log(n) stack space unnecessarily. Recursion is pretty, recursion is elegant, recursion is not always better. - j_random_hacker
6
[+3] [2010-03-14 07:22:51] Robert S. Barnes

Is it just me, or am I the only one who would implement this in an entirely different fashion?

Looking at the requirements of the problem:

  1. Determine if element is present
  2. If present return array index of first occurrence
  3. If present return number of occurrences

You have to think outside the box on a problem like this. Specifically, I would implement a solution to this by dumping the array into a hash at the beginning of the program. The hash key is the number, and the value is a struct containing the index of the first occurance and the total occurance count. You set up this hash as part of the program start up / initilization and then any subsequent look ups by the user are always constant time operations - BigO(1).

You would have no problem implementing this in C++ with the STL.

unordered_map (C++) [1]

Binary Search is just the wrong solution to this problem.

Edit

Regarding the cost of setting up the hash:

Setup is a constant cost incurred only once. While I won't say it doesn't matter, there are a limited number of cases where it does; usually when you have either very small data sets or you execute the algorithm a very small number of times. Other than that setup cost is amortized across all the executions of the algorithm. An algorithm that requires n setup but executes in BigO(1) still beats an algorithm that requires 0 setup but executes in BigO(log n) time unless you're executing it a very small number of times.

[1] http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B_class%29

(2) "Binary Search is just the wrong solution to this problem." - It appears the program is supposed to look just for one value when run. With a hash-table you'd have to look up several values to compensate for the expense of creating the table. - UncleBens
@UncleBens: The OP says that one of the things they specifically asked for was a scalable solution. BigO(n) is significantly less scalable that BigO(1) even when setup time is fairly large. - Robert S. Barnes
(2) It really depends on the requirements. If you are asked to basically reimplement std::equal_range, as seems to be the case here, that is a function that accepts a sorted range and a value, then constructing a hash-table might not be an acceptable solution (because inevitably you'd have to create a new table for each look-up). - If the requirements were different and you knew you were working with the same array, then things might be different. - Also, constructing a std::map from a sorted array to perform binary search on it, seems like a complete waste of time and resources. - UncleBens
@UncleBens: You're right - it all depends on the requirements :-) - Robert S. Barnes
(1) the setup would take O(n), that is worse than the O(log n) of binary search - swegi
@swegi: Setup is a constant cost incurred only once. While I won't say it doesn't matter, there are a limited number of cases where it does; usually when you have either very small data sets or you execute the algorithm a very small number of times. Other than that setup cost is amortized across all the executions of the algorithm. An algorithm that requires n setup but executes in BigO(1) still beats an algorithm that requires 0 setup but executes in BigO(log n) time unless you're executing it a very small number of times. - Robert S. Barnes
The OP mentioned in a comment to my answer that they said the function should match the prototype "int get_num_of_ints( const int* arr, size_t r, int N, size_t* first, size_t* count );", which suggests they weren't looking for the hash solution (which certainly would be faster, depending on requirements). - Todd Gardner
-1 for "Binary search is just the wrong solution to this problem". As UncleBens pointed out, either approach could be (much) faster depending on information we don't have -- that being how many searches will be performed, so this comment is unfounded. (There's also the fact that a hashtable will (at least) double memory requirements, since both keys and values must now be stored.) Will +1 if you get rid of this remark. - j_random_hacker
In fact nothing in the problem description suggests performing more than 1 search, so I would assume a single search was being made. (The fact that the numbers are already sorted is another clue about the solution they are looking for.) In that case, a binary search will be dramatically faster for large n. - j_random_hacker
7
[+3] [2010-03-15 01:43:53] msandiford

I came a bit late to this, but this is something like what I might expect to see in C++.

// Interface defined by the problem statement has been adjusted
size_t get_num_of_ints( const int* arr, size_t r, int N, size_t* first )
{
    std::pair<const int *, const int *> range = std::equal_range(arr, arr+r, N);
    size_t count = range.second - range.first;
    if (count > 0 && first != 0)
        *first = range.first - arr;
    return count;
}

I think the interface is broken as defined, so I've fixed it :)


Wow, now THAT is a solution. My hats off to you sir. That's the kind of eloquent and simple solution that you hope to see. Beats the solution I worked out on my own. - wheaties
Nice clean solution. But it's always hard to know whether they want to know (a) that you have good enough library knowledge to let equal_range() do all the work for you or (b) that you have the algorithmic abilities to code a non-broken binary search yourself, since both abilities are useful. - j_random_hacker
(1) Point taken about having the algorithmic chops to implement, but if that's what was required, why not just ask? Knowing the right tool to use for a job is something that can make us an order of magnitude more effective in practice. - msandiford
Absolutely agree, it's something I would always ask about if I could, and as the interviewer I would award some maturity points just for asking. I just got the feeling that this coding challenge was done offline and that may not have been an option. - j_random_hacker
8
[+2] [2010-03-14 08:27:59] Ron Savage

This is what I would have turned in, simple clean and easy to read:

I modified it to test the timing with an array of 300 Million ints and even on my older system it found the "worst case" values (the ones at the end of the array) in about a second (took 6 seconds to fill the array at startup).

So IMHO all this binary search blather falls under the category of "pre-mature optimization" for this simple interview question. :-)

Do the simplest thing that works. Write the code so people can read it and not have to decipher it. Of course, be prepared to discuss performance alternatives if needed.

Latest edit: Addition of FindStartingPoint() function to optimize for speed.

/*****************************************************************
   Module: FindInts.cpp
   Author: Ron Savage
     Date: 03/13/2010

  Description:
  This program prompts the user for an integer, searches for it
  in a given sorted array of integers and prints the first location 
  and number of matches in the array.

  Modification History:
  Date       Init Comment
  03/14/2010 RS   Added a recursive FindStartingPoint function to
                  find a good starting point for the linear search.
  03/14/2010 RS   Added a 300 million int array to test timing.
  03/13/2010 RS   Created.
*****************************************************************/
#include <stdafx.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* function prototype */
void   FillArray( int* searchList, size_t listSize);
int*   FindStartPoint( int* searchList, size_t listSize, int numberToFind );
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount );

/*****************************************************************
   Function: main()
     Author: Ron Savage
       Date: 03/13/2010

  Description:
  Entry point to the program.
*****************************************************************/
int main()
    {
    int userInput = 0;
    int *valueList = 0;

    // Allocate an array of 300 million ints
    size_t listSize = 300000000;
    if (valueList = (int*) malloc(listSize * sizeof(int)))
        {
        size_t firstMatchIndex = 0;
        size_t matchCount = 0;

        /* Fill the array with some values */
        FillArray(valueList, listSize);

        /* prompt the user to enter input */
        printf( "\nPlease input the integer you would like to find.\n" );
        scanf_s( "%d", &userInput );

        /* Search the list for the value entered */
        size_t iterations = 0;
        iterations = FindMatches( valueList, listSize, userInput, &firstMatchIndex, &matchCount );

        /* Display the results if found */
        if ( matchCount > 0 )
            {
            printf("\n%d matches for [%d] were found, starting at index %d in %ld iterations.\n", matchCount, userInput, firstMatchIndex, iterations);
            }
        else
            {
            printf("\nNo matches for [%d] were found.\n", userInput);
            }
        }
    else
        {
        printf("\nCouldn't allocate memory for [%ld] ints.\n", listSize);
        }
    }

/*****************************************************************
   Function: FindMatches()
     Author: Ron Savage
       Date: 03/13/2010

  Description:
  This function searches the searchList for the value entered
  by the user (numberToFind) and returns the first index and 
  count of the matches.
*****************************************************************/
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount )
    {
    // Set the default values if no matches are found
    *firstMatchIndex    = 0;
    *matchCount         = 0;

    // Find the start point of the number
    int* startPoint   = FindStartPoint( searchList, listSize, numberToFind );

    // Loop through the list looking for matches, exit early if we pass the value in the sorted list.
    size_t searchIndex       = 0;
    size_t searchInterations = 0;
    size_t startIndex        = startPoint - searchList;
    for (searchIndex = startIndex; searchIndex < listSize; searchIndex++)
        {
        searchInterations++;

        if ( searchList[searchIndex] == numberToFind )   if ( ++(*matchCount) == 1 ) *firstMatchIndex = searchIndex;

        if ( searchList[searchIndex] > numberToFind ) break;
        }

    return(searchInterations);
    }

/*****************************************************************
   Function: FindStartPoint()
     Author: Ron Savage
       Date: 03/14/2010

  Description:
  This function checks to see if the numberToFind is in the top
  or bottom half of the searchList, then recursively calls itself
  on each subsequently smaller sub-list until a starting point
  at or slightly before the first numberToFind is found.
*****************************************************************/
int* FindStartPoint( int* searchList, size_t listSize, int numberToFind )
    {
    // Check the value in the middle, to determine which half the number is in
    size_t middleIndex  = listSize / 2;

    // Recurse into this function for just the first half of the list
    if ( searchList[middleIndex] >= numberToFind && middleIndex > 1 ) 
        {
        return(FindStartPoint(searchList, middleIndex, numberToFind ));
        }

    // Recurse into this function for just the last half of the list
    if ( searchList[middleIndex] < numberToFind && middleIndex > 1 ) 
        {
        return(FindStartPoint(&searchList[middleIndex], middleIndex, numberToFind ));
        }

    return(searchList);
    }

/*****************************************************************
   Function: FillArray()
     Author: Ron Savage
       Date: 03/13/2010

  Description:
  This function fills the array with a sorted list of values.
*****************************************************************/
void FillArray( int* searchList, size_t listSize)
    {
    size_t searchIndex  = 0;
    int    value        = 0;

    for (searchIndex = 0; searchIndex < listSize; searchIndex++)
        {
        searchList[searchIndex] = value++/100;
        }
    }

This is just a linear search through the array requiring O(n) time. You can definitely do O(lg(n)) time for this using binary search. Given that in the original post, Micky wrote "They made a comment that my code should scale well with large arrays," I feel like this would not at all be something that would make a good impression. - MBennett
I agree that the code should be readable, but a binary search algorithm can be readable too. Writing readable code does not mean writing the dumbest, least efficient thing that works. If the problem said to sort a huge array, would you choose bubble sort just because it's the simplest algorithm? That's essentially what you are doing here. - Ricket
The point is do the simplest thing first (linear search). That took 5 minutes to code, and works easily for the up to 300 million int array I tested it with. If that's "good enough", i.e. their definition of "large array" is less than that - you're done. If not, refine the algorithm (spend more time on it) which I did with the latest edit since I was bored. :-) - Ron Savage
// Allocate an array of 1 billion ints next line: size_t listSize = 300000000; err... Your commenting scares me. - Wallacoloo
(1) @Ron Savage - If I were working on code as part of a job, I'd agree with you about doing the simplest thing first. But if you get asked a question in an interview, and they make a point of asking for it to scale with large arrays, it seems pretty clear to me that they want you to implement a binary search. The least I would do is ask them whether an O(n) solution would be appropriate for their requirements. - Edan Maor
I agree, I'd ask for clarification myself, to define what a "large array" means. Fixed the 1 billion ints comment, that's what I started with until I realized that my meager 2 gigs of memory with Win 7 and VS 2008 didn't have that much left over. :-) Most interview questions in my experience, are to see "how you think", not explicitly "can you write a binary search" - which can be picked up from google in few minutes. - Ron Savage
Also note, the FindStartPoint() function does implement the Binary Search algorithm - I just didn't realize that was the name for it. :-) (self taught programmer here) - Ron Savage
9
[+2] [2010-03-14 11:04:20] Phil

I came up with a solution that is O(log n). It involves a modified binary search that can be told to always take exactly log n comparisons (no early breaking if a match is found), and depending on a parameter, it will either keep trying to find the value to the left or to the right in the array until exhaustion.

In this manner, we can run the binary search at most 2 times - no need to run it the second time if the first reports the value wasn't found - and give back the number of matches (last index - first index + 1). I pass in the pos variable as a pointer so we can effectively "return" 2 values - the number of matches, and the index of the first match in the run.

// returns the number of matches, and sets pos to the index of
// the first match, or -1 if none found
int findMatches(int array[], int size, int searchNum, int* pos)
{
   *pos = binarySearch(array, size, searchNum, -1);
   // if it was found, pos points to a positive number
   if(*pos >= 0)
   {
      int lastIdx = binarySearch(array, size, searchNum, 1);
      return lastIdx - *pos + 1;
   }
   return 0;
}

Then I've got a binary search method that takes an extra argument, direction, which indicates whether you want the first binary search index (0), the earliest (-1) or the last (+1).

int binarySearch(int array[], int size, int searchNum, int direction)
{
   int left = 0;
   int right = size - 1;
   int center;
   int pos = -1;

   while(left <= right)
   {
      center = (right + left) >> 1;

      if(array[center] == searchNum)
      {
         pos = center;
         // break early if we want to find the exact match
         if(direction == 0) break;
      }

      // adding 1 to searchNum means we will find the last in the run
      if(array[center] < searchNum + ((direction > 0) ? 1 : 0))
      {
         left = center + 1;
      }
      else
      {
         right = center - 1;
      }
   }

   return pos;
}

10
[+2] [2010-03-14 19:38:15] Larry Watanabe

Another possibility is that the choice of C++ and C was a test in itself :) Maybe they were willing to settle for someone with C experience, but would have preferred someone with C++ experience, so that could have been a factor.

If you are really curious, you could even contact them again and ask for some feedback.

However, programming style is something that you should work on continuously, not just in preparation for an interview. You aren't going to change things much in that time. I would take a longer range view and try to get some C++, C#, and Java experience under your belt, maybe PHP or perl, or ruby or python, work continouously on programming better, maybe read a book or article on interviewing and resume writing, and keep applying.

I really think maybe they just didn't like the shirt you were wearing :)


11
[+2] [2010-03-14 01:25:01] Larry Watanabe

I would factor the binary search out as a separate procedure.

Also, I would input the array, even if this is not called for, or factor it out into an input procedure that could be converted into something that reads from a file.

I prefer camelCase convention, and more meaningful variable names.

Your algorithm, however, seems fine. Have you thought about the possibility that the test was not a big factor in the hiring decision? Several applicants may have written equally good answers, and the decision made on other criteria.


I thought about writing a while loop to input and sort the array. The problem description was agnostic as to whether I could assume it was already inputted & sorted. I should have been more careful on that point. You're probably right about the "other factors" point. I had to submit some essays regarding various topics--that could have played a role as well. I am just suspecting that I must have done something wrong with this part of the app since I have such a hodge podge CS background. I major in Math & Bio so I'm unsure about my CS skills. I don't know what to expect from employers. Thanks - Micky
Don't sweat it too much. It could even be something stupid, like he doesn't like the else{ and prefers else {. Or maybe the shirt you wore :) - Larry Watanabe
12
[+2] [2010-03-14 01:27:27] Mawg

It looks reasonable enough to me. Yes, maybe some STL methods might help. The only thing I would criticize is that you don't verify that the input is actually a number.

Still I don't see enough here to reject you base don this assignment. Maybe it was another part of the interview. Or maybe they did not reject you - they accepted someone else.


(1) I totally agree with this. Rest assured you (Micky) didn't do anything godawful. The only worthwhile suggestion I've seen on this page is that the linear-time searches for the start and endpoints can be made log-time, but even that I would call nitpicking. You had exactly the right idea. There are a lot of "programmers" who get FizzBuzz wrong in interviews (google it) and you're clearly not one of them. :) - j_random_hacker
13
[+1] [2010-03-14 01:23:27] Carl Manaster

It's been a long while since I used C++, so I couldn't write it write now, but I know there were algorithms in the STL that would make very short work of this. Implementing your own binary search is probably a sign to the prospective employer that you're something of a beginner. The fact that you can implement it is good - but that you would choose to, when such algorithms are already implemented for you, might be discouraging to them.


Argh. I think there's a binary search algorithm but not a "count the number of instances" part. And more importantly it wouldn't return the 1st index as well. Anyways, I'll take another look. But thanks anyways! It will be something I have to think about for my next job app. - Micky
(2) Maybe, but then it is a trick question. If he'd used the STL, someone else would say that it proves he doesn't really understand the basic algorithms of CS and would be lost if he needed something that wasn't in the STL. - GregS
I disagree. Most coding tests require that you do not use standard library functions. If I were the tester I would state that explicitly; using a std::vector does not tell me anything about what you can do. - Ed S.
(1) @Micky: If you are using C++, look at std::lower_bound. If the iterator returned by the search points to the value you searched for, you know that value is in the container. You can then walk forward and count the number of instances of that value in the container. - James McNellis
(1) @James: Or incorporate std::upper_bound, too, then subtract them. If my rusty STL is right, at least. - Carl Manaster
(2) Or just use std::equal_range - UncleBens
Thanks, @UncleBens - that's what I meant when I said my STL was rusty; std::equal_range would make a simple one-liner of this whole question, I think. Good call. - Carl Manaster
If you read his post, it was supposed to be implemented in either C++ or C, and he chose C. so any C++ specific comments wouldn't be relevant. - Larry Watanabe
@Larry, you remind me why I stopped reading C and C++ newsgroups long, long ago. - Carl Manaster
14
[+1] [2010-03-14 07:21:10] jdizzle

Was there a specific requirement that you do a binary search on the list? If there wasn't, why are you doing it? Unnecessarily overcomplicating a solution is one strike against you. * Edit: Whoops you said you had to do it this way in the question! Never mind, carry on *

Why are you using single character variable names? There's plenty of room in the symbol table for more descriptive names. That's another strike.

Too many comments. I can hear you all now: "is he crazy??". Seriously. The code should speak for itself as much as possible. Every time you start writing a comment think "is there a way I could make the code clear enough to avoid this comment?". That's strike three.

Coding is as much communicating to other people as it is communicating to a machine. Your coding style should reflect that. Code Complete [1] is a great book with lots of useful coding style advice. Read the first few chapters and code up the solution again, applying the techniques he describes. Compare your two solutions while asking yourself "which one would I rather maintain?".

[1] http://rads.stackoverflow.com/amzn/click/0735619670

(3) He's doing the binary search on the list because "They made a comment that my code should scale well with large arrays." Properly executed, a binary search will handle this in O(lg(n)) time as opposed to the O(n) time required to just check through linearly. In this case, using a binary search isn't unnecessarily over-complicating the solution; it's using an appropriate level of complexity to get a significantly more efficient solution for the given requirements. - MBennett
15