Hey, I was applying for a position, and they asked me to complete a coding problem for them. I did so and submitted it, but I later found out I was rejected from the position. Anyways, I have an eclectic programming background so I'm not sure if my code is grossly wrong or if I just didn't have the best solution out there. I would like to post my code and get some feedback about it. Before I do, here's a description of a problem:
You are given a sorted array of integers, say, {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13 }. Now you are supposed to write a program (in C or C++, but I chose C) that prompts the user for an element to search for. The program will then search for the element. If it is found, then it should return the first index the entry was found at and the number of instances of that element. If the element is not found, then it should return "not found" or something similar. Here's a simple run of it (with the array I just put up):
Enter a number to search for: 4 4 was found at index 2. There are 2 instances for 4 in the array. Enter a number to search for: -4. -4 is not in the array.
They made a comment that my code should scale well with large arrays (so I wrote up a binary search). Anyways, my code basically runs as follows:
EX: 4, 4, 4, 4, 4
The bold 4 is the value the binary search landed on. One loop will check to the left of it, and another loop will check to the right of it. Their sum will be the total number of instances of the the number four.
Anyways, I don't know if there are any advanced techniques that I am missing or if I just don't have the CS background and made a big error. Any constructive critiques would be appreciated!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
/* function prototype */
int get_num_of_ints(
const int* arr,
size_t r,
int N,
size_t* first,
size_t* count
);
int main()
{
int N; /* input variable */
int arr[]={1,1,2,3,3,4,4,4,4,5,5,7,7,7,7,8,8,8,9,11,12,12}; /* sorted */
size_t r = sizeof(arr)/sizeof(arr[0]); /* right bound */
size_t first; /* first match index */
size_t count; /* total number of matches */
/* prompts the user to enter input */
printf( "\nPlease input the integer you would like to find.\n" );
scanf( "%d", &N );
int a = get_num_of_ints( arr, r, N, &first, &count );
/* If the function returns -1 then the value is not found.
Else it is returned */
if( a == -1)
printf( "%d has not been found.\n", N );
else if(a >= 0){
printf( "The first matching index is %d.\n", first );
printf( "The total number of instances is %d.\n", count );
}
return 0;
}
/* function definition */
int get_num_of_ints(
const int* arr,
size_t r,
int N,
size_t* first,
size_t* count)
{
int lo=0; /* lower bound for search */
int m=0; /* middle value obtained */
int hi=r-1; /* upper bound for search */
int w=r-1; /* used as a fixed upper bound to calculate the number of
right instances of a particular value. */
/* binary search to find if a value exists */
/* first check if the element is out of bounds */
if( N < arr[0] || arr[hi] < N ){
m = -1;
}
else{ /* binary search to find value if it exists within given params */
while(lo <= hi){
m = (hi + lo)/2;
if(arr[m] < N)
lo = m+1;
else if(arr[m] > N)
hi = m-1;
else if(arr[m]==N){
m=m;
break;
}
}
if (lo > hi) /* if it doesn't we assign it -1 */
m = -1;
}
/* If the value is found, then we compute left and right instances of it */
if( m >= 0 ){
int j = m-1; /* starting with the first term to the left */
int L = 0; /* total number of left instances */
/* while loop computes total number of left instances */
while( j >= 0 && arr[j] == arr[m] ){
L++;
j--;
}
/* There are six possible outcomes of this. Depending on the outcome,
we must assign the first index variable accordingly */
if( j > 0 && L > 0 )
*first=j+1;
else if( j==0 && L==0)
*first=m;
else if( j > 0 && L==0 )
*first=m;
else if(j < 0 && L==0 )
*first=m;
else if( j < 0 && L > 0 )
*first=0;
else if( j=0 && L > 0 )
*first=j+1;
int h = m + 1; /* starting with the first term to the right */
int R = 0; /* total number of right instances */
/* while loop computes total number of right instances */
/* we fixed w earlier so that it's value does not change */
while( arr[h]==arr[m] && h <= w ){
R++;
h++;
}
*count = (R + L + 1); /* total num of instances stored into count */
return *first; /* first instance index stored here */
}
/* if value does not exist, then we return a negative value */
else if( m==-1)
return -1;
}
I would have a few concerns about hiring someone who submitted this for a code sample. Here is what I see.
First, addressing overall design, the algorithm is suboptimal, and is worst case linear instead of worst case logarithmic, because it doesn't use a binary search to find the amount of elements, but a linear one.
Second, (and this is what would have really killed it for me) the variable names. Most of these are either one or two letters, and the code is very unreadable because of it. Giving your variables descriptive names is important for maintainability.
Third, ignoring standard libraries. Unless instructed not to use them, you should prefer standard libraries which have implementations of binary search (e.g. the stl or bsearch [1])
Fourth, why have get_num_of_ints return -1 has a magic value feeling to me; better to just set count = 0 and check that.
Fifth, get_num_of_ints is just far too long, and tries to do way too much. It badly needs to be broken up.
Sixth (and this is a personal choice), I think C++, with the STL, is a far better choice in this instance.
In the spirit of "show, don't tell", here is how I would have written the assignment (untested, uncompiled) (edited to match the required function signature):
#include <iostream>
#include <algorithm>
using namespace std;
// This function signature is required:
int get_num_of_ints(const int* searchBegin, size_t searchSize, int input,
size_t* first, size_t* count) {
const int* searchEnd = searchBegin + searchSize;
const int* result = lower_bound(searchBegin, searchEnd, input);
if (searchEnd == result || *result != input)
return -1;
*first = result - searchBegin;
*count = upper_bound(result, searchEnd, input) - result;
return 0;
}
void print_search_results(const int* searchBegin, size_t searchSize, int input) {
size_t first;
size_t count;
if (get_num_of_ints(searchBegin, searchSize, input, &first, &count) < 0) {
cout << input << " is not in the array." << endl;
return;
}
cout << input << " was found at index " << first << ". "
<< "There are " << count << " instances for " << input << " in the array."
<< endl;
}
bool read_input(int* input) {
cout << "Enter a number to search for: ";
bool succeeded = cin >> *input;
cout << endl;
return succeeded;
}
int main (int argc, char** argv) {
const int searchNumbers[] = {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13};
const int searchNumbersSize = sizeof(searchNumbers)/sizeof(searchNumbers[0]);
while(1) {
int input;
if(!read_input(&input)) {
count << "Bad input, exiting" << endl;
return 1;
}
print_search_results(searchNumbers, searchNumbersSize, input);
}
}
[1] http://www.cplusplus.com/reference/clibrary/cstdlib/bsearch/In my experience, the
if (condition)
consequence;
statementx;
...
style is a land mine just waiting for another (or even the same) developer to extend it to:
if (condition)
consequence1;
consequence2;
statementx;
...
Some of you might see what the problem is, but for most programmers, it is a virtually invisible bug because developers tend to interpret code by indentation even though the curly braces are missing, making consequence2 unconditional.
In addition to many of the other comments, the following:
m = (hi + lo)/2;
is the wrong way to find the middle index. This can overflow. You should do:
m = lo + (hi - lo) / 2;
Second, the line:
m=m;
has no effect, and can be eliminated.
(hi + lo)/2
to avoid overflow" - and the interviewer would then realize that the applicant thought of something that the interviewer didn't! :-) - Alok
I think the last part of your algorithm is sub-optimal. You could carry on your binary search to find the lowest and highest elements equal to the one you are looking for, and then subtract the addresses to find how many there are.
This would scale better if there are many identical elements, but, more importantly for the interview question, it would make your algorithm simpler. For example, the "6 outcomes" code is pretty hairy, and having such a bunch of if-else is often considered a code smell.
Random observations:
The binary search should be decoupled from "find the longest run of identical elements containing this index."
It's possible they want time logarithmic in the length of the run, rather than linear.
I'd reject any candidate who submitted a six-way case analysis for a simple problem like finding a run. If you split that out as its own routine, you probably find a simpler way to do it like
for (first = m; first > arr && *first == arr[m]; first--)
;
for (last = m; last < arr+r-1 && *last == arr[m]; last++)
;
This code costs linear in the length of the run; if they want logarithmic it becomes a tricky little problem—but in my opinion, over the top as an interview problem.
Your code is too complex for what it should do. There are too many comments, variables poor named, and not a clear definitions of functions roles. Some code to show what I would expect as a response:
#include <stdio.h>
int binary_search( const int value, const int *arr, size_t start, size_t end ){
if( value < arr[start] || value > arr[end] ){
return -1;
}
while(start <= end){
int pivot = (start+end) >> 1;
if(arr[pivot] == value){
return pivot;
}else if(arr[pivot] < value){
start = pivot+1;
} else if(arr[pivot] > value){
end = pivot-1;
}
}
return -1;
}
int get_occurences( int begin, const int *arr, size_t max){
int counter = 1;
int cursor = begin;
while ( (cursor+1) < max && arr[cursor] == arr[cursor+1]) {
counter++;
cursor++;
}
cursor = begin;
while ( (cursor-1) > 0 && arr[cursor] == arr[cursor-1]) {
counter++;
cursor--;
}
return counter;
}
#define MAX 22
int main()
{
int value;
int arr_sorted []={1,1,2,3,3,
4,4,4,4,5,
5,7,7,7,7,
8,8,8,9,11,
12,12};
size_t arr_size = MAX; // works also the other way
printf( "\nPlease input the integer you would like to find.\n" );
scanf( "%d", &value );
printf("Searching %d\n", value);
int pos = binary_search( value, arr_sorted, 0, arr_size-1);
if( pos == -1) {
printf( "%d has not been found.\n", value );
} else{
int howmany = get_occurences( pos, arr_sorted, arr_size);
printf( "The first matching index is %d.\n", pos );
printf( "The total number of instances is %d.\n", howmany );
}
return 0;
}
Is it just me, or am I the only one who would implement this in an entirely different fashion?
Looking at the requirements of the problem:
You have to think outside the box on a problem like this. Specifically, I would implement a solution to this by dumping the array into a hash at the beginning of the program. The hash key is the number, and the value is a struct containing the index of the first occurance and the total occurance count. You set up this hash as part of the program start up / initilization and then any subsequent look ups by the user are always constant time operations - BigO(1).
You would have no problem implementing this in C++ with the STL.
Binary Search is just the wrong solution to this problem.
Edit
Regarding the cost of setting up the hash:
Setup is a constant cost incurred only once. While I won't say it doesn't matter, there are a limited number of cases where it does; usually when you have either very small data sets or you execute the algorithm a very small number of times. Other than that setup cost is amortized across all the executions of the algorithm. An algorithm that requires n setup but executes in BigO(1) still beats an algorithm that requires 0 setup but executes in BigO(log n) time unless you're executing it a very small number of times.
[1] http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B_class%29I came a bit late to this, but this is something like what I might expect to see in C++.
// Interface defined by the problem statement has been adjusted
size_t get_num_of_ints( const int* arr, size_t r, int N, size_t* first )
{
std::pair<const int *, const int *> range = std::equal_range(arr, arr+r, N);
size_t count = range.second - range.first;
if (count > 0 && first != 0)
*first = range.first - arr;
return count;
}
I think the interface is broken as defined, so I've fixed it :)
equal_range()
do all the work for you or (b) that you have the algorithmic abilities to code a non-broken binary search yourself, since both abilities are useful. - j_random_hacker
This is what I would have turned in, simple clean and easy to read:
I modified it to test the timing with an array of 300 Million ints and even on my older system it found the "worst case" values (the ones at the end of the array) in about a second (took 6 seconds to fill the array at startup).
So IMHO all this binary search blather falls under the category of "pre-mature optimization" for this simple interview question. :-)
Do the simplest thing that works. Write the code so people can read it and not have to decipher it. Of course, be prepared to discuss performance alternatives if needed.
Latest edit: Addition of FindStartingPoint() function to optimize for speed.
/*****************************************************************
Module: FindInts.cpp
Author: Ron Savage
Date: 03/13/2010
Description:
This program prompts the user for an integer, searches for it
in a given sorted array of integers and prints the first location
and number of matches in the array.
Modification History:
Date Init Comment
03/14/2010 RS Added a recursive FindStartingPoint function to
find a good starting point for the linear search.
03/14/2010 RS Added a 300 million int array to test timing.
03/13/2010 RS Created.
*****************************************************************/
#include <stdafx.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
/* function prototype */
void FillArray( int* searchList, size_t listSize);
int* FindStartPoint( int* searchList, size_t listSize, int numberToFind );
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount );
/*****************************************************************
Function: main()
Author: Ron Savage
Date: 03/13/2010
Description:
Entry point to the program.
*****************************************************************/
int main()
{
int userInput = 0;
int *valueList = 0;
// Allocate an array of 300 million ints
size_t listSize = 300000000;
if (valueList = (int*) malloc(listSize * sizeof(int)))
{
size_t firstMatchIndex = 0;
size_t matchCount = 0;
/* Fill the array with some values */
FillArray(valueList, listSize);
/* prompt the user to enter input */
printf( "\nPlease input the integer you would like to find.\n" );
scanf_s( "%d", &userInput );
/* Search the list for the value entered */
size_t iterations = 0;
iterations = FindMatches( valueList, listSize, userInput, &firstMatchIndex, &matchCount );
/* Display the results if found */
if ( matchCount > 0 )
{
printf("\n%d matches for [%d] were found, starting at index %d in %ld iterations.\n", matchCount, userInput, firstMatchIndex, iterations);
}
else
{
printf("\nNo matches for [%d] were found.\n", userInput);
}
}
else
{
printf("\nCouldn't allocate memory for [%ld] ints.\n", listSize);
}
}
/*****************************************************************
Function: FindMatches()
Author: Ron Savage
Date: 03/13/2010
Description:
This function searches the searchList for the value entered
by the user (numberToFind) and returns the first index and
count of the matches.
*****************************************************************/
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount )
{
// Set the default values if no matches are found
*firstMatchIndex = 0;
*matchCount = 0;
// Find the start point of the number
int* startPoint = FindStartPoint( searchList, listSize, numberToFind );
// Loop through the list looking for matches, exit early if we pass the value in the sorted list.
size_t searchIndex = 0;
size_t searchInterations = 0;
size_t startIndex = startPoint - searchList;
for (searchIndex = startIndex; searchIndex < listSize; searchIndex++)
{
searchInterations++;
if ( searchList[searchIndex] == numberToFind ) if ( ++(*matchCount) == 1 ) *firstMatchIndex = searchIndex;
if ( searchList[searchIndex] > numberToFind ) break;
}
return(searchInterations);
}
/*****************************************************************
Function: FindStartPoint()
Author: Ron Savage
Date: 03/14/2010
Description:
This function checks to see if the numberToFind is in the top
or bottom half of the searchList, then recursively calls itself
on each subsequently smaller sub-list until a starting point
at or slightly before the first numberToFind is found.
*****************************************************************/
int* FindStartPoint( int* searchList, size_t listSize, int numberToFind )
{
// Check the value in the middle, to determine which half the number is in
size_t middleIndex = listSize / 2;
// Recurse into this function for just the first half of the list
if ( searchList[middleIndex] >= numberToFind && middleIndex > 1 )
{
return(FindStartPoint(searchList, middleIndex, numberToFind ));
}
// Recurse into this function for just the last half of the list
if ( searchList[middleIndex] < numberToFind && middleIndex > 1 )
{
return(FindStartPoint(&searchList[middleIndex], middleIndex, numberToFind ));
}
return(searchList);
}
/*****************************************************************
Function: FillArray()
Author: Ron Savage
Date: 03/13/2010
Description:
This function fills the array with a sorted list of values.
*****************************************************************/
void FillArray( int* searchList, size_t listSize)
{
size_t searchIndex = 0;
int value = 0;
for (searchIndex = 0; searchIndex < listSize; searchIndex++)
{
searchList[searchIndex] = value++/100;
}
}
// Allocate an array of 1 billion ints
next line:
size_t listSize = 300000000;
err... Your commenting scares me. - Wallacoloo
I came up with a solution that is O(log n). It involves a modified binary search that can be told to always take exactly log n comparisons (no early breaking if a match is found), and depending on a parameter, it will either keep trying to find the value to the left or to the right in the array until exhaustion.
In this manner, we can run the binary search at most 2 times - no need to run it the second time if the first reports the value wasn't found - and give back the number of matches (last index - first index + 1). I pass in the pos
variable as a pointer so we can effectively "return" 2 values - the number of matches, and the index of the first match in the run.
// returns the number of matches, and sets pos to the index of
// the first match, or -1 if none found
int findMatches(int array[], int size, int searchNum, int* pos)
{
*pos = binarySearch(array, size, searchNum, -1);
// if it was found, pos points to a positive number
if(*pos >= 0)
{
int lastIdx = binarySearch(array, size, searchNum, 1);
return lastIdx - *pos + 1;
}
return 0;
}
Then I've got a binary search method that takes an extra argument, direction
, which indicates whether you want the first binary search index (0), the earliest (-1) or the last (+1).
int binarySearch(int array[], int size, int searchNum, int direction)
{
int left = 0;
int right = size - 1;
int center;
int pos = -1;
while(left <= right)
{
center = (right + left) >> 1;
if(array[center] == searchNum)
{
pos = center;
// break early if we want to find the exact match
if(direction == 0) break;
}
// adding 1 to searchNum means we will find the last in the run
if(array[center] < searchNum + ((direction > 0) ? 1 : 0))
{
left = center + 1;
}
else
{
right = center - 1;
}
}
return pos;
}
Another possibility is that the choice of C++ and C was a test in itself :) Maybe they were willing to settle for someone with C experience, but would have preferred someone with C++ experience, so that could have been a factor.
If you are really curious, you could even contact them again and ask for some feedback.
However, programming style is something that you should work on continuously, not just in preparation for an interview. You aren't going to change things much in that time. I would take a longer range view and try to get some C++, C#, and Java experience under your belt, maybe PHP or perl, or ruby or python, work continouously on programming better, maybe read a book or article on interviewing and resume writing, and keep applying.
I really think maybe they just didn't like the shirt you were wearing :)
I would factor the binary search out as a separate procedure.
Also, I would input the array, even if this is not called for, or factor it out into an input procedure that could be converted into something that reads from a file.
I prefer camelCase convention, and more meaningful variable names.
Your algorithm, however, seems fine. Have you thought about the possibility that the test was not a big factor in the hiring decision? Several applicants may have written equally good answers, and the decision made on other criteria.
It looks reasonable enough to me. Yes, maybe some STL methods might help. The only thing I would criticize is that you don't verify that the input is actually a number.
Still I don't see enough here to reject you base don this assignment. Maybe it was another part of the interview. Or maybe they did not reject you - they accepted someone else.
It's been a long while since I used C++, so I couldn't write it write now, but I know there were algorithms in the STL that would make very short work of this. Implementing your own binary search is probably a sign to the prospective employer that you're something of a beginner. The fact that you can implement it is good - but that you would choose to, when such algorithms are already implemented for you, might be discouraging to them.
std::lower_bound
. If the iterator returned by the search points to the value you searched for, you know that value is in the container. You can then walk forward and count the number of instances of that value in the container. - James McNellis
std::equal_range
- UncleBens
Was there a specific requirement that you do a binary search on the list? If there wasn't, why are you doing it? Unnecessarily overcomplicating a solution is one strike against you. * Edit: Whoops you said you had to do it this way in the question! Never mind, carry on *
Why are you using single character variable names? There's plenty of room in the symbol table for more descriptive names. That's another strike.
Too many comments. I can hear you all now: "is he crazy??". Seriously. The code should speak for itself as much as possible. Every time you start writing a comment think "is there a way I could make the code clear enough to avoid this comment?". That's strike three.
Coding is as much communicating to other people as it is communicating to a machine. Your coding style should reflect that. Code Complete [1] is a great book with lots of useful coding style advice. Read the first few chapters and code up the solution again, applying the techniques he describes. Compare your two solutions while asking yourself "which one would I rather maintain?".
[1] http://rads.stackoverflow.com/amzn/click/0735619670
if( L == 0 ) *first = m;
? And the first and last cases look like they could be collapsed also, so that leaves only three cases -- much nicer. - Michael Myers