Possible Duplicate:
Algorithm: How to tell if an array is a permutation in O(n)? [1]
I was asked this interview question at Microsoft, and it's been bugging me that I haven't been able to solve it. Does anybody have any ideas?
You're passed an array with n integers in it. Each integer is between 1 and n (inclusive). These numbers are not in sorted order and there may be duplicates or missing numbers.
The problem is to figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array.
For a probabilistic solution, compute m different order-independent checksums and verify that they all correspond to the known checksums for an array containing every number once.
To choose checksums, you could sum, multiply, sum the squares, multiply the squares, etc., allowing each of them to wrap rather than using bignums. There's probably some way to synthesise an arbitrarily large family of checksums. For instance checksum q could sum the result of multiplying each element by the (1000+q)th prime number.
Crude analysis: Assuming 32-bit checksums with passable avalanche and reasonable independence between checksums, the probability of a collision is about 1/2^(32m). This means that a five-checksum verification is on par with SHA-1.
Edit: Note @Steve Jessop's point in the comments that simple multiplication with different multipliers won't produce independent checksums. @CodeInChaos's suggestion to use SHA-1 on each character would.
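For illustration, here is a minimal Python sketch of the multi-checksum idea; the specific checksum family (power sums modulo a large prime) and the count of five are my own choices, not part of the original suggestion:

def looks_like_permutation(a, num_checksums=5, mod=2**61 - 1):
    """Probabilistic check: compare order-independent checksums of `a`
    against those of the reference sequence 1..n. A mismatch proves a
    duplicate; a match only makes duplicates very unlikely."""
    n = len(a)
    def checksums(values):
        # checksum q is the sum of each element raised to the power q+1, mod a large prime
        return [sum(pow(v, q + 1, mod) for v in values) % mod
                for q in range(num_checksums)]
    return checksums(a) == checksums(range(1, n + 1))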
n, so technically the complexity will be slightly larger than O(n). - ruslik
If checksum q raised each element to the power of q modulo some large prime, then we could be in business. Ideal would be a hash function which is cryptographically secure except for the property that it ignores the order of its input bytes. Not sure if such exists and/or can be easily derived from a normal hash as CodeInChaos implies. - Steve Jessop
n. What if n is really large, so large that it doubles or triples the order of magnitude of 10^-50? Then the chance of failure is not as simple as 10^-50. - 動靜能量
Looks like this will work:
Think of each array slot as a linked list node, containing the address of the next slot in the list. We calculate the address of next slot as (k-1), where k is the number stored in the slot, so if array[0] == 4, then next slot to visit is array[3]. An example array:
S = 0 1 2 3 4 (Array Idx)
k = 1 2 3 4 5 (Array Values)
The above array has 5 linked lists of one node each i.e. each list's last slot (the only slot of each list) pointing to the starting slot (i.e. itself). Another possible example:
S = 0 1 2 3 4 (Array Idx)
k = 2 1 3 5 4 (Array Values)
This array has 3 lists (2 1), (3), (5, 4) - two of them containing cycles ending at the starting node i.e. with the last slot in the list pointing to the starting slot of the list. Another example:
S = 0 1 2 3 4 (Array Idx)
k = 2 3 3 5 4 (Array Values)
This array has 3 lists as well, with one containing a cycle, that does not end at the starting node of that list, but at the last node itself: (2,3). There's another list starting in slot 1, which shares a slot with another list (3,3). And the last list is (5,4).
Our goal is to identify a list, whose last node does not point to itself, or the starting slot of that list. If such a list exists there is a duplicate in the array. We'll use this strategy to do this:
1. Start traversing the first linked list starting at slot 0. Save the starting node idx of the "current" linked list in a variable S, so S = 0.
2. As we visit each node, we will mark that node as visited by multiplying the number in the slot by -1.
3. If a visited slot 'k-1' contains a number 'n' such that (n-1) != (k-1) && (n-1) != S, and n < 0, we are visiting the slot again, and we have found our duplicate number 'n'.
4. However, if n-1 == k-1 || n-1 == S, the traversal of this list ends there and we need to find the next list to traverse.
5. Store the index of the current slot in S. To find the starting node of the next list, walk down the rest of the array in a circular fashion until you find a slot which does not contain a negative number, or you reach the slot pointed to by S again.
6. If you reached the starting slot S, all the nodes in the array have been visited, and there are no duplicates.
7. If you find a new slot, save its index in S, and go back to step 2.
After we are done we'll need to make one more pass through the array to restore the values (we are not supposed to destroy the array). This is still O(n) though.
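For illustration, here is one way the traversal described above might look in Python; the chain-following, the negation marking, and the restore pass follow the answer's outline, but the exact bookkeeping is my own sketch:

def has_duplicates(a):
    """Treat value k as a pointer to slot k-1 and follow each chain,
    marking visited slots by negation. If a chain ends on a visited slot
    other than its own start, two values point at the same slot."""
    n = len(a)
    dup = False
    for start in range(n):
        if a[start] < 0:              # slot already visited by an earlier chain
            continue
        i = start
        while a[i] > 0:
            nxt = a[i] - 1            # value k points at slot k-1
            a[i] = -a[i]              # mark slot i as visited
            i = nxt
        if i != start:                # chain re-entered a slot visited mid-chain
            dup = True
            break
    for i in range(n):                # restore the array (the extra O(n) pass)
        a[i] = abs(a[i])
    return dup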
signed int. What if "integer" is supposed to mean unsigned int, and n is supposed to be allowed to range all the way to UINT_MAX? The question doesn't say, so all you'd need is to clarify with the interviewer before continuing. Unfortunately we don't have the interviewer here to ask, so we talk about "cheating" and "not cheating" instead... - Steve Jessop
I found one solution, but it borders on cheating. Technically I don't destroy the array if I restore it afterwards to the original state. But of course it's bad style and fails in multithreaded code. And it requires MaxInt>=2*n+1.
I abuse one bit of each of the ints in the array as my n-bit bitvector, to encode whether I have already encountered each number. And when I'm done I strip them out again.
//Empty last bit
for(int i=0;i<n;i++)
{
a[i]*=2;
}
//Fill last bit with my data
for(int i=0;i<n;i++)
{
int num=a[i]/2-1;
a[num]|=1;
}
bool result=false;
//Restore the array and check for missing number
for(int i=0;i<n;i++)
{
if((a[i]&1)==0) //an unset bit means a missing number, hence a duplicate
result=true;
a[i]/=2;
}
return result;
O(n*3) time? - Core Xii
What if a[i] *= 2 overflows? - IVlad
This returns true for both a = {1, 2, 3, 4} and a = {1, 2, 3, 3}, n = 4 for me in C++. - IVlad
Very interesting problem. I personally suspect that there is no solution to this problem, i.e. they wanted to see your way of reasoning and analyzing the problem... but I'm looking forward to other answers here and will be more than glad if another answer proves me wrong!
The reasons why I suspect that no solution exists are:
If you do this by sorting the array, you need a sorting algorithm that takes O(n) time and O(1) memory. I think that the O(1) memory requirement means you'd have to sort in-place, which the interview question explicitly disallows. O(n) time sorting algorithms are rare I think; radix sort comes close with O(k×n), where k depends on the number of digits of your numbers; but even that is disputed by some.
If you try to do this without sorting, the obvious way is to somehow remember which numbers you've already encountered. The O(1) memory constraint doesn't allow this.
You're then left with "heuristic" methods, such as the n × (n+1) / 2 trick (invented by the mathematician Gauss, I believe), but that's not 100% accurate, as demonstrated in the comments to some other answers here. For this particular solution, I'm not sure whether it would help to multiply each number by a large constant so that they are "spread" a little further apart and won't "interfere" with each other when you add or multiply them; but that's a very vague idea which I'm almost sure won't make a difference in the end.
This paper [1] describes a solution to a similar problem under the same constraints. It's pretty advanced but seems to be what you're after.
The only difference is that there's always a duplicate in that problem, whereas in your case there might not be. So you just need to figure out how to detect that.
[1] http://geomete.com/abdali/papers/duplmath.pdf
The original question says:
figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array
Does re-arranging the elements count as "destroying the array"? For example, if these are just some data collected in an experiment where the order of performing the experiment is not important, then the order of those numbers doesn't matter.
In that case, we can take element 1, say, if it contains 8, then we swap element 1 with element 8, so now element 8 contains the number 8.
Now we look at element 1 again, if it contains 3, we swap it with element 3. If we are swapping into an element that already contains that number, then we have found the duplicate. If we reach the end of array without trying to swap the same number into a slot that already has that number, then we know there is no duplicate.
Just keep on swapping element 1 in this fashion until 1 is swapped into element 1, in which case we move on to element 2. It is like doing a radix sort. It may look like it is O(n * n), but since each swap puts one more element into its correct place, we can swap at most n times, plus the "moving along the array" which is also O(n), so it is O(n + n), which is still O(n).
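A quick Python sketch of this swapping idea (my own rendering; note that, as discussed, it does reorder the array):

def has_duplicates_by_swapping(a):
    """Cycle-sort style check: repeatedly swap a[i] toward its home slot
    a[i]-1. Trying to swap a value into a slot that already holds that
    same value reveals a duplicate."""
    n = len(a)
    for i in range(n):
        while a[i] != i + 1:          # value a[i] belongs at slot a[i]-1
            target = a[i] - 1
            if a[target] == a[i]:     # home slot already holds this value
                return True
            a[i], a[target] = a[target], a[i]
    return False

Each swap places one value in its home slot, so there are at most n swaps in total and the whole check runs in O(n).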
The array is a perfect discrete uniform distribution (Wikipedia [1]). A perfect discrete uniform distribution has a known variance that will change if there are any duplicates, so calculate the variance of the data and compare it to the expected variance. This algorithm seems to work on a couple of quick tests and is O(n) in time, O(log(n)) in space (only to grow the floating point values), and non-destructive to the array. It does start to fail on very large arrays, hence the rounding-error check at the end.
Edit: Added a skewness calculation; no false negatives on arrays of size 2^n up to 2^27, the largest I can initialize. It will also detect a single duplicate in a 2^27-item array. The wonky stuff in the loops is the Kahan summation algorithm, which dramatically reduces error in summation.
C#:
public static bool ContainsDuplicates(int[] nums)
{
double ERROR_ALLOWED = 0.00000000000001;
double average = 0;
ulong sum = 0;
for (int i = 0; i < nums.Length; i++)
{
sum += (ulong)nums[i];
}
average = (double)sum / (double)nums.Length;
double variance = (double)nums[0] - average;
variance *= variance;
double c = 0;
for (int i = 1; i < nums.Length; i++)
{
double val = (double)nums[i] - average;
double y = val * val - c;
double t = variance + y;
c = (t - variance) - y;
variance = t;
}
variance /= nums.Length;
double skewBottom = Math.Pow(variance, 1.5);
c = 0; // reset the Kahan compensation term before reusing it in the skewness loop
double skewness = (double)nums[0] - average;
skewness *= skewness * skewness;
for (int i = 1; i < nums.Length; i++)
{
double val = (double)nums[i] - average;
double y = val * val * val - c;
double t = skewness + y;
c = (t - skewness) - y;
skewness = t;
}
skewness /= skewBottom;
double expectedVariance = nums.Length;
expectedVariance *= expectedVariance;
expectedVariance--;
expectedVariance /= 12;
double varianceError = Math.Abs(variance / expectedVariance - 1);
return varianceError > ERROR_ALLOWED ||
Math.Abs(skewness) > ERROR_ALLOWED;
}
[1] http://en.wikipedia.org/wiki/Uniform_distribution_%28discrete%29
Answered here [1]
looks like I was on the right lines with the linked list idea, but I'm not sure I would have got there
[1] http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html
1..n in the pigeonhole principle should be between 1..k where k < n. - Saeed Amiri
Using Sudhanshu's idea, a possible algorithm in Java could be:
boolean findDuplicate(int[] array){
for(int i=0;i<array.length;i++){
array[Math.abs(array[i])-1]*=-1;
if(array[Math.abs(array[i])-1]>0) return false;//key already visited
}
for(int i =0;i<array.length;i++){
//if all is fine all values are negative now
array[i]*=-1;
}
return true;
}
Is it correct?
(Thanks to Maciej Hehl)
Change array[array[i]-1]*=-1; to array[Math.abs(array[i])-1]*=-1; and the next line in a similar way. - Maciej Hehl
The paper that IVlad linked to inspired me to think of this solution (which unfortunately also cheats by destroying the array):
The idea is that you start at index 1. Then you look at the sequence 1,a[1], a[a[1]], a[a[a[1]]], and so on, and you look for when that sequence hits 1 again, or comes back to a number that it's already hit. If the latter occurs then you know that there is a duplicate. The cheating part comes in where you zero out every array element in this sequence. The reason for this is two-fold: (a) it makes it easy to determine if you hit a number twice in your sequence and (b) it makes the runtime O(n), since we mark this element as not needing to be iterated on when we go through the rest of the array.
Here is python code implementing the algorithm and some test cases, in case my description is not clear:
def arrayHasDuplicates(array):
    startLoopIndex = 1
    while startLoopIndex <= len(array):  # start a chain from every index 1..n
        loopIndex = startLoopIndex
        while array[loopIndex-1] > 0:
            nextLoopIndex = array[loopIndex-1]
            array[loopIndex-1] = 0
            loopIndex = nextLoopIndex
        if loopIndex != startLoopIndex:
            return True
        startLoopIndex = startLoopIndex + 1
    return False
def test(array):
    print str(array)+" duplicates? "+str(arrayHasDuplicates(array))
test([1])
test([1,2])
test([2,2])
test([1,2,3])
test([2,2,2])
test([3,2,3])
test([1,1,1])
test([3,1,2])
test([7,5,3,1,2,4,6])
test([7,5,3,1,2,2,6])
The only problem seems to be that the question is not specifying whether these are signed or unsigned integers. If they are signed, we can use the sign bit for storage, as they are all guaranteed to be positive, and the solution is relatively trivial, and has been presented. For the sake of completeness, I present it once more:
//Solution involves the fact that we can use the sign-bit of
//the i'th element to record whether the value i is duplicated.
// suppose array is P[0] thru P[N-1]
int i;
for (i = 0; i < N; i++)
{
int Pi = ABS(P[i]); // value in 1..N
if (P[Pi - 1] < 0)
break; // value Pi already seen: DUPLICATE
P[Pi - 1] = -P[Pi - 1];
}
bool foundDuplicate = (i < N); // if i reached N, that means no duplicates
// now clean array
for (int i = 0; i < N; i++)
if (P[i] < 0)
P[i] = -P[i];
(NB: this method is not thread safe as it messes with the array before restoring it)
Note that zero duplicates implies that the array contains each of the numbers 1 thru N; that is, there are no omissions. So we only need to check for duplicates.
As it is an interview question, it would probably be bad practice to optimise it straight away, as it would just give the interviewer a headache, but it could be sped up tons.
If, however, we are given unsigned integers, then we are not guaranteed a sign bit to play with. In this case a solution is not obvious, and if it exists, it is almost certainly going to be impractical. My gut feeling is that this problem with unsigned integers is intractable, and that a proof of this could be arrived at, though it would be very tricky.
It is worth remembering that this is a question in the context of an interview, and the interviewer is likely looking for qualities such as the ability to work efficiently, rather than eagerness to spend hours working out theoretical minutiae. Business is usually more a matter of making gut calls; most problems in business do not have a perfect solution.
Also, a company hiring software engineers would want people with the ability to detect a flawed specification, rather than simply those that would blindly do what they have been told.
I know this is not going to be a popular angle, but being realistic, while an interview for a position in a university would appreciate a lengthy discussion, an interview for a position in a corporation would most likely put the emphasis on 'ability to get things done'.
I suspect also that the interview question assumes signed integers. It is a syntax convention of C, is it not, that an integer is assumed to be signed unless explicitly marked with the 'unsigned' keyword? I'm not sure if this is contained in the original C specification, but I would be surprised if it isn't.
It's easy to find one duplicate: start from the first position in the array and check its value. If it is 1, move to the next position; but if it is some i > 1, move it to a[i] (saving the current a[i] value), set the i-th position's value to a[1] + n, and move the previous a[i] value to its appropriate home in the same way (i.e. if a[i] == j, move it to a[j], and do the same with a[j], and so on). If you find two values of the form n + i in the same home, you have found a duplicate. In the case where a[1] is already in place (the 1st value in the array), select a[2] and do the same. The running time of the algorithm is O(n) (because you visit each home at most twice) and the extra space used is O(1). You should also skip items whose value is bigger than n in your iteration.
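A simplified Python sketch of the add-n marking idea (this is my own variant of the scheme: it marks "value k seen" by adding n to slot k-1 rather than physically moving elements home, and it assumes the integer type has room for values up to 2n):

def has_duplicates_add_n(a):
    """Mark 'value k seen' by adding n to slot k-1; a slot value greater
    than n means that value was seen before. The array is restored at the end."""
    n = len(a)
    dup = False
    for i in range(n):
        v = a[i] if a[i] <= n else a[i] - n   # recover the original value if this slot is marked
        if a[v - 1] > n:                      # value v was already seen: duplicate
            dup = True
            break
        a[v - 1] += n                         # mark value v as seen
    for i in range(n):                        # strip the marks to restore the array
        if a[i] > n:
            a[i] -= n
    return dup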
n + i, so you know its exact place :D. I say just one duplicate; when you find it you do the reverse. - Saeed Amiri
Replace n + i with n * j + i; I'll write it tomorrow. - Saeed Amiri
A solution, assuming that each array element has a spare high digit (or bit) that the values 1 through n never use:
The algorithm: Go over the elements of the array. At each element with value k, go to element k - 1 (assuming 0-based indexing) of the array, and set its high digit to 1. This gives us a bitmap of which elements are taken without destroying the original data.
Then, go over the array again, and check if there are any elements that do not have the high digit set. While doing so, clear the high digit to return the array to its original state.
If there are elements that do not have the high digit set, we know that there must be missing numbers, and hence that there must be duplicates.
So we end up using O(1) space (the boolean used to keep track of whether any element does not have its high digit set) and O(n) time (we loop over the array exactly 2n times).
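A small Python sketch of this high-digit bitmap idea (the mark bit 2^40 is an arbitrary choice standing in for "a digit the values 1..n never use"):

def has_duplicates_high_bit(a, mark=1 << 40):
    """Flag 'value k seen' by OR-ing a spare high bit into slot k-1, then
    look for unflagged slots: a missing value implies a duplicate by the
    pigeonhole principle. The marks are cleared afterwards."""
    n = len(a)
    for i in range(n):
        k = a[i] & (mark - 1)       # original value, ignoring any mark already set
        a[k - 1] |= mark            # flag slot k-1: value k has been seen
    missing = any(a[i] < mark for i in range(n))  # an unflagged slot means a missing value
    for i in range(n):
        a[i] &= mark - 1            # clear the marks, restoring the array
    return missing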
Firstly, let me start off by saying that this is cheating. This is O(n), as O(n * SIZE_MAX), where SIZE_MAX is constant, is equivalent to O(n). And it uses O(1) space by allocating the same amount of space entirely independent of n.
This won't work on a real machine, as it is a horrible abuse of big-O notation.
int
test_dups (size_t *array, size_t n)
{
int tmp[SIZE_MAX] = { 0 };
for (int i = 0; i < n; i++)
tmp[array[i]] += 1;
for (int i = 0; i < SIZE_MAX; i++)
if (tmp[i] > 1) /* a count above 1 means a duplicate */
return 1;
return 0;
}
A solution without doing any sum or multiplication:
Credit to Paul, as I got this idea from his comment: "One thing about the problem as stated is every index must be present in the array. So if you consider the array as a set of linked lists where the value is used as the next index, then the sum of the lengths of all of the linked lists must be N. It seems this should be possible in O(N) but I can't see how to do it yet...".
Algo:
int i = 0;
int j = 0;
while(true) {
if (arr[i] > 0) {
arr[i] = - arr[i];
}
else {
printf("duplicate number at index %d\n", i);
break;
}
i = -arr[i] - 1; // arr[i] is now negative; recover the value (1..n) and convert it to a 0-based index
j++;
if (j == n) {
break;
}
}
// Again run a loop to make each number positive if it had become negative.
Proof:
1) If j < n and we detect a cycle, then there is a duplicate.
2) If j = n and we still did not catch a cycle, then we have gone through each index, and so each number is unique.
3) By the pigeonhole principle, if at least one number is missing, then some number must be duplicated.
This is in fact similar to CodeInChaos's solution: the original question states
You're passed an array with n integers in it.
without saying unsigned integers, so we can toggle the numbers to be negative:
for each element in the array, say it contains 8: go to element 8 and toggle the number there to be negative. If it was already negative, that means it was toggled previously and therefore a duplicate is found.
otherwise, just go on; and if an element we reach has a negative number in it, just take the absolute value (-10 becomes 10, and go to element 10 to do the toggling)
If we can toggle every element without finding a value we want to toggle that is already toggled, then we know there is no duplicate.
at the end, we just go through the array and toggle the elements back to positive numbers. (and O(n + n) is the same as O(n))
we can always negate the number because, for example, for an 8-bit integer the max is 127 and the min is -128, so 127 can always be toggled to -127.
if the given numbers are unsigned integers, we can still do it like this:
8 7 2 3 5 1 9 6 4
we use 2 pointers, one pointing at 8 and one pointing at 1, and swap the numbers when the left one is greater than the median and the right one is less than the median. If the number on either side is "already on the correct side", just move that pointer one step forward and repeat. This moves all numbers into the correct group on either side of the median, and is O(n).
1 4 2 3 5 8 9 6 7
now that we have this array, we can do the toggling like the above by using 10 - n... so if we want to toggle 1, we put a 9 in it; if we want to toggle 6, we put a 3 in it. The idea is that the stored value is one that is not supposed to be on that side of the array, so it can be used as a flag.
So for example the number 1, if it is "toggled", becomes 9. The number 9 is not supposed to be on the left side of the array, so we know that this number has been toggled. It uses the same method as the algorithm first described in this post, only there the initial toggling was done by making the number negative, whereas here the toggling is done by using (n+1) - i.
This method involves moving the elements, so I don't know if it counts as "destroying the array". (Or we can use my other method in this thread that involves rearranging the items in the array as well.)
This won't work, see the comments... just leaving it for discussion's sake.
Consider the following directed graph: n nodes (one for each array position), with an edge from node k to node a[k].
Use Floyd's cycle-finding algorithm to find a cycle and measure its length. If the length is equal to n there are no duplicates.
Only if the numbers are exactly the set 1..n will the cycle length be exactly n.
Cycle detection [1]
Even better, use Brent's algorithm to find the cycle length directly (Brent's algorithm, with Python example [2]):
[1] http://en.wikipedia.org/wiki/Cycle_detection
Brent's algorithm: Richard P. Brent described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence. However, it is based on a different principle: searching for the smallest power of two 2^i that is larger than both λ and μ. For i = 0, 1, 2, etc., the algorithm compares x_(2^i - 1) with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct length λ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of ƒ rather than three.
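For reference, a minimal Python sketch of the cycle-length phase of Brent's algorithm described in the quote (generic over a function f and a start value x0; this only illustrates the quoted technique, not the retracted duplicate-detection idea above):

def brent_cycle_length(f, x0):
    """Return the cycle length (lambda) of the sequence x0, f(x0), f(f(x0)), ...
    by comparing a saved checkpoint against subsequent values, doubling the
    search window at each power of two."""
    power = lam = 1
    tortoise = x0
    hare = f(x0)
    while tortoise != hare:
        if power == lam:        # time to start a new power of two
            tortoise = hare
            power *= 2
            lam = 0
        hare = f(hare)
        lam += 1
    return lam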
You're passed an array with n integers in it. Each integer is between 1 and n (inclusive). These numbers are not in sorted order and there may be duplicates or missing numbers.
The problem is to figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array
This is my solution (in Java). It is O(n) time and O(1) space and it does not destroy the array.
The sign bit of each element is used as a 'seen' flag; the original value can be recovered with value & ((1 << 31)-1). Code example:
// Finds if array contains at least 1 duplicate
// O(n); Space(1); does not break array
public static boolean isDup(int a[]) {
int n = a.length, temp, mask = 1 << 31, i;
boolean result = false;
for (i = 0; i < n; i++) {
if ((temp = a[i]) < 0)
temp ^= mask;
if (a[temp - 1] > 0)
a[temp - 1] ^= mask;
else {
result = true;
break;
}
}
for (i = 0; i < n; i++)
if (a[i] < 0)
a[i] ^= mask;
return result;
}
In the case of C or C++ you only need to change int n = a.length in the method body.
Pro bono publico, an answer to a similar question - if you need to print out all the duplicates. Here I'm exploiting the fact that xor is a fast operation and 0 is false in C (this will not work in Java).
// Finds all duplicates
// O(n^2); Space(1); does not break array
int i, j;
for (i = 0; i < n - 1; i++)
for (j = i + 1; j < n; j++)
if (!(a[i] ^ a[j]))
printf("Duplicate: %d : %d.", i, j);
You could use shifts << 1 and >> 1 instead. Note also that my solution changes the last bit only for the elements that need it; in your case, that is not even possible. for (i = 0; i < n; i++) if (a[i] < 0) a[i] ^= mask; should be something like int umask = mask - 1; for (; i >= 0; i--) a[i] &= umask;, but I guess you get the point already. - Margus
I think rados was on to something. If in addition to the check that the sum adds up to n(n+1)/2, add a check to ensure that the product of the array is n! and if either check fails you have duplicates.
n(n+1)/2 and n! under the given constraints. - IVlad
Based on "http://www.wolframalpha.com/input/?i=Sum(e^(k/n*2*pi*i),1..n)":
if n is prime compute
sum=0
for k in 1..n: sum += exp(a[k]/n * 2*pi*i )
Now, it's enough to check if sum == 0.
(Here i = sqrt(-1) and the computation is done in the complex plane.)
Primality of n is important, because there is no way to make the sum be 0 when there are duplicates (no subset of the n-th order roots of 1 on the unit circle sums to 0, except the whole set). It can however be pretty close to 0 in some cases of repeated numbers, and the difficulty is to assess how close.
Edit:
The precision of the floats needs to be high enough to see if sum == 0. I don't know yet how to bound the number of bits needed for the complex numbers with respect to n.
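A rough Python rendering of the pseudocode above (the tolerance is an arbitrary choice; as the answer notes, float precision is the weak point):

import cmath

def roots_of_unity_check(a, tol=1e-9):
    """Map each value k to the n-th root of unity e^(2*pi*i*k/n) and sum.
    For a permutation of 1..n the sum is exactly 0; with duplicates it is
    generally nonzero, up to floating-point error."""
    n = len(a)
    total = sum(cmath.exp(2j * cmath.pi * k / n) for k in a)
    return abs(total) < tol   # True means "looks like a permutation"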
A little late to the party but here's an implementation I wrote in Groovy a few months ago [1]:
// check if a list of n integers between 1 and n inclusive has a duplicate
def hasDup(list) {
    def hasDup = false
    list.each { item ->
        def key = Math.abs(item) - 1
        def value = list[ key ]
        list[ key ] = -value //mark item as visited by making it negative
    }
    list.eachWithIndex{ item, pos ->
        if(item > 0)
            hasDup = true //thus the item at the current position was not visited
        list[pos] = Math.abs(item) //restore the value to its original positive value
    }
    return hasDup
}
assert hasDup([1,2,3,4,5,5]) == true
assert hasDup([1,2,3,4,5]) == false
assert hasDup([1,2,2,3,4,5]) == true
assert hasDup([1,2,2,3,3,5]) == true
assert hasDup([1,1,3,3,5,5]) == true
assert hasDup([5,5,3,3,1,1]) == true
assert hasDup([2,4,3,3,1,2]) == true
assert hasDup([5,1,1,5,2,5]) == true
assert hasDup([1,6,1,6,1,6]) == true
assert hasDup([1,6,1,6,1,6,6,2]) == true
assert hasDup([1,6,1,6,1,6,6,6,1,2]) == true
assert hasDup([5,1,4,2,3,7,6]) == false
assert hasDup([4,3,2,1]) == false
println "all assertions valid"
[1] http://groovyconsole.appspot.com/script/232001
Interesting puzzle. If one assumes that array elements are sized to precisely fit numbers 1..N, then there will be a tiny bit of extra space in the array, but I'm not sure how one could really exploit it, especially since the number of array elements one has to munge to get each extra bit of storage will increase with N. Indeed, experiments with values of N up to 2^24 suggest that one will have to munge about 90.9% of the numbers to free up 'n' bits (this ratio is essentially the same with n=2^16 or n=2^24). If one could free up 'n' bits in linear time, of course, the problem would readily be solved from there on out, but I don't see any nice way of subdividing the problem which wouldn't result in NlgN or worse time.
n = 0
i = 0
retval = true
while true
    next_i = array[i]
    i = next_i
    array[i] = -array[i]
    n++
    if n == array.size then break
end
for i in 0...array.size
    if array[i] > 0
        retval = false
    else
        array[i] = -array[i]
    end
end
return retval
hope this works
Here's an interesting (though ultimately flawed) approach for algorithm connoisseurs:
This is O(n) time and O(1) space.
It's clearly not a perfect solution as it can fail if n is too large (it can't distinguish between 1 and 69720375229712477164533808935312303556801 for example).
It possibly may fail for some other very carefully arranged input. But I haven't yet been able to find a case with 64-bit integers where it doesn't work. Counterexamples welcome!
Edit: Below, I'm going to assume that the O(1) space requirement "really means" that we can't store more than a constant number of integers, not bits. Yes, I realize you can store anything encoded in an arbitrarily-large integer. Oh well :-) Note that even storing n is going to violate the space requirement by requiring O(log n) bits of storage.
[Assuming the array contains only elements from [n], as stated by the OP:]
The array is a permutation of [n] (that is, the array passes the OP's check) if and only if the sum of the elements equals T_n and the product of the elements equals n!.
The reasoning: Suppose the array A is a permutation of [n]. Now suppose we want to generate an array A' to "fool" the above-implied algorithm that checks sum(A) and product(A). Without loss of generality, we begin with a sum-preserving alteration. Take any two elements x and y of A. To preserve the sum, we change them to x-k and y+k for some 0 < k < min(x, n-y+1). Now, for this alteration to be product-preserving, it must be that xy = (x-k)(y+k). This simplifies to k = x-y. Finally, this implies that the sum-preserving alteration has not, in fact, altered anything; x and y have simply switched locations in the array. Thus, A remains a permutation.
Edit: The above reasoning will not work, as one can essentially perform another sum-preserving alteration to compensate for the product difference generated by the first one. Thanks go to the commenters for patiently getting me round to this.
What about checking for duplicates and then summing all the array elements? The problem is how to check for duplicates with O(n) time complexity. It sounds impossible, but there may be special mathematical properties of an array of [1..n] that the interviewers expect you to know, as if you were a math contest winner. Or they may be expecting you to mention quantum computers.
My previous attempt, "If the array has no duplicates, then the sum is 1+2+...+n" failed, since sum [1,2,3,4] = sum [2,2,3,3].
This may help: http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html
If array is a permutation of 1,2,3,...,n then:
The sum of the entries in the array will equal a known value
AND
The sum of the entries squared will equal a known value.
Compute both of these values and see if they match the expected output. I bet it's impossible (or at least very hard) to find an array such that both conditions hold and the input array is not a permutation.
For example:
Is {1 4 3 3} a permutation of {1 2 3 4}?
Well:
sumOfEntries( {1 2 3 4} ) = 10
sumOfEntriesSquare( {1 2 3 4} ) = 1 + 4 + 9 + 16 = 30
But:
sumOfEntries( {1 4 3 3} ) = 11
sumOfEntriesSquare( {1 4 3 3} ) = 1 + 16 + 9 + 9 = 25
{1 4 3 3} is not a permutation because 10 != 11
{1 4 3 3} is not a permutation because 30 != 25
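A compact Python sketch of this double check (my own rendering; matching both values is clearly necessary for a permutation, while whether it is also sufficient is exactly the conjecture above):

def passes_sum_checks(a):
    """Compare the sum and the sum of squares of the array against the
    values expected for a permutation of 1..n."""
    n = len(a)
    expected_sum = n * (n + 1) // 2
    expected_sq_sum = n * (n + 1) * (2 * n + 1) // 6
    return sum(a) == expected_sum and sum(x * x for x in a) == expected_sq_sum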
If the array has no duplicates, the sum of the elements of the array will equal ((n+1)*n)/2.
Even if the sum is ((n+1)*n)/2, it does not follow that the array has no duplicates. - Don Roby
If the numbers are unique they can be sorted in O(n). Assume the numbers are in an array a indexed 1..n. Go to element 1; if it is 1, go to the next element and continue this way until there is an element out of place, say m. Go to a[m]. If a[m] == m there is a duplicate; otherwise set n = a[m], a[m] = m, m = n, and continue. Eventually you will end up where you started. The starting element is now in place and you can continue as before. This will only fail to sort the array if there are duplicate elements, and that will be detected in the process.
here's something that came to mind
let s = 0
go through the array, and for every element a[i],
if a[i] > i, s++
if a[i] < i, s--
else do nothing
if you finish with s = 0, there are no duplicates.
my intuition is that if every number from 1 to n appears once, this sum will always be zero regardless of permutation, but I don't really know if this is true
unfortunately this requires O(log n) space even if it does work.
Use a vector as a set to record each integer seen as you iterate through the array. The set will have constant (O(1)) size: one entry for each possible integer so the maximum number of entries is always MAX_INT. Only one iteration through the array is needed, so O(n) provided that the vector operations are constant.
(Update: per Jessica and Fabio's comments, we need to check that each number is less than or equal to n, and also that the numbers add up to (1+n) * n / 2. I was assuming at first that if they add up to that number, with just some number(s) greater and some number(s) smaller to compensate for each other, can they still multiply together to be exactly n!?)
(Update: the idea is, find a mathematical property, when f(1, 2, 3, ..., n) gives a number that cannot be duplicated with a different set of numbers. If f alone is not enough, then maybe f and g together, or maybe f, and g, and h together, but that's the idea of approaching this problem this way)
I am suspecting we can multiply all the elements together and check the result against n! (n factorial).
Assuming the floating point is accurate no matter what order the numbers are multiplied in. Or we can use integers (bignums), but I think strictly speaking that makes it O(n log n).
This requires a mathematical proof; the idea is that we expect a perfect occurrence of 1, 2, 3, ..., n, and if there is any deviation from it, the numbers can't multiply to exactly n!
Also, if we look at:
1 2 3 4 5 6 7 8 9
if we first make sure there is a median 5, and the number left of it, 4, and the number right of it, 6 -- if any of those numbers doesn't exist, we know there must be a duplicate by the pigeonhole principle. Now, we go through the unsorted array and calculate (a[i] - 4) ^ 2 and sum them up, as well as (a[i] - 6) ^ 2 and sum them up, and compare each to what it is for a perfect 1 2 3 4 5 6 7 8 9. For a case like 1 2 3 3 5 7 7 8 9, finding the deviation from 5 by (a[i] - 5) ^ 2 is not useful, but if we shift the point the deviation is measured from, then the sums can't be equal. Also, we can use (a[i] - 4) ^ 3 instead of squares. It really depends on finding a math property where a perfect f(1, 2, 3, ..., n) cannot be exactly the same as f(n1, n2, n3, ...), or if f alone is not enough, a combination of f and g that together ensure that when they are equal, the numbers can only be 1 to n.
n - 動靜能量
(1+n)*n / 2 - 動靜能量
If n becomes 20 and it hasn't broken, then your algorithm can take a very long time to run!? (20^20 is 1.05E+26) - 動靜能量
You could use a bitmask variable v=0, and set, for each number found i,
v |= 1<<(i-1)
At the end, check if
v == (1<<n)-1
This works for n<64 for a long v, and can be extended easily for a greater n
This isn't really O(1) space, but then neither is n(n+1)/2!
n<64 seems somehow more limiting than n<2**64 - sepp2k
You can probably try this code: (this is python)
integers = [ 1, 4, 5, 3, 2 ]
sum = 0
for i in integers:
    sum += 2 ** (i - 1)
if sum != 2 ** (len(integers)) - 1:
    print 'Duplicate'
else:
    print 'Perfetto'
This only runs through the integers once, O(n), and uses a single sum for space, O(1).
The space isn't O(1), and the time isn't O(n) either, as exponentiation is not O(1). - IVlad
If i is 1000, you're not going to be able to calculate 2^1000 in O(1), nor add them all together in O(1). - IVlad
O(1) as per the OP's clarifications. What you cannot assume is that addition of n 32-bit integers without overflow is O(n), because you might have to implement your own data type to store the result, since it might be too big for the fundamental types. You don't have to worry about saving the values yourself obviously; you need to worry about writing a function that is fed the values and computes the answer in O(n). - IVlad
Alright, this is different, but it works!
I ran this test program (C#):
static void Main(string[] args) {
//Run 100 tests, each test with an array the size of the test number
for (int i = 1; i <= 100; i++) {
int[] temp = new int[i];//array the size of the test number
//Fill temp array with a permutation
for (int j = 0; j < i; j++) {
temp[j] = j + 1;
}
//Now that setup is done, test for a permutation
bool check = IsPermutation(temp);
Console.WriteLine("Permutation: " + check);
}
}
static bool IsPermutation(int[] array) {
int x = 0;
for (int i = 0; i < array.Length; i++) { //O(n)
//Because it's an XOR, it's commutative,
//and doesn't depend on the order of the array
x ^= array[i];
}
bool result = false;
switch (array.Length % 4) {
case 0:
result = (x == array.Length);
break;
case 1:
result = (x == 1);
break;
case 2:
result = (x == array.Length + 1);
break;
case 3:
result = (x == 0);
break;
}
return result;
}
So I put up my previous answer in a bit of a haste. My code wasn't very readable and didn't convey what I was doing. The actual algorithm is the IsPermutation method, everything else in the main method is setup for the IsPermutation method, therefore irrelevant for the O(n) requirement.
As for Jessica's { 4, 3, 2, 1 }:
First, XOR ALL the elements, not just the first three: 4 ^ 3 = 7, 7 ^ 2 = 5, 5 ^ 1 = 4. So in my code, x is 4. Because array.Length % 4 == 0, we check the x == array.Length branch of the switch, and it's true, so it's a permutation.
This should do the job:
bool has_duplicates(const int * a, int n) {
int expected = (n * (n + 1)) / 2;
int computed = 0;
for (int i = 0; i < n; ++i) computed += a[i];
return expected != computed;
}
Hint: "array with n integers" and "integer is between 1 and n (inclusive)"