I had to solve a problem earlier on today which was very interesting: finding the largest binary gap in a number. (The longest gap between two 1
s in a binary stream) For instance, 9 has a max binary gap of 2, since 9 in binary is 1001
, 529 has a max binary gap of 4, since 529 in binary is 1000010001
.
The way I solved it was probably not the best nor a mathematical solution, I simply converted the int
in Python to a binary string via bin(n)[2:]
, then found the index of all matches of 1 in the string, then looped through each and counted the difference between each index and returned the greatest result.
There has to be a better, more mathological (yes, I just made that up) solution for this. I'm terrible with math, which is why I reverted to using strings... does anyone have a purely mathematical/extremely performant solution to the task at hand? I'd like to learn from the pros :)
Looks like another job for
groupby
[1].
>>> from itertools import groupby
>>> n = 529
>>> max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')
4
Perhaps not the fastest, but reasonably so:*
% python -m timeit -s "from itertools import groupby; n = 10" "max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')"
100000 loops, best of 3: 5.46 usec per loop
% python -m timeit -s "from itertools import groupby; n = 10**100" "max(sum(1 for i in g) for k,g in groupby(bin(n)[2:]) if k=='0')"
10000 loops, best of 3: 128 usec per loop
On a MacBook Pro, 2.5 GHz Core i5, Python 2.6.5 running in 32-bit mode. (Doubling the bits seems to knock about 25% off the time.)
*Actually JBernardo's solution seems to roll right over this one for large numbers.
[1] http://docs.python.org/library/itertools.html#itertools.groupbyNo need to convert to binary string - you can test each bit using bit operators like |
and &
. I have written down the solution where you iterate once starting from the lowest bit keeping the max gap in the variable m
and the current running gap in the variable cur
:
def maxGap(n):
cur = 0
m = 0
p = 1
while (p <= n and (n & p) == 0): p <<= 1
while (p <= n):
if (n & p != 0):
m = max(cur, m)
cur = 0
else:
cur+=1
p <<= 1
return m
0xe7f6
it's 2.8 times slower, and for a really large number like 0xe7f683c0246322
it's a shade under 4 times as slow. (These done with timeit.timeit
on a 64-bit Python 2.7 on Ubuntu.) - Chris Morgan
(n & p) == 0
to n & p
and (n & p != 0)
to n & p
; make it over 1.5 times as fast (for large inputs) by changing m = max(cur, m)
to if cur > m: m = cur
. That still doesn't make it as fast as JBernado's answer. - Chris Morgan
maxGap
will run at least 3x slower (maybe 10x) than an equivalent C function (which would look almost identical, incidentally). - j_random_hacker
Another string based solution
def max_gap(x):
return len(max(bin(x)[2:].rstrip('0').split('1')))
for python2.6+ you can use format(x, 'b')
instead of bin(x)[2:]
for readability
bin(x)[2:]
is unsurprisingly about twice as fast as format(x, 'b')
. - Chris Morgan
On the face of it, this problem seems similar to normal bit counting, which has been very heavily analyzed. All of the solutions presented so far would be terribly slow by cyrptographers' standards (heavy bit count users). I'd guess that if you wanted the fastest easily implemented software solution, you'd want to do something with pre-computed tables. Look at http://gurmeet.net/puzzles/fast-bit-counting-routines/ for good code examples of approaches to bit counting (http://graphics.stanford.edu/~seander/bithacks.html is an interesting page too but may not help you with this particular problem). For consecutive bit counting, I think you'd want something like three tables - a "left zero count", "middle zero count" and "right zero count" table. Then precompute that for the biggest number size you can stomach (e.g. 16 bits) and chomp your inputs that number of bits at a time, tracking the largest consecutive block of zeroes you've found.
O(n log n)
in the number of bits, but you can eliminate the n
factor by exploiting the parallelism in register operations - two 16-bit operations can be handled as one 32-bit operation, four 8-bit operations can be handled as one 32-bit operation, and so on. The trouble with this gap problem is that gaps can span the divide - you don't have two independent subproblems. Not sure if this method applies. - Steve314
Yours is a perfectly good solution IMO. There's little you can do to improve the performance, in asymptotic terms at least. Probably not worth trying to optimise unless you're working in C or C++, and even then only if you need to optimise.
However, as a slightly different (no doubt slower, but perhaps simpler) solution...
>>> "10010001101".split ("1")
['', '00', '000', '', '0', '']
Use a list comprehension to map from those strings of zeros to the lengths, then extract the maximum.
EDIT As pointed out by fiver, this doesn't work if your number is even (the binary ends with one or more zeros). To fix the bug, take a look at the string.strip
method.
I should probably update this answer to include more ideas from the comments - for the moment, just be aware that the comments are worth reading.
strip
method. - Steve314
max(map(len, "100100001101".split("1")[1:-2]))
would work - BlueRaja - Danny Pflughoeft
map
) would be len(max('1001000'.split('1')))
. - zeekay
[1:-1]
. That works, because the split
ends the list with ""
if the list ends with a 1
(and begins the list with a ""
if the list begins with a 1
) - BlueRaja - Danny Pflughoeft
Another divide and conquer approach:
Shift the number right successively by 1, 2, 4, ... and apply OR, until it is all ones. Then you know the smallest power of two that is greater than the largest gap of zeros. Back up one step and repeat for the remaining zeros.
Whether this is an improvement would be interesting to see. I would guess that for 64 bit numbers it would be better in the average case but not in the worst case, compared to a loop that looks at the bits one by one.
Edit: In the average case it is substantially faster, by a factor of more than 9 for random 64 bit numbers. Here is the Java code of the implementations I used for benchmarking:
private static final int consecutiveZeros0(long n)
{
int result = 0;
long rest = n;
int count = 0;
while (rest != 0)
{
if ((rest & 1) == 0)
{
count++;
}
else
{
result = Math.max(count, result);
count = 0;
}
rest >>>= 1;
}
return result;
}
private static final int consecutiveZeros1(long n)
{
int result = 0;
long a = n;
while (!isAllOnes(a))
{
int count = 1;
a |= (a >> count);
while (!isAllOnes(a | (a >> count)))
{
a |= (a >> count);
count *= 2;
}
result += count;
}
return result;
}
private static boolean isAllOnes(long a)
{
return (a & (a + 1)) == 0;
}
I think I can suggest a mathematical way.....before reading remember that i am not a good ex-plainer so bear up with it..:) I have only thought of the algorithm and not implemented so bear up with some loopholes.
What i suggest is we can detect a pattern in the way the binary numbers change as the number increases.lets take an example if we want to take out the max binary gap for 27 starting from 16
10000 max=0
10001 max=3 [1 number]
10010 10011 max=2 [2 numbers]
10100
10101
10110
10111 max=1 [4 numbers]
11000 max=0
11001 max=2 [1 number]
11010 11011 max=1 [2 numbers]
11100 max=0
11101 max=1 [1 number]
11110 11111 max=0 [2 numbers]
this number(27) lies between 2^4 and 2^5 and from the above pattern we can see that any number lying in this range cannot have binary gap more than 3. above we can see for a range 0f 2^4-2^5 the values change in the interval of 2^3 then 2^2 and so on.... so by finding the interval in which the number we want to find teh binary gap of lies we can calculate its maximum binary gap. taking an example: the max binary gap follows the above pattern so when we get a number say 27 we first find the range within which it lies.we get 2^4-2^5. therefore value of max=3[4-1], now 27mod[lower range] we get 11 which greater that 8 so now what we know from this is that value of max=3-1 nor 11mod4 we get 3.before proceeding one more thing noteworthy in the pattern is that the value of max within the range also follows a pattern within first 8 number of the range 4[max=3]-->2[max=2]-->1[max=1]-->1[max=0](common to all) then for next 4 2[max=2]-->1[max=1]-->1[max=0] and so on.... coming back to the example.. after mod with 8 we get three so we know that the value of max=2 now(following the pattern 8-->4-->2-->1) note:before moding check if the remainder of the previous i larger than 2^n(whatever may be the case then) 3mod2 we get 1 making the value of max=1....which will be the answer for 27.....
I am really bad descriptor and ex-planer so please bear up with the explanation and comment on whatever you find confusing..... and do tell me the loopholes if you find any....:) hope this is the answer....
Well I can think of one way to do it mathematically, but it would still involve iteration.
Assume the indices of the binary number start from 0 at the least significant digit, and up one for each digit.
for the number n, the index of the first 1 is given by floor(log(n)). Then set n = n-2^floor(log(n)), repeat until n = 0. Then you will have the indices of all 1's and it is trivial to find the largest gap.
Logs are all base 2 of course. I'm sure there is a better way than this though.
I believe you can do it without converting the number to a string.
You can rotate the number to the right and check if the number is odd , if yes you have a 1 in the last bit else its a 0,now keep rotating and storing the distance(s) between the 1s and you will have the largest distance.
This solution uses the fact that x&(x-1)
is x
with its least significant 1 bit set to 0. It returns 1<<max_binary_gap
.
# Return Y,BIT where BIT is the least significant 1 bit of X, and
# Y is X shifted right to remove this bit.
def lsbit(x):
u = x & (x-1) # U = X with least significant 1 removed
bit = u ^ x # BIT = least significant 1
return ((x/bit)>>1), bit
# Return 0 if X is negative or does not have 2 bits set
# Otherwise, return 1<<max_binary_gap
def binary_gap(x):
if x <= 0: return 0 # Invalid input
gap = 0
x,bit = lsbit(x)
while x>0:
x,bit = lsbit(x)
gap = max(gap,bit) # Keep largest BIT in GAP
return gap