share
Stack OverflowSimple proof that GUID is not unique
[+323] [30] Kai
[2009-11-10 00:55:24]
[ c# guid ]
[ https://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique ]

I'd like to prove that a GUID is not unique in a simple test program. I expected the following code to run for hours, but it's not working. How can I make it work?

BigInteger begin = new BigInteger((long)0);
BigInteger end = new BigInteger("340282366920938463463374607431768211456",10);  //2^128
for(begin; begin<end; begin++)
  Console.WriteLine(System.Guid.NewGuid().ToString());

I'm using C#.

(107) As a software developer, what would you say if a user came to you and said "it's not working"? - JoshJordan
(152) Wait several trillion years. - hobbs
(7) BigInteger is a Java construct. What are you using? - Nathan Taylor
(5) emm... I'm guessing GUID has some kind of time stamp in it. So perhaps you can try to have many threads running at the same time generaging GUID and save them all to a database with the GUID as the primary key.... and wait for it to fail? - Nestor
(6) serveral threads is a good idea - Kai
(67) Upmodded because this is the most amusing thing I've seen online today. - jrockway
(32) @jrockway - lol. I'm having troubling finding anything about this question that isn't fundamentally wrong. The longer I look at it, the funnier it gets. - tylerl
(8) You forgot to take the protons in your CPU decaying into account. - Dour High Arch
(243) It's only globally unique, so it's only unique on our planet. If you want a truly unique id you need to use a universally unique id (UUID). I assume that you're only interested in uniqueness within our universe. :-) - tvanfosson
(7) The more I read Kai's comments, the more I think this is a troll. - Ray
(13) Um, what happened to "No question is too trivial or too "newbie"."? We don't have to treat him bad just because we think his question is dumb! - RCIX
(1) Kai asked why his loop wasn't iterating in a comment, and from that i surmised his real question. Please answer that, instead of the question "why does genertaing guids take longer than i expect". - RCIX
(1) RCIX, I don't believe your edit captures Kai's intent. Any way I read it (and read his comments to the answers below), it's clear Kai's expecting to find duplicate GUIDs during a short run of the loop he posted. - Michael Petrotta
(1) That's beside the point. If you look at his comments to nathan taylor's answer below, you see he says he can't iterate biginteger. - RCIX
(6) @RCIX - but, if you look at his comments to rjmunro, it's clear he's also and primarily asking how to make this go faster. It's fundamentally flawed in that the hardware that is available to us is incapable of doing this in any reasonable period of time. He needs to step back and realize that what he wants is not possible. Let's also not forget the fact that he's simply dumping all these GUIDs to the console. Is he going to hand compare them all? - jasonh
(3) First time i loled on this site. - Sergey
(1) My point was merely that we should try our best to understand what he's really asking, and that we need not point out asides to a problem. - RCIX
(1) That said, i was transferring my feelings of upsettedness from sonewhere else and i am truly sorry for saying you guys were flaming him. - RCIX
(7) This is a real question. There's a lot for him to learn from some of the responses below such as stackoverflow.com/questions/1705008/… and stackoverflow.com/questions/1705008/… (full disclosure: I also have posted an answer to this question). This question is about programming and no question is too trivial or too "newbie". - jason
(19) Hmm, seems correct, it should print 42 after a while... - pihentagy
(2) Even though people are bashing this guy for the question. This question actually generated some interesting answers. Let alone the amusement it has come of it. =D - Nuno Furtado
(2) Shouldn't that be 340282366920938463463374607431768211457? - Jonas Elfström
(2) @Nathan Taylor: BigInteger was added in .NET 4 (msdn.microsoft.com/en-us/library/…); however, this code uses a non-existent constructor overload (string, int) so it will not compile. - Bradley Grainger
(15) People, it's obviously taking too long because C# is such a slow language. He should use C. - ptomato
(2) "In theory, theory and practice are the same. In practice, they are not." -- Lawrence Peter Berra - Andrew Lewis
(2) I remember when, I had to let a program run for days on my 286 IBM-compatible computer. I have to smile because Kai is complaining that his program is not finishing within "hours". - AMissico
lol @ptomato. being a C# programmer that still made me laugh. - lb.
(2) Simple program that will take longer: Guid test = Guid.NewGuid(); while (true) if test.Equals(Guid.NewGuid()) throw new Exception("Duplicate found!"); - Jason Goemaat
(1) I am disappointed by the fact that most of the responses to this question have been harassing Kai for his simple brute force attempt to prove that one GUID could conceivably match another. Statistical unlikeliness aside, I see no one denying the possibility. I believe the spirit of his question was, how could he prove his hypothesis. - Nathan Hartley
[+405] [2009-11-10 04:23:25] ligos [ACCEPTED]

Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me $0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my paypal account details at your earliest convenience.

using System;
using System.Collections.Generic;
using System.Linq;

namespace GuidCollisionDetector
{
    class Program
    {
        static void Main(string[] args)
        {
            //var reserveSomeRam = new byte[1024 * 1024 * 100];     // This indeed has no effect.

            Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
            // Fill up memory with guids.
            var bigHeapOGuids = new HashSet<Guid>();
            try
            {
                do
                {
                    bigHeapOGuids.Add(Guid.NewGuid());
                } while (true);
            }
            catch (OutOfMemoryException)
            {
                // Release the ram we allocated up front.
                // Actually, these are pointless too.
                //GC.KeepAlive(reserveSomeRam);
                //GC.Collect();
            }
            Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());


            // Spool up some threads to keep checking if there's a match.
            // Keep running until the heat death of the universe.
            for (long k = 0; k < Int64.MaxValue; k++)
            {
                for (long j = 0; j < Int64.MaxValue; j++)
                {
                    Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
                    System.Threading.Tasks.Parallel.For(0, Int32.MaxValue, (i) =>
                    {
                        if (bigHeapOGuids.Contains(Guid.NewGuid()))
                            throw new ApplicationException("Guids collided! Oh my gosh!");
                    }
                    );
                    Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, ((long)Int32.MaxValue) * Environment.ProcessorCount);
                }
            }
            Console.WriteLine("Umm... why hasn't the universe ended yet?");
        }
    }
}

PS: I wanted to try out the Parallel extensions library. That was easy.

And using OutOfMemoryException as control flow just feels wrong.

EDIT

Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.

And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.

EDIT 2 As always, the GC does a better job than I do at managing memory; any previous attempts at doing it myself were doomed to failure.


(120) That last Console.WriteLine made me laugh real hard. I think you should throw a CommonlyAcceptedCosmologicTheoriesWrongException instead. - R. Martinho Fernandes
(17) does marking this as Accepted also means that @Kai accepts the terms stipulated by @ligos? - kb.
(3) Setting reserveSomeRam = null; doesn't actually accomplish anything. - DevinB
(4) @devinb please explain? it looks like it is freeing the bytes it was previously allocated so that the GC can Collect() it. Why doesn't it accomplish anything? - mythz
(1) @ligos: You need to put a GC.KeepAlive(reserveSomeRam) somewhere--probably after the try/catch--in order to prevent collection of the array. Hopefully you're offering eternal customer support with your licensing terms... - Curt Nichols
(2) @ligos: a few comments: code you put in SO answers must be public domain; I believe you mean "universe" and not "univese"; I ran your program in the future; I ran it in reverse time so you'd have to pay me money; surprisingly it didn't find a collision and the reason that the universe has not ended has something to do with highly charged recursive damma fluctuation cores. - yairchu
(1) @mythz: The JIT will detect that the array referenced by reserveSomeRam is never used (i.e., the value in reserveSomeRam is never read), so it will be GCed long before the variable is nulled out. As @Curt Nichols mentions, a call to GC.KeepAlive would need to be added to prevent this. - Bradley Grainger
@BradleyGrainger I see it now - Good Catch :) - mythz
(1) @yairchu: executing my program at any time other than now is not in the EULA, my legal team will be contacting you last Thursday. - ligos
@ligos, now the memory will never be deleted, plus, the compiler might be too smart for the suggested solution to work correctly anyway (realize the only way out of the loop is through the catch and thus, sees that the original assignment is still never used) - Grant Peters
"And using OutOfMemoryException as control flow just feels wrong". If it was written in VB it would be common practice. - Evan Plaice
(3) GuidCollisionDetector. The name has potential - Ufuk Hacıoğulları
(3) Thanks for this code. I have put it in my GUID generator code for making cross database foreign keys and it's been a great help in preventing all the collisions we've been experiencing. However, now my application freezes up and I can't see why?! Must be a .Net framework bug or something. - WOPR
(2) 7 hours on clock and no collision. I give up! - Vercas
(2) i'm suing for my money back... I ran this program for 3,654,5334 years before realizing that many of the guid's were not in the original hashset! in fact, every time there was no collision, a future collision was lost! I even bought a machine with 64 exabytes of ram only to realize that .net 4 can only utilize about 9 exabytes at most! this program is a sham! All those lost guids wandering aimlessly in cyberspace..i want my $100 trillion back! - NightDweller
I hate to point out the obvious, but it's perfectly practical to iterate over every 64-bit integer, given a few years and a few powerful machines. We certainly don't require until the end of the universe. - Nick Johnson
@Nick - You realise there's two loops right? It should iterate (2^63)^63 times. Of course, because I'm running things in parallel, it will be much faster ;-) - ligos
(1) @NightDweller - I'll just get the quantum version out of that box I put my cat in the other day... - ligos
@ligos I missed that. Three, actually, since the parallel for is another one. - Nick Johnson
Coming late to the party, but OutOfMemoryException cannot be caught with try/catch. It is an asynchronous exception. - x0n
The code won't work! The HashSet will reach its limits which are AFAIK 2^32 elements. Even when writting your own HashSet the code is crashing when 2^64 elements are reached, which is the actual limit of C# Arrays. - Felix K.
1
[+226] [2009-11-10 01:02:12] rjmunro

This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years. Which is about 83 billion times longer than the age of the universe.

Assuming Moores law [1] holds, it would be a lot quicker to not run this program, wait several hundred years and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until the CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).

[1] http://en.wikipedia.org/wiki/Moores_law

(27) damn - so maybe serveral threads generating guids is a better idea? - Kai
(107) 4 threads on a quad core processor would make it run in 20 billion times the age of the universe - so yeah, that would help a lot. - rjmunro
(34) I am suspicious that this is a troll, but on the off chance that it isn't: threads are not magical. If you can do a billion operations per second on one thread, then going to ten threads means that each one runs 1/10th as often. Each thread does 100 M operations per second; the total number of operations per second is not increased. The way to increase number of operations per second is to buy more computers. Suppose you bought a billion more computers. That would reduce the problem to only taking 10790283070806 years, which is still more than four hours. - Eric Lippert
(10) I think rjmunro is assuming each thread would run on a separate core; 83 billion universes / 4 cores indeed approximately equals 20 billion universes. Time to buy Intel stock! - Dour High Arch
Would change my downvote to an upvote but it says the vote is too old unless the answer is edited... :( - RCIX
(1) aren't you all going off on one? GUIDs are inheriently non random. They are based on the MAC address of the TCP/IP adaptor in yr machine. The Time they were generated and a random number. So given this weighting they are a lot easier to predict if you have the original machine and no when it was generated. See below. - AnthonyLambert
Why don't we just use 83 billion processors? =) - Erik Forbes
(1) @Tony Lambert - That was one of the GUID generation algorithms, but isn't widely used nowadays because of privacy concerns. - Greg Beech
(4) @Erik 83 billion processors means that you will be able to do it in about the amount of time the universe has existed so far. So even that's not nearly enough. - rjmunro
@Eric: Your answer would only be correct on a single-core machine, right? - Steven Sudit
(1) @Steven. Yes. I suppose I should have said "buy more cores". - Eric Lippert
(1) @Eric: Yes, roughly 83 billion more... - Steven Sudit
(1) If I remember correctly generating GUIDs on this platform is serialized, so don't waste your time refactoring your code for multiple cores. - Curt Nichols
(2) Moores law does not in fact talk about CPU speeds. It talks about the number of transistors per CPU. Which actually means that threads will become very, very relevant to the argument of waiting for faster hardware. - Michael Borgwardt
(4) At your rate of 1 billion GUIDs per second it would only take 634 years to have a 50% chance of a collision. - Jason Goemaat
Hmm. I would recommend putting some HTC to work with some smart parallelisation (alternatively GPU). @Michael Borgwardt: Wasn't it about of number of transistors per inch squared (alternativly cm^2? - Maja Piechotka
@Maciej: actually, no. Moore's original statement was about "number of components per integrated circuit" and "I believe that such a large circuit can be built on a single wafer." - Michael Borgwardt
2
[+170] [2009-11-10 01:07:30] tylerl

A GUID is theoretically non-unique. Here's your proof:

  • GUID is a 128 bit number
  • You cannot generate 2^128 + 1 or more GUIDs without re-using old GUIDs

However, if the entire power output of the sun was directed at performing this task, it would go cold long before it finished.

GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.


(44) Pigeonhole Principle to the rescue! - yfeldblum
(22) +1 for the sun going cold comment. There was an interesting comment somewhere about the pointlessness of encryption keys > 256 bits. To iterate over all possible key values would take more energy than the entire universe holds. Toggling a bit in the CPU requires a small amount energy (it's what generates the heat) which, when multiplied up 2^256 times is a really massive number exceeding the energy stored in the universe, using E=mc2 the universe would need mass of 2^227kg, our sun is 2^101kg so thats 2^126 suns! - Skizz
(31) @Skizz: This is true for brute force attacks only. When an encryption scheme is "broken", it means that it can be solved in less time than brute force, but the solve time remains proportional to the key size. - Steven Sudit
(1) @StevenSudit: proportional to exponent of the key size (unless P==NP) - Ihar Bury
(1) @Orlangur Proportional to the key size measured in bits. - Steven Sudit
@StevenSudit No. If you double number of bits it will not double the solve time. 2048 bit keys require orders of magnitude more time to break than 1024 bit ones. - Ihar Bury
@Orlangur Each bit doubles the search space. - Steven Sudit
(1) Holy Mother of Jebus. Quit arguing. xkcd.com/386 - tylerl
3
[+137] [2009-11-10 03:47:21] jason

Of course GUIDs can collide. Since GUIDs are 128-bits, just generate 2^128 + 1 of them and by the pigeonhole principle [1] there must be a collision.

But when we say that a GUID is a unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).

If you generate a sequence of n GUIDs randomly, then the probability of at least one collision is approximately p(n) = 1 - exp(-n^2 / 2 * 2^128) (this is the birthday problem [2] with the number of possible birthdays being 2^128).

   n     p(n)
2^30 1.69e-21
2^40 1.77e-15
2^50 1.86e-10
2^60 1.95e-03

To make these numbers concrete, 2^60 = 1.15e+18. So, if you generate one billion GUIDs per second, it will take you 36 years to generate 2^60 random GUIDs and even then the probability that you have a collision is still 1.95e-03. You're more likely to be murdered at some point in your life [3] (4.76e-03) than you are to find a collision over the next 36 years. Good luck.

[1] http://en.wikipedia.org/wiki/Pigeonhole_principle
[2] http://en.wikipedia.org/wiki/Birthday_paradox
[3] http://reason.com/archives/2006/08/11/dont-be-terrorized

(239) If you're murdered at some point in your life, odds are it'll be at the end. - Michael Myers
(25) @mmyers: Excellent point. That means that my chances of being murdered right now are absurdly low, since this isn't the end of my life. Oh, wait... - Steven Sudit
Also, if two GUIDs happen to be created within a short period, the chances they are used within the same system is slight. Therefore, this increases uniqueness. - AMissico
These numbers and reference to the birthday problem are meaningless. GUID generation algorithms don't generate values over the entire range with equal probability. In fact IIRC the original algorithm used the MAC address of the generating PC + the current time as part of the result - which reduces the risk of collision with Guids generated on other PCs, but of course reduces the key space. - Joe
@Joe: Who says you have to use an off-the-shelf generation algorithm? - jason
(17) You're assuming that the probability of being murdered is a constant for all human beings. But clearly people who write snide remarks in forum posts are the kind of people who are more likely to be murdered than the average person. - Jay
@Joe: The standard algorithm used nowadays is a random number, with a few bits set to indicate the generation algorithm. So there are slightly less than 128 random bits. You're right that the original algorithm used a timestamp and MAC code, but there was a big fuss in the press about "Microsoft inserting the computer's identity [MAC address] in a hidden place [in a GUID] in MS office documents," so they switched to the random scheme. - user9876
4
[+61] [2009-11-10 03:56:20] ctacke

If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.


(13) Cool - how much for the complete set, from 0 to (2^128)-1 ? - user180247
(23) On sale, $0.01 per 1k GUIDs. I'll throw in some bamboo wind chimes if you order in the next 60 minutes. - ctacke
(7) My set is more exclusive and of higher quality. They are double checked and verified which makes them worth the $1 per GUID. You can even buy them in batches if you don't want to make the full investment in one go. I will have to charge an extra $10 per batch though. - Thomas
(3) I'll set you up on a monthly plan and give you unlimited guids for the right price. ^ Those guys are trying to scam you and sell you overpriced guids. I'll sell you quality guids made in China! - ErocM
5
[+47] [2010-06-03 11:49:47] AMissico

Personally, I think the "Big Bang" was caused when two GUIDs collided.


(4) Just remember It takes a "special" kind of programmer to do that... - AnthonyLambert
I'd like to hear your reasoning to your theory. I think we could start a new religion based on this and recruit T.Cruise! - ErocM
@ErocM; See "Brane cosmology" (en.wikipedia.org/wiki/Brane_cosmology) and "Membrane (M-Theory)" (en.wikipedia.org/wiki/Membrane_(M-Theory)). The idea is if two branes touch a new universe is created. Therefore, you can infer that if two GUIDs touch, a new universe is thus created. - AMissico
(2) If Timecop taught us anything is that the same matter cannot occupy the same space at any given time. So if two GUIDs where to collide, they would consume each other and the resulting implosion would generate a black hole, gobbling up the entire universe. So in reality, it wouldn't create a Universe, it'll destroy it. - AJC
6
[+42] [2009-12-06 07:46:25] R. Martinho Fernandes

You can show that in O(1) time with a variant of the quantum bogosort [1] algorithm.

Guid g1 = Guid.NewGuid();
Guid g2 = Guid.NewGuid();
if(g1 != g2) Universe.Current.Destroy();
[1] http://en.wikipedia.org/wiki/Bogosort#Quantum_Bogosort

(21) I'm getting an exception when calling Destroy(). Based on the text, I think my computer lacks the necessary hardware to destroy the current universe. Do you know where I might be able to obtain it? - Steven Sudit
(11) @Steven: Nah, some management guys got too worried about how bad that API would look to the public, and dictated it to always fail for "security reasons". If you look at the method's source there's just that one line: throw new MundaneHardwareException();. Anyway, I heard the guys at CERN have some kind of Big Hadron Thingy that might do the trick... - R. Martinho Fernandes
(7) @Martinho: Ah, ok. I'll look into replacing Universe.Current.Destroy() with Cern.Lhc.DestroyThisUniverse(). - Steven Sudit
(61) I knew there was a reason I programmed in Haskell. These side effects are getting scary. - Edward Kmett
(6) "There is a theory which states that if ever anyone ever discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarrely inexplicable. There is another theory that states that this has already happened." -- Douglas Adams, The Hitchhiker's Guide to the Galaxy - Mike Pirnat
(1) I am confused. Shouldn't it be if(g1 == g2)? - AMissico
(2) @AMissico: I believe the point is to destroy all possible universes in which you don't find a match so that in the surviving universe(s) you do have a match. Of course, I wonder if the assignments would need to be something like Guid g1 = Universe.Quantum.Branch(Guid.NewGuid()); Otherwise, the deterministic nature of pseudorandom generation of Guid.NewGuid() would tend to produce the same result in all universes branching from the same initial execution. - Rob Parker
(1) @Rob: surely if we can destroy universes, we can have truly random (i.e., quantum-based) generators :P - R. Martinho Fernandes
(1) @Martinho: Sure, but that wouldn't be in the built-in implementation of Guid.NewGuid() would it? But my code example above is probably incorrect; you would need to use your truly-random source or a more exact Universe.Quantum.Branch.ChooseBytes(16) (returns a byte array of specified length with a 1-for-1 distribution among the necessary branches generated) to then feed to the Guid creation (or set the value into a blank Guid instance). Or you could chain 128 calls to Universe.Quantum.Branch.Fork() to build up the bits (wich is what ChooseBytes(int) does under the covers). - Rob Parker
@MikePirnat: My theory is that, as soon as this mechanism was discovered, it was replaced by some much mire bizarrely inexplicable mechanism. (And, no, a theory that this had already happened would not make sense.) - sbi
7
[+28] [2009-11-10 00:57:08] Graviton

Any two GUIDs are very likely unique (not equal).

See this SO entry [1], and from Wikipedia [2]

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.

So probably you have to wait for many more billion of years, and hope that you hit one before the universe as we know it comes to an end.

[1] https://stackoverflow.com/questions/39771/is-a-guid-unique-100-of-the-time
[2] http://en.wikipedia.org/wiki/Globally_Unique_Identifier

so is 2^128 not the correct number of possible guids? - Kai
(21) It is. Why do you think 2^128 is a small number? - jrockway
Yes, 2^128 is the correct number of possible guids. - Graviton
(3) That's a hell of a number. $ irb >> 2**128 => 340282366920938463463374607431768211456 - adamJLev
(45) @Infinity - Even to you? - Austin Richardson
8
[+27] [2009-12-06 08:45:45] Stephen M. Redd

[Update:] As the comments below point out, newer MS GUIDs are V4 and do not use the MAC address as part of the GUID generation (I haven't seen any indication of a V5 implementation from MS though, so if anyone has a link confirming that let me know). WIth V4 though, time is still a factor though, and the odds against duplication of GUIDs remains so small as to be irrelevant for any practical usage. You certainly would not be likely to ever generate a duplicate GUID from just a single system test such as the OP was trying to do.

Most of these answers are missing one vital point about Microsoft's GUID implementation. The first part of the GUID is based on a timestamp and another part is based on the MAC address of the network card (or a random number if no NIC is installed).

If I understand this correctly, it means that the only reliable way to duplicate a GUID would be to run simultainous GUID generations on multiple machines where the MAC addresses were the same AND where the clocks on both systems were at the same exact time when the generation occured (the timestamp is based on milliseconds if I understand it correctly).... even then there are a lot of other bits in the number that are random, so the odds are still vanishingly small.

For all practical purposes the GUIDs are universally unique.

There is a pretty good description of the MS GUID over at "The Old New Thing" blog [1]

[1] http://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx

(3) Thats actually doable when using virtualization. You can and you do get duplicate guids. - Goran
(8) Raymond is outdated on the MAC Address part though, Microsoft doesn't use these anymore. See en.wikipedia.org/wiki/GUID#Algorithm for the difference between V1 and V4 Guids. - Michael Stum
(1) This is no longer the case. The current V5 scheme is just 128 bits of pure pseudorandom goodness. - Edward Kmett
funny how you state everything I did a month later than me and you get 16 points and I still have 0? - AnthonyLambert
(1) Ya Tony, there is something odd with that. Back when I answered the post, there were only 3 or 4 answers, and I didn't remember seeing yours... if I had, I'd just have upvoted it. I typically don't answer questions when there are other answers already that cover it well enough (which is why I have a rather low overall rep probably). - Stephen M. Redd
9
[+23] [2009-11-10 04:31:10] KristoferA

Here's a nifty little extension method that you can use if you want to check guid uniqueness in many places in your code.

internal static class GuidExt
{
    public static bool IsUnique(this Guid guid)
    {
        while (guid != Guid.NewGuid())
        { }
        return false;
    }
}

To call it, simply call Guid.IsUnique whenever you generate a new guid...

Guid g = Guid.NewGuid();
if (!g.IsUnique())
{
    throw new GuidIsNotUniqueException();
}

...heck, I'd even recommend calling it twice to make sure it got it right in the first round.


(2) How does this ensure that this guid has never been generated anywhere else in this world? :p Heck we need a world guid pool. :) - nawfal
10
[+19] [2009-11-10 04:03:41] user180247

Counting to 2^128 - ambitious.

Lets imagine that we can count 2^32 IDs per second per machine - not that ambitious, since it's not even 4.3 billion per second. Lets dedicate 2^32 machines to that task. Furthermore, lets get 2^32 civilisations to each dedicate the same resources to the task.

So far, we can count 2^96 IDs per second, meaning we will be counting for 2^32 seconds (a little over 136 years).

Now, all we need is to get 4,294,967,296 civilisations to each dedicate 4,294,967,296 machines, each machine capable of counting 4,294,967,296 IDs per second, purely to this task for the next 136 years or so - I suggest we get started on this essential task right now ;-)


11
[+17] [2010-05-31 15:22:52] kibitzer

Well if the running time of 83 billion years does not scare you, think that you will also need to store the generated GUIDs somewhere to check if you have a duplicate; storing 2^128 16-byte numbers would only require you to allocate 4951760157141521099596496896 terabytes of RAM upfront, so imagining you have a computer which could fit all that and that you somehow find a place to buy terabyte DIMMs at 10 grams each, combined they will weigh more than 8 Earth masses, so you can seriously shift it off the current orbit, before you even press "Run". Think twice!


12
[+12] [2009-11-10 00:58:40] Nathan Taylor
for(begin; begin<end; begin)
    Console.WriteLine(System.Guid.NewGuid().ToString());

You aren't incrementing begin so the condition begin < end is always true.


(1) no - cause i can't iterate bigint - Kai
(3) Does it really matter if he loops forever versus looping 340282366920938463463374607431768211456 times? - Jay
(3) so... would you rather be punched 340282366920938463463374607431768211456 times or forever!?!?!? - ErocM
actually this is what really answers to the question! and no votes at all :p - nawfal
13
[+11] [2010-06-01 02:12:41] Matt Peterson

If GUID collisions are a concern, I would recommend using the ScottGuID [1] instead.

[1] http://weblogs.asp.net/leftslipper/archive/2010/04/01/last-guid-used-up-new-scottguid-unique-id-to-replace-it.aspx

14
[+9] [2009-11-10 01:33:46] MZB

Presumably you have reason to be believe that the algorithm for producing Guids is not producing truly random numbers, but is in fact cycling with a period << 2^128.

e.g. RFC4122 method used to derive GUIDs which fixes the values of some bits.

Proof of cycling is going to depend upon the possible size of the period.

For small periods, hash table of hash(GUID) -> GUID with replacement on collision if GUIDs do not match (terminate if they do) might be an approach. Consider also only doing the replacement a random fraction of the time.

Ultimately if the maximum period between collisions is large enough (and isn't known in advance) any method is only going to yield a probability that the collision would be found if it existed.

Note that if the method of generating Guids is clock based (see the RFC), then it may not be possible to determine if collisions exist because either (a) you won't be able to wait long enough for the clock to wrap round, or (b) you can't request enough Guids within a clock tick to force a collision.

Alternatively you might be able to show a statistical relationship between the bits in the Guid, or a correlation of bits between Guids. Such a relationship might make it highly probable that the algorithm is flawed without necessarily being able to find an actual collision.

Of course, if you just want to prove that Guids can collide, then a mathematical proof, not a program, is the answer.


15
[+8] [2010-05-31 18:05:07] Dad

I don't understand why no one has mentioned upgrading your graphics card... Surely if you got a high-end NVIDIA Quadro FX 4800 or something (192 CUDA cores) this would go faster...

Of course if you could afford a few NVIDIA Qadro Plex 2200 S4s (at 960 CUDA cores each), this calculation would really scream. Perhaps NVIDIA would be willing to loan you a few for a "Technology Demonstration" as a PR stunt?

Surely they'd want to be part of this historic calculation...


hmmmm..... I could run it on our 10,000 node grid at work. - AnthonyLambert
16
[+8] [2011-03-17 22:18:05] Jason Goemaat

But do you have to be sure you have a duplicate, or do you only care if there can be a duplicate. To be sure that you have two people with the same birthday, you need 366 people (not counting leap year). For there to be a greater than 50% chance of having two people with the same birthday you only need 23 people. That's the birthday problem [1].

If you have 32 bits, you only need 77,163 values to have a greater than 50% chance of a duplicate. Try it out:

Random baseRandom = new Random(0);

int DuplicateIntegerTest(int interations)
{
    Random r = new Random(baseRandom.Next());
    int[] ints = new int[interations];
    for (int i = 0; i < ints.Length; i++)
    {
        ints[i] = r.Next();
    }
    Array.Sort(ints);
    for (int i = 1; i < ints.Length; i++)
    {
        if (ints[i] == ints[i - 1])
            return 1;
    }
    return 0;
}

void DoTest()
{
    baseRandom = new Random(0);
    int count = 0;
    int duplicates = 0;
    for (int i = 0; i < 1000; i++)
    {
        count++;
        duplicates += DuplicateIntegerTest(77163);
    }
    Console.WriteLine("{0} iterations had {1} with duplicates", count, duplicates);
}

1000 iterations had 737 with duplicates

Now 128 bits is a lot, so you are still talking a large number of items still giving you a low chance of collision. You would need the following number of records for the given odds using an approximation:

  • 0.8 billion billion for a 1/1000 chance of a collision occurring
  • 21.7 billion billion for 50% chance of a collision occurring
  • 39.6 billion billion for 90% chance of a collision occurring

There are about 1E14 emails sent per year so it would be about 400,000 years at this level before you would have a 90% chance of having two with the same GUID, but that is a lot different than saying you need to run a computer 83 billion times the age of the universe or that the sun would go cold before finding a duplicate.

[1] http://en.wikipedia.org/wiki/Birthday_problem

17
[+7] [2009-11-10 16:04:33] AnthonyLambert

Aren't you all missing a major point?

I thought GUIDs were generated using two things which make the chances of them being Globally unique quite high. One is they are seeded with the MAC address of the machine that you are on and two they use the time that they were generated plus a random number.

So unless you run it on the actual machine and run all you guesses within the smallest amount of time that the machine uses to represent a time in the GUID you will never generate the same number no matter how many guesses you take using the system call.

I guess if you know the actual way a GUID is made would actually shorten the time to guess quite substantially.

Tony


(3) Not all GUIDs are created this way. Even if they were, Kai need only wait until the timestamp used to create the GUID wraps around enough times for one he used to create a GUID is used again. - Dour High Arch
(3) Guids have not been based on the mac address since 2000 or 2001. As of one of the service packs for NT4 and/or Win2k they changed the algorithm altogether. They are now generated by a random number generator, minus a few bits that identify what kind of guid it is. - KristoferA
(4) not all GUIDs come from windows platforms... - AnthonyLambert
OP mentions C#, so it's Windows. Besides, are V4 GUID's a Windows-only thing? - Steven Sudit
@Steven: OP mentions C#, so it's... one of several platforms that you can currently compile C# for (.NET, Mono, dotGNU, MonoTouch, ...) - R. Martinho Fernandes
(5) @Martinho: Ah, but Mono's unit test for Guid, in GuidTest.cs, contains a method which creates two new GUID's and checks them for equality, failing if they are equal. As Mono builds successfully, we can be absolutely certain that its GUIDs are unique! :-) - Steven Sudit
@Martinho: Humor aside, Mono uses a PRNG. search.koders.com/csharp/… - Steven Sudit
18
[+6] [2009-12-06 08:16:18] Michael Stum

You could hash the GUIDs. That way, you should get a result much faster.

Oh, of course, running multiple threads at the same time is also a good idea, that way you'll increase the chance of a race condition generating the same GUID twice on different threads.


19
[+6] [2010-08-25 18:04:46] Behrooz

GUIDs are 124 bits because 4 bits hold the version number.


the reason for not adding this as a comment:no one mentioned it, and i don't know who i should tell this to.:) - Behrooz
Hooooraaaay i did it.In some "real" app i wrote, I got a Guid collision in a table with ~260k Rows. (MSSQL 2008 R2 Express). - Behrooz
20
[+6] [2011-08-26 13:24:49] JiminP
  1. Go to the cryogenics lab in the New York City.
  2. Freeze yourself for (roughly) 1990 years.
  3. Get a job at Planet Express.
  4. Buy a brand-new CPU. Build a computer, run the program, and place it in the safe place with an pseudo-perpetual motion machine like the doomsday machine.
  5. Wait until the time machine is invented.
  6. Jump to the future using the time machine. If you bought 1YHz 128bit CPU, go to 3,938,453,320 days 20 hours 15 minutes 38 seconds 463 ms 463 μs 374 ns 607 ps after when you started to run the program.
  7. ...?
  8. PROFIT!!!

... It takes at least 10,783,127 years even if you had 1YHz CPU which is 1,000,000,000,000,000 (or 1,125,899,906,842,624 if you prefer to use binary prefix) times faster than 1GHz CPU.

So rather than waiting for the compute finished, it would be better to feed pigeons which lost their home because other n pigeons took their home. :(

Or, you can wait until 128-bit quantum computer is invented. Then you may prove that GUID is not unique, by using your program in reasonable time(maybe).


I was waiting for a super hero reference in this answer - fail by poster : p - awesome none the less. - IbrarMumtaz
21
[+4] [2009-11-10 02:57:50] RCIX

Have you tried begin = begin + new BigInteger((long)1) in place of begin++?


(2) nobody has voted for the answer that really answers the question :P - nawfal
22
[+4] [2011-03-28 23:51:07] Bill Yang

If the number of UUID being generated follows Moore's law, the impression of never running out of GUID in the foreseeable future is false.

With 2 ^ 128 UUIDs, it will only take 18 months * Log2(2^128) ~= 192 years, before we run out of all UUIDs.

And I believe (with no statistical proof what-so-ever) in the past few years since mass adoption of UUID, the speed we are generating UUID is increasing way faster than Moore's law dictates. In other words, we probably have less than 192 years until we have to deal with UUID crisis, that's a lot sooner than end of universe.

But since we definitely won't be running them out by the end of 2012, we'll leave it to other species to worry about the problem.


23
[+3] [2010-06-05 02:51:11] Mark Ransom

The odds of a bug in the GUID generating code are much higher than the odds of the algorithm generating a collision. The odds of a bug in your code to test the GUIDs is even greater. Give up.


24
[+2] [2010-05-31 13:07:10] ydebilloez

The program, albeit its errors, shows proof that a GUID is not unique. Those that try to prove the contrary are missing the point. This statement just proves the weak implementation of some of the GUID variations.

A GUID is not necessary unique by definition, it is highly unique by definition. You just refined the meaning of highly. Depending on the version, the implementator (MS or others), use of VM's, etc your definition of highly changes. (see link in earlier post)

You can shorten your 128 bit table to prove your point. The best solution is to use a hash formula to shorten your table with duplicates, and then use the full value once the hash collides and based on that re-generate a GUID. If running from different locations, you would be storing your hash/full key pairs in a central location.

Ps: If the goal is just to generate x number of different values, create a hash table of this width and just check on the hash value.


25
[+2] [2012-02-22 09:12:20] Ben

Not to p**s on the bonfire here, but it does actually happen, and yes, I understand the joking you have been giving this guy, but the GUID is unique only in principle, I bumped into this thread because there is a bug in the WP7 emulator which means every time it boots it gives out the SAME GUID the first time it is called! So, where in theory you cannot have a conflict, if there is a problem generating said GUI, then you can get duplicates

http://forums.create.msdn.com/forums/p/92086/597310.aspx#597310


26
[+1] [2010-11-16 00:25:20] realworldcoder

Since part of Guid generation is based on the current machine's time, my theory to get a duplicate Guid is:

  1. Perform a clean installation of Windows
  2. Create a startup script that resets the time to 2010-01-01 12:00:00 just as Windows boots up.
  3. Just after the startup script, it triggers your application to generate a Guid.
  4. Clone this Windows installation, so that you rule out any subtle differences that may occur in subsequent boot-ups.
  5. Re-image the hard drive with this image and boot-up the machine a few times.

27
[0] [2011-04-15 00:05:07] whardier

For me.. the time it takes for a single core to generate a UUIDv1 guarantees it will be unique. Even in a multi core situation if the UUID generator only allows one UUID to be generated at a time for your specific resource (keep in mind that multiple resources can totally utilize the same UUIDs however unlikely since the resource inherently part of the address) then you will have more than enough UUIDs to last you until the timestamp burns out. At which point I really doubt you would care.


28
[0] [2011-07-14 17:50:41] Scott

Here's a solution, too:

int main()
{
  QUuid uuid;
  while ( (uuid = QUuid::createUuid()) != QUuid::createUuid() ) { }
  std::cout << "Aha! I've found one! " << qPrintable( uuid.toString() ) << std::endl;
}

Note: requires Qt, but I guarantee that if you let it run long enough, it might find one.

(Note note: actually, now that I'm looking at it, there may be something about the generation algorithm that prevents two subsequently generated uuids that collide--but I kinda doubt it).


29
[0] [2012-04-01 07:45:35] nawfal

The only solution to prove GUIDs are not unique would be by having a World GUID Pool. Each time a GUID is generated somewhere, it should be registered to the organization. Or heck, we might include a standardization that all GUID generators needs to register it automatically and for that it needs an active internet connection!


30