share
Stack OverflowDefend zero-based arrays
[+89] [55] DrJokepu
[2008-12-26 04:12:05]
[ arrays language-agnostic discussion ]
[ http://stackoverflow.com/questions/393462] [DELETED]

A question asked here recently [1] reminded me of a debate I had not long ago with a fellow programmer. Basically he argued that zero-based arrays should be replaced by one-based arrays since arrays being zero based is an implementation detail that originates from the way arrays and pointers and computer hardware work, but these sort of stuff should not be reflected in higher level languages.

Now I am not really good at debating so I couldn't really offer any good reasons to stick with zero-based arrays other than they sort of feel like more appropriate. I am really interested in the opinions of other developers, so I sort of challenge you to come up with reasons to stick with zero-based arrays!

You could always use fortran... - dmckee
Not sure why this question is getting downvoted... - cletus
(2) +1 from me. No reason to downvote, it's a pretty valid discussion I think. - Renaud Bompuis
(7) Well, people upvote when they LIKE a question, which means they downvote when they don't like a question. Which is anti-intellectual, but who expects anything more? - Yar
(7) I just don't vote a question up if I don't like it. - Brad Gilbert
(1) I got used to the fact that on Stackoverflow, answers and questions get downvoted immediately without any reason whatsoever. There are many strange people on the Internet :) Anyway, thanks for the excellent answers. - DrJokepu
Thanks for the question. It prompted quite good answers. +1... - paercebal
In an n-element array, the element 'n' is not present . A n-element array has members which number from 0 to n-1 only. So, isn't it better if we have an array starting from 1 and so an n-element array actually represents the n elements present in the array. - Karthik Balaguru
[+88] [2008-12-26 13:57:48] Bill the Lizard [ACCEPTED]

I don't think any of us can provide a stronger argument than Edsger W. Dijkstra's article "Why numbering should start at zero" [1].

[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

He did bring up some stats and use math style proof. But I'm sure someone could still argue. Though I approve so I wouldn't. - Loki
(5) Dijkstra's article is about style, but then, his arguments are about simplicity and ease of use... +1. - paercebal
1
[+65] [2008-12-27 00:31:30] paercebal

Authority argument

Well... Apparently, most languages, including very recent ones, are zero-based. As those languages were written by quite skilled people, your friend must be wrong...

Why one?

why 1 would be a better starting index than zero? Why not 2, or 10? The answer itself is interesting because it shows a lot about the though process of the people defending the idea.

The first argument is that it's more natural, because the 1st is usually the one before all others, at least, for the majority of people...

The number-one argument is that the last index is also the size of the array...

I'm still impressed by the "quality" of the reasons I usually hear for this kind of arguments... And even more when I'm reminded that...

Why not zero?

... "One-based" notations are left-overs from the western culture that ignored the existence of zero for centuries, if not more.

Believe it or not, the original gregorian calendar goes from -3, -2, -1, 1, 2, 3... Try to imagine the problem it contributed to western science (for example, how many years from 1st January -2 to 1st January 2 to see than the original gregorian calendar conflicts with something as simple as substraction...).

Keeping to one-based arrays is like (well, I'll be downmodded for that... ^_^ ...), keeping to miles and yards in the 21th century...

Why Zero? Because it's math!

First (OOops... Sorry... I'll try again)

Zero, Zero is nothing, one is something. And some religious texts hold that "At the beginning, there was nothing". Some computer-related discussion can be as burning as religious debates, so this point is not so out of topics as it seems... ^_^

First, It's easier to work with a zero-based array and ignore its zero-th value than work with one-based array and hack around to find its zero-th value. This reason as almost as stupid as the previous, but then, the original argument in favor of one-based arrays was quite a fallacy, too.

Second, Let's remember that when dealing with numbers, chances are high you'll deal with math one moment or another, and when you deal with math, chances are good you are not in the mood for stupid hacks to get around obsolete conventions. The One-based notation plagued maths and dates for centuries, too, and by learning from our mistakes, we should strive to avoid it in future oriented sciences (including computer languages).

Third, As for computer language arrays being tied to hardware, allocate a C array of 21 integers, and move the pointer 10 indices to the right, and you'll have a natural [-10 to 10] array. This is not natural for hardware. But it is for maths. Of course, math could be obsolete, but the last time I checked, most people in the world believed it was not.

Four, As already pointed elsewhere, even for discrete position (or distances reduced to discrete values), the first index would be zero, like the floor in a building (starting at zero), the decreasing countdown (3, 2, 1, ZERO!), the ground altitude, the first pixel of an image, the temperature (zero Kelvin, for the absolute zero, or zero centigrade degrees, as water freezing temperature of 273 K). In fact, the only thing that really starts with one is the traditional way of "first, second, third, etc." iteration notation, which leads me naturally to the next point...

Five the next point (which naturally follows the previous) is that high-level containers should be accessed, not by index, but by iterators, unless the indices themselves have an intrinsic value. I'm surprised your "higher-level-language" advocate did not mention that. In the case the index itself is important, you can bet half the time you have a math-related question in mind. And thus, you'd like your container to be math-friendly, and not math-disabled like "thy olde gregorian calendar" starting at 1, and needing regurgitated hacks to make it work.

Conclusion

The argument given by your fellow programmer is a fallacy because it needlessly ties spoken/written language habits, which are, by nature, blurry, to computer languages (where you don't want your instruction blurred), and because by attributing wrongly an hardware reason to this problem, he.she hopes to convince you, as languages go higher and higher in abstraction, that the zero-based array is a thing of the past.

Zero-based arrays are zero-based because of math-related reasons. Not for hardware-related reasons.

Now, if this is a problem to your fellow programmer, have him start to program with real high level constructs, like iterators and foreach loops.


zero centigrade degrees is 273,15K ;) - Pim Jager
(1) I know (I have a Master diploma on Physics), but I felt playing with decimals was less the point than the humorous side I tried to color my arguments with... ^_^ ... - paercebal
(11) +1 for mocking the miles/yards/feet/inches system - hasen j
(10) Your paragraphs are labelled "Zero, First, Second, Third, Four, Five". For consistency, you ought to either use cardinal numbers ("Zero, One, Two, Three, Four, Five") or ordinal numbers ("Zeroth, First, Second, Third, Fourth, Fifth"). :-) - ShreevatsaR
(6) Similarly, for the first year of our lives, we are not one year old but zero years old - ChrisV
(2) Amazing! So many passionate arguments, and not a single convincing one. While a single link by Bill the Lizard explains it all. - Nikita Rybak
(2) @Nikita Rybak: What is amazing is that you missed what was seen by all commentators before you: Of course Bill the Lizard's answer is the right one. This is why I voted him a +1, and this is why it was chosen as the question's best answer. My answer is more about making fun of the fallacious reasons behind 1-based arrays, and offering concrete cases where a 1-based array would be a nuisance. Still, I'm surprised you found "not a single convincing one", even considering the reasons are mixed with irony... - paercebal
@paercebal, what a perfect answer for such a subjective question that could never be asked here anymore. - Kirk Woll
2
[+46] [2008-12-27 00:36:45] FlySwat

The Mayans did not have a concept of zero.

Because of that, all of their arrays started at 1.

You can see how well that worked out in the long run.


(33) They also had no valid return code for their C functions. Coincidence? I think not. :) - Bill the Lizard
(6) So if I'm understanding this correctly, all of their conditionals in C would have evaluated to true? - Graphics Noob
(6) The Romans didn't either - hasen j
(1) Nitpicking comment: According to Wikipedia ( en.wikipedia.org/wiki/Zero#History_of_zero ), the Mayans did have a zero... Meaning that even if one is right about that, one can end quite wrong anyway... ^_^ ... - paercebal
3
[+36] [2008-12-26 07:30:56] Norman Ramsey

Half-open intervals compose well. If you're dealing in 0 <= i < lim and you want to extend by n elements, the new elements have indices in the range lim <= i < lim + n. Working with zero-based arrays makes arithmetic easier when splitting or concatenating arrays or when counting elements. One hopes the simpler arithmetic leads to fewer fencepost errors.


+1 for half-open intervals - it just makes everything easier. - Eclipse
4
[+26] [2008-12-26 05:12:05] Jay Bazuzi

Certain types of array manipulation get crazy complicated with 1-based arrays, but remain simpler with 0-based arrays.

I did some numerical analysis programming at one point. I was working with algorithms to manipulate compressed, sparse matrices, written in both FORTRAN and C++.

The FORTRAN algorithms had a lot of a[i + j + k - 2], while the C++ had a[i + j + k], because the FORTRAN array was 1-based, while the C++ array was 0-based.


I agree. The only moment in which I find a 1-based array useful is when I want to make room for a null-item index. For instance, if I have an array of objects and use their indexes as handles, and want to have a null handle. - Fabio Ceconello
I've hit the unnecessary complication of 1 based arrays too, 0 based arrays have, in my limited experience, always produced clearer code for array indexing with the rare exception. - Eric Scrivner
How would the FORTRAN and C++ indices differ by 2 if their respective indices are offset only by 1? Also, why minus 2? If FORTRAN is 1-based, then wouldn't you add 2 (or 1)? - RexE
@RexE: That's how it works, and that's why it's so complicated with 1-based arrays. - Jay Bazuzi
@RexE: Suppose you emulate a 3d array with a flat one. Then, in 0 base, element (0 0 0) corresponds to element 0 in the flat array. OTOH, if it is 1-based, element (1 1 1) corresponds to element 1 in the flat array: 1+1+1-2. - rlbond
Good points, but how many times have you had to do n-1 when accessing an array? More than a+b+c-2 i bet... - RCIX
5
[+26] [2008-12-26 22:38:27] RoadWarrior

I feel that my proposal for 0.5-based arrays has been unjustly dismissed without due consideration.


(5) Well if you are going to unfairly just ignore complex numbers. - Martin Beckett
(1) I'm amazed no-one has made these points before but 0.5 is clearly a rational choice - complex numbers are irrational and unnecessarily, well, complex. As for 0 and 1, even the mathematicians can't seem to decide which of those is the first natural number, so why not compromise by taking the average? - Steve314
Give credit where credit is due. This is Peter van der Liden's joke. amazon.com/Expert-Programming-Peter-van-Linden/dp/0131774298 - Nicholas Palko
i is not irrational! - naught101
@naught101: i is not irrational! : ... but π is... - paercebal
6
[+17] [2008-12-27 13:14:39] community_owned

The index in an array is not really an index. It is simply an offset that is the distance from the start of the array. The first element is at the start of the array so there is no distance. Therefore the offset is 0.


(2) For most languages that get designed nowadays, this is really an implementation detail, which shouldn't appear in the language (except when there are other better reasons to do so) - Jens Schauder
7
[+14] [2008-12-26 04:31:01] Renaud Bompuis

The reasons are not just historical: C and C++ are still around and widely used and pointer arithmetic is a very valid reason for having arrays start at index 0.

For other languages lacking pointer arithmetic, whether the first element is at index 0 or 1 is more of a convention rather than anything else.
The problem is that languages that use index 1 as their first element don't exist in a vacuum and usually have to interact with libraries that are often written in -you guessed it- C or C++...

VB and its derived flavours have suffered from having arrays start either at 0 or 1 and it's been a source of problems for a long time.

Bottom-line is: it doesn't matter what your language consider the first element index as long as it's consistent throughout. The problem is that considering 1 as a first index makes it harder to work with in practice.


Agreed. Consistency matters, and unless you have the luxury of avoiding low-level code (including C/C++) entirely then working with 1-based arrays is just asking for trouble. - Shog9
While we're at it, a question: do you ever use low-level code in a non-platform specific way? In other words, you're always on one platform or another and you have to know which, right? - Yar
(1) As someone who thinks VB .NET is usually unfairly maligned, I have to say that the VB .NET practice on arrays is awful. They split the difference and made it even more confusing: Arrays start at 0, but Dim a as Integer(5) creates an array with 6 positions. The rationale seemed to be that having an extra position was better than having bugs from addressing past the array's length. Unfortunately on that (and other issues, like And and Or being bitwise) they buckled to demands from a lot of VB6 programmers who didn't end up using VB .NET anyway. - Kyralessa
(1) @Kyralessa: No, the rationale was to have backwards compatibility to VB6 (automatic upgrade assistant …) even though they were well aware that the notation is counter-intuitive and error-prone. On the other hand, And and Or being bitwise has got nothing to do with VB6, it’s the only logical solution for a VB-type language. You do have AndAlso and OrElse for your logical operations. - Konrad Rudolph
And and Or being bitwise has everything to do with VB6, because they were bitwise in VB6. The ugly operators AndAlso and OrElse should've been made bitwise, since bitwise operations are far less common than logical ones. There are a lot of ugly warts like that left on the language due to "backwards compatibility", like the fact that ByVal is plastered all over the place even though it's the default. - Kyralessa
8
[+13] [2008-12-26 04:33:30] Joshua

I have dealt with both. I find zero based arrays reduce fencepost errors.


Wow. I didn't even know they had a name. I feel the same way. I remember in Visual Basic I was always like, "this code will work, but if not, I'm just one off." In Java I've never had that feeling. - Yar
fencepost errors === off by one errors - Brad Gilbert
@Brad: Thanks, I didn't know that. Pretty sweet they have a name. - Pim Jager
9
[+10] [2010-07-25 15:13:44] Igor Oks

In computer science, zero is usually used as the starting point [1]. See the advantages [2] of the zero-based numbering on the same page.

[1] http://en.wikipedia.org/wiki/Zero-based_numbering
[2] http://en.wikipedia.org/wiki/Zero-based_numbering#Advantages

10
[+8] [2008-12-26 07:11:27] Eduardo León

If you use zero-based arrays, the array's length is the set of the valid indices. At least, that's what Peano arithmetic says:

0 = {}
1 = 0 U {0} = {0}
2 = 1 U {1} = {0,1}
3 = 2 U {2} = {0,1,2}
...
n = n-1 U {n-1} = {0,1,2...n-1}

So it's the most natural notation, in a sense.


11
[+7] [2008-12-26 07:05:12] diclophis

zero based arrays are like herpes

  1. they won't kill you
  2. they have been around forever
  3. just deal with it, and get off my lawn

${old man}++ (old man reference, get it) - Brad Gilbert
Lol (stupid 10 char limit) - Pim Jager
12
[+6] [2010-07-25 15:14:06] Anders K.

Because there is a strong correlation between arrays and pointers in C

char* p = "hello";
char q[] = "hello";

assert(p[1] == q[1]);

assert(*p == *q)

*p is the same as *(p + 0)

having a starting index of 1 will give you headache later


13
[+6] [2010-07-25 16:31:22] Steve Jessop

Is it not possible for an array to start from 1.

It would have been logically possible for the standard to be written to specify that. But it wasn't written that way, it doesn't specify that, so there's no way to do that in C. Of course you can write wrapper functions and so on, which allow you to call the first element of an array 0, 1, or even 'A' if you like.

Why is it not possible for an array to start from 1

Well, Perl has a global magic variable that you can set to say what number arrays start from. Setting it to anything other than 0 (the default) is generally considered a big mistake. It's really not something you want to be configurable, because all that could lead to is confusion.

and why is the standard not getting revised to start the array from 1 ?

The language nearly 40 years old. The standard is over 20 years old, and makes a point of preserving backward-compatibility whenever possible (that is, the vast majority of valid C89 programs are also valid C99 programs with the same meaning). It's a bit late now to make a change would would break nearly every single C program which has ever been written.

As for why 0 was chosen over 1 in the first place - a lot of people feel that it's more natural for the index to correspond to "the offset from the start of the array", rather than "the ordinal number of the array element in sequence". In C, a pointer to an array refers to the same address as a pointer to the array's first element, so the offset of the first element necessarily is 0. You don't have to agree with them that it's more natural, but you do have to live with their decision.

This is even aside from the fact that a lot of mathematicians (and hence computer scientists and programmers) prefer to count from 0 anyway. So for example the countable infinity is ℵ0, not ℵ1. In fact ℵ1 is defined to be the least cardinal greater than ℵ0.


Though it's pronounced aleph-null, not aleph-zero, isn't it? - TRiG
@TRiG: that's how I pronounce it, which suggests to me that it's also how Oxford set theory lecturers pronounce it. "Null" is just Latin for "zero" anyway :-) - Steve Jessop
14
[+5] [2008-12-26 04:22:34] cletus

Zero-based arrays have their roots in C and even assembler. With C, pointer math basically works like this:

  • Each element of an array occupies a certain number of bytes. A 32 bit integer is (obviously) 4 bytes;
  • The address of an array is occupied by the first element of the array with subsequent elements in equal-sized contiguous blocks after that.

To illustrate, assume int a[4] is at 0xFF00, the addresses are:

  • a[0] -> 0xFF00;
  • a[1] -> 0xFF04;
  • a[2] -> 0xFF08;
  • a[3] -> 0xFF0C.

So, with zero based indices, the addres math is simple:

Address of element = Address of array + index * sizeof(type)

In fact the expressions in C are all equivalent:

  • a[2];
  • 2[a]; and
  • *(a+2).

With one-based arrays, the math is (ever so) slightly more complicated.

So the reasons are largely historical.


(1) The original question already states "arrays being zero based is an implementation detail that originates from the way arrays and pointers and computer hardware work, but these sort of stuff should not be reflected in higher level languages." - poulejapon
Worth mentioning that languages that allow N-based arrays typically generate code with array 'offsets' automatically calculated at zero runtime cost. - Roddy
15
[+4] [2008-12-26 04:33:08] poulejapon

Code including some original position/relative position information are much cleaner with arrays starting at 0.

For instance : The code to copy a vector at a defined position in a bigger vector is a pain with arrays starting at 1 :

function copyAtPos (dest, vect, i):
    for i from 1 -> vect.length do
        dest[pos+i-1] = vect[i]

By opposition with arrays starting at 0:

function copyAtPos (dest, vect, i):
    for i from 0 -> vect.length-1 do
        dest[pos+i] = vect[i]

If you begin writing big convolutions formula, it becomes a must.


(1) Shouldn't the second example be "for i from 0 ..."? - Jules
Yes, the second loop should absolutely start from 0, I have updated the answer. - hlovdal
16
[+4] [2008-12-26 13:32:49] ima

Defend one-based arrays


I believe the questioner is saying they are "natural." We use one-based arrays when we count sheep, for instance. - Yar
(2) Common sense is poor guide there - ima
common sense isnt so common. ima - if i could upvote you, i would - community_owned
(1) OK, to be devils advocate here, if you're an applied mathematician, you will much prefer 1-base, because that is how linear algebra is done. So if you want to code up a Choleski decomposition, and get it right, you don't want to have to convert to 0-base. - Mike Dunlavey
(6) "We use one-based arrays when we count sheep, for instance". This is not true. Because before counting the first sheep, we have 0 sheep... - Joepie
(3) @Joepie, yet returning the 0th sheep gives you a sheep. - Pool
17
[+4] [2008-12-26 07:28:54] Nietzche-jou

A heap is one example of the advantages to 1-based arrays. Given an index i, the index of i's parent and left child are

PARENT[i] = i ÷ 2

LCHILD[i] = i × 2

But only for 1-based arrays. For 0-based arrays you have

PARENT[i] = (i + 1) ÷ 2 - 1

LCHILD[i] = (i + 1) × 2 - 1

And then you have the property that i is also the size of the sub-array to that index (i.e. indices in the range [1,i]).

But in the end it doesn't matter, because you can make a 0-based array into a 1-based array by allocating one more element than normal, and ignoring the zeroth. Thus you can opt-in to get the benefits of 1-based arrays when appropriate, and keep the 0-based arrays for cleaner arithmetic in almost all the other situations.


18
[+4] [2008-12-26 21:54:31] Roddy

As a 10+yr C/C++ programmer, with a very strong background in Pascal and Delphi, I still miss Pascal's strong array bound and index type checking, and the flexibility and safety that comes with it. An obvious example of this is an array data holding values for each month.

Pascal:

 Type Month = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec);

  Var Days[Month] of integer;

  ... 
  if Year mod 4 = 0 then // yes this is vastly simplified for leap years and yes i don't know what the comment marker is in pascal and no i won't go look it up
    Days[Feb] := 29
  else
    Days[Feb] := 28;

Writing similar code in C languages without using +/-1's or 'magic numbers' is pretty challenging. Note that expressions like Days[2] and Days[Jan+Dec] simply won't compile, which can appear brutal to people who still think in C or Assembler.

I have to say there are many aspects of Pascal/Delphi languages that I don't miss a bit, but C zero-based arrays do seem just "dumb" by comparison.


It might be worth noting that your algorithm isn't correct for the year 2100. en.wikipedia.org/wiki/Leap_year#Algorithm - Mark A. Nicolosi
(2) I know ;-) However, it was correct for the year 2000. I'm just playing "spot the pedant"... - Roddy
Spot the pedant! LOL. - jcollum
Yes. Avoid the whole issue, base the array however you want to. - Loren Pechtel
I wouldn't be surprised if your average Pascal compiler assigns Jan = 0, Dec = 11 when generating the machine code :-) - Christian Hayter
Yes, use Pascal, then you can argue about the style guide that declares arrays must always start at 0. Or 1. Seriously though, I miss it too (sometimes). - Mark Ransom
19
[+4] [2010-07-25 15:15:20] Dennis Munsie

The reason it starts at 0 and not 1 is that the you can think of the offset as how far from the beginning of the array's memory is this element. It's not saying give me the 0th element -- it's saying, give me the element who is 0 elements from the start.

Another way to look at it is that these are (for the most part) equivalent:

array[n]

*(array + n)

The reason the standard won't ever be changed is because C has been around for about 40 years now. There is no compelling reason to change it and if they did, all of the existing code that depends on the start of the array being at 0 would be broken.


In fact, you can rewrite array[n] as n[array] in C. It's not a good idea to do it, it's confusing! But it's legal (well, at least up to C89) because of that identity above and the fact that addition is commutative. - Donal Fellows
That is a crazy way to write that -- something that if you saw in any code that you had to maintain would be a huge warning sign. Thankfully I haven't run across that... yet :) - Dennis Munsie
20
[+3] [2010-07-25 15:14:27] Borealid

Why do you want arrays to start at one?

When you say a[x][y], the compiler translates this into: a+(x*num_cols+y). If arrays started at one, this would become a+(x*num_cols+y-1). This would be an extra arithmetic operation every single time you want to access an array element. Why would you want to slow down programs?


(1) actually, it would have to become a + ((x - 1) * num_cols) + y - 1) -- both x and y would start from 1. - Dennis Munsie
21
[+2] [2010-06-17 19:51:36] enjerth

Using 1-based arrays, transform a single-dimension array into a multi-dimensional array:

int w = 5, h = 5, d = 5;

int[] a1 = new int[w * h * d], new a2 = int[w,h,d];

for (int z = 1; z <= d; z++)

  for (int y = 1; y <= h; y++)

    for (int x = 1; x <= w; x++)

      a1[x + (y - 1) * w + (z - 1) * h] = a2[x,y,z];

Note that your y and z indexes are 0-based (y - 1, z - 1) even when your array is 1-based. Under some circumstances, you can't avoid 0-based indexes. For consistency, why not always use 0-based indexes?


22
[+2] [2009-06-07 15:03:45] Strilanc

Have you ever been annoyed by "20th century" actually referring to the 1900s? Well, it's a good analogy for the tedious things you deal with all the time when using 1-based arrays.

Consider a common array task like the .net IO.stream read method:

int Read(byte[] buffer, int offset, int length)

Here is what I suggest you do to convince yourself 0-based arrays are better:

In each indexing style, write a BufferedStream [1] class that supports reading. You may change the definition of the Read function (eg. use a lower bound instead of an offset) for the 1-based arrays. No need for anything fancy, just make it simple.

Now, which one of those implementations is simpler? Which one has +1 and -1 offsets sprinkled here and there? That's what I thought. In fact I would argue that the only cases where the indexing style doesn't matter is when you should have used something that wasn't an array, like a Set.

[1] http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx

It's a bad analogy confusing integer logic with floating point. - Pool
23
[+2] [2010-07-25 15:15:59] Gian

It's because of how arrays are constructed. It doesn't make a lot of sense for them to start from one. An array is a base address in memory, a size, and an index. To access the nth element, it's:

base + n * element_size

So 0 is obviously the first offset.


24
[+2] [2008-12-26 22:03:52] J.F. Sebastian

A classic article (1982) on the subject is Why numbering should start at zero (EWD 831) [1] by Edsger W. Dijkstra.

[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

I learned of that argument from one of my college professors. I did not know the argument was due to Dijkstra. Thank you. - Jason
25
[+2] [2008-12-26 20:18:30] community_owned

00:00:59 is the FIRST minute in an hour


And 0.99 is part of the first unit. That analogy fails. - Pool
26
[+2] [2008-12-27 00:35:01] Loki

Personally, the one argument is when seeing array indexes as offsets. It just makes sense.

One could say that its the first element, but the offset of the first element relative to the origin of the array is zero. As such, taking the array origin and adding zero will yield the first element.

So in computing its easier to add zero to find the first element than to add one and then remove one.

I think anyone who did some lower level stuff always think the base zero way. And the people who are beginning or used to higher level often not-algorithmic programming might wish for a base one system. Or maybe we are just biased by past experiences.


Exactly - it's basically a convention lying around from lower level languages. - Pool
27
[+2] [2008-12-26 23:49:19] MkV

If you were in somewhere other than America, the ground level of a building is the ground floor and the floors above it start at 1. Zero based arrays therefore seem more natural to non-Americans I guess, heh.


Interesting :). But not correct I think : in some countries, the first level of a building have a different name (as in "rez-de-chaussée" in France while the other floors are called "étages" - the first "étage" is thus the 2nd floor). Conclusion, even in France, levels are 1-based - there is no level 0 :) - Sylvain
(1) that would be my point, different name, since 0th floor or 0 etage would be awkward, but sitting on the 3rd level you are on floor[2], so below that is floor[1] and below that is floor[0]. - MkV
28
[+2] [2008-12-26 04:21:32] Yar

My feeling is that it's completely arbitrary. There's nothing special about zero- or one-based arrays. Since liberating myself from Visual Basic (mostly, sometimes I do tiny things in Excel) I haven't worked with 1-based arrays, and... it's the same. The fact is that if you need the third element of the array, it's just an implementation detail that it is called 3 or 2. However, 99% of the work you do with arrays is only interested in two absolute points: the first element and the count or length. Again, it's just an implementation detail that the first element is called zero instead of one, or that the last element is called count-1 or, instead, count.

Edit: Some of the answerers have mentioned that 1-based arrays are more prone to fencepost errors. In my experience, thinking about it now, this is true. I remember thinking, in VB, "this will either work or will blow up because I'm off by one." In Java that never happens. Though I thought I was getting better, some of the answerers point out cases in which 0-based arrays result in nicer arithmetic, EVEN when you don't have to deal with a lower-level lang.


In PHP, most search inside string function return FALSE on not found. Not -1. - jmucchiello
You are confusing the count with the index of the last element. The count of an empty array is always 0, whether you use zero-based or one-based arrays. The advantage of one-based arrays is that the count is the position of the last element (but that's about the only advantage). - Jules
True regarding both of these points: last half deleted, since it is the same for zero-based or one-based arrays: count is 0 if you've got zero elements. - Yar
I meant the last half of my answer... - Yar
29
[+1] [2008-12-26 05:02:32] Colin

There are actually several different ways to implement this:

  • 0-based array indexes
  • 1-based array indexes
  • either 0 or 1 based arrays (like VB 6.0... this is truly horrible)

Ultimately, I don't think it matter whether a language uses 0 or 1 based arrays. But, I think the best choice is to use 0-based arrays, for the simple reason that most programmers are used to that convention, and it is consistent with the vast majority of already written code.

The only way you can really go wrong, though, is to be inconsistent like Visual Basic. The code base that I am currently maintaining is split between 0 and 1 based arrays; and it is exceedingly difficult to figure out which is which. This leads to an annoyingly verbose for loop:

dim i as integer, lb as integer, ub as integer
lb = LBound(array)
ub = UBound(array)
for i = lb to ub
       '...
next

hahaha I remember that, man that sucked... - Yar
I think I remember even having arrays starting with negative numbers. Just one of the many reasons I stay away from VB. - Brad Gilbert
30
[+1] [2008-12-26 07:51:47] DarenW

It is just so, and has been for many years. To change it, or even to debate it, is just as pointless as to change or debate changing traffic lights. Let's make blue=stop, red=go.

Look into changes made over time in Numerical Recipes for C++. They had used macros to fake 1-based indexing, but in the 2001 edition gave up and joined the herd. There may be enlighting material on the reasons behind this at their site www.nr.com

BTW, also annoying is the variants of specifying a range out of an array. Example: python vs. IDL; a[100:200] vs a[100:199] to get 100 elements. Just gotta learn the quirks of each language. To change a language that does it one way to match the other would cause such cussing and gnashing of teeth, and not solve any real problem.


31
[+1] [2008-12-26 21:28:42] Lasse V. Karlsen

At this point, it doesn't matter.

Arrays in many languages are zero-based.

Either live with it.

Or don't use those languages.

You can argue that the arrays should have been 1-based or whatever, but they're not.

What you can do is get famous and good enough to be part of the design group of the next big thing, and thus you get to have your say.

Other than that...

Tough luck.


As for why zero-based arrays are usually used, it's because otherwise you'd have to add an instruction to adjust for the 1-bias to the compiled code, and thus the code would potentially be slower.


32
[+1] [2009-01-02 17:26:55] Mike Dunlavey

It is hard to defend 0-base without programming a lot of array-based code, such as string searching and various sorting/merging algorithms, or simulating multi-dimensional arrays in a single-dimension array. Fortran is 1-based, and you need a lot of coffee to get this kind of code done right.

But it goes way beyond that. It is a very useful mental habit to be able to think about the length of something rather than the indices of its elements. For example, in doing pixel-based graphics, it is much clearer to think of coordinates as falling between pixels rather than on them. That way, a 3x3 rectangle contains 9 pixels, not 16.

A little more far-fetched example is the idea of look-ahead in parsing, or in printing sub-totals in a table. The "common-sense" approach says 1) get the next character, token, or table row, and 2) decide what to do with it. The look-ahead approach says 1) assume you can see it, and decide if you want it, and 2) if you do want it, "accept" it (which allows you to see the next one). Then if you write out the pseudo-code, it is much simpler.

Still another example is how to use "goto" in languages where you have no choice, such as MS-DOS batch files. The "common-sense" approach is to attach labels to blocks of code to be done, and label them as such. Often a better approach is to put labels at the ends of blocks of code, for the purpose of skipping over them. This makes it "structured" and much easier to modify.


33
[+1] [2009-06-07 14:04:35] Sylvain

The only two (very) serious reasons to used 0-based indices instead of 1-based indices seem to avoid reeducating a lot of programers AND for backward compatiblity.

I didn't see any other serious arguments against 1-based indices in all the answers you received.

In fact, indices are naturally 1-based, and here is why.

First, we must ask : Were does arrays come from ? Do they have real-world equivalents ? The answer is yes : they are how we modelize vectors and matrix in computer science. However, Vectors and matrix are mathematicals concepts that were using 1-based indices before the computer-era (and that still mostly use 1-based indices nowaday).

In the real world, indices are 1-bases.

As Thomas said above, languages that used 0-bases indices are in fact using offsets, not indices. And developers who are using these languages think about offsets, not indices. This would not be a problem if things were clearly stated, but they are not. A lot of developers using offsets still talk about indices. And a lot of developers using indices still don't know that C, C++, C#, ... use offsets.

This is a wording problem.

(Note about Diskstra's paper - It says exactly what I have said above : mathematician do use 1-based indices. But Diskstra think that matematicians should not use them because some expression would then be ugly (eg.: 1 <= n <= 0). Well, not sure he is right on that one - doing such a paradigm shift in order to avoid those exceptional empty sequences seems a lot of trouble for a little result...)


(1) Mathematicians don't always use 1-based indices. I've seen x0 used plenty of times for the initial value of a sequence. It depends on whichever is more convenient. - Adam Crume
34
[+1] [2008-12-27 11:33:15] Laserallan

I prefer 0 based index since since modulo (and the AND operator when used for modulo) always returns 0 for some values.

I often find myself using arrays like this:

int blah = array[i & 0xff];

I often get that kind of code wrong when using 1 based indices.


35
[+1] [2010-07-25 15:15:43] Praveen S

It is possible if you take care while writing your "own" code. You can assume your index starts from n for all n>=0 and program accordingly.

Regarding standard, Borealid has a great argument.


36
[+1] [2010-07-25 16:10:03] LostMohican

because array names are constant pointers to array starting positions. For example In C array[2] is turned into array + (sizeof(array)*2), which will give you two elements beyond the starting element(third element:)). so if you want to reach the starting element, with the same math, you should do

array + (sizeof(array)*i) = array

(sizeof(array)*i) = 0

i = 0

simple equation math.


37
[+1] [2009-11-23 22:24:33] Jeff G

I prefer 0-based arrays because, as mentioned by others, it makes math easier. For example, if we have a 1-dimensional array of 100 elements emulating a 10x10 grid, then what is the array index i of the element in row r, col c:

0-based: i = 10 * r + c
1-based: i = 10 * (r - 1) + c

And, given the index i, going back to the row and column is:

0-based: c = i % 10
         r = floor(i / 10)
1-based: c = (i - 1) % 10 + 1
         r = ceil(i / 10)

Given that the math above is clearly more complex when using 1-based arrays, it seems logical to choose 0-based arrays as the standard.

However, I think that someone could claim that my logic is flawed because I assume that there would be a reason to represent 2D data in a 1D array. I have run into a number of such situations in C/C++, but I must admit that needing to perform such computations is somewhat language dependent. If arrays truly performed all index math for the client, all the time, then the compiler could simply convert your M-based array accesses to 0-based at compile-time and hide all of these implementation details from the user. In fact, any compile-time constant could be used to do the same set of operations, although such constructs would probably just lead to incomprehensible code.

Perhaps a better argument would be that minimizing the number of array index operations in a language with 1-based arrays would require that integer division be performed using the ceiling function. However, from a mathematical perspective, integer division should return d remainder r, where d and r are both positive. Therefore, 0-based arrays should be used to simplify math.

For example, if you are generating a lookup table with N elements, the nearest index prior to the current value into the array for value x would be (approximately, ignoring values where the result is an integer prior to rounding):

0-based with floor: floor((N - 1) * x / xRange)
1-based with floor: floor((N - 1) * x / xRange) + 1
1-based with ceil : ceil ((N - 1) * x / xRange)

Notice that if the standard convention of rounding down is used, 1-based arrays require an additional operation, which is undesirable. This kind of math cannot be hidden by the compiler, as it requires lower-level knowledge about what is happening behind the scenes.


It's a good reason until you have higher level languages supporting multi-dimension arrays. - Pool
38
[+1] [2010-02-16 21:17:08] Brian Chandler

I'm betting the programmer was just annoyed with the counter-intuitiveness of a 0 based array in day to day thinking and was arguing for a more intuitive means of describing arrays. I find it ironic that as humans we spent so much time to come up with "classes" so that we could describe things in a more human way in our code, but then when looking at 0 vs 1 arrays we seem to get hung up on the logic of it alone.

As far as the computer is concerned and mathematically 0 is going to probably be better, but I feel a point is being missed here. If we wanted to describe things in a more human way (e.g. classes), why wouldn't we want the same for other parts of the language? Is that not equally logical or valid (or take higher priority for that matter...) to make a language more easily understandable and usable to humans and thus, by extension, less prone to scenarios that tend to create logic-bugs and more prone to faster production of a usable creation. PHP Example:

array(1 => 'January', 'February', 'March');

gives a 1 based array per our request.

Why not have the norm:

array('January', 'February', 'March');

And the exception be:

array(0 => 'Value for scenario where 0 *has* to be used as the key',
      'value2', 'value3');

In the case of say PHP, my bet is 80% of the time a 1 based array being the default syntax-wise would decrease logic-bugs in real world use cases, or at least not cause more on average, while making it a lot easier on the coder to produce usable code quicker. Remember, I'm assuming there would still be the option of, array(0 => 'value') for when it's needed, but also assuming the majority of the time it's practical to have something closer to a real world description.

This really doesn't sound too far fetched of a request when looking at it from that perspective. When approaching an interface, be it an OS or a language for a programmer, the closer to human thinking and habits we design it around, the happier in most cases we will be and the less misunderstandings between the human and the computer(human logic-bugs), and the faster production, etc. we will have. If 80% of the time in the real world I describe things with 1 when making lists or counting, then the computer should ideally interpret my meaning into a way it understands with as little information or change from my normal way of describing something as possible. In short the closer we can model the real world, the better quality the abstraction. So what he wants is by no means stupid since that is the ultimate goal and would be evidence of a need for more abstraction. The computer can still ultimately see it as a special use of a 0 based array. I could care less how the computer interprets it so long as it's a simpler and more intuitive way for me to describe what I want to it with less bugs over time.

So, that's my two cents. I seriously doubt what he was saying, or what was interpreted, was what he meant. What he probably meant was, "I hate having a less-intuitive way of telling the computer what I want." :) Don't we all? lol.


39
[+1] [2010-07-25 15:10:00] Dennis Munsie

Nope. The best you could do is skip the first element, but then your arrays would be completely different than every other array that you would encounter.


Ok -- my answer was for the question "Is it not possible for an array to start from 1 ?" -- the question changed after I submitted. - Dennis Munsie
Ok -- taken care ! - Karthik Balaguru
40
[0] [2010-07-25 16:30:44] Clark Gaebel

You can, but compiler optimizations are free to create invalid results (gcc and msvc won't, but clang will).

char* myArray = malloc(100) - 1;
/* now myArray[1] is the first element, and myArray[100] is the last. */
free(myArray + 1);

But, as other people have mentioned, don't do this. Un-learn your bad habits of starting at 1.

Another solution: Just ignore the first element of the array.


41
[0] [2010-07-25 15:29:24] Rob Napier

An array in C is shorthand for pointer arithmetic. Consider this case:

struct Foo *foo;
struct Foo foos[10];

foo = &(foos[1]);
foo = foos + (1 * sizeof(struct Foo));

The last two lines mean the same thing. Changing the initial offset would break this correlation, making many things in C much more difficult.

Some languages with a very strict Array type, like Pascal, allow you to start counting in other ways. But in C, and in many languages derived from C, arrays are just shorthand for pointer arithmetic, so you can't mess with their starting index.


42
[0] [2010-07-25 15:30:12] R..

If you insist on having your arrays start at 1...

type real_foo[COUNT], *const foo=real_foo-1;

If you're really sadistic, you could even make a preprocessor macro to do this for you...

#define CONCAT(x,y) x ## y
#define ARRAY1(name,size) CONCAT(real_, name), *const name=CONCAT(real_, name)-1
type ARRAY1(foo, COUNT);

Hope I didn't screw up those macros...


(1) Technically it's UB to take a "one off the beginning" pointer in C, unlike the "one off the end" pointer which is valid. In practice you'll get away with it now that we all use machines with flat address spaces. The code you have might provoke a compiler warning sometimes, though. - Steve Jessop
Indeed, I should have mentioned that. - R..
43
[0] [2010-07-25 15:33:25] Michael Foukarakis

0 ≤ i < N just feels nicer. Besides, since the Arabs went through all the trouble to give us the beautiful 0, why not use it?


It were the Indians, if I remember right :-) - Landei
Yes, you're talking about the Olmecs' calendar. The Maya also used it, but neither influence Old World numeric systems. The first time zero was used (as a notion) was by the Babylonians. The modern positional notation was first introduced by Al-Khwarizmi around 500 AD (iirc). - Michael Foukarakis
It came to Europe by way of the Arabs, which is why we tend to think of them as Arabian numerals. They were popularised in Europe by Fibonacci. - TRiG
44
[0] [2010-07-25 15:34:28] Nikolai N Fetissov

C is a very simple language (or so they say). Its design was based strongly on available hardware features, i.e. on ease of implementing the compiler. Indexing an array translates directly into pointer arithmetic, so:

int array[256];
int i = 10;
...
array[i] = 12;

translates into something like:

*(array + i*sizeof(int)) = 12;

or in imaginary intermediate machine language:

load value of array into register Ra
load value of i into register Rb
right shift Rb by X number of bits
add value in Rb to value in Ra
store value 12 at address in Ra

(Note: here array is the address fixed at load time, so I say "load value of array" - it's a bit different for indexing a pointer, one more indirection is required to obtain the base address.)

With this in mind zero-based indexing is only natural.


45
[0] [2010-07-25 15:38:59] progrmr

It's certainly possible but the C language does not support this. Many other languages such as Fortran, PL/1, Pascal, Modula2, Ada support this.

It was part of the C language design to keep the compiler simple and small and it would break too many things to change it now.


46
[0] [2009-07-16 06:29:42] Jason Williams

With zero-based arrays, you can use an unsigned int as the index and then you don't have to test for index out of range on the lower bound. e.g:

int GetValue(unsigned index)
{
    ASSERT(index < arraySize);
    return(array[index];
}

47
[0] [2008-12-26 18:10:46] Redbeard 0x0A

I'm going to step out on a limb here and suggest something different than an integer 'keyed' array.

I think your coworker is getting at creating a one to one mapping of a 'set' in the physical world where we always start counting at 1. I can understand this, when you are not doing anything fancy, it is easy to understand some code when you are mapped 1 to 1 between software and the physical world.

My suggestion

Don't use integer based arrays for whatever you are storing, but use some other kind of dictionary or key value pair. These map better to real life as you aren't bound by an arbitrary integer. This has its place and I would recommend using it as much as you can due to the benifits of mapping concepts 1 to 1 between software and the physical world.

i.e. kvp['Name Server'] = "ns1.example.com"; (This is just one out of a million possible examples).

Discaimer

This most definitely not work when you are working with concepts based in mathmatics, basically because math is closer to the actual implementation of a computer. Using kvp sets are not going to help anything here, but will actually mess things up and make it more problematic. I haven't thought through all the corner cases where something may work better as kvp or as an array.

The end idea is to use the zero-based arrays or key value pairs where it makes sense, remember that when you only have a hammer, every problem starts looking like a nail...


48
[0] [2008-12-26 23:09:50] f100

Zero is natural when talking about the location of an item in a linear collection.

Think of a shelf full of books - the first book is located flush with the side wall of the shelf - that's location zero.

So I guess it depends on whether you consider array indices a means of finding things or referring to things.


49
[0] [2008-12-26 05:37:55] Chad Okere

Why not 2 or 3 or 20? It isn't like having 1-based arrays is somehow easier or simpler to understand then zero based arrays. In order to switch to 1-based arrays, every programmer out there would have to relearn how to work with arrays.

And furthermore, when you're dealing with offsets into existing arrays, it makes more sense too. If you've read 115 bytes out of an array, you know the next chunk starts at 115. And so on, the next byte is always the size of the byte's you've read. With 1-based you'd need to add one all the time.

And you do sometimes need to deal with chunks of data in arrays, even in language without "true" pointer arithmetic. In java you could have data in memory mapped files, or buffers. In that case, you know block i is at size * i. With a 1-based index it would be at block*i+1.

With 1-based indexing, a lot of techniques would require +1s all over the place.


Why not 2 or 3 or 20? Because 0 is the additive identity and 1 is the multiplicative identity. They make the most sense. - Adam Crume
50
[0] [2008-12-26 04:45:05] dsimcha

I think that when you're trying to think about how your code works at a lower level, if this becomes necessary, zero-based arrays map more nicely to this lower level. Also, I'd assume that you'd need an extra add instruction for every dereference of a one-based array to make the pointers work correctly. I see one way around this (make the pointer point to the -1st/0th element of the array, not the 0th/1st), but that might wreak havok with garbage collectors, since you now have pointers pointing outside their allocated blocks.


51
[-1] [2008-12-26 08:03:39] abelenky

Zero-based arrays give you a choice: If you want a one-based array, simply ignore the zeroth element and treat is as a one-based array. The space-waste is minimal, and everyone gets what they want.

But in a language that enforces 1-based arrays, you cannot pretend to have a zero-based array.

Therefore, zero-based is superior: putting the choice and the flexibility in the hands of programmer, where it belongs.


(2) Treating 0-based arrays as 1-based by just ignoring the first element would make your code really unreadable. Better use the language in the way it was supposed to be used. - ibz
@ibz, can I vote up your comment? - Yar
Ignoring the 0th element of an array is idiomatic in BASIC. I wouldn't recommend it in C-like language, though. - dan04
52
[-1] [2008-12-26 21:25:39] Brent.Longborough

They just work better, they work right and produce less bugs. Believe.


53
[-1] [2009-07-16 06:13:43] Omega

Less is more.


54
[-5] [2008-12-26 05:04:52] Kent Fredric

At the most atomic level, its simply due down to being binary.

0   =>  0 
1   =>  1
10  =>  2
11  =>  3

"0" is the first binary number, and likewise arrays are index by their "first" binary number.

Also, given An arbitrary base address for a dataset, the first record exists at that address, not that address + 1,

$base = 010110; 
$firstvalue  = $base + 0 * $unitsize
$secondvalue = $base + 1 * $unitsize
$thirdvalue  = $base + 2 * $unitsize

If you had a "1" based array the internal system would have to constantly decrement the target value by 1 to find the underlying memory address that data was stored at.

0 is just as much the first binary number as it is the first decimal number. Not compelling.

Write the number "0" on a page. How many numbers do you have?

If I have the numbers 500 to 550 on the page, how many numbers do I have? I have 51! But the difference is only 50.

Put 50 sheep in a paddock. Surprisingly, none of them look like numbers, but they are still a countable volume.

We however have this crazy idea that when a sheep looks a little bit too much like a 0, it must not be used, and it must be thrown out and we have to find a sheep looking like 50 to take its place.

Simplified

Use 1 to N for counting. But 0 is still a valid place holding symbol, and as such, it should be used.

If 0 is not a valid place holding symbol, then we should scrap 10, 20, 30, 40 ... etc from our number system, and go straight from 9 to 11.

1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27

In essence, we humans not starting with "0" is a CULTURAL thing. Its not a rational one.


(3) 0 is just as much the first binary number as it is the first decimal number. Not compelling. - Jay Bazuzi
55