A question asked here recently [1] reminded me of a debate I had not long ago with a fellow programmer. Basically he argued that zero-based arrays should be replaced by one-based arrays since arrays being zero based is an implementation detail that originates from the way arrays and pointers and computer hardware work, but these sort of stuff should not be reflected in higher level languages.
Now I am not really good at debating so I couldn't really offer any good reasons to stick with zero-based arrays other than they sort of feel like more appropriate. I am really interested in the opinions of other developers, so I sort of challenge you to come up with reasons to stick with zero-based arrays!
I don't think any of us can provide a stronger argument than Edsger W. Dijkstra's article "Why numbering should start at zero" [1].
[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.htmlWell... Apparently, most languages, including very recent ones, are zero-based. As those languages were written by quite skilled people, your friend must be wrong...
why 1 would be a better starting index than zero? Why not 2, or 10? The answer itself is interesting because it shows a lot about the though process of the people defending the idea.
The first argument is that it's more natural, because the 1st is usually the one before all others, at least, for the majority of people...
The number-one argument is that the last index is also the size of the array...
I'm still impressed by the "quality" of the reasons I usually hear for this kind of arguments... And even more when I'm reminded that...
... "One-based" notations are left-overs from the western culture that ignored the existence of zero for centuries, if not more.
Believe it or not, the original gregorian calendar goes from -3, -2, -1, 1, 2, 3... Try to imagine the problem it contributed to western science (for example, how many years from 1st January -2 to 1st January 2 to see than the original gregorian calendar conflicts with something as simple as substraction...).
Keeping to one-based arrays is like (well, I'll be downmodded for that... ^_^ ...), keeping to miles and yards in the 21th century...
First (OOops... Sorry... I'll try again)
Zero, Zero is nothing, one is something. And some religious texts hold that "At the beginning, there was nothing". Some computer-related discussion can be as burning as religious debates, so this point is not so out of topics as it seems... ^_^
First, It's easier to work with a zero-based array and ignore its zero-th value than work with one-based array and hack around to find its zero-th value. This reason as almost as stupid as the previous, but then, the original argument in favor of one-based arrays was quite a fallacy, too.
Second, Let's remember that when dealing with numbers, chances are high you'll deal with math one moment or another, and when you deal with math, chances are good you are not in the mood for stupid hacks to get around obsolete conventions. The One-based notation plagued maths and dates for centuries, too, and by learning from our mistakes, we should strive to avoid it in future oriented sciences (including computer languages).
Third, As for computer language arrays being tied to hardware, allocate a C array of 21 integers, and move the pointer 10 indices to the right, and you'll have a natural [-10 to 10] array. This is not natural for hardware. But it is for maths. Of course, math could be obsolete, but the last time I checked, most people in the world believed it was not.
Four, As already pointed elsewhere, even for discrete position (or distances reduced to discrete values), the first index would be zero, like the floor in a building (starting at zero), the decreasing countdown (3, 2, 1, ZERO!), the ground altitude, the first pixel of an image, the temperature (zero Kelvin, for the absolute zero, or zero centigrade degrees, as water freezing temperature of 273 K). In fact, the only thing that really starts with one is the traditional way of "first, second, third, etc." iteration notation, which leads me naturally to the next point...
Five the next point (which naturally follows the previous) is that high-level containers should be accessed, not by index, but by iterators, unless the indices themselves have an intrinsic value. I'm surprised your "higher-level-language" advocate did not mention that. In the case the index itself is important, you can bet half the time you have a math-related question in mind. And thus, you'd like your container to be math-friendly, and not math-disabled like "thy olde gregorian calendar" starting at 1, and needing regurgitated hacks to make it work.
The argument given by your fellow programmer is a fallacy because it needlessly ties spoken/written language habits, which are, by nature, blurry, to computer languages (where you don't want your instruction blurred), and because by attributing wrongly an hardware reason to this problem, he.she hopes to convince you, as languages go higher and higher in abstraction, that the zero-based array is a thing of the past.
Zero-based arrays are zero-based because of math-related reasons. Not for hardware-related reasons.
Now, if this is a problem to your fellow programmer, have him start to program with real high level constructs, like iterators and foreach loops.
The Mayans did not have a concept of zero.
Because of that, all of their arrays started at 1.
You can see how well that worked out in the long run.
Half-open intervals compose well. If you're dealing in 0 <= i < lim
and you want to extend by n
elements, the new elements have indices in the range lim <= i < lim + n
. Working with zero-based arrays makes arithmetic easier when splitting or concatenating arrays or when counting elements. One hopes the simpler arithmetic leads to fewer fencepost errors.
Certain types of array manipulation get crazy complicated with 1-based arrays, but remain simpler with 0-based arrays.
I did some numerical analysis programming at one point. I was working with algorithms to manipulate compressed, sparse matrices, written in both FORTRAN and C++.
The FORTRAN algorithms had a lot of a[i + j + k - 2]
, while the C++ had a[i + j + k]
, because the FORTRAN array was 1-based, while the C++ array was 0-based.
n-1
when accessing an array? More than a+b+c-2
i bet... - RCIX
I feel that my proposal for 0.5-based arrays has been unjustly dismissed without due consideration.
i is not irrational!
: ... but π is... - paercebal
The index in an array is not really an index. It is simply an offset that is the distance from the start of the array. The first element is at the start of the array so there is no distance. Therefore the offset is 0.
The reasons are not just historical: C and C++ are still around and widely used and pointer arithmetic is a very valid reason for having arrays start at index 0.
For other languages lacking pointer arithmetic, whether the first element is at index 0 or 1 is more of a convention rather than anything else.
The problem is that languages that use index 1 as their first element don't exist in a vacuum and usually have to interact with libraries that are often written in -you guessed it- C or C++...
VB and its derived flavours have suffered from having arrays start either at 0 or 1 and it's been a source of problems for a long time.
Bottom-line is: it doesn't matter what your language consider the first element index as long as it's consistent throughout. The problem is that considering 1 as a first index makes it harder to work with in practice.
And
and Or
being bitwise has got nothing to do with VB6, it’s the only logical solution for a VB-type language. You do have AndAlso
and OrElse
for your logical operations. - Konrad Rudolph
And
and Or
being bitwise has everything to do with VB6, because they were bitwise in VB6. The ugly operators AndAlso
and OrElse
should've been made bitwise, since bitwise operations are far less common than logical ones. There are a lot of ugly warts like that left on the language due to "backwards compatibility", like the fact that ByVal is plastered all over the place even though it's the default. - Kyralessa
I have dealt with both. I find zero based arrays reduce fencepost errors.
In computer science, zero is usually used as the starting point [1]. See the advantages [2] of the zero-based numbering on the same page.
[1] http://en.wikipedia.org/wiki/Zero-based_numberingIf you use zero-based arrays, the array's length is the set of the valid indices. At least, that's what Peano arithmetic says:
0 = {}
1 = 0 U {0} = {0}
2 = 1 U {1} = {0,1}
3 = 2 U {2} = {0,1,2}
...
n = n-1 U {n-1} = {0,1,2...n-1}
So it's the most natural notation, in a sense.
zero based arrays are like herpes
Because there is a strong correlation between arrays and pointers in C
char* p = "hello";
char q[] = "hello";
assert(p[1] == q[1]);
assert(*p == *q)
*p is the same as *(p + 0)
having a starting index of 1 will give you headache later
Is it not possible for an array to start from 1.
It would have been logically possible for the standard to be written to specify that. But it wasn't written that way, it doesn't specify that, so there's no way to do that in C. Of course you can write wrapper functions and so on, which allow you to call the first element of an array 0, 1, or even 'A' if you like.
Why is it not possible for an array to start from 1
Well, Perl has a global magic variable that you can set to say what number arrays start from. Setting it to anything other than 0 (the default) is generally considered a big mistake. It's really not something you want to be configurable, because all that could lead to is confusion.
and why is the standard not getting revised to start the array from 1 ?
The language nearly 40 years old. The standard is over 20 years old, and makes a point of preserving backward-compatibility whenever possible (that is, the vast majority of valid C89 programs are also valid C99 programs with the same meaning). It's a bit late now to make a change would would break nearly every single C program which has ever been written.
As for why 0 was chosen over 1 in the first place - a lot of people feel that it's more natural for the index to correspond to "the offset from the start of the array", rather than "the ordinal number of the array element in sequence". In C, a pointer to an array refers to the same address as a pointer to the array's first element, so the offset of the first element necessarily is 0. You don't have to agree with them that it's more natural, but you do have to live with their decision.
This is even aside from the fact that a lot of mathematicians (and hence computer scientists and programmers) prefer to count from 0 anyway. So for example the countable infinity is ℵ0, not ℵ1. In fact ℵ1 is defined to be the least cardinal greater than ℵ0.
Zero-based arrays have their roots in C and even assembler. With C, pointer math basically works like this:
To illustrate, assume int a[4]
is at 0xFF00, the addresses are:
So, with zero based indices, the addres math is simple:
Address of element = Address of array + index * sizeof(type)
In fact the expressions in C are all equivalent:
With one-based arrays, the math is (ever so) slightly more complicated.
So the reasons are largely historical.
Code including some original position/relative position information are much cleaner with arrays starting at 0.
For instance : The code to copy a vector at a defined position in a bigger vector is a pain with arrays starting at 1 :
function copyAtPos (dest, vect, i):
for i from 1 -> vect.length do
dest[pos+i-1] = vect[i]
By opposition with arrays starting at 0:
function copyAtPos (dest, vect, i):
for i from 0 -> vect.length-1 do
dest[pos+i] = vect[i]
If you begin writing big convolutions formula, it becomes a must.
Defend one-based arrays
A heap is one example of the advantages to 1-based arrays. Given an index i, the index of i's parent and left child are
PARENT
[i] = i ÷ 2
LCHILD
[i] = i × 2
But only for 1-based arrays. For 0-based arrays you have
PARENT
[i] = (i + 1) ÷ 2 - 1
LCHILD
[i] = (i + 1) × 2 - 1
And then you have the property that i is also the size of the sub-array to that index (i.e. indices in the range [1,i]).
But in the end it doesn't matter, because you can make a 0-based array into a 1-based array by allocating one more element than normal, and ignoring the zeroth. Thus you can opt-in to get the benefits of 1-based arrays when appropriate, and keep the 0-based arrays for cleaner arithmetic in almost all the other situations.
As a 10+yr C/C++ programmer, with a very strong background in Pascal and Delphi, I still miss Pascal's strong array bound and index type checking, and the flexibility and safety that comes with it. An obvious example of this is an array data holding values for each month.
Pascal:
Type Month = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec);
Var Days[Month] of integer;
...
if Year mod 4 = 0 then // yes this is vastly simplified for leap years and yes i don't know what the comment marker is in pascal and no i won't go look it up
Days[Feb] := 29
else
Days[Feb] := 28;
Writing similar code in C languages without using +/-1's or 'magic numbers' is pretty challenging. Note that expressions like Days[2] and Days[Jan+Dec] simply won't compile, which can appear brutal to people who still think in C or Assembler.
I have to say there are many aspects of Pascal/Delphi languages that I don't miss a bit, but C zero-based arrays do seem just "dumb" by comparison.
The reason it starts at 0 and not 1 is that the you can think of the offset as how far from the beginning of the array's memory is this element. It's not saying give me the 0th element -- it's saying, give me the element who is 0 elements from the start.
Another way to look at it is that these are (for the most part) equivalent:
array[n]
*(array + n)
The reason the standard won't ever be changed is because C has been around for about 40 years now. There is no compelling reason to change it and if they did, all of the existing code that depends on the start of the array being at 0 would be broken.
array[n]
as n[array]
in C. It's not a good idea to do it, it's confusing! But it's legal (well, at least up to C89) because of that identity above and the fact that addition is commutative. - Donal Fellows
Why do you want arrays to start at one?
When you say a[x][y]
, the compiler translates this into: a+(x*num_cols+y)
. If arrays started at one, this would become a+(x*num_cols+y-1)
. This would be an extra arithmetic operation every single time you want to access an array element. Why would you want to slow down programs?
Using 1-based arrays, transform a single-dimension array into a multi-dimensional array:
int w = 5, h = 5, d = 5;
int[] a1 = new int[w * h * d], new a2 = int[w,h,d];
for (int z = 1; z <= d; z++)
for (int y = 1; y <= h; y++)
for (int x = 1; x <= w; x++)
a1[x + (y - 1) * w + (z - 1) * h] = a2[x,y,z];
Note that your y and z indexes are 0-based (y - 1, z - 1) even when your array is 1-based. Under some circumstances, you can't avoid 0-based indexes. For consistency, why not always use 0-based indexes?
Have you ever been annoyed by "20th century" actually referring to the 1900s? Well, it's a good analogy for the tedious things you deal with all the time when using 1-based arrays.
Consider a common array task like the .net IO.stream read method:
int Read(byte[] buffer, int offset, int length)
Here is what I suggest you do to convince yourself 0-based arrays are better:
In each indexing style, write a BufferedStream [1] class that supports reading. You may change the definition of the Read function (eg. use a lower bound instead of an offset) for the 1-based arrays. No need for anything fancy, just make it simple.
Now, which one of those implementations is simpler? Which one has +1 and -1 offsets sprinkled here and there? That's what I thought. In fact I would argue that the only cases where the indexing style doesn't matter is when you should have used something that wasn't an array, like a Set.
[1] http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspxIt's because of how arrays are constructed. It doesn't make a lot of sense for them to start from one. An array is a base address in memory, a size, and an index. To access the nth element, it's:
base + n * element_size
So 0 is obviously the first offset.
A classic article (1982) on the subject is Why numbering should start at zero (EWD 831) [1] by Edsger W. Dijkstra.
[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html00:00:59 is the FIRST minute in an hour
Personally, the one argument is when seeing array indexes as offsets. It just makes sense.
One could say that its the first element, but the offset of the first element relative to the origin of the array is zero. As such, taking the array origin and adding zero will yield the first element.
So in computing its easier to add zero to find the first element than to add one and then remove one.
I think anyone who did some lower level stuff always think the base zero way. And the people who are beginning or used to higher level often not-algorithmic programming might wish for a base one system. Or maybe we are just biased by past experiences.
If you were in somewhere other than America, the ground level of a building is the ground floor and the floors above it start at 1. Zero based arrays therefore seem more natural to non-Americans I guess, heh.
My feeling is that it's completely arbitrary. There's nothing special about zero- or one-based arrays. Since liberating myself from Visual Basic (mostly, sometimes I do tiny things in Excel) I haven't worked with 1-based arrays, and... it's the same. The fact is that if you need the third element of the array, it's just an implementation detail that it is called 3 or 2. However, 99% of the work you do with arrays is only interested in two absolute points: the first element and the count or length. Again, it's just an implementation detail that the first element is called zero instead of one, or that the last element is called count-1 or, instead, count.
Edit: Some of the answerers have mentioned that 1-based arrays are more prone to fencepost errors. In my experience, thinking about it now, this is true. I remember thinking, in VB, "this will either work or will blow up because I'm off by one." In Java that never happens. Though I thought I was getting better, some of the answerers point out cases in which 0-based arrays result in nicer arithmetic, EVEN when you don't have to deal with a lower-level lang.
There are actually several different ways to implement this:
Ultimately, I don't think it matter whether a language uses 0 or 1 based arrays. But, I think the best choice is to use 0-based arrays, for the simple reason that most programmers are used to that convention, and it is consistent with the vast majority of already written code.
The only way you can really go wrong, though, is to be inconsistent like Visual Basic. The code base that I am currently maintaining is split between 0 and 1 based arrays; and it is exceedingly difficult to figure out which is which. This leads to an annoyingly verbose for loop:
dim i as integer, lb as integer, ub as integer
lb = LBound(array)
ub = UBound(array)
for i = lb to ub
'...
next
It is just so, and has been for many years. To change it, or even to debate it, is just as pointless as to change or debate changing traffic lights. Let's make blue=stop, red=go.
Look into changes made over time in Numerical Recipes for C++. They had used macros to fake 1-based indexing, but in the 2001 edition gave up and joined the herd. There may be enlighting material on the reasons behind this at their site www.nr.com
BTW, also annoying is the variants of specifying a range out of an array. Example: python vs. IDL; a[100:200] vs a[100:199] to get 100 elements. Just gotta learn the quirks of each language. To change a language that does it one way to match the other would cause such cussing and gnashing of teeth, and not solve any real problem.
At this point, it doesn't matter.
Arrays in many languages are zero-based.
Either live with it.
Or don't use those languages.
You can argue that the arrays should have been 1-based or whatever, but they're not.
What you can do is get famous and good enough to be part of the design group of the next big thing, and thus you get to have your say.
Other than that...
Tough luck.
As for why zero-based arrays are usually used, it's because otherwise you'd have to add an instruction to adjust for the 1-bias to the compiled code, and thus the code would potentially be slower.
It is hard to defend 0-base without programming a lot of array-based code, such as string searching and various sorting/merging algorithms, or simulating multi-dimensional arrays in a single-dimension array. Fortran is 1-based, and you need a lot of coffee to get this kind of code done right.
But it goes way beyond that. It is a very useful mental habit to be able to think about the length of something rather than the indices of its elements. For example, in doing pixel-based graphics, it is much clearer to think of coordinates as falling between pixels rather than on them. That way, a 3x3 rectangle contains 9 pixels, not 16.
A little more far-fetched example is the idea of look-ahead in parsing, or in printing sub-totals in a table. The "common-sense" approach says 1) get the next character, token, or table row, and 2) decide what to do with it. The look-ahead approach says 1) assume you can see it, and decide if you want it, and 2) if you do want it, "accept" it (which allows you to see the next one). Then if you write out the pseudo-code, it is much simpler.
Still another example is how to use "goto" in languages where you have no choice, such as MS-DOS batch files. The "common-sense" approach is to attach labels to blocks of code to be done, and label them as such. Often a better approach is to put labels at the ends of blocks of code, for the purpose of skipping over them. This makes it "structured" and much easier to modify.
The only two (very) serious reasons to used 0-based indices instead of 1-based indices seem to avoid reeducating a lot of programers AND for backward compatiblity.
I didn't see any other serious arguments against 1-based indices in all the answers you received.
In fact, indices are naturally 1-based, and here is why.
First, we must ask : Were does arrays come from ? Do they have real-world equivalents ? The answer is yes : they are how we modelize vectors and matrix in computer science. However, Vectors and matrix are mathematicals concepts that were using 1-based indices before the computer-era (and that still mostly use 1-based indices nowaday).
In the real world, indices are 1-bases.
As Thomas said above, languages that used 0-bases indices are in fact using offsets, not indices. And developers who are using these languages think about offsets, not indices. This would not be a problem if things were clearly stated, but they are not. A lot of developers using offsets still talk about indices. And a lot of developers using indices still don't know that C, C++, C#, ... use offsets.
This is a wording problem.
(Note about Diskstra's paper - It says exactly what I have said above : mathematician do use 1-based indices. But Diskstra think that matematicians should not use them because some expression would then be ugly (eg.: 1 <= n <= 0). Well, not sure he is right on that one - doing such a paradigm shift in order to avoid those exceptional empty sequences seems a lot of trouble for a little result...)
I prefer 0 based index since since modulo (and the AND operator when used for modulo) always returns 0 for some values.
I often find myself using arrays like this:
int blah = array[i & 0xff];
I often get that kind of code wrong when using 1 based indices.
It is possible if you take care while writing your "own" code. You can assume your index starts from n for all n>=0 and program accordingly.
Regarding standard, Borealid has a great argument.
because array names are constant pointers to array starting positions. For example In C array[2] is turned into array + (sizeof(array)*2), which will give you two elements beyond the starting element(third element:)). so if you want to reach the starting element, with the same math, you should do
array + (sizeof(array)*i) = array
(sizeof(array)*i) = 0
i = 0
simple equation math.
I prefer 0-based arrays because, as mentioned by others, it makes math easier. For example, if we have a 1-dimensional array of 100 elements emulating a 10x10 grid, then what is the array index i of the element in row r, col c:
0-based: i = 10 * r + c 1-based: i = 10 * (r - 1) + c
And, given the index i, going back to the row and column is:
0-based: c = i % 10 r = floor(i / 10) 1-based: c = (i - 1) % 10 + 1 r = ceil(i / 10)
Given that the math above is clearly more complex when using 1-based arrays, it seems logical to choose 0-based arrays as the standard.
However, I think that someone could claim that my logic is flawed because I assume that there would be a reason to represent 2D data in a 1D array. I have run into a number of such situations in C/C++, but I must admit that needing to perform such computations is somewhat language dependent. If arrays truly performed all index math for the client, all the time, then the compiler could simply convert your M-based array accesses to 0-based at compile-time and hide all of these implementation details from the user. In fact, any compile-time constant could be used to do the same set of operations, although such constructs would probably just lead to incomprehensible code.
Perhaps a better argument would be that minimizing the number of array index operations in a language with 1-based arrays would require that integer division be performed using the ceiling function. However, from a mathematical perspective, integer division should return d remainder r, where d and r are both positive. Therefore, 0-based arrays should be used to simplify math.
For example, if you are generating a lookup table with N elements, the nearest index prior to the current value into the array for value x would be (approximately, ignoring values where the result is an integer prior to rounding):
0-based with floor: floor((N - 1) * x / xRange) 1-based with floor: floor((N - 1) * x / xRange) + 1 1-based with ceil : ceil ((N - 1) * x / xRange)
Notice that if the standard convention of rounding down is used, 1-based arrays require an additional operation, which is undesirable. This kind of math cannot be hidden by the compiler, as it requires lower-level knowledge about what is happening behind the scenes.
I'm betting the programmer was just annoyed with the counter-intuitiveness of a 0 based array in day to day thinking and was arguing for a more intuitive means of describing arrays. I find it ironic that as humans we spent so much time to come up with "classes" so that we could describe things in a more human way in our code, but then when looking at 0 vs 1 arrays we seem to get hung up on the logic of it alone.
As far as the computer is concerned and mathematically 0 is going to probably be better, but I feel a point is being missed here. If we wanted to describe things in a more human way (e.g. classes), why wouldn't we want the same for other parts of the language? Is that not equally logical or valid (or take higher priority for that matter...) to make a language more easily understandable and usable to humans and thus, by extension, less prone to scenarios that tend to create logic-bugs and more prone to faster production of a usable creation. PHP Example:
array(1 => 'January', 'February', 'March');
gives a 1 based array per our request.
Why not have the norm:
array('January', 'February', 'March');
And the exception be:
array(0 => 'Value for scenario where 0 *has* to be used as the key',
'value2', 'value3');
In the case of say PHP, my bet is 80% of the time a 1 based array being the default syntax-wise would decrease logic-bugs in real world use cases, or at least not cause more on average, while making it a lot easier on the coder to produce usable code quicker. Remember, I'm assuming there would still be the option of, array(0 => 'value') for when it's needed, but also assuming the majority of the time it's practical to have something closer to a real world description.
This really doesn't sound too far fetched of a request when looking at it from that perspective. When approaching an interface, be it an OS or a language for a programmer, the closer to human thinking and habits we design it around, the happier in most cases we will be and the less misunderstandings between the human and the computer(human logic-bugs), and the faster production, etc. we will have. If 80% of the time in the real world I describe things with 1 when making lists or counting, then the computer should ideally interpret my meaning into a way it understands with as little information or change from my normal way of describing something as possible. In short the closer we can model the real world, the better quality the abstraction. So what he wants is by no means stupid since that is the ultimate goal and would be evidence of a need for more abstraction. The computer can still ultimately see it as a special use of a 0 based array. I could care less how the computer interprets it so long as it's a simpler and more intuitive way for me to describe what I want to it with less bugs over time.
So, that's my two cents. I seriously doubt what he was saying, or what was interpreted, was what he meant. What he probably meant was, "I hate having a less-intuitive way of telling the computer what I want." :) Don't we all? lol.
Nope. The best you could do is skip the first element, but then your arrays would be completely different than every other array that you would encounter.
You can, but compiler optimizations are free to create invalid results (gcc and msvc won't, but clang will).
char* myArray = malloc(100) - 1;
/* now myArray[1] is the first element, and myArray[100] is the last. */
free(myArray + 1);
But, as other people have mentioned, don't do this. Un-learn your bad habits of starting at 1.
Another solution: Just ignore the first element of the array.
An array in C is shorthand for pointer arithmetic. Consider this case:
struct Foo *foo;
struct Foo foos[10];
foo = &(foos[1]);
foo = foos + (1 * sizeof(struct Foo));
The last two lines mean the same thing. Changing the initial offset would break this correlation, making many things in C much more difficult.
Some languages with a very strict Array type, like Pascal, allow you to start counting in other ways. But in C, and in many languages derived from C, arrays are just shorthand for pointer arithmetic, so you can't mess with their starting index.
If you insist on having your arrays start at 1...
type real_foo[COUNT], *const foo=real_foo-1;
If you're really sadistic, you could even make a preprocessor macro to do this for you...
#define CONCAT(x,y) x ## y
#define ARRAY1(name,size) CONCAT(real_, name), *const name=CONCAT(real_, name)-1
type ARRAY1(foo, COUNT);
Hope I didn't screw up those macros...
0 ≤ i < N
just feels nicer. Besides, since the Arabs went through all the trouble to give us the beautiful 0, why not use it?
C is a very simple language (or so they say). Its design was based strongly on available hardware features, i.e. on ease of implementing the compiler. Indexing an array translates directly into pointer arithmetic, so:
int array[256];
int i = 10;
...
array[i] = 12;
translates into something like:
*(array + i*sizeof(int)) = 12;
or in imaginary intermediate machine language:
load value of array into register Ra
load value of i into register Rb
right shift Rb by X number of bits
add value in Rb to value in Ra
store value 12 at address in Ra
(Note: here array
is the address fixed at load time, so I say "load value of array
" - it's a bit different for indexing a pointer, one more indirection is required to obtain the base address.)
With this in mind zero-based indexing is only natural.
It's certainly possible but the C language does not support this. Many other languages such as Fortran, PL/1, Pascal, Modula2, Ada support this.
It was part of the C language design to keep the compiler simple and small and it would break too many things to change it now.
With zero-based arrays, you can use an unsigned int as the index and then you don't have to test for index out of range on the lower bound. e.g:
int GetValue(unsigned index)
{
ASSERT(index < arraySize);
return(array[index];
}
I'm going to step out on a limb here and suggest something different than an integer 'keyed' array.
I think your coworker is getting at creating a one to one mapping of a 'set' in the physical world where we always start counting at 1. I can understand this, when you are not doing anything fancy, it is easy to understand some code when you are mapped 1 to 1 between software and the physical world.
My suggestion
Don't use integer based arrays for whatever you are storing, but use some other kind of dictionary or key value pair. These map better to real life as you aren't bound by an arbitrary integer. This has its place and I would recommend using it as much as you can due to the benifits of mapping concepts 1 to 1 between software and the physical world.
i.e. kvp['Name Server'] = "ns1.example.com";
(This is just one out of a million possible examples).
Discaimer
This most definitely not work when you are working with concepts based in mathmatics, basically because math is closer to the actual implementation of a computer. Using kvp sets are not going to help anything here, but will actually mess things up and make it more problematic. I haven't thought through all the corner cases where something may work better as kvp or as an array.
The end idea is to use the zero-based arrays or key value pairs where it makes sense, remember that when you only have a hammer, every problem starts looking like a nail...
Zero is natural when talking about the location of an item in a linear collection.
Think of a shelf full of books - the first book is located flush with the side wall of the shelf - that's location zero.
So I guess it depends on whether you consider array indices a means of finding things or referring to things.
Why not 2 or 3 or 20? It isn't like having 1-based arrays is somehow easier or simpler to understand then zero based arrays. In order to switch to 1-based arrays, every programmer out there would have to relearn how to work with arrays.
And furthermore, when you're dealing with offsets into existing arrays, it makes more sense too. If you've read 115 bytes out of an array, you know the next chunk starts at 115. And so on, the next byte is always the size of the byte's you've read. With 1-based you'd need to add one all the time.
And you do sometimes need to deal with chunks of data in arrays, even in language without "true" pointer arithmetic. In java you could have data in memory mapped files, or buffers. In that case, you know block i is at size * i. With a 1-based index it would be at block*i+1.
With 1-based indexing, a lot of techniques would require +1s all over the place.
I think that when you're trying to think about how your code works at a lower level, if this becomes necessary, zero-based arrays map more nicely to this lower level. Also, I'd assume that you'd need an extra add instruction for every dereference of a one-based array to make the pointers work correctly. I see one way around this (make the pointer point to the -1st/0th element of the array, not the 0th/1st), but that might wreak havok with garbage collectors, since you now have pointers pointing outside their allocated blocks.
Zero-based arrays give you a choice: If you want a one-based array, simply ignore the zeroth element and treat is as a one-based array. The space-waste is minimal, and everyone gets what they want.
But in a language that enforces 1-based arrays, you cannot pretend to have a zero-based array.
Therefore, zero-based is superior: putting the choice and the flexibility in the hands of programmer, where it belongs.
They just work better, they work right and produce less bugs. Believe.
Less is more.
At the most atomic level, its simply due down to being binary.
0 => 0
1 => 1
10 => 2
11 => 3
"0" is the first binary number, and likewise arrays are index by their "first" binary number.
Also, given An arbitrary base address for a dataset, the first record exists at that address, not that address + 1,
$base = 010110;
$firstvalue = $base + 0 * $unitsize
$secondvalue = $base + 1 * $unitsize
$thirdvalue = $base + 2 * $unitsize
If you had a "1" based array the internal system would have to constantly decrement the target value by 1 to find the underlying memory address that data was stored at.
0 is just as much the first binary number as it is the first decimal number. Not compelling.
Write the number "0" on a page. How many numbers do you have?
If I have the numbers 500 to 550 on the page, how many numbers do I have? I have 51! But the difference is only 50.
Put 50 sheep in a paddock. Surprisingly, none of them look like numbers, but they are still a countable volume.
We however have this crazy idea that when a sheep looks a little bit too much like a 0, it must not be used, and it must be thrown out and we have to find a sheep looking like 50 to take its place.
Simplified
Use 1 to N for counting. But 0 is still a valid place holding symbol, and as such, it should be used.
If 0 is not a valid place holding symbol, then we should scrap 10, 20, 30, 40 ... etc from our number system, and go straight from 9 to 11.
1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27
In essence, we humans not starting with "0" is a CULTURAL thing. Its not a rational one.