Comments on: On Iterators and Indexes

By: cowtowncoder

cowtowncoder — Fri, 06 Jan 2006 22:06:11 +0000

O(1)… no, not really, you just need to dig deeper

Elliotte, System.arraycopy does exactly the same copying under the hood (albeit possibly via a native call that uses the most efficient block copy available). Complexity-wise it is still definitely O(N) for insertions into and deletions from the middle. There is no magic bullet available there; when elements need to be moved, they need to be moved, and the time taken is proportional to the size of the block moved. If this were not the case, insertion into bigger lists would not take any longer than into smaller lists; you can benchmark and verify that longer lists do take longer to modify. It’s enlightening to compare the relative speed of a simple copy loop and System.arraycopy. Which is faster depends on the platform and the size of the block: arraycopy has JNI overhead, and for smaller copies may be slower. More often, though, they are about as fast for smallish blocks.
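To make the point about moved elements concrete, here is a minimal sketch. It is not the actual ArrayList source; the class and method names (MiddleInsertCost, insertAt) are made up for illustration, and resizing is left out. It inserts into the middle of a plain int array with System.arraycopy and reports how many elements had to be shifted.

```java
public class MiddleInsertCost {
    /**
     * Inserts value at index into the first 'size' slots of data and
     * returns how many existing elements had to be shifted.
     * Assumes data.length > size (this sketch never resizes).
     */
    static int insertAt(int[] data, int size, int index, int value) {
        int moved = size - index;
        // One block copy, but it still touches every element in [index, size).
        System.arraycopy(data, index, data, index + 1, moved);
        data[index] = value;
        return moved;
    }

    public static void main(String[] args) {
        int[] sizes = {1000, 10000, 100000};
        for (int size : sizes) {
            int[] data = new int[size + 1];
            int moved = insertAt(data, size, size / 2, 42);
            System.out.println("size=" + size + " moved=" + moved);
        }
    }
}
```

The shifted count grows linearly with the list size for a middle insert, whichever copy routine does the moving.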

By: Elliotte Rusty Harold

Elliotte Rusty Harold — Fri, 06 Jan 2006 22:05:18 +0000

ArrayList is O(1)

ArrayList is O(1) for all operations including insert and delete. The implementation is probably not what you think it is. The key is the use of System.arraycopy() to copy elements when the list is resized, rather than manually copying each element in turn. The naive descriptions of array lists given in most data structures textbooks simply do not describe how Java implements this very efficient class. Look at the code in ArrayList.java some time. You'll be pleasantly surprised.

By: cowtowncoder

cowtowncoder — Fri, 06 Jan 2006 22:04:46 +0000

One reason to prefer forward indexing...

Caches are designed for it. I'm surprised that no one commented on something that is usually brought up: one of the main reasons to loop upwards is that CPU caches are generally best adapted to such access patterns, and may not like the "wrong" direction (pre-fetching not working as well). Now whether this is the case for all JVMs I don't know... and further, it may only matter for big arrays (array lists), not for smaller ones. Anyway, I thought I'd add the one remaining somewhat well-known argument to the soup. For what it's worth, I do think that while Iterators have their place, the overhead is likely to always exist; as such, it makes sense to optimize access to INTERNAL ArrayLists using direct indexing. For a public API, the possibility of getting a LinkedList may be a valid concern, however.

Posted on Thursday, December 2nd, 2004 at 11:08 AM

By: hal

hal — Fri, 06 Jan 2006 22:04:19 +0000

Reuse of iterators

We can certainly discuss the difference in syntactic style when comparing indexes with iterators. But with the new for syntax in Java 1.5 (or was it 5.0?), the styles are very similar, so the biggest argument for using indexes is performance. The problem with iterators is of course that they have to be allocated, which reduces speed and quickly generates garbage that must be reclaimed. Java is fairly good at both now, with the generational garbage collector, but one way of reducing the problem would be to define that iterators may be reused once they've returned false from hasNext(). Hence, a class may hold a pool of iterators: once an iterator's hasNext() method returns false, it is put in a list of reusable iterators. The iterator method (in Iterable) may then return a reusable iterator or create a new one. Usually iterators are short-lived and few are needed at once, so this should save both time and space.
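A rough sketch of what such a pool could look like, under several assumptions: the class name PooledIterableList and its allocations counter are hypothetical, the pool is not thread-safe, and callers must not touch an iterator again after hasNext() has returned false (otherwise they may interfere with whoever received it next).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PooledIterableList<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<T>();
    // Iterators that have reported exhaustion and may be handed out again.
    private final ArrayDeque<ReusableIterator> free = new ArrayDeque<ReusableIterator>();
    private int allocations = 0; // counted only to show the effect of pooling

    public void add(T item) { items.add(item); }

    public int allocations() { return allocations; }

    public Iterator<T> iterator() {
        ReusableIterator it = free.poll();
        if (it == null) {
            it = new ReusableIterator();
            allocations++;
        }
        it.pos = 0;
        it.released = false;
        return it;
    }

    private final class ReusableIterator implements Iterator<T> {
        int pos;
        boolean released;

        public boolean hasNext() {
            if (pos < items.size()) return true;
            // Exhausted: return this iterator to the pool exactly once.
            if (!released) {
                released = true;
                free.push(this);
            }
            return false;
        }

        public T next() {
            if (pos >= items.size()) throw new NoSuchElementException();
            return items.get(pos++);
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }
}
```

A loop that breaks out early never sees hasNext() return false, so its iterator is simply left to the garbage collector, as before.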

By: Alex Blewitt

Alex Blewitt — Fri, 06 Jan 2006 22:03:33 +0000

ArrayList is not O(1)

The comment that “ArrayList is O(1) for all operations” is quite definitely not true. In fact, it’s only true for get; insert and delete are O(n) operations. If you’re dealing with a large amount of data (i.e. more than 10 elements), ArrayList append degrades to O(n), because when you insert data into the array, every now and again it has to resize. (The default implementation allocates space for 10 entries, but this can be changed in the constructor.) The resize operation is O(n). So in fact, ArrayList is only O(1) for get; all others are O(n).

LinkedList, on the other hand, is O(1) for all operations *except* index-based access. But most people really don’t need index-based access (they’re all using Iterators to go through the entire collection), and iteration is O(1) per element anyway.

By: Kevin Klinemeier

Kevin Klinemeier — Fri, 06 Jan 2006 22:03:07 +0000

Re: Reason to count backwards

But if you iterate backwards, and a listener is removed, don’t you dispatch an event twice to the current listener? An example:

List location, listener
1, A
2, B
3, C
4, D

You’re processing list location 4, listener D. During that time, listener B removes itself. Your list now reads:

1, A
2, C
3, D

When you process location 3, you get D again? Or are double-deliveries not a problem for your system? Also, don’t you face the problem that a listener can be removed between the time you called size() and the first list access? Small, but ArrayIndexOutOfBounds is kind of annoying at runtime.

When I did listener dispatching (3 years ago) I never figured out a thread-safe approach to listener dispatch that didn’t involve synchronization, which quickly required list copying to reduce lock contention. I’ll admit I didn’t spend a ton of time on it, as the solution I had was “good enough”.
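The double-delivery case can be reproduced in a few lines. This is only a single-threaded sketch with made-up names (BackwardDispatchDemo, dispatchBackward); the removal of B is simulated as a side effect of delivering to D, and listeners are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BackwardDispatchDemo {
    /**
     * Walks the list from the last index down to 0. While delivering to
     * 'trigger', 'victim' is removed from the list (a no-op if it is
     * already gone). Returns the listeners in the order they were notified.
     */
    static List<String> dispatchBackward(List<String> listeners, String trigger, String victim) {
        List<String> delivered = new ArrayList<String>();
        for (int i = listeners.size() - 1; i >= 0; i--) {
            String current = listeners.get(i);
            delivered.add(current);
            if (current.equals(trigger)) {
                listeners.remove(victim); // removes by value, shifting later entries down
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> listeners = new ArrayList<String>(Arrays.asList("A", "B", "C", "D"));
        // B is removed while D is being processed.
        System.out.println(dispatchBackward(listeners, "D", "B")); // prints [D, D, C, A]
    }
}
```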

-Kevin Klinemeier (who can’t figure out how to add his name directly to the tagline)

By: Kevin Klinemeier

Kevin Klinemeier — Fri, 06 Jan 2006 22:01:41 +0000

One-off bug in counterexample (in response to hip@a.cs.okstate.edu)

Yeah, I left the one-off bug pointed out by another poster in the code I used; pure laziness on my part. The results are still valid, perhaps even more so. No matter what order you do the two tests in, the first one always takes longer. No performance difference is shown between != and >=.

By: bchapman

bchapman — Fri, 06 Jan 2006 22:01:06 +0000

Reason to count backwards

One place where we deliberately count backwards is when we are dispatching events to listeners. There is a strong likelihood that some listeners will remove themselves during the event processing. If you count forward, you need to iterate over a copy to prevent skipping over a listener following one that removes itself, whereas if you count backward, and the current listener gets removed from the List, it doesn't break the iterating logic. By the way, I disagree with the FORTRAN professor. Given a choice, I would take a maintainable incorrect program over an unmaintainable correct one. In the first case someone else can fix it. In the second case you cannot fix it when you find you were wrong about it being correct (or when the requirements change). I would put Readability ahead of Correctness. How can you possibly make it correct if it is not readable?
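A small single-threaded sketch of the difference described above, with hypothetical names (SelfRemovalDemo, dispatch) and strings standing in for listeners. The listener named selfRemover removes itself while it is being notified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfRemovalDemo {
    /** Notifies every listener by index; 'selfRemover' removes itself when notified. */
    static List<String> dispatch(List<String> listeners, String selfRemover, boolean forward) {
        List<String> delivered = new ArrayList<String>();
        if (forward) {
            for (int i = 0; i < listeners.size(); i++) {
                String current = listeners.get(i);
                delivered.add(current);
                // After removal the next listener slides into slot i and is skipped.
                if (current.equals(selfRemover)) listeners.remove(i);
            }
        } else {
            for (int i = listeners.size() - 1; i >= 0; i--) {
                String current = listeners.get(i);
                delivered.add(current);
                // Removal only shifts entries that were already visited.
                if (current.equals(selfRemover)) listeners.remove(i);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<String>(Arrays.asList("A", "B", "C", "D"));
        System.out.println(dispatch(a, "B", true));  // prints [A, B, D]  (C is skipped)
        List<String> b = new ArrayList<String>(Arrays.asList("A", "B", "C", "D"));
        System.out.println(dispatch(b, "B", false)); // prints [D, C, B, A]
    }
}
```

This only covers a listener removing itself; a listener removing a different, earlier entry is a separate case.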

By: hip

hip — Fri, 06 Jan 2006 22:00:39 +0000

Hotspot optimization, not faster operation

Hmm, the last example still has a "one-off" error, probably from a cut-and-paste. But does that matter? An error is an error no matter where it comes from. I can hear my old FORTRAN professor: "Correctness first, Readability second, Documentation third." And when someone would inevitably ask "what about speed?", he would respond with "you have enough to worry about with the first three."

By: Kevin Klinemeier

Kevin Klinemeier — Fri, 06 Jan 2006 21:59:40 +0000

Hotspot optimization, not faster operation

Since the -server flag instructs the VM to do more aggressive optimization, I wondered if this was affecting the micro-benchmark. So I first ran it as posted above, and got the same results:

Increasing Delta: 67063
Decreasing Delta: 32516

Next, I switched the two for loops, so that the != comparison comes before the >= comparison (code below). Surprisingly, I got the same kind of results; the first loop takes longer:

Increasing Delta: 67173
Decreasing Delta: 32485

I think that HotSpot is either creating overhead that isn't there in the second loop, or finally performing some significant optimization while executing the second loop. The interesting bit is, of course, that it doesn't matter which loop is second. Modified code for the second run:

public class TimeArray {
  public static void main(String args[]) {
    int something = 2;
    int k = 0;
    long startTime = System.currentTimeMillis();
        
    for (k = 0; k < 10; k++) {
      for (int i = Integer.MAX_VALUE - 1; i != 0; i--) {
         something = -something;
      }
    }
        
    long midTime = System.currentTimeMillis();
    for (k = 0; k < 10; k++) {
      for (int i = Integer.MAX_VALUE - 1; i >= 0; i--) {
        something = -something;
      }
    }
    long endTime = System.currentTimeMillis();
    System.out.println("Increasing Delta: " + (midTime - startTime));
    System.out.println("Decreasing Delta: " + (endTime - midTime));
  }
}
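One way to check whether the first-loop penalty is a warm-up effect is to time both loops over several rounds and compare the later rounds. The following is only a sketch under assumptions: the class and method names (WarmedUpLoops, loopNotEqual, loopGreaterEqual) and the iteration count are made up, both loops here run exactly n iterations, and the JIT remains free to optimize either loop in ways that distort the numbers.

```java
public class WarmedUpLoops {
    /** Counts down from n to 1 using a != test; runs exactly n iterations. */
    static int loopNotEqual(int n) {
        int something = 2;
        for (int i = n; i != 0; i--) {
            something = -something;
        }
        return something;
    }

    /** Counts down from n - 1 to 0 using a >= test; runs exactly n iterations. */
    static int loopGreaterEqual(int n) {
        int something = 2;
        for (int i = n - 1; i >= 0; i--) {
            something = -something;
        }
        return something;
    }

    public static void main(String[] args) {
        int n = 100000000;
        int sink = 0; // results are accumulated and printed so the loops are not trivially dead code
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            sink += loopNotEqual(n);
            long t1 = System.nanoTime();
            sink += loopGreaterEqual(n);
            long t2 = System.nanoTime();
            System.out.println("round " + round
                + ": != " + (t1 - t0) / 1000000 + " ms, >= " + (t2 - t1) / 1000000 + " ms");
        }
        System.out.println("sink=" + sink);
    }
}
```

If the gap between the two loops shrinks or disappears after the first round, that would point to compilation overhead rather than to the comparison operator.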
