On Iterators and Indexes

Here’s a neat little trick Wolfgang Hoschek showed me. When iterating across a list, if you don’t care about the order in which you iterate, why not do it backwards? Like so:

for (int i=list.size(); --i >= 0; ) {

The advantage is that you avoid calling size() more than once, which I've occasionally proved to be a hot spot. (It isn't always, of course, but it definitely is sometimes.) Traditionally this problem is fixed by storing the size in a variable, like so:

int size = list.size();
for (int i=0; i < size; i++) { 

However, Hoschek's approach avoids giving excessive scope to the size variable, which logically should exist only inside the loop.

Hoschek's solution is a tad obscure for my tastes. I had to think about it for a while to convince myself it was correct, which is never a good sign. And depending on the difference between --i and i-- is very dangerous. I would prefer something like the following, even at the cost of an extra operation:

for (int i=list.size()-1; i >= 0; i--) {

The extra operation really isn’t significant, but it’s still more opaque than I’d like. If only Java started arrays at 1, where God intended them to begin.

By the way, a word to all the Collections API fetishists who are immediately going to yell that I should be using an Iterator here instead: this only came up because Hoschek proved that the Iterator was a hot spot that could be eliminated by moving to the code shown here. There's also the added benefit of reducing the class's coupling by eliminating the dependence on java.util.Iterator.
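
For concreteness, here is a minimal sketch of the two shapes under discussion; the class and method names are placeholders, not XOM's actual code:

import java.util.Iterator;
import java.util.List;

class LoopStyles {

    // Iterator-based version: allocates an Iterator object on every call
    // and couples the class to java.util.Iterator.
    static void withIterator(List list) {
        for (Iterator iterator = list.iterator(); iterator.hasNext(); ) {
            Object child = iterator.next();
            // ... process child ...
        }
    }

    // Index-based version: no Iterator allocation, size() is read only once,
    // and the counter's scope is confined to the loop.
    static void withIndex(List list) {
        for (int i = list.size()-1; i >= 0; i--) {
            Object child = list.get(i);
            // ... process child ...
        }
    }
}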

26 Responses to “On Iterators and Indexes”

  1. Andrew Stewart Says:

    Scope

    If one wants to call list.size() only once, why not keep the scope of the variable local to the loop like this:

    for (int i = 0, n = list.size(); i < n; i++) { }

    One then keeps the familiar form of a for loop. Counting backwards is less intuitive.

  2. O.Mason Says:

    Iterators

    Could you give us a link to the Iterators-are-hotspots description? I'm always using iterators as I believe they're less tied to the actual class (i.e. they can be used for both Set and List), and I wasn't aware of any performance issues.

  3. zipwow Says:

    Iterators-as-hotspots description — Me too?

    I agree with O.Mason; I'd like to understand how much of an impact Iterators really have. I've always liked using Iterators, because they frequently allow me to work only with the Collection interface. I've found this very useful in my APIs, as the wide majority of the time I'm returning unordered information, and Collection makes that clearer than List plus javadoc.

  4. paulm Says:

    Obscurity

    As a C/C++/Java writer for 25 years I am well used to the for loop. I have always thought it a little obscure with the end statement as part of the loop. I think I would prefer to have a size integer with larger scope than make my own code that much less readable. Then there is the new 1.5/5 syntax.

  5. Lachlan O'Dea Says:

    Iterator is better

    Your for loop works fine if the list you're iterating over implements java.util.RandomAccess (such as ArrayList). However, if you one day find it convenient to change to a LinkedList, then every for loop you've written becomes appallingly inefficient. Iterator may not be quite as fast as a for loop over an ArrayList, but it works well on all types of Collections. Even if I'm using an ArrayList instance, I declare my references as Collection unless I need the functionality of a List. Using Iterator means you don't need to tie your code to a particular kind of Collection implementation. I'll take that over a minuscule performance increase any day. In the 0.000007% of loops where the performance difference is actually significant, you could mandate an ArrayList and use an indexed for loop. Or, you could do something like this, which falls back to Iterator when appropriate (I've never bothered, though):

    void foo(List list)
    {
        if (list instanceof RandomAccess)
        {
            for (int i = list.size() - 1; i >=0; i--)
            {
                 Object o = list.get(i);
            }
        }
        else
        {
            for (Iterator iter = list.iterator(); iter.hasNext(); )
            {
                Object o = iter.next();
            }
        }
    }

    This code gives the best performance for LinkedLists and also list objects such as those returned by java.util.Arrays.asList – which is not an ArrayList, but implements RandomAccess.

  6. Viswanath Says:

    I definitely like counting backwards… but the real benefit would be to have the condition as for (int i=list.size()-1; i != 0; i--). Note that the >= is replaced by !=. I'm from an embedded background, and checking for != 0 and jumping has proved to be much faster than any other condition. Maybe this is true for the JVM too.

  7. Elliotte Rusty Harold Says:

    != vs. >=

    I’m skeptical of performance optimizations like >= vs. != absent proof, but I will try it out and see if it makes a noticeable impact.

  8. Elliotte Rusty Harold Says:

    ArrayList is better

    Iterator might make sense if I might someday want to use a LinkedList, but why would I ever want to do that? ArrayList is O(1) for essentially all operations: insert, append, delete, etc. LinkedList is not. JDOM got a massive speed-up when it switched from LinkedList to ArrayList, and when I wrote XOM I never even considered using anything else. I can see you might perhaps choose a different collection if you needed a non-List collection or a sorted collection. However in my use case preserving order is a critical part of the class’s semantics, so I don’t really need to consider unordered collections. This code is all private, not part of the class’s public interface, and performance has been proven to be an issue by profiling and benchmarking, so I don’t feel too bad about being explicit about the use of an ArrayList.

  9. Viswanath Says:

    != vs. >=

    I just tried a program to compare these two loops. Over a period of ~130 seconds, there is a difference of ~50 milliseconds. You can say this is insignificant. 🙂 As I wrote this entry, I thought I'd run the program with the -server switch. The results are surprising even for me. The whole program took only ~53 seconds and there is a difference of ~36 SECONDS. Maybe my program is wrong. Here it is.
    public class TimeArray {

        public static void main(String args[]) {
            int something = 2;
            int k = 0;
            long startTime = System.currentTimeMillis();

            for (k = 0; k < 10; k++) {
                for (int i = Integer.MAX_VALUE - 1; i >= 0; i--) {
                    something = -something;
                }
            }

            long midTime = System.currentTimeMillis();
            for (k = 0; k < 10; k++) {
                for (int i = Integer.MAX_VALUE - 1; i != 0; i--) {
                    something = -something;
                }
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Increasing Delta: " + (midTime - startTime));
            System.out.println("Decreasing Delta: " + (endTime - midTime));
        }
    }

    This is adapted from a Collections book I have.
    When executed with the -server switch, this program printed:
    Increasing delta: 43801
    Decreasing delta: 8117

    If I run it without the -server switch:
    Increasing delta: 65074
    Decreasing delta: 65027
    Any views?

  10. andyjbs Says:

    Linked List vs. Array List

    The book “Java Performance Tuning” from O’Reilly has a whole bunch of analysis on ArrayList vs. alternatives, including LinkedList, on a range of different virtual machines and testing scenarios.

    IIRC ArrayList almost always won, but there were a few scenarios where other data structures had the edge. It usually wasn't significant, though.

  11. hip Says:

    != vs. >=

    In for (int i=list.size()-1; i != 0; i--), what if list.size() returns 0?

  12. someoneelse Says:

    != vs. >=

    Hip’s on the right track, but doesn’t follow it far enough — the two loops:

    for (int i=Integer.MAX_VALUE-1; i>=0; i--)

    and

    for (int i=Integer.MAX_VALUE-1; i!=0; i--)

    DON’T DO THE SAME AMOUNT OF WORK: the second does one less iteration (the first one executes for the value zero, the second one doesn’t). Try printing out the numbers.
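
    A minimal sketch of what printing the numbers shows (sketch only; the loops take a while to run, and only the final value of i seen by each loop body matters):

    class LoopBounds {
        public static void main(String[] args) {
            int lastGE = -1;
            for (int i = Integer.MAX_VALUE - 1; i >= 0; i--) { lastGE = i; }

            int lastNE = -1;
            for (int i = Integer.MAX_VALUE - 1; i != 0; i--) { lastNE = i; }

            // The >= 0 loop runs one extra iteration: its body sees 0, not just 1.
            System.out.println("last i with >= 0: " + lastGE); // prints 0
            System.out.println("last i with != 0: " + lastNE); // prints 1
        }
    }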

  13. ianp Says:

    JZ/JNZ

    Yes, the equivalent statements would be for (int i = Integer.MAX_VALUE - 1; i >= 0; --i); and for (int i = Integer.MAX_VALUE; i != 0; --i);. Now I have a gut feeling that Viswanath may be on to something with his "jump if not zero" construct here, but taking a step back and thinking about it tells me that no, that's my 20-years-out-of-date EE training coming back to haunt me…

    At the risk of invoking the mythical "sufficiently smart compiler" here, I would be surprised if a recent JIT compiled these two constructs into code with widely different performance characteristics, and a halfway decent peephole optimizer should be able to produce almost identical code from either construct.

  14. Viswanath Says:

    != vs. >=

    I corrected the loops as pointed out in this thread. Thanks Someoneelse@mailinator and IANP@IANP. But the output, when using the -server switch, is similar.

    Increasing delta: 44802
    Decreasing delta: 8713

    I'm using the J2SE 5.0 SDK. From my embedded experience, I can say that not many compilers do this JZ/JNZ optimization.
    By the way, is there a way in which I can turn on some optimization settings for javac or the JVM?

  15. Elliotte Rusty Harold Says:

    Re: Iterators-as-hotspots description

    The specific context where this arose was in my work on XOM. I originally did use an Iterator in the method in question. Wolfgang Hoschek benchmarked and profiled and found he could get significant speed-ups by not using an Iterator. I think this is the original message. My general principle on optimization is not to do it unless it can be proven useful. Wolfgang proved this particular optimization significant in this case, so I did it. I'll have to experiment with the != vs. >= being suggested here.

  16. scholnicks Says:

    Same Loop But More Readable (IMHO)

    I often write that loop as:

    for (int i = 0, n = list.size(); i < n; i++) { }

  17. Kevin Klinemeier Says:

    Hotspot optimization, not faster operation

    Since the -server flag instructs the VM to do more aggressive optimization, I wondered if this was affecting the micro-benchmark. So I first ran it as posted above, and got the same results:

    Increasing Delta: 67063
    Decreasing Delta: 32516

    Next, I switched the two for loops, so that the != comes before the >= comparison (code below). Surprisingly, I got the same kind of results; the first loop takes longer:

    Increasing Delta: 67173
    Decreasing Delta: 32485

    I think that HotSpot is either creating overhead that isn't there in the second loop, or finally performing some significant optimization while executing the second loop. The interesting bit is, of course, that it doesn't matter which loop is second. Modified code for the second run:

    public class TimeArray {
      public static void main(String args[]) {
        int something = 2;
        int k = 0;
        long startTime = System.currentTimeMillis();
            
        for (k = 0; k < 10; k++) {
          for (int i = Integer.MAX_VALUE - 1; i != 0; i--) {
             something = -something;
          }
        }
            
        long midTime = System.currentTimeMillis();
        for (k = 0; k < 10; k++) {
          for (int i = Integer.MAX_VALUE - 1; i >= 0; i--) {
            something = -something;
          }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Increasing Delta: " + (midTime - startTime));
        System.out.println("Decreasing Delta: " + (endTime - midTime));
      }
    }

  18. hip Says:

    Hotspot optimization, not faster operation

    Hmm, the last example still has an "off-by-one" error, probably from a cut-and-paste. But does that matter? An error is an error no matter where it comes from. I can hear my old FORTRAN professor: "Correctness first, Readability second, Documentation third." And when someone would inevitably ask "what about speed?", he would respond with "you have enough to worry about with the first three."

  19. bchapman Says:

    Reason to count backwards

    One place where we deliberately count backwards is when we are dispatching events to listeners. There is a strong likelihood that some listeners will remove themselves during the event processing. If you count forward, you need to iterate over a copy to prevent skipping over a listener following one that removes itself, whereas if you count backward, and the current listener gets removed from the List, it doesn't break the iterating logic.

    By the way, I disagree with the FORTRAN professor. Given a choice I would take a maintainable incorrect program over an unmaintainable correct one. In the first case someone else can fix it. In the second case you cannot fix it when you find you were wrong about it being correct (or when the requirements change). I would put Readability ahead of Correctness. How can you possibly make it correct if it is not readable?
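
    A minimal sketch of the backward-dispatch pattern described here (the Listener interface and event type are placeholders; this tolerates a listener removing itself, though, as a later comment points out, not a listener removing a not-yet-visited listener):

    import java.util.ArrayList;
    import java.util.List;

    interface Listener {
        void handle(Object event);
    }

    class EventSource {

        private final List listeners = new ArrayList();

        void addListener(Listener listener)    { listeners.add(listener); }
        void removeListener(Listener listener) { listeners.remove(listener); }

        // Counting backwards means a listener that removes itself during
        // handle() only shifts elements at higher indexes, all of which
        // have already been dispatched to, so nothing is skipped.
        void fireEvent(Object event) {
            for (int i = listeners.size() - 1; i >= 0; i--) {
                ((Listener) listeners.get(i)).handle(event);
            }
        }
    }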

  20. Kevin Klinemeier Says:

    Off-by-one bug in counterexample (in response to hip@a.cs.okstate.edu)

    Yeah, I left the off-by-one bug pointed out by another poster in the code I used; pure laziness on my part. The results are still valid, perhaps even more so. No matter what order you do the two tests in, the first one is always longer. No performance difference is shown between != and >=.

  21. Kevin Klinemeier Says:

    Re: Reason to count backwards

    But if you iterate backwards, and a listener is removed, don't you dispatch an event twice to the current listener? An example:

    List location, listener
    1, A
    2, B
    3, C
    4, D

    You're processing list location 4, listener D. During that time, listener B removes itself. Your list now reads:

    1, A
    2, C
    3, D

    When you process location 3, you get D again? Or are double-deliveries not a problem for your system? Also, don't you face the problem that a listener can be removed between the time you called size() and the first list access? Small, but ArrayIndexOutOfBounds is kind of annoying at runtime. When I did listener dispatching (3 years ago) I never figured out a thread-safe approach to listener dispatch that didn't involve synchronization, which quickly required list copying to reduce lock contention. I'll admit I didn't spend a ton of time on it, as the solution I had was "good enough".

    -Kevin Klinemeier (who can’t figure out how to add his name directly to the tagline)
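
    A minimal sketch of the copy-before-dispatch approach Kevin describes, reusing the hypothetical Listener interface from the earlier sketch: synchronize only long enough to snapshot the list, then dispatch outside the lock (addListener and removeListener would need the same synchronization).

    void fireEvent(Object event) {
        Listener[] snapshot;
        synchronized (listeners) {
            // Copy under the lock so add/remove can't race with the copy itself.
            snapshot = (Listener[]) listeners.toArray(new Listener[listeners.size()]);
        }
        // Dispatch outside the lock; listeners added or removed from now on
        // simply miss (or still receive) this particular event.
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i].handle(event);
        }
    }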

  22. Alex Blewitt Says:

    ArrayList is *not* O(1)

    The comment that "ArrayList is O(1) for all operations" is quite definitely not true. In fact, it's only true for get; insert and delete are O(n) operations. If you're dealing with a large (i.e. > 10) amount of data, ArrayList append degrades to O(n), because when you insert data into the array, every now and again it has to resize. (The default implementation allocates space for 10 entries, but this can be changed in the constructor.) The resize operation is O(n). So in fact, ArrayList is only O(1) for get; all the others are O(n).

    LinkedList, on the other hand, is O(1) for all operations *except* index-based access. But since most people really don't need index-based access (they're all using Iterators to go through the entire collection), iteration is O(1) per element anyway.

  23. hal Says:

    Reuse of iterators

    We can certainly discuss the difference in syntactic style when comparing indexes with iterators. But with the new for syntax in Java 1.5 (or was it 5.0?), the styles are very similar, so the biggest argument for using indexes is performance. The problem with iterators is of course that they have to be allocated, thus reducing speed and quickly generating garbage that must be reclaimed. Java is fairly good at both now, with the generational garbage collector, but one way of reducing the problem would be to define that iterators may be reused once they've returned false from hasNext(). Hence, a class may hold a pool of iterators that are in use, and once an iterator's hasNext() method returns false, it is put in a list of reusable iterators. The iterator method (in Iterable) may return a reusable iterator or create a new one. Usually iterators are short-lived and few are needed at once, so this should save both time and space.
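
    A rough sketch of the kind of reuse described here, for iterating over an internal list (hypothetical names; nothing in the Iterator contract sanctions this, so it is only safe for private, single-threaded use):

    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;

    class ReusableListIterator implements Iterator {

        private List list;
        private int index;

        // Re-point this iterator at a (possibly different) list instead of
        // allocating a fresh Iterator for every loop.
        ReusableListIterator reset(List newList) {
            this.list = newList;
            this.index = 0;
            return this;
        }

        public boolean hasNext() {
            return list != null && index < list.size();
        }

        public Object next() {
            if (!hasNext()) throw new NoSuchElementException();
            return list.get(index++);
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    The owning class keeps one such iterator in a field and calls reset() before each traversal; once hasNext() returns false, the instance is free for the next loop.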

  24. cowtowncoder Says:

    One reason to prefer forward indexing…

    Caches are designed for it. I'm surprised that no one has commented on something that is usually brought up: one of the main reasons to loop upwards is that CPU caches are generally most adapted to such usage patterns, and may not like the "wrong" direction (pre-fetching not working as well). Now whether this is the case for all JVMs I don't know… and further, it may only matter for big arrays (array lists), not for smaller ones. Anyway, I thought I'd add the one remaining somewhat well-known argument to the soup. For what it's worth, I do think that while Iterators have their place, the overhead is likely to always exist; and as such it makes sense to optimize accessing INTERNAL ArrayLists using direct indexing. For a public API, the possibility of getting a LinkedList may be a valid concern, however.
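
    For anyone who wants to test the prefetching theory, a crude sketch (micro-benchmarks like this are easy to get wrong, and as Kevin's results above show, whichever loop runs first tends to pay the warm-up cost):

    class Direction {
        public static void main(String[] args) {
            int[] data = new int[4 * 1024 * 1024]; // big enough not to fit in cache
            long sum = 0;

            long t0 = System.currentTimeMillis();
            for (int pass = 0; pass < 50; pass++) {
                for (int i = 0; i < data.length; i++) { sum += data[i]; }
            }
            long t1 = System.currentTimeMillis();
            for (int pass = 0; pass < 50; pass++) {
                for (int i = data.length - 1; i >= 0; i--) { sum += data[i]; }
            }
            long t2 = System.currentTimeMillis();

            System.out.println("forward:  " + (t1 - t0) + " ms");
            System.out.println("backward: " + (t2 - t1) + " ms");
            System.out.println(sum); // keep the JIT from discarding the loops
        }
    }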

  25. Elliotte Rusty Harold Says:

    ArrayList *is* O(1)

    ArrayList is O(1) for all operations including insert and delete. The implementation is probably not what you think it is. The key is the use of System.arraycopy() to copy elements when the list is resized, rather than manually copying each element in turn. The naive descriptions of array lists given in most data structures textbooks simply do not describe how Java implements this very efficient class. Look at the code in ArrayList.java some time. You'll be pleasantly surprised.

  26. cowtowncoder Says:

    O(1)… no, not really, you just need to dig deeper

    Elliotte, System.arraycopy does exactly the same copying under the hood (albeit possibly via a native call that uses the most optimal block copy available). Complexity-wise it is still definitely O(N) for insertions in and deletions from the middle. There is no magic bullet available there; when elements need to be moved, they need to be moved, and the speed is relative to the size of the block moved. If this were not the case, insertions into bigger lists would not take any longer than those into smaller lists; you can benchmark and verify that longer lists do take longer to modify. It's enlightening to check out the relative speed difference between a simple loop copy and System.arraycopy. Which is faster depends on the platform and the size of the block: arraycopy has JNI overhead, and for smaller copies may be slower. More often, though, they are about as fast for smallish blocks.
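
    A crude way to check the scaling claim: time a fixed number of middle insertions at two very different list sizes (sketch only; the class and method names are illustrative, and the absolute numbers matter less than their ratio):

    import java.util.ArrayList;
    import java.util.List;

    class MiddleInsert {

        public static void main(String[] args) {
            System.out.println("   10,000 elements: " + time(10000) + " ms");
            System.out.println("1,000,000 elements: " + time(1000000) + " ms");
        }

        // Time 10,000 insertions into the middle of a list of the given size.
        // If middle insertion were O(1), both sizes would take about as long.
        static long time(int size) {
            Object filler = new Object();
            List list = new ArrayList(size);
            for (int i = 0; i < size; i++) {
                list.add(filler);
            }
            long start = System.currentTimeMillis();
            for (int i = 0; i < 10000; i++) {
                list.add(size / 2, filler);
            }
            return System.currentTimeMillis() - start;
        }
    }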