Why java.util.Arrays uses Two Sorting Algorithms

java.util.Arrays uses quicksort (actually dual pivot quicksort in the most recent version) for primitive types such as int and mergesort for objects that implement Comparable or use a Comparator. Why the difference? Why not pick one and use it for all cases? Robert Sedgewick suggests that “the designer’s assessment of the idea that if a programmer’s using objects maybe space is not a critically important consideration and so the extra space used by mergesort maybe’s not a problem and if the programmer’s using primitive types maybe performance is the most important thing so we use the quicksort”, but I think there’s a much more obvious reason.

Quicksort is faster in both cases. Mergesort is stable in both cases. But for primitive types quicksort is stable too! That’s because primitive types in Java are like elementary particles in quantum mechanics. You can’t tell the difference between one 7 and another 7. Their value is all that defines them. Sort the array such [7, 6, 6, 7, 6, 5, 4, 6, 0] into [0, 4, 5, 6, 6, 6, 6, 7, 7]. Not only do you not care which 6 ended up in which position. It’s a meaningless question. The array positions don’t hold pointers to the objects. They hold the actual values of the objects. We might as well say that all the original values were thrown away and replaced with new ones. Or not. It just doesn’t matter at all. There is no possible way you can tell the difference between the output of a stable and unstable sorting algorithm when all that’s sorted are primitive types. Stability is irrelevant with primitive types in Java.

By contrast when sorting objects, including sorting objects by a key of primitive type, you’re sorting pointers. The objects themselves do have an independent nature separate from their key values. Sometimes this may not matter all that much—e.g. if you’re sorting java.lang.Strings—but sometimes it matters a great deal. To borrow an example from Sedgewick’s Algorithms I class, suppose you’re sorting student records by section:

public class Student {

  String lastname;
  String firstName;
  int section; 
  
}

Suppose you start with a list sorted by last name and then first name:

John	Alisson	2
Nabeel	Aronowitz	3
Joe	Jones	2
James	Ledbetter	2
Ilya	Lessing	1
Betty	Lipschitz	2
Betty	Neubacher	2
John	Neubacher	3
Katie	Senya	1
Jim	Smith	3
Ping	Yi	1

When you sort this again by section, if the sort is stable then it will still be sorted by last name and first name within each section:

Ilya	Lessing	1
Katie	Senya	1
Ping	Yi	1
John	Alisson	2
Joe	Jones	2
James	Ledbetter	2
Betty	Lipschitz	2
Betty	Neubacher	2
Nabeel	Aronowitz	3
John	Neubacher	3
Jim	Smith	3

However if you use quicksort, you’ll end up with something like this and have to resort each section by name to maintain the sorting by name:

Ilya	Lessing	1
Katie	Senya	1
Ping	Yi	1
Betty	Lipschitz	2
Betty	Neubacher	2
John	Alisson	2
Joe	Jones	2
James	Ledbetter	2
Jim	Smith	3
John	Neubacher	3
Nabeel	Aronowitz	3

That’s why stable sorts make sense for object types, especially mutable object types and object types with more data than just the sort key; and mergesort is such a sort. But for primitive types stability is not only irrelevant. It’s meaningless.

This entry was posted on Saturday, March 30th, 2013 at 6:51 am and is filed under Java. You can follow any responses to this entry through the Atom feed. Both comments and pings are currently closed.

6 Responses to “Why java.util.Arrays uses Two Sorting Algorithms”

Hash r Says:
April 9th, 2013 at 10:45 pm
Very good observation and explanation on use of two different algorithms.

Very informative. Thanks buddy.
robert Says:
April 15th, 2014 at 11:53 pm
It is the most obvious comparison with sorting to both algorithm.
It improves the efficiency of function java.util.Array

Thank you for sharing.
John Cowan Says:
May 22nd, 2015 at 11:32 am
However, quicksort is O(N^2) in pathological cases. Recent versions of the C++ library therefore use introsort, a version of quicksort that watches whether the recursion is getting too deep, and if so it switches to heapsort, which is O(n log n) in the worst case. This makes introsort O(n log n) in all cases without incurring the greater overhead of heapsort in every case.

Introsort is almost as old as Java. Maybe some day Java will catch up.
Arpit Bhargava Says:
January 16th, 2018 at 9:41 pm
Very Simple and Effective explaination.
Jeetendra Says:
February 5th, 2018 at 9:40 pm
Oh Wow, I always wondered why we do that. Thanks bro, God bless!
Sum of Two – John Canessa Says:
February 19th, 2020 at 11:51 am
[…] I just wanted to explore the approach by finding out how Java sorts arrays. I found the article Why java.util.Arrays uses Two Sorting Algorithms. The article states that the class java.util.Arrays uses quicksort (actually dual pivot quicksort […]

Why java.util.Arrays uses Two Sorting Algorithms

6 Responses to “Why java.util.Arrays uses Two Sorting Algorithms”

Info

Archives

Categories

Feeds

Admin