Idioms
======
The optimistically-named `A Gentle Introduction to Haskell`_ includes
"this `concise definition`_ of everybody's favorite sorting algorithm".

.. _a gentle introduction to haskell: http://www.haskell.org/tutorial/
.. _concise definition: http://static.tobold.org/hadvocacy/quicksort.hs

.. include:: quicksort.hs
   :literal:

The same idea can be expressed in many different styles; here we explore
some of them.

Our starting point is impressively brief. It's also opaquely idiomatic.
Let's start by removing some of those idioms. Actually, this example
demonstrates how easy it is to overuse the much-loved ``(x:xs)`` idiom.
I think it should be reserved for the case of treating each
quasi-anonymous element of a list identically (not unlike some uses of
``$_`` in Perl); the expectation is that ``x`` will walk the list while
``xs`` dwindles.

But that's not the case here, and ``x`` at least plays such an important
role in the algorithm that it must be worth naming. I'd also lay it
out a bit differently. So here's `quicksort0.hs`_.

.. _quicksort0.hs: http://static.tobold.org/hadvocacy/quicksort0.hs

.. include:: quicksort0.hs
   :literal:

Personally I think this is already a huge improvement. The algorithm is
much clearer (and so is the fatal flaw in this version of it, but more on
that later).

The next idiom to squash is list comprehensions. They're neat, but I have
trouble remembering the syntax, so I rarely use them. In any case, it's nice to
give names to some other important parts of the algorithm: `quicksort1.hs`_.

.. _quicksort1.hs: http://static.tobold.org/hadvocacy/quicksort1.hs

.. include:: quicksort1.hs
   :literal:

I suppose that introduces the idiom of *sections*, but I find them quite
intuitive. The section ``(< pivot)`` is the predicate "less than pivot":
in other words, it is a ``Bool``\ean valued function of one argument.
The section syntax works for almost any binary operator, so ``(+ 1)`` is
the function that adds one to something, and ``(2 -)`` is the function
that subtracts something from two. The one exception is ``(- 2)``, which
Haskell reads as the number negative two; the function that subtracts
two from something is written ``subtract 2``. Sections work really well
in combination with ``map`` and ``foldl``.

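
To make the section idiom concrete, here is a minimal sketch. The
function names (``increment``, ``fromTwo``, ``lessTwo``, ``below``,
``total``) are made up for illustration and don't come from any of the
quicksort files.

```haskell
-- Operator sections used as one-argument functions.
-- All names in this block are illustrative only.

increment :: [Int] -> [Int]
increment = map (+ 1)            -- add one to each element

fromTwo :: [Int] -> [Int]
fromTwo = map (2 -)              -- subtract each element from two

lessTwo :: [Int] -> [Int]
lessTwo = map (subtract 2)       -- subtract two from each element;
                                 -- (- 2) would be the number negative two

below :: Int -> [Int] -> [Int]
below pivot = filter (< pivot)   -- keep the elements less than pivot

total :: [Int] -> Int
total = foldl (+) 0              -- a bare operator in parentheses,
                                 -- the limiting case of a section
```

For example, ``fromTwo [1, 2, 3]`` evaluates to ``[1, 0, -1]``, whereas
``lessTwo [1, 2, 3]`` evaluates to ``[-1, 0, 1]``.
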
Another idiom is *pattern-matching* on function arguments. This is another
good and easy one, but we can do without it. A recovering *Scheme*
programmer might write `quicksort2.hs`_.

.. _quicksort2.hs: http://static.tobold.org/hadvocacy/quicksort2.hs

.. include:: quicksort2.hs
   :literal:

Let's assume we've decided to keep pattern matching. Where else can we
go? The repetition of ``filter ... rest`` suggests we might be missing a
trick, and that trick is ``partition``, lurking just over there in
``Data.List``. That yields `quicksort3.hs`_.

.. _quicksort3.hs: http://static.tobold.org/hadvocacy/quicksort3.hs

.. include:: quicksort3.hs
   :literal:

I think this one is very elegant. I like the fact that ``partition``
guarantees we've got the entire list, and we only use one comparison
operator. In all the preceding examples, if you mistype ``>=`` as ``>``,
then you get a different function (one that sorts but also removes
duplicates). So I claim that ``quicksort3`` is a definite improvement.

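
The duplicate-dropping pitfall is easy to see in a small sketch. The two
definitions below were written for this comparison and are not the
contents of the files above; ``quicksortP`` and ``quicksortTypo`` are
hypothetical names.

```haskell
import Data.List (partition)

-- One-partition sketch: every element of rest lands in exactly one
-- of the two halves, so nothing can be lost.
quicksortP :: Ord a => [a] -> [a]
quicksortP [] = []
quicksortP (pivot : rest) = quicksortP below ++ [pivot] ++ quicksortP above
  where
    (below, above) = partition (< pivot) rest

-- Two-filter sketch with the typo described above: > where >= was meant.
-- Elements equal to the pivot match neither predicate and are dropped.
quicksortTypo :: Ord a => [a] -> [a]
quicksortTypo [] = []
quicksortTypo (pivot : rest) =
  quicksortTypo (filter (< pivot) rest)
    ++ [pivot]
    ++ quicksortTypo (filter (> pivot) rest)
```

With these definitions, ``quicksortP [3, 1, 3, 2, 1]`` gives
``[1, 1, 2, 3, 3]``, while ``quicksortTypo [3, 1, 3, 2, 1]`` gives
``[1, 2, 3]``.
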
You might also suspect that it's more efficient, as the two-``filter``
(and equivalent list comprehension) versions appear to scan the list
twice, comparing each element against ``pivot`` twice: once with ``<``
and once with ``>=``. But you might be surprised: Haskell is better than
you have any reasonable right to expect at making your programs
blazingly fast, no matter how you write them. I suspect that the default
definitions of ``<`` and ``>=`` in the ``Ord`` typeclass (which both
reduce to ``compare``), together with graph reduction, mean that there
is no difference in efficiency between the two-``filter`` and the
one-``partition`` versions. (I will check this in reality and report
back.)

Unfortunately, so far, all these definitions of quick sort are for
pedagogical purposes only, as using the first element of the list for
the pivot is very bad. If the list is already sorted, ``below`` is
always empty, and we have an O(n\ :superscript:`2`) selection sort. The
usual way to avoid this is to select the median of the first, last, and
middle elements of the list. Messing around with these ideas, I came up
with `quicksort4.hs`_.

.. _quicksort4.hs: http://static.tobold.org/hadvocacy/quicksort4.hs

.. include:: quicksort4.hs
   :literal:

This is an unusual *stable* quicksort by virtue of the custom partition
function, inspired by Haskell's ``Ordering`` type, that also extracts
elements equal to the pivot. (I wonder if I've hit on something
significant here, but it's surely such an obvious optimization that
somebody must have thought of it before.)

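
As a rough sketch of the two ideas (a median-of-three pivot, and a
partition that also separates out the elements equal to the pivot),
something along these lines would work. It is written from the
description here and is not the text of ``quicksort4.hs``; the names
``median3``, ``partition3``, and ``quicksort3Way`` are assumptions.

```haskell
-- Median of three values, found with a handful of comparisons.
median3 :: Ord a => a -> a -> a -> a
median3 a b c = max (min a b) (min (max a b) c)

-- Split a list into the elements below, equal to, and above the pivot,
-- preserving the original order within each group.
partition3 :: Ord a => a -> [a] -> ([a], [a], [a])
partition3 pivot = foldr place ([], [], [])
  where
    place x (lt, eq, gt) = case compare x pivot of
      LT -> (x : lt, eq, gt)
      EQ -> (lt, x : eq, gt)
      GT -> (lt, eq, x : gt)

-- Quicksort with a median-of-three pivot and a three-way partition.
-- The pivot is an element of xs, so eq is never empty and both
-- recursive calls receive strictly shorter lists.
quicksort3Way :: Ord a => [a] -> [a]
quicksort3Way [] = []
quicksort3Way xs@(first : _) = quicksort3Way lt ++ eq ++ quicksort3Way gt
  where
    middle = xs !! (length xs `div` 2)
    pivot = median3 first middle (last xs)
    (lt, eq, gt) = partition3 pivot xs
```

Because equal elements are collected in their original order and never
compared again, the result is stable. Note that ``length``, ``!!``, and
``last`` each walk the list, so choosing the pivot this way costs an
extra pass or so per call.
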
So ``quicksort4`` is, perhaps, a halfway-decent implementation of quick
sort. The median-of-three cheaply improves performance in most cases,
but it can be defeated. Authorities also concur that it is better to
switch to a different sort algorithm when the list is short, which this
version doesn't.

Of course, unless you're a language or library implementor, you really
shouldn't be writing sort functions. There is a perfectly serviceable
``sort`` in ``Data.List``; it would be a poor Haskell implementation if
this weren't *at least* as good as ``quicksort4``.

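
For completeness, here is a minimal sketch of leaning on the library
instead. ``sort``, ``sortBy``, and ``sortOn`` live in ``Data.List``, and
``comparing`` and ``Down`` in ``Data.Ord``; the wrapper names
(``ascending``, ``descending``, ``byLength``) are hypothetical.

```haskell
import Data.List (sort, sortBy, sortOn)
import Data.Ord (Down (..), comparing)

-- Plain ascending sort using the element type's Ord instance.
ascending :: [Int] -> [Int]
ascending = sort

-- Descending sort: Down reverses the ordering used for comparison.
descending :: [Int] -> [Int]
descending = sortOn Down

-- Sort strings by length; the library sort is stable, so strings of
-- equal length keep their original relative order.
byLength :: [String] -> [String]
byLength = sortBy (comparing length)
```
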
If you really need blazing performance, you'll want to move from
standard lists to a data structure with constant-time lookups, such as
``Data.Vector``. There are a bunch of different ``sort`` algorithms
available for ``Vector``\s, including David R. Musser's *introsort*,
which is an optimized quicksort (as used in the C++ standard template
library). It's a slight pain to use, as the result comes in a monad
(your choice of ``IO`` or ``ST``); `introsort.hs`_ shows the idea.

.. _introsort.hs: http://static.tobold.org/hadvocacy/introsort.hs

.. include:: introsort.hs
   :literal: