I like that I spent a bunch of time getting this one program to perform well under threaded workloads, because it used to run on old Pentium Ds and early Core CPUs. Now that it's on modern Xeons, it barely touches 10% CPU and is I/O bound on a 3x striped array.