Randomizing by Random-Comparison Sorting (Revisited)
Yesterday, I posted the results of my quick exploration of whether sorting the list {0,1,2,3,4} using a comparison function that randomly returns < or > (with equal probability). My exploration was prompted by a report on the non-uniformity of the distribution of the random orderings of the browsers in Microsoft’s EU browser ballot. I had said that it seemed likely that the distribution would vary based on the sorting algorithm used.
Today, I have data (and code) that confirms the distribution is sorting-algorithm-dependent. For each sorting algorithm, 1,000,000 instances of the list {0,1,2,3,4} were sorted with a random comparison function and the relative frequencies (rounded to the nearest whole percent) of each number in each position were computed.
Mathematica’s Sort[] |
position/number |
0 |
1 |
2 |
3 |
4 |
first |
18% |
12% |
12% |
12% |
46% |
second |
18% |
24% |
18% |
18% |
24% |
third |
20% |
20% |
26% |
20% |
12% |
fourth |
22% |
22% |
22% |
28% |
6% |
fifth |
22% |
22% |
22% |
22% |
12% |
|
BubbleSort |
position/number |
0 |
1 |
2 |
3 |
4 |
first |
36% |
28% |
20% |
10% |
6% |
second |
28% |
32% |
22% |
12% |
6% |
third |
20% |
22% |
32% |
18% |
10% |
fourth |
12% |
12% |
18% |
38% |
20% |
fifth |
6% |
6% |
10% |
20% |
60% |
|
QuickSort (random pivot) |
position/number |
0 |
1 |
2 |
3 |
4 |
first |
20% |
20% |
20% |
20% |
20% |
second |
20% |
20% |
20% |
20% |
20% |
third |
20% |
20% |
20% |
20% |
20% |
fourth |
20% |
20% |
20% |
20% |
20% |
fifth |
20% |
20% |
20% |
20% |
20% |
|
MergeSort |
position/number |
0 |
1 |
2 |
3 |
4 |
first |
24% |
24% |
26% |
12% |
12% |
second |
26% |
24% |
18% |
16% |
16% |
third |
18% |
18% |
22% |
20% |
20% |
fourth |
16% |
16% |
18% |
26% |
26% |
fifth |
16% |
16% |
18% |
26% |
26% |
|
SelectionSort |
position/number |
0 |
1 |
2 |
3 |
4 |
first |
6% |
6% |
12% |
26% |
50% |
second |
12% |
12% |
20% |
32% |
24% |
third |
20% |
20% |
26% |
20% |
12% |
fourth |
30% |
30% |
20% |
12% |
6% |
fifth |
30% |
30% |
20% |
12% |
6% |
|
The distributions are significantly different among these sorts. QuickSort appears to provide a uniform distribution. I believe that this is because QuickSort will only compare a particular pair of elements once, whereas each of the other sorting algorithms may compare a given pair of elements more than once (and with a random comparison function, receive a different result from one time to the next).
Here is the Mathematica notebook I used to generate this data: Randomize by Sorting.nb. As noted in the file, some of the code for the sorting algorithms was taken from other locations and may be/is subject to their copyrights and/or license terms (I reasonably believe that this use complies with their licenses and/or constitutes fair use. Also, some algorithms exhibited improper behavior when trying to sort lists with duplicate entries using a normal comparison function as noted in the file, though this should have no effect on the data above.
Trackbacks & Pingbacks 1
[...] is highly dependent on the sorting algorithm used, though I cannot readily verify it [edit: I verified it]. Regardless, this seems to be a very poor way to generate a random ordering. Share and [...]