Sorting Algorithm Runtimes

By Angela Vuong

Purpose:

The goal of this experiment is to measure the runtimes of sorting algorithms. The experiment examines Insertion Sort, Randomized Quicksort, a Quicksort with Insertion Sort Hybrid, and Merge Sort. It also determines the array size below which Insertion Sort is more efficient than Quicksort; this threshold is used in the Quicksort with Insertion Sort Hybrid algorithm.

Introduction:

The running time of an algorithm on a particular input is the number of primitive operations, or steps, executed when the algorithm is run on that input. An algorithm's running time typically depends on the size of its input; for sorting, the natural measure is the number of elements n. The asymptotic running time (asymptotic time complexity) of an algorithm allows us to reason about its running time as a function of its input size; however, this analysis hides constant factors. [2] This experiment examines the following sorting algorithms:

Insertion Sort: O(n²) worst case
Merge Sort: O(n log n) worst case
Randomized Quicksort: O(n log n) expected, O(n²) worst case
Quicksort with Insertion Sort Hybrid: O(n log n) expected

Based on the asymptotic running times, merge sort would guarantee the fastest worst-case runtime for any input size. In this experiment we will measure the actual time it takes to execute these algorithms for different input sizes and determine whether the constants hidden by the asymptotic analysis are relevant in identifying the most efficient algorithm. Based on this empirical analysis, we will create a hybrid of quicksort and insertion sort to try to achieve superior performance for all input sizes.

Hypothesis:

Quicksort is more efficient than insertion sort for sorting large arrays. However, quicksort has greater overhead than insertion sort and is less efficient for sorting smaller arrays. Quicksort combined with insertion sort should therefore be the most efficient algorithm for sorting large arrays.

Experimental Procedure:

Part I. Measure Runtime for All 4 Sorting Algorithms.

The runtimes of the Merge Sort, Insertion Sort, Randomized Quicksort, and Quicksort with Insertion Sort Hybrid algorithms were measured for sorting n elements, where 2¹ ≤ n ≤ 2²⁰ and n was increased by a factor of 2 at each step. The runtime of each algorithm on each array size was measured 20 times. Precision analysis was done for each array size by calculating the average and standard deviation of the runtimes in milliseconds.
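
To make the measurement procedure concrete, here is a minimal sketch of the timing harness just described, assuming Python with time.perf_counter; the sort implementations themselves are placeholders to be substituted:

    import random
    import statistics
    import time

    def time_sort(sort_fn, n, trials=20):
        """Time sort_fn on `trials` random arrays of size n.
        Returns (mean, stdev) of the runtimes in milliseconds."""
        runtimes_ms = []
        for _ in range(trials):
            arr = [random.randint(0, n) for _ in range(n)]
            start = time.perf_counter()
            sort_fn(arr)
            runtimes_ms.append((time.perf_counter() - start) * 1000.0)
        return statistics.mean(runtimes_ms), statistics.stdev(runtimes_ms)

    # Array sizes 2^1 .. 2^20, increasing by a factor of 2 each step.
    for n in (2 ** i for i in range(1, 21)):
        # Substitute any of the four sort implementations for `sorted`.
        mean_ms, sd_ms = time_sort(sorted, n)
        print(f"n = {n:>8}: {mean_ms:.4f} ms ± {sd_ms:.4f} ms")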

Part II. Determining k for Quicksort with Insertion Sort.

Part II determines the array size k below which insertion sort is faster than quicksort. The runtimes of the insertion sort and quicksort algorithms were measured for sorting n elements, where 2² ≤ n ≤ 2¹⁰. There were 100,000 runtime measurements for each array size. The runtime of the Quicksort with Insertion Sort Hybrid was then measured for 2³ ≤ k ≤ 2⁸ and 2² ≤ n ≤ 2²². Precision analysis was done for each array size and each k by calculating the average and standard deviation of the runtimes in milliseconds. All experiments were executed on a Mac laptop running OS X El Capitan, version 10.11.6, with a 2.7 GHz Intel Core i5 processor.
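
Below is a minimal sketch of the hybrid algorithm, including the randomized partition referenced in the Discussion. It is one common formulation, in which subarrays of size at most k are handed to insertion sort; the exact implementation used in the experiment may differ:

    import random

    def insertion_sort(a, lo, hi):
        """Sort a[lo..hi] in place by insertion."""
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def randomized_partition(a, lo, hi):
        """Swap a randomly chosen element into the last index to act
        as the pivot, then partition around it (Lomuto scheme)."""
        r = random.randint(lo, hi)
        a[r], a[hi] = a[hi], a[r]
        pivot = a[hi]
        i = lo - 1
        for j in range(lo, hi):
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]
        return i + 1

    def hybrid_quicksort(a, lo=0, hi=None, k=8):
        """Quicksort subarrays larger than k; insertion-sort the rest."""
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= k:
            insertion_sort(a, lo, hi)
        else:
            p = randomized_partition(a, lo, hi)
            hybrid_quicksort(a, lo, p - 1, k)
            hybrid_quicksort(a, p + 1, hi, k)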

Results:

The runtimes for each algorithm are plotted in a log/log plot as shown in Graph 1. The slope for insertion sort is 2, which is consistent with O(n²). The slope for merge sort is 1, which is consistent with O(n log n). The slope for quicksort is also 1, with a smaller constant than merge sort's. Quicksort and the hybrid with k = 8 had nearly overlapping trends for array sizes up to n = 2²⁰. In Graph 2, the runtimes of insertion sort and quicksort show that insertion sort is more efficient than quicksort at sorting arrays smaller than approximately 103 elements. The k value was therefore calculated from the intersection point of the insertion sort and quicksort curves to be approximately 103 elements.
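
The slopes quoted above can be estimated by a least-squares line fit on the log-transformed data. Here is a minimal sketch using illustrative placeholder data (not the measured runtimes):

    import numpy as np

    n = np.array([2.0 ** i for i in range(1, 21)])
    t = 1e-6 * n ** 2  # stand-in for insertion sort runtimes (ms)

    # Fit log t = slope * log n + intercept.
    slope, intercept = np.polyfit(np.log2(n), np.log2(t), 1)
    print(f"slope ≈ {slope:.2f}")  # ≈ 2 for a quadratic-time algorithm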

[Graph 1 image: Runtimes of Sorting Algorithms]

Graph 1. The average runtimes of the sorting algorithms for increasing array sizes, in a log/log plot.

Table 1. The Average Runtimes and Standard Deviations of the 4 Sorting Algorithms for Each Array Size (n).

The average runtimes of insertion sort and quicksort were measured to determine the array size, the k value, at which quicksort becomes more efficient than insertion sort.

[Graph 2 image: Insertion Sort vs. Quicksort Runtimes for Small Array Sizes]

Graph 2. The average runtimes for insertion sort and quicksort in a log/log plot. The k value, i.e. the intersection point, is at approximately 103.46 elements.

Table 2: The Average Runtimes and Standard Deviations of Quicksort and Insertion Sort in Sorting Small Array Sizes.

[Graph 3 image: k Value Comparison for Quicksort with Insertion Sort Hybrid]

Graph 3. The average runtimes for quicksort with insertion sort at various k values, where 2³ ≤ k ≤ 2⁸.

[Graph 4 images: Quicksort vs. Quicksort with Insertion Sort Hybrid Runtimes; (a) k = 8, (b) k = 103]

Graph 4a-b. The average runtimes for quicksort versus the quicksort with insertion sort hybrid, where (a) k = 8 and (b) k = 103, for array sizes between 2² and 2²⁶.

Discussion:

The best-case running time for insertion sort occurs when the array is already sorted. As each element is visited, the comparison costs a constant amount and no additional operations are needed to place the element in the correct index, giving insertion sort O(n) on an already sorted array. The worst case occurs when the array is in reverse sorted order; the running time can then be expressed as an² + bn + c, for constants a, b, and c, or O(n²). A quadratic running time grows rapidly as n increases, so insertion sort is not recommended for sorting large arrays.[2]
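
To make the two cases concrete, here is a small instrumented sketch (hypothetical, not the experiment's implementation) that counts key comparisons on sorted and reverse-sorted inputs:

    def insertion_sort_comparisons(a):
        """Insertion sort that returns the number of key comparisons."""
        a = list(a)
        comparisons = 0
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0:
                comparisons += 1  # compare key against a[j]
                if a[j] <= key:
                    break
                a[j + 1] = a[j]  # shift larger element right
                j -= 1
            a[j + 1] = key
        return comparisons

    n = 64
    print(insertion_sort_comparisons(range(n)))         # sorted: n - 1 = 63
    print(insertion_sort_comparisons(range(n, 0, -1)))  # reversed: n(n-1)/2 = 2016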

To determine the running time of quicksort, we need to analyze the running time of partitioning. The worst-case partitioning occurs when the routine produces a subarray with n−1 elements (minus 1 for the pivot) and a subarray with 0 elements. If this unbalanced partitioning occurs at every level of the recursion, the worst-case running time is Θ(n²), which is no better than insertion sort. The best-case partitioning is an even split producing two subarrays of size roughly n/2, where one has at most one more element than the other. The recursion tree then has depth Θ(log n), and O(n) work is performed at each level (placing the pivot element in its correct index). This remains true even with a 9-to-1 proportional split. Therefore, the average running time of quicksort is much closer to the best case of O(n log n).[2] To ensure that all permutations of the input are equally likely, we can randomize the partition by random sampling: an element is chosen at random from the subarray and swapped with the element in the last index to serve as the pivot (see the code in the Experimental Procedure). The expected behavior of both algorithms is shown in Graph 1. Merge sort, in which the array is also split into 2 subarrays, shows a running time of O(n log n). The merge operation is more expensive than the partition operation because it is not done in place; elements are merged into an auxiliary array, which also requires more memory allocation. This greater cost appears as the larger constant of merge sort's trendline.
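
To illustrate the out-of-place merge, here is a minimal top-down merge sort; this is a typical formulation and not necessarily the experiment's implementation:

    def merge_sort(a):
        """Top-down merge sort; returns a new sorted list."""
        if len(a) <= 1:
            return list(a)
        mid = len(a) // 2
        left = merge_sort(a[:mid])
        right = merge_sort(a[mid:])
        # The merge is not in place: elements are copied into `merged`,
        # which is the extra memory traffic discussed above.
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged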

To further optimize quicksort, we can combine it with insertion sort and take advantage of insertion sort's best-case O(n) running time on an already sorted array. Given an array of 8 elements that is already close to sorted, the number of steps for insertion sort is far smaller than the number of steps for quicksort. Since quicksort is much faster for large arrays, the heavy lifting can be done by quicksort on subarrays larger than a certain threshold, leaving the nearly sorted subarrays at or below the threshold to be sorted by insertion sort. The threshold is the array size at which quicksort becomes faster than insertion sort. Focusing on small arrays, we can do a simple calculation of the cost of each algorithm:

Insertion sort: C1 · N · N (eq. 1)

Quicksort: C2 · N · log(N) (eq. 2)

Table 3: Simple Computation of the Cost of Insertion Sort and Quicksort at Different Constants.
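
The figures behind Table 3 can be reproduced directly from eq. 1 and eq. 2. A minimal sketch, assuming C1 = 1 and a base-2 logarithm in eq. 2:

    import math

    print(f"{'N':>4} {'N^2':>6} {'2N·lgN':>8} {'4N·lgN':>8}")
    for N in (2, 4, 8, 16, 32):
        lg = math.log2(N)
        print(f"{N:>4} {N * N:>6} {2 * N * lg:>8.0f} {4 * N * lg:>8.0f}")
    # Crossovers: N^2 = 2·N·lg N at N = 4, and N^2 = 4·N·lg N at N = 16,
    # matching the thresholds discussed below.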

The behavior of both algorithms for two sets of constants C1 and C2 is seen in Table 3. The threshold is at N = 4 for C2 = 2 and at N = 16 for C2 = 4; at those points the cost of insertion sort with C1 = 1 equals the cost of quicksort. The costs of insertion sort and quicksort are within 50% of each other at N = 8, which means k = 8 would be at most 50% worse than a true k value anywhere between 4 and 16. If we take k = 8 to be close to optimal, then we should see the intersection point of insertion sort and quicksort near 8 in Graph 2 and faster runtimes for the hybrid algorithm with k = 8 in Graph 3. However, insertion sort and quicksort intersect at approximately 103 elements, and the runtimes of the hybrid algorithm at the various k values do not show a significant difference in performance over the given range. [3]

When the hybrid algorithm is compared to quicksort in Graph 4a-b, the hybrid with k = 103 shows a greater runtime separation from quicksort and consistently faster performance than the hybrid with k = 8. This discrepancy could be further examined by running the experiment on different computers: actual runtime depends on how fast the computer executes code, caches data, and retrieves data from memory.[4] The last factor is critical for O(n log n) algorithms, since memory accesses are needed to maintain the recursion stack.

The experiment was successful in showing the behavior of three commonly used sorting algorithms. Quicksort is the preferred algorithm for sorting very large arrays: insertion sort's quadratic growth starts to become evident at n = 2⁷, and by n = 2¹¹ quicksort outperforms it by a factor of 10. The performance of quicksort can be enhanced by using insertion sort to sort the subarrays with length at or below a threshold. The optimal threshold was determined to be approximately k = 103 on the computer used in this experiment.

References:

  1. Bentley, Jon. 1999. Programming Pearls (2nd Ed.). USA: ACM Press/Addison-Wesley Publishing Co.

  2. Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms. 2nd ed. The MIT Press.

  3. supercat. n.d. “Why Is the Optimal Cut-Off for Switching from Quicksort to Insertion Sort Machine Dependent?” Computer Science Stack Exchange. https://cs.stackexchange.com/q/37971.

  4. Richerby, David. n.d. “Why Is the Optimal Cut-Off for Switching from Quicksort to Insertion Sort Machine Dependent?” Computer Science Stack Exchange. https://cs.stackexchange.com/q/37962.