Algorithms II - Sorting & Searching

What are some of the algorithms for sorting and searching?

Searching for an item and sorting a collection of items form an important part of many computer programs. In this lecture, we will analyze four algorithms; two for each of the following common tasks:

  • Searching: finding the position (or index) of a value within a list
  • Sorting: ordering a list of values

Searching

Not a day goes by that we don’t have to search for something: keys, books, food, or our cellphones. Computers have to do the same. There is so much data stored in a computer that whenever a user asks for something, the computer has to search its memory, looking for the data, and finally making it available to the user. Other searches involve looking for an entry in a database, such as looking up your grades in the school’s database. Other search algorithms may search through virtual space, hunting for the best chess move.

Emoji Searching

What if you have to write a program to search a given number in a list of numbers? How will you do it? How will a computer do it?

Linear search (aka sequential search) is the most basic kind of search algorithm. It involves checking each item of the list (beginning to end), until the desired item is found. It compares the desired item with all the items in the list and when the item is matched successfully, it returns the index (or position) of the item in the list, else it returns -1 (item not found).

1. Let index = 0 (the position of the first item in the list)
2. If (desired item = current item), output index of item and stop 
3. Else if (there are more items in the list), go to the next item (index = index + 1), then go to step 2
4. If you get to the end, output -1, card not found.

Linear search is applied on unsorted or unordered lists, and works well when there are fewer items in a list. Linear search gif

We say the complexity (aka speed) of algorithms is based on the size of our input. If we have a list of 9 numbers (shown above), we have to make at most 9 comparisons. If we have 100 items, we have to make at most 100 comparisons. We can graph the speed of the linear search algorithm as a straight line (equation: y = x, where y is the number of comparisons, and x is the number of items in your list).

We’ve considered the linear search algorithm. Can we do better? Yes, if the list of items is sorted.

Binary search is a more efficient search algorithm that relies on the items of the list already being sorted. Binary search gif

1. Pick the middle item (index = number of remaining items / 2)
2. If desired item found, output index and stop
3. Else if desired item < middle item, repeat with left half
4. Else if desired item > middle card, repeat with right half
5. If you get to the end, output -1, item not found, and stop.

With binary search, we perform less comparisons than we did with linear search. Searching through a list of 8 cards, we perform only 3 checks. For a list of 16, we perform at most 4 checks/comparisons. For 32 items, we perform how many comparisons? 5. Do you see a pattern?

The speed of the binary search algorithm is 2# of comparisons = # of items in the list. Or log2(x) = y

In the following animation, we can see that binary search may be more efficient than linear search in most cases. Can you think of a case when linear search beats binary search? In other words, what is the best case and worst case for the linear search algorithm? For binary search?

Binary vs. Linear Searching

Best & Worst Cases

Linear Search

  • Best Case: when the item you’re looking for is the first item of the list (# of comparisons performed = 1).
  • Worst Case: when the item you’re looking for is not in the list or at the end (# of comparisons performed = size of the list).

Binary Search

  • Best Case: when the item you’re looking for is the middle-most item of the list (# of comparisons performed = 1).
  • Worst Case: when the item you’re looking for is not in the list or at the end (# of comparisons performed = size of the list).

Therefore, linear search performs better than binary search when the item you’re looking for is the first item of the list.

Binary Worst Case

Sorting

Sometimes we may need to sort a collection of items in some particular order (e.g., alphabetically, numerically, chronologically). How do we sort? How can computers sort?

To clarify, the values might be integers, words, cards, or other kinds of objects. We will examine two algorithms: The values might be integers, or strings or even other kinds of objects. We will examine two algorithms:

  • Selection sort, which relies on searching and selecting the smallest item
  • Merge sort, which relies on repeated merging of sections of the list that are already sorted

Selection Sort

To order a given list using selection sort, we repeatedly select the smallest item and move it to the beginning of a growing sorted list.

1. Let index = 0 (the beginning of the list)
2. Use linear search to find the smallest value in the unsorted list
3. Put the current smallest value at index
4. Increment the index (index = index + 1)
5. Repeat step 2 on the remaining unsorted items
6. If you run out of unsorted item, then the list is now sorted, and stop

Example: given the following unsorted list, let’s sort it using selection sort.

5, 2, 8, 1, 9
  1. Let’s start at the beginning of the list.
  2. We find the smallest value of the list:
	1. assume 5 is the smallest value.
	2. 2 < 5, therefore 2 is the smallest value.
	3. 8 > 2, 2 is still the smallest value.
	4. 1 < 2, therefore 1 is the smallest value.
	5. 9 > 1, 1 is still the smallest value.
	6. We've reached the end of the list (after 5 comparisons). 
	We can conclude that 1 is the smallest value.
  1. We’ll put 1 at the beginning of our list.
1				||	5, 2, 8, 9
sorted					unsorted
  1. Then we’ll repeat step 2, find the new smallest value:
	1. assume 5 is the smallest value.
	2. 2 < 5, therefore 2 is the smallest value.
	3. 8 > 2, 2 is still the smallest value.
	4. 9 > 2, 2 is still the smallest value.
	5. We've reached the end of the list (after 4 comparisons). 
	We can conclude that 2 is the smallest value.
  1. We put 2 at the next index of our sorted list.
1, 2			||		5, 8, 9
sorted					unsorted
  1. Then we repeat step 2, finding the new smallest value:
	1. assume 5 is the smallest value.
	2. 8 > 5, 5 is still the smallest value.
	3. 9 > 5, 5 is still the smallest value.
	4. We've reached the end of the list (after 3 comparisons). 
	We can conclude that 5 is the smallest value.
  1. We put 5 at the next index of our sorted list.
1, 2, 5			||		8, 9
sorted					unsorted
  1. Repeat step 2, finding the new smallest value:
	1. assume 8 is the smallest value.
	2. 9 > 8, 8 is still the smallest value.
	3. We've reached the end of the list (after 2 comparisons). 
	We can conclude that 8 is the smallest value.
  1. We put 8 at the next index of our sorted list.
1, 2, 5, 8		||		9
sorted					unsorted
  1. Repeat step 2, finding the new smallest value:
	1. assume 9 is the smallest value.
	2. We've reached the end of the list (after 1 comparisons). 
	We can conclude that 9 is the smallest value.
  1. We put 9 at the next index of our sorted list.
1, 2, 5, 8, 9	||		
sorted					unsorted
  1. According to step 6, we have no more items left to sort. Our sorted list is 1, 2, 5, 8, 9.

Let’s analyze the speed of the selection sort algorithm. When we had a list of 5 items, it took 5 + 4 + 3 + 2 + 1 or 15 comparisons.

  • For 6 items, it would take us 6 + 5 + 4 + 3 + 2 + 1 or 21 comparisons.
  • For 7 items, it would take us 7 + 6 + 5 + 4 + 3 + 2 + 1 or 28 comparisons.
  • For 8 items, it would take us 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 or 36 comparisons.
  • For 9 items, it would take us 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 or 45 comparisons.
  • For 10 items, it would take us 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 or 55 comparisons.

Do you see a pattern? Although we may not know the exact equation for the graph, we know the complexity of selection sort grows more and more each time. In fact, it looks a lot like the graph of n2. Exponential Graph

Therefore, we say that the complexity of the selection sort algorithm is exponential.

Merge Sort

When we use merge sort to order a list, we repeatedly merge sorted sub-lists of the entire list – starting from sub-lists consisting of a single item each. In other words, we sort by dividing and conquering.

1. Divide the unsorted list into sublists, each containing 1 element (a list of 1 element is considered sorted).
2. Merge sublists to produce new sorted sublists 
3. Stop when you've merged all sublists back into a single list. This will be the sorted list.

Example: let’s sort the list 14, 7, 3, 12, 9, 11, 6, 12 using merge sort. Merge Sort Example

Merge sort is quite fast and has a complexity of n log n where n is the size of the list. This is because whenever we divide a list in half in each step, it can be represented by a logarithmic function (log28 = 3 because 23 is 8) and we still have to split into n sublists.

Complexity

As a summary of the complexities of the algorithms we’ve reviewed, where n is the size of the list

  • linear search: average case is n, best is 1 (first item = desired item), and worst is n (desired item not in list)
  • binary search: average case is log n, best is 1 (middle item = desired item), and worst is log n (desired item not in list)
  • selection sort: average case is n2, best is n2, and worst is n2
  • selection sort: average case is n log n, best is n log n, and worst is n log n

The graphs may look like this: Complexity Graphs

Additional Resources


© 2020. Some rights reserved.