Midterm

Midterm#

Dave Friedman
CMPSC 465 Data Structures and Algorithms

1#

Consider the following sequence of integers.

1  2  3  4  5  6  7  8  9  10  11  12

1

Draw the binary search tree that would result from inserting these numbers in the order they are listed. Is this the worst-case insertion order in terms of tree height?

The integers are ordered such that each subsequent integer is greater than the last (if the order was such that each subsequent element were less than the last, the following would still hold). This is the worst-case insertion order and produces an unbalanced tree with a height linear in the number of integers in the sequence.

\(\text{height} = n\)

2

List a different insertion order which results in a better balanced (i.e., shorter) tree. State the order and draw the tree.

8  4  11  2  6  10  12  1  3  5  7  9

           8
      /          \
     4           11
   /   \        /  \
  2     6     10    12
 / \   / \    /
1   3 5   7  9

3

Devise an algorithm for preprocessing an arbitrary array of numbers with no worse than linear cost which will help to avoid the worst-case scenario for BST construction most of the time. It doesn’t need to provide any strong guarantees, but justify why you think the algorithm will usually avoid the worst case for any arbitrary input, and prove its complexity.

One approach might be to implement a random shuffle, or a random permutation, of the array. For a random shuffle, scan over the length of the array. For element \(i\) choose a random index \(j\) and then \(\text{swap}(A[i], A[j])\). A random permutation would simply exclude previously swapped elements from the choice of random index. A true random choice would yield an order that is unlikely to be sorted.

Another approach might be to incorporate the linear-cost median of medians algorithm to select an approximate median from the array and then swap it with the first element. This would have the effect of distributing about half the elements on one side of the root of the BST and the remaining nodes on the other side. In the worst case, there would be two linked lists of approximately half size.

4

What is the minimum possible height for a BST with these numbers? Draw a valid BST using these numbers that is of this height. Define the “height” of the tree to be the maximum depth of any one leaf node.

In the best-case scenario, the integers would be ordered in such a way as to produce a balanced binary tree with a height logarithmic in the number of integers.

\(\text{height} = \lg n\)

In this case, the minimum possible height is \(4\).

8  4  11  2  6  10  12  1  3  5  7  9

           8
      /          \
     4           11
   /   \        /  \
  2     6     10    12
 / \   / \    /
1   3 5   7  9

2#

Consider the following weighted directed graph.

A -> (F,  8)
B -> (C, -2), (D, -3)
C -> (A,  5)
D -> (A,  5), (E,  3)
E -> (G,  9)
F -> (C,  2)
G -> (A,  2), (E,  1), (F, 4)

1

Run Dijkstra’s algorithm starting on vertex B and provide the single-source shortest path distances from B to all other vertices in the graph. State the order in which vertices were visited.

(B,  0) <- (C, -2+0), pop(D, -3+0)
(D, -3) <- pop(C, -2), (A, 5-3), (E, 3-3)
(C, -2) <- (A, 2), pop(E, 0), (A, 5-2)
(E,  0) <- pop(A, 2), (A, 3), (G, 9+0)
(A,  2) <- skip(A, 3), (G, 9), pop(F, 8)
(F,  8) <- pop(G, 9), (C, 2+8)
(G,  9) <- (C, 10), (A, 2+9), (E, 1+9), (F, 4+9)

B -> D -> C -> E -> A -> F -> G

2

Technically speaking, the definition of the single-source shortest path problem doesn’t require that a given edge is only traversed one time by the shortest path between two vertices. However, all of our algorithms assume that this is the case. Prove that on a graph with nonnegative edge weights the shortest path between any two vertices will never include the same edge more than once.

Since SSSP is greedy, on each node traversal the node along the edge with the least weight is always visited before any other adjacent node, and the two nodes of the edge are marked as visited. Therefore, at a given time the path at the current node is always the shortest path from the start node up until that point, and the associated nodes of that path have already been visited and so won’t be traversed from now.

3

Consider a structure like the following within a weighted digraph.

A -> (B,   2)
B -> (C,   3)
C -> (A, -10)

Such a structure is called a negative cycle. Is it possible to define the shortest path between two vertices in a graph containing a negative cycle like this one, given a definition of shortest path that doesn’t exclude repeated traversals of an edge? Why or why not?

No, it is not possible to define the shortest path between two vertices in a graph containing a negative cycle given a definition of shortest path that doesn’t exclude repeated traversals of an edge. If a negative cycle is part of a path between two vertices then the negative cycle can be traversed indefinitely. If the negative-weighted edge is greater than the sum of the other edge weights in the cycle, the path weight tends towards negative infinity, and thus there is no shortest path.

3#

Given an array of arbitrary integers of length \(n\), we want to determine the maximum sum of a contiguous sequence of numbers within the array. For example, in the array

5  -7  5  6  -1  2  -10  2

the answer is \(12 = 5 + 6 - 1 + 2\).

1

For an array that is already known to contain only positive integers, provide a linear time algorithm for answering this question and state its complexity.

If an array contains only positive integers then the maximum sum of contiguous values is just the sum of all the values in the array.

 INPUT    an array A of n values
OUTPUT    the sum of the values in A

SUM_ARRAY(A, n)
1    sum = 0
2    for i = 0 to n - 1
3        sum = sum + A[i]
4    return sum

This algorithm runs in \(\Theta(n)\) time.

2

Devise a divide-and-conquer algorithm for solving this problem in the general case (positive and negative integers) and prove its time complexity bounds. No points will be awarded for an algorithm that does not use the divide and conquer approach, even if it is correct.

 INPUT    an array A of n values, the index of the lefthand element, the index of the righthand element
OUTPUT    the maximum sum of contiguous values in A

MAX_SUM_CONTIGUOUS (A, l, r)
  if l == r                              // base case, an array of just one element
      return A[l]
 3
  m = l + (r - l) / 2                    // compute the index of the middle element
  lMSC = MAX_SUM_CONTIGUOUS(A, l,     m) // sub problem of size n/2, lefthand  sub array
  rMSC = MAX_SUM_CONTIGUOUS(A, m + 1, r) // sub problem of size n/2, righthand sub array
 4
  // combine - find the maximum contiguous sum across the lefthand and righthand sides
  lmax = -infty
  lsum = 0
  for i = m to l                         // m - l + 1
      lsum += A[i]
      if  lsum > lmax
          lmax = lsum
12
  rmax = -infty
  rsum = 0
  for i = m + 1 to r                     // r - m
      rsum += A[i]
      if  rsum > rmax
          rmax = rsum
19
  return max(lMSC, rMSC, lmax + rmax)

The problem is divided into two sub problems of size \(n/2\).

The combine step computes the maximum contiguous sum across the lefthand and righthand sides and runs in time linear in the size of the subarray:

\( \begin{aligned} \text{lefthand} &= m - l + 1 \\ \text{righthand} &= r - m \\ \text{lefthand} + \text{righthand} &= m - l + 1 + r - m = r - l + 1 = \Theta(n) \end {aligned} \)

Therefore the total time complexity can be expressed as the recurrence relation

\(T(n) = 2T(n/2) + \Theta(n)\)

By the master theorem, this yields a time complexity of \(\Theta(n \log n)\)

\( \begin{aligned} a &= 2 \\ b &= 2 \\ d &= 1 \\ \end {aligned} \implies d = \log_b a \)

\(n = 2^k \iff k = \lg n\)

\( \begin{aligned} T(2^k) &= \Theta(n) + 2T(2^{k - 1}) \\ &= \Theta(n) + \Theta(n) + 2T(2^{k - 2}) \\ &= \Theta(n) + \Theta(n) + \Theta(n) + 2T(2^{k - 3}) \\ &= \dots \\ &= \underbrace{\Theta(n) + \Theta(n) + \Theta(n) + \Theta(n) + \dotsb + n}_{k \,\,\,\text{times}} \\ &= \Theta(n \log n) \end {aligned} \)

5  -7  5  6  -1  2  -10  2    l = 0, r = 7, m = 3, lMSC =  11, rMSC =  4, lmax = 11, rmax = 1, lmax+rmax = 12

5  -7  5  6                   l = 0, r = 3, m = 1, lMSC =   5, rMSC = 11,                      lmax+rmax =  9

5  -7                         l = 0, r = 1, m = 0, lMSC =   5, rMSC = -7, lmax = 5, rmax = -7, lmax+rmax = -2

       5  6                   l = 2, r = 3, m = 2, lMSC =   5, rMSC =  6, lmax = 5, rmax =  6, lmax+rmax = 11

             -1  2  -10  2    l = 4, r = 7, m = 5, lMSC =   2, rMSC =  2, lmax = 2, rmax =  2, lmax+rmax =  4

             -1  2            l = 4, r = 5, m = 4, lMSC =  -1, rMSC =  2

                    -10  2    l = 6, r = 7, m = 6, lMSC = -10, rMSC =  2

4#

Consider a set of three buckets: an 11 gallon bucket, a 4 gallon bucket, and a 5 gallon bucket. We begin with the 4 and 5 gallon buckets full, and the 11 gallon bucket empty. We are allowed to apply a single operation to these buckets: pour the contents of a bucket into another bucket until either the source bucket is empty or the target bucket is full.

1

Generally speaking, what are the states related to the above system, and the transitions between them?

The states of the system consist of the distribution of the nine gallons of water across the three buckets at a given time, and the transitions between these states consist of the operation of pouring water from one of the buckets into one of the other two buckets until the pouree is full or the pourer is empty.

2

Define a graph which models the above states and transitions. Provide specific definitions of the vertex and edge sets. Remember that a graph vertex can contain any information you like–you aren’t restricted to storing a single number.

The vertex set \(V\) consists of triples \((x, y, z)\) which represent the current values of the three buckets of capacity \(11, 4, 5\) respectively, such that

\( \begin{aligned} 0 && \le && x && \le && 11 \\ 0 && \le && y && \le && 4 \\ 0 && \le && z && \le && 5 \\ \end {aligned} \)

and such that \(x + y + z = 9\) (in other words, there are always \(9\) gallons of water distributed across the three buckets at a given time). Further, the pouring operation enforces a further constraint, such that not every triple that meets the prior criteria is included in the vertex set, since not every combination of values can be ge generated from the total volume of water, \(9\) gallons.

4,5,0
4,4,1
4,3,2
4,2,3
4,1,4
4,0,5

3,5,1
3,4,2 not possible
3,3,3 not possible
3,2,4 not possible
3,1,5 not possible
3,0,6

2,5,2
2,4,3 not possible
2,3,4 not possible
2,2,5 not possible
2,1,6 not possible
2,0,7

1,5,3
1,4,4 not possible
1,3,5 not possible
1,2,6 not possible
1,1,7 not possible
1,0,8

0,5,4
0,4,5
0,3,6
0,2,7
0,1,8
0,0,9

The edge set \(E = \{ ((x, y, z), (x', y', z'))\}\) where the triple \((x, y, z)\) is the set of buckets prior to a pour and the triple \((x', y', z')\) is the set of buckets after a pour, such that for some \(u = x, y, or z\) and some \(v = x, y, z, v \ne u\), \(u' = u - \min\{u, C_{v} - v\}\) and \(v' = v + \min\{u, C_{v} - v\}\).

3

We’d like to devise an algorithm that would allow us to determine if it is possible to construct a sequence of pourings, from the given initial state, that would leave x gallons of water in the bucket of C capacity where \(x \le C\). Using the graph defined above, what property of the graph would be present in the case that a solution exists to this problem? Devise an algorithm that would allow you to determine if that property holds, and state its complexity.

When a solution exists to the defined problem, the graph contains a path from the start node that represents the initial state of the system to the node that represents the state in which the specified bucket \(C = 4, 5, \,\text{or}\, 9\) has the value \(x\).

Therefore, an unweighted graph traversal algorithm such as BFS or DFS that runs in \(\Theta(|V| + |E|)\) time will do the trick: if the unweighted graph traversal algorithm reaches the node where bucket \(C\) has the value \(x \le C\) then the answer is in the affirmative. If the unweighted graph traversal algorithm terminates prior to finding the desired node, then the node that represents the requisted state is not connected to the starting node and the answer is in the negative.

 INPUT    a graph G = (V, E), an integer C designating a particular bucket via its capacity, and an integer x designating the current amount of water in that bucket
OUTPUT    a boolean value 1 upon success or -1 upon failure

IS_SEQUENCE_OF_POURS (G, C, x)

4-5-0 -> 0-5-4, 4-0-5
0-5-4 -> 4-1-4, 0-0-9, 4-5-0
4-0-5 -> 0-4-5, 0-0-9, 4-5-0
4-1-4 -> 0-5-4, 0-1-8, 4-0-5, 4-5-0
0-0-9 -> 4-0-5, 0-5-4
0-4-5 -> 4-0-5, 0-0-9, 4-4-1, 0-5-4
0-1-8 -> 1-0-8, 4-1-4, 0-5-4
1-0-8 -> 0-1-8, 0-0-9, 4-0-5, 1-5-3
4-4-1 -> 3-5-1, 0-4-5, 4-0-5, 4-5-0
3-5-1 -> 0-5-4, 4-4-1, 3-0-6, 4-5-0
3-0-6 -> 0-3-6, 0-0-9, 4-0-5, 3-5-1
0-3-6 -> 3-0-6, 0-0-9, 4-3-2, 0-5-4
4-3-2 -> 2-5-2, 0-3-6, 4-0-5, 4-5-0
2-5-2 -> 0-5-4, 4-3-2, 2-0-7, 4-5-0
2-0-7 -> 0-2-7, 0-0-9, 4-0-5, 2-5-2
0-2-7 -> 2-0-7, 0-0-9, 4-2-3, 0-5-4
4-2-3 -> 1-5-3, 0-2-7, 4-0-5, 4-5-0
1-5-3 -> 0-5-4, 4-2-3, 4-5-0

Midterm

Contents

Midterm#

1#

2#

3#

4#