Final#
Dave Friedman
CMPSC 465 Data Structures and Algorithms
1 - Dynamic Programming#
You are running a software-development company and employ \(N\) programmers. You have \(M\) ongoing projects and each programmer can only be assigned to work on one of them. Each project will produce a different amount of expected profit depending on how many programmers are assigned to it. The total profit from a project \(p\) with \(n\) programmers assigned is \(P(p, n)\). You want to assign programmers to projects so as to make as much profit as possible.
A
State the function to be optimized, with its inputs and output value, as well as whether this is a maximization or minimization problem.
\(f(p, n)\) represents the maximum profit that can be achieved by assigning \(n\) programmers to the first \(p\) projects where \(N\) represents the total number of programmers available to assign to projects and \(M\) represents the total number of projects that programmers can be assigned to.
This is a maximization problem: the objective is to maximize \(f(M, N)\), which represents the maximum profit that can be achieved by optimally assigning all \(N\) programmers to all \(M\) projects.
B
Define the above function using a recurrence relation.
\(f(p, n) = \max_{k = 0}^n \{ P(p, k) + f(p - 1, n - k) \}\)
where
\(P(p, k)\) is the total profit from assigning \(k\) programmers to project \(p\)
and
\(f(p - 1, n - k)\) is the maximum profit from assigning the remaining \(n - k\) programmers to the first \(p - 1\) projects.
C
State the base case of the above recurrence.
\( \begin{aligned} f(0, n) &= 0 \quad \forall n \ge 0 && \text{programmers assigned to no project produce no profit} \\ f(p, 0) &= 0 \quad \forall p \ge 0 && \text{projects without assigned programmers produce no profit} \\ \end {aligned} \)
D
State the time complexity of the dynamic programming algorithm based on the above recurrence.
\(f(p, n)\) is computed for every \(p\) from \(1\) to \(M\) and for every \(n\) from \(0\) to \(N\)
\( \text{programmers }n \overset{\text{project }p}{ \begin{array}{c|c|c|c|c|c} & 0 & 1 & 2 & \dots & M \\ \hline 0 & 0 & 0 & 0 & \dots & 0 \\ \hline 1 & 0 & f(1, 1) & f(2, 1) & \dots & f(M, 1) \\ \hline 2 & 0 & f(1, 2) & f(2, 2) & \dots & f(M, 2) \\ \hline \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \hline N & 0 & f(1, N) & f(2, N) & \dots & f(M, N) \\ \hline \end {array}} \)
and \(\max_{k = 0}^n\) is computed for each subproblem \((p, n)\).
Therefore, the time complexity of the dynamic programming algorithm based on the above recurrence is pseudo-polynomial:
\(T(M, N) = \underbrace{\Theta(MN)}_{\text{\# subproblems}} \times \underbrace{\Theta(N)}_{\text{per subproblem}} = \Theta(MN^2)\)
Example
\( \begin{aligned} N &= 4 \\ M &= 4 \\ \end {aligned} \)
\( \begin{aligned} f(4, 4) &= \max_{k = 0}^4 \{ P(4, k) + f(3, 4 - k) \} \\ &= \max \{ \cancel{P(4, 0)} + f(3, 4), P(4, 1) + f(3, 3), P(4, 2) + f(3, 2), P(4, 3) + f(3, 1), P(4, 4) + \cancel{f(3, 0)} \} \\ \end {aligned} \)
profit of 0 programmers to project 4 + maximum profit of 4 programmers to first 3 projects
profit of 1 programmers to project 4 + maximum profit of 3 programmers to first 3 projects
profit of 2 programmers to project 4 + maximum profit of 2 programmers to first 3 projects
profit of 3 programmers to project 4 + maximum profit of 1 programmers to first 3 projects
profit of 4 programmers to project 4 + maximum profit of 0 programmers to first 3 projects
\( \begin{aligned} f(3, 4) &= \max_{k = 0}^4 \{ P(3, k) + f(2, 4 - k) \} \\ f(3, 3) &= \max_{k = 0}^3 \{ P(3, k) + f(2, 3 - k) \} \\ f(3, 2) &= \max_{k = 0}^2 \{ P(3, k) + f(2, 2 - k) \} \\ f(3, 1) &= \max_{k = 0}^1 \{ P(3, k) + f(2, 1 - k) \} \\ \end {aligned} \)
2 - Max Flow#
Consider the following flow network.
D ---------------------5-> G ----------------------3-> I
D -10-> C -5-> E I
D --------20-> E ------------------5-> F ----------5-> I
E -15-> A -15-> B -10-> F -15-> H -20-> I
B ---------10-> H
A
Identify the source and sink node in this network.
D is the source node and I is the sink node.
B
Using Ford-Fulkerson, determine the max-flow through this network. Be sure to provide a list of the sequence of \(s\)-\(t\) paths that you check and the bottleneck associated with each one.
(1)
<-0- <-0-
D -5-> G -3-> I
becomes
<-3- <-3-
D -2-> G -0-> I
(2)
<--0- <--0- <--0- <--0- <--0-
D -20-> E -15-> A -15-> B -10-> H -20-> I
becomes
<-10- <-10- <-10- <-10- <-10-
D -10-> E --5-> A --5-> B --0-> H -10-> I
(3)
<-10- <-10- <-10- <--0- <--0- <-10-
D -10-> E --5-> A --5-> B -10-> F -15-> H -10-> I
becomes
<-15- <-15- <-15- <--5- <--5- <-15-
D --5-> E --0-> A --0-> B --5-> F -10-> H --5-> I
(4)
D <-15- E <-0- F <-0- I
D --5-> E -5-> F -5-> I
becomes
D <-20- E <-5- F <-5- I
D --0-> E -0-> F -0-> I
The \(s\)-\(t\) cuts and their bottlenecks are
D -> G -> I 3
D -> E -> A -> B -> H -> I 10
D -> E -> A -> B -> F -> H -> I 5
D -> E -> F -> I 5
and so the max-flow through this network is \(23\).
C
Identify all the min-cuts and critical edges within this flow network.
Critical edges
E -> A
E -> F
G -> I
A -> B
F -> I
Minimum Cuts
P = {C, D, E, G}, Q = {A, B, F, H, I}
P = {A, C, D, E, G}, Q = {B, F, H, I}
3 - Linear Programming#
You are playing a real-time strategy game in which you must allocate resource gatherers to collect three different types of resources: saltpeter, iron, and coal. The ultimate goal is to maximize resource production, but there are a few twists involved. While it is necessary to assign at least 10 gatherers to saltpeter, saltpeter collection is considered to actually reduce overall resource production at a rate of -2 per gatherer. Additionally, you can allocate no more than 50 gatherers to iron and coal combined, and must allocate at least 15 gatherers to coal. Iron collection increases resource production by 3 per gatherer, and coal by 1.
A
Formalize a linear program for solving the above problem using inequalities (not standard form).
\(x_1\) is the number of resource gatherers allocated to collecting saltpeter
\(x_2\) is the number of resource gatherers allocated to collecting iron
\(x_3\) is the number of resource gatherers allocated to collecting coal
\( \max \{ -2x_1 + 3x_2 + x_3 \} \\ \begin{aligned} x_1 &\ge 10 \\ x_2 + x_3 &\le 50 \\ x_3 &\ge 15 \\ x_1, x_2, x_3 &\ge 0 \\ \end {aligned} \)
B
Convert your above formalization into standard form.
\( \min \{ 2x_1 - 3x_2 - x_3 \} \\ \begin{aligned} x_1 - s_1 &= 10 \\ x_2 + x_3 + s_2 &= 50 \\ x_3 - s_3 &= 15 \\ x_1, x_2, x_3, s_1, s_2, s_3 &\ge 0 \\ \end {aligned} \)
C
Does a solution to this linear program exist? Why or why not?
A solution does exist: \(15 \le x_3 \le 50\) requires that \(35 \ge x_2 \ge 0\) given the second constraint \(x_2 + x_3 \le 50\) (along with nonnegativity). Since \(x_2\) increases resource production faster than \(x_3\) we have
\(x_1 = 10, x_3 = 15, x_2 = 35 \implies -2(10) + 3(35) + (15) = 100\)
4 - Huffman Encoding#
Consider a message from an alphabet containing six symbols: \(\Gamma = \{ A, B, C, D, E, F \}\). These symbols each have the following relative frequencies:
\( \begin{aligned} f(A) &= 0.25 \\ f(B) &= 0.10 \\ f(C) &= 0.15 \\ f(D) &= 0.05 \\ f(E) &= 0.10 \\ f(F) &= 0.35 \\ \end {aligned} \)
A
What is the minimum number of bits required per symbol to construct a fixed-length encoding over this alphabet? If you transmit a message containing \(1,000,000\) symbols using this encoding, how many bits do you need to encode the message?
No less than \(3\) bits per symbol are required to construct a fixed-length encoding over \(\Gamma\). An encoding of a message containing \(1,000,000\) symbols thus requires \(3,000,000\) bits.
Here is an encoding scheme in which the only unused bit string is \(111\), which is an arbitrary choice.
\( \begin{aligned} A &= 000 \\ B &= 001 \\ C &= 010 \\ D &= 011 \\ E &= 100 \\ F &= 101 \\ \end {aligned} \)
B
Construct a Huffman encoding over these symbols for a message of \(1,000,000\) symbols. Show each step of the process, including the partial Huffman trees.
Priority Queue, from highest priority--least frequent--to lowest--most frequent--left to right and lexicographic tie breaks
D(0.05), B(0.10), E(0.10), C(0.15), A(0.25), F(0.35)
B+D(0.15)
/ \
D(0.05) B(0.10)
Priority Queue
E(0.10), B+D(0.15), C(0.15), A(0.25), F(0.35)
B+D+E(0.25)
/ \
E(0.10) B+D(0.15)
Priority Queue
C(0.15), A(0.25), B+D+E(0.25), F(0.35)
A+C(0.40)
/ \
C(0.15) A(0.25)
Priority Queue
B+D+E(0.25), F(0.35), A+C(0.40)
B+D+E+F(0.60)
/ \
B+D+E(0.25) F(0.35)
Priority Queue
A+C(0.40), B+D+E+F(0.60)
A+B+C+D+E+F(1)
/ \
A+C(0.40) B+D+E+F(0.60)
Huffman tree:
A+B+C+D+E+F(1)
0/ \1
A+C(0.40) B+D+E+F(0.60)
0/ \1 0/ \1
C(0.15) A(0.25) B+D+E(0.25) F(0.35)
0/ \1
E(0.10) B+D(0.15)
0/ \1
D(0.05) B(0.10)
\( \begin{aligned} f(A) &= 0.25 && A = 01 \\ f(B) &= 0.10 && B = 1011 \\ f(C) &= 0.15 && C = 00 \\ f(D) &= 0.05 && D = 1010 \\ f(E) &= 0.10 && E = 100 \\ f(F) &= 0.35 && F = 11 \\ \end {aligned} \)
C
How many bits are needed to transmit the message with 1,000,000 symbols using your Huffman encoding? Additionally, suppose that for some reason the frequencies of symbols were estimated in error, and it’s actually the case that \(f(D) = 0.35\) and \(f(F) = 0.5\). How many bits would be required in this case, using the Huffman Encoding you determined in part b. In this case, would the fixed-length encoding be better?
Using the variable-length encoding, \(2,400,000\) bits are required to an encode a \(1,000,000\)-symbol message, which is an improvement over the fixed-length encoding. However, if the relative frequencies of symbols \(D\) and \(F\) are swapped then the variable-length encoding requires \(3,000,000\) bits instead, which unfortunately doesn’t make a difference.
\( 2 \times (0.25 \times 10^6) + 4 \times (0.10 \times 10^6) + 2 \times (0.15 \times 10^6) + 4 \times (0.05 \times 10^6) + 3 \times (0.10 \times 10^6) + 2 \times (0.35 \times 10^6) = \\ 2 \times 250,000 + 4 \times 100,000 + 2 \times 150,000 + 4 \times 50,000 + 3 \times 100,000 + 2 \times 350,000 = \\ 500,000 + 400,000 + 300,000 + 200,000 + 300,000 + 700,000 \boxed{= 2,400,000} \)
fA = 0.25; bA = 2
fB = 0.10; bB = 4
fC = 0.15; bC = 2
fD = 0.05; bD = 4
fE = 0.10; bE = 3
fF = 0.35; bF = 2
assert(fA + fB + fC + fD + fE + fF == 1)
print(f'{(1e6 * fA) * bA + (1e6 * fB) * bB + (1e6 * fC) * bC + (1e6 * fD) * bD + (1e6 * fE) * bE + (1e6 * fF) * bF:,.0f} bits')
2,400,000 bits
\( 2 \times (0.25 \times 10^6) + 4 \times (0.10 \times 10^6) + 2 \times (0.15 \times 10^6) + 4 \times (0.35 \times 10^6) + 3 \times (0.10 \times 10^6) + 2 \times (0.05 \times 10^6) = \\ 2 \times 250,000 + 4 \times 100,000 + 2 \times 150,000 + 4 \times 350,000 + 3 \times 100,000 + 2 \times 50,000 = \\ 500,000 + 400,000 + 300,000 + 1,400,000 + 300,000 + 100,000 \boxed{= 3,000,000} \)
fA = 0.25; bA = 2
fB = 0.10; bB = 4
fC = 0.15; bC = 2
fD = 0.35; bD = 4
fE = 0.10; bE = 3
fF = 0.05; bF = 2
assert(fA + fB + fC + fD + fE + fF == 1)
print(f'{(1e6 * fA) * bA + (1e6 * fB) * bB + (1e6 * fC) * bC + (1e6 * fD) * bD + (1e6 * fE) * bE + (1e6 * fF) * bF:,.0f} bits')
3,000,000 bits
5 - Minimum Spanning Tree#
Consider an undirected graph \(G = (V, E)\) with nonnegative edge weights such that \(w(e) \ge 1\) \(\forall e \in E\). Suppose that you have computed a minimum spanning tree for \(G\) as well as the shortest paths to all nodes from a particular node \(s \in V\). Decrease the weights of all edges in the graph such that \(w'(e) = w(e) - 1\) \(\forall e \in E\).
A
Does the MST of \(G\) change under this weight adjustment? Either provide an example of a graph for which it does change, or prove that it cannot change.
No, the MST of \(G\) does not change under this weight adjustment.
B
Do the single-source shortest paths from \(s\) change? Either provide an example of a graph for which it does change, or prove that they cannot change.
Yes, the single-source shortest paths from \(s\) can change. For example:
5
A---------D
| |
2| |2
| |
B---------C
2
SSSP(A -> B) = A -> B 2
SSSP(A -> C) = A -> B -> C 4
SSSP(A -> D) = A -> D 5
4
A---------D
| |
1| |1
| |
B---------C
1
SSSP(A -> B) = A -> B 1
SSSP(A -> C) = A -> B -> C 2
SSSP(A -> D) = A -> B -> C -> D 3