Monday, June 27, 2016

LeetCode OJ - Water and Jug Problem

Problem:

Analysis:

The key to this problem is number theory. Consider the following sum

$ax + by = z$ for $a,b \in \mathbf{Z}$

We can express $x$ and $y$ by multiple of there GCD $d$, so we have

$apd + bqd = (ap + bq)d = z$ for $a,b,p,q \in \mathbf{Z}$, so we conclude $z$ has to be an integer (could be 0) multiple of $d$.

Next, we know the total jug volume is $x + y$, so if $z > x + y$, there is no way we can achieve $z$.

The remaining is the interesting part. We claim that for all $k \in [0, p+q]$, $z = kd$ is achievable.

Without loss of generality, we assume $a > b$ .Consider the pouring strategy as follow:

1. Fill b
2. Pour to a until it is full
3. If a is full, empty it, pour any remainder to a
4. Go back to 1
The strategy is depicted in the diagram below:

Consider $a = 6$, $b = 4$, we have reached from (0, 0) to (0, 4) by step 1, then we reach (4, 0) by step 2, step 3 is skipped, then we reach (4, 4) by step 1, (6, 2) by step 2, (2,0) by step 3, and so on ...

The strategy essentially produce us, at the end of step 3 (skipped or not), $kb (\mod a)$. As per elementary number theory, we know these numbers will run through all the multiple of $d$ at most $a$, so we have proved that $kd$ is achievable for all $k \in [0, p]$.

But how about large multiple? For large multiple we would first achieve $kd - b$, in that case it must be a small multiple, then we just fill the jug b to achieve it.

Solution:

The analysis above shown a pretty simple criterion what is achievable. Writing the code to check that is relatively easy. Care is taken to make sure we deal with 0 correctly.

Code:

#include "stdafx.h"

// https://leetcode.com/problems/water-and-jug-problem/

#include "LEET_WATER_AND_JUG_PROBLEM.h"
#include <map>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <set>
#include <queue>
#include <string>

using namespace std;

namespace _LEET_WATER_AND_JUG_PROBLEM
{
class Solution
{
public:
int gcd(int x, int y)
{
if (y > x)
{
return gcd(y, x);
}
else if (y == 0)
{
return x;
}
else
{
return gcd(y, x % y);
}
}
bool canMeasureWater(int x, int y, int z)
{
if (x == 0)
{
return z == 0 || z == y;
}
else if (y == 0)
{
return z == 0 || z == x;
}
else if (z > x + y)
{
return false;
}
else
{
return (z % gcd(x, y)) == 0;
}
}
};
};

using namespace _LEET_WATER_AND_JUG_PROBLEM;

int LEET_WATER_AND_JUG_PROBLEM()
{
Solution solution;
cout << solution.canMeasureWater(104579, 104593, 12444) << endl;
return 0;
}

Sunday, June 26, 2016

LeetCode OJ - Valid Perfect Square

Problem:

Analysis:

It is really easy to write a solution based on the identity

$n^2 = \sum\limits_{i=1}^{n}{(2i - 1)}$

Just subtract the odd numbers one by one, if we reach 0, then it is a perfect square. If it is negative, then it is not.

The complexity of the solution above is $O(\sqrt{n})$, can we do better?

Solution:

We can binary search for the square root. That requires multiplication, but only a handful of them because $O(\log n)$ is really small. Care is taken in the code to make sure we do not overflow.

Code:

#include "stdafx.h"

// https://leetcode.com/problems/valid-perfect-square/

#include "LEET_VALID_PERFECT_SQUARE.h"
#include <map>
#include <iostream>
#include <sstream>
#include <vector>
#include <string>

using namespace std;

namespace _LEET_VALID_PERFECT_SQUARE
{
class Solution
{
public:
bool isPerfectSquare(int num)
{
if (num == 0) { return true; }
if (num == 1) { return true; }
int exclusiveLowerBound = 0;
int exclusiveUpperBound = num;
while (exclusiveUpperBound - exclusiveLowerBound > 1)
{
// Note: this will never overflow
// This will also always return a value strictly between the two bounds
int mid = (exclusiveLowerBound + exclusiveUpperBound) / 2;

// Squaring could overflow, so using 64 bit arithmetic here
long long midl = (long long)mid;
long long square = midl * midl;
if (square < num)
{
exclusiveLowerBound = mid;
}
else if (square > num)
{
exclusiveUpperBound = mid;
}
else
{
return true;
}
}

return false;
}
};
};

using namespace _LEET_VALID_PERFECT_SQUARE;

int LEET_VALID_PERFECT_SQUARE()
{
Solution s;
cout << s.isPerfectSquare(899 * 899) << endl;
return 0;
}

Saturday, June 25, 2016

Coursera Approximation Algorithms Part I - Peer Assignment 5

Problem:

Solution:

Question 1

We had a lot of notations, but after we understand all the notations, this one is easy.

When $i = 1$, the left hand side is $w(P_0)$, $P_0$ is simply $V$, the set of all vertexes. $w(P_0)$ is the sum of all edge weights that cross partition. But there is no partitioning yet. So the left hand side is simply 0.

The right hand side is $\sum_{j=1}^{i-1}(w(\Gamma_j))$, the sum is simply empty and therefore it is also 0.

So we have $w(P_0) = 0 \le 0 = \sum_{j=1}^{i-1}(w(\Gamma_j))$

Question 2

In every iteration, we break a graph into two, so we have 1 graph in the 0th iteration, 2 graphs in the 1st iteration, and so on. Therefore, for $i \ge 2$, we have $i - 1$ graph in the $i - 2$ iteration.

Question 3

In $P_{i-2}$, we have $i - 1$ graphs. In $\Gamma$, we have $i$ graphs. Suppose we pick a part in $\Gamma$, pick one terminal in it, and dump the corresponding partition in $P_{i-2}$, eventually, we will have problem finding the corresponding graph (you can at most dump $i - 1$ times in $P_{i-2}$, but we have to dump $i$ times in $\Gamma$). In that moment, we found the pair of terminal that is separated in $\Gamma$ but not in $P_{i-2}$.

Question 4

Notice that $B \cap \Gamma_h$ and $B \backslash \Gamma_h$ is a cut of B. The algorithm picks the best minimum cut out of all possible cuts, so we have the cost of the best possible cut is less than the cost of the particular cut. In other words

$w(P_{i-1}) - w(P_{i-2}) = cost(\text{best possible cut}) \le cost(B \cap \Gamma_h$ and $B \backslash \Gamma_h)$

Question 5

In the above, we know that $w(P_{i-1}) - w(P_{i-2}) = cost(\text{best possible cut}) \le cost(B \cap \Gamma_h$ and $B \backslash \Gamma_h)$.

We also know that $cost(B \cap \Gamma_h$ and $B \backslash \Gamma_h) \le w(\Gamma_h)$.

So we take the sum of $w(P_{i-1}) - w(P_{i-2}) \le w(\Gamma_h)$ with the inductive hypothesis, we get the inequality we wanted and therefore we proved the lemma by induction.

Question 6

By the lecture, we know $\sum w(\Gamma) <2 OPT$, so we have a 2-approximation algorithm.

Question 7

At the end of the algorithm, the while loop terminates, there are $k$ graphs. Each of the graph has at least one terminal by construction. There are only $k$ terminals. So we have one terminal in each of the graph, therefore the solution is a valid multi-way cut.

Question 8

The obvious way to find a minimum cut separating any two vertices is to try all pairs. Assuming there are $p$ terminals in $G_i$, the cost would be $\frac{p(p-1)}{2}T$.

Question 9

There are at most $k$ graph in $P$. Each graph has at most $n - k$ vertices.

Question 10

The worst case of the algorithm happens when in every iteration, we isolated exactly one terminal as a graph and add to $P$. The first iteration would need $\frac{k(k-1)}{2} T$ , the second iteration would need $\frac{(k-1)(k-2)}{2} T$, and so on.

The overall complexity is $\sum\limits_{i = 1}^{k-1} \frac{i(i+1)}{2}T$

We can simplify the sum as

$\sum\limits_{i = 1}^{k-1} \frac{i(i+1)}{2}T$
$= \frac{T}{2}\sum\limits_{i = 1}^{k-1} i(i+1)$
$= \frac{T}{2}\sum\limits_{i = 1}^{k-1} (i^2 + i)$
$= \frac{T}{2}(\frac{(k-1)(k)(2(k-1)+1)}{6} + \frac{(k-1)(k)}{2})$
$= \frac{T}{2}(\frac{(k-1)(k)(2k-1)}{6} + \frac{3(k-1)(k)}{6})$
$= \frac{T}{2}(\frac{(k-1)(k)(2k+2)}{6})$
$= \frac{T}{2}(\frac{(k-1)(k)(k+1)}{3})$
$= \frac{(k-1)(k)(k+1)T}{6}$

If we like, we can also express that as $O(k^3T)$.

Friday, June 24, 2016

LeetCode OJ - Count Numbers with Unique Digits

Problem:

Analysis:

It is easy to see there if we have more than 10 digits, then it is impossible for us to not have a repeated digit. So the program can only accept 1 - 10 as input, for any other input we just return 0.

We can think of the the numbers be arranged in a tree like this:

The root is an empty string that we don't care. Each node has a unique path back to the root, and that path (from top to bottom) represents a number. By carefully crafting the tree, we can be sure that all numbers with unique digits are covered, so the remaining is just counting the number of nodes.

Note that the number 0 is not there, and we can associate that with the root node.

Solution:

The number of nodes in the root layer is 1
The number of nodes in the second layer is 9
The number of nodes in the third layer is 9 x 8
... and so on ...

So it is easy to see the answer is $1 + 9 + 9 \times 8 + \cdots$

To make the submission super quick, we can pre-compute the answers (after all, there are only 10 answers, so we can compactly switch it)

Code:

#include "stdafx.h"

// https://leetcode.com/problems/count-numbers-with-unique-digits/

#include "LEET_COUNT_NUMBERS_WITH_UNIQUE_DIGITS.h"
#include <map>
#include <iostream>
#include <sstream>
#include <set>
#include <string>

using namespace std;

namespace _LEET_COUNT_NUMBERS_WITH_UNIQUE_DIGITS
{
class Solution
{
public:
int countNumbersWithUniqueDigits(int n)
{
switch (n)
{
case 0: return 1;
case 1: return 10;
case 2: return 91;
case 3: return 739;
case 4: return 5275;
case 5: return 32491;
case 6: return 168571;
case 7: return 712891;
case 8: return 2345851;
case 9: return 5611771;
case 10: return 8877691;
default: return 0;
}
}
};
};

using namespace _LEET_COUNT_NUMBERS_WITH_UNIQUE_DIGITS;

int LEET_COUNT_NUMBERS_WITH_UNIQUE_DIGITS()
{
Solution s;
cout << s.countNumbersWithUniqueDigits(3) << endl;
return 0;
}

Saturday, June 18, 2016

Coursera Approximation Algorithms Part I - Peer Assignment 4 - Question 2

Problem:

Solution:

(Part 1)

This probability is the same as drawing one particular coin from 10 coins, that would be $\frac{1}{10}$.

(Part 2)

Let $X_i$ be the indicator variable that the i friend get back his own coin. In the above, we already show that $P[X_i = 1] = \frac{1}{10}$.

The expected number of people getting their own coin is simply $E[\sum\limits_{i = 1}^{10} X_i] = \sum\limits_{i = 1}^{10} E[X_i] = \sum\limits_{i = 1}^{10} P[X_i = 1] = \sum\limits_{i = 1}^{10} \frac{1}{10} = 1$.

(Part 3)

Let $Y_i$ be the indicator variable that the i friend get back his own coin in the buy coffee case. Let's us imagine we simply replace the spent coin with a phantom coin and do the experiment as above again. Now everyone have the same chance of getting the phantom coin, and this time around $P[Y_i = 1] = \frac{1}{10} \times \frac {9}{10}$ because you need to get your own coin and the coin is not phantom.

The rest just follows exactly like above and the expected number of people getting their own coin is $\frac{9}{10}$

Again, to verify the expectation, I wrote this simple Monte Carlos simulation to prove that the bound is indeed the case.

using System;
using System.Linq;

class Program
{
static void Main(string[] args)
{
Random random = new Random();
int NUM_EXPERIMENT = 1000000;
int d = 0;
for (int e = 0; e < NUM_EXPERIMENT; e++)
{
int c = 0;
int unlucky = random.Next(10);
int[] a = Enumerable.Range(0, 10).ToArray();
for (int i = 0; i < 9; i++)
{
int j = random.Next(10 - i) + i;
int temp = a[i];
a[i] = a[j];
a[j] = temp;
}
for (int i = 0; i < 10; i++)
{
if (a[i] == i && i != unlucky)
{
c++;
}
}
d += c;
}
Console.WriteLine((d + 0.0) / NUM_EXPERIMENT);
}
}

Friday, June 17, 2016

Coursera Approximation Algorithms Part I - Peer Assignment 4 - Question 1

Problem:

Solution:

(Part 1)

With probability $\frac{1}{2}$, $n_{t + 1} = \frac{9}{8}n_t$, so $n_t - n_{t+1} = n_t - \frac{9}{8}n_t = -\frac{1}{8}n_t$
With probability $\frac{1}{4}$, $n_{t + 1} = n_t$, so $n_t - n_{t+1} = n_t - n_t = 0$
With probability $\frac{1}{4}$, $n_{t + 1} = \frac{1}{2}n_t$, so $n_t - n_{t+1} = n_t - \frac{1}{2}n_t = \frac{1}{2}n_t$

Therefore $E[n_t - n_{t-1}|n_t] = \frac{1}{2}(-\frac{1}{8}n_t) + \frac{1}{2}(0) + \frac{1}{4}(\frac{1}{2}n_t) = \frac{1}{16}n_t$

(Part 2)

Obviously, we can set $f(n_t) = \frac{1}{16}n_t$

(Part 3)

We simply apply the given Wald's equation:

$E[T] \le \frac{1}{2f(\frac{1}{2})}+ E\left[\int\limits_{n_{T-1}}^{n_0}{\frac{1}{f(z)}dz}\right]$

$E[T] \le \frac{1}{2\frac{1}{16}\frac{1}{2}}+ E\left[\int\limits_{n_{T-1}}^{n_0}{\frac{1}{\frac{1}{16}z}dz}\right]$

$E[T] \le 16 + E\left[16 \int\limits_{n_{T-1}}^{n_0}{\frac{1}{z}dz}\right]$

$E[T] \le 16 + 16 E\left[\int\limits_{n_{T-1}}^{n_0}{\frac{1}{z}dz}\right]$

$E[T] \le 16 + 16 E[\ln(n_0) - \ln(n_{T-1})]$

Now $n_0 = x$, we know $n_{T-1} > 1$ (since he hasn't leave yet), so the final bound is

$E[T] \le 16 + 16 \ln x$

To verify the bound, I wrote this simple C# program to empirically estimate the expectation using Monte Carlos simulation and compare it to the bound, the bound seems to consistently bound the expectation, so we should be good.

using System;

public class Test
{
public static void Main()
{
Random r = new Random();
double cs = 0;
double X = 10;

for (int i = 0; i <= 1000; i++)
{
double x = X;
int c = 0;
while (x > 1)
{
double p = r.NextDouble();
if (p < 0.5)
{
x *= 9.0 / 8;
}
else if (p < 0.75)
{
x *= 1.0 / 2;
}
c++;
}
cs += c;
}

double bound = 16 + 16 * Math.Log(X);
double expectation = cs/1000.0;
Console.WriteLine(bound / expectation);
}
}

Saturday, June 11, 2016

Data structure challenge

The goal of this challenge is to build a data structure that can:

1. Insert a tuple (a, b) in $O(\log n)$ time, and
2. For a given $A$, return the value of $max_b { (a, b) | a < A }$.
For simplicity, we assume all values of $a$ are distinct.

Here is the solution:
1. Build a balanced binary search tree (e.g. AVL tree) over the tuples using a as its key.
2. At each node, maintain the maximum b over itself and its subtree.
The tree would look like these

case 1

case 2

The tricky part is to answer the query. We can answer the query recursively as follow:

max_b(A)
{
if (A <= self.a)
{
return a.left.max_b(A)
}
else
{
return max(a.left.max_b, a.b, r.right.max_b(A))
}
}

Without further ado, here is the code for the idea

#include <iostream>
#include <algorithm>
using namespace std;

class tree
{
public:
tree();
void insert(int a, int b);
int query(int a) const;
private:
class node
{
public:
node(int a, int b);
node* insert(int a, int b);
void update();
int balance() const;
int query(int a) const;
int m_a;
int m_b;
int m_max_b;
node* m_left;
node* m_right;
int m_height;
};
node* m_root;
};

tree::tree() : m_root(nullptr)
{

}

void tree::insert(int a, int b)
{
if (this->m_root == nullptr)
{
this->m_root = new node(a, b);
}
else
{
this->m_root = this->m_root->insert(a, b);
}
}

int tree::query(int a) const
{
if (this->m_root == nullptr)
{
return -1;
}
else
{
return this->m_root->query(a);
}
}

tree::node::node(int a, int b) : m_a(a), m_b(b), m_max_b(b), m_left(nullptr), m_right(nullptr), m_height(1)
{

}

int tree::node::balance() const
{
int left_height = this->m_left == nullptr ? 0 : this->m_left->m_height;
int right_height = this->m_right == nullptr ? 0 : this->m_right->m_height;
return right_height - left_height;
}

tree::node* tree::node::insert(int a, int b)
{
if (this == nullptr)
{
return new node(a, b);
}
else
{
if (a == this->m_a)
{
this->m_b = max(this->m_b, b);
}
else if (a < this->m_a)
{
this->m_left = this->m_left->insert(a, b);
}
else if (a > this->m_a)
{
this->m_right = this->m_right->insert(a, b);
}
this->update();
if (this->balance() == 2)
{
if (this->m_right->balance() >= 0)
{
node* a = this;
node* b = this->m_right;
node* c = b->m_left;
b->m_left = a;
a->m_right = c;
a->update();
b->update();
return b;
}
else if (this->m_right->balance() == -1)
{
node* a = this;
node* b = this->m_right;
node* c = b->m_left;
node* d = c->m_left;
node* e = c->m_right;
c->m_left = a;
c->m_right = b;
a->m_right = d;
b->m_left = e;
a->update();
b->update();
c->update();
return c;
}
}
else if (this->balance() == -2)
{
if (this->m_left->balance() <= 0)
{
node* a = this;
node* b = this->m_left;
node* c = b->m_right;
b->m_right = a;
a->m_left = c;
a->update();
b->update();
return b;
}
else if (this->m_left->balance() == 1)
{
node* a = this;
node* b = this->m_left;
node* c = b->m_right;
node* d = c->m_right;
node* e = c->m_left;
c->m_right = a;
c->m_left = b;
a->m_left = d;
b->m_right = e;
a->update();
b->update();
c->update();
return c;
}
}
return this;
}
}

void tree::node::update()
{
int max_b = this->m_b;
int height = 1;
if (this->m_left != nullptr)
{
max_b = max(max_b, this->m_left->m_max_b);
height = max(height, this->m_left->m_height + 1);
}
if (this->m_right != nullptr)
{
max_b = max(max_b, this->m_right->m_max_b);
height = max(height, this->m_right->m_height + 1);
}
this->m_max_b = max_b;
this->m_height = height;
}

int tree::node::query(int a) const
{
if (this == nullptr)
{
return -1;
}
else if (a <= this->m_a)
{
return this->m_left->query(a);
}
else
{
int result = this->m_right->query(a);
result = max(result, this->m_b);
if (this->m_left != nullptr)
{
result = max(result, this->m_left->m_max_b);
}

return result;
}
}

int main(int argc, char** argv)
{
tree data;
data.insert(1, 1);
data.insert(2, 2);
data.insert(3, 3);
data.insert(4, 4);
data.insert(5, 5);
data.insert(6, 6);
data.insert(7, 7);
cout << data.query(5) << endl;
return 0;
}

Coursera Approximation Algorithms Part I - Peer Assignment 3

Problem:

Solution:

(1) is pretty easy, it is obvious that we can have at most 5 items of size 6, and at most 3 items of size 10, so a simple addition table would show what the configurations are

 0 10 20 30 0 0 10 20 30 6 6 16 26 36 12 12 22 32 42 18 18 28 38 48 24 24 34 44 54 30 30 40 50 60

So there are 13 configurations.

(2) The linear program is

Minimize $\sum{x_c}$
Subject to $\sum{x_c a_{s,c}} \ge n_s$

To see if a given solution is feasible for the linear program, we simply check. First, recall $a_{s,c}$ is the number of items of size $s$ in configuration $c$, so we have:

$a_{6,1} = 5$
$a_{10,1} = 0$
$a_{6,2} = 0$
$a_{10,2} = 3$

And also $n_s$ is the number of items of size $s$, so

$n_{6} = 14$
$n_{10} = 8$

The constraints are now

$x_{c_1} a_{6,1} + x_{c_2} a_{6,2} \ge n_{6}$
$x_{c_1} a_{10,1} + x_{c_2} a_{10,2} \ge n_{10}$

Putting in the values, now we get

$\frac{14}{5} 5 + \frac{8}{3} 0 \ge 14$
$\frac{14}{5} 0 + \frac{8}{3} 3 \ge 8$

Now it is obvious the given solution is feasible!

(3) The solution is sparse, as we have only two non-zero entries.

(4) Following the given algorithm, we would round down the given $x_c$ values, so we have

$\lfloor x_{c_1} \rfloor = \lfloor \frac{14}{5} \rfloor = 2$ bins in configuration 1 and
$\lfloor x_{c_2} \rfloor = \lfloor \frac{8}{3} \rfloor = 2$ bins in configuration 2.

(5) With the solution above, we have covered

$2 \times 5 = 10$ items of size $6$
$2 \times 3 = 6$ items of size $10$

So we have $14 - 10 = 4$ items of size $6$ and $8 - 6 = 2$ items of size $10$ to go in the problem $\bar I$.

(6) For $P_1$, we will be opening two more bins, one for each configuration.

(7) Now we run next fit as follow

[6,6,6,6] [10,10]

So we also open two bins with next fit, so again, we are using two more bins.

(8) The total size is $6 \times 14 + 8 \times 10 = 164$, we will need at least $\lceil \frac{164}{30} \rceil = 6$ bins. The solution above use 4 bins in the linear programming solution and 2 more bins with the rest using either method, so it is using 6 bins. As there is no way we can do any better than 6 bins, so the solution is optimal.