Friday, June 26, 2015

Reaching our final trick, which is yet another time saver.

This time we rely on a rule named "Once a leaf, always a leaf". The book used the same name.

At the face of it, it is a simple rule. There is no rule that turn a leaf into something else, so this rule is obvious.

Not so obvious is that all strings with a leaf as a prefix will also be a leaf in future extensions.

Let's say, the string "Happy" lead to a leaf, we can be sure that "Happy Birthday" will also end up in a leaf, because in every extension, we reach a leaf, and just extended it.

Now the interesting part comes. Extending a leaf is simply increasing its end index. What if we never explicitly store the end index and simply rely on the current iterations end number?

In that case the leaf extension rule is simply a no-op, and we can skip it. Once the last iteration ends with either leaf extension or split rule can be skipped, because the next iteration will definitely be a leaf extension rule.

That result in our last iteration. Since this is the final product, I just name it the final name, SuffixTree!

Okay, for real, two more tricks, done.

This rule is a real time saver. I call it, once no-op, always no-op within the same phase.

Consider the no-op rule is applied for the string aX. What does that imply?

It means the string aX is already in the tree, inserted in a previous phase.

But if aX is inserted in a previous phase, then so is all the suffixes of aX. So we can simply ignore all the insertion after we see the first no-op because the rest will be no-op as well.

In the book, this trick is called "Rule 3 is a show stopper".

A very simple solution based on returning a flag is used to indicate whether the subsequent insertion can be skipped!

This result in SuffixTree7!

Part 10, really? Fortunately, we are going to finish the series soon. A few more tricks, and we are done.

In this post we will speed up the search even more. Notice that we are searching in the link character by character, and we require all characters in a link to match the key because we knew the search string is already in the tree. What if we just skip through it.

Concretely, when we reach a node, determine the longest length we can move by min(remainingSearchKeyLength, edgeLength), and we can just increase the cursor to reach there right away!

Also, we can save just the substring indexes in the edge, there is no point constructing a storing the whole substring!

This results in SuffixTree6.

In the book - this trick is called the skip count trick. A neat trick indeed.

In the last post we left a question open. What so interesting about the suffix links?

Imagine you are doing an extension to insert aXY, in the next iteration so will be inserting XY, if you knew the suffix link of the node represented by aX, what would you do?

But wait, suppose we followed the suffix link, how much of the string to insert can I skip? With the current structure, there is no way to tell by just following the suffix link, so we need the depth information to pass as well when we follow a suffix link.

Luckily, by the time we found a suffix link, we are searching in the tree, so we actually know what is the current number of character that we can skip!

This result in SuffixTree5.

In the last post, we finally come back to a working solution that can build a suffix tree. Remember the cost of insertion is mostly about finding where to insert, and after that is done, the insertion alone take only constant time, so it make sense to optimize that search.

To that end, we introduce suffix link. This is the most complicated concept in the Ukkonen's algorithm.

Suppose we have an internal node 'v' (i.e. not root, not leaf) representing 'aX', where X is a string, possibly empty, and 'a' is a single character, then the suffix link of 'v' is a pointer to the node 'w' representing X.

Three questions immediately arise.
1.) Why should 'w' exist? In general the string 'X' might not be represented by a node, and
2.) Suppose it does, how do one efficiently find it?

We will answer the first two questions at the same time by a lemma.

If a new internal node is introduced in extension j. It's suffix link will be found in the next extension j+1.

Proof:
Notice that a new internal is introduced in the tree only by the split rule. When inserting 'aXyY', we notice the tree already contains 'aXzZ' where X, Y, Z are strings, possibly empty, a, y and z are single character, and $y \ne z$.

In the next extension, we will be inserting XyY. By the insertion order, we can be sure that XzZ is already there, so the tree must split at the point X, and therefore the suffix link will always be there and it will always be found in the next extension!

With that in mind, we can build the suffix link relatively easily. That result in SuffixTree4.

We will answer the last question in the next post.

Thursday, June 25, 2015

In the last post I described the algorithm for inserting a node. I intentionally left vague what do I mean by a location in the tree. A location could be a node or in the middle of an edge. It is always simpler to have only a single concept, so I merged the node class and the edge class together.

To do that, note that there is always exactly one node associated with an edge, so we can just lump the two together. The only exception is the root node, where we can simply associate it with an edge of 0 length.

With this change, we save a lot of pointers too :)

With that, we split the Add code to do a two phase search. In the first phase, we search the key for all but the last character. We assert that the string is already in the tree. In the second phase, we add the last character to the string, according to the three cases.

That result in SuffixTree3, this time around the code is building a real compressed trie now.

Wednesday, June 24, 2015

The algorithm in the previous post has a fatal flaw - what is that? It is wrong! Let's look at the generated 'compressed' trie now!

It is now building a trie - not a compressed trie. Can you see why? The compressed trie will always generate a node at the end of the suffix, so if 'a' is a string inserted in the compressed trie. 'a' will always lead to a node. If we wanted the compressed trie as in the baseline suffix tree algorithm. We need to make sure we do NOT create those immediate nodes with only one child.

To do that, let examine more closely in the string insertion order. There is a property we can leverage in the insertion order here. Suppose, we are trying to insert "xac" to the tree. We can be sure that the string "xa" is already installed in the tree.

So the insertion procedure can be split into two steps:
Finding where is "xa", and then install the "c" in the tree.

Depending on input, the step 1 search can end up somewhere in the tree. If we are lucky, it ends up in a node, but it could be in the middle of an edge as well, but conceptually it is just a location in the tree. It has three cases

Case 1) The location is a leaf node
Case 2) The location is not a leaf, and it extends to the next character we want.
Case 3) The location is not a leaf, and it does not extend to the next character we want.

Case 1 is simple - our current code in compressed trie create a new edge and new node in this case, we need to avoid that. Just add one more character on the edge label and we are done. This is what the book calls rule 1.

Case 2 is also simple - do nothing at all! The tree already have the string we need, why bother doing anything at all? The book call this rule 3. (Notice we call this case 2, but book say rule 3)

Case 3 is the complicated beast. At this point we must branch out. The book call this rule 2. I personally like to put the simple case up front, this usually make the code easier to read.

The mismatch of the case number and the rule number is obviously annoying. In fact, I hate the rule numbering thing in the book. So I conceptually replaced those rule numbers by rule name, it made the exposition much more easy to follow:

Case 1, rule 1 is the leaf extension rule.
Case 2, rule 3 is the no-op rule.
Case 3, rule 2 is the split rule.

So hereafter I will always refer to the rules by name - never by numbers.

Too much concepts in this post, we will do the coding in the next one.

In the last part we talked about the $O(m^2)$ algorithm for building a suffix tree. This time, we will talk about a $O(m^3)$ time solution.

Why would one wanted to do that? This is because we will improve the $O(m^3)$ algorithm all the way to linear time. We are following a similar exposition as in Algorithms on String, Tree and Sequences.

The algorithm is really simple, here it is, this is the only change from SuffixTree1.

void SuffixTree2Builder::BuildSuffixTree(string input, SuffixTree2* suffixTree)
{
string s = "xabxac";
for (unsigned int end = 1; end <= s.length(); end++)
{
for (unsigned int start = 0; start < end; start++)
{
}
}
}

If we ignore the outer for loop, this is simply the $O(m^2)$ algorithm we talked in the last post. The algorithm install the 'suffixes' in the compressed trie in the following order

x

xa
a

xab
ab
b

...

And so on - many of the 'suffix' are actually prefix of a suffix of the whole string, but that does not matter for a suffix tree because a suffix tree include all the prefixes anyway.

The blank lines are intentionally added between outer loop iterations. Following the book's convention, we call an outer loop iteration a phase, and an inner loop iteration an extension. So for a string with length $m$, we have $m$ phases, from phase 1 to phase $m$. For phase $k$, we have $k$ extensions, from extension 1 to extension $k$.

Unlike the book's convention, however, we will ALWAYS use 0 based index for string, this is so convenient, and we will ALWAYS use inclusive, exclusive index for string indexes, for example, [3, 7) for "Hello World" is "lo W", as indicated below:

0123456789A
Hello World

This algorithm is really stupid? Yes - it is, we will move forward in the next post.

Monday, June 22, 2015

It is relatively easy to start with a $O(m^2)$ algorithm. It takes $O(m^2)$ time to generate all the suffixes, and insert them one by one into an empty trie. Since insertion take time proportional to the length of the string, the total time for all the insertion is $O(m^2)$. That's it.

The advantage of this algorithm is that we can have a relatively quick way to test if our advanced algorithm works, we can compare the trie and therefore check the correctness of the advanced algorithm.

Using our compressed trie code, we can generate a visualization of the suffix tree for 'xabxac' as follow:

Notice we have 6 leaves, that is because we have 6 suffixes. In general, we want to have a leaf for each suffix. Without lost of generality, we can simply insert a end of string character to the original string that never appear in the string itself at the end of the string. Implementation wise, a null character would be very convenient for C/C++. With the special end character, we can guarantee a suffix is never a prefix of some other suffixes, and therefore we will always have each suffix ends up as a leaf.

For a suffix tree, we do not care about the values associated with the suffix, at least for now, so we get rid of the payload in the Add method. Without the payload, the Get method is meaningless, so we will also remove it. That results in the SuffixTree1 implementation.

In the last part, we explained compressed trie. In this part we will start talking about suffix tree. A suffix tree is simple a compressed trie of all the suffix of a string.

Consider the string 'xabxac', the suffixes are

xabxac
abxac
bxac
xac
ac
c

So the final suffix tree look like this

xa->bxac
->c
a->bxac
->c
bxac
c

So that is the suffix tree, the amazing thing about suffix tree is that it can be build in $O(m)$ time, where m is the length of the string, while we have $O(m^2)$ characters in total in all the suffixes.

The linear time algorithm is a complicated beast, so we will NOT attempt to explain it at once, but instead explain it by doing step-wise refinement of a rather trivial algorithm.

Before we goes to the details of how to construct a suffix tree, let start with seeing how to use a suffix tree! What so interesting about having a compressed trie with all the suffixes?

Consider the problem of text searching - suppose our task is to determine whether or not 'ab' is a sub-string of 'xabxac'. With a suffix tree, we can simply walk the tree to find that out, because a sub-string must be a prefix of a suffix!

Being able to to sub-string search in a string in time complexity proportional to the length of the search pattern is simply amazing. Without the suffix tree, one must at least scan the length of the text. For huge text (e.g. DNA), that would dominate the running time! Having a suffix tree on hand is extremely handy, and it takes only linear time to construct - which most algorithm have to spend (asymptotically) anyway.

In the last part, we mentioned about trie and its superiority above other data structures. There is some inefficiency associated with the data structure though - it is the memory consumption.

Notice that we have a node per character. That implies we need a pointer for each character in the tree, assuming a one byte character (or sometimes more), we need 4 bytes for the pointers in x86 and 8 bytes for pointers on amd64, this is a clear waste of space.

To save space (and for time complexity later on), we introduce compression. For example, the compressed trie of the same key-value pair will be illustrated as follow:

(Hello, 5)
(Happy, 5)
(Halloween, 9)

will be represented as a compressed trie like this

H->(ello,4)
->a->(ppy, 3)
->(lloween, 9)

In general, each node could be a string instead of a single character. For each subtree, however, should always split with different characters, so the class definition look like this

class CompressedTrieNode
{
}

{
CompressedTrieNode* child;
}

The introduction of CompressedTrieLink is just to make the intent clear. It is just pair<string, CompressedTrieNode*>. Note that the tree depth is reduced, and this can be used to our advantage later on in suffix trees.

See below for some sample code on how to work with a compressed trie

With the code, we can generate the visualization of the compressed trie as follow

Compared with the tree last time, it is significant memory saving.

To explain what a suffix tree is, I need to explain what is a trie and how to compress it. So let's start there.

Consider an abstract dictionary like this:

class Dictionary<TKey, TValue>
{
TValue Get(TKey key)
}

Don't worry about the syntax for now, it is just pseudo code.

To realize a dictionary, there are many different ways. One can build a simple association list, or a binary search tree, or a hash table.

Now we consider a special case where the keys are strings. It is very common for us to use string as key. String has a sub-structure, it is a sequence of characters. Is it possible for us to use that sub-structure to build a better data structure?

Yes - and that is a trie. Consider these string value pairs and the associated trie will explain how a trie works.

(Hello, 5)
(Happy, 5)
(Halloween, 9)

The tree looks like

H->e->l->l->(o,5)
->a->p->p->(y, 5)
->l->l->o->w->e->e->(n,9)

With the tree, it is easy to imagine how would one implement the Add and Get method out of it. In code, the tree look like this

class TrieNode
{
map<char, TrieNode*> children;
TValue* value; // non-null if this is a leaf.
}

The data structure allows a worst case bound for Add and Get to have time proportional to the length of the key. This cannot be done with association list, binary search tree or hash table. That shows to superiority of this data structure compared to them.

See here for some sample code for work with a Trie.

Using the code, we can generate a visualization of the trie with our example string as follow:

This is probably the largest tree we will see, in the next post we will compress it.

Monday, June 15, 2015

LeetCode OJ - Invert Binary Tree

Problem:

Solution:

This problem is trivial - invert the two sub trees recursively, and then swap them.

#include "stdafx.h"

// https://leetcode.com/problems/invert-binary-tree/

#include "LEET_INVERT_BINARY_TREE.h"
#include <iostream>
#include <algorithm>

using namespace std;

namespace _LEET_INVERT_BINARY_TREE
{
struct TreeNode
{
int val;
TreeNode *left;
TreeNode *right;
TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};
class Solution {
public:
TreeNode* invertTree(TreeNode* root) {
if (root == NULL)
{
return root;
}
else
{
invertTree(root->left);
invertTree(root->right);
swap(root->left, root->right);
return root;
}
}
};
}

using namespace _LEET_INVERT_BINARY_TREE;

int LEET_INVERT_BINARY_TREE()
{
TreeNode a(4);
TreeNode b(2);
TreeNode c(7);
TreeNode d(1);
TreeNode e(3);
TreeNode f(6);
TreeNode g(9);
a.left = &b;      a.right = &c;
b.left = &d;      b.right = &e;
c.left = &f;      c.right = &g;
d.left = NULL;    d.right = NULL;
e.left = NULL;    e.right = NULL;
f.left = NULL;    f.right = NULL;
g.left = NULL;    g.right = NULL;
Solution solution;
solution.invertTree(&a);
cout << (a.left == &c) << endl;
cout << (a.right == &b) << endl;
cout << (c.left == &g) << endl;
cout << (c.right == &f) << endl;
cout << (b.left == &e) << endl;
cout << (b.right == &d) << endl;
return 0;
}

Saturday, June 13, 2015

I have written a BinomialHeap in Haskell!

Wednesday, June 10, 2015

Problem

Solution:

Please see the solution as follow:

Leveraging the previous problem I just compute the primes once and build the tree once. It results in a blazing fast algorithm, it can check the goldbach conjecture is valid for up to 10,000,000 in less than a second.

Problem

Solution:

Please see the solution as follow:

This problem is known as 2-sum problem. Given a sorted list of numbers find two of them that sums to a particular value. Leveraging the order we already have, we fix one value and binary search the another. For simplicity I reused the AVL tree I had.

Problem

Solution:

Please see the solution as follow:

A straightforward implementation just filtering out the primes with Problem 31.

Tuesday, June 9, 2015

Problem

Solution:

The new version is faster on large input, totient 30624700 hangs, while phi 30624700 outputs 12249840 almost instantly.
This can be explained as totient is running the Euclidean algorithm 30624700 times, while phi is doing $\sqrt {30624700}$ trial divisions only.

Problem

Solution:

Please see the solution as follow:

A straightforward implementation of the formula.

Problem

Solution:

Please see the solution as follow:

Reusing Problem 35 - this is just folding the list of prime factors.

Problem

Solution:

Please see the solution as follow:

Brute force factoring - just try each number in the range just like the brute force prime detection. Care is taken to make sure we don't try the same factor again once we know it does not divide.

Problem

Solution:

Please see the solution as follow:

I know, there is more efficient way to compute Euler's Totient function - but let's stick with this simple method for now - in the coming problems we will optimize it.

Problem

Solution:

Please see the solution as follow:

Leveraging GCD, coprime is a piece of cake - easy composition. Wonder if it is possible to consider Either Int String as a Monad and lift the == operator?

Problem

Solution:

Please see the solution as follow:

Standard implementation for Euclidean algorithm - 0 and negatives are handled properly

Problem

Solution:

Please see the solution as follow:

A simple brute force prime detector - test up to ceiling of square root. There are faster prime detection algorithm, but with the input range I am interested in I don't need it right now. Be careful with invalid inputs and the special case for 2. Ceiling of square root of 2 is 2 and we should not test 2 % 2.

Wonder where is problem 29 and 30? There are no such problems! There are gaps in the Haskell 99 problems.

Monday, June 8, 2015

Problem

Solution:

Please see the solution as follow:

Initially I thought this is a simple sorting problem, so I used quicksort. But there is a catch, isn't there always a catch? Turn out the expectation requires a stable sort, so I implemented mergesort as well. Be careful with the key equals case in merge, in order to be stable, we need to make sure we pick the value from the first list in that case to be stable.

Problem

Solution:

Please see the solution as follow:

While the code is concise - it needs to be explained (even to myself). First, a helper function combineWithRest is defined so that instead of just computing the combinations, the function also capture the rest of the list that is not in the combination.

To ease the description, Let's call [a] a combination (an example would be [1,2]), [[a]] a group (an example would be [[1,2],[3, 4]]. The basic algorithm is to first combination size requirements. For each combination, we then use the rest of the list to form groups with the remaining combination size requirements. To get the correct groups, we append the first combination to the groups. Therefore, we get a list of group per combination, and therefore we concatenate them to get what we need.

Problem

Solution:

Please see the solution as follow:

Now we switch gear to combinatorial problems. This is really the sweet spot of functional programming. The program is also the same as its definition. Writing such a thing is an imperative language would span much more lines.

Sunday, June 7, 2015

In order to optimize Haskell 99 problem 23, I built an AVL tree in Haskell as follow:

This is pretty standard, the implementation gears towards actual usage in the problem. For example, there isn't a contains function or deleteByValue function. They can be added easily if wished.

Saturday, June 6, 2015

Problem

Solution:

Please see the solution as follow:

Yet another easy composition.

Problem

Solution:

Please see the solution as follow:

With the previous problems - this is an easy composition.

Friday, June 5, 2015

Problem

Solution:

Please see the solution as follow:

Moving forward, we are going to do something cool with random numbers. This problem is about generating a random selection without replacement. The original algorithm used a list to represent the remaining items, so every time we need to pick an item it takes $O(n)$ time, and therefore a total of $O(kn)$ time, if $k = n$, then we have $O(n^2)$ time. With the current version, I used an AVL tree to represent the items, so it takes an overall $O(k \log n)$ time for the selection, but remember we need to populate the tree at the first place, so an overall $O(n \log n)$ time.

Thursday, June 4, 2015

Problem

Solution:

Please see the solution as follow:

What else would you expect for this one?

Problem

Solution:

Please see the solution as follow:

Yet another easy one - this worked very much like split in problem 20. The secret is that the index is the index of the inserted element.

Problem

Solution:

Please see the solution as follow:

Another easy one - this worked very much like split in problem 17.

Problem

Solution:

Please see the solution as follow:

With the hint, this is a easy one. Loved the definition of mod in Haskell that always output positive number when the modulus is positive.

Problem

Solution:

Please see the solution as follow:

Fixed my laziness problem - now I output different error messages - in this case I sort of have to. This make me ponder one thing - is the parameter name of a function part of the signature that the caller can see? Does q < r some message that the user can make sense of?

Problem

Solution:

Please see the solution as follow:

The basic recursion is to take the first element p and reduce the split index q and recurse, then just add back p in the first element of the sub-problem answer.

Was a bit lazy - didn't bother to differentiate different types of index out of range error.

Problem

Solution:

Please see the solution as follow:

This problem is similar to the previous one, I used one more integer parameter to keep track of how many elements we skipped.

Problem

Solution:

Please see the solution as follow:

I could have use map, replicate and ++ to implement this. But I think using ++ is not good, we will repeatedly scan built string. So I used a more direct algorithm for this. It appears that I am not using tail recursion, but with Haskell's lazy evaluation, I think that is not a problem, because the rest of the list will not be evaluated until I want more element, not sure if that's is right though.

Problem

Solution:

Please see the solution as follow:

I read ahead on problem 15 and noticed that repli is a generalization of, dupli and therefore we can use repli to implement dupli as a special case, so I did.

Wednesday, June 3, 2015

Problem

Solution:

Please see the solution as follow:

This problem is funny, actually I was not using Problem 9 for Problem 10, so this is basically the combination of my problem 10 and problem 11 code. Problem 10, however, is now edited to base on Problem 9 to fit the story :)

Problem

Solution:

Please see the solution as follow:

Another standard use of map with fold. I could have use replicate, but writing a function worked just fine.

Problem

Solution:

Please see the solution as follow:

Reusing Q10, this is really just a change in representation, ideal for using map.

Tuesday, June 2, 2015

Problem

Solution:

Please see the solution as follow:

This is a relatively simple solution based on problem 9. As I really hated error, I have to wonder, is it possible to specialize list so that it is impossible to have empty list?

Creating a new data type called NonEmptyList is pretty trivial

data NonEmptyList a = Single a | Cons a (NonEmptyList a)

That's it. We could have use that instead.

Problem

Solution:

Please see the solution as follow:

This look very similar to the previous problem and therefore the skeleton and test are almost identical.

Problem

Solution:

Please see the solution as follow:

Intuitively, I think of fold in this case. Pictorially, it does look like folding, isn't it? Ironically, in fact, almost all previous problem could be coded using fold instead, but I didn't.

I personally feel like fold does reduce the number of characters sometimes, but it less readable. Reading code involve understanding, and understanding is a pretty personal thing. There might not be an objective measure of readability at all, so the main problem really is, to fold or not to fold?

Problem

Solution:

Please see the solution as follow:

My original solution involved using the ++ operator, which is not optimal. Now I am using difference list. The problem with list is whenever a list a created, it cannot be appended anymore. So instead of having a list created, we create a function to produce a list. that function takes the suffix of the list as an argument and produce the list as a result. That avoid creating the list and postpone it to the time until we actually want the list. Now composing these functions are concatenation and that takes constant time! .

Problem

Solution:

Please see the solution as follow:

This is a rather trivial problem given the previous one. Of course we should reuse.

Problem:

Solution:

Please see the solution as follow:

Accumulator pattern again, nothing too special. It is interesting, however, to note we can interpret a program written using the accumulator pattern in terms of iteration as follow

myreverse' [1,2,3] [] = myreverse' [2,3] [1] = myreverse' [3] [2,1] = myreverse' [] [3,2,1] = [3,2,1]

The code would not be as easy to interpret otherwise.

Problem:

Solution:

Please see the solution as follow:

The problem give me a chance to practice the accumulator pattern. A more intuitive approach to this problem is simply mylength (x:xs) = 1 + mylength xs. Notice that after the recursion call, there is something else to do (the increment by 1), in other words, the call is not tail recursive. Haskell can optimize tail recursive call by simply jumping to the beginning of the function, but for non tail recursive call that must be done by a real function call, which use up stack space.

The solution to this is essentially saving the information in the next function call's stack frame, as an accumulator.

Also notice how much more elegant the code is without error condition! I just love code that don't have error - wouldn't you?

Problem:

Solution:

Please see the solution as follow:

In this solution, I have a problem - using pattern matching alone, it is hard to differentiate the case where I have an empty list and at the same time the index is less than 0, and in that case I wanted to output a the not supported message instead of the element not found one.

To that end, I simply used a helper function. I could have used a nested if expression instead, but Haskell is all about elegant code and clarity, so I chose to use helper function instead.

Now the wrapper function a simple condition, therefore I can use guard expression instead of an if statement!

Problem:

Solution:

Please see the solution as follow:

The more one code with Haskell, the more one simply hate error conditions. The same idea apply from the last problem, but in this case we have to use Either a b instead of Maybe a to communicate more than 1 error conditions. The rest follow the same pattern pretty much.

Problem:

Solution:

Please see the solution as follow

The code needed some explanation. First unit test does not support error nicely, so instead of having the code throws error - it is easier to return a Maybe a instead. That's why I used myLastTest to return a Maybe a and unit tested that one. The public one used to fit the problem specification is simply a wrapper on top of it.

Next, the unit test derived assertEqual to assertEqualMaybeInt, this is a workaround that if I used assertEqual "" Nothing Nothing, that will not work because Haskell don't know how to show a Nothing (which has type Maybe a, with a unknown)

Problem:

Please find all the problem here.

Solution:

In this post, we start with the design of the test harness. Remember in the old days I made a project for each Project Euler problem - once it scales, there are something a hundred executable files. It is taking significant amount of space and hard to clean up there.

Haskell problems don't have this problem. As a matter of fact, as we are simply doing programming exercise, I don't even bother to compile them. They could be compiled, but why would I wanted to do that?

One might also ask - why bother writing about your own programming exercises? If I have learnt a lesson - time educated me that non-trivial code really needs documentation. It is hard to read, even my own code, after some times like a couple years, so better document now (right after writing code, sometimes even before)

So, let's get back to the test harness. Haskell code can be executed in GHCi, initially, I thought it is okay to just capture standard I/O and error of GHCi by piping file in and out, it worked, but less professional. I used HUnit instead.

A typical problem look like follow:

Most readers are probably like me, who is not familiar with Haskell. The code needed some explanations. First, to allow one file references another file, that file need to be a module. Second, a module's file name must match the file's name. The rest of the code is pretty trivial, of course, one need to learn Haskell to read the code. The module concept tripped me up, so I decided to write about it. If you knew Haskell, that is pretty trivial too.

I am learning Haskell through learn you a Haskell, this is a great resource, but sadly without programming exercises, and that's why I am working on all these problems.