Huffman encoding is one of the foundational algorithms in data compression — elegant, efficient, and entirely based on simple mathematical principles. It converts symbols and their frequencies into binary codes where the most frequent symbols receive shorter codes.
The algorithm is used in many compression systems, including ZIP and JPEG, and fits naturally into functional programming thanks to its recursive and tree-based nature.
In this project, I implemented Huffman coding in Haskell from scratch. The goal was to explore tree construction, recursion, and symbol encoding rather than optimize for absolute performance. Interestingly, depending on input characteristics, my approach sometimes generates a code table that uses slightly fewer bits than a classic implementation — but that is highly input-dependent.
Given a set of symbols and their frequencies:
[('a',45), ('b',13), ('c',12), ('d',16), ('e',9), ('f',5)]
The goal is to produce a binary encoding that minimizes the total weighted code length. A typical expected result might look like:
[('a',"0"), ('b',"101"), ('c',"100"), ('d',"111"), ('e',"1101"), ('f',"1100")]
However, multiple valid Huffman trees can exist for the same frequency table, all producing codes of the same or near-optimal length. My implementation produces a different — but still fully correct — encoding.
Here is the full implementation of the algorithm:
data Tree a b = Leaf b | Node a (Tree a b) (Tree a b) deriving (Show)
huffmanTree :: [(Char, Int)] -> [(Char, [Char])]
huffmanTree xs =
let (fs, hs) = subHuffmanTreePrepare xs [] []
ts = subHuffmanTree1 fs hs []
newts = subHuffmanTree2 ts
in subHuffmanTree3 newts []
subHuffmanTreePrepare :: [(Char, Int)] -> [Int] -> [Char]
-> ([Int], [Char])
subHuffmanTreePrepare [] fs hs = (fs, hs)
subHuffmanTreePrepare (x:xs) fs hs =
let f = snd x
val = fst x
in subHuffmanTreePrepare xs (f:fs) (val:hs)
subHuffmanTree1 :: [Int] -> [Char] -> [(Tree Int Char)]
-> [(Tree Int Char)]
subHuffmanTree1 [] _ ts = ts
subHuffmanTree1 fs hs ts =
let minxs = myMinN2 fs 2
(min1, min2) = getMins minxs
(newfs3, newhs3, newts3) = if (min1 /= -1) `myAnd` (min2 /= -1)
then let (newfs, newhs, newts) = subHuffmanTreePair fs hs ts [min1, min2]
ids1 = grep2 min1 newfs
ids2 = grep2 min2 newfs
valxs = filter (\(x, y) -> x /= (-1)) [(ids1, min1), (ids2, min2)]
(newfs2, newhs2, newts2) = subHuffmanTreeSingle newfs newhs newts valxs
in (newfs2, newhs2, newts2)
else let ids1 = grep2 min1 fs
ids2 = grep2 min2 fs
valxs = filter (\(x, y) -> x /= (-1)) [(ids1, min1), (ids2, min2)]
(newfs2, newhs2, newts2) = subHuffmanTreeSingle fs hs ts valxs
in (newfs2, newhs2, newts2)
in subHuffmanTree1 newfs3 newhs3 newts3
subHuffmanTreePair :: [Int] -> [Char] ->
[(Tree Int Char)] -> [Int] ->
([Int], [Char], [(Tree Int Char)])
subHuffmanTreePair fs hs ts minxs
| length minxs == 2 =
let [(min2, idx2), (min1, idx1)] = myMinIdxN fs 2
chr1 = (hs !! idx1)
chr2 = (hs !! idx2)
ftree = if min1 >= min2
then Node (min1 + min2) (Leaf chr2) (Leaf chr1)
else Node (min1 + min2) (Leaf chr1) (Leaf chr2)
newts = ts ++ [ftree]
(newfs, newhs) = if idx2 > idx1
then (deleteListElemn fs [idx1, idx2],
deleteListElemn hs [idx1, idx2])
else (deleteListElemn fs [idx2, idx1],
deleteListElemn hs [idx2, idx1])
newminxs = myMinN2 newfs 2
in subHuffmanTreePair newfs newhs newts newminxs
| otherwise = (fs, hs, ts)
subHuffmanTreeSingle :: [Int] -> [Char] ->
[(Tree Int Char)] -> [(Int, Int)] ->
([Int], [Char], [(Tree Int Char)])
subHuffmanTreeSingle fs hs ts [] = (fs, hs, ts)
subHuffmanTreeSingle fs hs ts [(_, minx)] =
let idx = closerIdxSpe minx ts
(Node sm l r) = (ts !! idx)
chrvl = (hs !! idx)
newtree = if minx >= sm
then (Node (sm + minx) (Node sm l r) (Leaf chrvl))
else (Node (sm + minx) (Leaf chrvl) (Node sm l r))
newts = updateListElem ts idx newtree
newfs = deleteListElem fs idx
newhs = deleteListElem hs idx
in (newfs, newhs, newts)
subHuffmanTree2 :: [(Tree Int Char)] -> (Tree Int Char)
subHuffmanTree2 [x] = x
subHuffmanTree2 ((Node x1 l1 r1):xs) =
let idx = closerIdxSpe x1 xs
(Node x2 l2 r2) = (xs !! idx)
tree = if x1 >= x2
then Node (x1 + x2) (Node x2 l2 r2) (Node x1 l1 r1)
else Node (x1 + x2) (Node x1 l1 r1) (Node x2 l2 r2)
newxs = updateListElem xs idx tree
in subHuffmanTree2 newxs
subHuffmanTree3 :: (Tree Int Char) -> [Char] -> [(Char, [Char])]
subHuffmanTree3 (Leaf chr) hs = [(chr, hs)]
subHuffmanTree3 (Node _ l r) hs = subHuffmanTree3 l (hs ++ ['0']) ++ subHuffmanTree3 r (hs ++ ['1'])
Running the function gives:
ghci> huffmanTree [('a',45), ('b',13), ('c',12), ('d',16), ('e',9), ('f',5)]
[('f',"000"),('e',"001"),('c',"010"),('b',"011"),('d',"10"),('a',"11")]
This output differs from the standard example, but still follows all Huffman properties:
The implementation builds the Huffman tree in stages:
Because Haskell functions are pure and recursive, each transformation builds a new tree without mutating any previous state — a great example of how declarative logic naturally models hierarchical structures.
Building a Huffman encoder from scratch in Haskell is a rewarding exercise. It demonstrates how to express complex algorithms through pure functions and recursion, while also deepening understanding of how trees, weights, and ordering affect encoding efficiency.
While my version can produce slightly different bit distributions depending on the input, it always generates a correct Huffman encoding that respects prefix-free structure and optimal compression logic.
In the end, this is a great example of how mathematical ideas and functional programming come together beautifully in Haskell.