Why Parser were invented ?

A lot of text-processing tools can get surprisingly far without a real parser.

For simple patterns, tools like sed, awk, or small ad-hoc string transformations are often enough. You match a fragment, replace it, split on a delimiter, maybe scan left to right, and the job is done. For flat data, this is not only acceptable — it is often the most direct and elegant solution.

But this approach starts to break down when the input has structure.

As soon as you have nesting, precedence, escaping rules, ambiguous symbols, or context-dependent meaning, the input is no longer “just a string”. It becomes a small language. At that point, continuing to manipulate characters directly means you are building a parser anyway — just an implicit, fragile, and usually inefficient one.

This article is about discovering that boundary.

We start with a calculator implemented through direct string manipulation: finding parentheses, replacing subexpressions, normalizing operators, and repeatedly scanning the input. It works, and that is precisely why it is interesting. Then we move toward the cleaner architecture: tokenize the input, give names to the pieces, and evaluate the structure according to clear rules.

That shift is the real lesson: for simple text, string logic is enough. For structured input, parsers are not academic overhead. They are the natural tool.

Context

In this project, we build a complete calculator in Haskell that can evaluate normal arithmetic expressions written in parenthesis notation (e.g. (3+5)*(2-(4/2))) as string of course. This goes far beyond Reverse Polish Notation: we now need to handle nested parentheses, operator precedence, and negative values, all from scratch.

This is a programm i like to do when discovering a language because algorithmically speaking it ia not really eleguant, so here we test the elegantness of a language measured on how much we can condense code.

Boring Fundamental functions

Here we got some fundamentals function haskell base does not provide so we reimplement it.

So, first random acess.

  
  

getRangeList :: [a] -> [Int] -> [a]
getRangeList [] _ = []
getRangeList _ [] = []
getRangeList xs (idx:ids) = (xs !! idx):(getRangeList xs ids)

So we got that for example.

  
  

ghci> getRangeList [1, 2, 3] [2, 1]
[3,2]

Here, just the grep command -> all indices of elements in a data structure that matches a pattern

Of course elements must support equality check Eq.

  
  

grepn2 :: (Eq a) => a -> [a] -> [Int]
grepn2 cmp xs = subGrepn2 xs cmp 0 []

subGrepn2 :: (Eq a) => [a] -> a -> Int -> [Int] -> [Int]
subGrepn2 [] _ _ nxs = nxs
subGrepn2 (x:xs) cmp n nxs
    | cmp == x  = subGrepn2 xs cmp (n + 1) (n:nxs)
    | otherwise = subGrepn2 xs cmp (n + 1) nxs

grepmn2 :: (Eq a) => [a] -> [a] -> [Int]
grepmn2 [] _ = []
grepmn2 (x2:xs2) xs = (grepn2 x2 xs) ++ (grepmn2 xs2 xs)

So we got that for example.

  
  
ghci> grepn2 "A" ["A", "B", "A", "C", "A"]
[4,2,0]

The infamous max.

Then elements must be ordonable Ord.

  
  

myMax :: (Ord a) => [a] -> a
myMax xs = subMyMax xs (head xs)

subMyMax :: (Ord a) => [a] -> a -> a
subMyMax [] cmp = cmp
subMyMax (x:xs) cmp = 
    let cmp2 = if cmp >= x
              then cmp
              else x
    in subMyMax xs cmp2

Then, we go that for example:

  
  
ghci> myMax [2, 4, 6, 1, 0]
6

Naive version

Parenthesis tokenization

Now, te fun begins.

Basically we need a way to tokenize parenthesis, then find thei index in the string then compute the result for the operations insde the prioritary nested parenthesis, the reconstruct the base string with this new result --> replacing the parethesis and its content already computed by its result.

Until there is no more matching operation (+, -, / *) or and parenthesis.

First, we need a parenthesis tokenizer that will take a string and output a pair of 2 lists.

One will give the index of each parethesis ( and ) in the string.

And the second list will be the same length of the first list but will encode the pair information.

For example:

  
  
[0, 0, 1, 2, 2, 1]

Will descrobe the following structure:

  
  
(...)(...(...)...)

The number of pairs is then length of the list / 2.

So at first abstraction layr we just ge the entry point.

  
  

parserPar :: [Char] -> ([Int], [Int])
parserPar xs = subParserPar xs [] [] [] 0 0

xs is the string.

But we also provide 3 empty lists an 2 integers, why is that ?

Because it calls subParserPar which does all the parsing work essentially.

  
  

subParserPar :: [Char] -> [Int] -> [Int] -> [Int] -> Int -> Int
                -> ([Int], [Int])
subParserPar [] ids nums _ _ _ = (ids, nums)
subParserPar (x:xs) ids nums valxs n n2
    | x == '(' = 
        let newids = ids ++ [n]
            newnums = nums ++ [n2]
            newvalxs = map (\x -> x + 1) valxs
            newvalxs2 = newvalxs ++ [1]
        in subParserPar xs newids newnums newvalxs2 (n + 1) (n2 + 1)
    | x == ')' = 
        let newvalxs = map (\x -> x - 1) valxs 
            idx = findFirstZero (reverse newvalxs) 0
            idx2 = (length valxs) - idx - 1
            newids = ids ++ [n]
            newnums = nums ++ [(nums !! idx2)]
        in subParserPar xs newids newnums (newvalxs ++ [0]) (n + 1) n2
    | otherwise = subParserPar xs ids nums valxs (n + 1) n2

You may directly note that n is just the index of each parenthesis -> idx.

Always incrementing and just taken into account for the index list when either '(' is discovered with the construction of ids:

  
  

let newids = ids ++ [n]

Then inputed in the recursive call.

  
  
subParserPar xs newids newnums newvalxs2 (n + 1) (n2 + 1)

And when the function matches the character ')'.

  
  

newids = ids ++ [n]

And them same logic cals it with newly constructed ids.

And by defaut it just increments it.

  
  

otherwise = subParserPar xs ids nums valxs (n + 1) n2

Now the list that encodes pair -> nums.

Ok this one changes when we encounter an opening parenthesis.

  
  

newnums = nums ++ [n2]

It just appends n2, n2 only increments when a new opening parenthesis is found, so like that a new pair is a new opening parenthesis.

  
  

in subParserPar xs newids newnums newvalxs2 (n + 1) (n2 + 1)

But now the real work !

How to decide which pair is what ?

I mean we have to get a third (hidden) data structure / list that maintain all the logic, that's why we passed 3 lists and not only 2 to subParserPar.

  
  

parserPar xs = subParserPar xs [] [] [] 0 0

Yess, its name is valxs.

The logic to find parenthesis pais is the following.

When an opening parenthesis is found, just append 1 to the list.

If there were already opening OR ending parenthesis discovered, then just increments them by 1.

So here we distinguish the deph level by always incrementing one to those alredy discovered -> We maintain order.

But, when we discover an ending parenthesis, then semantically speaking it must be attached to an opening parenthesis.

So we just substract 1 to all discovered parenthesis and the most recent one where the result is equal to 0 is the opening parenthesis it pairs to. (if the structure is correct which is the assumption for the function)

That is the steps.

  
  

let newvalxs = map (\x -> x - 1) valxs

So here because we blindly mapped the substraction by 1, we must match 0, meaning get the index of the first 0 in the data structure.

In fact that's more lie a reverse match, because we need to get the index of last 0.

That's why we got:

  
  

idx = findFirstZero (reverse newvalxs) 0

idx is the index indexed on the reverse list so we must take the symmetric one.

  
  

idx2 = (length valxs) - idx - 1

findFirstZero is just a simple match with 0 as the matching pattern.

  
  

findFirstZero :: [Int] -> Int -> Int
findFirstZero (xi:xsi) n
              | xi == 0 = n
              | otherwise = findFirstZero xsi (n + 1)

That's all for the tokenization.

Proto computation

First, not even talking about parenthesis, we got to handle PEMDAS, that' just the prioritary operators.

P -> Parenthesis, already handled
E -> expoent
M -> multiplication
D -> division
A -> addition
S -> substraction

Also here we will do convertions from string [Char] to numeric value so we can compute the resulst of each computation.

For this we use read ... :: Double.

So this works:

  
  
ghci> calc "(4.5+5.6) * 3/2"
"15.149999999999999"

First we need one way to compute additions and substraction that are on the same prioritary level, done from left to right.

  
  

subProtoCalc2 :: [Char] -> [Char] -> Int -> [Char]
subProtoCalc2 [] outxs _ = outxs
subProtoCalc2 (x:xs) outxs n
    | x == '+' =
            let val1raw = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
                val2 = read $ takeBack2 xs 0 :: Double

                newoutxsRaw = reverse $ takeTailN2 (reverse outxs) 0

                (newoutxs, val1) =
                    if newoutxsRaw == "-"
                    then ("", -val1raw)
                    else (newoutxsRaw, val1raw)

                newxs = takeTailN2 xs 0

            in subProtoCalc2 newxs (newoutxs ++ show (val1 + val2)) (n + 1)

    | x == '-' && n /= 0 =
            let val1raw = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
                val2 = read $ takeBack2 xs 0 :: Double

                newoutxsRaw = reverse $ takeTailN2 (reverse outxs) 0

                (newoutxs, val1) =
                    if newoutxsRaw == "-"
                    then ("", -val1raw)
                    else (newoutxsRaw, val1raw)

                newxs = takeTailN2 xs 0

            in subProtoCalc2 newxs (newoutxs ++ show (val1 - val2)) (n + 1)

    | otherwise =
            subProtoCalc2 xs (outxs ++ [x]) (n + 1)

n here is just the index that always increments to go to the next Char.

But it is not used for accessing char, for that we got data structure decomposition (x:xs).

It just tells wether we should dtop the computation or not.

  
  

| x == '-' && n /= 0 =

Here, it just says that - is an operator ans not just describing the value.

This function handles this type of computation "A OPERATOR B".

That's why we got:

  
  

takeBack2 :: [Char] -> Int -> [Char]
takeBack2 [] _ = []
takeBack2 (x:xs) n 
    | not (x `elem` "+-*/") = (x:takeBack2 xs (n+1))
    | otherwise = if n == 0 then (x:takeBack2 xs (n+1)) else []

This just takes the first numeric value:

  
  

ghci> takeBack2 "-4+6" 0
"-4"

And used in this context.

  
  

let val1raw = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
    val2 = read $ takeBack2 xs 0 :: Double

From 2 diferent sources.

Because remember when it does not match the operator + or -, it just constructs outxs with the Char it finds.

  
  

| otherwise = subProtoCalc2 xs (outxs ++ [x])

But when it matches an operator, it just takes the first value, so the one already found in outxs and since it appended it, it must reverse it first and then input it in takeBack2.

Because it used val1 for the computation, then this value must be removed from outxs.

  
  

newoutxsRaw = reverse $ takeTailN2 (reverse outxs) 0

With:

  
  

takeTailN2 :: [Char] -> Int -> [Char]
takeTailN2 [] _ = []
takeTailN2 (x:xs) n
    | not (x `elem` "+-*/") = takeTailN2 xs (n+1)
    | otherwise = if n == 0 then takeTailN2 xs (n+1) else  x:xs

Note, that here takeTailN2 and takeBack2 considers the sign of the value, that's important nd simplify A LOT of algorithmic work.

Same thing for newxs, because val2 has been used.

  
  

newxs = takeTailN2 xs 0

And we replace val1 in outxs by the result for next recursive call.

  
  

in subProtoCalc2 newxs (newoutxs ++ show (val1 - val2)) (n + 1)

Note that here it is not exactly val1, why ?

Because we need to check its sign, meaning if it comes with "-" attcahced as a sign and not an operator, meaning that "-" must be alone in newoutxsRaw (which is newoutxs after taking the last numeric value).

That's why we got:

  
  

(newoutxs, val1) =
    if newoutxsRaw == "-"
    then ("", -val1raw)
    else (newoutxsRaw, val1raw)

Execution model

Do you know where subprotoCalc2 lives in the model ?

For now i can not tell you a clear view of the model but i want you to understand that is must executes just after subProtocalc which does the same thing than subProtocalc2 but with * and /.

That is totaly normal, because they are prioritary operators.

Here it is.

  
  

subProtoCalc :: [Char] -> [Char] -> [Char]
subProtoCalc [] outxs = outxs
subProtoCalc (x:xs) outxs
    | x == '*' =
            let val1 = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
                val2 = read $ takeBack2 xs 0 :: Double
                newoutxs = reverse $ takeTailN2 (reverse outxs) 0
                newxs = takeTailN2 xs 0
            in subProtoCalc newxs (newoutxs ++ show (val1 * val2))

    | x == '/' =
            let val1 = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
                val2 = read $ takeBack2 xs 0 :: Double
                newoutxs = reverse $ takeTailN2 (reverse outxs) 0
                newxs = takeTailN2 xs 0
            in subProtoCalc newxs (newoutxs ++ show (val1 / val2))

    | otherwise =
            subProtoCalc xs (outxs ++ [x])

Note that it follows the same structure than subProtoCalc2 btut does not need to check if - is an operator or just describing the value because it just cares about * and /.

Here is protoCalc, see the execution model respecting PEMDAS ?

  
  

protoCalc :: [Char] -> [Char]
protoCalc xs =
    let step0 = clearOperator xs

        step1 = subProtoCalcIdentity step0 []
        step2 = clearOperator step1

        step3 = subProtoCalcExponent step2 []
        step4 = clearOperator step3

        step5 = subProtoCalc step4 [] -- we are here
        step6 = clearOperator step5

        step7 = subProtoCalc2 step6 [] 0
    in clearOperator step7

In fact you see that we are first computing result of multiplication and division operations.

Clearing Operators

Because each computation step can lead to a switch of sign for each valuee it has tom compute we need to normalize the string with clearOperator.

  
  

clearOperator :: [Char] -> [Char]
clearOperator [] = []
clearOperator [x] = [x]
clearOperator (x1:x2:xs)
    | x1 == '+' && x2 == '-' = clearOperator ('-':xs)
    | x1 == '-' && x2 == '+' = clearOperator ('-':xs)
    | x1 == '+' && x2 == '+' = clearOperator ('+':xs)
    | x1 == '-' && x2 == '-' = clearOperator ('+':xs)
    | otherwise = x1 : clearOperator (x2:xs)

Explicit enough, if 2 consecutives '-' -> '+'.

If '+' then '-' or inversely -> '-'.

If not of the above -> '+'.

Going back to the execution model

For example, look what it does:

  
  

ghci> subProtoCalc "3+4*3-3/2" []
"3+12.0-1.5"

And after that we are just passing the result that now jst contains at most addition and / or substraction operators to subProtocalc2.

  
  

ghci> subProtoCalc2 (subProtoCalc "3+4*3-3/2" []) [] 0
"13.5"

But, can we go further, because in PEMDAS, that's not explicitely said, but value identity is absolutely prioritary.

What i mean by that is that EXP(X), LOG(X) or even Factorial !... must first be substituated by the value.

Because here that's not properly an operator but rather part of the value identity.

And between identity and multiplication and division, we need to compute the exponentiation.

We do this with:

  
  

subProtoCalcExponent :: [Char] -> [Char] -> [Char]
subProtoCalcExponent [] outxs = outxs
subProtoCalcExponent (x:xs) outxs
    | x == '^' = 
        let val1 = read . reverse $ takeBack2 (reverse outxs) 0 :: Double
            val2 = read $ takeBack2 xs 0 :: Double
            newoutxs = reverse $ takeTailN2 (reverse outxs) 0
            newxs = takeTailN2 xs 0 
        in subProtoCalcExponent newxs (newoutxs ++ (show (val1**(val2))))
    | otherwise = subProtoCalcExponent xs (outxs ++ [x])

Example.

  
  

ghci> subProtoCalcExponent "1+3*4+4^2" []
"1+3*4+16.0"

Or:

  
  

ghci> subProtoCalcExponent "1+3*4+4^-1.1" []
"1+3*4+0.217637640824031"

So, now identity !

  
  

subProtoCalcIdentity :: [Char] -> [Char] -> [Char]
subProtoCalcIdentity [] outxs = outxs
subProtoCalcIdentity (x:xs) outxs
    | x == 'E' = 
        let val = read $ takeBack2 xs 0 :: Double
            newxs = takeTailN2 xs 0
        in subProtoCalcIdentity newxs (outxs ++ printf "%8f" ((exp(val)) :: Double) :: String)
    | x == 'L' = 
        let val = read $ takeBack2 xs 0 :: Double
            newxs = takeTailN2 xs 0
        in subProtoCalcIdentity newxs (outxs ++ printf "%8f" ((log(val)) :: Double) :: String)
    | x == '!' = 
        let val = read $ takeBack2 xs 0 :: Int
            newxs = takeTailN2 xs 0
        in subProtoCalcIdentity newxs (outxs ++ show (factorial val))
    | otherwise = subProtoCalcIdentity xs (outxs ++ [x])

Here, for convenience, the factorial expression is just !X and not X!.

Factoial function is just.

  
  

factorial :: Int -> Int
factorial 1 = 1
factorial n = n * factorial (n - 1)

Example:

  
  

ghci> subProtoCalcIdentity "1+3*4+4^-1.1+E4-!5" []
"1+3*4+4^-1.1+54.598150033144236-120"

Final Pipeline

So, now we got the pipeline.

But the missing piece is the identification of prioritary computation inside parenthesis.

For that, we'll use the encoding that parserPar gave us.

  
  

calc :: [Char] -> [Char]
calc xs = 
    let (ids, nums) = parserPar xs
        newxs = subCalc xs ids nums
    in protoCalc newxs

calc is the entry fucntion, the function that is exposed to the user.

But, here the important function is subCalc.

It takes the result of the toenization of parenthesis and the string to compute from.

  
  

subCalc :: [Char] -> [Int] -> [Int] -> [Char]
subCalc xs [] [] = xs
subCalc xs ids nums =
    let curmax = myMax nums
        [id1, id2] = grepn2 curmax nums
        idstrt = ids !! id2
        idstop = ids !! id1

        xsstrt = if idstrt > 0
                 then getRangeList xs [0..(idstrt - 1)]
                 else []

        xsstop = if idstop + 1 < length xs
                 then getRangeList xs [(idstop + 1)..(length xs - 1)]
                 else []

        xsbetween = getRangeList xs [(idstrt + 1)..(idstop - 1)]
        rslt = protoCalc xsbetween

        newxs = xsstrt ++ rslt ++ xsstop

        (newids, newnums) = parserPar newxs
    in subCalc newxs newids newnums

So, what's nice about the encoding of parenthesis is nums, because it allows to identify most prioritary parenthesis, which are the most nested one, which are the one with the maximum value of pair.

So we just grep the index of the current maximum pair value inside nums.

Here.

  
  

let curmax = myMax nums
    [id1, id2] = grepn2 curmax nums

Then we use those index to get the position of the parenthesis with ids, so we got the associated computation.

  
  

idstrt = ids !! id2
idstop = ids !! id1

xsstrt = if idstrt > 0
         then getRangeList xs [0..(idstrt - 1)]
         else []

xsstop = if idstop + 1 < length xs
         then getRangeList xs [(idstop + 1)..(length xs - 1)]
         else []

xsbetween = getRangeList xs [(idstrt + 1)..(idstop - 1)]

And now we get its result.

  
  

rslt = protoCalc xsbetween

And replace it in the new string to compute before reitering on it.

It means reparsing the parenthesis and repassing the string to the fucntion recursively.

  
  

        newxs = xsstrt ++ rslt ++ xsstop

        (newids, newnums) = parserPar newxs
    in subCalc newxs newids newnums

That's all.

And a very naive approach.

Benchmark & Results

Let's see how it preforms.

  
  

ghci> calc "-6+-(-7+E-3/0.2)*4"
"21.00425863264272"

Compared to native:

  
  

ghci> -6 + (-( -7 + exp (-3) / 0.2)) * 4
21.00425863264272

Great, now for bechmarks !

We'll compute 100K time this expression and see how fast it did it and how much it asked for allocation and how much the Haskell Garbage Collector was smart.

  
  

benchCalc :: Int -> String -> String
benchCalc 1 expr = calc expr
benchCalc n expr =
    let r = calc expr
    in r `seq` benchCalc (n - 1) expr

main :: IO ()
main = do
    let expr = "-6+-(-7+E-3/0.2)*4"
    let result = benchCalc 100000 expr
    putStrLn result

Now we compile.

  
  

❯ ghc -O2 -rtsopts calculator.hs -o calculator
Loaded package environment from /home/juju/.ghc/x86_64-linux-9.4.7/environments/default
[1 of 2] Compiling Main             ( calculator.hs, calculator.o ) [Source file changed]
[2 of 2] Linking calculator [Objects changed]

-rtsopts flag is for teling that the binary will accept RTS options, RTS is the Haskell Runtime System that manages GC, heap, stack, lightweight threads, statistics, profiling hooks, etc.

So when we run with.

  
  
❯ ./calculator +RTS -s

means:

  
  
./calculator        run my program
+RTS               from here, arguments are for GHC's runtime system
-s                 print GC/runtime allocation statistics

Usually you can end with -RTS:

  
  
./calculator +RTS -s -RTS

For my case, since there are no program arguments after -s, leaving out -RTS is fine.

The important relation is:

  
  

compiled with -rtsopts  ->  executable accepts +RTS -s
compiled without it     ->  many RTS options are rejected

GHC’s docs say that -rtsopts with no argument is equivalent to -rtsopts=all, enabling all RTS option processing, while the default without it is only some, meaning basically safe/minimal options like -? and --info

And in ghc compiler, -O2 is the maximum optimization flag.

So what are the results ?

  
  

❯ ./calculator +RTS -s -RTS
21.00425863264272
  19,459,337,528 bytes allocated in the heap
       4,652,384 bytes copied during GC
         107,416 bytes maximum residency (2 sample(s))
          29,400 bytes maximum slop
               6 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4671 colls,     0 par    0.018s   0.020s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0003s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.022s  (  4.019s elapsed)
  GC      time    0.018s  (  0.020s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    4.041s  (  4.040s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,838,108,614 bytes per MUT second

  Productivity  99.5% of total user, 99.5% of total elapsed

Wow, 19 GB of allocations over the time of the programm running !!

  
  

  19,459,337,528 bytes allocated in the heap

But at peak moment, only 6MiB was in used.

  
  

  6 MiB total memory in use (0 MB lost due to fragmentation)

A tons of very short live object were created.

The Garbage Collector do a very little of work.

Only about 4Mib of data was moved by the GC over the time.

So only about 4MiB of data lived long enough to be moved by the GC.

  
  

4,652,384 bytes copied during GC

and its peak is arround.

  
  
         
107,416 bytes maximum residency (2 sample(s))

We also got:

  
  

29,400 bytes maximum slop

Which is maximum wasted heap space due to imperfect packing/alignment/block granularity.

Now, how to explaine the big difference betwen these two?

  
  

  19,459,337,528 bytes allocated in the heap

AND

  
  

  4,652,384 bytes copied during GC

Which is wasted/unused heap space inside allocated blocks, caused by the runtime’s block allocation granularity.

It is mostly GHC’s runtime system / generational GC discovering that at runtime.

For temporary reverse, takeBack2, ++, show, etc., the flow is more like:

Program allocates many temporary lists/strings.
They are used for a very short time.
After that, nothing points to them anymore.
Minor GC runs.
GC follows the live pointers from roots: stack, registers, globals, etc.
It copies only objects still reachable.
Everything not reachable is garbage.
The whole old nursery region can be reused.

So the GC does not say:

this reverse object is short-lived, free exactly this object

It says more like:

I found the few objects that are still alive.

Everything else in this nursery is dead.

Reset/reuse the nursery.

That’s why it’s so fast.

GHC assumes most new objects die young. This is usually true in functional programs because you create lots of temporary values during expression evaluation.

So in this case:

  
  
reverse outxs
takeBack2 xs 0
newoutxs ++ show value

probably creates tons of short-lived list cells and strings. But by the time GC checks, most of them are unreachable. Therefore:

allocated: huge
copied by GC: tiny

So the clean mental model is:

  
  

GHC compiler:
  tries to optimize away some allocations if possible

GHC runtime / GC:
  manages allocated heap objects
  keeps reachable objects
  discards unreachable nursery memory cheaply

Naive version, but better representation

Changing representation of List `[Char]`

A list in Haskell is basically:

  
  

data List a
    = Empty
    | Cons a (List a)

That we construct like that:

  
  

ghci> 1 : 2: 3 : []
[1,2,3]

That is in fact a linked list, not ideal for data compactnes and acessing speed.

So we can change the representation from standard list ByteString, which is an array of data.

Coming from:

  
  

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

Note, here we are importing the Data.ByteString.Char8 "class" that provides usefull functio to work with ASCII characters that works well with ByteString, for convertions between [Char] etcetera.

All is scoped around B apart from the type itself it depends on with:

  
  

import Data.ByteString.Char8 (ByteString)

What's new ?

So for function signature, no need to do B.ByteString.

We keep the exact same architecture, just tweak the signature and the methods, for example from reverse to B.reverse etcetera.

Also, no need for custom getRangeList which is inneficient because need workng on linked list (overhead for acessing).

Here, we can directly do:

  
  

xsstrt = B.take idstrt xs
xsstop = B.drop (idstop + 1) xs

Example for drop method.

  
  

ghci> B.drop 3 $ B.pack "abcdefghikklmnopqrst"
"defghikklmnopqrst"

Example for take method.

  
  

ghci> B.take 3 $ B.pack "abcdefghikklmnopqrst"
"abc"

Also the construction of ByteString is different.

For example for appending a Char (with Char8 snoc method).

  
  

ghci> B.snoc (B.pack "abcde") 'f'
"abcdef"

And the constructor is cons:

  
  

ghci> B.cons 'a' $ B.pack "bcdef"
"abcdef"

And surely for pattenr matching which we do with (x:xs) -> head + tail is different here.

We still have the concept of head and tailbut that is done like:

  
  

takeTailN2 :: ByteString -> Int -> ByteString
takeTailN2 bs n = 
    case B.uncons bs of
        
        Nothing -> B.empty

        Just (x, xs)
            | not (x `elem` "+-*/^") -> takeTailN2 xs (n+1)
            | otherwise -> if n == 0 then takeTailN2 xs (n+1) else B.cons x xs

So here the case ... of replace the role of basic pattern matching.

And the associated logic after pattern matching inside case ... of is done with ->.

Because remember the old TailN2 was:

  
  

takeTailN2 :: [Char] -> Int -> [Char]
takeTailN2 [] _ = []
takeTailN2 (x:xs) n
    | not (x `elem` "+-*/") = takeTailN2 xs (n+1)
    | otherwise = if n == 0 then takeTailN2 xs (n+1) else  x:xs

Completely different synthax for pattern matching but same logic.

Actual Code

Here it is.

  
  

import Text.Printf

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

factorial :: Int -> Int
factorial 1 = 1
factorial n = n * factorial (n - 1)

grepn2 :: (Eq a) => a -> [a] -> [Int]
grepn2 cmp xs = subGrepn2 xs cmp 0 []

subGrepn2 :: (Eq a) => [a] -> a -> Int -> [Int] -> [Int]
subGrepn2 [] _ _ nxs = nxs
subGrepn2 (x:xs) cmp n nxs
    | cmp == x  = subGrepn2 xs cmp (n + 1) (n:nxs)
    | otherwise = subGrepn2 xs cmp (n + 1) nxs

myMax :: (Ord a) => [a] -> a
myMax xs = subMyMax xs (head xs)

subMyMax :: (Ord a) => [a] -> a -> a
subMyMax [] cmp = cmp
subMyMax (x:xs) cmp = 
    let cmp2 = if cmp >= x
              then cmp
              else x
    in subMyMax xs cmp2

parserPar :: ByteString -> ([Int], [Int])
parserPar xs = subParserPar xs [] [] [] 0 0

subParserPar :: ByteString -> [Int] -> [Int] -> [Int] -> Int -> Int
             -> ([Int], [Int])
subParserPar bs ids nums valxs n n2 =
    case B.uncons bs of
        Nothing ->
            (ids, nums)

        Just (x, xs)
            | x == '(' ->
                let newids = ids ++ [n]
                    newnums = nums ++ [n2]
                    newvalxs = map (\v -> v + 1) valxs
                    newvalxs2 = newvalxs ++ [1]
                in subParserPar xs newids newnums newvalxs2 (n + 1) (n2 + 1)

            | x == ')' ->
                let newvalxs = map (\v -> v - 1) valxs
                    idx = findFirstZero (reverse newvalxs) 0
                    idx2 = length valxs - idx - 1
                    newids = ids ++ [n]
                    newnums = nums ++ [nums !! idx2]
                in subParserPar xs newids newnums (newvalxs ++ [0]) (n + 1) n2

            | otherwise ->
                subParserPar xs ids nums valxs (n + 1) n2

findFirstZero :: [Int] -> Int -> Int
findFirstZero (xi:xsi) n
              | xi == 0 = n
              | otherwise = findFirstZero xsi (n + 1)

calc :: ByteString -> ByteString
calc xs = 
    let (ids, nums) = parserPar xs
        newxs = subCalc xs ids nums
    in protoCalc newxs
    
subCalc :: ByteString -> [Int] -> [Int] -> ByteString
subCalc xs [] [] = xs
subCalc xs ids nums =
    let curmax = myMax nums
        [id1, id2] = grepn2 curmax nums

        idstrt = ids !! id2
        idstop = ids !! id1

        xsstrt = B.take idstrt xs
        xsstop = B.drop (idstop + 1) xs
        xsbetween = B.take (idstop - idstrt - 1) (B.drop (idstrt + 1) xs)

        rslt = protoCalc xsbetween

        newxs = xsstrt <> rslt <> xsstop

        (newids, newnums) = parserPar newxs
    in subCalc newxs newids newnums

protoCalc :: ByteString -> ByteString
protoCalc xs =
    let step0 = clearOperator xs

        step1 = subProtoCalcIdentity step0 B.empty
        step2 = clearOperator step1

        step3 = subProtoCalcExponent step2 B.empty
        step4 = clearOperator step3

        step5 = subProtoCalc step4 B.empty
        step6 = clearOperator step5

        step7 = subProtoCalc2 step6 B.empty 0
    in clearOperator step7

takeBack2 :: ByteString -> Int -> ByteString
takeBack2 bs n = 
    case B.uncons bs of
        
        Nothing -> B.empty

        Just (x, xs) 
            | not (x `elem` "+-*/^") -> B.cons x (takeBack2 xs (n+1))
            | otherwise -> if n == 0 then B.cons x (takeBack2 xs (n+1)) else B.empty

takeTailN2 :: ByteString -> Int -> ByteString
takeTailN2 bs n = 
    case B.uncons bs of
        
        Nothing -> B.empty

        Just (x, xs)
            | not (x `elem` "+-*/^") -> takeTailN2 xs (n+1)
            | otherwise -> if n == 0 then takeTailN2 xs (n+1) else B.cons x xs

subProtoCalc :: B.ByteString -> B.ByteString -> B.ByteString
subProtoCalc bs outxs =
    case B.uncons bs of

        Nothing ->
            outxs

        Just (x, xs)
            | x == '*' ->
                let val1 =
                        read . B.unpack . B.reverse $
                            takeBack2 (B.reverse outxs) 0 :: Double

                    val2 =
                        read . B.unpack $
                            takeBack2 xs 0 :: Double

                    newoutxs =
                        B.reverse $ takeTailN2 (B.reverse outxs) 0

                    newxs =
                        takeTailN2 xs 0

                    result =
                        B.pack (show (val1 * val2))

                in subProtoCalc newxs (newoutxs <> result)

            | x == '/' ->
                let val1 =
                        read . B.unpack . B.reverse $
                            takeBack2 (B.reverse outxs) 0 :: Double

                    val2 =
                        read . B.unpack $
                            takeBack2 xs 0 :: Double

                    newoutxs =
                        B.reverse $ takeTailN2 (B.reverse outxs) 0

                    newxs =
                        takeTailN2 xs 0

                    result =
                        B.pack (show (val1 / val2))

                in subProtoCalc newxs (newoutxs <> result)

            | otherwise ->
                subProtoCalc xs (B.snoc outxs x)

clearOperator :: B.ByteString -> B.ByteString
clearOperator bs =
    case B.uncons bs of
        Nothing ->
            B.empty

        Just (x1, rest) ->
            case B.uncons rest of
                Nothing ->
                    B.singleton x1

                Just (x2, xs)
                    | x1 == '+' && x2 == '-' ->
                        clearOperator (B.cons '-' xs)

                    | x1 == '-' && x2 == '+' ->
                        clearOperator (B.cons '-' xs)

                    | x1 == '+' && x2 == '+' ->
                        clearOperator (B.cons '+' xs)

                    | x1 == '-' && x2 == '-' ->
                        clearOperator (B.cons '+' xs)

                    | otherwise ->
                        B.cons x1 (clearOperator (B.cons x2 xs))

subProtoCalc2 :: ByteString -> ByteString -> Int -> ByteString
subProtoCalc2 bs outxs n =
    case B.uncons bs of
        Nothing ->
            outxs

        Just (x, xs)
            | x == '+' ->
                let val1raw =
                        read . B.unpack . B.reverse $
                            takeBack2 (B.reverse outxs) 0 :: Double

                    val2 =
                        read . B.unpack $
                            takeBack2 xs 0 :: Double

                    newoutxsRaw =
                        B.reverse $ takeTailN2 (B.reverse outxs) 0

                    (newoutxs, val1) =
                        if newoutxsRaw == B.singleton '-'
                        then (B.empty, -val1raw)
                        else (newoutxsRaw, val1raw)

                    newxs =
                        takeTailN2 xs 0

                    result =
                        B.pack (show (val1 + val2))

                in subProtoCalc2 newxs (newoutxs <> result) (n + 1)

            | x == '-' && n /= 0 ->
                let val1raw =
                        read . B.unpack . B.reverse $
                            takeBack2 (B.reverse outxs) 0 :: Double

                    val2 =
                        read . B.unpack $
                            takeBack2 xs 0 :: Double

                    newoutxsRaw =
                        B.reverse $ takeTailN2 (B.reverse outxs) 0

                    (newoutxs, val1) =
                        if newoutxsRaw == B.singleton '-'
                        then (B.empty, -val1raw)
                        else (newoutxsRaw, val1raw)

                    newxs =
                        takeTailN2 xs 0

                    result =
                        B.pack (show (val1 - val2))

                in subProtoCalc2 newxs (newoutxs <> result) (n + 1)

            | otherwise ->
                subProtoCalc2 xs (B.snoc outxs x) (n + 1)

subProtoCalcIdentity :: ByteString -> ByteString -> ByteString
subProtoCalcIdentity bs outxs = 
    case B.uncons bs of

        Nothing -> outxs

        Just(x, xs)
            | x == 'E' -> 
                let val = read . B.unpack $ takeBack2 xs 0 :: Double
                    newxs = takeTailN2 xs 0
                    result = B.pack (printf "%8f" (exp val) :: String)
                in subProtoCalcIdentity newxs (outxs <> result)
            | x == 'L' -> 
                let val = read . B.unpack $ takeBack2 xs 0 :: Double
                    newxs = takeTailN2 xs 0
                    result = B.pack (printf "%8f" (log val) :: String)
                in subProtoCalcIdentity newxs (outxs <> result)
            | x == '!' ->
                let val = read . B.unpack $ takeBack2 xs 0 :: Int
                    newxs = takeTailN2 xs 0
                    result = B.pack $ show (factorial val)
                in subProtoCalcIdentity newxs (outxs <> result)
            | otherwise -> subProtoCalcIdentity xs (B.snoc outxs x)


subProtoCalcExponent :: ByteString -> ByteString -> ByteString
subProtoCalcExponent bs outxs = 
    case B.uncons bs of

        Nothing -> outxs

        Just(x, xs)
            | x == '^' ->
                let val1 = read . B.unpack . B.reverse $ takeBack2 (B.reverse outxs) 0 :: Double
                    val2 = read . B.unpack $ takeBack2 xs 0 :: Double
                    newoutxs = B.reverse $ takeTailN2 (B.reverse outxs) 0
                    newxs = takeTailN2 xs 0 
                    result = B.pack (show (val1**(val2)))
                in subProtoCalcExponent newxs (newoutxs <> result)
            | otherwise -> subProtoCalcExponent xs (B.snoc outxs x)

Note i could have done above:

When you want to debug with "printing", do it with traces, for example add the import.

  
  

import Debug.Trace

And here is the fucntion where it is good to print intermediate results.

  
  

protoCalc :: [Char] -> [Char]
protoCalc xs =
    let step0 = clearOperator xs

        step1 = subProtoCalcIdentity step0 []
        step2 = clearOperator step1

        step3 = subProtoCalcExponent step2 []
        step4 = clearOperator step3

        step5 = subProtoCalc step4 []
        step6 = clearOperator step5

        step7 = subProtoCalc2 step6 [] 0
    in trace ("step0 input:    " ++ step0) $
       trace ("step1 identity: " ++ step1) $
       trace ("step2 clear:    " ++ step2) $
       trace ("step3 exponent: " ++ step3) $
       trace ("step4 clear:    " ++ step4) $
       trace ("step5 */:       " ++ step5) $
       trace ("step6 clear:    " ++ step6) $
       trace ("step7 +-:       " ++ step7) $
       clearOperator step7

And we still in pure stateless state lol, -> just intermediate trace functions.

Benchmarks & Results - Architecture 1 - Representation 2

Same compiling and execution than for the first one.

  
  

❯ ghc -O2 -rtsopts calculator3.hs -o calculator3
Loaded package environment from /home/juju/.ghc/x86_64-linux-9.4.7/environments/default
juju@juju-System-Product-Name ~/99HaskellProbs
❯ ./calculator3 +RTS -s
21.00425863264272
  14,051,918,568 bytes allocated in the heap
       2,341,552 bytes copied during GC
          80,368 bytes maximum residency (2 sample(s))
          29,512 bytes maximum slop
               6 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3359 colls,     0 par    0.018s   0.019s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0003s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.405s  (  3.404s elapsed)
  GC      time    0.018s  (  0.020s elapsed)
  EXIT    time    0.000s  (  0.006s elapsed)
  Total   time    3.424s  (  3.430s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    4,126,271,880 bytes per MUT second

  Productivity  99.5% of total user, 99.2% of total elapsed

That's already better.

total of heap allocation over time = 19GB -> 14GB
GC's bytes copied = -> 4.7MiB -> 2.3MiB
max bytes copied at a moment = 107.4KiB -> 80.37Kib
and the wasted memory = 29.4KiB -> 29.5KiB

So significantively better performance, but still slow.

Tokenizer and Direct WaterFall Evaluation

Instead of applying logic to strings on the fly, we just pay a small cost of tokenizing the string into Tokens and then we apply logic at an extremely lower cost that will avoid creating temprorary object for nothing.

We will just evaluate the Tokens ONCE !

And not replacing parenthesis computations with its result and then re-evaluating this one (parenthesis tokenization + parsing of values).

We still work with ByteString since it gave use clear advantages for data representation.

The Tokenizer

First, we just pay a small cost of tokenizing, but not only parenthesis, ALL.

ALL is of course parenthesis, but also values and operators.

There won't be a weird representation of parnthesis like the previous parserPar had.

No just raw tokens in a List.

Imports

First the infamous import for ByteString.

  
  

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

Then, an import for functions that will use, isDigit tells us if the current Char is a digit or not:

  
  

import Data.Char (isDigit, isSpace)

For example:

  
  

ghci> import Data.Char (isDigit)
ghci> isDigit '.'
False
ghci> isDigit '1'
True

And isSpace:

  
  

ghci> isSpace ' '
True

Structure for a Token

We then define our own type that can take several values, operators, Value, Identity (Exp, Log...), Parenthesis...

  
  

data Token
    = TNum Double
    | TPlus
    | TMinus
    | TMul
    | TDiv
    | TPow
    | TExp
    | TLog
    | TLParen
    | TRParen
    deriving (Show, Eq)

Of course each value it takes must be showable Show and must be comparable Eq.

Logic

Here is the code.

  
  
tokenize :: ByteString -> Either String [Token]
tokenize bs0 =
    let bs = B.dropWhile isSpace bs0
    in if B.null bs
       then Right []
       else if B.isPrefixOf (B.pack "exp") bs
            then (TExp :) <$> tokenize (B.drop 3 bs)
       else if B.isPrefixOf (B.pack "log") bs
            then (TLog :) <$> tokenize (B.drop 3 bs)
       else
            case B.uncons bs of
                Nothing ->
                    Right []

                Just (x, xs)
                    | isDigit x || x == '.' ->
                        let (numTxt, rest) =
                                B.span (\c -> isDigit c || c == '.') bs
                        in case reads (B.unpack numTxt) :: [(Double, String)] of
                            [(n, "")] -> --parser succeeds
                                (TNum n :) <$> tokenize rest

                            _ ->
                                Left ("Invalid number: " ++ B.unpack numTxt)

                    | x == '+' ->
                        (TPlus :) <$> tokenize xs

                    | x == '-' ->
                        (TMinus :) <$> tokenize xs

                    | x == '*' ->
                        (TMul :) <$> tokenize xs

                    | x == '/' ->
                        (TDiv :) <$> tokenize xs

                    | x == '^' ->
                        (TPow :) <$> tokenize xs

                    | x == 'E' ->
                        (TExp :) <$> tokenize xs

                    | x == 'L' ->
                        (TLog :) <$> tokenize xs

                    | x == '(' ->
                        (TLParen :) <$> tokenize xs

                    | x == ')' ->
                        (TRParen :) <$> tokenize xs

                    | otherwise ->
                        Left ("Unknown character: " ++ [x])

We just consumes tokens that do not interest us (which is space) with:

  
  

    let bs = B.dropWhile isSpace bs0

Before ingesting tokens.

So about the identity of the value, we'll use

  
  

      else if B.isPrefixOf (B.pack "exp") bs
            then (TExp :) <$> tokenize (B.drop 3 bs)
       else if B.isPrefixOf (B.pack "log") bs
            then (TLog :) <$> tokenize (B.drop 3 bs)

So just checking if the current start of the ByteString matches "exp" or "log", if yes just appending a Token TExp or TLog to the Tokens List and dropping the 3 characters that compose these tokens, with a recursive call.

Example of Exonential.

  
  
            
    then (TExp :) <$> tokenize (B.drop 3 bs)

If tthat does not match, then we just unconstruct the token and pattern match on it.

  
  

    case B.uncons bs of

If this outputs nothing, then we know that we consume all the characters, hence this is the end of the recursive calls.

  
  

Nothing ->
    Right []

Of course we just check the head for a known simple char token:

  
  

| x == '+' ->
    (TPlus :) <$> tokenize xs

| x == '-' ->
    (TMinus :) <$> tokenize xs

| x == '*' ->
    (TMul :) <$> tokenize xs

| x == '/' ->
    (TDiv :) <$> tokenize xs

| x == '^' ->
    (TPow :) <$> tokenize xs

| x == 'E' ->
    (TExp :) <$> tokenize xs

| x == 'L' ->
    (TLog :) <$> tokenize xs

| x == '(' ->
    (TLParen :) <$> tokenize xs

| x == ')' ->
    (TRParen :) <$> tokenize xs

But for more complex token that may describe a numeric value, here is the code:

  
  

| isDigit x || x == '.' ->
    let (numTxt, rest) =
            B.span (\c -> isDigit c || c == '.') bs
    in case reads (B.unpack numTxt) :: [(Double, String)] of
        [(n, "")] -> --parser succeeds
            (TNum n :) <$> tokenize rest

        _ ->
            Left ("Invalid number: " ++ B.unpack numTxt)

With that:

  
  

| isDigit x || x == '.' ->

We consider for example 0.23 or .023 being a numeric value.

But to be sure of it we go further with B.span method.

  
  

    let (numTxt, rest) =
            B.span (\c -> isDigit c || c == '.') bs

It just creates a String from a another String from left to right while the condition is True.

And it also outputs the rest.

Think of it like a B.take idx and a B.drop idx where idx is computed beeing incremented while the condition is True from the first Char on the input ByteString bs.

And yess, c is a valid Char extracted from bs which is a ByteString.

Example:

  
  

ghci> B.span (\x -> x `elem` "PA") (B.pack "PARIS")
("PA","RIS")

The infamous `reads` function

This one s tricky.

Remember when we convert from String to Numeric value, like Double with:

  
  

ghci> read "1.3" :: Double
1.3

The reads function will do approxiately the same but Char by Char until it does not succed anymore to convert to the given type.

In the parser, on the String B.unpack numTxt and try to convert it to the type Double (for each Char).

But why is:

  
  

reads (B.unpack numTxt) :: [(Double, String)]

then ?

That's because reads will output a list of pairs.

The pairs are:

  
  
(TypeYouWantToConvertTo, RemaindingString)

And now i hear you saying:

"Okok, but why a List of pairs, i mean here we jst want to convert to a Type, it's explicit enough!"

Right, in this case it will indeed output a singleton with only one pair.

This is also why we check like that to see if our numeric value parsing succeded:

  
  

[(n, "")] -> --parser succeeds -- singleton here
    (TNum n :) <$> tokenize rest

_ ->
    Left ("Invalid number: " ++ B.unpack numTxt)

But reads is more general, and in some custom type a user can define the logic of read InputString to a custom type can succedd but givng 2 or more values for its custom type.

Imagine this custom type:

  
  

data Weird = A | B
    deriving Show

Now we'll define their behaviour when a read InputString :: Weird is applied.

In order to do that, we just import.

  
  

import Text.ParserCombinators.ReadP

That gives us the opportunity to do so by defining a special instance for type Weird.

  
  

instance Read Weird where
    readsPrec _ input =
        readP_to_S parser input
      where
        parser =
            (string "France" >> return A)
            +++
            (string "France" >> return B)

Because read will use readsPrec so we just define the function when it tries to convert to type Weird.

We see here that it must call readP_to_S parser on input.

Where parser is THE custom parser logic we implement to tell which values of the type Weird a read on a String should return depending on the value of String.

Here, we just say that it should at the mean time return A and B when the string starts with "France".

Then when we apply reads with this casting [(Weird, String)].

  
  

ghci> reads "France is where i live" :: [(Weird, String)]
[(A," is where i live"),(B," is where i live")]

You see that it returned one pair per type found.

The Parser (With direct evaluator)

So, we just have tokenized our string, great !

But now we have to evaluate it !

How to do it then ?

I mean we still have to preserver the PEMDAS.

And also we DO NOT WANT to evaluate the input more than one time.

So a big shift in the mental odel is necessary.

We will do a weterfall evaluation.

What i mean by that is starting from the least prioritary to the most prioritary operations.

Evaluating like that allow us to evaluate the input once, because note what happens when we do that conceptually.

Evaluation of + and - (left to right)

And there maybe i come accross a token that belongs to another prioritary level, a token that for this level i do not know so i'll ask the next level to evaluate it.

I do this every time in fact even at the very beginning with parseExpr.

With this concept, B.pack "3^2+3*2+(5-6)" can be properly evaluated.

That's what i call a waterfall evaluation.

  
  

parseExpr :: [Token] -> Either String (Double, [Token])
parseExpr tokens = do
    (lhs, rest) <- parseTerm tokens
    parseExprRest lhs rest

parseExprRest :: Double -> [Token] -> Either String (Double, [Token])
parseExprRest acc tokens =
    case tokens of
        TPlus : rest -> do
            (rhs, rest') <- parseTerm rest
            parseExprRest (acc + rhs) rest'

        TMinus : rest -> do
            (rhs, rest') <- parseTerm rest
            parseExprRest (acc - rhs) rest'

        _ ->
            Right (acc, tokens)

Evaluating * and / (left to right)

  
  

parseTerm :: [Token] -> Either String (Double, [Token])
parseTerm tokens = do
    (lhs, rest) <- parsePower tokens
    parseTermRest lhs rest

parseTermRest :: Double -> [Token] -> Either String (Double, [Token])
parseTermRest acc tokens =
    case tokens of
        TMul : rest -> do
            (rhs, rest') <- parsePower rest
            parseTermRest (acc * rhs) rest'

        TDiv : rest -> do
            (rhs, rest') <- parsePower rest
            parseTermRest (acc / rhs) rest'

        _ ->
            Right (acc, tokens)

Same thing, the only architectural difference here is that i explicitely disociated the function that will parse * and / from the call to the next prioritary level, because i must do it anyway, so i do once when parseTerm is called.

Evaluating exponentialtion (power)

Here i'm at the most prioritary level operator wise, so just before computing i will evaluate the identity.

  
  

parsePower :: [Token] -> Either String (Double, [Token])
parsePower tokens = do
    (base, rest) <- parseUnary tokens
    case rest of
        TPow : rest' -> do
            (exponent, rest'') <- parsePower rest'
            Right (base ** exponent, rest'')

        _ ->
            Right (base, rest)

Evaluating exp, log, but also normalizin operators.

  
  

parseUnary :: [Token] -> Either String (Double, [Token])
parseUnary tokens =
    case tokens of
        TPlus : rest ->
            parseUnary rest

        TMinus : rest -> do -- impressive, very nice
            (v, rest') <- parseUnary rest
            Right (-v, rest')

        TExp : rest -> do
            (v, rest') <- parseUnary rest
            Right (exp v, rest')

        TLog : rest -> do
            (v, rest') <- parseUnary rest
            Right (log v, rest')

        _ ->
            parsePrimary tokens

And after that if i again find a token i do not know for this level, i call parsePrimary.

  
  

_ ->
    parsePrimary tokens

Here this is the absolute prioritary level !

  
  

parsePrimary :: [Token] -> Either String (Double, [Token])
parsePrimary tokens =
    case tokens of
        TNum n : rest ->
            Right (n, rest)

        TLParen : rest -> do
            (v, rest') <- parseExpr rest
            case rest' of
                TRParen : rest'' ->
                    Right (v, rest'')

                _ ->
                    Left "Expected closing parenthesis"

        [] ->
            Left "Unexpected end of expression"

        tok : _ ->
            Left ("Unexpected token: " ++ show tok)

In one case, it can just return a TNum, when for example caled from parsePower -> parseUnary -> ParsePrimary.

  
  

TNum n : rest ->
    Right (n, rest)

But, in another case it can encounter an opening parenthesis, then it must call te first priritary level for the computation inside the parenthesis:

  
  

TLParen : rest -> do
    (v, rest') <- parseExpr rest
    case rest' of
        TRParen : rest'' ->
            Right (v, rest'')

        _ ->
            Left "Expected closing parenthesis"

And, that's also here that i can be sure to return error from unknown tokens for example or ending abruptly.

  
  

[] ->
    Left "Unexpected end of expression"

tok : _ ->
    Left ("Unexpected token: " ++ show tok)

The entry point.

It's is just.

  
  

calc :: ByteString -> Either String Double
calc input = do
    tokens <- tokenize input
    parseCalc tokens

With parseCalc doing the last error handling (when tokens can still be there).

  
  

parseCalc :: [Token] -> Either String Double
parseCalc tokens =
    case parseExpr tokens of
        Left err ->
            Left err

        Right (result, rest) ->
            case rest of
                [] ->
                    Right result

                _ ->
                    Left ("Unexpected tokens at end: " ++ show rest)

Benchmark & Results

First, we just test it.

  
  

ghci> calc $ B.pack "-6+-(-7+exp-3/0.2)*4"
Right 21.00425863264272

Or with default exp synthax.

  
  

ghci> calc $ B.pack "-6+-(-7+exp-3/0.2)*4"
Right 21.00425863264272

Good, now the benchmarks !

  
  

❯ ghc -O2 -rtsopts calculator4.hs -o calculator4
Loaded package environment from /home/juju/.ghc/x86_64-linux-9.4.7/environments/default
[1 of 2] Compiling Main             ( calculator4.hs, calculator4.o ) [Source file changed]
[2 of 2] Linking calculator4 [Objects changed]

  
  

❯ ./calculator4 +RTS -s
21.00425863264272
   1,891,338,976 bytes allocated in the heap
         305,128 bytes copied during GC
          44,328 bytes maximum residency (2 sample(s))
          33,496 bytes maximum slop
               6 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       452 colls,     0 par    0.002s   0.002s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0003s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.304s  (  0.304s elapsed)
  GC      time    0.002s  (  0.002s elapsed)
  EXIT    time    0.000s  (  0.004s elapsed)
  Total   time    0.306s  (  0.310s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    6,219,790,958 bytes per MUT second

  Productivity  99.2% of total user, 98.0% of total elapsed

Damn it.

Massive improovements.

Conclusion

Tokenizer + Parser + WaterFall Evaluator allocates ~10.3× less heap than calculator.

Tokenizer + Parser + WaterFall Evaluator runs ~13.2× faster than calculator.

The speedup mostly comes from doing much less temporary string/list allocation work, not from reducing GC overhead, because GC time was already tiny in all three versions.

That's why Parsers were invented.

Context

Boring Fundamental functions

Naive version

Parenthesis tokenization

Proto computation

Execution model

Clearing Operators

Going back to the execution model

Final Pipeline

Benchmark & Results

Naive version, but better representation

Changing representation of List [Char]

What's new ?

Actual Code

Benchmarks & Results - Architecture 1 - Representation 2

Tokenizer and Direct WaterFall Evaluation

The Tokenizer

Imports

Structure for a Token

Logic

The infamous reads function

The Parser (With direct evaluator)

Benchmark & Results

Conclusion

Changing representation of List `[Char]`

The infamous `reads` function