Writing a Regular Expression parser in Haskell: Part 4

With the previous two modules in place we are now set up to use a DFA to match against a string.  In my implementation I support either a greedy match or an short match.  In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match. 

To do the matching I have a general function which will create a list of all matches.  Then the difference between short and greedy matching is which of the candidate solutions does it choose.

This is the method:

doMatch func machine st [] = doAccept  machine st []
doMatch func machine st string =  func $ map (\f -> doMatch’ st f []) (tails string)
    where
      doMatch’ state [] soFar = doAccept machine st soFar
      doMatch’ state (s:str) soFar = 
          case findTransition machine s state of
            Nothing -> doAccept machine state soFar
            Just (from, to, val) -> case doMatch’ to str (soFar ++ [s]) of
                                      (False,_) -> case canAccept machine to of
                                                    True -> (True, soFar ++ [s])
                                                    False -> doMatch’ to str (soFar ++ [s])
                                      (True,res) -> (True,res)

This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.

For short or long matches I pass in one of these two functions:

-- Get the shortest match
shortest matches = case  filter (\s->fst s) (sort matches) of
                     [] -> (False,”")
                     ms -> head ms
 
—- Get the longest match
longest matches = last.sort $ matches

I created aliases for the functions to make it more handy:

(=~) = greedyMatch
(=~?) = shortMatch

And then the final result:

SimpleRegex> “hiphiphiphorray” =~? “hip(hip)“
(True,”hip”)
 
SimpleRegex> “hiphiphiphorray” =~ “hip(hip)“
(True,”hiphiphip”)

I attached the whole regex parser here: [download id=”6″ format=”3″]