With the previous two modules in place we are now set up to use a DFA to match against a string. In my implementation I support either a greedy match or an short match. In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match.
To do the matching I have a general function which will create a list of all matches. Then the difference between short and greedy matching is which of the candidate solutions does it choose.
This is the method:
doMatch func machine st [] = doAccept machine st [] doMatch func machine st string = func $ map (\f -> doMatch’ st f []) (tails string) where doMatch’ state [] soFar = doAccept machine st soFar doMatch’ state (s:str) soFar = case findTransition machine s state of Nothing -> doAccept machine state soFar Just (from, to, val) -> case doMatch’ to str (soFar ++ [s]) of (False,_) -> case canAccept machine to of True -> (True, soFar ++ [s]) False -> doMatch’ to str (soFar ++ [s]) (True,res) -> (True,res)
This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.
For short or long matches I pass in one of these two functions:
-- Get the shortest match shortest matches = case filter (\s->fst s) (sort matches) of [] -> (False,”") ms -> head ms —- Get the longest match longest matches = last.sort $ matches
I created aliases for the functions to make it more handy:
(=~) = greedyMatch (=~?) = shortMatch
And then the final result:
SimpleRegex> “hiphiphiphorray” =~? “hip(hip)“ (True,”hip”) SimpleRegex> “hiphiphiphorray” =~ “hip(hip)“ (True,”hiphiphip”)
I attached the whole regex parser here: [download id=”6″ format=”3″]