With the previous two modules in place we are now set up to use a DFA to match against a string. In my implementation I support either a greedy match or an short match. In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match.
To do the matching I have a general function which will create a list of all matches. Then the difference between short and greedy matching is which of the candidate solutions does it choose.
This is the method:
doMatch func machine st [] = doAccept machine st []
doMatch func machine st string = func $ map (\f -> doMatch’ st f []) (tails string)
where
doMatch’ state [] soFar = doAccept machine st soFar
doMatch’ state (s:str) soFar =
case findTransition machine s state of
Nothing -> doAccept machine state soFar
Just (from, to, val) -> case doMatch’ to str (soFar ++ [s]) of
(False,_) -> case canAccept machine to of
True -> (True, soFar ++ [s])
False -> doMatch’ to str (soFar ++ [s])
(True,res) -> (True,res)
This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.
For short or long matches I pass in one of these two functions:
-- Get the shortest match
shortest matches = case filter (\s->fst s) (sort matches) of
[] -> (False,”")
ms -> head ms
—- Get the longest match
longest matches = last.sort $ matches
I created aliases for the functions to make it more handy:
(=~) = greedyMatch (=~?) = shortMatch
And then the final result:
SimpleRegex> “hiphiphiphorray” =~? “hip(hip)“ (True,”hip”) SimpleRegex> “hiphiphiphorray” =~ “hip(hip)“ (True,”hiphiphip”)
I attached the whole regex parser here: [download id=”6″ format=”3″]
