<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Farblondzshet in Code &#187; Regular Expression</title>
	<atom:link href="http://matthewmanela.com/category/regular-expression/feed/" rel="self" type="application/rss+xml" />
	<link>http://matthewmanela.com</link>
	<description>The life and work of Matthew Manela</description>
	<lastBuildDate>Wed, 01 Sep 2010 21:54:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Inline Regular Expression Options</title>
		<link>http://matthewmanela.com/2009/01/01/inline-regular-expression-options/</link>
		<comments>http://matthewmanela.com/2009/01/01/inline-regular-expression-options/#comments</comments>
		<pubDate>Thu, 01 Jan 2009 23:10:15 +0000</pubDate>
		<dc:creator>Matthew</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Regular Expression]]></category>

		<guid isPermaLink="false">http://blogs.msdn.com/matt/archive/2009/01/01/inline-regular-expression-options.aspx</guid>
		<description><![CDATA[I was using attributes from the System.ComponentModel.DataAnnotations&#160;&#160; namespace for model validation.&#160; This namespace includes a few very useful validation attributes such as          Required Attribute – Validates the field has a va... <a href="http://matthewmanela.com/2009/01/01/inline-regular-expression-options/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I was using attributes from the <a href="http://msdn.microsoft.com/en-us/library/system.componentmodel.dataannotations.aspx">System.ComponentModel.DataAnnotations</a> namespace for model validation.&#160; This namespace includes a few very useful validation attributes, such as:</p>  <ol>     <li>RequiredAttribute – Validates that the field has a value </li>      <li>RangeAttribute – Validates that the field is within a given range </li>      <li>RegularExpressionAttribute – Validates that the field matches a given regular expression </li>   </ol>  <p>The regular expression attribute is very useful since you can describe exactly what format you want a string property to be in.&#160; While using it, though, I ran into a problem: the attribute doesn’t let you specify RegexOptions.&#160; This was an issue for me since I wanted to use the regex to validate that the user’s input was between 5 and 200 characters long, so I had attributed the property like this:</p>

<pre class="brush:csharp">
[RegularExpression(@".{5,200}")]
public string Text{get;set;}
</pre>

<p>However, this doesn’t work, since by default the wildcard . does not match newline characters (which I am allowing in the input).&#160; The way to fix this is to pass the RegexOptions.Singleline option to either the Regex constructor or the Match function.&#160; The problem is that I have no way of doing that here, because the attribute constructor has no argument for specifying those options.&#160; I was considering subclassing the attribute to create one that allows specifying the options, but then I stumbled upon this:</p>

<p><a href="http://msdn.microsoft.com/library/yd1hzczs.aspx">Regular Expression Options</a></p>

<p>You can specify regex options inside the regular expression text itself! (I thought this was a huge discovery until my co-worker said he had known it all along but never told me!)</p>

<p>So I just changed the expression to look like this:</p>

<pre class="brush:csharp">
[RegularExpression(@"(?s).{5,200}")]
public string Text{get; set;}
</pre>

<p>The (?s) is the inline regex option that says I want this pattern in Singleline mode! And now the validation works the way I wanted!</p>
]]></content:encoded>
			<wfw:commentRss>http://matthewmanela.com/2009/01/01/inline-regular-expression-options/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing a Regular Expression parser in Haskell: Part 4</title>
		<link>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-4/</link>
		<comments>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-4/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 22:23:05 +0000</pubDate>
		<dc:creator>Matthew</dc:creator>
				<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Regular Expression]]></category>

		<guid isPermaLink="false">http://matthewmanela.com/?p=295</guid>
		<description><![CDATA[With the previous two modules in place we are now set up to use a DFA to match against a string.&#160; In my implementation I support either a greedy match or a short match.&#160; In a full-featured regular expression &#8230; <a href="http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-4/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><P>With the previous two modules in place we are now set up to use a DFA to match against a string.&nbsp; In my implementation I support either a greedy match or a short match.&nbsp; In a full-featured regular expression engine this choice between greedy and non-greedy would be made per operator, but for simplicity I apply it to the overall match.&nbsp; </P></p>


<p><P>To do the matching I have a general function which creates a list of all candidate matches.&nbsp; The only difference between short and greedy matching is which of those candidates gets chosen.</P></p>

<p><P>This is the function:</P></p>

<p><DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Try a match starting at every suffix of the input (tails is from Data.List); func picks the result
doMatch func machine st [] = doAccept machine st []
doMatch func machine st string = func $ map (\f -&gt; doMatch' st f []) (tails string)
    where
      -- End of input: check whether the current state accepts
      doMatch' state [] soFar = doAccept machine state soFar
      doMatch' state (s:str) soFar =
          case findTransition machine s state of
            -- No transition on s: stop with what has been consumed so far
            Nothing -&gt; doAccept machine state soFar
            -- Follow the transition; keep the deeper match if the recursion found one
            Just (from, to, val) -&gt; case doMatch' to str (soFar ++ [s]) of
                                      (False,_) -&gt; case canAccept machine to of
                                                    True -&gt; (True, soFar ++ [s])
                                                    False -&gt; doMatch' to str (soFar ++ [s])
                                      (True,res) -&gt; (True,res)</PRE></DIV>
</p>

<p><P>This runs the matcher from every suffix of the input (via tails) to build the list of candidate matches, then uses the passed-in function to pick either the shortest or the longest one.</P></p>
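
<p>To make the candidate-per-suffix idea concrete, here is a minimal, self-contained sketch. It does not use the project's machine type: the hand-written step and accepting functions are hypothetical stand-ins for a DFA that recognises hip(hip)*, and longestFrom and candidates are names made up for this illustration.</p>

<DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import Data.List (tails)

-- Hypothetical toy DFA for the pattern hip(hip)*; states are plain Ints
type State = Int

-- Transition function: Nothing means there is no edge for that character
step :: State -> Char -> Maybe State
step 0 'h' = Just 1
step 1 'i' = Just 2
step 2 'p' = Just 3
step 3 'h' = Just 1
step _ _   = Nothing

-- State 3 is the only accepting state
accepting :: State -> Bool
accepting s = s == 3

-- Longest accepted prefix of the input, starting from state 0
longestFrom :: String -> (Bool, String)
longestFrom = go 0 "" (False, "")
  where
    -- soFar holds the consumed characters in reverse; best is the last accepted prefix
    go state soFar best input =
      let best' = if accepting state then (True, reverse soFar) else best
      in case input of
           []       -> best'
           (c : cs) -> case step state c of
                         Nothing   -> best'
                         Just next -> go next (c : soFar) best' cs

-- One candidate per suffix of the input, in the same spirit as doMatch
candidates :: String -> [(Bool, String)]
candidates = map longestFrom . tails</PRE></DIV>

<p>Loaded into GHCi, head (candidates "hiphiphiphorray") evaluates to (True,"hiphiphip"), and the candidate that starts at the fourth character is (True,"hiphip"). A selection function then only has to choose among these pairs.</p>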

<p><P>For short or long matches I pass in one of these two functions:</P></p>

<p><DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Get the shortest match
shortest matches = case filter fst (sort matches) of
                     [] -&gt; (False,"")
                     ms -&gt; head ms

-- Get the longest match
longest matches = last . sort $ matches</PRE></DIV>
</p>
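
<p>One subtlety worth knowing: sort on (Bool, String) pairs compares the strings lexicographically rather than by length. That gives the expected answer whenever the successful candidates are prefixes of one another, but candidates found at different starting positions need not be. If ordering by length is what you want, a sketch along these lines could be used instead. The names successesByLength, shortestByLength and longestByLength are invented here and are not part of the attached project.</p>

<DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import Data.List (sortBy)
import Data.Ord (comparing)

-- Result shape used by the matcher in this post: (did it match?, matched text)
type MatchResult = (Bool, String)

-- Successful candidates only, ordered by the length of the matched text
successesByLength :: [MatchResult] -> [MatchResult]
successesByLength = sortBy (comparing (length . snd)) . filter fst

-- Hypothetical length-based counterpart of shortest
shortestByLength :: [MatchResult] -> MatchResult
shortestByLength matches = case successesByLength matches of
    []      -> (False, "")
    (m : _) -> m

-- Hypothetical length-based counterpart of longest
longestByLength :: [MatchResult] -> MatchResult
longestByLength matches = case successesByLength matches of
    [] -> (False, "")
    ms -> last ms</PRE></DIV>

<p>For the candidates [(True,"aaa"),(False,"x"),(True,"b")], shortestByLength returns (True,"b"), while the sort-based shortest returns (True,"aaa") because "aaa" sorts before "b".</p>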

<p><P>I created operator aliases for the two matching functions to make them handier to use:</P></p>

<p><DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; HEIGHT: 59px; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">(=~) = greedyMatch
(=~?) = shortMatch</PRE></DIV>
</p>

<p><P>And then the final result:</P></p>

<p><DIV style="BORDER-BOTTOM: gray 1px solid; BORDER-LEFT: gray 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: gray 1px solid; CURSOR: text; BORDER-RIGHT: gray 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: consolas, 'Courier New', courier, monospace; BORDER-TOP-STYLE: none; COLOR: black; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">*SimpleRegex&gt; "hiphiphiphorray" =~? "hip(hip)*"
(True,"hip")

*SimpleRegex&gt; "hiphiphiphorray" =~ "hip(hip)*"
(True,"hiphiphip")</PRE></DIV>
</p>


<p><P>I attached a zip of all the files for this project.</P></p>

<p><P>Enjoy!</P></p>
]]></content:encoded>
			<wfw:commentRss>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing a Regular Expression parser in Haskell: Part 3</title>
		<link>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-3/</link>
		<comments>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-3/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 22:22:47 +0000</pubDate>
		<dc:creator>Matthew</dc:creator>
				<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Regular Expression]]></category>

		<guid isPermaLink="false">http://matthewmanela.com/?p=293</guid>
		<description><![CDATA[The third module in the simple regular expression parser is called NFAtoDFA.&#160; As you might have guessed, it takes the NFA that resulted from the first module and converts it into a DFA.&#160; The structure that the DFA uses is &#8230; <a href="http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-3/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><P>The third module in the simple regular expression parser is called NFAtoDFA.&nbsp; As you might have guessed, it takes the NFA that resulted from the first module and converts it into a DFA.&nbsp; The DFA uses the same structure as the NFA, since they are both finite state machines.</P></p>


<p><P>Converting an NFA to a DFA requires mapping sets of nodes in the NFA to single nodes in the DFA: many nodes in an NFA will correspond to one node in the DFA.&nbsp; Making this change requires updating transitions to point to and from sets of nodes.&nbsp; To manage this transformation I create a state monad using the following context:</P></p>


<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- The state which we pass to build the DFA
data ConvertContext = ConvertContext { nfa :: FiniteMachine,
                                       trans :: [Transition],
                                       setMap :: Map.Map (Set Node) Integer,
                                       setStack :: [Set Node],
                                       begin :: Node,
                                       accept :: Set Node,
                                       nextNode :: Node
                                     } deriving (Show, Eq)
type ConvertState a = State ConvertContext a</PRE></DIV>
</p>
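
<p>Judging by its type, the setMap field records which DFA node each set of NFA nodes has been assigned. A rough sketch of that bookkeeping, written outside the state monad, might look like the following. The dfaNodeFor helper and the choice of Node = Integer are assumptions made for illustration, not code from the module.</p>

<DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumed representation of a node; the module's real Node type may differ
type Node = Integer

-- Look up the DFA node for a set of NFA nodes, allocating a fresh id if the set is new.
-- Returns the id together with the (possibly extended) table.
dfaNodeFor :: Set Node -> Map.Map (Set Node) Integer -> (Integer, Map.Map (Set Node) Integer)
dfaNodeFor nodes table = case Map.lookup nodes table of
    Just n  -> (n, table)
    Nothing -> let n = fromIntegral (Map.size table)
               in (n, Map.insert nodes n table)</PRE></DIV>

<p>Starting from Map.empty, asking for Set.fromList [1,2] yields id 0, asking for Set.fromList [3] yields id 1, and asking again for Set.fromList [2,1] yields 0, since it is the same set. Threading the table through by hand like this is exactly the kind of plumbing a state monad hides.</p>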

<p><P>Most of the code in this module just manages this context and updates it according to two operations:</P></p>

<p><OL>
<LI>Epsilon Closure</LI>
<LI>Set Move</LI>
</OL></p>

<p><P>These are explained in more detail in <A href="http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx" target="_blank">this article</A>.</P></p>

<p><P>Basically, epsilon closure is the process of taking a set of initial nodes and returning the set of all nodes you can reach from them using only epsilon transitions (including the initial nodes themselves).&nbsp; To help with this I created some smaller functions that build up to an epsilon closure.</P></p>
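
<p>As a rough, self-contained illustration of the idea (not the module's actual code), an epsilon closure can be computed with a simple worklist. The Transition encoding below, with Nothing standing for an epsilon label, and the epsilonClosure name are assumptions made for this sketch.</p>

<DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import Data.Set (Set)
import qualified Data.Set as Set

-- Assumed representation of a node; the module's real Node type may differ
type Node = Integer

-- (from, to, label); Nothing marks an epsilon transition (an assumed encoding)
type Transition = (Node, Node, Maybe Char)

-- All nodes reachable from the start set using only epsilon transitions,
-- including the start nodes themselves
epsilonClosure :: [Transition] -> Set Node -> Set Node
epsilonClosure trans start = go start (Set.toList start)
  where
    -- seen: nodes already in the closure; the list holds nodes still to expand
    go seen [] = seen
    go seen (n : rest) =
      let reachable = Set.fromList (map target (filter (epsilonFrom n) trans))
          fresh     = Set.difference reachable seen
      in go (Set.union seen fresh) (Set.toList fresh ++ rest)

    -- True for an epsilon transition that leaves node n
    epsilonFrom n (from, _, label) = (from, label) == (n, Nothing)
    target (_, to, _) = to</PRE></DIV>

<p>With the transitions [(0,1,Nothing),(1,2,Nothing),(2,0,Nothing),(2,3,Just 'a')], the closure of the set containing only node 0 is the set of nodes 0, 1 and 2: the cycle back to 0 is handled by the seen set, and node 3 is left out because it is only reachable on 'a'.</p>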

<p><P>First are a couple of helper functions (findToNodes and closure):</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import qualified Data.Set as Set</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Union the nodes found by findToNodes into a set we already have</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">closure trans value nodes oldSet = Set.union (findToNodes trans value nodes) oldSet</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Search the table of transitions to find all nodes you can reach given an initial set of nodes</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">findToNodes trans value fromNodes = foldr match Set.empty trans</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">    where </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">      match (from, to, val) nodes</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">          | (from == fromNodes) &amp;&amp; (val == value) = Set.insert to nodes</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">          | otherwise = nodes</PRE></DIV></DIV>
<P><STRONG>findToNodes</STRONG> searches a transition table for all transitions that leave <STRONG>fromNodes</STRONG> on <STRONG>value</STRONG>.&nbsp; Despite the plural name, <STRONG>fromNodes</STRONG> is compared against each transition's source with ==, so it is a single node.&nbsp; The function builds up a set of all the <STRONG>to</STRONG> nodes that match.</P></p>

<p><P><STRONG>closure</STRONG> wraps findToNodes so that we can easily union the nodes it finds into a set we have already accumulated, which makes it convenient to use as the step function of a fold.</P></p>

<p><P>With this in hand we can clearly write an epsilon closure function:</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Given an initial set of nodes, find the set of all nodes you can reach by taking</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- transitions on epsilon only</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">epsilonClosure trans nodes = foldUntilRepeat Set.union Set.empty $</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                             iterate (Set.fold (closure trans epsilon) Set.empty) nodes</PRE></DIV></DIV>
</p>

<p><P>This function takes full advantage of the lazy nature of Haskell.&nbsp; It repeats the closure on epsilon over and over, producing an infinite stream of results, and feeds that stream into our function <STRONG>foldUntilRepeat</STRONG>.&nbsp; This function does what its name says: it folds the values that are streamed in until it sees the same value twice.</P></p>
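
<p><P>The post does not show <STRONG>foldUntilRepeat</STRONG> itself, so here is a minimal sketch of how such a function could look.&nbsp; The signature, the <STRONG>seen</STRONG> list and the toy <STRONG>successors</STRONG> step are assumptions made for illustration, not the code from the attached module.</P></p>

```haskell
import qualified Data.Set as Set

-- Hypothetical reconstruction: fold the stream with f, stopping as soon as
-- an element shows up that has already been seen.
foldUntilRepeat :: Eq a => (a -> b -> b) -> b -> [a] -> b
foldUntilRepeat f = go []
  where
    go _ acc [] = acc
    go seen acc (x:xs)
      | x `elem` seen = acc
      | otherwise     = go (x : seen) (f x acc) xs

-- Toy step function standing in for one round of epsilon moves:
-- 1 leads to 2, 2 leads to 3, and 3 leads nowhere.
successors :: Set.Set Int -> Set.Set Int
successors = Set.map (+ 1) . Set.filter (< 3)

-- Union every set in the infinite stream until a set repeats.
reachable :: Set.Set Int
reachable = foldUntilRepeat Set.union Set.empty (iterate successors (Set.singleton 1))
```

<p><P>Here <STRONG>reachable</STRONG> evaluates to the set containing 1, 2 and 3.&nbsp; Because each element of the stream is computed from the previous one, the first repeated element means the stream has started to cycle, so stopping there loses nothing.&nbsp; This variant also terminates when the edges form a loop.</P></p>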

<p><P>The set move is just a combination of what you have already seen:</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Given a starting set of nodes, find the set of all nodes that you can reach on a given value</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- This includes the epsilonClosure of the destination nodes</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">moveClosure trans value nodes = epsilonClosure trans $ </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                                Set.fold (closure trans value) Set.empty nodes</PRE></DIV></DIV>
</p>

<p><P>With these functions in hand, the rest of this module just calls them and updates the context until there are no more nodes in the NFA to process.</P></p>
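
<p><P>The loop described here resembles the textbook subset construction.&nbsp; As a rough, self-contained sketch of that idea (not the attached module), the code below uses a plain worklist.&nbsp; The names <STRONG>step</STRONG>, <STRONG>epsClosure</STRONG>, <STRONG>move</STRONG>, <STRONG>subsetConstruction</STRONG> and <STRONG>exampleNfa</STRONG> are hypothetical, and the epsilon closure is written as a simple fixed point rather than with iterate.</P></p>

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Node = Integer
type Transition = (Node, Node, Maybe Char)

-- Nodes reachable from any node in the set by one edge labelled with value.
step :: [Transition] -> Maybe Char -> Set Node -> Set Node
step trans value nodes =
  Set.fromList [to | (from, to, val) <- trans, from `Set.member` nodes, val == value]

-- Epsilon closure as a plain fixed point: keep adding epsilon successors
-- until the set stops growing.
epsClosure :: [Transition] -> Set Node -> Set Node
epsClosure trans nodes
  | next == nodes = nodes
  | otherwise     = epsClosure trans next
  where
    next = Set.union nodes (step trans Nothing nodes)

-- Move on one character, then close over epsilon edges.
move :: [Transition] -> Char -> Set Node -> Set Node
move trans c = epsClosure trans . step trans (Just c)

-- Worklist loop: each DFA state is a set of NFA nodes. Returns every DFA
-- state discovered and the DFA edges between them.
subsetConstruction :: [Transition] -> [Char] -> Node
                   -> (Set (Set Node), [(Set Node, Set Node, Char)])
subsetConstruction trans alphabet start = go [startSet] (Set.singleton startSet) []
  where
    startSet = epsClosure trans (Set.singleton start)
    go [] seen edges = (seen, edges)
    go (s : todo) seen edges =
      let targets   = [ (s, t, c) | c <- alphabet
                                  , let t = move trans c s
                                  , not (Set.null t) ]
          newStates = Set.fromList [t | (_, t, _) <- targets] `Set.difference` seen
      in go (todo ++ Set.toList newStates) (Set.union seen newStates) (edges ++ targets)

-- Example NFA for the regex ab: 0 -a-> 1 -epsilon-> 2 -b-> 3
exampleNfa :: [Transition]
exampleNfa = [(0, 1, Just 'a'), (1, 2, Nothing), (2, 3, Just 'b')]
```

<p><P>Calling <STRONG>subsetConstruction exampleNfa "ab" 0</STRONG> discovers three DFA states, {0}, {1,2} and {3}, joined by an edge on a and an edge on b.&nbsp; Nodes 1 and 2 collapse into one state because they are linked by an epsilon edge.</P></p>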

<p><P>In the <A href="http://blogs.msdn.com/matt/archive/2008/06/21/writing-a-regular-expression-parser-in-haskell-part-4.aspx" mce_href="/matt/archive/2008/06/21/writing-a-regular-expression-parser-in-haskell-part-4.aspx">next installment</A> I will discuss using the output of this module to match a regex against a string.</P></p>

<p><P>Also, once again the code is attached.</P></p>
]]></content:encoded>
			<wfw:commentRss>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing a Regular Expression parser in Haskell: Part 2</title>
		<link>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-2/</link>
		<comments>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-2/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 22:22:26 +0000</pubDate>
		<dc:creator>Matthew</dc:creator>
				<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Regular Expression]]></category>

		<guid isPermaLink="false">http://matthewmanela.com/?p=291</guid>
		<description><![CDATA[The first module in my simple regular expression parser is called RegexToNFA.&#160; This module exposes the types that make up a finite state machine and also the functions to convert a regular expression string into a finite state machine. My &#8230; <a href="http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><P>The first module in my simple regular expression parser is called RegexToNFA.&nbsp; This module exposes the types that make up a finite state machine and also the functions to convert a regular expression string into a finite state machine.</P></p>

<p><P>My structure for a FSM follows closely from the <A href="http://en.wikipedia.org/wiki/State_machine#Mathematical_model" target=_blank mce_href="http://en.wikipedia.org/wiki/State_machine#Mathematical_model">mathematical definition</A>:</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Imports needed by the snippets in this post</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import Data.Set (Set)</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">import Control.Monad.State (State)</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">data FiniteMachine = FiniteMachine{  table :: [Transition],</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                                     alphabet :: Set Char,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                                     start :: Node,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                                     final :: Set Node</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                                  }</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- NFA node</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">type Node = Integer</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- The value for an edge in an NFA</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">type TransitionValue = Maybe Char</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- A transition in an NFA is a tuple of</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- StartNode, DestinationNode, Value to transition on</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">type Transition = (Node,Node,TransitionValue)</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- The value of the edge in the NFA is a Maybe Char</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- where Nothing is the epsilon transition,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- therefore let's just rename Nothing to epsilon</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">epsilon :: TransitionValue</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">epsilon = Nothing</PRE></DIV></DIV>
</p>

<p><P>I have the value which you transition on as a Maybe Char (which I alias as TransitionValue).&nbsp; This allowed me to define epsilon as the Nothing data constructor.</P></p>
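
<p><P>To make the encoding concrete, here is a small hand-written transition table for the pattern a|b.&nbsp; The type aliases are repeated so the snippet stands alone.&nbsp; The node numbering and the names <STRONG>aOrB</STRONG> and <STRONG>edgeChars</STRONG> are made up for this illustration and are not necessarily what the converter would produce.</P></p>

```haskell
type Node = Integer
type TransitionValue = Maybe Char
type Transition = (Node, Node, TransitionValue)

-- Nothing stands for the epsilon edge, which consumes no input.
epsilon :: TransitionValue
epsilon = Nothing

-- Hand-written transition table for a|b (node numbers are arbitrary).
aOrB :: [Transition]
aOrB =
  [ (0, 1, epsilon), (0, 3, epsilon)   -- branch into either alternative
  , (1, 2, Just 'a')                   -- consume 'a'
  , (3, 4, Just 'b')                   -- consume 'b'
  , (2, 5, epsilon), (4, 5, epsilon)   -- rejoin at the final node
  ]

-- Characters that actually appear on edges; epsilon edges contribute nothing.
edgeChars :: [Transition] -> [Char]
edgeChars table = [c | (_, _, Just c) <- table]
```

<p><P>Pattern matching on Just and Nothing is all it takes to tell consuming edges from epsilon edges, so no separate marker character has to be reserved for epsilon.</P></p>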

<p><P>With this structure defined, my goal now is to convert a regular expression pattern such as (a|b)* into a FiniteMachine.&nbsp; Doing this requires keeping track of a lot of state, which naturally leads to the use of the State monad.&nbsp; I set up a structure for the data I want to keep track of and then create a state monad using that structure:</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- The state that gets passed around, which we use to build up the NFA</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">data ParseContext = Context </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                    {</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                      nodeList :: [Node],</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                      transitions :: [Transition],</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                      operators :: OperatorList,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                      nextNode :: Node,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                      values :: Set Char</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">                    } deriving (Show, Eq)</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">&nbsp; </PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Alias the State type with a more friendly name</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">type RegexParseState a = State ParseContext a</PRE></DIV></DIV>
</p>
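
<p><P>As an illustration of how a context like this can be threaded through the State monad, here is a small sketch using a trimmed-down context.&nbsp; <STRONG>MiniContext</STRONG>, <STRONG>freshNode</STRONG>, <STRONG>addTransition</STRONG> and <STRONG>literal</STRONG> are hypothetical names invented for this example and do not appear in the attached code.&nbsp; It assumes the mtl package for Control.Monad.State.</P></p>

```haskell
import Control.Monad.State (State, get, put, modify, runState)

type Node = Integer
type Transition = (Node, Node, Maybe Char)

-- Trimmed-down, hypothetical context: only the fields this sketch needs.
data MiniContext = MiniContext
  { nextNode    :: Node
  , transitions :: [Transition]
  } deriving (Show, Eq)

type MiniParseState a = State MiniContext a

-- Hand out a fresh node number and advance the counter.
freshNode :: MiniParseState Node
freshNode = do
  ctx <- get
  let n = nextNode ctx
  put ctx { nextNode = n + 1 }
  return n

-- Record a transition in the context.
addTransition :: Transition -> MiniParseState ()
addTransition t = modify (\ctx -> ctx { transitions = t : transitions ctx })

-- Build the two-node fragment for a single literal character.
literal :: Char -> MiniParseState (Node, Node)
literal c = do
  from <- freshNode
  to   <- freshNode
  addTransition (from, to, Just c)
  return (from, to)
```

<p><P>Running <STRONG>runState (literal 'a') (MiniContext 0 [])</STRONG> returns the node pair (0,1) together with a context whose nextNode is 2 and whose transitions hold the single edge (0,1,Just 'a').&nbsp; None of the functions takes the context as an explicit argument; the monad passes it along.</P></p>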

<p><P>The ParseContext structure is passed between functions to allow them to see the current state of the parsing and create a new state.&nbsp; I define many functions, each of which deals with one piece of the puzzle of converting the input string into a FSM.&nbsp; I am not going to address them all, but I will point out some that are noteworthy:</P></p>

<p><P><STRONG>convertToNFA</STRONG> &#8211; This is the top-level function; it is exposed externally and lets you convert a regex to an NFA.</P></p>

<p><P><STRONG>processOperator</STRONG> &#8211; This function decides when to execute an operator.&nbsp; Each operator is assigned a precedence, and that precedence determines the order of execution.&nbsp; For example, in the expression a|b*, we want to execute star before we execute union.</P></p>
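
<p><P>The precedence idea can be sketched with an operator stack, in the spirit of the shunting-yard algorithm.&nbsp; The <STRONG>Operator</STRONG> type, the numeric precedences and the helper names below are assumptions for illustration; the post does not show how OperatorList or processOperator are actually defined.</P></p>

```haskell
-- Hypothetical operator type; the post does not show the real OperatorList.
data Operator = Union | Concat | Star
  deriving (Show, Eq)

-- Higher numbers bind tighter: star before concatenation before union.
precedence :: Operator -> Int
precedence Union  = 1
precedence Concat = 2
precedence Star   = 3

-- Split the stack into the operators that bind at least as tightly as the
-- incoming one (these must run first) and the rest of the stack.
popWhileTighter :: Operator -> [Operator] -> ([Operator], [Operator])
popWhileTighter incoming = span (\top -> precedence top >= precedence incoming)

-- Push an incoming operator, reporting which stacked operators must run first.
-- Returns (operators to execute now, new stack).
pushOperator :: Operator -> [Operator] -> ([Operator], [Operator])
pushOperator incoming stack =
  let (toRun, rest) = popWhileTighter incoming stack
  in (toRun, incoming : rest)
```

<p><P>For a|b*, pushing Star onto a stack that holds Union executes nothing yet and leaves Star on top, so star runs before union when the stack is unwound.&nbsp; For a*|b, pushing Union onto a stack that holds Star forces Star to run first.</P></p>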

<p><P>Last but not least are the functions which execute the operators.&nbsp; For example, there is one called <STRONG>doConcat</STRONG>, which performs the concatenation of two values in the regular expression. <STRONG>doConcat</STRONG> isn&#8217;t pretty, since it&#8217;s doing the dirty work of examining the state and creating a new state to reflect a partially completed FSM.</P></p>

<p><DIV style="BORDER-BOTTOM: #808080 1px solid; BORDER-LEFT: #808080 1px solid; PADDING-BOTTOM: 4px; LINE-HEIGHT: 12pt; BACKGROUND-COLOR: #f4f4f4; MARGIN: 20px 0px 10px; PADDING-LEFT: 4px; WIDTH: 97.5%; PADDING-RIGHT: 4px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; MAX-HEIGHT: 200px; FONT-SIZE: 8pt; OVERFLOW: auto; BORDER-TOP: #808080 1px solid; CURSOR: text; BORDER-RIGHT: #808080 1px solid; PADDING-TOP: 4px">
<DIV style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">-- Execute the concat operator</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">doConcat :: RegexParseState ()</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">doConcat = do</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">  st <SPAN style="COLOR: #0000ff">&lt;</SPAN>- get</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">  let nodes = nodeList st</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">      newNodes = (nodes !! 0) : (nodes !! 3) : (drop 4  nodes)</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">      newTransitions = transitions st ++ [(nodes !! 2, nodes !! 1, epsilon)]</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">      newOperators = tail $ operators st</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">  put $ st { nodeList = newNodes,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">             transitions = newTransitions ,</PRE>
<PRE style="BORDER-BOTTOM-STYLE: none; PADDING-BOTTOM: 0px; LINE-HEIGHT: 12pt; BORDER-RIGHT-STYLE: none; BACKGROUND-COLOR: #f4f4f4; MARGIN: 0em; PADDING-LEFT: 0px; WIDTH: 100%; PADDING-RIGHT: 0px; FONT-FAMILY: Consolas, 'Courier New', Courier, monospace; BORDER-TOP-STYLE: none; COLOR: #000000; FONT-SIZE: 8pt; BORDER-LEFT-STYLE: none; OVERFLOW: visible; PADDING-TOP: 0px">             operators = newOperators}</PRE></DIV></DIV>
<P mce_keep="true">&nbsp;</P></p>
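<p><P>The list indexing in doConcat is easier to follow on a concrete value.&nbsp; The sketch below is a pure, simplified stand-in (no State monad and no operator stack), so <STRONG>ParseState</STRONG> and <STRONG>concatStep</STRONG> are hypothetical names rather than the real definitions.&nbsp; It assumes, as the indices in doConcat suggest, that the node list is a stack holding each fragment as its end node followed by its start node, with the most recent fragment on top.</P></p>

```haskell
type Node = Int

-- (from, to, label); Nothing stands in for the epsilon label.
type Transition = (Node, Node, Maybe Char)

-- Simplified stand-in for the parse state (assumption: the real
-- state also carries the operator stack and other bookkeeping).
data ParseState = ParseState
  { nodeList    :: [Node]
  , transitions :: [Transition]
  } deriving (Eq, Show)

-- Join the two topmost fragments: keep the end of the second and the
-- start of the first, and link end-of-first to start-of-second with
-- an epsilon transition.
concatStep :: ParseState -> ParseState
concatStep st = case nodeList st of
  (end2 : start2 : end1 : start1 : rest) ->
    st { nodeList    = end2 : start1 : rest
       , transitions = transitions st ++ [(end1, start2, Nothing)] }
  _ -> st  -- fewer than two fragments on the stack: nothing to join

-- State after reading "b" (nodes 2 to 3) and then "c" (nodes 4 to 5),
-- followed by the concatenation step.
afterBC :: ParseState
afterBC = concatStep (ParseState [5, 4, 3, 2] [(4, 5, Just 'c'), (2, 3, Just 'b')])
```

<p><P>Here afterBC has the node list [5,2] and gains the transition (3,4,Nothing), so the combined fragment starts at node 2 and ends at node 5.</P></p>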

<p><P>With all this in place, let&#8217;s finally see what this module actually outputs.</P></p>

<p><P>&gt; convertToNFA "a|(bc)*"</P></p>

<p><P>FiniteMachine {table = <BR>[(4,5,Just 'c'),(2,3,Just 'b'),(0,1,Just 'a'),<BR>(3,4,Nothing),(6,2,Nothing),(6,0,Nothing),<BR>(1,7,Nothing),(5,7,Nothing),(8,6,Nothing),<BR>(8,7,Nothing),(7,9,Nothing),(9,8,Nothing)], <BR>alphabet = fromList "abc", <BR>start = 8, <BR>final = fromList [9]}</P>
<P>If you examine the table list in the output, you will see all the transitions for the NFA built for &#8220;a|(bc)*&#8221;.&nbsp; Each entry is a (from, to, label) triple, where a label of Nothing marks an epsilon transition.&nbsp; The start state is node 8 and the accept state is node 9.&nbsp; </P></p>
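<p><P>One way to get a feel for this table is to run a string through it by hand.&nbsp; The following is a minimal sketch that does so with plain lists; <STRONG>epsilonClosure</STRONG>, <STRONG>step</STRONG> and <STRONG>accepts</STRONG> are hypothetical helpers written for this illustration and are not part of RegexToNFA.hs, which leaves matching to a later module.</P></p>

```haskell
-- (from, to, label); Nothing marks an epsilon transition.
type Transition = (Int, Int, Maybe Char)

-- Transition table copied from the output above.
table :: [Transition]
table =
  [ (4,5,Just 'c'), (2,3,Just 'b'), (0,1,Just 'a')
  , (3,4,Nothing), (6,2,Nothing), (6,0,Nothing)
  , (1,7,Nothing), (5,7,Nothing), (8,6,Nothing)
  , (8,7,Nothing), (7,9,Nothing), (9,8,Nothing)
  ]

-- All states reachable from the given states using only epsilon moves.
epsilonClosure :: [Transition] -> [Int] -> [Int]
epsilonClosure ts = go []
  where
    go seen [] = seen
    go seen (s : rest)
      | s `elem` seen = go seen rest
      | otherwise     = go (s : seen) ([t | (f, t, Nothing) <- ts, f == s] ++ rest)

-- Consume one character from every current state, then follow epsilons.
step :: [Transition] -> [Int] -> Char -> [Int]
step ts states c =
  epsilonClosure ts [t | (f, t, Just c') <- ts, c' == c, f `elem` states]

-- Does the machine with the given start and final states accept the input?
accepts :: [Transition] -> Int -> [Int] -> String -> Bool
accepts ts start finals input =
  any (`elem` finals) (foldl (step ts) (epsilonClosure ts [start]) input)
```

<p><P>With the start state 8 and the final state 9 from the output, accepts table 8 [9] "a" and accepts table 8 [9] "bc" are both True, while accepts table 8 [9] "b" is False.</P></p>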

<p><P>I uploaded the RegexToNFA.hs file for your examination.&nbsp; I commented it fairly heavily, so it should be pretty easy to read and understand.</P></p>

<p><P mce_keep="true">&nbsp;</P></p>

<p><P>In the next part I will talk about the next module: <A href="http://blogs.msdn.com/matt/archive/2008/06/09/writing-a-regular-expression-parser-in-haskell-part-3.aspx" mce_href="/matt/archive/2008/06/09/writing-a-regular-expression-parser-in-haskell-part-3.aspx">NFAtoDFA</A></P></p>
]]></content:encoded>
			<wfw:commentRss>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing a Regular Expression parser in Haskell: Part 1</title>
		<link>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-1/</link>
		<comments>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-1/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 22:22:02 +0000</pubDate>
		<dc:creator>Matthew</dc:creator>
				<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Regular Expression]]></category>

		<guid isPermaLink="false">http://matthewmanela.com/?p=289</guid>
		<description><![CDATA[A few weeks ago I read this article about writing a simple regular expression parser.&#160; That article does a really good job of explaining the theory behind regular expression.&#160; It then goes step by step into how to write a &#8230; <a href="http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><P>A few weeks ago I read <A href="http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx" target=_blank mce_href="http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx">this article</A> about writing a simple regular expression parser.&nbsp; That article does a really good job of explaining the theory behind regular expressions.&nbsp; It then goes step by step through how to write a program (the author uses C++) that parses a regular expression, converts it into an NFA, converts that into a DFA and then uses that DFA to match strings.</P></p>

<p><P>After reading that, I decided to write my own simple regular expression parser in Haskell.&nbsp; I saw it as a challenge to see how you deal with a more complex program in a pure functional language.&nbsp; After a couple of weeks (Grand Theft Auto 4 kind of ruined my progress for a while) I have some results.</P></p>

<p><P>I split the project into 3 modules. </P></p>

<p><OL>
<LI><A href="http://blogs.msdn.com/matt/archive/2008/06/02/writing-a-regular-expression-parser-in-haskell-part-2.aspx" target=_blank mce_href="http://blogs.msdn.com/matt/archive/2008/06/02/writing-a-regular-expression-parser-in-haskell-part-2.aspx">RegexToNFA</A> &#8211; Provides functionality to parse a simple regular expression and return an NFA. 
<OL>
<LI>This module also defines the FiniteMachine type, which is a general structure for finite state automata.</LI>
</OL></p>

<p><LI>NFAtoDFA &#8211; Provides functionality to convert an NFA into a DFA. 
<OL>
<LI>This module uses the same FiniteMachine type from RegexToNFA</LI>
</OL></p>

<p><LI>SimpleRegex &#8211; Provides functionality to take a regular expression and a string and return what it matches (if it matches anything). 
<OL>
<LI>This module uses RegexToNFA, sends its result to NFAtoDFA, and then uses the resulting DFA to match against a string.</LI>
</OL>
</LI>
</OL></p>
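<p><P>To give an idea of why one type can serve both the NFA and the DFA, here is a minimal sketch of a record with the four fields that FiniteMachine prints (table, alphabet, start, final).&nbsp; The concrete field types, the <STRONG>singleA</STRONG> example and the <STRONG>isDeterministic</STRONG> check are assumptions made for illustration; the real definition in RegexToNFA.hs may differ.</P></p>

```haskell
import qualified Data.Set as Set

-- Assumed shape of the shared machine type: a transition table of
-- (from, to, label) triples, where Nothing is an epsilon label.
data FiniteMachine = FiniteMachine
  { table    :: [(Int, Int, Maybe Char)]
  , alphabet :: Set.Set Char
  , start    :: Int
  , final    :: Set.Set Int
  } deriving (Show)

-- A tiny example machine that accepts only the string "a".
singleA :: FiniteMachine
singleA = FiniteMachine
  { table    = [(0, 1, Just 'a')]
  , alphabet = Set.fromList "a"
  , start    = 0
  , final    = Set.fromList [1]
  }

-- A machine of this shape is deterministic when it has no epsilon
-- transitions and no state has two transitions on the same character.
isDeterministic :: FiniteMachine -> Bool
isDeterministic m =
  all (\(_, _, l) -> l /= Nothing) (table m)
    && length keys == Set.size (Set.fromList keys)
  where
    keys = [(f, l) | (f, _, l) <- table m]
```

<p><P>Under this reading, a DFA is simply a FiniteMachine value for which isDeterministic holds, which is why the conversion step can return the same type it receives.</P></p>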

<p><P mce_keep="true">&nbsp;</P></p>

<p><P>This is a very simple and limited regular expression parser.&nbsp; It supports only union (|), concatenation, closure (*) and parentheses.&nbsp; In addition, I don&#8217;t preserve information about the location of the parentheses after the NFA is created.&nbsp; This means you can&#8217;t pull out sub-matches when an entire expression matches.</P></p>

<p><P>In my next three posts I will talk about each module and point out its interesting parts.&nbsp; There is nothing too complex here, but it shows how to approach the problem in Haskell (making heavy use of the State monad).</P></p>

<p><P mce_keep="true">&nbsp;</P></p>

<p><P>If you want to see a much more complex and full-featured regular expression parser written in Haskell, take a look at <A href="http://www.dcs.gla.ac.uk/~meurig/regexp/" target=_blank mce_href="http://www.dcs.gla.ac.uk/~meurig/regexp/">this</A>.</P></p>

<p><P mce_keep="true">&nbsp;</P></p>

<p><P><A href="http://blogs.msdn.com/matt/archive/2008/06/02/writing-a-regular-expression-parser-in-haskell-part-2.aspx" target=_blank mce_href="http://blogs.msdn.com/matt/archive/2008/06/02/writing-a-regular-expression-parser-in-haskell-part-2.aspx">Click here</A> to continue to Part 2.</P></p>
]]></content:encoded>
			<wfw:commentRss>http://matthewmanela.com/2008/03/11/writing-a-regular-expression-parser-in-haskell-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
