Converting RTF to HTML

Have you ever needed to convert RTF text into HTML? Probably not. But if you do, you are in luck! I recently needed this conversion and, after some searching, found a way to do it by enhancing a sample distributed in the MSDN library. The sample is called XAML to HTML Conversion Demo.


The sample contains code that converts HTML to and from a XAML FlowDocument. That does not help with RTF until you realize that there is an easy way to convert RTF to XAML. The key is System.Windows.Controls.RichTextBox: a TextRange over its document can load RTF from a stream and save the same content as XAML. This conversion is shown below:

// Requires: System.IO, System.Windows (DataFormats),
// System.Windows.Controls (RichTextBox), System.Windows.Documents (TextRange).
private static string ConvertRtfToXaml(string rtfText)
{
    // Nothing to convert: return before creating any WPF objects.
    if (string.IsNullOrEmpty(rtfText)) return "";

    var richTextBox = new RichTextBox();
    var textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);

    // Step 1: load the RTF text into the RichTextBox document.
    using (var rtfMemoryStream = new MemoryStream())
    {
        using (var rtfStreamWriter = new StreamWriter(rtfMemoryStream))
        {
            rtfStreamWriter.Write(rtfText);
            rtfStreamWriter.Flush();
            rtfMemoryStream.Seek(0, SeekOrigin.Begin);
            textRange.Load(rtfMemoryStream, DataFormats.Rtf);
        }
    }

    // Step 2: save the same document back out as XAML.
    using (var xamlMemoryStream = new MemoryStream())
    {
        textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);
        textRange.Save(xamlMemoryStream, DataFormats.Xaml);
        xamlMemoryStream.Seek(0, SeekOrigin.Begin);
        using (var xamlStreamReader = new StreamReader(xamlMemoryStream))
        {
            return xamlStreamReader.ReadToEnd();
        }
    }
}

With this code we have all we need to convert RTF to HTML. I modified the sample to add this RTF-to-XAML conversion, and then ran the resulting XAML through the sample's HTML converter, which produces the HTML text. I also added an interface to these conversion utilities and turned the sample into a library so that I could use it from other projects. Here is the interface and its implementation:

public interface IMarkupConverter
{
    string ConvertXamlToHtml(string xamlText);
    string ConvertHtmlToXaml(string htmlText);
    string ConvertRtfToHtml(string rtfText);
}

public class MarkupConverter : IMarkupConverter
{
    public string ConvertXamlToHtml(string xamlText)
    {
        return HtmlFromXamlConverter.ConvertXamlToHtml(xamlText, false);
    }
    public string ConvertHtmlToXaml(string htmlText)
    {
        return HtmlToXamlConverter.ConvertHtmlToXaml(htmlText, true);
    }
    public string ConvertRtfToHtml(string rtfText)
    {
        return RtfToHtmlConverter.ConvertRtfToHtml(rtfText);
    }
}
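
The RtfToHtmlConverter class called above is not listed in this post. As a rough sketch of how the two steps described earlier could be chained, the snippet below composes an RTF-to-XAML step with a XAML-to-HTML step. The name RtfToHtmlPipeline and its delegate parameters are hypothetical; they stand in for ConvertRtfToXaml and HtmlFromXamlConverter.ConvertXamlToHtml so that the snippet compiles on its own, and the real class in the library may be organized differently.

```csharp
using System;

// Hypothetical helper (not part of the MSDN sample): chains the two
// conversion steps, RTF -> XAML and XAML -> HTML.
public static class RtfToHtmlPipeline
{
    public static string Convert(
        string rtfText,
        Func<string, string> rtfToXaml,   // e.g. ConvertRtfToXaml from this post
        Func<string, string> xamlToHtml)  // e.g. a wrapper around HtmlFromXamlConverter
    {
        if (rtfToXaml == null) throw new ArgumentNullException("rtfToXaml");
        if (xamlToHtml == null) throw new ArgumentNullException("xamlToHtml");

        // Mirror the empty-input behavior of ConvertRtfToXaml.
        if (string.IsNullOrEmpty(rtfText)) return "";

        string xamlText = rtfToXaml(rtfText);
        return xamlToHtml(xamlText);
    }
}
```

Inside the library, a call might look like RtfToHtmlPipeline.Convert(rtfText, ConvertRtfToXaml, xaml => HtmlFromXamlConverter.ConvertXamlToHtml(xaml, false)), reusing the same arguments that MarkupConverter passes above.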

With this I am now able to convert from RTF to HTML. However, there is one catch: the conversion uses the WPF RichTextBox control, which requires a single-threaded apartment (STA). Any code that calls the ConvertRtfToHtml function must therefore also run in an STA. If your program cannot run in an STA, you must create a new STA thread to run the conversion, like this:

MarkupConverter markupConverter = new MarkupConverter();

private string ConvertRtfToHtml(string rtfText)
{
   var thread = new Thread(ConvertRtfInSTAThread);
   var threadData = new ConvertRtfThreadData { RtfText = rtfText };
   thread.SetApartmentState(ApartmentState.STA);
   thread.Start(threadData);
   thread.Join();
   return threadData.HtmlText;
}

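private void ConvertRtfInSTAThread(object rtf)
{
   // A direct cast fails immediately if the wrong object is passed in,
   // instead of causing a NullReferenceException on the next line.
   var threadData = (ConvertRtfThreadData)rtf;
   threadData.HtmlText = markupConverter.ConvertRtfToHtml(threadData.RtfText);
}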


private class ConvertRtfThreadData
{
   public string RtfText { get; set; }
   public string HtmlText { get; set; }
}
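
One limitation of the thread code above is that an unhandled exception on the worker thread would terminate the process rather than reach the caller. A more general variant might capture the exception and rethrow it on the calling thread, as in the sketch below. StaRunner is a hypothetical name and is not part of the sample or the library. The sketch assumes .NET 4.5 or later (for ExceptionDispatchInfo) and Windows, since STA threads are a Windows COM concept.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper: runs a function on a new STA thread, waits for it,
// and returns its result or rethrows its exception on the calling thread.
public static class StaRunner
{
    public static T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException("work");

        T result = default(T);
        ExceptionDispatchInfo error = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture the exception so the original stack trace is preserved.
                error = ExceptionDispatchInfo.Capture(ex);
            }
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null) error.Throw();
        return result;
    }
}
```

With such a helper, the wrapper method could shrink to a single line: return StaRunner.Run(() => markupConverter.ConvertRtfToHtml(rtfText)); and the ConvertRtfThreadData class would no longer be needed.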

28 Responses to Converting RTF to HTML

  1. Pingback: C# - Converting between RTF to HTML and HTML to RTF - Matthew Manela - Farblondzshet in Code

  2. ashutosh says:

    Hi Matthew,

    First of all, my appreciation for your excellent code on converting between rtf and htm.
    Now, here’s the query: is it possible to keep the ‘tracked changes’ formatting intact while this conversion occurs? If we use the code (http://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e) to perform the conversion, the tracked changes in the RTF example below are lost:

    {\rtf1\sste16000\ansi\deflang1033\ftnbj\uc1\deff0
    {\fonttbl{\f0 \fnil \fcharset0 Arial;}}
    {\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
    {\stylesheet{\f0\fs24 Normal;}{\cs1 Default Paragraph Font;}}
    {\*\revtbl{Unknown;}{atiwari1;}}
    \paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\headery720\footery720\nogrowautofit\deftab720\formshade\fet4\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
    \sectd\pgwsxn12240\pghsxn15840\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgncont\pgndec
    \plain\plain\f0\fs24\pard\plain\f0\fs24\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 This is my \plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\revised\revauth1\revdttm651739769 Second\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\deleted\revauthdel1\revdttmdel651739769
    first\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 line of text.\par
    }

    (You can open the RTF above in Word to check.)

    Thanks
    Ashutosh

  3. Matthew says:

    The RTF support I am using is limited by the WPF RichTextBox control. That control does not seem to support the revision tags, so my sample doesn’t support those tags either.

  4. SivSakthi says:

    Hi..
    Excellent to see your code.
    I am getting an error at this line

    var richTextBox = new RichTextBox();

    {“The calling thread must be STA, because many UI components require this.”}

    at System.Windows.Input.InputManager..ctor()
    at System.Windows.Input.InputManager.GetCurrentInputManagerImpl()
    at System.Windows.Input.KeyboardNavigation..ctor()
    at System.Windows.FrameworkElement.FrameworkServices..ctor()
    at System.Windows.FrameworkElement.EnsureFrameworkServices()
    at System.Windows.FrameworkElement..ctor()
    at System.Windows.Controls.Control..ctor()
    at System.Windows.Controls.Primitives.TextBoxBase..ctor()
    at System.Windows.Controls.RichTextBox..ctor(FlowDocument document)
    at System.Windows.Controls.RichTextBox..ctor()
    at MarkupConverter.HtmlToRtfConverter.ConvertXamlToRtf(String xamlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 22
    at MarkupConverter.HtmlToRtfConverter.ConvertHtmlToRtf(String htmlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 17
    at WebApplication1._Default.btnConvert_Click(Object sender, EventArgs e) in c:\users\smurugesan\documents\visual studio 2010\Projects\ConversionHTMLtoRTF\WebApplication1\Default.aspx.cs:line 25
    at System.Web.UI.WebControls.Button.OnClick(EventArgs e)
    at System.Web.UI.WebControls.Button.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData)
    at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

    Can you please clarify this..

  5. SivSakthi says:

    I am using your code in a web site, i.e. referencing your .dll from a web application.

    The same code works fine in a Windows desktop application.

  6. Matthew says:

    Did you follow the instructions at the end of the post where I show how to run the conversion inside of an STA thread? You will need to do this for the conversion to work inside of a website.

  7. Hello, did you try this with images? I’m not able to see them…

  8. Gau says:

    Thanks for the code.
    I tried implementing this using STA. I am calling this function from SSRS.
    Visual Studio crashes when I try to run SSRS with RTF to Html conversion.

    Do you know of any such problem?

  9. Amin says:

    hi mr. mat,

    I don’t know C#. Would you please write your code as VB.NET function(s) that take an RTF stream as input and produce an HTML stream as output?

    thanks in advance.

  10. Pingback: The Guide to Chm – Escape XML entities • Onderweg Blog

  11. tayyab says:

    I am using your code in a web site. I want to convert HTML to RTF, but an STA thread exception occurs. Can you tell me in which class I have to write the STA thread code?

  12. jma says:

    Thank you Matthew for this topic. The solution works great in a standalone program.

    I’m implementing this solution for an SSRS report through a DLL, to convert RTF to HTML in a textbox in Reporting Services.
    But when I execute the report, Visual Studio crashes (as it does for Gau). The error comes from the line
    textRange.Load(rtfMemoryStream, DataFormats.Rtf);
    with the exception : “System.ArgumentException : ‘Rich Text Format’ data not supported” for the second parameter DataFormats.Rtf.
    Any idea ?

  13. Michele says:

    Hi, I’m using the converter to process input from a WPF RichTextBox in a standalone program that sends mail.
    Text is edited in the WPF RichTextBox and converted to HTML before sending.
    My problem is that hyperlinks are cut out of the conversion result…
    Is the implementation missing?

  14. Nitin says:

    Hi Matthew,
    Firstly Thanks,
    I am getting an error at this line textRange.Load(xamlMemoryStream, DataFormats.Xaml);

    Cannot convert string ’0,0,1.5em,1.5em’ in attribute ‘Margin’ to object of type ‘System.Windows.Thickness’. ’1.5em’ string cannot be converted to Length.
    Error at object ‘System.Windows.Documents.List’, Line 1 Position 337.

    Below is the HTML which I am using.

    This is just testing:Testing 1Testing 2Testing 3* Testing 4.

    Can you please suggest?

    Regards,
    Nitin

  15. jack griffin says:

    Sorry for the misplaced HTML tags in previous comment : I hope I did it properly this time. If I did not, please forgive me as there is no preview :-)
    Hi.
    First of all, thank you for your code!
    I downloaded the code from Converting between RTF and HTML
    and use it in a C# 2010 Express project .
    It is pretty simple :
    user copies some code lines from Visual Studio and gets the resulting HTML code.
    It works nicely, except that it fails to add leading spaces properly.
    I’ll give a sample of this :
    This RTF has been copied from Visual Studio; it has 12 spaces on the left of line # 2
    and 8 spaces on the left of line # 3 :


    {\rtf1\ansi\deff0{\fonttbl{\f0\fnil Courier New;}}
    {\colortbl ;\red0\green0\blue255;\red43\green145\blue175;}
    \viewkind4\uc1\pard\cf1\lang1033\f0\fs18 private\cf0 \cf1 void\cf0 textBox2_TextChanged(\cf1 object\cf0 sender, \cf2 EventArgs\cf0 e) \{ \par
                webBrowser1.DocumentText = txtHtml.Text;\par
            \}\par
    }

    When converted to HTML, those spaces are lost , meaning that they are not converted to
    Ampersand + nbsp; as expected.
    Can you suggest a way to resolve this ?

  16. abtahi says:

    Hi.
    First of all, thank you for your code!
    I am using an image together with text in a RichTextBox. Everything converts to HTML except the image.
    How can I convert an image from RTF to HTML?
    please help me
    thanks in advance.

  17. Joan Josep says:

    Is there any way to port the project to WinRT?
