Converting RTF to HTML

Have you ever wanted to convert some RTF text into HTML? Probably not. But if you do, you are in luck! I recently needed to do this conversion and, after some searching, found a way to do it by enhancing a sample distributed in the MSDN library. The sample is called XAML to HTML Conversion Demo.


The sample has code that converts HTML to and from a XAML FlowDocument. That doesn’t help much until you realize that there is an easy way to convert RTF to XAML. The key is System.Windows.Controls.RichTextBox: a TextRange over its document can load RTF from a stream and save it as XAML. This conversion is shown below:

// Requires: using System.IO; using System.Windows;
//           using System.Windows.Controls; using System.Windows.Documents;
private static string ConvertRtfToXaml(string rtfText)
{
    // Nothing to convert, so skip creating the control entirely.
    if (string.IsNullOrEmpty(rtfText)) return "";

    // RichTextBox is a WPF control, so this must run on an STA thread.
    var richTextBox = new RichTextBox();
    var textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);

    // Load the RTF text into the document through a stream.
    using (var rtfMemoryStream = new MemoryStream())
    using (var rtfStreamWriter = new StreamWriter(rtfMemoryStream))
    {
        rtfStreamWriter.Write(rtfText);
        rtfStreamWriter.Flush();
        rtfMemoryStream.Seek(0, SeekOrigin.Begin);
        textRange.Load(rtfMemoryStream, DataFormats.Rtf);
    }

    // Save the same document back out as XAML and read it into a string.
    using (var xamlMemoryStream = new MemoryStream())
    {
        textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);
        textRange.Save(xamlMemoryStream, DataFormats.Xaml);
        xamlMemoryStream.Seek(0, SeekOrigin.Begin);
        using (var xamlStreamReader = new StreamReader(xamlMemoryStream))
        {
            return xamlStreamReader.ReadToEnd();
        }
    }
}

With this code we have all we need to convert RTF to HTML. I modified the sample to add this RTF to XAML conversion, and then I run that XAML through the HTML converter, which produces the HTML text. I added an interface to these conversion utilities and converted the sample into a library so that I could use it from other projects. Here is the interface and its implementation:

public interface IMarkupConverter
{
    string ConvertXamlToHtml(string xamlText);
    string ConvertHtmlToXaml(string htmlText);
    string ConvertRtfToHtml(string rtfText);
}

public class MarkupConverter : IMarkupConverter
{
    public string ConvertXamlToHtml(string xamlText)
    {
        return HtmlFromXamlConverter.ConvertXamlToHtml(xamlText, false);
    }
    public string ConvertHtmlToXaml(string htmlText)
    {
        return HtmlToXamlConverter.ConvertHtmlToXaml(htmlText, true);
    }
    public string ConvertRtfToHtml(string rtfText)
    {
        return RtfToHtmlConverter.ConvertRtfToHtml(rtfText);
    }
}
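
The two-step flow described above (RTF to XAML, then XAML to HTML) can be sketched as a small composition helper. This is only an illustration, not the code inside RtfToHtmlConverter, which is not shown in this post. The class name RtfToHtmlPipeline is hypothetical, and the two steps are passed in as delegates so the sketch compiles without WPF or the MSDN sample.

```csharp
using System;

// Hypothetical helper (not part of the sample): chains the two conversion
// steps described in the post. The steps are supplied as delegates so this
// sketch has no dependency on WPF or on the MSDN sample classes.
public static class RtfToHtmlPipeline
{
    public static string Convert(
        string rtfText,
        Func<string, string> rtfToXaml,
        Func<string, string> xamlToHtml)
    {
        if (rtfToXaml == null) throw new ArgumentNullException(nameof(rtfToXaml));
        if (xamlToHtml == null) throw new ArgumentNullException(nameof(xamlToHtml));

        // Mirror ConvertRtfToXaml: empty input produces empty output.
        if (string.IsNullOrEmpty(rtfText)) return "";

        // Step 1: RTF -> XAML. Step 2: XAML -> HTML.
        string xamlText = rtfToXaml(rtfText);
        return xamlToHtml(xamlText);
    }
}
```

Inside the library this might be called as RtfToHtmlPipeline.Convert(rtfText, ConvertRtfToXaml, xaml => HtmlFromXamlConverter.ConvertXamlToHtml(xaml, false)), assuming both methods are accessible from the call site.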

With this I am now able to convert from RTF to HTML. However, there is one catch: the conversion uses the RichTextBox WPF control, which requires a single-threaded apartment (STA). Therefore, any code that calls the ConvertRtfToHtml function must also be running in an STA. If you can’t have your program run in an STA, then you must create a new STA thread to run the conversion, like this:

// Requires: using System.Threading;
private readonly MarkupConverter markupConverter = new MarkupConverter();

private string ConvertRtfToHtml(string rtfText)
{
   // Run the conversion on a dedicated STA thread and block until it finishes.
   var thread = new Thread(ConvertRtfInSTAThread);
   var threadData = new ConvertRtfThreadData { RtfText = rtfText };
   thread.SetApartmentState(ApartmentState.STA);
   thread.Start(threadData);
   thread.Join();
   return threadData.HtmlText;
}

private void ConvertRtfInSTAThread(object rtf)
{
   // A direct cast fails fast if the wrong object type is passed in.
   var threadData = (ConvertRtfThreadData)rtf;
   threadData.HtmlText = markupConverter.ConvertRtfToHtml(threadData.RtfText);
}

// Carries the input to the STA thread and the result back to the caller.
private class ConvertRtfThreadData
{
   public string RtfText { get; set; }
   public string HtmlText { get; set; }
}
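
One thing the thread code above does not handle is an exception thrown during the conversion: an unhandled exception on the worker thread would normally bring down the process instead of reaching the caller. A more general variant might look like the sketch below. The StaRunner class and its Run method are hypothetical names, not part of the sample, and the sketch assumes Windows, since setting the STA apartment state is not supported on other platforms.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper (not part of the sample): runs a delegate on an STA
// thread, blocks until it completes, and rethrows any exception on the caller.
public static class StaRunner
{
    public static T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        // Already on an STA thread: no extra thread is needed.
        if (Thread.CurrentThread.GetApartmentState() == ApartmentState.STA)
        {
            return work();
        }

        T result = default(T);
        ExceptionDispatchInfo failure = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture instead of letting the worker thread crash the process.
                failure = ExceptionDispatchInfo.Capture(ex);
            }
        });

        // Assumption: running on Windows, where STA threads are supported.
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
        thread.Join();

        // Rethrow on the calling thread, keeping the original stack trace.
        if (failure != null) failure.Throw();
        return result;
    }
}
```

With a helper like this, the call would reduce to something like StaRunner.Run(() => markupConverter.ConvertRtfToHtml(rtfText)), and the ConvertRtfThreadData class would no longer be needed.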

15 Responses to Converting RTF to HTML

  1. Pingback: C# - Converting between RTF to HTML and HTML to RTF - Matthew Manela - Farblondzshet in Code

  2. ashutosh says:

    Hi Matthew,

    First of all, my appreciation for your excellent code on converting between RTF and HTML.
    Now, here’s the query: is it possible to keep the ‘tracked changes’ formatting intact while this conversion occurs? If we use the code (http://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e) to perform the conversion, the tracked changes in the RTF example below are lost:

    {\rtf1\sste16000\ansi\deflang1033\ftnbj\uc1\deff0
    {\fonttbl{\f0 \fnil \fcharset0 Arial;}}
    {\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
    {\stylesheet{\f0\fs24 Normal;}{\cs1 Default Paragraph Font;}}
    {\*\revtbl{Unknown;}{atiwari1;}}
    \paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\headery720\footery720\nogrowautofit\deftab720\formshade\fet4\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
    \sectd\pgwsxn12240\pghsxn15840\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgncont\pgndec
    \plain\plain\f0\fs24\pard\plain\f0\fs24\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 This is my \plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\revised\revauth1\revdttm651739769 Second\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\deleted\revauthdel1\revdttmdel651739769
    first\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 line of text.\par
    }

    (You can open the above .rtf in Word to check.)

    Thanks
    Ashutosh

  3. Matthew says:

    The RTF support I am using is limited by the WPF RichTextBox control. That control does not seem to support the revision tags, so my sample doesn’t support them either.

  4. SivSakthi says:

    Hi..
    Excellent to see your code.
    I am getting an error at this line

    var richTextBox = new RichTextBox();

    {“The calling thread must be STA, because many UI components require this.”}

    at System.Windows.Input.InputManager..ctor()
    at System.Windows.Input.InputManager.GetCurrentInputManagerImpl()
    at System.Windows.Input.KeyboardNavigation..ctor()
    at System.Windows.FrameworkElement.FrameworkServices..ctor()
    at System.Windows.FrameworkElement.EnsureFrameworkServices()
    at System.Windows.FrameworkElement..ctor()
    at System.Windows.Controls.Control..ctor()
    at System.Windows.Controls.Primitives.TextBoxBase..ctor()
    at System.Windows.Controls.RichTextBox..ctor(FlowDocument document)
    at System.Windows.Controls.RichTextBox..ctor()
    at MarkupConverter.HtmlToRtfConverter.ConvertXamlToRtf(String xamlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 22
    at MarkupConverter.HtmlToRtfConverter.ConvertHtmlToRtf(String htmlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 17
    at WebApplication1._Default.btnConvert_Click(Object sender, EventArgs e) in c:\users\smurugesan\documents\visual studio 2010\Projects\ConversionHTMLtoRTF\WebApplication1\Default.aspx.cs:line 25
    at System.Web.UI.WebControls.Button.OnClick(EventArgs e)
    at System.Web.UI.WebControls.Button.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData)
    at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

    Can you please clarify this..

  5. SivSakthi says:

    I am using your code in a web site, i.e. using your .dll in a web application.

    But your code works fine on the Windows side.

  6. Matthew says:

    Did you follow the instructions at the end of the post where I show how to run the conversion inside of an STA thread? You will need to do this for the conversion to work inside of a website.

  7. Hello, did you try this using images? I’m not able to see them…

  8. Gau says:

    Thanks for the code.
    I tried implementing this using STA. I am calling this function from SSRS.
    Visual Studio crashes when I try to run SSRS with RTF to Html conversion.

    Do you know of any such problem?

  9. Amin says:

    hi mr. mat,

    I don’t know C#. Would you write your code as VB.NET function(s), with an RTF stream as input and an HTML stream as output, please?

    thanks in advance.

  10. Pingback: The Guide to Chm – Escape XML entities • Onderweg Blog

  11. tayyab says:

    I am implementing your code in a web site. I want to convert HTML to RTF, but an STA thread exception occurs. Can you tell me in which class I have to write the STA thread code?
