Converting RTF to HTML

Have you ever wanted to convert RTF text into HTML? Probably not. But if you do, you are in luck! I recently needed to do this conversion, and after some searching I found a way to do it by enhancing a sample distributed in the MSDN library. The sample is called: XAML to HTML Conversion Demo


The sample has code that converts HTML to and from a XAML Flow Document. That does not help much on its own until you realize that there is an easy way to convert RTF to XAML. The key is System.Windows.Controls.RichTextBox, which can load RTF from a stream and save it as XAML. This conversion is shown below:

// Requires: using System.IO; using System.Windows;
//           using System.Windows.Controls; using System.Windows.Documents;
private static string ConvertRtfToXaml(string rtfText)
{
    // Nothing to convert, so skip creating the control.
    if (string.IsNullOrEmpty(rtfText)) return "";

    // RichTextBox is a WPF control, so this must run on an STA thread.
    var richTextBox = new RichTextBox();
    var textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);

    // Load the RTF text into the control's document.
    using (var rtfMemoryStream = new MemoryStream())
    using (var rtfStreamWriter = new StreamWriter(rtfMemoryStream))
    {
        rtfStreamWriter.Write(rtfText);
        rtfStreamWriter.Flush();
        rtfMemoryStream.Seek(0, SeekOrigin.Begin);
        textRange.Load(rtfMemoryStream, DataFormats.Rtf);
    }

    // Save the same document back out as XAML and return it as a string.
    using (var xamlMemoryStream = new MemoryStream())
    {
        textRange = new TextRange(richTextBox.Document.ContentStart, richTextBox.Document.ContentEnd);
        textRange.Save(xamlMemoryStream, DataFormats.Xaml);
        xamlMemoryStream.Seek(0, SeekOrigin.Begin);
        using (var xamlStreamReader = new StreamReader(xamlMemoryStream))
        {
            return xamlStreamReader.ReadToEnd();
        }
    }
}

With this code we have everything we need to convert RTF to HTML. I modified the sample to add this RTF-to-XAML conversion, and then ran the resulting XAML through the sample's HTML converter to produce the HTML text. I also added an interface to these conversion utilities and turned the sample into a library so that I can use it from other projects. Here is the interface:

public interface IMarkupConverter
{
    string ConvertXamlToHtml(string xamlText);
    string ConvertHtmlToXaml(string htmlText);
    string ConvertRtfToHtml(string rtfText);
}

public class MarkupConverter : IMarkupConverter
{
    public string ConvertXamlToHtml(string xamlText)
    {
        return HtmlFromXamlConverter.ConvertXamlToHtml(xamlText, false);
    }
    public string ConvertHtmlToXaml(string htmlText)
    {
        return HtmlToXamlConverter.ConvertHtmlToXaml(htmlText, true);
    }
    public string ConvertRtfToHtml(string rtfText)
    {
        return RtfToHtmlConverter.ConvertRtfToHtml(rtfText);
    }
}

With this I am now able to convert RTF to HTML. However, there is one catch: the conversion uses the WPF RichTextBox control, which requires a single-threaded apartment (STA). Any code that calls ConvertRtfToHtml must therefore also run on an STA thread. If your program cannot run in an STA, you must create a new STA thread to run the conversion, like this:

MarkupConverter markupConverter = new MarkupConverter();

private string ConvertRtfToHtml(string rtfText)
{
   var thread = new Thread(ConvertRtfInSTAThread);
   var threadData = new ConvertRtfThreadData { RtfText = rtfText };
   thread.SetApartmentState(ApartmentState.STA);
   thread.Start(threadData);
   thread.Join();
   return threadData.HtmlText;
}

private void ConvertRtfInSTAThread(object rtf)
{
   // A direct cast fails loudly on a wrong argument instead of yielding null.
   var threadData = (ConvertRtfThreadData)rtf;
   threadData.HtmlText = markupConverter.ConvertRtfToHtml(threadData.RtfText);
}


private class ConvertRtfThreadData
{
   public string RtfText { get; set; }
   public string HtmlText { get; set; }
}
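
If you call the converter from several places, the same STA pattern can be wrapped in a small reusable helper. The following is only a sketch, not part of the sample: the names StaRunner and Run are hypothetical, and it assumes .NET 4.5 or later (for ExceptionDispatchInfo) running on Windows. Unlike the code above, it also rethrows any exception from the worker thread on the calling thread instead of letting it go unhandled.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper (not part of the MSDN sample or the MarkupConverter library).
public static class StaRunner
{
    // Runs 'work' on a new STA thread, blocks until it finishes,
    // and returns its result to the caller.
    public static T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        T result = default(T);
        ExceptionDispatchInfo failure = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture the exception so it can be rethrown on the caller's thread.
                failure = ExceptionDispatchInfo.Capture(ex);
            }
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
        thread.Join();

        // Rethrow with the original stack trace preserved.
        if (failure != null) failure.Throw();
        return result;
    }
}
```

With this in place, the call from the example above would look something like StaRunner.Run(() => markupConverter.ConvertRtfToHtml(rtfText)).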
 
  • ashutosh

    Hi Matthew,

    First of all, my appreciation for your excellent code for converting between RTF and HTML.
    Now, here's the query: is it possible to keep the 'tracked changes' formatting intact during this conversion? If we use the code (http://code.msdn.microsoft.com/Converting-between-RTF-and-aaa02a6e) to perform the conversion, the tracked changes in the RTF example below are lost:

    {\rtf1\sste16000\ansi\deflang1033\ftnbj\uc1\deff0
    {\fonttbl{\f0 \fnil \fcharset0 Arial;}}
    {\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
    {\stylesheet{\f0\fs24 Normal;}{\cs1 Default Paragraph Font;}}
    {\*\revtbl{Unknown;}{atiwari1;}}
    \paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\headery720\footery720\nogrowautofit\deftab720\formshade\fet4\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
    \sectd\pgwsxn12240\pghsxn15840\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgncont\pgndec
    \plain\plain\f0\fs24\pard\plain\f0\fs24\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 This is my \plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\revised\revauth1\revdttm651739769 Second\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20\deleted\revauthdel1\revdttmdel651739769
    first\plain\lang1033\hich\f0\dbch\f0\loch\f0\fs20 line of text.\par
    }

    (You can open the above .rtf in Word to check.)

    Thanks
    Ashutosh

  • Matthew

    The RTF support I am using is limited by the WPF RichTextBox control. That control does not seem to support the revision tags, so my sample does not support them either.

  • SivSakthi

    Hi..
    Excellent to see your code.
    I am getting an error at this line

    var richTextBox = new RichTextBox();

    {“The calling thread must be STA, because many UI components require this.”}

    at System.Windows.Input.InputManager..ctor()
    at System.Windows.Input.InputManager.GetCurrentInputManagerImpl()
    at System.Windows.Input.KeyboardNavigation..ctor()
    at System.Windows.FrameworkElement.FrameworkServices..ctor()
    at System.Windows.FrameworkElement.EnsureFrameworkServices()
    at System.Windows.FrameworkElement..ctor()
    at System.Windows.Controls.Control..ctor()
    at System.Windows.Controls.Primitives.TextBoxBase..ctor()
    at System.Windows.Controls.RichTextBox..ctor(FlowDocument document)
    at System.Windows.Controls.RichTextBox..ctor()
    at MarkupConverter.HtmlToRtfConverter.ConvertXamlToRtf(String xamlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 22
    at MarkupConverter.HtmlToRtfConverter.ConvertHtmlToRtf(String htmlText) in C:\Users\smurugesan\Downloads\MarkupConverter\MarkupConverter\HtmlToRtfConverter.cs:line 17
    at WebApplication1._Default.btnConvert_Click(Object sender, EventArgs e) in c:\users\smurugesan\documents\visual studio 2010\Projects\ConversionHTMLtoRTF\WebApplication1\Default.aspx.cs:line 25
    at System.Web.UI.WebControls.Button.OnClick(EventArgs e)
    at System.Web.UI.WebControls.Button.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData)
    at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

    Can you please clarify this..

  • SivSakthi

    I am using your code in a web site, i.e. using your .dll in a web application.

    Your code works fine in a Windows application, though.

  • Matthew

    Did you follow the instructions at the end of the post where I show how to run the conversion inside of an STA thread? You will need to do this for the conversion to work inside of a website.

    • tayyab

      Where should I write this code?

      • Matthew

        You can copy and paste the code that I show at the end of the blog post where you want to perform the conversion.

  • http://www.f5itweb.pt Marco Teodoro

    Hello, did you try this with images? I'm not able to see them…

    • Matthew

      Hey Marco,

      I don’t think this works with images. I haven’t had a chance to dig deeper to see if it is possible using this method.

  • Gau

    Thanks for the code.
    I tried implementing this using STA. I am calling this function from SSRS.
    Visual Studio crashes when I try to run SSRS with RTF to Html conversion.

    Do you know of any such problem?

    • Matthew

      I have not heard of such a problem. Can you show me the code you are using to launch the thread and the code to run the conversion?

  • Amin

    hi mr. mat,

    I don't know C#. Would you write your code as VB.NET function(s) with an RTF stream as input and an HTML stream as output, please?

    thanks in advance.

  • tayyab

    I am using your code in a web site. I want to convert HTML to RTF, but an STA thread exception occurs. Can you tell me in which class I have to write the STA thread code?

    • Matthew

      Wherever you call one of the Convert methods, you first need to wrap the call in an STA thread.

  • jma

    Thank you Matthew for this topic. The solution works great in a standalone program.

    I’m implementing this solution for a SSRS report through a DLL, to convert RTF to HTML in a textbox in Reporting Services.
    But when I execute the report, Visual Studio crashes (like Gau). The error comes from the line
    textRange.Load(rtfMemoryStream, DataFormats.Rtf);
    with the exception : “System.ArgumentException : ‘Rich Text Format’ data not supported” for the second parameter DataFormats.Rtf.
    Any idea ?

    • Matthew

      Not sure off hand. Can you post the full exception/stack trace?

  • Michele

    Hi, I'm using the converter to parse input from a WPF RichTextBox in a standalone program that sends mail.
    Text is edited in the WPF RichTextBox and converted to HTML before sending.
    My problem is that hyperlinks are cut out of the conversion result…
    Missing implementation?

    • Matthew

      Yes, it was missing the code to support hyperlinks. I just added it. You can download a new version from the Code Sample Gallery page or from its GitHub repository.

  • Nitin

    Hi Matthew,
    Firstly Thanks,
    I am getting an error at this line textRange.Load(xamlMemoryStream, DataFormats.Xaml);

    Cannot convert string ‘0,0,1.5em,1.5em’ in attribute ‘Margin’ to object of type ‘System.Windows.Thickness’. ‘1.5em’ string cannot be converted to Length.
    Error at object ‘System.Windows.Documents.List’, Line 1 Position 337.

    Below is the HTML which I am using.

    This is just testing:Testing 1Testing 2Testing 3* Testing 4.

    Can you please suggest?

    Regards,
    Nitin

  • jack griffin

    Sorry for the misplaced HTML tags in previous comment : I hope I did it properly this time. If I did not, please forgive me as there is no preview :-)
    Hi.
    First of all, thank you for your code!
    I downloaded the code from Converting between RTF and HTML
    and use it in a C# 2010 Express project .
    It is pretty simple :
    user copies some code lines from Visual Studio and gets the resulting HTML code.
    It works nicely, except that it fails to add leading spaces properly.
    I’ll give a sample of this :
    This RTF has been copied from Visual Studio; it has 12 spaces on the left of line # 2
    and 8 spaces on the left of line # 3 :


    {\rtf1\ansi\deff0{\fonttbl{\f0\fnil Courier New;}}
    {\colortbl ;\red0\green0\blue255;\red43\green145\blue175;}
    \viewkind4\uc1\pard\cf1\lang1033\f0\fs18 private\cf0 \cf1 void\cf0 textBox2_TextChanged(\cf1 object\cf0 sender, \cf2 EventArgs\cf0 e) \{ \par
    webBrowser1.DocumentText = txtHtml.Text;\par
    \}\par
    }

    When converted to HTML, those spaces are lost , meaning that they are not converted to
    Ampersand + nbsp; as expected.
    Can you suggest a way to resolve this ?

    • Matthew

      I haven't looked at this in a while, but I open-sourced the code on this GitHub page. Feel free to fork it and experiment with a fix. I can then integrate that fix back into the sample.

  • abtahi

    Hi.
    First of all, thank you for your code!
    I am using an image together with text in a RichTextBox. Everything is converted to HTML except the image.
    How can I convert an image from RTF format to HTML format?
    Please help me.
    thanks in advance.

  • Joan Josep

    Is there any way to port the project to WinRT?

    • Matthew

      I am not sure; I have never looked at that. The code is on GitHub, so feel free to fork it and try it out!