Convert HTML to a Word Document Using CKEditor and MariGold.OpenXHTML



MariGold.OpenXHTML is an open source library, hosted on GitHub, that converts HTML documents into Word documents. Internally it uses the Open XML SDK to create the Word documents. CKEditor is a popular free WYSIWYG editor for composing formatted HTML on web sites. By integrating the two, we can build an online HTML to Word converter. We will create an ASP.NET MVC project to demonstrate this.

Using the code

This tutorial uses Visual Studio 2015 Community Edition. The first part explains how to integrate CKEditor into an MVC project, and the second part discusses converting the HTML output of CKEditor into a Word document.

Set up CKEditor

Download your preferred package from the CKEditor web site. This tutorial uses the full package, which contains all the plugins to experiment with. Open Visual Studio and create a new ASP.NET MVC project with the default template. We can reuse the Home controller and its Index.cshtml view for the demo.

Extract the downloaded CKEditor package and copy the entire ckeditor folder into the Scripts folder.

Remove all the HTML content from Index.cshtml and add the following code.

@using (Html.BeginForm("Index", "Home", FormMethod.Post))
{
    @Html.TextArea("content", new { @id = "editor1" })
    <input type="submit" value="Submit" />
}

Of course, we also need to reference ckeditor.js and add a script element at the bottom of the same page to initialize CKEditor on the text area.

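<script type="text/javascript" src="~/Scripts/ckeditor/ckeditor.js"></script>
<script type="text/javascript">
    // Replace the textarea whose id is "editor1" with a CKEditor instance
    CKEDITOR.replace('editor1');
</script>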

CKEditor is now configured; if you run the application, the editor loads on the home page. The next step is to install MariGold.OpenXHTML and implement an Index POST action method on the Home controller that receives the submitted HTML content.

Set up MariGold.OpenXHTML

This library is available as a NuGet package. To install it, enter the following command in the Package Manager Console.

Install-Package MariGold.OpenXHTML

This will also install the following dependencies.

DocumentFormat.OpenXml – the Open XML SDK library for creating Open XML Word documents.
MariGold.HtmlParser – parses the input text and extracts its HTML elements.
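To give a feel for what the DocumentFormat.OpenXml dependency does, here is a minimal sketch that builds a one-paragraph .docx entirely in a MemoryStream using the Open XML SDK directly. It is not MariGold.OpenXHTML's internal code, only an illustration of the kind of package the library produces on top of the SDK; the class and method names (OpenXmlInMemoryDemo, CreateDocx) are hypothetical, and it assumes the DocumentFormat.OpenXml package installed above.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical demo class: NOT MariGold.OpenXHTML's implementation,
// just a direct use of the Open XML SDK to build a Word package in memory.
public static class OpenXmlInMemoryDemo
{
    public static byte[] CreateDocx(string paragraphText)
    {
        using (var mem = new MemoryStream())
        {
            // Create a WordprocessingML package that writes into the stream.
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document))
            {
                // Every Word document needs a main document part holding the <w:document> root.
                MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
                mainPart.Document = new Document(
                    new Body(
                        new Paragraph(
                            new Run(
                                new Text(paragraphText)))));
                mainPart.Document.Save();
            } // Disposing the package writes the ZIP container into 'mem'.

            // MemoryStream.ToArray works even after the stream has been closed.
            return mem.ToArray();
        }
    }

    public static void Main()
    {
        byte[] docx = CreateDocx("Hello from the Open XML SDK");
        // A .docx file is a ZIP package, so its first two bytes are the "PK" signature.
        bool looksLikeZip = docx.Length > 1 && docx[0] == (byte)'P' && docx[1] == (byte)'K';
        Console.WriteLine($"{docx.Length} bytes, ZIP signature present: {looksLikeZip}");
    }
}
```

Disposing the WordprocessingDocument before reading the bytes matters here: the package is only completely written to the stream once it is closed. The same idea explains why MariGold.OpenXHTML's Save call must run before the stream's contents are returned.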

The final step is to tie everything together and create the Word document on the fly. Add a new Index action method to the Home controller, as below, to receive the HTML posted from CKEditor. Don't forget to include the necessary namespaces.

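using System.IO;
using System.Web.Mvc;
using MariGold.OpenXHTML;

[HttpPost]
[ValidateInput(false)] // the editor posts HTML markup, which request validation rejects by default
public FileResult Index(string content)
{
    using (MemoryStream mem = new MemoryStream())
    {
        WordDocument doc = new WordDocument(mem);
        doc.Process(new HtmlParser(content)); // parse the posted HTML and convert it to Open XML
        doc.Save();                           // flush all changes into the MemoryStream

        // Return the bytes as a .docx download
        return File(mem.ToArray(), "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "sample.docx");
    }
}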

Most of the work is done in the WordDocument class, which exposes a few properties and methods for controlling how HTML is converted into an Open XML Word document. Refer to the GitHub project home page for more details.

Here, we use a MemoryStream to create the Word document in memory. The Process method parses the HTML and converts it into a Word document. It takes an IParser implementation to parse the HTML text, so the default HTML parsing implementation can be replaced entirely with a custom one. Refer to the GitHub project home page for how to implement this.

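The Save method is required to flush all the modifications into the MemoryStream. The last line of code copies the contents of the MemoryStream into a byte array and returns it as a FileContentResult. Because a download file name (sample.docx) is supplied, the response carries a Content-Disposition: attachment header, which makes the browser download the file instead of displaying it.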
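The download behaviour comes from the Content-Disposition response header that MVC adds when File(...) is given a file name. The following sketch uses only System.Net.Mime to build that header value for sample.docx, so you can see what the browser receives. The class name DownloadHeaderDemo is hypothetical; ASP.NET MVC 5 builds the header in a similar way for ASCII file names, but treat this as an illustration rather than the framework's exact code.

```csharp
using System;
using System.Net.Mime;

// Hypothetical demo class: shows the Content-Disposition value that tells a
// browser to save the response as a file rather than render it.
public static class DownloadHeaderDemo
{
    public static string ContentDispositionFor(string fileName)
    {
        // The parameterless constructor defaults the disposition type to "attachment";
        // setting FileName adds the "filename" parameter.
        var disposition = new ContentDisposition { FileName = fileName };
        return disposition.ToString();
    }

    public static void Main()
    {
        // Same MIME type and file name as the Index action.
        Console.WriteLine("Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        Console.WriteLine("Content-Disposition: " + ContentDispositionFor("sample.docx"));
    }
}
```

For sample.docx the header value is "attachment; filename=sample.docx". If File(...) were called without a file name, no Content-Disposition header would be sent and the browser would decide on its own how to handle the Word MIME type.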