Converting Html to Pdf using C# and magic!

0
50

Everyone at some point needs to convert and html page into a PDF for some reason or another. For this I’ve always used a component which was cheap at the time (and is far from it now). This component has worked well for the last 8(ish) years, most ish in the last while because it doesn’t deal well with https sites. When contacting the vendor they said hey but our latest version which can be expected I guess because it’s 8 years down the line but then came the it’s ALL OF THE $ Smile. The last issue is that the component I used doesn’t work in Azure. That lead me to some investigation and then ultimately this post.

What’s the problem?

We have 3 problems

  1. We need to convert html 2 pdf
  2. We it not to break the bank as there potentially aren’t high returns on it’s usage
  3. The last one is the specific component I used doesn’t work in Azure web apps because of the way permissions work there

Easy to solve right, well if you reading this because you needed this component then yes but if you were me Googling with Bing for 40 minutes then writing some code that makes me all kinds of dirty then that yes becomes a ja, kind of but it works so here it is Open-mouthed smile.

What’s the primary component?

The component being used for all the magic is wkhtmltopdf which is an open source project licensed under the LGPLv3 license. The component is command line based which renders HTML into PDF using the Qt WebKit rendering engine. It run entirely “headless” and do not require a display or display service.

With little work you can for example generate a pdf of Google’s home page with the script below

wkhtmltopdf.exe "https://google.com" "google.pdf"

That will go off an think for a couple seconds and return you a pdf Smile

image

Super simple right? There is also a bunch of arguments that you can pass into the exe which you can find on the projects site from the basics like changing orientation to passing in a username and password for authenticated pages that you want to generate PDF’s for.

How we using it?

As mentioned above this is a command line utility so we are basically just wrapping it in a standard execution of cmd from C# and then waiting for the pdf to generate and returning it to the caller.

using System; using System.IO; using System.Threading.Tasks; using System.Web; namespace Html2Pdf.Lib { public static class TheMagic { public static async Task<byte[]> Go(string url, int timeoutInSeconds = 30, string pathToExe = null) { if (pathToExe == null) { pathToExe = $@"{Path.GetDirectoryName(typeof(TheMagic).Assembly.Location)}\wkhtmltopdf.exe"; if (!File.Exists(pathToExe)) { pathToExe = HttpContext.Current.Server.MapPath("~/bin/wkhtmltopdf.exe"); } } var timeout = DateTime.UtcNow.AddSeconds(timeoutInSeconds); var savePdfTo = Path.GetTempFileName(); var t = Task.Run(() => GeneratePdf(url, savePdfTo, pathToExe)); while (!t.IsCompleted) { if (timeout < DateTime.UtcNow) { break; } await Task.Delay(250); } while (!File.Exists(savePdfTo)) { if (timeout < DateTime.UtcNow) { break; } await Task.Delay(250); } while (File.GetLastWriteTimeUtc(savePdfTo).AddSeconds(2) >= DateTime.UtcNow) { if (timeout < DateTime.UtcNow) { break; } await Task.Delay(250); } var bytes = File.ReadAllBytes(savePdfTo); try { File.Delete(savePdfTo); } catch { } return bytes; } private static void GeneratePdf(string url, string targetLocation, string pathToExe) { ExecuteCommand(pathToExe, $@"""{url}"" ""{targetLocation}"""); } public static string ExecuteCommand(string pathToExe, string args) { try { System.Diagnostics.ProcessStartInfo procStartInfo = new System.Diagnostics.ProcessStartInfo(pathToExe, args); procStartInfo.UseShellExecute = false; procStartInfo.CreateNoWindow = true; System.Diagnostics.Process proc = new System.Diagnostics.Process(); procStartInfo.RedirectStandardOutput = true; proc.StartInfo = procStartInfo; proc.Start(); proc.WaitForExit(); } catch { } return null; } } }

And you call this code like the below example.

public async Task DownloadHomePageAsPdf() { var bytes = await TheMagic.Go($"{Request.Url.GetLeftPart(UriPartial.Authority)}/Home"); return File(bytes, "application/pdf"); }

I’m not sure how stable this code is to run in production but in the basic testing I’ve done it gets the job done and without any issues so far

image

So we’ve solved all of our problems

  1. We are converting html 2 pdf
  2. It’s free, can’t get cheaper then that (unless you want to pay me for it Smile with tongue out)
  3. And although not shown in the blog (because I could have got the screenshots from anywhere) this code works in Azure.

Conclusion

Anything is possible. The interesting thing about this solution is that you see the guts of what looks like a messy solution but don’t realize that under the covers if you are using a 3rd party component they are probably doing something similar (or worst) but the key takeaway is that it works Open-mouthed smile.

The code used for this sample will be available on GitHub shortly if you want to download it and see it working for yourself.

Do you know of any cool converting components? Why not share them below in the comments Smile

Happy converting!

LEAVE A REPLY