Problem
Here I will collect links and tips on handling the following scenario. I need to produce PDF documentation from a set of Word files in .doc and .docx formats. Some of their content changes, so I would create a "template" file whose variable content is filled in on the fly. The resulting documents are then converted to PDF for distribution. Implementation details are below.
Working with Template Files
Template files are ordinary formatted documents in .docx, .doc, or .odt format. Custom tags are inserted into the file when it is authored, and they are replaced when a customized output is needed. This approach provides a visual designer for the documents. The resulting files are also easy to customize and easy for a machine to read and manipulate.
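The tag-replacement step can be sketched as a small function. The notes do not fix a tag syntax, so the `{{TAG_NAME}}` form below is a hypothetical choice, and `fill_tags` is an illustrative name. Values are XML-escaped by default because the text being edited is assumed to be the XML inside the document package.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical tag syntax: {{TAG_NAME}} (upper-case letters, digits, underscores).
TAG_PATTERN = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_tags(text: str, values: dict[str, str], xml: bool = True) -> str:
    """Replace every {{TAG}} in text with values[TAG].

    A tag with no entry in values raises KeyError, so a missing value is
    noticed instead of leaking a raw tag into the distributed PDF.
    When xml is True, values are escaped (&, <, >) before insertion,
    because the text is assumed to be a document's XML body.
    """
    def substitute(match: re.Match) -> str:
        value = values[match.group(1)]
        return escape(value) if xml else value

    return TAG_PATTERN.sub(substitute, text)


print(fill_tags("<w:t>Version {{VERSION}}</w:t>", {"VERSION": "2.1 & up"}))
```

One practical caveat: Word may split a tag across several runs (`<w:r>` elements) if the tag was edited, spell-checked, or partly reformatted after it was typed. A plain-text search over the XML would then miss that tag. Typing each tag in one go usually avoids this.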
Word XML (.docx)
There is a library for working with the .docx format on the .NET platform:
Word (.doc)
These files can be read using MS Office or LibreOffice.
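Unlike .docx and .odt, a legacy .doc file is not a ZIP package. It is a binary OLE2 compound file, so the find/replace approach described under Direct Manipulation does not apply to it. The following sketch shows one way to tell the containers apart by content rather than by extension. The function name `container_kind` is hypothetical. The check relies on the fixed OLE2 signature and on Python's `zipfile.is_zipfile`.

```python
import zipfile

# First eight bytes of every OLE2 compound file (the legacy .doc container).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def container_kind(path) -> str:
    """Classify a document file by its container format.

    Returns 'ole2' for legacy binary files such as .doc, 'zip' for
    ZIP-based packages such as .docx and .odt, and 'unknown' otherwise.
    """
    with open(path, "rb") as f:
        head = f.read(len(OLE2_MAGIC))
    if head == OLE2_MAGIC:
        return "ole2"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

Files reported as `ole2` would need to go through MS Office or LibreOffice first, for example by saving them as .docx or .odt, before their tags can be replaced as plain text.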
Open Document Format (.odt)
Direct Manipulation
Native libraries can read the template files, but the templates are simple enough for a quick find/replace on the plain text. Both .docx and .odt files are, in fact, ZIP-compressed collections of XML files.
- ZIP packages are supported directly by the .NET Framework through the ZipPackage class (System.IO.Packaging).
- Plain ZIP files are accessible through the ZipFile class (System.IO.Compression).
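The find/replace idea can be sketched in Python's standard `zipfile` module instead of the .NET classes listed above. The main text lives in `word/document.xml` for .docx and in `content.xml` for .odt. The function name `fill_package` and the `{{...}}` tags in the usage line are hypothetical. Each entry is copied with its original `ZipInfo`, which preserves entry order and compression. This matters for .odt, whose `mimetype` entry is expected to stay first and uncompressed.

```python
import os
import zipfile
from xml.sax.saxutils import escape

# Main text part of each ZIP-based format.
MAIN_PART = {
    ".docx": "word/document.xml",
    ".odt": "content.xml",
}


def fill_package(src: str, dst: str, replacements: dict[str, str]) -> None:
    """Copy the package at src to dst, replacing literal tags in its main XML part.

    Keys of replacements are matched verbatim; values are XML-escaped.
    All other entries are copied byte-for-byte, in their original order
    and with their original compression settings.
    """
    part = MAIN_PART[os.path.splitext(src)[1].lower()]
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part:
                xml = data.decode("utf-8")
                for tag, value in replacements.items():
                    xml = xml.replace(tag, escape(value))
                data = xml.encode("utf-8")
            # Writing with the source ZipInfo keeps the name, order, and compress_type.
            zout.writestr(item, data)
```

Usage might look like `fill_package("template.docx", "out.docx", {"{{TITLE}}": "User Guide"})`. The output is a normal document that can then be converted to PDF.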
Converting to PDF
Word files are converted to PDF using LibreOffice.
- The AODL library mentioned earlier can export files as PDF.
- The CLI-UNO language bindings can also be used to export .odt files to PDF directly from .NET.
- "Converting Microsoft Word Document to PDF format using OpenOffice.org (Portable)", on CodeProject
- "HOW TO: Convert office documents to PDF using Open Office in C#"
- "Programmatically convert Word (docx) to PDF", on StackOverflow
Path to cli_uno.dll in a LibreOffice 4 installation: C:\Program Files (x86)\LibreOffice 4\URE\bin\cli_uno.dll
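Besides the .NET routes listed above, LibreOffice can run the conversion itself through its headless command-line mode (`soffice --headless --convert-to pdf --outdir <dir> <files>`). The Python sketch below uses that command-line mode rather than AODL or CLI-UNO. The function names are hypothetical. The returned paths are the names LibreOffice is expected to produce (the input name with a .pdf extension); the code does not check that those files exist.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_command(soffice: str, inputs: list[str], outdir: str) -> list[str]:
    """Build a headless LibreOffice command that converts the inputs to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", outdir, *inputs]


def convert_to_pdf(inputs: list[str], outdir: str, soffice: str = "soffice") -> list[Path]:
    """Run the conversion and return the expected output paths.

    soffice may be a bare name on PATH or a full path; on Windows the
    executable usually sits in the installation's 'program' folder.
    """
    exe = shutil.which(soffice)
    if exe is None:
        raise FileNotFoundError(f"LibreOffice executable not found: {soffice}")
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(pdf_command(exe, inputs, outdir), check=True)
    return [Path(outdir) / (Path(p).stem + ".pdf") for p in inputs]
```

A call such as `convert_to_pdf(["manual.docx", "appendix.odt"], "pdf-out")` would convert both filled-in documents in one LibreOffice invocation, which avoids starting the office process once per file.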