Advanced HTML with Org mode to impress your friends

Posted 2023-09-02

This time, let’s indulge in the favourite pastime of every blog-building hacker out there: blogging about blogging.

This website replaced my decade-old and dormant site. Over time, the threshold to post got higher and higher. I wasn’t interested in re-learning how to build it just to write something.

I’ve since learned from my mistake. This site is built with Org mode using org-publish. I use Org daily, and there isn’t really anything to re-learn. Everything happens from Emacs, and there are zero additional dependencies to download.

Building the site

I’m not going to discuss how to set up org-publish; there are plenty of resources online. I’ve linked some posts that I found exceptionally helpful[1][2][3].

Essentially, I’ve set up a build-site.el script that invokes org-publish and is run with emacs -Q --script. This way the build can be fully automated from the source repository and behaves more like a conventional static site generator, although you can still invoke it directly from the editor.
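A minimal sketch of what such a build-site.el might contain, assuming the Org sources live in a content/ directory and the HTML goes to public/. The directory names, the project name "site", and the helper my-site-build are my own placeholders, not necessarily how this site is built:

```elisp
;; build-site.el: a sketch, run with `emacs -Q --script build-site.el'.
;; Directory names and `my-site-build' are hypothetical placeholders.
(require 'ox-publish)
(require 'ox-html)

(defun my-site-build (source-dir output-dir)
  "Publish every .org file under SOURCE-DIR as HTML into OUTPUT-DIR."
  (let ((org-publish-project-alist
         `(("site"
            :base-directory ,source-dir
            :base-extension "org"
            :recursive t
            :publishing-directory ,output-dir
            :publishing-function org-html-publish-to-html))))
    ;; FORCE = t republishes every file instead of trusting the
    ;; timestamp cache, which suits a clean, scripted build.
    (org-publish-all t)))

;; Only build when the (assumed) content directory exists.
(when (file-directory-p "content")
  (my-site-build "content" "public"))
```

Since -Q skips your init file, the build doesn't depend on personal configuration, so the same command works locally and on a build server.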

HTML export variables

org-html exposes a number of variables to customise the HTML output, for example:

  • org-html-doctype
  • org-html-head
  • org-html-head-extra
  • org-html-meta-tags
  • org-html-divs
  • org-html-preamble
  • org-html-postamble
  • org-html-scripts

For a full list, run M-x apropos-variable RET ^org-html- RET. The most notable are org-html-{pre,post}amble, which people have used for all kinds of layouts.
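For reference, here’s roughly what the variable approach looks like: a sketch that sets org-html-postamble to a format string, where %a is the author and %d the date. The byline markup is just an example of mine. Note that the result can only ever land in the postamble at the end of the page:

```elisp
(require 'ox-html)

;; A string value replaces the default postamble and is used as a
;; format string: %a expands to the author, %d to the date.
;; The output is still wrapped in <div id="postamble"> after the content.
(setq org-html-postamble "<p class=\"byline\">By %a, %d</p>")
```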

However, all of these options are fairly limited. Conceptually simple changes, like placing the date and author information below the title or changing the “Table of Contents” heading, are surprisingly tricky to achieve with them.

Now that I’ve learned a better way to make these kinds of changes, I wish org-html didn’t have some of these variables at all. They make a few narrow use cases easy, but at the expense of obscuring the more powerful concepts behind org-export.

Demystifying org-export

org-export supports swappable backends for different output formats. A backend is defined with org-export-define-backend, which takes a unique backend name and an alist of transcoders. Broadly speaking, a transcoder is a function that takes an Org element, its already-transcoded contents, and some metadata, and returns a string representing that element in the output format.

Here’s an example of a minimal S-expression backend[4]:

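(require 'ox-ascii)  ; also loads ox; provides org-ascii--indent-string

;; Helper: trim STRING, then indent each non-empty line by WIDTH spaces.
(defun indent-string (string width)
  (org-ascii--indent-string (string-trim string) width))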

(defun sexp-template (contents info)
  (format "(template\n%s)"
          (indent-string contents 1)))
(defun sexp-inner-template (contents info)
  (format "(inner-template\n%s)"
          (indent-string contents 1)))
(defun sexp-section (section contents info)
  (format "(section\n%s)"
          (indent-string contents 1)))
(defun sexp-headline (headline contents info)
  (format "(headline %s\n%s)"
          (org-export-data (org-element-property :title headline)
                           info)
          (indent-string contents 1)))
(defun sexp-paragraph (paragraph contents info)
  (format "(paragraph %s)" contents))
(defun sexp-plain-text (text info)
  (format "\"%s\"" (string-replace "\n" "\\n" text)))

(org-export-define-backend 'sexp
  '((template . sexp-template)
    (inner-template . sexp-inner-template)
    (section . sexp-section)
    (headline . sexp-headline)
    (paragraph . sexp-paragraph)
    (plain-text . sexp-plain-text)))

and its output for this simple document:

(with-temp-buffer
  (org-mode)  ; make sure the buffer is parsed as Org
  (insert "Hello world!\n")
  (insert "* Here's a headline\n")
  (insert "Here's some more text\n")
  (org-export-as 'sexp))

(template
 (inner-template
  (section
   (paragraph "Hello world!\n"))
  (headline "Here's a headline"
   (section
    (paragraph "Here's some more text\n")))))

Deriving the org-html backend

In addition to defining new backends, you can also derive an existing backend using org-export-define-derived-backend. Can you see where this is going?

Here’s how to derive the HTML backend and replace some transcoders:

(require 'ox-html)

(defun my-html-paragraph (paragraph contents info)
  (format "<p class=\"my-paragraph\">%s</p>" contents))

(org-export-define-derived-backend 'my-html 'html
  :translate-alist '((paragraph . my-html-paragraph)))

yielding HTML with a modified <p> tag:

<p class="my-paragraph">Hello world!</p>

What can you customise with this? Everything[5].
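To see what “everything” covers for a given backend, a quick sketch like the one below lists every type it has a transcoder for, i.e. every key you could override in :translate-alist. The helper name my-transcoder-types is hypothetical; it relies on org-export-get-backend and org-export-get-all-transcoders from ox.el, the latter also returning transcoders inherited from parent backends:

```elisp
(require 'ox-html)

;; Hypothetical helper: list the element/object types BACKEND translates.
(defun my-transcoder-types (backend)
  "Return the sorted types BACKEND (a symbol) has transcoders for.
Transcoders inherited from parent backends are included."
  (sort (delete-dups
         (mapcar #'car (org-export-get-all-transcoders
                        (org-export-get-backend backend))))
        #'string<))
```

Evaluating (my-transcoder-types 'html) with M-: shows the list for the stock HTML backend; passing a derived backend such as 'my-html includes what it inherits.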

Examples

Rather than relying on the various org-html* variables to modify the <head> element and other pieces of the template, it’s possible to render the whole thing yourself:

(defun cool-html-template (contents info)
  (concat "<!DOCTYPE html>\n"
          "<html lang=\"en\">\n"
          "  <head>\n"
          "    <title>" (org-export-data (plist-get info :title)
                                         info) "</title>\n"
          "  </head>\n"
          "  <body>\n"
          (indent-string contents 4) "\n"
          "  </body>\n"
          "</html>"))

(org-export-define-derived-backend 'cool-html 'html
  :translate-alist '((template . cool-html-template)))

resulting in this majestic display:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>cool crab's cool page</title>
  </head>
  <body>
    🦀
  </body>
</html>

Rather than using org-html-{pre,post}amble, it’s possible to render the inner-template as well, e.g. to wrap the document in <main>:

(defun cool-html-inner-template (contents info)
  (concat "<main>\n" (indent-string contents 2) "\n</main>"))

(org-export-define-derived-backend 'cool-html 'html
  :translate-alist '((template . cool-html-template)
                     (inner-template . cool-html-inner-template)))

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>cool crab's cool page</title>
  </head>
  <body>
    <main>
      <img src="/cool-crab.png" alt="a cool-looking crab on the beach, wearing dark sunglasses and enjoying a refreshing beverage" />
    </main>
  </body>
</html>

The info plist

I haven’t yet mentioned the info argument passed into the transcoder functions. The Org export reference documentation describes it as a “communication channel”: a property list (plist) used to pass metadata around.

There’s a lot of data in there, including keywords defined in the Org document, which makes it useful for implementing custom options that can be set in the document to influence the generated HTML.
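As a sketch of such a custom option, the derived backend below declares a hypothetical #+BANNER: keyword through :options-alist, whose entries have the form (PROPERTY KEYWORD OPTION DEFAULT BEHAVIOR). The names banner-html and my-banner-inner-template are placeholders. The transcoder delegates to the stock org-html-inner-template, so the table of contents and footnotes are kept:

```elisp
(require 'ox-html)

;; Hypothetical transcoder: read :banner from the info plist.
(defun my-banner-inner-template (contents info)
  "Prepend the document's BANNER keyword, if any, to the usual body."
  (let ((banner (plist-get info :banner)))
    (concat (if banner
                ;; Escape &, < and > in the raw keyword value.
                (format "<p class=\"banner\">%s</p>\n"
                        (org-html-encode-plain-text banner))
              "")
            (org-html-inner-template contents info))))

(org-export-define-derived-backend 'banner-html 'html
  ;; #+BANNER: value -> :banner in info; no default; BEHAVIOR t keeps
  ;; the last value as a plain string.
  :options-alist '((:banner "BANNER" nil nil t))
  :translate-alist '((inner-template . my-banner-inner-template)))
```

A document containing #+banner: Draft & unreviewed then gets an escaped banner paragraph at the top of its body.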

There are convenience functions for accessing data within the info plist, like org-export-get-date.
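Combining org-export-get-date with a derived backend also handles the “date below the title” case mentioned earlier. A sketch, assuming the stock org-html template, which emits the <h1 class="title"> right before the inner template’s output; the backend name dated-html, the function my-dated-inner-template, and the post-date class are hypothetical:

```elisp
(require 'ox-html)

(defun my-dated-inner-template (contents info)
  "Insert the document date just below the title, then the usual body."
  ;; FMT only applies when #+DATE: is a single timestamp; otherwise the
  ;; date is returned as-is.  `org-export-data' turns it into a string.
  (let ((date (org-export-data (org-export-get-date info "%Y-%m-%d") info)))
    (concat (if (org-string-nw-p date)
                (format "<p class=\"post-date\">%s</p>\n" date)
              "")
            (org-html-inner-template contents info))))

(org-export-define-derived-backend 'dated-html 'html
  :translate-alist '((inner-template . my-dated-inner-template)))
```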

But wait, there’s more

I’ve only scratched the surface of what org-export backends can do.

You can also go nuts with filters, which can rewrite the parse tree or the transcoded output, and with hooks, which modify the raw Org markup before it’s parsed. See the Advanced Export Configuration manual page for details.
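As a small taste of filters, here’s a sketch of a derived backend that post-processes every transcoded link through :filter-link. A filter receives the transcoded string, the backend, and the info plist, and returns the replacement string. The names linky-html and my-html-external-links-filter are hypothetical, and the regexp assumes org-html renders plain links as <a href="..."> with href first, which is worth checking against your Org version:

```elisp
(require 'ox-html)

;; Hypothetical filter: open absolute http(s) links in a new tab.
(defun my-html-external-links-filter (text _backend _info)
  "Add target=\"_blank\" and rel=\"noopener\" to external links in TEXT."
  (replace-regexp-in-string
   "<a href=\"\\(https?://\\)"
   "<a target=\"_blank\" rel=\"noopener\" href=\"\\1"
   text t))  ; FIXEDCASE = t: insert the replacement exactly as written

(org-export-define-derived-backend 'linky-html 'html
  :filter-alist '((:filter-link . my-html-external-links-filter)))
```

Relative links such as file:notes.org don’t match the regexp, so they pass through unchanged.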

To sum up, export backends provide extremely flexible and elegant rendering of Org documents, and a backend lets you do a lot more than the available export options.

Acknowledgements

Thanks to Jason Stewart for feedback on an earlier draft of this post.

Footnotes:

4

These examples are generated by Org Babel!

5

See the org-element-all-elements and org-element-all-objects variables for the list of types, also documented in the Org element API. In addition to those, org-export adds the template, inner-template, and plain-text types.