Converting my old Wordpress posts to Markdown
Torsten Uhlmann
Mon, 05 Oct 2015
Photo by Torsten Uhlmann
I was moving my old Wordpress site to a new statically generated version. Getting hacked sucks, you know…
This site is generated by Cryogen, a fairly simple Clojure-based static site generator. Cryogen uses Markdown or Asciidoc files for posts and pages.
Now the task was to extract old posts from Wordpress and convert them into Markdown.
Here is an outline of the steps I took to get this done. Maybe this list helps future converts:
- First, I set up an Ubuntu VirtualBox VM to reinstall my Wordpress site. I had torn mine down immediately when I was notified that it was hacked. If yours is still running, well, good for you.
- Now, Wordpress has an export feature that allows you to export posts and/or pages into an XML file. I used that to export all posts.
- Then came the fun part. I whipped up a short quick & dirty Clojure script to read the XML, extract the posts and important metadata, and write them as HTML files.
- The script then reads back the HTML files, uses `sed` to search and replace old image paths, and then uses `pandoc` to transform the HTML contents into Markdown.
The script is not polished and hardly performant. But if it helps someone, here’s the gist.
gist:tuhlmann/d9f1e3237eb8f692eb71
The cool part is the way it reads XML and uses Clojure’s zippers to traverse and transform it.
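For readers more at home in Python than in Clojure, the pipeline from the list above can be sketched roughly like this. It is not the script from the gist, only a minimal illustration under a few assumptions: the export is a standard Wordpress WXR file with `<item>` elements carrying `wp:post_type`, `wp:post_name`, `wp:post_date` and `content:encoded`; the image path rewrite is a plain string replacement standing in for the `sed` call; and `pandoc` is installed and on the `PATH`. All function names are made up for this example.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of <content:encoded>, which holds the post body in a WXR export.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_posts(export_xml):
    """Return title, slug, date and HTML body for every exported post."""
    root = ET.fromstring(export_xml)
    posts = []
    for item in root.iter("item"):
        # Index children by local name so the wp: namespace version does not matter.
        fields = {local_name(child.tag): (child.text or "") for child in item}
        if fields.get("post_type") != "post":
            continue  # skip pages, attachments and other item types
        posts.append({
            "title": fields.get("title", ""),
            "slug": fields.get("post_name", ""),
            "date": fields.get("post_date", ""),
            "html": item.findtext(CONTENT_ENCODED, default=""),
        })
    return posts


def rewrite_image_paths(html, old_prefix, new_prefix):
    """Plain search and replace, standing in for sed 's|old|new|g'."""
    return html.replace(old_prefix, new_prefix)


def pandoc_command(html_path, md_path):
    """Build the pandoc call that converts one HTML file to Markdown."""
    return ["pandoc", "-f", "html", "-t", "markdown",
            "-o", str(md_path), str(html_path)]


def convert_export(export_path, out_dir, old_prefix, new_prefix):
    """Write <slug>.html for each post, then let pandoc produce <slug>.md."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in extract_posts(Path(export_path).read_bytes()):
        html_path = out / (post["slug"] + ".html")
        md_path = out / (post["slug"] + ".md")
        html = rewrite_image_paths(post["html"], old_prefix, new_prefix)
        html_path.write_text(html, encoding="utf-8")
        # Requires pandoc on the PATH; raises CalledProcessError on failure.
        subprocess.run(pandoc_command(html_path, md_path), check=True)
```

A call such as `convert_export("export.xml", "posts", "/wp-content/uploads/", "/img/")` would then process a whole export; the file name and both path prefixes here are placeholders. Metadata such as the title and date is extracted but not written anywhere in this sketch, so producing the front matter your site generator expects is left open.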
Have fun, Torsten.