1 October 2020
Stuart Rowlands

Introducing Merlin

Over the past two years we’ve been busy creating, refining and open-sourcing a new, ‘magical’ content migration tool — Merlin.

Why Merlin?

Historically, content migrations have been custom...and painful. You needed someone with a good understanding of content, content structures and the technology driving both the source and destination to build custom solutions that would read the data, transform it, and then migrate it onto the new platform. And every time you needed to migrate a site, you’d be re-doing this process in a tailored way.

The cumbersome (and costly) process created a barrier to change. Organisations often needed to move platforms because their old CMS was costly, nearing end-of-life, not secure enough, not flexible enough, etc., but moving platforms involved so much customised development effort that the migration itself became too expensive. This meant that many organisations were forced to stagnate on their old platform rather than upgrade. In fact, many websites and organisations are in this situation right now.

We wanted to solve this problem by building a content migration tool that we could re-use and that others could benefit from as well.

How Merlin works

Our content migration tool, Merlin, starts with a smart spider that helps to find all public-facing content on a website. Based on a set of defined rules, the URLs it discovers are separated into categories of content (e.g. news articles vs blogs vs staff profiles). Each page is then interrogated and broken down at a field level (e.g. title, publish date, body, file attachments, etc.) and output in a generic, machine-readable format (JSON), ready for ingestion into a CMS or other platform.

Because this output format is standard and not tied to any one system, it can be loaded into whatever target destination you choose. In our case, we’ve used it to migrate sites to Drupal (and GovCMS), but Merlin is cross-platform, so you could use it to move a site into any CMS.

Merlin also provides rich options for data transformation. It can do like-for-like content migration, but it can also rewrite the markup during the migration to improve the quality of the content.
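To make the categorise, extract and output steps concrete, here is a minimal Python sketch of the idea. It is not Merlin's own code or configuration format: the rule list, the `categorise` and `page_to_record` functions, the URL patterns and the output field names are all hypothetical, chosen only to illustrate the flow from a crawled page to a JSON record.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical categorisation rules: the first matching pattern wins.
CATEGORY_RULES = [
    ("news", re.compile(r"^/news/")),
    ("blog", re.compile(r"^/blog/")),
    ("staff_profile", re.compile(r"^/about/staff/")),
]


def categorise(path):
    """Return the first category whose pattern matches the URL path."""
    for name, pattern in CATEGORY_RULES:
        if pattern.search(path):
            return name
    return "page"  # fallback for URLs that match no rule


class FieldExtractor(HTMLParser):
    """Collects the first <h1> as the title and each <p> as body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing an inline tag such as <b>; keep capturing
        text = "".join(self._buffer).strip()
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def page_to_record(path, html):
    """Break one page down into fields and return a JSON-ready dict."""
    extractor = FieldExtractor()
    extractor.feed(html)
    extractor.close()
    return {
        "url": path,
        "type": categorise(path),
        "title": extractor.title,
        "body": "\n\n".join(extractor.paragraphs),
    }


sample_html = (
    "<html><body><h1>Office closed</h1>"
    "<p>We reopen on <b>Monday</b>.</p></body></html>"
)
record = page_to_record("/news/office-closed", sample_html)
print(json.dumps(record, indent=2))
```

A real migration would need many more fields (publish date, attachments and so on) and site-specific selectors, but the shape is the same: rules decide the content type, extraction fills the fields, and the result is plain JSON that any target platform can ingest.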

Merlin ingests markup from the web, but its input handling is pluggable, so source data can also come from other formats and input files, such as PDF or XML.
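One common way to make input pluggable is a registry of reader functions keyed by format. The Python sketch below illustrates that general pattern only. It does not describe how Merlin itself is implemented, and the `READERS` registry, the `reader` decorator, `load_records` and the sample data are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical registry mapping a format name to a reader function.
READERS = {}


def reader(fmt):
    """Decorator that registers a function as the reader for one format."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


@reader("xml")
def read_xml(text):
    """Turn each <item> element into a dict of child tag -> text."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


@reader("json")
def read_json(text):
    """Assume the JSON source is already a list of records."""
    return json.loads(text)


def load_records(fmt, text):
    """Dispatch to whichever reader is registered for the given format."""
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format: {fmt}")
    return READERS[fmt](text)


xml_source = "<items><item><title>Hello</title><body>World</body></item></items>"
json_source = '[{"title": "Hello", "body": "World"}]'

# Both sources yield the same generic records.
print(load_records("xml", xml_source))
print(load_records("json", json_source))
```

With this shape, supporting a new source format means registering one more reader, while everything downstream keeps working on the same generic records.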

Merlin in action

To date, we’ve used Merlin on many government sites, including:

  • DHHS’s Better Health Channel (migration complete but site not live yet)

Merlin has taken content from different content management systems (CMSs), including Sitecore, Squiz Matrix, WordPress and older versions of Drupal.

Merlin and GovCMS

Salsa has used Merlin extensively for GovCMS sites, and as the GovCMS implementation partner and ongoing program support provider, we know the target destination extremely well. Now, when we use Merlin to migrate content onto GovCMS, the only manual step is providing configuration for mappings from the source site; everything else magically comes through to GovCMS with no coding required. It brings across the content, site structure/information architecture (IA), images, and so on. This means the barrier to migrating onto GovCMS is now incredibly low.

Lowering the barrier

In general, Merlin lowers the barrier industry-wide for site migrations from any source CMS to any target CMS. The process is repeatable, predictable and largely automated. This, in turn, reduces the risk of manual errors and creates a smoother, faster and cheaper migration option.

Open sourcing it

We’ve also open sourced Merlin, so others can use it and benefit from the tool.

Merlin on GitHub

Merlin documentation on GitHub