Over the past two years we’ve been busy creating, refining and open-sourcing a new, ‘magical’ content migration tool — Merlin.
Historically, content migrations have been custom...and painful. You needed someone with a good understanding of content, content structures and the technology driving both the source and destination to build custom solutions that would read the data, transform it, and then migrate it onto the new platform. And every time you needed to migrate a site, you’d be re-doing this process in a tailored way.
This cumbersome (and costly) process created a barrier to change. Organisations often needed to move platforms because their old CMS was expensive, nearing end-of-life, not secure enough, not flexible enough, etc., but moving platforms involved so much customised development effort that the migration itself became prohibitive. This meant that many organisations were forced to stagnate on their old platform rather than update. In fact, many websites and organisations are in this situation right now.
We wanted to solve this problem by building a content migration tool that we could re-use and that others could benefit from as well.
How Merlin works
Our content migration tool, Merlin, starts with a smart spider that finds all public-facing content on a website. Based on a set of defined rules, these URLs are separated into categories of content (e.g. news articles vs blogs vs staff profiles). The pages are then interrogated and broken down at a field level (e.g. title, publish date, body, file attachments, etc.) and output in a generic, machine-readable format (JSON), ready for ingestion into a CMS or other platform.
Because that output format is standard, it can be loaded into whatever target destination you choose. In our case, we’ve used it to migrate sites to Drupal (and GovCMS), but Merlin is cross-platform, so you could use it to move a site into any CMS.
Merlin also provides rich options for data transformation. It can do like-for-like content migration, but it can also rewrite the markup during migration to improve the quality of the content.
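To make the categorise-then-extract flow concrete, here is a minimal Python sketch. It is not Merlin's own code or configuration format: the URL patterns, the tag-to-field table and the function names are all assumptions made up for illustration, and it uses only the Python standard library.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical category rules: the first pattern that matches the URL path wins.
CATEGORY_RULES = [
    ("news_article", re.compile(r"^/news/")),
    ("blog", re.compile(r"^/blog/")),
    ("staff_profile", re.compile(r"^/about/staff/")),
]

# Hypothetical field rules: which HTML element feeds which output field.
TAG_TO_FIELD = {"h1": "title", "time": "publish_date", "article": "body"}


def categorise(path):
    """Return the category of the first matching rule, or 'page' if none match."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(path):
            return category
    return "page"


class FieldExtractor(HTMLParser):
    """Collects the text found inside the elements listed in TAG_TO_FIELD."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._open = []  # fields whose element is currently open

    def handle_starttag(self, tag, attrs):
        field = TAG_TO_FIELD.get(tag)
        if field is not None:
            self._open.append(field)
            self.fields.setdefault(field, "")

    def handle_endtag(self, tag):
        field = TAG_TO_FIELD.get(tag)
        if field in self._open:
            self._open.remove(field)

    def handle_data(self, data):
        # Text is appended to every field whose element is currently open.
        for field in self._open:
            self.fields[field] += data


def extract_record(path, html):
    """Break one page down into fields and return a JSON-serialisable dict."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    record = {"url": path, "type": categorise(path)}
    record.update({name: text.strip() for name, text in parser.fields.items()})
    return record


SAMPLE_HTML = """
<h1>Flu season update</h1>
<time>2019-05-01</time>
<article><p>Get your flu shot early.</p></article>
"""

if __name__ == "__main__":
    print(json.dumps(extract_record("/news/flu-season", SAMPLE_HTML), indent=2))
```

A real spider would fetch the pages over HTTP and would almost certainly use richer selectors than bare tag names. The point here is only that rules decide the category, fields are pulled out individually, and the result is plain JSON that any target platform can ingest.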
Merlin ingests markup from the web, but its input is pluggable, so source data can come from any format or input file, such as PDF or XML.
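One common way to make input pluggable is a small registry of reader functions, one per format, that all return records in the same shape. The sketch below illustrates that general pattern only; the `register` decorator, the `READERS` table and the `load` function are hypothetical names, not part of Merlin. A PDF reader is left out because it would need a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical registry: format name -> function that turns raw text into records.
READERS = {}


def register(fmt):
    """Decorator that adds a reader function to the registry under `fmt`."""
    def decorator(func):
        READERS[fmt] = func
        return func
    return decorator


@register("xml")
def read_xml(text):
    # Assumes each record is an <item> element whose children are the fields.
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


@register("json")
def read_json(text):
    # Assumes the file holds a list of flat objects.
    return [{key: str(value) for key, value in row.items()} for row in json.loads(text)]


def load(fmt, text):
    """Dispatch to the reader registered for `fmt`."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(text)
```

Supporting a new source format then means registering one more reader, while everything downstream keeps working on the same list of field dictionaries.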
Merlin in action
To date, we’ve used Merlin on many government sites, including:
Department of Health and Human Services’ (DHHS) Finding website
Victorian Department of Premier and Cabinet’s Open Data (view Open Data Portal case study)
Victoria’s (view Legislation case study)
Migrating over 25 sites onto GovCMS (view migrating 25+ government sites onto GovCMS case study)
Department of Foreign Affairs and Trade (DFAT) corporate site (view DFAT case study)
DHHS’s Better Health Channel (migration complete but site not live yet)
Merlin has taken content from different content management systems (CMSs), including Sitecore, Squiz Matrix, WordPress and older versions of Drupal.
Merlin and GovCMS
Salsa has used Merlin extensively for GovCMS sites, and as the GovCMS implementation partner and ongoing program support provider, we know the target destination extremely well. When we use Merlin to migrate content onto GovCMS, we only need to manually provide configuration for the mappings from the source site; everything else magically comes through to GovCMS with no coding required. It brings across the content, site structure/information architecture (IA), images, and so on. This means the barrier to migrating onto GovCMS is now incredibly low.
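As a rough picture of what "configuration instead of code" can look like, the sketch below drives a migration step entirely from a declarative mapping. The configuration keys (`entity_type`, `mappings`, `field`, `source`, `type`), the field names and the two transforms are assumptions for illustration, not Merlin's actual schema.

```python
from datetime import datetime

# Hypothetical transforms, looked up by the "type" named in the configuration.
TRANSFORMS = {
    "text": str.strip,
    "date": lambda value: datetime.strptime(value.strip(), "%d/%m/%Y").date().isoformat(),
}

# Hypothetical mapping configuration: the only part written per source site.
MAPPING_CONFIG = {
    "entity_type": "news_article",
    "mappings": [
        {"field": "title", "source": "headline", "type": "text"},
        {"field": "published", "source": "date_posted", "type": "date"},
        {"field": "body", "source": "content", "type": "text"},
    ],
}


def apply_mappings(config, source):
    """Build a target record by applying each configured mapping to `source`."""
    target = {"type": config["entity_type"]}
    for mapping in config["mappings"]:
        raw = source.get(mapping["source"])
        if raw is None:
            continue  # the source page has no value for this field
        target[mapping["field"]] = TRANSFORMS[mapping["type"]](raw)
    return target
```

For example, `apply_mappings(MAPPING_CONFIG, {"headline": "  Budget 2019  ", "date_posted": "02/04/2019"})` yields a record with a trimmed title and an ISO-formatted date. Moving to a different source site would mean editing `MAPPING_CONFIG`, not the function.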
Lowering the barrier
In general, Merlin lowers the barrier industry-wide for site migrations from any source CMS to any target CMS. The process is repeatable, predictable and largely automated. This, in turn, reduces the risk of manual errors and creates a smoother, faster and cheaper migration option.
Open sourcing it
We’ve also open sourced Merlin, so others can use it and benefit from the tool.