21 May 2017

Drupal 8 migration: what we've learnt so far

Reading time: 30-45 mins

It's great that the Migrate module has been included in Drupal 8 core. It’s awesome that we now have a stable framework we can use for migrating data into Drupal’s wonderfully refactored entity system! Wait, did I say stable?

In this post I’ll have a look at the current state of the Drupal 8 Migrate module and share what we’ve found so far.

Let’s begin by emphasising that version 8 is the most mature and flexible Drupal release to date. We’re genuinely excited about working with it. We also love the Migrate module and have been developing Drupal migration solutions for a long time (mainly for Drupal 7 projects). So we couldn’t wait to try out the Drupal 8 version.

Luckily, we recently had the chance to do just that when we redeveloped the Young Vic website. This is a Drupal site integrated with the Tessitura ticketing system. We used Drupal’s Migrate module to synchronise the theatre’s production and performance data stored in Tessitura by importing it into Drupal entities. While the Migrate framework provides a very simple way to import data into Drupal out of the box, during development we had to learn a lot about its more complex concepts (like process and source plugins), create our own data parser plugin and find out about some contributed modules that proved vital for a more complete and robust migration solution.

We’re going to go through most of these details with you but let’s start at the beginning and review the basic structure of a migration in Drupal 8.

Drupal 8 Migrate concepts

So let’s review what a migration looks like in Drupal 8. Everything starts with your source: this is the data you want imported into Drupal. It might be a database, a spreadsheet or any machine-readable format produced by a third-party system.

Your source is then mapped onto your destination: your Drupal entities. This mapping is written in YAML, which gives you a single, easily manageable file containing all your migration configuration. The YAML file needs to be placed in your module’s config/install folder. If you’re familiar with Drupal’s configuration management system, you might already suspect a potential problem here. YAML config placed in this folder is read by the configuration management system and turned into configuration entities when the module is installed. This means that every time you make a change to the file (and you will, as you tweak your migration settings, mapping configuration, dependencies and so on), your module needs to be uninstalled and then reinstalled for the changes to take effect. This is a really time-consuming operation, especially with complex mapping configurations.
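One detail that can make the uninstall/reinstall cycle smoother: configuration installed from config/install is not necessarily deleted when the module is uninstalled, in which case the reinstall may complain that the configuration already exists. A minimal sketch of one way around this, assuming the module’s machine name is my_module, is to add an enforced dependency to the migration file so that the configuration is removed together with the module:

```yaml
# Added to the migration's YAML file in config/install.
# 'my_module' is an assumed machine name; use your own module's name.
dependencies:
  enforced:
    module:
      - my_module
```

This doesn’t remove the need to reinstall; it only makes the cycle repeatable.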

Configuring your migration

So, this is the basic concept of a migration in Drupal. You’ve got source data that you want migrated into your destination. For a simple example, let’s imagine that you’ve got a basic Drupal content type called ‘article’ that contains only a single title field. The YAML configuration file for the migration would look something like this:

# Migration configuration for article content.
id: article_node
label: Article migration
migration_group: my_migration_group
source:
  plugin: legacy_articles
destination:
  plugin: entity:node
process:
  # Hardcode the destination node type (bundle) as 'article'.
  type:
    plugin: default_value
    default_value: article
  title: legacy_title

Let’s review what’s going on in this configuration file. First, we start by declaring the unique identifier of the migration using id: article_node. This is pretty obvious; migrations need to be uniquely identified as you might (and most probably will) create a number of migrations when importing data into Drupal from an external datasource (such as users, taxonomy terms or any other data that you want to turn into Drupal entities).
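When you do end up with several migrations, they often need to run in a particular order, for example authors before the articles that reference them. A minimal sketch of how that can be declared, assuming a second, hypothetical migration with the ID legacy_users exists:

```yaml
id: article_node
label: Article migration
# 'legacy_users' is a hypothetical migration ID used for illustration only.
migration_dependencies:
  required:
    - legacy_users
```

With this in place, Migrate should refuse to import article_node until legacy_users has been run.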

The next line defines a label for the migration. This text will show up in the migration UI as the label of the migration.

Next, we define the migration group. This is an optional step, but is a pretty useful feature as it lets us categorise our migrations both in the back-end and the UI. It can be especially useful if you have a number of sources or migrations as it lets you associate migrations with migration groups, which makes managing them much easier.
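Groups themselves are configuration too. As far as we can tell they come from the contributed Migrate Plus module rather than from core, so treat the following as a sketch under that assumption. The label, description and source type are made up for illustration:

```yaml
# Hypothetical group definition, e.g. in
# config/install/migrate_plus.migration_group.my_migration_group.yml
id: my_migration_group
label: My migration group
description: Content imported from the legacy system.
source_type: Legacy database
```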

Moving on, we define the source. As you can see, the source key requires us to define a plugin that describes how Migrate can retrieve our source data. If you’re familiar with plugins in Drupal 8, you’ll already know that a plugin is basically a PHP class, placed into a specific folder structure within your module and tagged using Annotation syntax (there are other plugin discovery methods in Drupal 8 but Annotation is the most widely used one). We’ll need to create this file and place it in the src/Plugin/migrate/source folder within our module’s directory. We’ll come back to this step later on in this post.

Then we define the destination, which is entity:node. This simply means that we want to create node entities from the legacy data. If we wanted to create entities of a different type, let’s say membership_data entities, we’d simply use entity:membership_data here.
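For illustration, the destination section for that membership_data example would be a one-line change. This assumes an entity type with that machine name is defined somewhere on the site:

```yaml
destination:
  # 'membership_data' is the hypothetical entity type ID from the text above.
  plugin: entity:membership_data
```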

The next step is to define the process. A Drupal 8 migration process is another YAML configuration entry that defines how your source data should be mapped to your destination. In other words, this is where you can tell Migrate what data your destination should eventually contain. 

In this example the process is straightforward. We’re doing two simple things:

define the destination type using the default_value plugin, a process plugin built into the Drupal 8 Migrate module that lets us define a simple fixed value;

map the legacy_title field to our node’s title, again an obvious step.
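To make the two forms concrete: a bare source field name such as title: legacy_title is shorthand for the built-in get plugin, and a destination property can also take a list of plugins that are run in order. The sketch below is not part of our example migration; it assumes we additionally wanted to trim whitespace from the title, using core’s callback plugin with PHP’s trim function:

```yaml
process:
  type:
    plugin: default_value
    default_value: article
  title:
    # Step 1: the long form of "title: legacy_title".
    - plugin: get
      source: legacy_title
    # Step 2: pass the value from step 1 through PHP's trim().
    - plugin: callback
      callable: trim
```

Each plugin in the list receives the output of the previous one, which is why only the first step names a source.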

 

OK, so far we’ve created a basic YAML configuration for the migration. Will this just work if we enable our module? Of course not. We need to create our source plugin that tells Migrate how to retrieve our source data.

Migrate source plugins

As we noted earlier in this post, we need to create a plugin that defines the structure of our source data. This is actually a very simple step as all we need to do is place a PHP file into a specific folder structure within our module and tag it using Annotation syntax. Let’s have another look at the source setting in the YAML configuration we created earlier:

source:
  plugin: legacy_articles

 

So, we’ll need to create a migrate source plugin with the ID legacy_articles. How do we do that? Following Drupal 8’s PSR-4 conventions, we’ll create a PHP file under src/Plugin/migrate/source within our module’s directory and call it Article.php. The filename doesn’t have to resemble the plugin ID, but it does have to match the name of the class it contains. Beyond that, it’s worth choosing a name that you can easily recognise later on, as it will make your life easier when you work with a number of source plugins.

Let’s have a look at the contents of the plugin class. This is what the entire file will look like (don’t worry if you don’t understand everything at first glance, we’ll go through it step by step).

<?php

/**
 * @file Contains Drupal\my_module\Plugin\migrate\source\Article
 */

namespace Drupal\my_module\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SqlBase;
use Drupal\migrate\Row;

/**
 * Source plugin for article content.
 *
 * The id below must match the source plugin named in the migration YAML.
 *
 * @MigrateSource(
 *   id = "legacy_articles"
 * )
 */
class Article extends SqlBase {

  /**
   * {@inheritdoc}
   */
  public function query() {
    /**
     * This is where we tell Migrate how to retrieve our legacy data.
     * It's very important that this query returns a single row for each
     * legacy article to be migrated, otherwise your migration will not behave
     * as you expect.
     */
    $query = $this->select('legacy_articles', 'a')
      ->fields('a', ['legacy_title', 'id']);
    return $query;
  }

  /**
   * {@inheritdoc}
   */
  public function fields() {
    /**
     * This is where we define the fields available for mapping.
     */
    $fields = [
      'legacy_title' => $this->t('Title of legacy article'),
      'id' => $this->t('Unique ID of legacy article'),
    ];
    return $fields;
  }

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    /**
     * This is where we define the unique identifier of
     * each source row from our legacy dataset.
     */
    return [
      'id' => [
        'type' => 'integer',
        'alias' => 'a',
      ],
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    /**
     * This method lets us tweak our raw source data after it's pulled
     * from the source but before it gets passed on to the destination.
     */

    // Skip this row if the legacy article doesn't
    // contain a title.
    if (empty($row->getSourceProperty('legacy_title'))) {
      return FALSE;
    }
    return parent::prepareRow($row);
  }

}

Let’s see what we’re doing here. The first step is to define the namespace of our class, just like we do with any other class in Drupal 8. It’s vital to follow PSR-4 conventions here, otherwise Drupal’s Plugin system won’t recognise our class as a plugin and our migration won’t work.

Next, we import a few other namespaces that we’ll be using in our class. As you can see, we import Drupal\migrate\Plugin\migrate\source\SqlBase and Drupal\migrate\Row. In this example our legacy data lives in a SQL database, so we’ll use Migrate’s SqlBase class, which is a built-in class that helps in working with SQL sources. We also import Migrate’s Row class, which represents each source row that we’re migrating. This class helps us tweak the source data before it’s actually migrated into our destination.
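SqlBase also needs to know which database connection holds the legacy tables. To our knowledge it reads an optional key entry from the source configuration and falls back to a connection called migrate when none is given. A sketch, assuming a connection named legacy has been defined in settings.php:

```yaml
source:
  plugin: legacy_articles
  # 'legacy' is an assumed connection key from the $databases array in
  # settings.php; omit this line to use the default 'migrate' connection.
  key: legacy
```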

The next step is to add the Annotation, which is basically a comment just above our class definition. If you’re new to Drupal 8’s Annotation syntax, it’s understandable that you wonder “hey, does this mean that a comment is actually a functional part of my code???” - the answer is YES. Drupal will parse this information when it does plugin discovery, so this is the place where you can define the unique identifier (and other settings when required) of your plugin. Let’s break this up and see what’s happening:

 * @MigrateSource(
 *   id = "legacy_articles"
 * )

As we’re creating a migrate source plugin, we tag our class using @MigrateSource (which is an annotation class defined by Migrate in core/modules/migrate/src/Annotation/MigrateSource.php). Then, within the brackets, we define the id of our source plugin. If you look back at the YAML configuration we created earlier, it should now make sense why we gave the ID legacy_articles to our migrate source plugin: it has to match the plugin named under the source key. (Note: read more about annotation-based plugins here https://www.drupal.org/docs/8/api/plugin-api/annotations-based-plugins)

Moving on, we create our class definition. Note that we name our class Article. As mentioned above, the filename doesn’t need to resemble the plugin ID, but the class name and the filename do need to match (following PSR-4 standards). Our class extends SqlBase, which requires us to implement the query() method. This method lets us define the query that Migrate should run in order to retrieve our legacy data. In this example, we’re only interested in the title of the legacy article, so is that the only column we need to retrieve? Well, not entirely. Migrate requires us to define a unique identifier for each source row that it tries to import, so that it can create a relationship between source and destination. In other words, this is the way Migrate keeps track of the mapping between old and new data. This comes in very handy for rollbacks or updates to already migrated data.

So in our query() method we retrieve the legacy_title and the id column from the legacy database:

$query = $this->select('legacy_articles', 'a')
  ->fields('a', ['legacy_title', 'id']);

 

Next, we describe the source fields that are available for mapping during the process. List every field you intend to work with here, each with a short human-readable description, so that the source is fully documented:

public function fields() {
  /**
   * This is where we define the fields available for mapping.
   */
  $fields = [
    'legacy_title' => $this->t('Title of legacy article'),
    'id' => $this->t('Unique ID of legacy article'),
  ];
  return $fields;
}

 

This way both legacy_title and id will become available as columns that might be mapped to our destination. (Even though we don’t actually map the id field in our example, it won’t cause any harm to make it available.)

The next step is vital for our migration to work properly. This is where we define the property which Migrate will use to uniquely identify a given source row:

public function getIds() {
  /**
   * This is where we define the unique identifier of
   * each source row from our legacy dataset.
   */
  return [
    'id' => [
      'type' => 'integer',
      'alias' => 'a',
    ],
  ];
}

 

Here we tell Migrate that we want to use the id column as the unique identifier, which we set to an integer type. In the background, Migrate dynamically creates a database table associated with our migration (migrate_map_article_node in our case) and, for each imported row, inserts a record that contains the ID of the legacy article (e.g. 23) in one column and the ID of the new Drupal node (e.g. 342) in another. This way Migrate will be able to relate node/342 to ID 23 in the source data.

Lastly, we do a little tweak to our source data, right before it actually gets mapped to our destination. By implementing the prepareRow() method we have a way of accessing the raw data of each row that gets imported. In our example, we want to skip importing legacy articles that don’t have a title, which may well be a real-life example. 

(A quick note: this is a good moment to emphasise how important it is to examine your source data BEFORE you start writing your migration classes. From time to time, you’ll encounter low-quality source data, where rows are missing some or all of the information you need to migrate. Catching these source data issues early in the process can save a lot of development time and might also prevent you from banging your head against the wall. The latter never happened to us, of course.)

Let’s have a look at our prepareRow() implementation:

/**
 * {@inheritdoc}
 */
public function prepareRow(Row $row) {
  /**
   * This method lets us tweak our raw source data after it's pulled
   * from the source but before it gets passed on to the destination.
   */

  // Skip this row if the legacy article doesn't
  // contain a title.
  if (empty($row->getSourceProperty('legacy_title'))) {
    return FALSE;
  }
  return parent::prepareRow($row);
}

 

This is actually quite easy. We check whether the given row’s legacy_title column is empty and, if it is, all we need to do is return FALSE. This instructs Migrate to skip the row and continue with the next iteration. So returning FALSE will not interrupt the migration process; Migrate simply moves on to the next row in our source.
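For a check as simple as this one, the same effect can probably be had without any PHP. Core ships a skip_on_empty process plugin; the sketch below is an alternative to the prepareRow() approach, not something our example migration uses:

```yaml
process:
  title:
    plugin: skip_on_empty
    # 'row' skips the whole row; 'process' would only skip this property.
    method: row
    source: legacy_title
```

We’d still reach for prepareRow() when the decision depends on more than one source column or needs custom logic.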

Running our migration

That’s it! Now we’ve created a basic migration which we can run using drush... “Wait, we can’t run migrations using the UI???” Well, no, we can’t. At the time of writing, the Migrate module is still experimental, which also means that there is no stable way to run migrations through the UI. While it makes sense that the preferred way of running migrations is via drush (which takes load off the webserver and is able to process large datasets more reliably), it might be handy in some cases to be able to run them from the UI. We’re pretty sure this will be fixed in an upcoming release.

So, let’s try and run our migration using the drush command drush migrate-import article_node. And voila, it won’t work...

Wait, we’ve missed something…

In fact, we can’t run migrations at all. That is, unless we install a contributed module called Migrate Tools (https://www.drupal.org/project/migrate_tools). This module lets us actually run the migration we’ve created. Now let’s try the drush command again, it should work now.

So, this means that we need to install a contributed module if we want to run our migration. This is a bit weird, but we’ll have to live with it.

Conclusion

In this post we’ve only scratched the surface of the Migrate module in Drupal 8, but you can already see its awesomeness. We love the fact that it’s a very flexible and pluggable system that lets you migrate any sort of data into Drupal entities.

At this moment in time, it does have some shortcomings that require you to either install additional modules or develop custom code in order to achieve your goals. Nevertheless, Migrate is constantly evolving and we’re pretty sure we’ll see great enhancements to it in the near future.

Get in touch about your next project!