Introduction
A common challenge when doing migrations of existing sites is the need to keep some of the sites available as-is while the CMS implementation is being rolled out.
One solution is a "lift-and-shift" migration: you take a static copy of the site as it appears to users in the browser and copy it to another location. The benefit of this approach is that the process can be completed quickly; you are limited only by how much content there is to download and how quickly you can download it.
There are some limitations to a lift-and-shift migration like this:
- no content management: the only way to change the content on the migrated sites will be to upload a replacement page;
- no server-side code: the process described here can only see whatever is rendered in the browser. Any features that require server-side code like authentication, account management, submitting forms or other user generated content, will be missing.
Once you have the migration completed, you can then work on adding content management to the sites, converting to a new design or implementation as suits your needs.
Can you do "lift-and-shift" migrations on Crownpeak? Yes, and here is how you do it.
Conceptually what you need to do is:
- Create an archive copy of your sites as they exist right now.
- Upload the archive to DXM to create digital assets†.
- Publish the content.
† Digital assets are CMS assets that are not associated with a template. They cannot be edited directly in the CMS but can be replaced by re-uploading.
Process Description
Preparation
Before embarking on this process, there are a couple of tools you will need to have available:
- Wget - a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
- Visual Studio with the Crownpeak Visual Studio extension installed. At this time, it is only the Windows-based Visual Studio tool that is supported, not Visual Studio Code or Visual Studio for Mac.
You will also need to create a model in the CMS so that the uploaded digital assets have the correct workflow on upload. Copy the /System/Models/Basis/Asset model to your project, then set the workflow property on the Asset blueprint (the file inside the model folder you copied).
Creating an archive
The objective here is to download a copy of the sites exactly as they appear to users in the browser. This includes all resources needed to display the site, including images, CSS, JavaScript, downloads and so forth.
There are plenty of tools out there to help with this process. We used wget, a version of which is available on most, if not all, platforms. Make sure to use the latest version for your platform, as later versions can detect page requisites in more HTML elements than older versions.
These are the command-line arguments we used:
> wget --no-clobber --continue --mirror --page-requisites --no-parent --no-verbose '{locale-homepage-url}'
Argument | Description |
--no-clobber | Prevents wget from deleting or overwriting a file it has already retrieved. This is especially helpful if you need to restart the download for any reason. |
--continue | Instructs wget to resume a partially downloaded file instead of fetching it again from the beginning. This allows the download process to pick up where it left off if it was stopped. |
--mirror | Provides wget with a number of parameters suitable for creating an offline archive of a site. These parameters include setting up recursive fetches and timestamping. |
--page-requisites | Instructs wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets. |
--no-parent | Prevents wget from ascending to the parent directory when retrieving recursively. |
--no-verbose | (Optional) Reduces the amount of messaging from wget. |
{locale-homepage-url} | The starting point for the site indexing. I recommend running the command multiple times, providing a different locale homepage each time, rather than attempting to download the entire site in one go. This allows you to monitor the progress and fine-tune the download as needed. |
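The recommendation to run wget once per locale homepage can be scripted. What follows is a minimal sketch rather than the exact script we ran: the names `mirror_all`, `WGET` and `WGET_FLAGS` are hypothetical, and the URLs in the usage note assume an https site with the locale folders shown later in this article. One assumption to call out: `--no-clobber` is left out of the default flags, because the wget releases I am aware of refuse to combine it with the timestamping that `--mirror` turns on. Add it back through `WGET_FLAGS` if your version accepts it.

```shell
#!/bin/sh
# Sketch: mirror each locale homepage in its own wget run.
# WGET can be overridden (for example WGET=echo) to preview the commands.
WGET=${WGET:-wget}

# Default flags: the article's set without --no-clobber (an assumption, see above).
if [ -z "${WGET_FLAGS:-}" ]; then
    WGET_FLAGS="--continue --mirror --page-requisites --no-parent --no-verbose"
fi

# mirror_all URL... : run wget once per URL and keep going if one run fails.
mirror_all() {
    failed=0
    for url in "$@"; do
        echo "Mirroring $url"
        # $WGET_FLAGS is intentionally unquoted so it splits into separate flags.
        if ! "$WGET" $WGET_FLAGS "$url"; then
            echo "wget reported a problem for $url" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

After sourcing the file, a run might look like `mirror_all 'https://www.acme.com/global/de-de/' 'https://www.acme.com/global/it-it/'`. Setting `WGET=echo` first prints the commands instead of running them. wget exits with a non-zero status when any single resource fails, so the function reports the problem and moves on to the next locale rather than stopping.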
NOTE: You may find it easier to use a wget config or startup file so that these configuration parameters do not need to be provided on the command line.
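As an illustration of the startup-file approach, the sketch below writes a wgetrc-style file containing settings that correspond to the flags in the table. The file name `lift-and-shift.wgetrc` and the `WGETRC_FILE` variable are hypothetical. The `no_clobber` line is commented out on the assumption that your wget, like the releases I am aware of, will not accept it together with `mirror`.

```shell
#!/bin/sh
# Sketch: write a wgetrc-style file with the options used in this article.
# WGETRC_FILE is a hypothetical name; change it to suit your project.
WGETRC_FILE=${WGETRC_FILE:-./lift-and-shift.wgetrc}

cat > "$WGETRC_FILE" <<'EOF'
# Settings corresponding to the command-line flags in the table.
continue = on
mirror = on
page_requisites = on
no_parent = on
# verbose = off corresponds to --no-verbose
verbose = off
# Uncomment only if your wget accepts it together with mirror.
# no_clobber = on
EOF

echo "Wrote $WGETRC_FILE"
```

With the file in place, the download command shrinks to something like `wget --config=./lift-and-shift.wgetrc '{locale-homepage-url}'`. Setting the `WGETRC` environment variable to the file's path should have the same effect.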
The result of running this command is a local archive of the site with all pages and resources accessible from the initial entry point.
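Because features that rely on server-side code, such as form submission, will not work in the static copy, it may be worth listing the downloaded files that contain a form before you upload anything. The helper below is a rough sketch. The name `list_form_pages` is hypothetical, and the match is a plain text search for `<form`, so treat the output as a list of pages to review rather than a definitive audit.

```shell
#!/bin/sh
# Sketch: list files in a mirrored archive that contain an HTML <form> tag.
# Usage: list_form_pages DIRECTORY
list_form_pages() {
    # grep -l prints only the names of matching files; -i ignores case.
    find "$1" -type f -exec grep -li '<form' {} + | sort
}
```

For the example site used later in this article, that would be `list_form_pages www.acme.com`. All files are searched, not just those ending in `.html`, since wget keeps the names from the URLs and pages may have been saved without an extension.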
Uploading
The biggest challenge with recreating a site in the CMS is getting the folder structure created. Here is an example structure from a recent migration:
www.acme.com/
├── content
│   └── dam
│       └── corporate
│           ├── GenericContent
│           │   ├── generic-backgrounds
│           │   └── productpage-components
│           ├── Promo blocks
│           └── road
│               └── test
├── etc
│   └── designs
│       ├── gdpr
│       │   └── clientlib-site
│       │       └── assets
│       │           └── img
│       │               └── template
│       └── corporate
│           └── clientlib-site
│               ├── img
│               └── refs
└── global
    ├── de-de
    │   ├── ...
    ├── it-it
    │   ├── ...
    ├── ja-jp
    │   ├── ...
    └── ...
The browser-based interface to the CMS does not provide any facility to upload a folder (and therefore any sub-folders). However, the Crownpeak Visual Studio extension does have a facility where you can upload a ZIP file and the folder structure in the ZIP file will be recreated in the CMS.
The next step is therefore to create a ZIP file of the site(s) you downloaded — excluding the top-level domain folder.
> cd www.acme.com
> zip ../acme.com.zip -r *
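Before uploading, it can help to confirm that the top-level domain folder really was left out of the ZIP file. The sketch below assumes you have a listing of the archive's entry names, one per line, for example from Info-ZIP's `zipinfo -1`. The function name `top_level_entries` is hypothetical.

```shell
#!/bin/sh
# Sketch: read archive entry paths on stdin (one per line) and print
# the distinct top-level names, so a stray domain folder is easy to spot.
top_level_entries() {
    # Keep only the first path component, then de-duplicate.
    cut -d/ -f1 | sort -u
}
```

A check for the example above might be `zipinfo -1 ../acme.com.zip | top_level_entries`. If the archive was built from inside the domain folder, the output should show names such as content, etc and global, and not www.acme.com.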
Now that you have the ZIP file, you can upload it with the Visual Studio extension. Use the "Upload" button on the Crownpeak toolbar in Visual Studio to open the upload dialog box.
Make sure that you select the model you prepared earlier as this will ensure that these assets are uploaded and associated with the correct workflow for your project.
Publishing
Once all of the assets are uploaded to the appropriate place in your CMS instance and you have checked that they are in the correct workflow, you can publish them to whatever hosting you have configured through the CMS.