Headless Delivery and Non-web Content

MarcusEdwards · ‎08-05-2019

This is part of a series of articles looking at “headless” content delivery. Before we look at the use cases where headless is considered a very good fit, I want to look at the two key pieces in most, if not all, headless content delivery:

non-web content; and
pulling content

Non-web Content

Non-web content typically means anything that isn’t HTML. The most common formats for this are:

JSON as this can be readily integrated into browser-based apps and frameworks like Angular, Vue and React; and
XML.

Either way, the “content” is structured data composed of fields and values. This is a natural fit for DXM, as this is exactly what a templated asset is: a list of field names and values. However, both JSON and XML lend themselves naturally to nested or hierarchical structures where fields are not simple values but structured content. Here is an example structure for presenting data on a book:

book.json

{
  "type": "book",
  "title": "Web Content Management",
  "subtitle": "Systems, features and best practices",
  "poster": "https://flyingsquirrelbook.com/images/2x/cover.png",
  "published_date": "2019",
  "metadata": 
  {
    "authors": [
    {
      "name": "Deane Barker",
      "profile_picture": "https://flyingsquirrelbook.com/images/2x/deane-barker-author.png"
    }],
    "publisher": "O’Reilly"
  }
}

This is a succinct, readable, predictable data structure. However, it doesn’t map well onto a field-value list because of the deep structure (look at metadata, for example). To represent this in a pure field-value list, the nested structures would need to be flattened into fields in some way: "metadata" would need to become "metadata_publisher", "metadata_authors_name:1" and "metadata_authors_profile_picture:1", for example.
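To make the flattening idea concrete, here is a minimal sketch in Python (not DXM template code). The function name `flatten` and the exact naming rule (underscores between levels, a `:N` suffix for list positions) are my assumptions, chosen to reproduce the field names mentioned above; DXM itself does not prescribe this scheme.

```python
def flatten(value, prefix="", suffix=""):
    """Flatten nested dicts/lists into a single field-value dict.

    Nested keys are joined with "_" and list positions (1-based) are
    appended as ":N" (an illustrative naming convention, not a DXM rule).
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}_{key}" if prefix else key
            flat.update(flatten(child, name, suffix))
    elif isinstance(value, list):
        for position, child in enumerate(value, start=1):
            flat.update(flatten(child, prefix, f"{suffix}:{position}"))
    else:
        # A simple value: record it under the accumulated field name.
        flat[prefix + suffix] = value
    return flat


book = {
    "title": "Web Content Management",
    "metadata": {
        "authors": [{"name": "Deane Barker"}],
        "publisher": "O'Reilly",
    },
}

for field, field_value in flatten(book).items():
    print(field, "=", field_value)
```

The flattened names get unwieldy quickly, which is why a different strategy for deep structures is worth having.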

We'll look at a different strategy for representing deep structure in the JSON/XML in a later section. For now, let's look at how to get any kind of structured output.

Rendering JSON output

Crownpeak DXM can produce output in any format you need. Well, any textual format. A naive approach to creating the JSON (or XML) output would be using literal markup in the output.aspx handler, something like this:

output.aspx

{
  "type": "<%= asset["content_type"] %>",
  "title": "<%= asset["title"] %>",
  "subtitle": "<%= asset["subtitle"] %>",
…
}

However, this approach has numerous problems. What if the title field value contains a double quote? What if the subtitle is blank? Pasting unescaped values into markup is exactly what developers have been warned against for years because of the risk of “injection” attacks.
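The double-quote problem is easy to reproduce outside DXM. The following Python sketch is only an illustration of the general issue, with hypothetical helper names (`render_naive`, `render_serialised`, `is_valid_json`); it compares pasting values between quotes with letting a serialiser do the escaping.

```python
import json


def render_naive(title: str, subtitle: str) -> str:
    # Mimics literal markup: values are pasted between quotes, unescaped.
    return '{ "title": "' + title + '", "subtitle": "' + subtitle + '" }'


def render_serialised(title: str, subtitle: str) -> str:
    # A serialiser escapes quotes, backslashes and control characters.
    return json.dumps({"title": title, "subtitle": subtitle})


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


tricky_title = 'The "Flying Squirrel" Book'

print(is_valid_json(render_naive(tricky_title, "")))       # False
print(is_valid_json(render_serialised(tricky_title, "")))  # True
```

The naive version breaks as soon as a value contains a quote, while the serialised version round-trips the original title intact.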

A better approach is to use a serialisation library that understands how to produce valid JSON. The .NET framework includes a facility to do this using Data Contracts (see the MSDN article on using data contracts). There are also third-party JSON serialisers like the very popular Newtonsoft Json.NET library. However, you cannot upload and use third-party DLLs in the DXM templating context.

You can use the built-in .NET serialisation though. To do so, you need to create .NET classes and annotate their properties to indicate field types, names and so forth. Here is an example for some of the book data we saw earlier:

Book.cs

using System.Runtime.Serialization;

[DataContract(Name="book")]
public class Book
{
   [DataMember(Name="type")]
   public string Type { get; private set; } = "book";

   [DataMember(Name="title")]
   public string Title { get; set; }

   [DataMember(Name="subtitle", IsRequired=false)]
   public string SubTitle { get; set; }
}

The DXM templating context limits which .NET namespaces you can use, for security reasons. We need to use the Util.SerializeDataContractJson method instead of the built-in .NET or third-party serialization classes and methods. The method returns a string, so we can just render that to the output stream and we’re done.

output.aspx

<%
  var demo = new Book()
  {
      Title = "Web Content Management",
      SubTitle = "Systems, Features and Best Practices",
      Metadata = new Metadata
      {
        Publisher = "O'Reilly"
      }
  };
  demo.Metadata.Authors.Add(new Author { Name="Deane Barker", ProfilePictureUrl="https://"});

  Out.WriteLine(Util.SerializeDataContractJson(demo));
%>

Rendering XML Output

If you find that XML would be a better fit for your application, you can use the same techniques I described for JSON output to render XML. The DataContract-annotated classes don’t need to change, but you will need to use the Util.SerializeDataContractXML method.

Pulling Content

Now that we have the structured data, we need to think about how we’re going to get access to it. Headless delivery relies on apps being able to pull content rather than having it pushed to them. This is typically done through some kind of API call and in most cases this is a REST (or REST-like) API over HTTPS.

Crownpeak’s content delivery API at the time of writing is based on using Search G2. Using Search G2 to hold and retrieve content like this is extremely powerful as it is:

fully adaptive: new content types and structures can be added as you need them;
expressive: you can query the content on any field or combination of fields you need; and
cached: query results are cached, so your application performs better when multiple clients request the same query (the latest 10 blog articles, for example).

Publishing to Search G2

Publishing your content to Search G2 is a great option to consider. Search G2 supports a rich query API using Apache Solr and allows you to retrieve the content in either JSON or XML format.
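As a rough sketch of what a client-side query might look like, the Python below builds a Solr select URL using the standard Solr parameters `q`, `wt`, `fl` and `rows`. The host name and the collection name `cpuk-training-stage` are taken from the example request in this article; the helper `build_select_url` and the field name `custom_s_type` in the query are illustrative assumptions, so substitute your own collection and fields.

```python
from urllib.parse import urlencode

# Host used by the example request in this article.
SEARCH_G2_BASE = "http://searchg2.crownpeak.net"


def build_select_url(collection, query, fields=("*",), rows=10, fmt="json"):
    """Build a Solr select URL for a Search G2 collection (hypothetical helper)."""
    params = {
        "q": query,               # Solr query, e.g. field:value
        "wt": fmt,                # response format: "json" or "xml"
        "rows": rows,             # maximum number of documents to return
        "fl": ",".join(fields),   # fields to include in each document
    }
    return f"{SEARCH_G2_BASE}/{collection}/select/?{urlencode(params)}"


url = build_select_url("cpuk-training-stage", "custom_s_type:book")
print(url)
```

The resulting URL can be fetched with any HTTP client; switching `fmt` to `"xml"` asks Solr for the same results in XML.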

Getting content into Search G2 is very different to the output + publishing mechanism described above — you need to write Search G2 handlers for your templates that effectively map the asset fields to search fields.

One constraint with using Search G2 that you will need to consider is that Search G2 is effectively a key-value store so supporting deep structures is complicated. You can store JSON embedded in a string field value. This is a convenient way to map deep structures but does have some caveats:

embedded JSON won’t be translated into different formats by Solr's native result format transform (wt parameter); and
embedded JSON will need to be extracted and parsed separately by your application.
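If you skip the field transformer, the second caveat means the client has to parse the embedded string itself. Here is a minimal Python sketch of that step, assuming the metadata is stored as a JSON string in a field named `custom_s_metadata` (the field name used in this article's example); the helper name `expand_embedded_json` is hypothetical.

```python
import json


def expand_embedded_json(doc: dict, field: str) -> dict:
    """Return a copy of a result document with one string field parsed as JSON."""
    expanded = dict(doc)
    raw = expanded.get(field)
    if isinstance(raw, str) and raw.strip():
        try:
            expanded[field] = json.loads(raw)
        except json.JSONDecodeError:
            pass  # Not valid JSON: leave the original string untouched.
    return expanded


# A result document as it might come back without a field transformer.
doc = {
    "id": "133770",
    "title": "Web Content Management",
    "custom_s_metadata": json.dumps(
        {"authors": [{"name": "Deane Barker"}], "publisher": "O'Reilly"}
    ),
}

book = expand_embedded_json(doc, "custom_s_metadata")
print(book["custom_s_metadata"]["publisher"])
```

Leaving invalid or empty values untouched keeps the client tolerant of documents that were indexed before the metadata field was populated.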

Here is an example of querying Search G2 for book data. Notice that for the nested JSON to be expanded in the result, you will need a field transformer:

GET http://searchg2.crownpeak.net/cpuk-training-stage/select/?q=*:*&wt=json&indent=true&fl=*,custom_s_me...

{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "133770",
        "url": "133770",
        "title": "Web Content Management",
        "custom_s_subtitle": "",
        "custom_i_published": 2013,
        "custom_s_poster": "/mle/_Assets/images/200x200.png",
        "custom_s_type": "book",
        "language": "en",
        "_version_": 1645647868320546816,
        "custom_s_metadata": {
          "authors": [
            {
              "name": "Deane Barker",
              "profile_picture": "\/mle\/_Assets\/images\/200x200.png"
            }
          ],
          "publisher": "O'Reilly"
        }
      }
    ]
  }
}

Publishing to a content delivery environment

If there is some reason not to use Search G2 (perhaps you have an existing JSON format that you need to provide content in, or you need deep structures), you should consider publishing the data to a content delivery environment.

This could be a web hosting pod that all Crownpeak customers have access to, but a much better option would be AWS S3. S3 is a key part of the growing trend in serverless computing because it offers scalability and resilience as part of the service, with replication across regions available as a configuration option.

Publishing to AWS S3 is as simple as setting up your publishing package in the CMS to target an S3 bucket. The only thing you may wish to pay particular attention to is asset naming: S3 doesn’t have “folders” per se, but it does support “prefixes” which look like a folder path. For example, “items/books/technical/web-content-management.json” is an S3 resource name, but collections can be queried through the S3 API using prefixes like “items/books” or “items/books/technical”.

Query operations are limited, but you can use the S3 API to list the contents of buckets and use the prefixing mechanism to group or categorise content. Once you know which assets you actually need, you can use the S3 API to retrieve the objects.
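A sketch of the list-then-retrieve pattern in Python follows. It assumes you pass in a boto3 S3 client (for example `boto3.client("s3")`) and uses its `list_objects_v2` paginator and `get_object` call; the helper names `list_keys` and `fetch_json` and the bucket name in the usage comment are hypothetical.

```python
import json


def list_keys(s3_client, bucket: str, prefix: str) -> list:
    """List every object key in the bucket that starts with the given prefix."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def fetch_json(s3_client, bucket: str, key: str):
    """Download one object and parse its body as JSON."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   s3 = boto3.client("s3")
#   for key in list_keys(s3, "my-content-bucket", "items/books/technical"):
#       print(key, fetch_json(s3, "my-content-bucket", key).get("title"))
```

Passing the client in as a parameter keeps the helpers easy to test and leaves credential and region handling to the caller.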

Conclusion

In this article we’ve looked at the two critical factors in headless content delivery: non-web or structured content, and a content delivery API. With these two elements it should be clear how headless is a good use case for non-web content and multi-channel delivery.

In the next article, I’ll be looking at content aggregation as another benefit touted for headless CMS.