Follow Story Needle | Content strategy for a post-device era on Feedspot



Much of our lives is structured around time.   When we think about time, we often think about events or schedules.  When does something happen?   Time provides structure for experience.  

In our lives online, content shapes our experience.  How then can time bring structure to content we consume online?

Content strategists generally think about structuring content in terms of sequences, or logical arrangement.   I’d like to introduce another way to think about structuring content: in terms of time.  But instead of focusing on timing — when something happens —  we can focus on how much time something requires.  The focus on the  dimension of duration has developed into a discipline known as timeboxing.

Timeboxing and Content

Timeboxing is a concept used in agile planning and delivery.  Despite its geeky name, all of us use timeboxes every day, even if we don’t use the term.  We schedule 15-minute conversations, or half-hour meetings, and then decide what’s the most important material to cover in that block of time, and what material needs to wait to be dealt with elsewhere. Despite our best wishes, circumstances commonly dictate our available time to discuss or consider an issue, rather than the reverse.

With roots in project management, timeboxing is most often applied to the production of content, rather than its consumption.  But timeboxing can be used for any kind of task or issue.

“Timeboxing allocates a fixed time period, called a timebox, within which planned activity takes place” 


The consumption of content can also be timeboxed.  Content that is consumed on a schedule is often timeboxed.  Some examples of timeboxing content experience include:

  • The BBC offers a “Prayer for the day” lasting two minutes
  • Many broadcasters offer timed 15 or 30 second updates on markets, weather and traffic
  • The PechaKucha format for presentations of 20 slides for 20 seconds each, for a total of 6 minutes, 40 seconds, to maximize the number of speakers and keep presentations interesting
  • Time-limited conversations in “speed networking” in order to maximize the number of conversations had

Producers (those who offer content) decide how much content to make available in order to synchronize when it can be consumed.  There’s a close association between fixing the duration of content, and scheduling it at a fixed time.  The timing of content availability to users can help to timebox it.

Limits without schedules

But timeboxing doesn’t require having a schedule.  In an agile methods context, timeboxing inverts the customary process of planning around a schedule.  Instead of deciding when to do something, and then figuring out how long can be allotted to a task, the timeboxing approach first considers how long a task requires, and then schedules it based on when it can be addressed.  How long something takes to express is in many cases more important than when something happens.

This is a radical idea.  For many of us, timeboxing (limiting the time allowed for content) is not a natural tendency.  Many of us don’t like restricting what we can say, or how long we can talk about something.

Timeboxing tends to happen when there’s a master schedule that must be followed.  But when access to content doesn’t depend on a schedule, timeboxing is ignored. Even audio content loses the discipline of having a fixed duration when it is delivered asynchronously.  A weekly podcast will often vary considerably in length, because it is not anchored to a schedule forcing it to fit in a slot that is followed by another slot.

Authors, concerned about guarding their independence, often resist imposing limits on how much they can say.  Their yardstick seems to be: the duration of the content should match whatever they as authors think is necessary to convey a message.  The author decides the appropriate duration —  not an editor, a schedule, or the audience.  

Without question, the web liberates content producers from the fixed timetable of broadcast media.  The delivery of digital content doesn’t have to follow a fixed schedule, so the duration of content doesn’t have to be fixed either.  The web transcends the past tyranny of physical limitations.  Content can be consumed anytime. Online content has none of the limits on space, or duration, that physical media imposed.  

Schedules can be thought of as contracts. The issue is not only whether or not the availability of content follows a schedule, but who decides that schedule.  

Online content doesn’t need to be available according to a set schedule, unlike broadcast content, where the broadcaster indicates that you can listen to it at 9 am each day.

A schedule may seem like tyranny, forcing authors to conform to artificial limitations about duration, and restricting audiences to content of a limited duration.  But schedules have hidden benefits.  

Setting expectations for audiences

When content follows a schedule, and imposes a duration, the audience knows what to expect.  They tacitly accept the duration they are willing to commit to the content.  They know ahead of time what that commitment will be, and have decided it is worth making.

The more important question is how consumers of content can benefit from timeboxing, not how producers of content can benefit.  Timeboxing content can measure the value of content in terms of the scarcest commodity audiences have: their time.

How should one measure the time it takes to consume content?  A simple standard would be listening duration: the amount of time it would take for the content to be read aloud to you in a normal speaking pace.  

We read faster than we talk.  We are used to hearing words more slowly than we would read them.  If we are a “typical” person, we read 250 words/minute.  We speak and listen at 150 words/minute.

Listening can’t be sped up, unlike reading. And having to have something repeated is annoying for a listener.  For content presented on a screen, there are generally no physical limits to how much content can be displayed.  Content creators rely on the audience’s ability to scan the content, and to backtrack, to locate and process information of interest to them.  Content read aloud doesn’t offer that freedom.  Listening duration provides a more honest measurement of how long content takes to consume.
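The listening-duration yardstick can be sketched in a few lines of code.  This is only a minimal illustration: it assumes the 150 and 250 words-per-minute figures cited above, counts words naively by splitting on whitespace, and uses function names that are purely hypothetical.

```python
# Rates taken from the figures quoted above; real audiences vary.
SPEAKING_WPM = 150  # typical speaking and listening pace
READING_WPM = 250   # typical silent reading pace


def word_count(text: str) -> int:
    """Count words naively by splitting on whitespace."""
    return len(text.split())


def listening_seconds(text: str, wpm: int = SPEAKING_WPM) -> float:
    """Estimate how long the text takes to hear read aloud, in seconds."""
    return word_count(text) * 60 / wpm


def reading_seconds(text: str, wpm: int = READING_WPM) -> float:
    """Estimate how long the text takes to read silently, in seconds."""
    return word_count(text) * 60 / wpm


sample = "word " * 300  # stand-in for a 300-word passage
print(listening_seconds(sample))  # 120.0
print(reading_seconds(sample))    # 72.0
```

Under these assumptions, the same 300-word passage takes two minutes to hear but little more than one minute to read, which is why the listening figure is the more conservative measure.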

The duration of content is important for three reasons.  It influences:

  1. The commitment audiences will make to trying out the content
  2. The attention they may be able to offer the content
  3. The interest they find the content offers

Audience commitments

Most of us have seen websites that provide a “time to read” indicator.  Videos and podcasts also indicate how many minutes the content lasts.  These signals help audiences choose content that matches their needs: do they have enough time now, or do they wait until later?  Is this content at the right level of detail, or is it too detailed?

A news article from the Christian Science Monitor gives readers a choice of a long and short version.

Content varies widely in length and the amount of time required to consume it.  One size does not fit all content.  Imagine if publishers made more of an effort to standardize the length of their content, so that specific content types were associated with a specific length of time to read or listen.

Timeboxing recognizes that tasks should fit time allotments.  Timeboxing can encourage content designers to consider the amount of time they expect audiences will give to the content.

Audiences have limited time to consume content.  That means they can’t commit to looking at or listening to content unless it fits their situation.

And when they consume content, they have limited attention to offer any specific message.  

Audience attention

Currently, some publishers provide authors with guidelines about how many words or characters to use for different content elements.  Most often, these guidelines are driven by the internal needs of the publisher, rather than being led by audience needs.  For example, error messages may be restricted to a certain size, because they need to fit within the boundaries of a certain screen.  The field for a title can only be of a certain length, as that’s what is specified in the database.  These limits do control some verbosity.  But they aren’t specifically designed around how long it would take audiences to read the message.  And limiting characters or words by itself doesn’t mean the content will receive attention from audiences.

Time-to-read is difficult to calculate.  Instead of counting words or characters, publishers try to guess the time those words and characters consume.  That is not a simple calculation, since it will partly depend on the familiarity of the content, and how easily it can be processed by audiences.  Tight prose may be harder to comprehend, even if it is shorter.

Since much text is accompanied by visuals, the number of words on a screen may not be a reliable indication of how long it takes to consume the content.  Apple notes: 

“Remember that people may perform your app’s actions from their HomePod, using ‘Hey Siri’ with their AirPods, or through CarPlay without looking at a screen. In these cases, the voice response should convey the same key information that the visual elements display to ensure that people can get what they need no matter how they interact with Siri.” 

Apple Human Interface Guidelines

The value of structuring content by length of time, rather than number of characters or words, is easiest to appreciate when it comes to voice interaction.  Voice user interfaces rely on a series of questions and answers, each of which needs to be short enough to maintain the attention of both the user and the bot processing the questions. Both people and bots have limited buffers to hold inbound information.  The voice bot may always be listening for a hot word that wakes it up — so that it really starts to pay attention to what’s being said.  Conversely, the user may be listening to their home speaker’s content in a distracted, half-hearted way, until they hear a specific word or voice that triggers their attention.

Matching the audience’s capacity to absorb information

Attention is closely related to retention. Long, unfamiliar content is hard to remember.  Many people know about a famous study done in the 1950s by a Professor Miller about the “magical number seven” relating to memory spans.  The study was pathbreaking because it focused on how well people can remember “contents”, and proposed creating chunks of content to help people remember.  It is likely the beginning of all discussion about chunks of content.  Discussing this study, Wikipedia notes: a memory “span is lower for long words than it is for short words. In general, memory span for verbal contents (digits, letters, words, etc.) strongly depends on the time it takes to speak the contents aloud.”  The famous Miller experiment introduced time (duration) as a factor in retention.  It is easier to recall shorter duration content than longer duration content.

We can extend this insight when considering how different units of content can influence audiences in other ways, beyond what they remember.  Duration influences what audiences understand, what they find useful, and what they find interesting.  

Exceeding the expected time is impolite. When audiences believe content takes “too long” to get through, they are annoyed, and will often stop paying attention. They may even abandon the content altogether.

The amount of attention people are willing to give to content will vary with the content type.  For example, few people want to read long entries in a dictionary, much less listen to a definition read aloud.

Some content creators use timeboxing as their core approach, as is evident in the titles of many articles, books and apps.  For example, we see books promising that we can “Master the Science of Machine Learning in Just 15 Minutes a Day.”  Even when such promises may seem unrealistic, they feel appealing.  As readers, we want to tell publishers how much time we are able and willing to offer.

The publisher should work around our time needs, and deliver the optimal package of material that can be understood in a given amount of time.   It doesn’t help us to know the book on machine learning is less than 100 pages, if we can’t be sure how difficult the material is to grasp.  The number of pages, words, and characters, is an imperfect guide to how much time is involved.

Audience interest

Another facet of structuring content by time is that it signals the level of complexity, which is an important factor in how interesting audiences will find the content.  If a book promises to explain machine learning in 15 minutes a day, that may sound more interesting to a reader without an engineering background than a book entitled “The Definitive Guide to Machine Learning” which sounds both complicated and long.

What is the ideal length of a content type, from an audience perspective?  How long would people want to listen to (or read attentively) different content types, if they had a choice?  For the purposes of this discussion, let’s assume the audience is only moderately motivated. They would like to stop as soon as their need for information is satisfied.

Time-delimited content types can offer two benefits to audiences:

  1. Pithiness
  2. Predictable regularity

Content types define what information to include, but they don’t necessarily indicate how much information to include.  The level of detail is left to individual authors, who may have different standards of completeness.  

When content becomes bloated, people stop paying attention.  There’s more information than they wanted.    

Making content more even

Another problem is when content is “lumpy”: some content relating to a certain purpose is long-winded, while other content is short.  A glossary has short definitions  for some  words but other definitions are several paragraphs.    We find this phenomenon in different places.  On the same website, people move between short web pages that say very little and long pages that scroll forever. 

Paradoxically, the process of structuring content into discrete independent units can have the effect of creating units of uneven duration.  The topic gets logically carved up.  But the content wasn’t planned for consistency in length.  Each element of content is independent, and acts differently, requiring more or less time to read or hear.  

 Audiences may give up if they encounter a long explanation when they were expecting a short one.  It only takes one or two explanations that are abnormally long for audiences to lose confidence in what to expect.  A predictable experience is broken.

Timeboxing content messages encourages content designers to squeeze as much impact as possible into the shortest possible time.

Message units, according to duration

If people have limited free time, have limited attention to offer, and have lukewarm interest, the content needs to be short — often shorter than one might like to create.

We can thus think about content duration in terms of “stretch goals” for different types of content.  Many people will be happy if the content offered can successfully communicate a message while sticking to these durations.

While no absolute guidelines can be given for how long different content should be, it is nonetheless useful to make some educated guesses, and see how reliable they are.  We can divide durations into chunks that increase by multiples of three, to give different duration levels.  We can then consider what kinds of information can reasonably be conveyed within such a duration.  

  • 3-5 seconds: Concisely answer “What is?” or “Who is?” with a brief definition or analogy (“Jaws in outer space”)  
  • 10-15 seconds: Provide a short answer or tip, make a suggestion, provide a short list of options.
  • 30 seconds:  Suggest a new idea, explain a concept, offer an “elevator pitch”
  • 1-3 minutes: To discuss several things, explain the context or progression of something — an episode or explainer 
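These duration levels can be turned into a rough check on a draft message.  The sketch below is an assumption-laden illustration rather than a recommended tool: it takes the upper bound of each range in the list above as a cutoff, estimates listening time at the 150 words-per-minute pace mentioned earlier, and uses hypothetical names throughout.

```python
from typing import Optional

SPEAKING_WPM = 150  # assumed listening pace, from the figure cited earlier

# (upper bound in seconds, kind of message), taken from the list above.
# Treating the top of each range as a hard cutoff is an assumption.
DURATION_TIERS = [
    (5, "definition or analogy"),
    (15, "short answer, tip, or list of options"),
    (30, "new idea, concept, or elevator pitch"),
    (180, "episode or explainer"),
]


def estimated_seconds(text: str, wpm: int = SPEAKING_WPM) -> float:
    """Estimate listening time from a naive whitespace word count."""
    return len(text.split()) * 60 / wpm


def duration_tier(text: str) -> Optional[str]:
    """Return the shortest tier the message fits in, or None if it overruns."""
    seconds = estimated_seconds(text)
    for limit, label in DURATION_TIERS:
        if seconds <= limit:
            return label
    return None


print(duration_tier("Jaws in outer space"))  # definition or analogy
print(duration_tier("word " * 50))           # new idea, concept, or elevator pitch
print(duration_tier("word " * 500))          # None
```

At this assumed pace, a 5-second slot holds roughly 12 words and a 30-second slot about 75, which gives writers used to word counts a way to translate between the two measures.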

For writers accustomed to thinking about the number of words, thinking how long it would take to listen to a message involves a shift.  Yet messages must match an expected pattern.  Their duration is a function of how much new material is being introduced.  

Creating messages based on their duration helps to counterbalance a desire for completeness.  Messages don’t have to be complete.  They do have to be heard, and understood.  There’s always the opportunity for follow up.  And while it is less common, a message that is too short is also disappointing.

Testing the duration

For short chunks of content addressing routine purposes, content designers should craft their messages to be appealing to the distracted content consumer. They can ask:

  • How easy is the message to follow when read aloud?
  • Does the message stand on its own, without exhausting the audience?
  • Can people ask a question, and get back an answer that matches how much time they are willing and able to offer?

I expect that the growing body of research relating to voice user interaction (VUI) will add to our understanding of the role of duration in content structure.  Feel free to reach out to me on social or community channels if you’d like to share experience or research relating to this theme.   It’s an evolving area that deserves more discussion.

— Michael Andrews

The post Structuring Content through Timeboxing appeared first on Story Needle.


Content doesn’t organize itself.  That’s why we have content models.  

A lot of advice about creating content models misses an important dimension: how the user fits in. Many content models are good at describing content.  But not many are very user centric.  I want to suggest some simple steps to help make content models more centered on user needs.

Two popular ways of thinking about content models are (1) that the content model is like a database for content (the technical approach), or (2) that the content model is a structural representation of a massive document (the structured authoring approach).  When combined, these approaches transform a content model into a picture of documents-as-a-database.   

Content models generally focus on showing what information is relevant to various topics.  Some models can be very sophisticated at representing the publisher’s perspective, and all the details it might want to manage.  But even in sophisticated models, the needs and motivations of audiences are hard to see.

Content models show numerous fields and values.  Each topic could become a screen that could be configured in various ways.  One CMS vendor says of content modeling: “it’s very similar to database modeling.” 

But actually, designing content to support user goals is very different from designing a database to store records.  Databases are a bad analogy for how to model content.

Audiences don’t want to read a database. Even if they are interested in the topic.  A database is fine for scanning for short bits of information to get quick answers. It’s less good for integrating different fragments of information together into a meaningful whole. People need support bridging different fragments of information. 

A content model should aim to do more than show a picture of how topics can be broken into chunks.  

Neither are content models about navigation paths, as if they were a site map.   True, different chunks, when linked together, can allow users to click between them.  It’s nice when users can jump between topics.  But it’s not clear why users are looking at this content to begin with. Many models may look like a collection of linked Wikipedia articles about baseball teams, baseball players, and pennant races.  It’s a model of what we could call brochure-ware.  It’s a database of different articles that reference one another.  The connections between chunks are just hyperlinks. There’s no obvious task associated with the content.

What Users Need to Know

Most explanations of content models advise publishers to model stuff that people might want to know about. I call this the stuff-to-know-about perspective. It’s a good starting point.  But it should not be the end point of the content model, as it often is.

When we look at stuff people might want to know about we start with topics. We identify topics of interest and then look at how these topics are connected to each other.

Suppose you and I are going to take a trip to a place we have never visited. Let’s imagine we are going to Yerevan in Armenia. We’d want to consult a website that presents content about the city. What might the content model look like?  As a thought experiment, we are going to simultaneously think about this situation both as content modeler and as a prospective tourist.  We’ll see if we can blend both these perspectives together.  (This technique is known as wearing two hats: switching roles, just like we do all the time in real life.)   

As content modelers, we will start with stuff we as tourists will need to know about.  We’ve never been to Yerevan and so we need to know some very basic information.

If we are going to travel there, we will want to:

  • Find a place to stay, probably a hotel 
  • Take transport within the city
  • Find restaurants to eat at
  • Visit tourist sites
  • Check out local entertainment

These user needs provide the basis for the content model. We can see five different topics that need to be covered. There need to be profiles of:

  • Hotels
  • Transport options 
  • Tourist sites
  • Restaurants
  • Entertainment venues  

Each profile will break out specific aspects of the topic that are of most interest to readers.  Someone will need to figure out if each hotel profile will mention whether or not a pillow menu is available.  But for the moment, we will assume each profile for each topic covers important information users are looking for, such as opening times.

We have some topics to make into content types. But the relationship between them isn’t yet clear.

As modelers, we have identified a bunch of stuff that tourists want to know about.  But it’s not obvious how these topics are connected to one another.  It’s like we have several piles of tourism brochures: a pile on hotels, a pile on tourist sites, and so on, each stacked side by side, but separate from each other.  If you’ve ever walked into the tourist information center in a city you are visiting, and walked out with a pile of brochures, you know that this experience is not completely ideal.  There’s loads of material to sort through, and decisions to coordinate.  

Modeling to Help Users Make Decisions

If we only adopt a topic perspective, we don’t always see how topics relate to one another from the users’ perspective. It’s important for the model not only to represent stuff people need to know about. We also need the model to account for how audiences will use the content. To do this we need to dig a little deeper. 

As modelers, we need to look at the choices that users will be making when consulting the content. What decisions will users make? On what basis will users make these decisions?  We need to account for our decision criteria in the content model.

As a prospective tourist, I’ve decided that three factors influence my choices. I want to do things that are the best value, the best experience, and the most convenient.  This translates into three criteria: price, ratings, and location. 

It turns out that these factors are dimensions of almost all of the topics.  As a result, information about these dimensions can connect the topics together.

I want to go to places that are convenient to where I am staying or spending time.  All the different venues have a location.  Different venues are related to one another through their location. But we don’t have any content that talks about locations in general. This suggests a new content type: one on neighborhoods.  This content type can help to integrate content about different topics, revealing what they have in common. People both want to know what’s nearby and get a sense of what a neighborhood feels like based on what’s there.

The user’s decision criteria help to identify additional content types, and to form connections

Many venues also have ratings and prices. This information also presents an opportunity to connect different types of content. We can create a new content type profile for the “best of” highlights in the city.   It can show the top rated restaurants according to price category. And it can show the top rated attractions. This could be a list that links to the more detailed profiles.   We now have a way to decide how to prioritize things to do.  This content type helps users compare information about different items.
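To make this concrete, here is a minimal sketch of how the three decision criteria might appear as shared fields across content types, and how a neighborhood grouping and a “best of” list could be derived from them.  Everything in it is an assumption for illustration: the field names, the price categories, and the placeholder venues are invented, not taken from any real site or CMS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Venue:
    """One profile; the last three fields are the shared decision criteria."""
    name: str
    kind: str            # e.g. "restaurant", "hotel", "tourist site"
    neighborhood: str    # convenience criterion (location)
    price_category: str  # value criterion, e.g. "budget", "mid", "premium"
    rating: float        # experience criterion


def by_neighborhood(venues):
    """Group venues of every kind by neighborhood."""
    groups = defaultdict(list)
    for venue in venues:
        groups[venue.neighborhood].append(venue)
    return dict(groups)


def best_of(venues, kind, price_category, top_n=3):
    """Return the top-rated venues of one kind within one price category."""
    matches = [
        v for v in venues
        if v.kind == kind and v.price_category == price_category
    ]
    return sorted(matches, key=lambda v: v.rating, reverse=True)[:top_n]


# Placeholder data, invented for illustration only.
VENUES = [
    Venue("Restaurant A", "restaurant", "North", "budget", 4.6),
    Venue("Restaurant B", "restaurant", "South", "budget", 4.1),
    Venue("Restaurant C", "restaurant", "North", "premium", 4.8),
    Venue("Hotel A", "hotel", "North", "mid", 4.3),
    Venue("Museum A", "tourist site", "South", "budget", 4.7),
]

print([v.name for v in by_neighborhood(VENUES)["North"]])
# ['Restaurant A', 'Restaurant C', 'Hotel A']
print([v.name for v in best_of(VENUES, "restaurant", "budget")])
# ['Restaurant A', 'Restaurant B']
```

The point of the sketch is that neither the neighborhood view nor the “best of” view needs new facts about the venues.  Both fall out of criteria the profiles already share, which is what lets them connect otherwise separate topics.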

Modeling to Help Users Act

As tourists, we now know what we want to do. But are we able to do it? 

Remember, we’ll be in Armenia. We don’t know if the familiar apps on our phones will work there. Neither of us speaks Armenian, so making phone calls seems intimidating. We need a way to make sure we can actually do the things we’ve decided we want to do.

For the content to really support our visit, we want the content to give us peace of mind about the risks and disappointments we worry about. We don’t want to waste our time unnecessarily or, worse, find that we can’t do things that we had planned on doing.  We want to avoid a long queue at the museum. We want to make sure that we can get a table at a well-known restaurant.  We want to go to a show, without having to visit the venue beforehand to buy the ticket.

When we consider the actions users want to take after consulting the content, we can find additional points of integration.

These needs suggest additional features that can be added to the content model. We want the ability to buy tickets after we decide to visit a museum or club. We want to be able to make reservations for a restaurant.  We want a booking widget.  A tourist website can create a widget that connects to outside services that enable these actions.  In some cases, the website can pull content from other sources to give readers the ability to see whether or not an option is available at a particular point in time.

Helping people act sometimes entails thinking about content beyond the content you’ve created yourself.   It can involve integrating with partners.

The Three Steps of Content Modeling

 This post is necessarily a very high-level and incomplete overview of content modeling.  There are many more possibilities that could be added, such as including a calendar of events and special offers.  But my goal here has been to provide some simple guidance about how to model content. 

The three steps to creating a user centric content model are:

  1. Identify the topics that users need to know about, and what specifically about those topics matters to users
  2. Identify the criteria that users have when making decisions while consulting this content
  3. Identify what actions users want to take after consulting the content, and what additional information or features can be added to help them

This process can surface connections between different chunks of information, and help to ensure that the content model supports the customer’s journey.

— Michael Andrews

The post User Centric Content Models appeared first on Story Needle.


Many useful approaches are available to support the structuring of content.  It’s important to understand their differences, and how they complement each other.  I want to consider content structuring in terms of a spectrum of approaches that address different priorities. 

Only a few years ago most discussion about structuring content focused on the desirability of doing it.  Now we are seeing more written about how to do it, spawning discussion around topics such as content models, design patterns, templates, message hierarchies, vocabulary lists, and other approaches.  All these approaches contribute to structuring content.  Structuring content involves a combination of human decisions, design methods, and automated systems.

Lately I’ve been thinking about how to unlock the editorial benefits of content models, which are generally considered a technical topic. I realized that discussing this angle could be challenging because I would need to separate general ideas about structuring content from concepts that are specific to content models.  While content models provide content with structure, so do other activities and artifacts.  If the goal is to structure content, what’s unique about content models?   We need to unpack the concept of structuring content to clarify what it means in practice.

Yet structuring content is not the true end goal.  Structuring content is simply a means to an end.  It’s the benefits of structuring content that are the real goal.  The expected benefits include improved consistency, flexibility, scalability, and efficiency. Ultimately, the goal should be to deliver more unique and distinctive content: tailored and targeted to user needs, and reflecting the brand’s strategic priorities.

In reality, content structuring is not a single thing.  It’s an umbrella term that covers a range of things that promote benefits, such as content consistency and so on.  Many approaches contribute.  But no single approach can claim “mission accomplished.”

What are we talking about, exactly?

People with different roles and responsibilities talk about structuring content.  It sometimes seems like they are talking about different surface features of a giant pachyderm.

The Guardian newspaper earlier this month published an article on how they use “a content model to create structured content”.  

“Structured content works well for a media company like The Guardian, but the same approach can work for any organization that publishes large amounts of data on multiple platforms. Whether you’re a government department, art gallery, retailer or university.”

The Guardian

I applaud these sentiments, and endorse them enthusiastically.  Helpfully, the article provided tangible examples of how structuring content can make publishing content easier.  But the article unintentionally highlighted how terminology on this topic can be used in different ways.  The article mentions content reuse as a benefit of content structuring.  But the examples related more to republishing finished articles with slight modification, rather than reusing discrete components of content to build new content. When the writer, a solutions architect, refers to a content type, he identifies video as an example.  Most content strategists would consider video a content format, not a content type.  Similarly, when the article illustrates the Guardian’s content model, it looks very limited in its focus (a generic article) — much more like a content type than a full content model.  

Mike Atherton commented on Twitter that the article, like many discussions of content structuring, didn’t address distinctions between “presentation structure vs semantic structure, how the two are compatible or, indeed, different, and whether they can or should be captured in the same model.”

Mike raises a fair point: we often talk about different aspects of structure, without being explicit about what aspect is being addressed.

I think about structure as a spectrum. As yet there’s no Good Housekeeping Seal Of Approval on the one right way to structure content.  Even people who are united in enthusiasm for content structure can diverge in how they discuss it — as the Guardian article shows.  I know other people use different terminology, define the same terminology in different ways, and follow slightly different processes.  That doesn’t imply others are wrong.  It merely suggests that practices are still far from settled.  How an organization uses content structuring will partly depend on the kind of content they publish, and their specific goals.  The Guardian’s approach makes sense for their needs, but may not serve the needs of other publishers.  

For me, it helps to keep the focus on the value that each distinct kind of decision offers. For those who write simple articles, or write copy for small apps that don’t need to be coordinated with other content, some of these distinctions won’t be as important. Structure becomes increasingly important for enterprises trying to coordinate different web-related tasks. The essence of structure is repeatability.

The Spectrum of content structuring

The structuring of content needs to support different decisions.   

Structure brings greater precision to content. It can influence five dimensions:

  1. How content is presented
  2. What content is presented
  3. Where content is presented
  4. What content is required
  5. What content is available

Some of these issues involve audience-facing aspects, and others involve aspects handled by backend systems.  

Different aspects of content structure

Content doesn’t acquire its structure at one position along this spectrum.  The structuring of content happens in many places.  Each decision on the spectrum has a specific activity or artifact associated with it. The issues addressed by each decision can be inter-related.  But they shouldn’t become entangled, where it is difficult to understand how each influences another.  

UI Design or Interaction Design

UI design is not just visual styling. Interaction design shapes the experience by structuring micro-tasks and the staging of information. From a content perspective, it’s not so much about surface behaviors such as animated transitions, but how to break up the presentation of content into meaningful moments. For example, progressive disclosure, which can be done using CSS, both paces the delivery of content and directs attention to specific elements within the content. Increasingly, UX writers are designing content within the context of a UI design or prototype. They need to understand the cross-dependencies between the behavior of content and how it is understood and perceived.

The design of behavior involves the creation of structure.  Content needs to behave predictably to be understandable.  UI design leverages structure by utilizing design patterns and design libraries.    

Content Design

Content design encompasses the creation and arrangement of different long and short messages into meaningful experiences. It defines what is said.  

Content design is not just about styling words.  It involves all textual and visual elements that influence the understanding and perception of messages, including the interaction between different messages over time and in different scenarios.  Words are central to content design; some professionals involved with content design refer to themselves as UX writers. Terminology is finely tuned and controlled to be consistent, clear, and on-brand. 

Writers commonly break content into blocks of text. They may use a simple tool like Dropbox Paper to provide a “distraction free” view of different text elements that’s unencumbered by the visual design. It may look a bit like a template (and is sometimes referred to as one), but its purpose is to help writers to plan their text, rather than to define how the text is managed. The design of content relies heavily on the application of implicit structure. Audiences understand better when they are comfortable knowing what they can expect. The design may utilize a message hierarchy (identifying major and minor messages), or voice and tone guidelines that depend on the scenario in which the writing appears. For the most part these implicit structures are managed offline through guidelines for writers, rather than through explicit formal online systems. But some writers are looking to operationalize these guidelines into more formal design systems that are easier and more reliable to use.

Content design involves delivering a mix of the fresh and the familiar. The content that’s fresh, that talks about novel issues or delivers unique or distinctive messages, is unstructured: it doesn’t rely on pre-existing elements. Messages that are familiar (recycled in some way) have the possibility of becoming structured elements. Content design thus involves both the creation of elements that will be reused (such as feedback messaging), and ad hoc content that will be specific to a given screen. But even ad hoc elements present the opportunity to reuse certain phrases and terminology so that they are consistent with the content’s tone of voice guidelines. Some publishers are even managing strings of phrases to reuse across different content.

Page Templates

Templates provide organizational structure for the content: for example, prioritizing the order of content, and creating a hierarchy between primary and secondary content. The template defines the elements that will be consistent for any content using the template, in contrast to the interaction design, which defines the elements that will be fluid and will change and respond to users as they consume the content.

Templates provide slots to fill with content. Page templates specify HTML structure, in contrast to the drafting templates writers use to design specific content elements. Page templates express organizational structure, such as where an image should be placed, or where a heading is needed. The template doesn’t indicate what each heading says, which will vary according to the specifics of the content. Templates can sometimes incorporate fixed text elements, such as a copyright notice in the footer of the page, if they are specific to that page and are unlikely to change. The critical role that templates play is that they define what’s fixed about a page that the audience will see. Templates provide the framework for the layout of the content, allowing other aspects of the content to adjust.
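To make the idea of slots concrete, here is a minimal sketch in Python. The markup, the slot names and the footer text are all hypothetical, chosen only for illustration. The HTML structure and the footer are fixed by the template, while the heading, image and body are slots that each content item fills differently.

```python
from string import Template

# Hypothetical page template: the HTML structure and the footer text are fixed,
# while $heading, $image_src, $image_alt and $body are slots to be filled.
PAGE_TEMPLATE = Template(
    "<article>\n"
    "  <h1>$heading</h1>\n"
    '  <img src="$image_src" alt="$image_alt">\n'
    "  <div>$body</div>\n"
    "  <footer>Copyright Example Publisher</footer>\n"
    "</article>"
)


def fill(heading: str, image_src: str, image_alt: str, body: str) -> str:
    """Fill every slot in the template; the fixed parts are left untouched."""
    return PAGE_TEMPLATE.substitute(
        heading=heading, image_src=image_src, image_alt=image_alt, body=body
    )


print(fill("Spring events", "events.png", "People at an event", "<p>Details to follow.</p>"))
```

The sketch does not escape HTML in the slot values, which a real publishing system would need to handle.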

Layout has a subtle effect on how content is delivered and is accessed across different screens.  Elements that are obvious on some screen sizes may not be so on other screen sizes — for example, a list of related articles, or a cross-promotion.  Page templates must address how to make core information consistently available.  

Content Types

Content types indicate what kinds of information and messages audiences need to see to satisfy their goals. The more specific the audience goal, the more specific the content type is likely to be. For example, many websites have an “article” content type that has only a few basic attributes, such as title, author and body. Such types aren’t associated with any specific goal. But a product profile on an e-commerce website will be much more specific, since different elements are important to satisfying user needs for them to decide to buy the product. The more specific a content type, the more similar each screen of content based on it will seem, even though the specific messages and information will vary. Content types provide consistency in the kinds of information presented for a given scenario.

A content type is designed for a specific audience that has a specific goal. It specifies: to support this purpose, this information must be presented. It answers: what elements of content need to be delivered here for this scenario? One of the benefits of a content type is that it can provide options to show more details, fewer details, or different details, according to the audience and scenario.

Content types also encode business rules about the display of content. In doing so, they provide the logical structure of content.   If the content model already has defined the specifics of required information, it can pre-populate the information — enabling the reuse of content elements.  
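One way to picture a content type is as a small specification of required and optional elements, plus a rule for choosing how much detail to show. The sketch below is illustrative only: the class, the element names and the sample product are assumptions of mine, not taken from any particular CMS.

```python
from dataclasses import dataclass, field


@dataclass
class ContentType:
    """A content type: the elements a screen must present for one scenario."""
    name: str
    required: list[str]                                   # elements every item must supply
    optional: list[str] = field(default_factory=list)     # extra detail, shown on request

    def missing(self, item: dict[str, str]) -> list[str]:
        """Return the required elements that the item has not supplied."""
        return [e for e in self.required if not item.get(e)]

    def select(self, item: dict[str, str], detailed: bool = False) -> dict[str, str]:
        """Pick the elements to display: required ones, plus optional ones if asked."""
        wanted = self.required + (self.optional if detailed else [])
        return {e: item[e] for e in wanted if e in item}


# Hypothetical types, loosely following the examples in the text.
article = ContentType("article", required=["title", "author", "body"])
product_profile = ContentType(
    "product_profile",
    required=["name", "price", "description"],
    optional=["dimensions", "warranty"],
)

item = {
    "name": "Desk lamp",
    "price": "29.00",
    "description": "A small lamp.",
    "warranty": "2 years",
}
print(product_profile.missing(item))                 # required elements not yet supplied
print(product_profile.select(item, detailed=True))   # fuller view for a detailed scenario
```

The generic article type demands little, while the product profile is more specific about what must be present, which mirrors the difference between a goal-neutral type and a goal-specific one.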

Content Models

Content models indicate the elements of content that are available to support different audiences and scenarios.  They specify the specific kinds of messages the publisher has planned to use across different content.  They specify the semantic structure of the content — or put more simply, how different content elements are related to each other in their meaning.

Content is built from various kinds of messages associated with different topics and having different roles, such as extended descriptions, instructions, calls-to-action, value propositions, admonitions, and illustrations. The content model provides an overview of the different kinds of essential messages that are available to build different versions and variations of content.

In some respects, a content model is analogous to a site map. A site map provides external audiences and systems a picture of the content published on a website. A content model provides a map of the internal content resources that are available for publication. But instead of representing a tree of web pages like a site map, the content model presents a constellation of “nodes” that indicate available information resources. A node is a basic unit of content that is part of and connected to the larger structure of content. Nodes correspond to content elements within published content: the units of content described within a pair of HTML tags.

Each node in a content model represents a distinct unit of content covering a discrete message or statement of information. Nodes are connected to other nodes elsewhere.  A node may be empty (authors can supply any message provided it relates to the expected meaning), or a node may be pre-populated with one or more values (indicating that the meaning will have a certain predefined message).  

A content model connects nodes by identifying the relationships between them: how one element relates to another. It can show how different nodes are associated, such as what role one node has to another. For example, one node could be part of another node because it is a detail relating to a larger topic. The relationships provide pathways between different nodes of content.
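The description of nodes and relationships can be sketched as a very small graph. Everything here is an assumption made for illustration: the class names, the "part_of" role and the sample messages are not drawn from any specific content modeling tool. An empty node leaves the message to the author, while a pre-populated node carries a predefined message.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A basic unit of content covering one discrete message."""
    name: str
    value: Optional[str] = None   # None means an empty node: authors supply the message


@dataclass
class ContentModel:
    nodes: dict = field(default_factory=dict)
    # Each relationship is a (source, role, target) triple.
    relationships: list = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, source: str, role: str, target: str) -> None:
        self.relationships.append((source, role, target))

    def related(self, target: str, role: str) -> list:
        """Names of the nodes that have the given role toward the target node."""
        return [s for s, r, t in self.relationships if r == role and t == target]


model = ContentModel()
model.add(Node("subscription_topic"))                                      # empty node
model.add(Node("expiry_alert", value="Your subscription is about to expire."))
model.add(Node("payment_instructions"))                                    # empty node
model.relate("expiry_alert", "part_of", "subscription_topic")
model.relate("payment_instructions", "part_of", "subscription_topic")

print(model.related("subscription_topic", "part_of"))
# ['expiry_alert', 'payment_instructions']
```

The relationships act as the pathways between nodes: starting from a topic, you can find the details that belong to it.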

Content models are more abstract than other approaches to structuring content, and can therefore be open to wider interpretation about what they do.  The content model represents perhaps the deepest level of content structure, capturing all reusable and variable content elements. 

No single model, template or design system

No single representation of content structure can effectively depict all its different aspects.  I haven’t seen any single view representation that supports the different kinds of design decisions required.  For example, wireframes mix together fixed structures defined by templates with dynamic structures associated with UI design.  When content is embedded within screen comps, it is hard to see which elements are fixed and which are fluid.  Single views promote a tunnel focus on a specific decision, but block visibility into larger considerations that may be involved.  I’ve seen various attempts to improve wireframes to make them more interactive and content-friendly, but the basic limitations remain.

Consider a simple content element: an alert that tells a customer that their subscription is expiring and that they need to submit new payment details.  UI design needs to consider how the alert is delivered where it is noticed but not annoying.  Content design needs to decide on whether to use an existing alert, or write a new one.  The template must decide where within a  page or screen the alert appears.  The content type will specify the rules triggering delivery of the alert: who gets it, and when. And the content model may hold variations of the alert, and their mappings to different content types that use them.  You need a better alert, but what do you need to change?  What should stay the same, so you don’t mess up other things you’ve worked hard to get right?

Such decisions require coordination; different people may be responsible for different aspects. Not only must decisions and tasks be coordinated across people, they must be coordinated across time.  Those involved need to be aware of past decisions, easily reuse these when appropriate, and be able to modify them when not.  Agility is important, but so is governance.

A benefit of content structure is that it can accelerate the creation and delivery of content.  The challenge of content structure is that it’s not one thing.  There are different approaches, and each has its own value to offer.   Web publishers have more tools than ever to solve specific problems. But they still need truly integrated platforms that help web teams coordinate different kinds of decisions relating to specifying and choosing content elements. 

— Michael Andrews

The post Where does Content Structuring Happen? appeared first on Story Needle.


There’s an approach to content management that is being used, but doesn’t seem to have a name. Because it lacks a name, it doesn’t get much attention. I’m calling this approach tailless content management, in contrast to headless content management. The tailless approach and the headless approach are trying to solve different problems.

What Headless Doesn’t Do

Discussion of content management these days is dominated by headless CMSs.   A crop of new companies offer headless solutions, and legacy CMS vendors are also singing the praises of headless.  Sitecore says: “Headless CMSs mean marketers and developers can build amazing content today, and—importantly—future-proof their content operation to deliver consistently great content everywhere.”  

In simple terms, a headless CMS strips functionality relating to how web pages are presented and delivered to audiences.  It’s supposed to let publishers focus on what the content says, rather than what it looks like when delivered.  Headless CMS is one of several trends to unbundle functionality customarily associated with CMSs.  Another trend is moving the authoring and workflow functionality into a separate application that is friendlier to use.  CMS vendors have long touted that their products can do everything needed to manage the publication of content.  But increasingly content authors and designers are deciding that vendor choices are restricting, rather than helpful.  CMSs have been too greedy in making decisions about how content gets managed.  

“Future-proof” headless CMSs may seem like the final chapter in the evolution of the CMS.  But headless CMSs are still very rigid in how they handle content elements.  They are based on the same technology stack (LAMP) that’s obliquely been causing problems for publishers over the past two decades.   In nearly every CMS, all audience-facing factual information needs to be described as a field that’s attached to a specific content type.  The CMS may allow some degree of content structuring, and the ability to mix different fragments of content in different ways.  But they don’t solve important problems that complex publishers face: the ability to select and optimize alternative content-variables, to use data-variables across different content, and to create dynamic content-variables incorporating data-variables.   To my mind, those three dimensions are the foundation for what a general-purpose approach to content engineering must offer.  Headless solutions relegate the CMS to being an administrative interface for the content.  The CMS is a destination to enter text.  But it does a poor job supporting editorial decisions, and giving publishers true flexibility.   The CMS design imposes restrictions on how content is constructed.  

Since the CMS no longer worries about the “head”, headless solutions help publishers focus on the body.  But the solution doesn’t help publishers deal with a neglected aspect: the content’s tail.

Content’s ‘Tail’

Humans are one of the few animals without tails.  Perhaps that’s why we don’t tend to talk about the tail as it relates to content.  We sometimes talk about the “long tail” of information people are looking for.  That’s about as close as most discussions get to considering the granular details that appear within content. The long tail is a statistical metaphor, not a zoological one.  

Let’s think about content management as having three aspects: the head at the top (and which is top of mind for most content creators), the body in the middle (which has received more attention lately), and the tail at the end, which few people think much about. 

The head/body distinction in content is well-established. The metaphor needs to be extended to include the notion of a tail. Let’s break down the metaphor:

  • The head is the face of the content, as presented to audiences.
  • The body holds the organs (components) of the content. Like the components of the human body (heart, lungs, stomach, etc.), the components within the body of content should each have a particular function to play.
  • The tail holds the details in the content (mnemonic: deTails). The tail provides stability, keeping the body in balance.

 In animals, tails play an important role negotiating with their surroundings.  Tails offer balance.  They swat flies.  They can grab branches to steady oneself.  Tails help the body adjust to the environment.  To do this, tails need to be flexible. 

Details can be the most important part of content, just as the tails of some animals are the main event. In a park a kilometer from my home in central India, I can watch dozens of peacocks, India’s national bird. Peacocks show us that tails are not minor details.

When the tail is treated as a secondary aspect of the body, its role gets diminished.  Publishers need to treat data as being just as important as content in the body.  Content management needs to consider both customer-facing data and narrative content as distinct but equally important dimensions.  Data should not be a mere appendage to content. Data has value in its own right.  

With tailless content management, customer-facing data is stored separately from the content using the data.  

The Body and the Details

The distinction between content and data, and between the body and the detail, can be hard to grasp. The architecture of most CMSs doesn’t make this distinction, so the difference doesn’t seem to exist.

CMSs typically structure content around database fields.   Each field has a label and an associated value.  Everything that the CMS application needs to know gets stored in this database.  This model emerged when developers realized that HTML pages had regular features and structures, such as having titles and so on. Databases made managing repetitive elements much easier compared to creating each HTML page individually.

The problem is that a single database is trying to do many different things at once. It can be:

1. Holding long “rich” texts that are in the body of an article

2. Holding many internally-used administrative details relating to articles, such as who last revised an article

3. Holding certain audience-facing data, such as the membership services contact telephone number and dates for events

These fields have different roles, and look and behave differently.  Throwing them together in a single database creates complexity.  Because of the complexity, developers are reluctant to add additional structure to how content is managed.  Authors and publishers are told they need to be flexible about what they want, because the central relational database can’t be flexible.  What the CMS offers should be good enough for most people.  After all, all CMSs look and behave the same, so it’s inevitable that content management works this way.

Something perverse happens in this arrangement.  Instead of the publisher structuring the content so it will meet the publisher’s needs, the CMS’s design ends up making decisions about if and how content can be structured.

Most CMSs are attached to a relational database such as MySQL. These databases are a “kitchen sink” holding any material that the CMS may need to perform its tasks.

To a CMS, everything is a field.  They don’t distinguish between long text fields that contain paragraphs or narrative content that has limited reuse (such as a teaser or the article body) from data fields with simple values that are relevant across different content items and even outside of the content.  CMSs mix narrative content, administrative data, and editorial data all together. 

A CMS database holds administrative profile information related to each content item (IDs, creation dates, topic tags, etc). The same database is also storing other non-customer-facing information that’s more generally administrative, such as roles and permissions. In addition to the narrative content and the administrative profile information, the CMS stores customer-facing data that’s not necessarily linked to specific content items. This is information about entities such as products, addresses of offices, event schedules and other details that can be used in many different content items. Even though entity-focused data can be useful for many kinds of content, these details are often fields of specific content types.
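The mixing of roles is easier to see in a concrete record. The sketch below is hypothetical: the field names and values are invented for illustration and don’t reflect any particular CMS schema. It shows one flat record in which narrative content, administrative details and audience-facing data sit side by side, and a small function that sorts the fields by the role they actually play.

```python
# One CMS record for an event announcement, stored as a flat set of fields.
# All field names and values are hypothetical.
record = {
    "body": "<p>Join us for the annual meeting.</p>",   # narrative content
    "teaser": "Annual meeting announced",                # narrative content
    "id": 1042,                                          # administrative
    "last_revised_by": "editor_a",                       # administrative
    "event_date": "2020-05-01",                          # audience-facing data
    "member_services_phone": "+1-555-0100",              # audience-facing data
}

# The role each field plays. The CMS itself does not record this distinction.
ROLES = {
    "narrative": {"body", "teaser"},
    "administrative": {"id", "last_revised_by"},
    "audience_data": {"event_date", "member_services_phone"},
}


def split_by_role(fields: dict) -> dict:
    """Group a flat record's fields by the role they play."""
    groups = {role: {} for role in ROLES}
    for name, value in fields.items():
        for role, names in ROLES.items():
            if name in names:
                groups[role][name] = value
    return groups


groups = split_by_role(record)
print(sorted(groups["audience_data"]))   # the fields that could live in a separate data store
```

Only the fields in the last group are candidates for reuse across many content items, yet in the flat record they are tied to this one announcement.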

The design of CMSs reflects various assumptions and priorities.  While everything is a field, some fields are more important than others.  CMSs are optimized to store text, not to store data.  The backend uses a relational database, but it mostly serves as a content repository. 

Everyday Problems with the Status Quo

Content discusses entities.  Those entities involve facts, which are data.  These facts should be described with metadata, though they frequently are not.

A longstanding problem publishers face is that important facts are trapped within paragraphs of content that they create and publish.  When the facts change, they are forced to manually revise all the writing that mentions these facts.  Structuring content into chunks does not solve the problem of making changes within sentences.  Often, factual information is mentioned within unique texts written by various authors, rather than within a single module that is centrally managed.  

Most CMSs don’t support the ability to change information about an entity so that all paragraphs will update that information. 

Let’s consider an example of a scenario that can be anticipated ahead of time.  A number of paragraphs in different content items mention an application deadline date.  The procedure for applying stays the same every year, but the exact date by which someone must apply will change each year.  The application deadline is mentioned by different writers in different kinds of content: various announcement pages, blog posts, reminder emails, etc. In most CMSs today, the author will need to update each unique paragraph where the application is mentioned.  They don’t have the ability to update each mention of the application date from one place.   

Other facts can change, even if not predictably.  Your community organization has for years staged important events in the Jubilee Auditorium at your headquarters.  Lots of content talks about the Jubilee Auditorium.  But suddenly a rich donor has decided to give your organization some money.  To honor the donation, your organization decides to rename Jubilee Auditorium to the Ronald L Plutocrat Auditorium.  After the excitement dies down, you realize that more than the auditorium plaque needs to change.  All kinds of mentions of the auditorium are scattered throughout your online content.  

These examples are inspired by real-life publishing situations.   

Separating Concerns: Data and Content

Contrary to the view of some developers, I believe that content and data are different things, and need to be separated.

Content is more like computer code than it’s like data.  Like computer code, content is about language and expression.  Data is easy to compare and aggregate.  Its values are tidy and predictable.  Content is difficult to compare: it must be diff’d.  Content can’t easily be aggregated, since most items of content are unique.

Each chunk of content is code that will be read by a browser.  The body must indicate what text gets emphasis, what text has links, and what text is a list.  Content is not like data generally stored in databases. It is unpredictable. It doesn’t evaluate to standard data types. Within a database, content can look like a messy glob that happens to have a field name attached to it.

The scripts that a CMS uses must manipulate this messy glob by evaluating each letter character-by-character. All kinds of meaning are embedded within a content chunk, and some of it is hard to access.

The notion that content is just another form of data that can be stored and managed in a relational database with other data is the original sin of content management.  

It’s considered good practice for developers to separate their data from their code.  Developers though have a habit of co-mingling the two, which is why new software releases can be difficult to upgrade, and why moving between software applications is hard to do.

The inventor of the World Wide Web, Tim Berners-Lee, has lately been talking about the importance of separating data from code, “turning the way the web works upside-down.”  He says: “It’s about separating the apps from the data.”

In a similar vein, content management needs to separate data from content.

Data Needs Independence

We need to fix the problem with the design of most CMSs, where the tail of data is fused together to the spine of the body.  This makes the tail inflexible.  The tail is dragged along with the body, instead of wagging on its own.  

Data needs to become independent of specific content, so that it can be used flexibly.  Customer-facing data needs to be stored separately from the content that customers view.  There are many reasons why this is a good practice.   And the good news is it’s been done already.

Separating factual data from content is not a new concept.  Many large ecommerce websites have a separate database with all their product details that populates templates that are handled by a CMS.  But this kind of use of specialized backend databases is limited in what it seeks to achieve.  The external database may serve a single purpose: to populate tables within templates.  Because most publishers don’t see themselves as data-driven publishers the way big ecommerce platforms are, they may not see the value of having a separate dedicated backend database.  

Fortunately there’s a newer paradigm for storing data that is much more valuable.  What’s different in the new vision is that data is defined as entity-based information, described with metadata standards.  

The most familiar example of how an independent data store works with content is Wikipedia.  The content we view on Wikipedia is updated by data stored in a separate repository called Wikidata.  The relationship between Wikipedia and Wikidata is bidirectional.  Articles mention factual information, which gets included in Wikidata.  Other articles that mention the same information can draw on the Wikidata to populate the information within articles.

Facts are generally identified with a QID. The identifier Q95 represents Google. Google is a data variable. Depending on the context, Google can be referred to as Google Inc. (as a joint-stock company until 2017) or Google LLC (as a limited liability company beginning in 2017). As a data value, the company name can adjust over time. Editors can also change the value when appropriate. Google became a subsidiary of Alphabet Inc. (Q20800404) in 2015. Some content, such as content relating to financial performance, will address that entity starting in 2015. Like many entities, companies change names and statuses over time.

How Wikipedia accesses Wikidata. Source: Wikidata
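The idea of a company name as a data value that adjusts over time can be sketched as follows. This is not Wikidata’s actual data model or API. The identifier Q95 and the two names come from the discussion above, while the storage layout, the function name and the choice to give the boundary year 2017 to the newer name are my assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameValue:
    text: str
    start_year: Optional[int] = None   # first year the name applies (None = open-ended)
    end_year: Optional[int] = None     # year the name stopped applying (None = open-ended)


# Hypothetical local store keyed by QID. Q95 and the names are from the text above.
ENTITIES = {
    "Q95": [
        NameValue("Google Inc.", end_year=2017),
        NameValue("Google LLC", start_year=2017),
    ],
}


def name_for(qid: str, year: int) -> str:
    """Return the name recorded for the entity in the given year."""
    for value in ENTITIES[qid]:
        starts_ok = value.start_year is None or year >= value.start_year
        ends_ok = value.end_year is None or year < value.end_year
        if starts_ok and ends_ok:
            return value.text
    raise LookupError(f"no name recorded for {qid} in {year}")


print(name_for("Q95", 2010))   # Google Inc.
print(name_for("Q95", 2019))   # Google LLC
```

Content that refers to the entity by its identifier, rather than by a typed-in name, can then pick up whichever name fits the period it discusses.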

As an independent store of data, Wikidata supports a wide variety of articles, not just one content type.  But its value extends beyond its support for Wikipedia articles.  Wikidata is used by many other third party platforms to supply information.  These include Google, Amazon’s Alexa, and the websites of various museums.

While few publishers operate at the scale of Wikipedia, the benefits of separating data from content can be realized on a small scale as well. An example is offered by the popular static website generator Jekyll, which is used by GitHub, Shopify, and other publishers. A plugin for Jekyll lets publishers store their data in the RDF format, a standard that offers significant flexibility. The data can be inserted into web content, but is in a format where it can also be available for access by other platforms.

Making the Tail Flexible

Data needs to be used within different types of content, and across different channels — including channels not directly controlled by the publisher.

The CMS-centric approach, tethered to a relational database, tries to solve these issues by using APIs.  Unfortunately, headless CMS vendors have interpreted the mantra of “create once, publish everywhere” to mean “enter all your digital information in our system, and the world will come to you, because we offer an API.”  

Audiences need to know simple facts, such as what’s the telephone number for member services, in the case of a membership organization. They may need to see that information within an article discussing a topic, or they may want to ask Google to tell them while they are making online payments. Such data doesn’t fit comfortably into a specific structured content type. It’s too granular. One could put it into a larger contact details content type, but that would include lots of other information that’s not immediately relevant. Chunks of content, unlike data, are difficult to reuse in different scenarios. Content types, by design, are aligned with specific kinds of scenarios. But defined content structures used to build content types are clumsy at supporting general purpose queries or cross-functional uses. And it wouldn’t help much to make the phone number into an API request. No ordinary publisher can expect the many third party platforms to read through their API documentation in the event that someone asks their voice bot service about a telephone number.

The only scalable and flexible way to make data available is to use metadata standards that third party platforms understand. When using metadata standards, a special API isn’t necessary.
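As one illustration of what a metadata standard looks like in practice, the sketch below describes a member services phone number using the schema.org vocabulary expressed as JSON-LD. The choice of schema.org is mine, as one widely used standard. The organization name and phone number are placeholders, and the helper function is hypothetical.

```python
import json

# Placeholder organization described with schema.org types and properties.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Membership Society",
    "contactPoint": {
        "@type": "ContactPoint",
        "contactType": "member services",
        "telephone": "+1-555-0100",
    },
}


def to_json_ld_script(data: dict) -> str:
    """Wrap the data in the script element used to embed JSON-LD in a web page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


print(to_json_ld_script(organization))
```

Because the vocabulary is shared, a third party platform that understands it can find the telephone number without reading any publisher-specific API documentation.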

An independent data store (unlike a tethered database) offers two distinct advantages:

1. The data is multi-use, for both published content and to support other platforms (Google, voice bots, etc.)

2.  The data is multi-source, coming from authors who create/add new data, from other IT systems, and even from outside sources

The ability of the data store to accept new data is also important.  Publishers should grow their data so that they can offer factual information accurately and immediately, wherever it is needed.  When authors mention new facts relating to entities, this information can be added to the database.   In some cases authors will note what’s new and important to include, much like webmasters can note metadata relating to content using Google’s Data Highlighter tool.  In other cases, tools using natural language processing can spot entities, and automatically add metadata.  Metadata provides the mechanism by which data gets connected to content. 

Metadata makes it easier to revise information that’s subject to change, especially information such as prices, dates, and availability.  The latest data is stored in the database, and gets updated there.  Content that mentions such information can indicate the variable abstractly, instead of using a changeable value.  For example: “You must apply by {application date}.”  As a general rule, CMSs don’t make using data variables an easy thing to do.
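A minimal sketch of the data variable idea, assuming the variable is written as {application_date} (with an underscore, so that Python’s string formatting can handle it) and that the data store is a plain dictionary. The dates and the second sentence are invented for illustration.

```python
# Hypothetical data store: the single place where the deadline is maintained.
data_store = {"application_date": "1 March"}

# Content written by different authors refers to the variable, not to the value.
content_items = [
    "You must apply by {application_date}.",
    "Reminder: applications close on {application_date}.",
]


def render_all(items: list, data: dict) -> list:
    """Fill each data variable in each content item from the data store."""
    return [item.format_map(data) for item in items]


print(render_all(content_items, data_store))

# Next year, the date changes in one place and every mention follows.
data_store["application_date"] = "15 March"
print(render_all(content_items, data_store))
```

The same approach would cover the renamed auditorium: the name is updated once in the data store, and no paragraph needs to be rewritten by hand.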

A separate data store makes it simpler to pull data coming from other sources. The data store describes information using metadata standards, making it easy to upload information from different sources. With many CMSs, it is cumbersome to pull information from outside parties. The CMS is like a bubble. Everything may work fine as long as you never want to leave the bubble. That’s true for simple CMSs such as WordPress, and even for complex component CMSs (CCMSs) that support DITA. These hosts are self-contained. They don’t readily accept information from outside sources. The information needs to be entered into their special format, using their specific conventions. The information is not independent of the CMS. The CMS ends up defining the information, rather than simply using it.

A growing number of companies are developing enterprise knowledge graphs — their own sort of Wikidata. These are databases of the key facts that a company needs to refer to.  Companies can use knowledge graphs to enhance the content they publish.  This innovation is possible because these companies don’t rely on their CMS to manage their data.

— Michael Andrews

The post Tailless Content Management appeared first on Story Needle.

Content models have been around for a couple of decades. Until recently, only a handful of people interested in the guts of content management systems cared much about them.  Lately, they are gaining more attention in the information architecture and content strategy communities, in part due to Carrie Hane’s and Mike Atherton’s recent book, Designing Connected Content.  The growing attention to content models is also revealing how the concept can be interpreted in different ways as content models get mixed into broader discussions about structured content.  And ironically, at a time when interest in content models is growing, some of the foundational ideas about content models are aging poorly.   I believe the role of content models needs to be redefined so they can better serve changing needs.

This post will cover several issues:

  1. How content models can be poorly designed
  2. Defining content models in terms of components
  3. Criteria for what content should be included in a content model
  4. Editorial benefits of content models

In my previous post, I argued for the benefits of separating 1.) the domain model covering data about entities appearing within content from 2.) the content model governing expressive content.  This post will discuss what content models should do. Content models have greater potential than they offer currently.  To realize their full potential, content models will require redefining their purpose and construction, taking out pieces that add little value and adding in new pieces that can be useful.

Content Models in Historical Perspective

Bob Boiko’s highly influential Content Management Bible, published in 2002, provides one of the first detailed explanations of content models (p. 843):

“Database developers create data models (for database schema). These models establish how each table in the database is constructed  and how it relates to other tables in the database.  XML developers create DTDs (or XML Schema).  DTDs establish how each element in the XML file is constructed and how it relates to other elements in the file.  CMS developers create content models that serve the same function — they establish how each component is constructed and how it relates to the other components in the system.”

Content models haven’t changed much since Boiko wrote that nearly two decades ago.  Indeed, many CMSs haven’t changed much either during that time. (Many of today’s popular CMSs date from around the time Boiko wrote his book.)  Boiko likens the components of a content model to the tables of a database.  Boiko implies through his analogy that a content model is the schema of the content, because CMSs historically have served as a monolithic database for content.  While content strategists may be inclined to think that all content comes from the CMS, that’s no longer entirely true.  Two significant developments have eroded the primacy of the CMS:  first APIs, and more recently graph databases that store metadata.  While very different, both APIs and graph databases allow data or text to be accessed laterally, often with a simple “GET” command, instead of requiring a traversal of a hierarchy of attributes used in XML and the HTML DOM.

APIs allow highly specific information to be pulled in from files located elsewhere, while graphs allow different combinations of attributes to be stitched together as and when they are needed.  Both are flexible “just-in-time” ways of getting specific information directly from multiple sources. Even though content may now come from many sources, content models are still designed as if they were tables in a traditional database.  The content model is not a picture of what’s in the CMS.  The CMS is no longer a single source repository of all content resources.  Content models need to evolve.

Content structure does not entirely depend on arranging material into hierarchies.  Hierarchies still have a role in content models, but they are often over-emphasized.  It’s not so important to represent in a content model how information gets stored, as may have been true in the past. It’s more important to represent what information is needed.  

Content models have great potential to support editorial decision making. But existing forms of content models don’t really capture the right elements.  They may specify chunks of content that don’t belong in a content model.  

How Content Models can be poorly designed

Content models reflect the expertise and judgments of those designing them.  Designers may have slightly different ideas about what a content model represents, and why elements are included in a content model.   Content models may capture the wrong elements.  They sometimes can include too much structure.  When that’s the case, it creates unnecessary work, and sometimes makes the content less flexible.

Many discussions about content models will refer to the relationship between chunks of content or blocks of text.  These relationships are likened to the fields used in a database.  Such familiar terms avoid wonky jargon.  But they can be misleading.   

Photo by Neil Martin from Pexels

Many content strategists talk about chunks as the building blocks of content models.  It’s common to refer to the structuring of content as “chunking” it.  It’s accessible as a metaphor — we can think visually about a chunk, like a chunk of a chocolate bar.  But the metaphor is still abstract and can evoke different ideas about when and why chunking content is desirable.  People don’t agree what a chunk actually is — and they may not even realize they are disagreeing.  At least three different perspectives about chunks exist:

  1. Chunks as cohesive units of information  — they are independent of any context (the Chunks of Information perspective)
  2. Chunks as discrete segments that audiences consume — they depend on the audience context (The Chunks of Meaning perspective)
  3. Chunks as elements or attributes managed by the CMS — they depend on the IT context (The Chunks of a Database perspective)

Each of these perspectives is valuable, but slightly different.  While it is useful to address all these dimensions, it can be hard to optimize for all three together.   Each perspective assumes a different rationale for why content is structured.  

“Chunks of Information” Perspective 

When chunks are considered units of information, it can lead to too much structuring in the content model.   Because the chunk is context-independent, it can be loaded down with lots of details just in case they are needed in a certain scenario.  Unless specific rules are written for every case when the chunk is displayed, all the information in the chunk gets displayed to audiences.  In many cases that’s overkill; audiences only want some of the information.  Chunks get overloaded with details (often nested details): the content model is trying to manage field-level information that belongs in the domain model and that should be managed with metadata vocabularies.  Metadata vocabularies allow rich data to be displayed as and when it is needed (see chapter 13 of my new book, No More Silos: Metadata Strategy for Online Publishers).  Content models, in contrast, often expose all the data all the time.

Another symptom of too much detail is when chunks get broken out to add completeness.  Some content models apply the MECE standard: mutually exclusive, collectively exhaustive.  While logically elegant, they make assumptions about what content is actually needed.  For example, on a recipe website, each recipe might indicate any allergens associated with the ingredients.  That’s potentially useful content. One can filter out recipes that have offending allergens.  But it doesn’t follow that each allergen deserves its own profile, indicating all the recipes that contain peanuts, for example.  

Sometimes content models add such details because one can, and because it seems like it would be more complete to do so.  It can lead to page templates that display content sorted according to minor attributes that deliver little audience or business value.  The problem is most noticeable when the content aims to be encyclopedic, presenting all details in any combination.  Some content models promote the creation of collections of lists of things that few people are interested in.  Content models are most effective when they identify content to pull in where it is actually needed, rather than push out to someplace it might be needed.

“Chunks of Meaning” Perspective 

Focusing on audience needs sounds like a better approach to avoid unnecessary structuring.  But when editorial needs guide the chunking process, it can lead to another problem: phantom chunks in the content model.  These are pieces of content that might look like chunks in a content model.  But they don’t behave like those used in a content model.

The concept of chunks is also used in structured authoring, which has a different purpose than content modeling.  Segmenting content is a valuable approach to content planning.  Segmenting allows content to be prioritized and organized around headings.  It can help improve both the authoring and audience experience.  But most segmenting is local to the content type being designed. Segments won’t be reused elsewhere, and they don’t reuse specific values.  Segmenting helps readers understand the whole better.  But each segment still depends on the whole to be understood.  It’s not truly an independent unit of meaning.

“Chunks of a Database” Perspective

Chunks are also viewed as elements managed by a CMS, that is, as the fields of a database.  They may be blocks of text (such as paragraphs) or nested sets of fields (such as an address).  But blocks may not be the right unit to represent in a content model.  When defined as a block, data (entity values) gets locked into a specific presentation.  When this data is baked into the content model as a block, the publisher can’t easily show only some of the data if that’s all that’s required.

Nesting makes sense when dependencies between information elements exist.  But ideally, the model should present content elements that are independent and that can be used in flexible ways.  As mentioned in my previous post, content models can become confusing when they show the properties of entities mentioned in the content as being attributes of the content.  

When the focus of a content model is on blocks of text, it can be to the exclusion of other kinds of elements such as photos, links to video or audio, or message snippets.  Moreover, only certain kinds of text blocks are likely to be components in a content model.  Not all blocks of text get reused.  And not all text that gets reused is a block.  

Generally, long blocks of text are difficult to reuse.  They aren’t likely to vary in regular ways as would short labels.  Although it is possible to specify alternative paragraphs to present to audiences, it is not common.  The opportunity to use text blocks as components mostly arises when wanting to use the same text block in different content types to support different use cases.  

In summary, different people think about chunks differently because they are motivated by different goals for structuring content.  While all those goals are valid, they are not all relevant to the purpose of content modeling.  The purpose of a content model is not to break down content. The purpose of a content model is to enable different elements of content to be pieced together in different ways.  If the model breaks the content into too many pieces, or into pieces that can’t be used widely, the model will be difficult to use.

It is easy to break content apart.  It is much harder to piece together content elements into a coherent whole.  But if done judiciously, content models can provide richer meaning to the content delivered to audiences.  

What precisely does a Content Model represent?

Because chunks are considered in different ways, it is necessary to define the elements of a content model more precisely.  Like Boiko, I will refer to these elements as components, instead of as attributes or as blocks.    

Content models specify content components that can be presented in different ways in different contexts.  The components must be managed programmatically by IT systems.  Importantly, a content component is not a free-text field, where anything can be entered but nothing is ever reused.  A content model does not present potential relationships between content items. It is not a planning or discovery tool.   It should show actual choices that will be available to content creators to present to audiences.

Content components are content variables.   If the chunk isn’t a variable, it’s not a content component.

Think about a content variable as a predefined variant of content.  If the content is an image of a product, the variants might be images showing different views of the product, or perhaps different sizes for the images.  The image of the product is a content component.  It is a variable value.  People conventionally think about variables as data. They should broaden  their thinking.  Content variables are any use of content where there’s an option about which version to use, or where to use it.  

A content model is useful because it shows what content values are variable.  Content values are expressive when they vary in predictable or regular ways.  

A chunk is a component only if it satisfies at least one of two criteria:

  1. The component varies in a recurring way, and/or
  2. The component will be reused in more than one context.

Content components can be either local or global variables.  Content components are local variables when used in one context only.  The component presents alternative variations, or it is optional in different scenarios.  Content components are global variables when they can be used in different contexts.

We can summarize whether a chunk is a content component in a matrix:

Context | Value is fixed | Value is variable
Value is local to one context | Not a component | Component
Value is global: used in more than one context | Component | Component
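
For readers who prefer code to tables, the matrix reduces to a single "or" test.  The sketch below is illustrative only: the function name `is_component` and the sample chunks are my own hypothetical labels, loosely based on examples used in this post.

```python
def is_component(value_is_variable: bool, used_in_multiple_contexts: bool) -> bool:
    """A chunk qualifies if it varies in a recurring way and/or is reused across contexts."""
    return value_is_variable or used_in_multiple_contexts


# Hypothetical chunks: (description, varies?, used in more than one context?)
chunks = [
    ("podcast welcome announcement", False, False),
    ("product image with alternative views", True, False),
    ("pricing and availability disclosure", False, True),
    ("region-specific legal rights statement", True, True),
]

for description, varies, reused in chunks:
    label = "component" if is_component(varies, reused) else "not a component"
    print(f"{description}: {label}")
```

Only the first chunk, fixed in wording and local to one context, falls outside the content model.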

Content components are the content model’s equivalent of a domain model’s enumerated values.  Enumerated values are the list of allowed values in a domain model (sometimes called a controlled vocabulary, or colloquially known as a pick-list value.)  Enumerated values are names of attribute choices — the names of colors, sizes, geographic regions, etc.  They are small bits of data that can be aggregated, filtered upon, and otherwise managed.   

In the case of a content model, the goal is to manage pieces of content rather than pieces of data.  Generally, the pieces of content will be larger than the data.  The components can be paragraphs or images.  These components behave differently from the data in a domain model.  One can’t filter on content values (unlike data values).   And it will be rare that one aggregates content values.  The benefit of a content variable is that one creates rules for when and where to display the component.

Let’s consider the variation in content according to three levels, which I will call repetitive, expressive, and distinctive.  These terms are just labels to help us think about how fixed or variable content is. They aren’t meant to be value judgments. 

Repetitive content refers to content that is fixed in place.  It always says the same thing in the same way in one specific context.  The meaning and the style are locked down — there’s no variation.  For example, the welcome announcement and jingle for a podcast may always be the same each week, even though the program that follows the intro will be different every week.  The welcome announcement is specific to the podcast, and is not used in other kinds of content.

Expressive content refers to how content variation changes the meaning for audiences.  It considers variation in the components chosen.  Variation can happen within components, and across different content incorporating those components.  The term also echoes the programming notion of an expression, which is evaluated to produce a value.  With expressive content, the key question is knowing what’s the right value to use: choosing the right content variation.

With distinctive content, no two content items are the same.  This blog post is distinctive content, because none of the material has been reused, or will be reused.  The body of most articles is distinctive content, even if one can segment it into discrete parts.  

It’s important to recognize that a content model is just one of the tools that’s available to organize the structuring of content.  Other tools include content templates, and text editors.  

Let’s focus on the “body field” — the big blob of text in much online content.  It can be structured in different ways.  Not all editorial structuring involves content components.  An article might have a lead paragraph.  That paragraph may be guided by specific criteria. It must address who the article is for, and what benefit the article offers the reader.  But that lead is specific to the article.  It is part of the article’s structure, but not an independent structure that’s used in other contexts.

The same article might have a summary paragraph.  Unless the summary is used elsewhere, it is also not a content component.  The summary may be standalone content that could be used in other contexts, although many of the examples I’ve seen of such reuse haven’t delivered great user experiences.

These segments of an article help readers understand the purpose of the content, and help writers plan the content.  But they aren’t part of the content model.  Such segmentation belongs in the text editor where the content is created.

Consider a different example of a content chunk.  Many corporate press releases have an impact on the price of company shares.  Companies routinely put a “forward earnings” disclaimer at the end of each press release.  This disclaimer is only applicable to press releases, and the wording is fixed. The disclaimer is not a content component that varies or is used in other contexts.  It should be incorporated into the content template for press releases.

Kind of text | Variability | Where to specify or manage
Repetitive | Consistent text for one context only | Template (hardwired)
Expressive | Same text used in multiple contexts, or regularly variable text used in at least one context | Content model
Distinctive | All content is unique: not reused in different contexts | Structured guidelines in text editor

The content model is only one tool of many available to structure content in the broader sense.  The content model only addresses variable content components.  The content model doesn’t define the entire structure of the content that audiences see.  The content model helps support templates, but doesn’t define all the elements in a template or the organization of the wireframes.  The structure of the authoring environment may draw on components available in the content model, but it will segment content characteristics that won’t be part of the content model.  

What should be included in a Content Model?

Components are meaningful objects.  They can change in meaning.  They can create new meaning when combined in different ways.  They aren’t simply empty containers or placeholders for material to present.  

Content models provide guidance for two decisions:

  • Where can a component be used? — the available contexts
  • Which variation can be used? — the available variants

The components within a content model can be of three varieties:

  1. Statements
  2. Phrases
  3. Media Assets

Statements are sentences, paragraphs or sections comprised of several paragraphs.  Structurally, they can be sections, asides, call outs, quotes, and so on.  Statements will often be long blocks of text.  In some cases there will be variations of the blocks of text for different audience segments or regions.  Other times, there will be no variation in the text, but it will appear in more than one context.

An example of how statements can vary in a single context would be if an explanation about customer legal rights changed depending on whether the customer was based in the US or the UK.  The substance of the content component changes.  

An example of how a statement can be used in multiple contexts is a disclosure about pricing and availability.  A publisher may need to include the statement “pricing and availability subject to change” in many different contexts.  
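
The two decisions a content model guides (which contexts, which variants) can be sketched as a small data structure.  This is a rough illustration, not a description of any particular CMS: the `Component` class, the context names, and the placeholder wording for the legal rights variants are all hypothetical.  It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A content component: its variants plus the contexts where it may appear."""
    name: str
    variants: dict[str, str]                      # variant key -> text
    contexts: set[str] = field(default_factory=set)

    def text_for(self, context: str, variant: str = "default") -> str:
        """Return the chosen variant, but only for an allowed context."""
        if context not in self.contexts:
            raise ValueError(f"{self.name!r} is not available in {context!r}")
        return self.variants[variant]


# A statement that varies within a single context (placeholder wording).
legal_rights = Component(
    name="customer_legal_rights",
    variants={
        "US": "Placeholder text for US customers.",
        "UK": "Placeholder text for UK customers.",
    },
    contexts={"returns_page"},
)

# A statement with fixed wording that is reused in several contexts.
pricing_notice = Component(
    name="pricing_notice",
    variants={"default": "Pricing and availability subject to change."},
    contexts={"product_page", "email_offer", "checkout"},
)

print(legal_rights.text_for("returns_page", "UK"))
print(pricing_notice.text_for("checkout"))
```

The first component is a local variable with alternative values; the second is a global variable with one value used in many places.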


A content component may be a phrase or managed fragment of text.    Phrasing has become much more important, especially with the rise of UX writing.  Some wording is significant, because it relates to a big “moment of truth” for the audience: a decision they need to make, or a notification that’s important to them.  Specific phrasing may be continually refined to reflect brand voice and terminology, and to ensure customer acceptance.  It may be optimized through A/B testing.  

In contrast to variations in statements, which generally relate to differences in meaning or substance, the variation in phrasing generally relates to wording, tone, or style.

Some examples of managed phrases include:

 The recent emergence of design systems for UX writing is promoting the reuse of text phrases.  UX writing design systems can indicate a preferred messaging that conforms to editorial decisions about branding, or that performs better with audiences.  Although it is not currently common to..

A model is supposed to be a simplification of reality.  It is meant to help people understand and act.  But sometimes models do the opposite and cause confusion.  If the model becomes an artifact of its own, it can be hard to see its connection to what it is supposed to represent.  Over recent months, several people have raised questions on Twitter about the relationship between a domain model, a content model, and a data model.  We may also encounter the terms ontology or vocabulary, which are also models. With so many models out there, it’s small wonder that people might be confused.

From what I can see, no consensus yet exists on how to blend these perspectives in a way that’s both flexible and easy to understand.  I want to offer my thinking about how these different models are related.  

Part of the source of confusion is that all these models were developed by different parties to solve different problems.   Only recently has the content strategy community started to focus on how to integrate the different perspectives offered by these models.

The Different Purposes of Models

Domain models have a geeky pedigree.  They come from a software development approach known as domain-driven design (DDD).  DDD is an approach to developing applications rather than to publishing content.  It’s focused on tasks that a software application must support (behavior), and maps tasks to domain objects that are centered on entities (data).  The notion of a domain model was subsequently adopted by ontology engineers (people who design models of information.)  Again, these ontology engineers weren’t focused on the needs of web publishers: they just wanted a way to define the relationship between different kinds of information to allow the information to be queried. From these highly technical origins, domain models attracted attention in the content strategy community as a tool to model the relationships of entities that will appear in one’s content.  The critical question is, so what?  What value does a domain model offer to online publishers?  This question can elicit different and sometimes fuzzy answers.  I’ll offer my perspective in a moment.

A content model sounds similar to a domain model, but the two are different.  A content model is an abstract picture of the elements that content creators must create, which are managed by a CMS.  When content strategists talk about structuring content, they are generally referring to the elements that comprise a content model.  Where a domain model is concerned with data or facts, a content model is concerned with expressive content — the text, images, videos and other material that audiences consume.  Compared with a domain model, a content model is more focused on the experience of audiences.  Unsurprisingly, content strategists talk about content models more than they talk about domain models.  

Content models can serve two roles: representing what the audience is interested in consuming, and representing how that content is managed.   The content model can become confusing when it tries to indicate both what the machine delivering content needs to know about, as well as what the audience needs to see.  

Regrettably, the design of CMSs has trained authors to think about content elements in a certain way.  Authors decompose text articles into chunks, presented as fields in a CMS.  The content model can start to look like a massive form, with many fields available to address different aspects of a topic or theme.  Not all fields will display in all scenarios, and fields may be shared across different views of content (hence rules are needed to direct what’s shown when). It may look like a data model.  But the content model doesn’t impose strict rules about what types of values are allowed for the fields.  The values of some fields are numbers, some are pick list values.  Many fields are multiple paragraphs of text representing thousands of characters.  Some fields are links to images, audio, or to videos.  Some fields may involve values that are phrases, such as the text used on a button.  While all these values are “data” in the sense of being ones and zeros, they don’t add up to a robust data model.  That’s one reason that many developers consider content as unstructured — the values of content defy any uniformity.  

A content model is not a solid foundation for a data model about the content. The structure represented in a content model is not semantic (machine intelligible) — contrary to the beliefs of many content strategists.  Creating a content model doesn’t make the content semantic.   Structured authoring helps authors plan how different pieces of content can fit together. But author-defined structures don’t mean anything to outside parties, and most machines won’t automatically understand what the chunks of content mean.  A content model can inform a schematic of the content’s architecture, such as what content is needed and from where it will be sourced (it could come from other systems, or even external sources).  That’s useful for internal purposes.  The content model is implemented with custom code.  

The primary value of content models is to guide editorial decisions.  The content model defines content types: distinct profiles of content that address specific user purposes and goals.  A content model can specify many details, such as a short and a long description to accommodate different kinds of devices, or alternative text for different audiences in different regions.   A detailed content model can help the content adapt to different contexts.

Domain models are strong where content models are weak. Although domain models did not originally rely on metadata standards (e.g., in DDD), domain models increasingly have become synonymous with metadata vocabularies or ontologies.  Domain models define data models: how factual information is stored so it can be accessed. They supply one source of truth for information, in contrast to the many expressive variations represented in a content model.  Domain models represent the relationships of the data or information relating to a domain or broad subject area.  Domain models can be precise about the kinds of values expected.  Precise values are required in order to allow the information to be understood and reused in different contexts by different machines.  Because a domain model is based on metadata standards, the information can be used by different parties.  Content defined by a content model, in contrast, is primarily of use to the publisher only.   

The core value of a domain model is to represent entities — the key things discussed in content.  Metadata vocabularies define entity types that provide properties for all the important values that would provide important information.  Some entity types will reference other entity types.  For example, an event (entity type 1) takes place at a location (entity type 2).  The relationships between different entities are already defined by the vocabulary, which reduces the need for the publisher to set up special rules defining these relationships.  The domain model can suggest the kinds of information that authors need to include in content delivered to audiences.  In addition, the domain model can also support non-editorial uses of the information.  For example, it can provide information to a functional app on a smartphone.  Or it can provide factual information to bots or to search engines.  

The Boundary between Domain and Content Models

What’s the boundary between a domain model and a content model?

A common issue I’ve noticed is that model makers try to use a content type to represent an entity type. Certain CMSs aren’t too clear about the difference between content types and entity types.  Be careful not to let your CMS force you to think in certain ways.

Let’s consider a common topic: events.  Some content strategists consider events as a distinct content type.  That would seem to imply the content model manages all the information relating to events. But an event is actually an entity type.  Metadata standards already define all the common properties associated with an event.  There’s little point replicating that information in the content model.  The event information may need to travel to many places: to a calendar on someone’s phone, in search results, as well as on the publisher’s website which has a special webpage for events.    But how the publisher wants to promote the event could still be productively represented in the content model.  The publisher needs to think about editorial elements associated with the event, such as images and calls-to-action.

Event content contains both structured editorial content, as well as structured metadata
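
Here is a minimal sketch of that split.  The metadata uses the schema.org Event and Place types with their `name`, `startDate`, and `location` properties; everything else (the event itself, the image path, the call-to-action wording, and both helper functions) is hypothetical and only meant to show which side each piece of information belongs to.

```python
import json

# Domain model: facts about the event, described with schema.org terms.
event_metadata = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Content Strategy Meetup",       # hypothetical event
    "startDate": "2030-01-15T18:00",
    "location": {"@type": "Place", "name": "Example Hall"},
}

# Content model: editorial elements the publisher chooses to promote the event.
event_editorial = {
    "hero_image": "images/meetup-hero.jpg",          # hypothetical asset path
    "call_to_action": "Save your seat",              # hypothetical phrasing
}


def for_search_engines(metadata: dict) -> str:
    """Machines receive only the standardized facts, serialized as JSON-LD."""
    return json.dumps(metadata, indent=2)


def for_event_page(metadata: dict, editorial: dict) -> dict:
    """The publisher's own page combines the facts with editorial elements."""
    return {
        "title": metadata["name"],
        "when": metadata["startDate"],
        "where": metadata["location"]["name"],
        "image": editorial["hero_image"],
        "button_text": editorial["call_to_action"],
    }


print(for_search_engines(event_metadata))
print(for_event_page(event_metadata, event_editorial))
```

The same facts travel to a calendar, a search result, or the event page, while the image and the call-to-action stay under editorial control.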

The domain model represents what something is, while the content model can represent what is said or how it is said.  Let’s return to the all important call-to-action (CTA).  A CTA is a user action that is monitored in analytics.  The action itself can be represented as metadata — for example, there is a “buy action” in schema.org.  Publishers can use metadata to track what products are being bought according to the product’s properties, for example, color.  But the text on the buy button is part of the content model.  The CTA phrasing can be reused on different buttons.  The value of the content model is to facilitate the reuse of expressive content rather than the reuse of information.  Content models will change, as different elements gain or lose their mojo when presented to audiences.  The elements in a content model can be tested.  The domain model, centered on factual information, is far more stable.  The values may change, but the entities and properties in the model will rarely change.

When information is structured semantically with metadata standards, a database designed around a domain model can populate information used in content.  In such cases, the domain model supports the content model.  But in other cases, authors will be creating loosely structured information, such as long narrative texts that discuss information.  In these cases, authors can annotate the text to capture the core facts that should be included.  The annotation allows these facts to be reused later for different contexts.  

Over time, more editorial components are becoming formalized as structured data defined by metadata vocabulary standards.  As different publishers face similar needs and borrow from each other’s approach, the element in the content model becomes a design pattern that’s widely used, and therefore a candidate for standardization.  For example, simple how-to instructions can be specified using metadata standards.  

The Layered Cake

How domain models can support content models

One simple way to think about the two models is as layers of a cake.  The domain model is the base layer.  It manages the factual information that’s needed by the content and by machines for applications.  The content model is the layer above the domain model.  It manages all the relevant content assets (thumbnails, video trailers, diagrams, etc.), all the sections of copy (introductions, call outs, quotes, sidebars, etc.) and all the messaging (button text, alternative headlines, etc.).  On the top of these layers is the icing on the cake: the presentation layer.  The presentation layer is not about the raw ingredients, or how the ingredients are cooked.  It’s about how the finished product looks.  
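To make the layers concrete, here is a minimal sketch in Python using the event example from earlier. It is only an illustration: the event name, date, venue, and all the editorial copy are hypothetical, and the function names are my own. The factual layer borrows the schema.org Event type and a few of its common properties; the content layer is a plain dictionary, since its shape is an editorial decision.

```python
import json

# Domain layer: factual information, described with schema.org terms.
# All values are hypothetical placeholders.
event_facts = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Harvest Festival",
    "startDate": "2030-09-14T10:00",
    "location": {"@type": "Place", "name": "Example Town Square"},
}

# Content layer: expressive, editorial elements that can be tested and changed
# without touching the facts. The field names are illustrative assumptions.
event_content = {
    "headline": "A day out for the whole family",
    "cta_text": "Save your spot",
    "hero_image": "harvest-hero.jpg",
}


def to_json_ld(facts: dict) -> str:
    """Serialize the factual layer so it can travel to search results or calendars."""
    return json.dumps(facts, indent=2)


def render_promo(facts: dict, content: dict) -> str:
    """Presentation layer: combine facts and editorial copy into one plain-text promo."""
    details = f"{facts['name']} | {facts['startDate']} | {facts['location']['name']}"
    return f"{content['headline']}\n{details}\n[{content['cta_text']}]"


print(render_promo(event_facts, event_content))
```

Because the button text lives only in the content layer, an editor could swap "Save your spot" for another phrase without any change to the event data that machines consume.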

The distinctions I’ve made between the domain model and content model may not align with how your content management systems are set up.  But such decoupling of data and content is becoming more common.   If factual information is kept separate from expressive content, publishers can gain more flexibility when configuring how they deliver content and information to audiences.

— Michael Andrews

The post Where Domain Models Meet Content Models appeared first on Story Needle.

FAQs are a polarizing form of content.  Some content professionals think polished FAQs can be awesome (even if many are far from being so.) Many other content professionals believe FAQs can never be awesome.  Some publishers treat FAQs as another outlet to promote the publisher’s message.  Critics view FAQs as unnecessary clutter — an admission that the design of the content is failing. 

I want to try to separate the light from the heat.  FAQs may seem like a dinosaur from the earliest days of online publishing, but they are morphing into something much more intelligent than many content designers realize.  FAQs will likely be more important in the future, not less.  But they won’t act like most FAQs used today.

The Stigma of FAQs

FAQs are so loathed that slogans about them now show up on swag.  A Twitter poll revealed that a “frequently asked” request for a coffee/tea mug slogan was “No FAQs.”  The mug was duly made, and is appearing on office desks.

A cozy cuppa. (Via Twitter)

Lisa Wright, a technical communicator who has written one of the most insightful critiques of FAQs, invokes a ghoulish specter:  “Like zombies in a horror film, and with the same level of intellectual rigor, FAQs continue to pop up all over the web.”  Be afraid: FAQs can trigger nightmares!

There are plenty of valid criticisms of FAQs, including many that get scant attention.  But there are also plenty of criticisms of FAQs that seem subjective, and merely reinforce pre-existing attitudes about them.  Dinging FAQs can be fun and can help you bond with new friends.  Mocking FAQs during a rabble rousing conference presentation is a sure crowd pleaser, although the glib assertions in such talks are hard to fact-check and challenge.  I’ve seen critiques of FAQs that say categorically that users don’t want FAQs, without providing any evidence that would allow us to evaluate how accurate, or generalizable, such a statement is.   Even data about FAQs can be difficult to evaluate.  Analytics may show FAQs are being viewed, while usability tests may show FAQs are frustrating to use.  Evaluating the value of FAQs requires a deep understanding of both content relationships and user expectations.  

To say that FAQs don’t “spark joy” would be an understatement.  FAQs carry a stigma: their existence can seem to signal  failure.  Lots of people wish they weren’t necessary. Why do customers keep asking these same questions again and again?  What’s the root cause triggering this irritation?  FAQs may not be a sign that there are problems in the content.  FAQs may be a sign that there are problems in the products that the content must explain.  Many common customer service-related questions are about low-level bugs: frequent points of friction that users encounter that vendors decide are not serious enough to prioritize fixing. People may ask: why can’t they do something on their phone that they can do on their desktop?  In an ideal world, users wouldn’t have to ask questions. Everything would work perfectly and no thinking would be necessary.  We should never give up on that aspiration. But the messy reality is that vendors ship products with bugs and limitations. Users will always want to do something that was deemed an edge case or a low priority.   The gap between what’s available and what’s expected results in a question.  

 FAQs have been around since the earliest days of the internet.  They arose because they provided a simple way for publishers to address information that audiences were seeking.  FAQs were the first feedback loop on websites, long before usability testing or A/B testing became common.  Users indicated through emails or searches the questions they had, and publishers provided answers.  FAQs aren’t the only way to reply to user questions.  But credible, relevant FAQs can signal that the publisher listens to what information people want to know. Susan Farrell of the usability research consultancy Nielsen Norman Group has concluded that “FAQs Still Deliver Great Value.”

FAQs are simply a generic content type, much like a table or video can be a generic content type or format.  Content professionals should resist the temptation to categorically dismiss FAQs as a bad or evil content type.  Few would condemn all videos as evil, even if there are plenty of examples of bad videos.  FAQs are surprisingly hard to do well. They have to deliver important information, highly anticipated by the user, in a concise and precise way.   

For many web publishers, FAQs are a poorly managed content type.  Why are FAQs poorly managed? FAQ pages often suffer from unclear ownership.  FAQs are located on one of the few web pages in an organization that may be used by both marketing and customer support.  That dual ownership can create a tension about the purpose of the FAQs: the kinds of questions presented, and the kinds of answers allowed.  Without clear ownership or a clearly defined purpose, FAQ pages can become a dumping ground for random information that different parties want to publish. When that happens, FAQs are no longer about answering common customer questions; they are about exposing organizational anxieties.   FAQs can share the governance problems of another high profile web page: the home page.

FAQs can seem dishonest at times.  Not all FAQs are really “frequently asked” questions, even if they appear in a short list on a FAQ page.  True FAQs are based on real questions from real users.  Potemkin FAQs are questions that the publisher decided they wanted to talk about, or wanted to spin in a flattering light.  I’ve seen FAQs with a strong marketing focus, such as “How is your product different than company X’s product?”  Even if buyers are wondering about that, they aren’t looking for an answer on the FAQ page.  FAQs are meant to answer factual questions — not provide opinion and commentary.  FAQs are not a sales channel.  They are not a list of potential buyer objections.  Answers should answer, not sell.  

The purpose of FAQs is to prevent the user from having to dig through pages of content to get an answer to a straightforward question.  But if the user has to go digging through a long list of FAQs to find the question in order to find the answer, then the benefit of the FAQ has been nullified.  

In order to decide if FAQs are appropriate, a publisher should understand when and why customers have a question to begin with.  What triggers the question, and when is it triggered?  

FAQs respond to a user need for information. Either the information is new to the user, or it has been forgotten.  Several common scenarios arise.  If customers are familiar with other similar organizations and now are considering your organization, they may ask questions so they can make a comparison.  For example, they may want to know what your organization’s policy is on an issue that’s important to them.  If customers are already familiar with your organization, they may have questions about changes that may have occurred.  For example, annual changes in tax policies routinely generate many questions about how such changes affect specific situations.

Using FAQs especially makes sense when the web isn’t the primary channel of communicating with audiences.  If people are already reading your web content, there is little point having them find the FAQs if the question can be answered on the pages people are already visiting.  But many FAQ scenarios arise from non-web channel interactions.  People have an issue with a product they’ve bought, and are driven to the website looking for the answer.  The BBC’s audiences hear about something on radio or TV, and want follow up details, so they head to the website.  Someone is planning to visit a retail store, but has a question about the validity of a competitor coupon.  And sometimes people have general questions that aren’t related to a specific task.  Questions don’t always arise in the context of a web task.  Providing answers can sometimes be a precondition to starting a task.

Questions and Answers as a Content Form

Questions and answers are a fundamental way of structuring content.  Q&As are one of the oldest forms of content, and they can be traced to ancient times when content was oral.  The conversational nature of questions and answers aligns closely to the increasingly post-document character of online content.

Published content should have a purpose.  Questions put a spotlight on the purpose of the content.  What question(s) does the content answer?  Does it answer the question well?  Is the question important?  

A question can be called many things.  When I lived in Britain, I discovered people made enquiries (with an “e”), which to my American ears sounded rather formal.  Living in India, I notice few people have questions, but many people have doubts.  People with doubts generally aren’t skeptical; they just want answers.  Computer geeks will speak about queries.  All these terms can be synonyms, though they can evoke subtly different connotations about intention and purpose, depending on one’s background.   Do they need a formal verdict about eligibility?  Are they confused? Are they exploring?

No matter how carefully crafted a publisher’s content is, audiences will still have questions. Publishers will need to provide answers to those questions, as they are articulated by audiences.  Publishers can’t expect that audiences will stop asking questions.  Publishers can’t even expect that everyone will read through all their carefully crafted content.  Publishers would be presumptuous to assume that their content will answer every question audiences have, and that information sought will always be easy to find within the text.  One of the ironies of the anti-FAQ attitude is that while it claims to be audience-centric, it actually is publisher-centric.  FAQ-phobia at its worst becomes an attitude of “No Questions Allowed: we’ll decide what you need to know, and will tell you when we decide you need to know about it.”  Like a stern school teacher, the publisher doesn’t permit any participation.  

It’s helpful to compare the characteristics of FAQs with those of Q&As found in online forums.  They are similar, except that both the questions and answers in FAQs tend to be more fixed, as one party chooses both the question and the answer.  Q&As in forums tend to be more fluid.  In an open Q&A, it can be more transparent who raised the question, and users themselves may supply the answer.  Questions in a Q&A can sometimes be duplicated (a sure sign they are frequently asked) and sometimes questions mutate: people ask variants, or request an update based on new circumstances.  Answers sometimes spawn new questions.  Q&As in forums can be less efficient than FAQs at directly answering common questions, but they can be effective at surfacing what issues concern audiences, and how they are thinking about these issues.

Questions themselves can be interesting.  One can see some common questions, and think: that’s a good question!  Hadn’t thought to ask that myself, but interested to know the answer.  For example, I found these questions on the Nestlé India website:  

  • “Are the natural trans fats in dairy as harmful for the body as man-made trans fats?”  
  • “Are stir fries healthy?”

The very presence of these questions provides an indication of what customers must be chatting about online and in social media.  Questions can be the voice of the customer, if the questions are genuine.

Any form of published questions and answers involves some kind of moderation.  FAQs typically don’t offer an “Ask Me Anything” form of openness: the questions selected are chosen editorially.  But many Q&A sites allow such openness.  Quora, StackExchange, and other sites allow users to pose any question they want (consistent with their guidelines), and users vote on the value of both the question and the answers.  The success of these sites indicates that the question-and-answer format does serve a useful role.

In the case of FAQs, publishers must decide which questions are common enough to merit an answer.  Who specifically are FAQs meant to address: everyone, or specific groups of individuals?  And what kinds of questions are appropriate to answer with FAQs?   These are editorial decisions, and they need to be supported with the right structure for the content.  Many FAQ problems arise from either not making clear editorial decisions, or not having the right structure in place to support the editorial decisions made.  

FAQs can sprawl if governance is lacking.  Some publishers use FAQs to broadcast information about things they think audiences should know about, even if audiences aren’t asking about them often.   They lack a process to evaluate the importance of a question to the audience.  

Many people think about FAQs as a single destination page. But some publishers have multiple FAQ pages.  When FAQs are treated as web pages, users may never even find the questions and answers.  They need to figure out two things: whether their query is a frequently asked question, and where the FAQs are located.   Audiences ideally shouldn’t have to think about where the answers live.  

The content marketing software firm HubSpot seems to have over 7000 FAQ pages, covering different branded audience and task-themed areas of their website, such as:

  • Content Marketing Certification FAQ – HubSpot Academy
  • HubSpot Developers FAQ
  • Workflows | Frequently Asked Questions – HubSpot Academy
  • Frequently Asked Questions – HubSpot Design
  • HubSpot Partner Program FAQs

What a mess — how is the user supposed to know where to get an answer?  

Such sprawl is common when marketing organizations dominate the process.  Both questions and answers get framed by marketing segmentation.  The supply of answers (the stuff to talk about) drives the process, instead of the supply of questions.  That creates a risk that the FAQs don’t sound authentic.  They can sound as if a blurb about something was written, and only then was a leading question created to become its heading.  In HubSpot’s case, even the titles of blog posts use the term FAQ.  While it can be appropriate to address common questions in a blog post, those shouldn’t be labelled as FAQs.  

Some frequently asked questions  reflect customer skepticism.  For example, MeWe, a social network site, claims to be free and to respect user privacy.  They have a FAQ on how they can be free and make money.  The question seems genuine, even if the answer seems vague.

How can you be so great?

Not only should the questions be important and relevant to many people, their answers need to be concise enough to cover the question’s scope.  Open ended questions fail that test: short answers will fail to satisfy everyone’s criteria.  Answers should not involve “it depends…” unless the answer provides onward links to explain different dimensions relating to the question.  Overly general answers can sound evasive.  

FAQs prove their value when they deliver brevity.  Audiences don’t want to wade through lengthy text to find an answer.  A classic case of a mismatch between questions and content are terms and conditions (T&Cs).  While there may be legal reasons for having a long terms and conditions document, the information is hard to access. Apple’s terms and conditions would take nine hours to read completely.  Users will have specific questions, and should be able to get specific answers without having to scan or read the T&Cs.

Publishers need to clearly convey what kind of questions are covered by FAQs.  Many users will assume frequently asked questions are perennial questions repeatedly asked by people over time, a sort of greatest hits of factoids.  But some publishers such as the BBC introduce the notion of “most popular” FAQs, which is confusing.  Many users will assume that frequent questions are popular ones.  But the BBC seems to describe lots of questions as FAQs, and then scores them by popularity, which can fluctuate.  Popular questions may relate to how to get tickets to a show or purchase a calendar linked to a program.  Popular questions may be short-lived and of interest only to a limited subgroup of visitors.  There certainly needs to be a way to address questions that become suddenly and perhaps momentarily popular, but FAQ pages are customarily static.  Many users won’t expect answers to such questions on a FAQ page.

Matching Questions with Answers: The Issue of Intention

The root of question is “quest.”  Users are on a quest.  Questions arise because users need to understand something in order to do something.  It is not always clear why someone is asking a question.  Sometimes there could be more than one reason.  Some people might ask about a return policy because they want to try the product before committing.  Others ask because they are buying a gift and don’t know if the recipient will like it.  

A core design challenge for FAQs is understanding how specific or general a user intention is.  This gets into how to handle the granularity of questions and answers.  

First, let’s break down the components of the customer-publisher interaction.

The quaeritur (the question asked) needs a corresponding quaesitum (a solution).  While it sounds simple to match these two parts, in the world of online information the process is slightly more complex.  It’s actually a three stage process:

  1. The question that was asked (the user query)
  2. The question that was answered (the published question or statement)
  3. The answer that was provided (or elaboration of the statement)

There are several places where the process could go wrong.  

How user queries get connected to published answers

The user query may not match the published FAQ question.  If the user sees a list of questions on a FAQ page, they may not see a question that matches how they are thinking about an issue.  There are many reasons why the user query may not overlap with the published question.  One reason is language: the terminology in a user query could be less formal and vaguer, and the user is unable to translate how they are thinking about the question into the publisher’s terminology.  The other reason is a mismatch of scope.  Users may be looking for more specific answers, and hence questions, than appear in a FAQ’s list of questions.  Lisa Wright notes: “If a question appears to exclude the required information, the user may never click to see the answer, even if it is actually relevant.”  

The published answer may not satisfy the user goal associated with their original query.   The answer may be too general, or it may focus on details that while of interest to many, are not relevant to the specific user.  

Because of the possibilities of mismatching, publishers need to remove extraneous steps and provide onward next steps to get users toward the answer they seek.  If the user query matches an answer, it is best to show that answer directly, and not show how the publisher wrote the question, which could be broader.  Specific queries are unlikely to match published questions on a FAQ page.  Users need the ability to express their own questions, such as by typing a search query, which can be mapped to appropriate answers.  If the user query doesn’t match an available answer, it is best to show the nearest question for which there is an answer, assuming there is one.   
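The matching logic just described can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how any publisher actually does it: the two FAQ entries are invented, the word-overlap score (Jaccard similarity) is a stand-in for a real search engine, and the two thresholds are arbitrary values I picked for the example.

```python
import re

# Hypothetical published FAQs: question -> answer.
FAQS = {
    "How do I download a podcast?": "Open the episode page and choose Download.",
    "What is your return policy?": "Items can be returned within 30 days.",
}

# Common words that carry little meaning for matching (illustrative list).
STOPWORDS = {"a", "an", "the", "do", "i", "is", "your", "my", "to", "how", "what", "can"}


def tokens(text: str) -> set:
    """Lowercase the text and keep only the meaningful words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def respond(query: str, faqs: dict = FAQS, direct: float = 0.5, nearest: float = 0.15):
    """Return ('answer', text), ('nearest_question', question) or ('no_match', None)."""
    query_tokens = tokens(query)
    best_question, best_score = None, 0.0
    for question in faqs:
        score = jaccard(query_tokens, tokens(question))
        if score > best_score:
            best_question, best_score = question, score
    if best_question is None or best_score < nearest:
        return ("no_match", None)
    if best_score >= direct:
        # Strong match: show the answer itself, not the publisher's wording of the question.
        return ("answer", faqs[best_question])
    # Weak match: offer the nearest question that has an answer.
    return ("nearest_question", best_question)
```

With these invented entries, a query like "download podcast" goes straight to the answer, while "can I return a gift after the holidays?" only overlaps on one word and so gets the nearest published question as a next step.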

Users need to know: “What questions can I ask?”   The BBC, which has a search interface to their FAQs, even has a FAQ on how to use their FAQ, a tacit admission that people aren’t sure what they can expect.

How to use the BBC’s FAQs

One of the major uncertainties associated with tell-me-what-you-want command-based queries is not knowing if one can ask a question.  Voice interfaces are frustrating when the agent responds that they don’t understand the question, or when they misinterpret the question.  Designers of query bots are developing ways to indicate query patterns a user can try that will yield useful answers.  

The BBC’s FAQs reveal the problem of matching a user query to a publisher question.  I have difficulty downloading BBC podcasts on my smartphone’s podcast app (rather than the BBC’s proprietary iPlayer app).  But my query doesn’t really match the questions available, which are either more general or are irrelevant.   My issue feels like it should be a frequently asked question.  I’d be surprised if I’m alone in having problems accessing the BBC’s DRM-clamped, regionally-restricted audio content.  

Is there a match?

The BBC’s FAQ explains: “If you don’t get the result you’re looking for then it’s likely that your question isn’t one of our Frequently Asked Questions and we..

Reading involves work and writing is difficult.  Countless books are available on how to write.  What could be left to say?  In view of all the writing advice that’s available, it’s surprising that one topic gets scant coverage: entities.  Not many writers talk about their use of entities in their writing.  I believe entities can be a powerful lens for considering writing and the reader experience.

What’s an entity?  It is not a word used much in colloquial speech.  But it’s a handy term to refer to nouns that have a specific identity.  Merriam-Webster lists some synonyms of entity: “being, commodity, individual,  object, substance, thing.”  These words used to suggest the idea of an entity may seem vague, but specific examples of entities can be concrete. Most commonly people associate entities with organizational units, such as a corporate entity.  But the term can refer to all kinds of things: people, places, materials, concepts, brands, time periods, or space aliens.   Merriam-Webster cites the following usage example: “the question of whether extrasensory perception will ever be a scientifically recognized entity.”  In this example, the term entity refers to a phenomenon that many people don’t consider real: ESP.  The characters in Harry Potter novels can be entities, as can a celebrity or a football team.  

Perhaps the easiest way to think about an entity is to ask if something would have an entry in the encyclopedia; if so, it is likely an entity.  Entities are nouns referring to a type of thing (a category, such as mountains), or a specific individual example of a thing (a proper noun, such as the Alps).  Not all nouns are entities: they need to be specific and not generic.  A window would probably be too generic to be an entity: without further information, the reader won’t think too much about it.  A double glazed window, or a window on the Empire State Building, would be an entity, because there’s enough context to differentiate it from generic examples.  Windows as a category could be an entity, since they can be considered in terms of their global properties and variations: frosted windows, stained-glass windows, etc.  While there is no hard and fast rule about what’s considered an entity, the more salient something is in the text, the more likely it is to be an entity.  A single mention of a generic window would not be an entity, but a longer discussion of windows as an architectural feature would be.

Entities are interesting in writing because they carry semantic meaning (as opposed to other kinds of meaning such as mood or credibility).  Entities in writing make writing less generic.  They overlap with the concept of detail in writing.  But the role that entities play is different from making writing vivid by providing detail.  Entities are the foreground of the writing, not the background.  Many details in writing such as the brand of scarf that a protagonist wears are not terribly important.  Details are background color and in some writing are extraneous.  Entities, in contrast, are the key factual details mentioned in the text.  They can be central to the content’s meaning.

Ease of reading and understanding

Clarity is an obsession of many writers.  Entities can play an important role in clarity.

I became more mindful of the role of entities in writing while reading a recent book of jazz criticism by Nate Chinen.  I enjoy learning about jazz, and the writer is very knowledgeable on the subject. He personally knows many of the people he writes about, and can draw numerous connections between artists and their works.  Yet the book was difficult to read.  I realized that the book talked about too many entities, too quickly.  A single sentence could mention artists, works, dates, places, music style, and awards.  While I know a bit about jazz, my mind was often overloaded with details, some of which I didn’t understand completely.  I felt the author was at times “name checking” by dropping names of people and things he knew and that the reader should be impressed he knew.

Chinen created what I’ll call “dense content” — text that’s full of entities.  His writing provides a negative example of dense writing.  But not all dense content is necessarily hard to understand.

If dense content can be difficult to understand, is light content a better option?  Should entities be mentioned sparingly?

Light content is favored by champions of readability.  Writing should be simple and easy to read, and readability advocates have devised formulas to measure how readable a text is.  Texts are scored according to different criteria that are believed to influence readability:

  1. Sentence length
  2. Syllables per word
  3. Ratio of commonly used words as a portion of the entire text

All these metrics favor the use of short, simple words, and tend to penalize extensive reference to entities, which can be unfamiliar and longer words.
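To show how such a formula behaves, here is a small Python sketch of one widely used score, the Flesch Reading Ease formula. I haven't named a specific formula above, so treat this choice as my assumption. The syllable counter is deliberately crude (it counts runs of vowels), and the second sample sentence is invented to be heavy with long terms.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count runs of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean the text is scored as easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


plain = "The cat sat on the mat. It was a warm day."
dense = "Organizational interoperability necessitates comprehensive metadata standardization."

print(round(flesch_reading_ease(plain), 1))
print(round(flesch_reading_ease(dense), 1))
```

The second sentence scores far lower than the first, purely because of word length. The formula has no way of knowing whether a long word is jargon or an entity the reader cares about.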

So if readability scores are maximized, does understanding improve?  Not necessarily.  Highly readable content, at least as scored according to these metrics, may in fact be vague content that’s full of generalities and lacking concrete examples.  The concept of readability confuses syntactical issues (the formation of sentences) with semantic ones (the meaning of sentences).  Ease of reading is only partly correlated with depth of understanding.

The empty mind versus the knowing mind

One of the limitations of readability as an approach is that it doesn’t consider the reader’s prior knowledge of a topic.  It assumes the reader has an empty mind about the topic, and so nothing should be in doubt as to meaning.  Readability incorporates a generic idea of education level, but it is silent about what different people know already.  For example, my annoyance at the jazz criticism book may be a sign that I wasn’t the target audience for the book: I over-estimated my knowledge, and have blamed the author for making me feel unknowledgeable.  Indeed, some readers are enthusiastic about the dense detail in the book.  I, however, wanted more background about these details if they were considered important to mention.  

One way to extend the concept of readability to incorporate understanding is to measure the use of entities in writing.  I would suggest two concepts:

  1. Entity density
  2. Entity novelty

Entity density refers to how many different entities are mentioned in the text.  Some text will be more dense with entities compared with other text.  Entity density could measure entities per sentence, or total entities mentioned in an article.  Computers can already recognize entities in text, so an application could easily calculate the number of entities in the article, and the average per sentence.  

Example of computer recognition of entities in text.

Entity novelty takes the idea a step further.  It asks: how many new entities are introduced in the text for the reader?  For example, I’ve been discussing an entity called “readability.”  I am assuming the reader has an idea what I am referring to.  If not, readability would be a novel entity for the reader.  It is more difficult to calculate the number of unknown entities within a text.  Perhaps reading apps could track if the entity has been frequently encountered previously.  If it was, then it could assume it was no longer novel.

The idea behind these metrics is to highlight how entities can be either helpful or distracting.  The text could have many entities and be helpful to the reader, if the reader was already familiar with the entities.  The text can include unfamiliar entities, provided there aren’t too many.  But if the text has too many entities that are novel for the reader, both readability and understanding may suffer.
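Both metrics can be sketched in a few lines of Python. This is only a toy under loud assumptions: real entity recognition needs a trained model, so I substitute a crude heuristic that treats any capitalized word after the first word of a sentence as an entity. It will miss entities at the start of a sentence and will count a multi-word name as several entities. The function names and the idea of passing in a set of "known" entities are my own.

```python
import re


def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def naive_entities(sentence: str) -> set:
    """Crude stand-in for entity recognition: capitalized words after the first word."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    return {w for w in words[1:] if w[0].isupper()}


def entity_density(text: str) -> float:
    """Average number of distinct entities per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(naive_entities(s)) for s in sentences) / len(sentences)


def entity_novelty(text: str, known: set) -> float:
    """Share of the text's entities that the reader has not encountered before."""
    found = set()
    for sentence in split_sentences(text):
        found |= naive_entities(sentence)
    if not found:
        return 0.0
    return len(found - known) / len(found)


sample = "We flew to Tokyo and then Osaka. The trip was long."
print(entity_density(sample))             # two entities over two sentences
print(entity_novelty(sample, {"Tokyo"}))  # the reader already knows Tokyo
```

A reading app could build the `known` set from entities the reader has met often in earlier articles, which is the tracking idea suggested above.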

Scanning and entities

Another dimension that readability metrics miss is the scan-ability of text.  The assumption of readability is that the entire text will be read.  In practice, many readers choose what parts of the text to read based on interests and relevance.  The mention of entities in text can influence how easily readers can find text of interest.  Readers may be looking for indications that the text contains material that they:

  • Already know
  • Are not interested in
  • Know they are interested in
  • Find unfamiliar but are curious about.

Instead of considering text from the perspective of the “empty mind,” scan-ability considers text from the perspective of the “knowing mind.”  Readers often search for concrete words in text, especially capitalized proper nouns.  Vague, generic text is hard to scan.  

Imagine a reader wants to know about Japan’s banking system.  What entity would they look for?  That will depend partly on their existing knowledge.  If they want to know who is in charge of banking in Japan, they will look for mentions of specific entities.  Perhaps they know the name of the person and will search for that name.  Or they may not know the name of the person, but have an idea of their formal title so they will look for a mention of the words “Japan,” “Bank,” and “Governor.”    If they don’t know the formal title, they might look for mentions of a person’s role, such as “head of the central bank.”  In text, all these entities (name, title, and role) could appear in a paragraph on the topic.   All aid in the scanning of text.

Entities can help readers find information another way as well.  Entities can be described with metadata, which makes the information much easier to find online when searching and browsing.  When computers describe entities, they can keep track of different terms used to describe them, so that readers can find what they need whether or not they know about the topic already.  Metadata can connect different aspects of an entity, so that people can search for a name, a title, or a role and be taken to the same information.
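As a rough illustration of how metadata can connect a name, a title, and a role to the same entity, here is a minimal Python sketch. The entity ID, the placeholder name, and the helper functions are all hypothetical and invented for this example; real systems would typically rely on a published vocabulary and a search index rather than a hand-built dictionary.

```python
# Hypothetical entity record: three different labels a reader might scan or search for.
ENTITY_LABELS = {
    "entity:central-bank-head": {
        "name": "Example Person",  # placeholder, not a real official
        "title": "Governor of the Bank of Japan",
        "role": "head of the central bank",
    },
}


def build_alias_index(entities):
    """Map every lowercase label to the ID of the entity it describes."""
    index = {}
    for entity_id, labels in entities.items():
        for label in labels.values():
            index[label.lower()] = entity_id
    return index


def find_entity(index, query):
    """Return the entity ID for a search phrase, or None if it is unknown."""
    return index.get(query.strip().lower())


ALIAS_INDEX = build_alias_index(ENTITY_LABELS)
```

Whichever label a reader happens to know, the lookup resolves to the same entity ID, which is the behavior the paragraph above describes.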

— Michael Andrews

The post Reading, Writing, and Entities appeared first on Story Needle.

Who supports your work? If you work in a non-profit or a university, that’s an important question. These organizations depend on the generosity of others. They should want the world to know who is making what they do possible. Fortunately, new standards for metadata will make that happen.

Individuals and teams who work in the non-profit and academic sectors, who either do research or deliver projects, can use online metadata to raise their profiles. Metadata can help online audiences discover information about grants relating to advancing knowledge or helping others. The metadata can reveal who is making grants, who is getting them, and what the grants cover.

Grants Metadata

A new set of metadata terms is pending in the schema.org vocabulary relating to grants and funding. The terms can help individuals and organizations understand the funding associated with research and other kinds of goal-focused projects conducted by academics and non-profits. The funded item (property: fundedItem) could be anything. While it will often be research (a study or a book), it could also be the delivery of a service such as training, curriculum development, environmental or historical restoration, inoculations, or conferences and festivals. There is no restriction on what kind of project or activity can be indicated.

The schema.org vocabulary is the most commonly used metadata standard for online information, and is used in Google search results, among other online platforms. So the release of new metadata terms in schema.org can have big implications for how people discover and assess information online.

A quick peek at the code will show how it works. Even if you aren’t familiar with what metadata code looks like, it is easy to understand. This example, from the schema.org website, shows that Caroline B Turner receives funding from the National Science Foundation (grant number 1448821). Congratulations, Dr. Turner! How cool is that?

  <script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Turner, Caroline B.",
    "givenName": "Caroline B.",
    "familyName": "Turner",
    "funding": {
      "@type": "Grant",
      "identifier": "1448821",
      "funder": {
        "@type": "Organization",
        "name": "National Science Foundation",
        "identifier": "https://doi.org/10.13039/100000001"
      }
    }
  }
  </script>

The new metadata anticipates diverse scenarios. Funders can give grants to projects, organizations, or individuals. Grants can be monetary, or in-kind. These elements can be combined with other schema.org vocabulary properties to provide information about how much money went to different people and organizations, and what projects they went to.
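To show how these elements might be combined, here is a minimal Python sketch that assembles JSON-LD for a monetary grant. It assumes the `MonetaryGrant` type and the `funder`, `fundedItem`, and `amount` properties keep their current names; since the terms are still pending, that may change. The recipient, funder, project, and amount are placeholders invented for illustration, and `build_grant_markup` is a hypothetical helper, not part of any library.

```python
import json


def build_grant_markup(recipient_name, funder_name, project_name, amount, currency):
    """Return JSON-LD text describing a person whose project received a monetary grant."""
    markup = {
        "@context": "http://schema.org",
        "@type": "Person",
        "name": recipient_name,
        "funding": {
            # Assumed type and property names from the pending grants proposal.
            "@type": "MonetaryGrant",
            "funder": {"@type": "Organization", "name": funder_name},
            "fundedItem": {"@type": "CreativeWork", "name": project_name},
            "amount": {
                "@type": "MonetaryAmount",
                "value": amount,
                "currency": currency,
            },
        },
    }
    return json.dumps(markup, indent=2)


# Placeholder values only; none of these refer to a real grant.
grant_markup = build_grant_markup(
    recipient_name="Example Researcher",
    funder_name="Example Foundation",
    project_name="Example Field Study",
    amount=25000,
    currency="USD",
)
print(grant_markup)
```

The printed text could be placed inside a script element of type application/ld+json, in the same way as the schema.org example above.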

Showing Appreciation

The first reason to let others know who supports you is to show appreciation. Organizations should want to use the metadata to give recognition to the funder, and encourage their continued future support.

The grants metadata helps people discover what kinds of organizations fund your work. Having funding can bring prestige to an organization. Many organizations are proud to let others know that their work was sponsored by a highly competitive grant. That can bring credibility to their work. As long as the funding organization enjoys a good reputation for being impartial and supporting high quality research, noting the funding organization is a big benefit to both the funder and the grant receiver. Who would want to hide the fact that they received a grant from the MacArthur Foundation, after all?

Appreciation can be expressed for in-kind grants as well. An organization can indicate that a local restaurant is a conference sponsor supplying the coffee and food.

Providing Transparency

The second reason to let others know who supports your work is to provide transparency. For some non-profits, the funding sources are opaque. In this age of widespread distrust, some readers may speculate about the motivations of an organization if information about its finances is missing. The existence of dark money and anonymous donors fuels such distrust. A lack of transparency can spark speculations that might not be accurate. Such speculation can be reduced by disclosing the funder of any grants received.

While the funding source alone doesn’t indicate if the data is accurate, it can help others understand the provenance of the data. Corporations may have a self-interest in the results of research, and some foundations may have an explicit mission that could influence the kinds of research outcomes they are willing to sponsor. As foundations move away from unrestricted grants and toward impact investing, providing details about who sponsors your work can help others understand why you are doing specific kinds of projects.

Transparency about funding reduces uncertainty about conflicts of interest. There’s certainly nothing wrong with an organization funding research they hope will result in a certain conclusion. Pharmaceutical companies understandably hope that the new drugs they are developing will show promise in trials. They rely on third parties to provide an independent review of a topic. Showing the funding relationship is central to convincing readers that the review is truly independent. If a funding relationship is hidden rather than disclosed, readers will doubt the independence of the researcher, and question the credibility of the results.

It’s common practice for researchers to acknowledge any potential conflict of interest, such as having received money from a source that has a vested interest in what is being reported. The principle of transparency applies not only to doctors reporting on medical research, but also to less formal research. Investment research often indicates if the writer has any ownership of stocks he or she is talking about. And news outlets increasingly note when reporting on a company if that company directly or indirectly owns the outlet. When writing about Amazon, The Washington Post will note “Bezos also owns The Washington Post.”

If the writer presents even the appearance that their judgment was influenced by a financial relationship, they should disclose that relationship to readers. Transparency is an expectation of readers, even though publishers are uneven in their application of transparency.

Right now, transparency is hard for readers to crack. Better metadata could help.

Current Problems with Funding Transparency

Transparency matters for any issue that’s subject to debate or verification, or open to interpretation. One such issue I’m familiar with is antitrust: whether certain firms have too much (monopoly) market power. It’s an issue that has been gaining interest across the globe among people holding different political persuasions, but it’s an issue where there is a range of views and cited evidence. Even if you are not interested in this specific issue, the example of content relating to antitrust illustrates why greater transparency through metadata can be helpful.

A couple of blocks from my home in the Washington DC area is an institution that’s deeply involved in the antitrust policy debate: the Antonin Scalia Law School at George Mason University (GMU), a state-funded university that I financially support as a taxpayer. GMU is perhaps best-known for the pro-market, anti-regulation views of its law and economics faculty. It is the academic home of New York Times columnist Tyler Cowen, and has produced a lot of research and position papers on issues such as copyright, data privacy, and antitrust issues. Last month GMU hosted public hearings for the US Federal Trade Commission (FTC) on the future of antitrust policy.

Earlier this year, GMU faced a transparency controversy. As a state-funded university, it was subject to a Freedom of Information Act (FOIA) request about funding grants it receives. The request revealed that the Charles Koch Foundation had provided an “estimated $50 million” in grants to George Mason University to support its law and economics programs, according to the New York Times. Normally, generosity of that scale would be acknowledged by naming a building after the donor. But in this case the scale of donations only came to light after the FOIA request. Some of this funding entailed conditions that could be seen as compromising the independence of the researchers using the funds.

The New York Times noted that the FOIA request also revealed another huge gift to GMU: “executives of the Federalist Society, a conservative national organization of lawyers, served as agents for a $20 million gift from an anonymous donor.” What’s at issue is not whether political advocacy groups are entitled to provide grants, or whether or not the funded research is valid. What’s problematic is that research funding was not transparent.

Right now, it is difficult for citizens to “follow the money” when it comes to corporate-sponsored research on public policy issues such as the future of antitrust. Corporations are willing to provide funding for research that is sympathetic to their positions, but may not want to draw attention to their funding.

In the US, the EU, and elsewhere, elected officials and government regulators have discussed the possibility of bringing new antitrust investigations against Google. For many years, Google has funded research countering arguments that it should be subject to antitrust regulation. But Google has faced its own controversies about its funding transparency, according to a report from the Google Transparency Project, part of the Campaign for Accountability, which describes itself as “a 501(c)(3) non-profit, nonpartisan watchdog organization.” The report “Google Academics” asserts: “Eric Schmidt, then Google’s chief executive, cited a Google-funded author in written answers to Congress to back his contention that his company wasn’t a monopoly. He didn’t mention Google had paid for the paper.”

Google champions the use of metadata, especially the schema.org vocabulary. As Wikipedia notes, “Google’s mission statement is ‘to organize the world’s information and make it universally accessible and useful.’” I like Google for doing that, and hold them to a high standard for transparency precisely because their mission is making information accessible.

Google provides hundreds of research grants to academics and others. How easy is it to know who Google funds? The Google Transparency Project tried to find out who Google funds by using Google Scholar, Google’s online search engine for academic papers. There was no direct way for them to search by funding source.

Searching for grants information without the benefit of metadata is very difficult. Source: Google Transparency Project, “Google Academics” report

They needed to search for phrases such as “grateful to Google.” That’s far short of making information accessible and useful. The funded researchers could express their appreciation more effectively by using metadata to indicate grants funding.

The Google Transparency Project produced another report on the antitrust policy hearings that the FTC sponsored at GMU last month. The report, entitled “FTC Tech Hearings Heavily Feature Google-funded Speakers,” concludes: “A third of speakers have financial ties to Google, either directly or through their employer. The FTC has not disclosed those ties to attendees.” Many of the speakers Google funded were current or former faculty of GMU, according to the report.

I leave it to the reader to decide if the characterizations of the Google Transparency Project are fair and accurate. Assessing their report requires looking at footnotes and checking original sources. How much easier it would be if all the relevant information were captured in metadata, instead of scattered around in text documents.

Right now it is difficult to use Google Scholar to find out what academic research was funded by any specific company or foundation. I can only hope that funders of research, Google included, will encourage those who receive their grants to reveal that sponsorship within the metadata relating to the research. And that recipients will add funding metadata to their online profiles.

The Future of Grants & Funding Metadata

How might the general public benefit from metadata on grants funding? Individuals may want to know what projects or people a funder supports. They may also want to see how funding sources have changed over time for an organization.

These questions could be answered by a service such as Google, Bing, or Wolfram Alpha. More skilled users could even design their own query of the metadata by using a SPARQL query (SPARQL is a query language for semantic metadata). No doubt many journalists, grants-receiving organizations, and academics will find this information valuable.
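As a simplified stand-in for such a query, the Python sketch below groups grant recipients by funder from a handful of JSON-LD profiles. It is not SPARQL and does not use a semantic-web library; it only assumes that profiles follow the `funding` and `funder` structure of the schema.org example shown earlier. The first profile comes from that example, while the two "Example Researcher" profiles and the "Example Foundation" are hypothetical, as is the `recipients_by_funder` helper.

```python
import json
from collections import defaultdict

# Sample profiles: the first is from the schema.org example; the others are hypothetical.
PROFILES_JSON = """
[
  {"@type": "Person", "name": "Turner, Caroline B.",
   "funding": {"@type": "Grant",
               "funder": {"@type": "Organization", "name": "National Science Foundation"}}},
  {"@type": "Person", "name": "Example Researcher A",
   "funding": {"@type": "Grant",
               "funder": {"@type": "Organization", "name": "Example Foundation"}}},
  {"@type": "Person", "name": "Example Researcher B",
   "funding": [{"@type": "Grant",
                "funder": {"@type": "Organization", "name": "Example Foundation"}}]}
]
"""


def recipients_by_funder(profiles):
    """Group recipient names under the name of each organization that funds them."""
    grouped = defaultdict(list)
    for profile in profiles:
        funding = profile.get("funding", [])
        # JSON-LD allows either a single object or a list of objects here.
        if isinstance(funding, dict):
            funding = [funding]
        for grant in funding:
            funder_name = grant.get("funder", {}).get("name")
            if funder_name:
                grouped[funder_name].append(profile.get("name"))
    return dict(grouped)


funder_index = recipients_by_funder(json.loads(PROFILES_JSON))
for funder_name, recipients in funder_index.items():
    print(funder_name, "->", recipients)
```

A real query service would run over metadata harvested from many websites, but the principle is the same: once funding is expressed as structured data, the question of who a funder supports becomes a lookup rather than a search for phrases in text.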

Imagine if researchers at taxpayer-supported institutions such as GMU were required to indicate their funding sources within metadata. Or if independent non-profits made it a condition of receiving funding that they indicate the source within metadata. Imagine if the public expected full transparency about funding sources as the norm, rather than as something optional to disclose.

How You can get Involved

If you make or receive grants, you can start using the pending Grants metadata now in anticipation of its formal release. Metadata allows an individual to write information once, and reuse it often. When metadata is used to indicate funding, organizations have less worry about forgetting to mention a relationship in a specific context. The information about the relationship is discoverable online.

Note that the specifics of the grants proposal could change when it is released, though I expect they would most likely be tweaks rather than drastic revisions. Some specific details of the proposal will most interest research scientists who are concerned with research productivity and impact metrics; those details are of less interest to researchers working in public policy and other areas. While the grants proposal has been under discussion for several years now, the momentum for final release is building and it will hopefully be finalized before long. Many researchers plan to use the newly-released metadata terms for datasets, and want to include funder information as part of their dataset metadata. (Sharing research data is often a condition of research grants, so it makes sense to add funding sponsorship to the datasets.)

If you have suggestions or concerns about the proposal, you can contribute your feedback to the schema.org community GitHub issue (no 383) for grants. Schema.org is a W3C community, and is open to contributions from anyone.

— Michael Andrews

The post Metadata for Appreciation and Transparency appeared first on Story Needle.

Content marketing rests on a simple premise: Great content will attract interest from readers.  It sounds simple — but the ingredients of great content, on closer inspection, seem ineffable.  We can come up with any number of criteria necessary for great content.  But satisfying these criteria won’t necessarily result in lots of people finding your content, and using it.  It is possible to have great writing about useful topics that is promoted diligently, and still find that the content fails to generate expected interest.  Hard work alone doesn’t explain outcomes.

How then does content marketing rank highly? I’m not an SEO, so I’m not going to offer SEO advice here.  I’m using the SEO term “ranking” in a more general sense of gaining visibility based on audience expressions of interest.  It may be  ranking in SERPs, or in social media shares, or bookmarks, or another metric that indicates how people vote for what content they find most useful.  The key to the ranking question is to think about online content as a market, where there are buyers and sellers.  Unfortunately, it is not a simple market, where there is a perfect match for everyone.  Some sellers never find buyers, and some buyers never find the right seller either, and have to settle for something less than optimal.  Online content is sometimes efficient, but very often is prone to market failure.

Navigating through the Content Glut

Like many other people who work with online content, I believe we face a content glut.  There’s too much content online. Too much content is ignored by audiences.   Many organizations consider it acceptable to create content that only receives one or two hundred views.  A shocking amount of content that’s created is never viewed at all!  It would be easy to dismiss all this content as low quality content, but that would not capture the full story.  It’s more accurate to say that this content doesn’t match the needs of audiences.  Not all content needs to generate high numbers of views — if it is intended for a narrow, specific audience.  But most content that’s created has a potential audience that’s far larger than the actual audience it attracts.  It gets lost in the content glut.

To understand how audiences select content, it helps to consider content as being traded in one of two different markets.  One market involves people who all have the same opinion about what is great.  The other market involves people who have different ideas about what is great.  It’s vitally important not to confuse which group you are writing for, and hoping to attract.

More formally, I will describe these two markets as a “winner-takes-all” market, and as an “auction” market.  I’m borrowing (and repurposing) these terms from Cal Newport, a Georgetown computer science professor who wrote a career advice book called “So Good They Can’t Ignore You”. His distinction between winner-takes-all versus auction markets is very relevant to how online content is accessed, and valued, by audiences.

Winner-Takes-All Markets

When everyone in a large audience segment wants the same thing, applying the same standards, it can create a race to determine who provides the best offering.  It gives rise to a winner-takes-all market.

Let’s illustrate the concept with a non-content example. Sport stars are a classic winner-takes-all market.  Fans like players who score exceptionally, so the player who scores most generally wins the most fans.  The top players make much more money than those who are just short of being as good as them.  Fans only want so many stars.

Many content topics have a homogenous user preference profile.  Nearly everyone seeking health information wants up-to-date, accurate, comprehensive, authoritative information.  The US National Institutes of Health is the gold standard for that kind of information.  Other online publishers, such as the Mayo Clinic or WebMD, are being judged in comparison to the NIH.  They may be able to provide slightly friendlier information, or present emerging advice that isn’t yet orthodox.   But they need to have thoroughness and credibility to compete.  Lesser known sources of health information will be at a disadvantage.  Health information is a winner-takes-all market.  The best-regarded sources get the lion’s share of views.  Breaking into the field is difficult for newly established brands.  When everyone wants the same kind of information, and all the content is trying to supply the same kind of information, only the best content will be preferred.  Why settle for second best?

How do you know when a topic is a winner-takes-all market? A strong signal is when all content about the topic, no matter by whom it is published, has the same basic information, and often even sounds the same.  It is hard to be different under such circumstances, and to rank more highly than others.

Another example of a winner-takes-all market for content is SEO advice.  If you want to learn about (say) the latest changes Google announced last month, you will find hundreds of blog posts by different local SEO agencies, all of which will have the same information.  Only a few sources will rank highly, such as Moz or Search Engine Land.  The rest will add to the content glut.

It is extremely hard to win the game of becoming the most authoritative source of information about a topic that is widely covered and has a uniformity of views.  Generally, the first-movers in such a topic gain a sustained advantage, as they develop a reputation of being the go-to source of information.  

There are a couple of tactics sellers of content use in winner-takes-all markets.  The first is to set up a franchise, so that the publisher develops a network of contributors to increase their scale and visibility.  This is the approach used, for example, by Moz and other large SEO websites.  Contributors get some visibility, and build some reputation, but may not develop solid brand recognition.

The second tactic, advocated by some content marketing specialists, is to develop “pillar” content.  The goal of this tactic is to build up a massive amount of content about a topic, so that no one else has the opportunity to say something that you haven’t already addressed.  You can think of this approach as a “build your own Wikipedia”.  Some proponents advocate articles of 5000 words or more, cross-linked to other related articles.  It’s an expensive tactic to pursue, with no guarantees.  In certain cases, pillar content might work, for a topic that is not well covered currently, and for which there is a strong demand for extremely detailed information.  But otherwise, it can be a folly.  Pillar content tactics can trigger an arms race of trying to out-publish competitors with more and longer content.  In the race to become authoritative, the publisher can lose sight of what audiences want.  Do they really want 5000 word encyclopedic articles?  Generally they don’t.

Winner-takes-all applies to a competitive (non-captive) market.  If you have a captive audience (like your email list) you can be more successful with generic topics. But you will still be competing with the leaders.

Don’t forget: the characteristic of winner-takes-all markets is that there are few winners, and many losers.  Make sure you aren’t competing with a topic you are unprepared to win with.

Auction Markets

The defining characteristic of an auction market is that different people price stuff differently.  There’s no single definition of what the best is.  People value content differently, according to what they perceive as what’s unique or special about it.

A non-content example of an auction market is hiring an interior decorator.  It’s a very diverse market: decorators serve different segments of buyers (rich, budget, urban, suburban,…), and within segments people have widely different tastes (eclectic, mid-century modern, cutting edge, traditional…).  Different decorators are “right” for different people.  But that doesn’t mean there’s no competition.  Far more decorators want to design interiors that could be featured in Architectural Digest than there are clients looking to hire such decorators.  There’s an overabundance of decorators who favor the white sofa look that gets featured in Architectural Digest.  And budget buyers may have trouble finding a budget decorator who has sophisticated taste and who can hire affordable and reliable contractors.  It’s hard to get the niche right, where buyers want what you can offer.  

The value that audiences assign to content in auctions depends on the parameters they most care about. A broad topic that has wide interest can potentially be discussed in different ways: by tailoring the topic so that it is targeted at a segment, by offering a unique point of view (POV), or by accommodating a specific level of prior knowledge about the topic.

Many areas of content marketing are auction markets.  Some consumers are enthusiastic about learning the details of  products;  others are reluctant buyers worried about costs or reliability.   For example, home repair is a topic of fascination for a handyman. It’s a chore and headache for an exasperated homeowner dealing with an emergency.  

Auction markets rank on the basis of differentiation.  Brands make an appeal: We are right for you! Others are wrong for you! And by extension: We are wrong for people who aren’t like you!  Brands aim for what  could be called the audience-content-brand fit.  The moment a brand tries to become a multi-audience pleaser, it risks losing relevance.  It is then playing the winner-takes-all strategy.

Audience segments most value content that addresses specific needs that seem unique, and is not offered by others.  This places a premium on differentiation.  Segmentation is based on approach.  How content addresses a topic will mirror how audience segments coalesce around themes, interests or motivations.

Many marketers have trouble addressing fuzzy segments.  Groups of people may be drawn to a combination of overlapping interests, be looking for fresh points of view, and have varying levels of knowledge.   Such segments are fiendishly hard to define quantitatively.  How many people fit in each box?  It can be more productive to define the box as an idea to test, rather than as a fixed number.  Auctions discover segments; they don’t impose them.  People vote their priorities in auctions.  One can’t know what people want in an auction before it happens.  By their nature, auctions are meant to surprise us.

Auctions are fluid.  People’s interests shift.  Their knowledge may grow, or their willingness to learn may lessen.  It is even possible for an auction market to morph into a winner-takes-all market.  Today’s hottest debates can turn into tomorrow’s best practice orthodoxy. 

Matching the Content to the Audience

Ranking is fundamentally about being relevant.  Brands must offer content that is relevant.  Yet in the end, it is audiences who judge the relevance.

Marketers will find their content lost in the content glut if they fail to understand whether the audience segment they want to reach wants content that’s unique in some way, or wants the kind of content that everyone agrees is the best available.  

Brands should aim for share of mind, not just raw numbers.  Many marketers start by thinking about hypothetical reach.  They imagine all the people who, in theory, might be interested in the topic abstractly, and then work to improve their yield.  They create content they think a million people might want to read, without testing whether such an assumption is realistic.  They then try to improve on the minuscule portion of people viewing the content. That approach rarely builds a sustained audience. 

It’s better to garner 20% of a potential audience of 1000 people, than 1% of a segment of 20,000 people, even if the raw numbers are the same (200 views).   A well-defined segment is essential to figure out how to improve what you offer them.  If everyone wants exactly the same thing, then knowing what people want is that much easier.  But being the best when delivering to them is that much harder.

— Michael Andrews

The post Ranking in Content Marketing appeared first on Story Needle.
