Top 5 Tips for Building Just Eat on Amazon’s Echo Show

Hi, I’m Andy May – Senior Engineer in Just Eat’s Product Research team. I’m going to take you through some top tips for porting your existing Alexa voice-only skill to Amazon’s new Echo Show device, pointing out some of the main challenges we encountered and solved.

Since we started work on the Just Eat Alexa skill back in 2016, we’ve seen the adoption of voice interfaces explode in popularity. Amazon’s relentless release schedule for Alexa-based devices has fueled this, but the improvements in the foundational tech (AI, deep learning, speech models, cloud computing), coupled with the vibrant third-party skill community, look set to establish Alexa as arguably the leader in voice apps.
From an engineering perspective, adapting our existing code base to support the new Echo Show was incredibly easy. But, as with any new platform, simply porting an existing experience across doesn’t do the capabilities of the new platform justice. I worked incredibly closely with my partner-in-crime, Principal Designer Craig Pugsley, to take advantage of what became possible with a screen and touch input. In fact, Craig’s written some top tips about exactly that just over here.
To add a Show screen to your voice response, you simply extend the JSON response with markup describing the template you want to render on the device. The new template object (Display.RenderTemplate) is added to a directives array in the response.

"directives": [
  {
    "type": "Display.RenderTemplate",
    "template": { ... }
  }
]

For more details, see the documentation for the Alexa response object.
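To make the shape of the response concrete, here is a minimal sketch of a response body that pairs spoken output with a display directive. The helper name and the simplified template fields are illustrative, not Just Eat’s actual code:

```javascript
// Sketch: build an Alexa response that includes a Display.RenderTemplate
// directive alongside the usual spoken output.
function buildShowResponse(speechText, template) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speechText },
      directives: [
        {
          type: 'Display.RenderTemplate',
          template: template
        }
      ],
      shouldEndSession: false
    }
  };
}

// Example usage with a pared-down body template
const showResponse = buildShowResponse('Here is your order summary', {
  type: 'BodyTemplate1',
  token: 'order-summary',
  title: 'Your order'
});
```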
Sounds simple, doesn’t it? Well, it’s not rocket science, but it does have a few significant challenges that I wished someone had told me about before I started on this adventure. Here are five tips to help you successfully port your voice skill to voice-and-screen.

1. You need to handle device-targeting logic

The first and main gotcha we found was that you cannot send a response including a template to a standard Echo or Dot device. We incorrectly assumed a device that does not support screens would simply ignore the additional objects in the response.
Our own Conversation class, which all Alexa requests and responses go through, is built on top of the Alexa Node SDK (the SDK did not exist when we first launched our skill). We added a quick helper method from the Alexa Cookbook to check whether we are dealing with an Echo Show or a voice-only device.

get supportsDisplay() {
  const hasDisplay =
    this.alexa.event.context &&
    this.alexa.event.context.System &&
    this.alexa.event.context.System.device &&
    this.alexa.event.context.System.device.supportedInterfaces &&
    this.alexa.event.context.System.device.supportedInterfaces.Display;
  return Boolean(hasDisplay);
}

This method is called before we return our response to ensure we only send RenderTemplates to devices that support them.
Finally, we extended our Response class to accept the new template objects and include them in the response sent to Alexa. The resulting visual screens are displayed on the Echo Show alongside the spoken voice response.
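Putting the two pieces together, the gating logic might look something like this sketch, where `supportsDisplay` stands in for the helper above and `finaliseResponse` is a hypothetical name for the step that decides whether to attach the directive:

```javascript
// Sketch: only attach a RenderTemplate directive when the requesting
// device reports the Display interface. Plain Echo/Dot devices reject
// responses that contain templates.
function finaliseResponse(response, supportsDisplay, template) {
  if (supportsDisplay && template) {
    response.response.directives = [
      { type: 'Display.RenderTemplate', template: template }
    ];
  }
  return response;
}

const base = {
  version: '1.0',
  response: { outputSpeech: { type: 'PlainText', text: 'Hello' } }
};

// Deep-copy the base response so each device type gets its own object
const forEcho = finaliseResponse(JSON.parse(JSON.stringify(base)), false, { type: 'BodyTemplate1' });
const forShow = finaliseResponse(JSON.parse(JSON.stringify(base)), true, { type: 'BodyTemplate1' });
```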

2. Don’t fight the display templates

There are currently six templates available for displaying information on the Echo Show. We decided to declare each template in a single file, which means the markup and structure are only declared once; we then pass in the data needed to populate the template. Object destructuring, template literals and Array.map/reduce make generating templates easy. We use Node’s crypto module to generate a unique token for every template we return.
[Image: mapping the basket to template listItems]
[Image: reducing the basket to a single string]

justEatCuisineList(data) {
  const { cuisineData, title } = data;
  const cuisineItems = cuisineData.map(cuisine => ({
    token: "...",
    image: {...},
    textContent: {...}
  }));
  // template object
  return {
    title,
    listItems: cuisineItems
  };
}

Markup is limited to basic HTML tags, including line breaks, bold, italic, font size, inline images and action links. Action links are really interesting, but their default blue styling means we have so far had to avoid using them.
Many of the templates that support images take an array of image objects, but only the first image object is used. We experimented with providing more than one image to act as a fallback or to randomise the image displayed. The lack of fallback images means we need to make a request to our S3 bucket to validate that an image exists before including it in the template.
Don’t try to hack these templates to get them to do things they weren’t designed for. Each template’s capabilities have been consciously limited by Amazon to give users a consistent experience. Spend your time gently stroking your friendly designer and telling them they’re in a new world now. Set their expectations around the layouts, markup and list objects that are available, and encourage them to read Craig’s post.

3. Take advantage of touch input alongside voice

The Echo Show offers some great new functionality to improve the user experience and make some interactions easier. Users can now make selections and trigger intents by touching the screen or by saying the list item number: “select number 2”.
It is your job to capture both touch and voice selection. When a user selects a list item, your code receives a new request object of type Display.ElementSelected.
The token attribute you specify when creating the list is passed back in this new request object:

"request": {
    "type": "Display.ElementSelected",
    "requestId": "",
    "timestamp": "2017-06-06T20:05:04Z",
    "locale": "en-GB",
    "token": "Indian"
}

In the above example we receive the value ‘Indian’ and can treat it in the same way we would the cuisine slot value. Our state management code knows to wait for either the cuisine intent with a slot value or a Display.ElementSelected request.
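A sketch of that handling, with `stateMachine.setCuisine` as a hypothetical stand-in for our state management code:

```javascript
// Sketch: treat a touched list item's token exactly like a cuisine
// slot value arriving from a voice intent.
function handleElementSelected(request, stateMachine) {
  if (request.type === 'Display.ElementSelected') {
    stateMachine.setCuisine(request.token);
    return true;
  }
  return false;
}

// Example usage with a stub state machine that records the cuisine
const captured = [];
const handled = handleElementSelected(
  { type: 'Display.ElementSelected', token: 'Indian' },
  { setCuisine: (cuisine) => captured.push(cuisine) }
);
```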
Finally, we created a new intent, utterances and a slot to handle number selection. If the new intent is triggered with a valid number, we simply match the cuisine value from the cuisine array in state using an index offset.
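The index offset is simply the gap between the 1-based number a user speaks and the 0-based array in state. A minimal sketch:

```javascript
// Sketch: map a spoken ordinal ("select number 2") onto the cuisine
// list held in state. Spoken numbers are 1-based; arrays are 0-based.
function cuisineForNumber(cuisines, spokenNumber) {
  const index = spokenNumber - 1;
  return index >= 0 && index < cuisines.length ? cuisines[index] : null;
}

const picked = cuisineForNumber(['Indian', 'Chinese', 'Pizza'], 2);
```

Guarding the bounds matters: users happily say numbers that aren’t on the list, and a `null` result lets you reprompt instead of crashing.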
Find out more about touch and voice selection in Amazon’s Display Interface documentation.

4. Adapt your response based on device

The Echo Show provides lots of opportunities and features. In one part of our Skill we decided to change the flow and responses based on the device type.
When we offer users the opportunity to add popular dishes, it made sense for us to shorten the flow, as we can use the screen in addition to the voice response.
We use the same supportsDisplay method to change the flow of our skill.

if (this.conversation.supportsDisplay) {
  // Echo Show - skip straight to listing popular dishes
} else {
  // Standard device - prompt user to see if they want to add a popular dish
}

We use the same logic when displaying the list of popular dishes: following Amazon’s recommendations, if the device supports a display we don’t read out all the dishes.
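In practice that means building two variants of the spoken response. A sketch, with hypothetical wording:

```javascript
// Sketch: shorten the spoken response when the device has a screen,
// since the dishes are already visible in the rendered list.
function dishesSpeech(dishes, supportsDisplay) {
  if (supportsDisplay) {
    return 'Here are some popular dishes. Touch one or say its number.';
  }
  return 'Popular dishes are: ' + dishes.join(', ') + '. Which would you like?';
}
```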
You can find out more about our thoughts designing user experience for the Echo Show here.

5. The back button doesn’t work

The back button caused us some problems. When a user touches the back button, the Echo Show displays the previous template. Unfortunately, no callback is sent to your code, which created a huge state management problem for us.
For example, a user can get to the checkout stage, at which point our state engine expects only two intents: Pay Now or Change Something (excluding back, cancel and stop). If an Echo Show user touched back, the screen would then show our allergy prompt. The state engine does not know this change has taken place, so we could not process the user’s Yes/No intents to move on from the allergy prompt, as it thinks the user is still at the checkout stage.
Just to add to the problem, the user can actually tap back through multiple templates. Thankfully, you can disable the back button in the template response object:

backButton: 'HIDDEN'
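The flag lives on the template object itself. A sketch of where it sits, with illustrative values for the other fields:

```javascript
// Sketch: hide the back button by setting the backButton flag on the
// template object that goes inside the RenderTemplate directive.
function cuisineListTemplate(listItems) {
  return {
    type: 'ListTemplate1',
    token: 'cuisineList',
    backButton: 'HIDDEN',
    title: 'Choose a cuisine',
    listItems: listItems
  };
}

const template = cuisineListTemplate([]);
```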

To find out more about the Just Eat Alexa skill, visit the skill’s listing in the Alexa Skill Store.
For more information on developing with the Alexa Display Interface, see Amazon’s documentation.