The arrival of good internet connectivity and the easy availability of inexpensive devices have brought almost everyone online. Average screen time per user has shot up, and clinical studies of its effect on the eyes have been alarming, to say the least.
In this context, the latest innovation from the tech world has been voice-activated gadgets and AI helpers. Voice-based AI helpers like Siri, Alexa, and Google Assistant have become almost ubiquitous in our lives and often support two-way conversations.
Virtual assistants aim to make your everyday tasks faster and easier. This is primarily done through “Skills” for Alexa or “Actions” for Google Assistant. Voice skills are apps that let the assistant act as an interface between hardware and software components.
In smart homes, this could involve picking up your voice command and using it to control your home temperature. It could also be something as simple as turning your speakers on for you. Or it could involve using the voice recognition software to interface with other software like Spotify or even the internet to look for answers.
The Alexa Skills Kit (ASK) allows you to build these functionalities for your own custom experience. Amazon’s large collection of self-service APIs, tools, documentation, and code samples makes the task easier. There are Alexa skill development companies that create such skills, but individual developers can do so too. You may want to create a skill that answers queries from the internet, or one that lets you ask Alexa to place a food order. The custom skill could even be as complex as an interactive multiplayer game.
How does one build a voice skill?
To begin with, we must understand the basic pipeline of a simple query to Alexa or Assistant. When a user speaks a command, the software breaks up the audio into blocks. Then the software uses Speech to Text to convert the audio blocks into a series of requests.
These requests are processed in the cloud, interfaced with the required software or hardware, and an appropriate response is generated. The response is returned to the user through Text to Speech, which allows Alexa or Assistant to speak to you. You can break down the process of making a skill into four steps.
1. Design your Skill
- When designing your skill, start by planning what it is going to do. What is the goal of your skill? Could the same information or task be handled just as well from a website, without voice support?
- What information needs to be collected from the user to process the task?
- You will also want to decide what features will enhance user experience. Will the skill support in-skill purchases? Will it be interfacing with some hardware? What other features can be added?
- You will have to design a voice-user interface. Write a script that maps out how users interact with the skill and convert it into a “storyboard”. How does Alexa respond when the skill is invoked for the first time? How will Alexa respond when it has enough information to perform a task? If it has not collected enough information, how can it ask for more? You will have to add variations to your script for a better user experience and build a storyboard for each. As an example, there could be multiple welcome or goodbye messages. You will also have to choose an invocation name, the phrase the user speaks to start the skill.
- Lastly, you might want to decide whether to publish your skill to different local or international markets. If so, will the skill need to be customised for them? How will you account for all the different languages, dialects and cultural connotations of phrases?
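One way to prototype such a script before building anything is to keep the storyboard as plain data. The sketch below is purely illustrative: the `STORYBOARD` stages, the message texts, and the `pick_line` helper are invented for this example and are not part of any Alexa API.

```python
import random

# Hypothetical storyboard: each stage of the conversation has several
# scripted variations so the skill does not sound identical every time.
STORYBOARD = {
    "welcome": [
        "Welcome to Trip Planner. Where would you like to go?",
        "Hi there! Which city are you planning to visit?",
    ],
    "need_more_info": [
        "Which date would you like to travel on?",
        "Sure. When do you want to leave?",
    ],
    "goodbye": [
        "Goodbye, and have a great trip!",
        "See you next time.",
    ],
}


def pick_line(stage, rng=random):
    """Return one scripted variation for the given storyboard stage."""
    return rng.choice(STORYBOARD[stage])


if __name__ == "__main__":
    for stage in ("welcome", "need_more_info", "goodbye"):
        print(f"{stage}: {pick_line(stage)}")
```

Keeping the variations in one table like this also makes it easier to see which lines would need translating or adapting for another market.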
2. Set up the Skill in the developer console
- You will have to enter a name for your skill. This will be publicly displayed. You will also have to choose the programming language for your code. On Amazon Web Services (AWS), Node.js, Java, Python, C#, and Go are supported, and additional languages can be added.
- You will have to choose the ‘interaction model’ for your skill. The developer console offers four options. A custom interaction model gives you complete control over the skill. There are also pre-built models for flash briefings, smart homes, and video content.
- Once configured on the console, you can actually build your skill.
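Behind the console, choices like these are recorded in a skill manifest. The dictionary below is a trimmed sketch of what such a manifest might look like, based on our understanding of the `skill.json` format used by the ASK tools. The skill name, the locales chosen, the endpoint ARN, and the `check_manifest` helper are all hypothetical placeholders, so check the official schema before relying on any field.

```python
import json

# Trimmed sketch of a skill manifest (assumed shape; all values are placeholders).
MANIFEST = {
    "manifest": {
        "manifestVersion": "1.0",
        "publishingInformation": {
            "locales": {
                # Publicly displayed name, one entry per supported locale.
                "en-US": {"name": "Trip Planner"},
                "en-IN": {"name": "Trip Planner"},
            }
        },
        "apis": {
            # "custom" corresponds to choosing the custom interaction model.
            "custom": {
                "endpoint": {
                    # Hypothetical Lambda ARN, not a real function.
                    "uri": "arn:aws:lambda:us-east-1:123456789012:function:trip-planner"
                }
            }
        },
    }
}


def check_manifest(doc):
    """Return a list of problems found in the sketch manifest (hypothetical helper)."""
    problems = []
    body = doc.get("manifest", {})
    locales = body.get("publishingInformation", {}).get("locales", {})
    if not locales:
        problems.append("no locales defined")
    for locale, info in locales.items():
        if not info.get("name"):
            problems.append(f"{locale}: missing skill name")
    if not body.get("apis"):
        problems.append("no skill type (apis) selected")
    return problems


if __name__ == "__main__":
    print(json.dumps(MANIFEST, indent=2))
    print("problems:", check_manifest(MANIFEST))
```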
3. Build the Skill
The main building task is to create a way to accept requests from the assistant and send responses back. You can do this on Amazon Web Services (AWS), using AWS Lambda functions to run your code in the cloud without managing servers. You can also host your skill with any cloud provider by building a web service for it.
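As a rough illustration of "accept a request, send a response", here is a minimal sketch of a Lambda-style handler written against the raw JSON, without the ASK SDK. The envelope fields used (`request.type`, `outputSpeech`, `shouldEndSession`) follow the Alexa request and response format as we understand it. The function names `build_response` and `lambda_handler` and the spoken texts are illustrative choices, not anything prescribed by Amazon.

```python
def build_response(speech, end_session=True):
    """Wrap plain text in the JSON envelope sent back to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context=None):
    """Entry point: inspect the request type and build a matching reply."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific,
        # so keep the session open and wait for the next utterance.
        return build_response("Welcome. What would you like to do?", end_session=False)

    if request_type == "IntentRequest":
        intent_name = request["intent"]["name"]
        return build_response(f"You asked for {intent_name}.")

    if request_type == "SessionEndedRequest":
        # The session is already over, so no speech is returned.
        return {"version": "1.0", "response": {}}

    return build_response("Sorry, I cannot handle that request.")


if __name__ == "__main__":
    sample = {"version": "1.0", "request": {"type": "LaunchRequest"}}
    print(lambda_handler(sample))
```

The same function could sit behind a web service on another cloud provider; only the way the JSON arrives would change.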
- If you are using pre-built models, you may need to perform account linking for some of them. You may need to enable permissions for user information and consent. Other features like location services, reminders, and messaging may also have to be enabled. You will also need an endpoint, such as the Amazon Resource Name (ARN) of your Lambda function, so that Alexa knows where to send the skill's requests. All these options can be found on the developer console.
- For custom skills, you will have to create an interaction model for your skill. You will have to build a collection of utterances, or phrases that you expect the user to speak to the assistant. Including subtle variations on phrases and words makes for a better user experience, as not all users will invoke the skill in the same way. Each utterance can be broken down into intents and slots. Intents represent the action the user wants to perform and the request Alexa can handle. Slots represent the information that Alexa needs for that action to be performed. If the user says “Alexa, plan a trip to city X on date Y”, the intent is to plan a trip using a particular app. This may also involve booking tickets and accommodation. The slots are the application that Alexa must use to plan the trip, the city to be visited, and the dates for the visit. If some of this information has not been collected, Alexa must be set up to prompt the user for the missing slots.
- All these things can be built on the developer console or with the ASK command-line interface. For custom skills, the endpoint must be identified, any interfacing with external software or hardware must be set up, and ambiguity between utterances must be removed.
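To make intents, slots, and sample utterances more concrete, the sketch below models the trip example as data and adds a small check for missing slots. The overall JSON shape follows the interaction model format as we understand it, and `AMAZON.US_CITY` and `AMAZON.DATE` are built-in slot types. The intent name `PlanTripIntent`, the slot names, the prompts, and the helper functions are hypothetical, and the "application" slot from the example is left out for brevity.

```python
# Sketch of an interaction model for the trip example (names are hypothetical).
INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "trip planner",
            "intents": [
                {
                    "name": "PlanTripIntent",
                    "slots": [
                        {"name": "city", "type": "AMAZON.US_CITY"},
                        {"name": "travelDate", "type": "AMAZON.DATE"},
                    ],
                    # Variations of the same request, with slots in braces.
                    "samples": [
                        "plan a trip to {city} on {travelDate}",
                        "plan a trip to {city}",
                        "I want to travel on {travelDate}",
                    ],
                }
            ],
        }
    }
}

# Hypothetical follow-up questions, one per slot.
PROMPTS = {
    "city": "Which city would you like to visit?",
    "travelDate": "On which date would you like to travel?",
}


def missing_slots(request, model=INTERACTION_MODEL):
    """List the slots the intent defines that the request has not filled.

    Raises StopIteration if the intent is not in the model (fine for a sketch).
    """
    intents = model["interactionModel"]["languageModel"]["intents"]
    name = request["intent"]["name"]
    definition = next(i for i in intents if i["name"] == name)
    filled = request["intent"].get("slots", {})
    return [
        slot["name"]
        for slot in definition["slots"]
        if not filled.get(slot["name"], {}).get("value")
    ]


def next_prompt(request):
    """Return the question to ask next, or None when every slot is filled."""
    missing = missing_slots(request)
    return PROMPTS[missing[0]] if missing else None


if __name__ == "__main__":
    partial = {
        "type": "IntentRequest",
        "intent": {
            "name": "PlanTripIntent",
            "slots": {"city": {"name": "city", "value": "Seattle"},
                      "travelDate": {"name": "travelDate"}},
        },
    }
    print(missing_slots(partial), next_prompt(partial))
```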
4. Test your Skill
- You can use the utterance profiler to test how utterances are broken into intents and slots. Any ambiguity between utterances can then be removed, and a larger bank of sample utterances can be built up.
- The test page on the console allows you to test your Alexa Skills Kit features without a device, either through voice or text. Each model has different benchmarks that must be met before a test can be conducted successfully.
- You also can test your skill on any device that has Alexa enabled.
- There are also command-line commands (invoke skill or simulate skill) that can be used for testing.
- You can also beta test your skill by making it available to a select group of users. This is optional.
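Alongside these tools, request handling can be checked offline by feeding hand-built request JSON straight to the skill code, which is roughly what an invoke-style test does against a deployed endpoint. The sketch below is an assumption-laden example: `handle` is a deliberately tiny stand-in for a real skill, and `HelloIntent`, `make_intent_request`, and `run_cases` are hypothetical names.

```python
def handle(event):
    """Stand-in skill handler (hypothetical), used only to demonstrate the test."""
    request = event["request"]
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "HelloIntent":
        text, end = "Hello from the skill.", True
    else:
        text, end = "Sorry, I did not understand that.", False
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end,
        },
    }


def make_intent_request(intent_name, slots=None):
    """Build a minimal IntentRequest envelope for offline testing."""
    return {
        "version": "1.0",
        "request": {
            "type": "IntentRequest",
            "locale": "en-US",
            "intent": {"name": intent_name, "slots": slots or {}},
        },
    }


def run_cases(cases):
    """Run (intent name, expected speech) pairs and return any mismatches."""
    failures = []
    for intent_name, expected in cases:
        reply = handle(make_intent_request(intent_name))
        spoken = reply["response"]["outputSpeech"]["text"]
        if spoken != expected:
            failures.append((intent_name, spoken))
    return failures


if __name__ == "__main__":
    cases = [
        ("HelloIntent", "Hello from the skill."),
        ("UnknownIntent", "Sorry, I did not understand that."),
    ]
    print("failures:", run_cases(cases))
```

A check like this only exercises your own code. How spoken utterances map to intents still has to be tested with the console, the utterance profiler, or a device.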
This is a brief overview of the skill-building process. Detailed steps, breakdowns of the interaction models, and advice on maximizing user experience can be found on Amazon's developer site. You can get a skill certified and published if it meets the guidelines on the certification page of the developer console.
This will allow general users to use it. Meanwhile, you can continue to modify and upgrade your skill. Building custom skills for Alexa or Google Assistant is a multi-step process that can be learned with a little effort, and excellent resources exist to help. Go get started and build your custom experience today.