In this post, I will compare Alexa and Mycroft. I originally created this comparison as part of a summer school that I helped to organize. The goal for the teams was to create a voice assistant for the medical sector to help patients and/or doctors. However, as with all major cloud-based voice assistants, privacy concerns arise.
Especially in Europe, transferring sensitive data such as personal information about real people to servers outside the continent poses a real challenge under the current GDPR (DSGVO), which explicitly introduces barriers to prevent exactly this.
A solution we looked at is Mycroft, an open-source voice assistant that can process information locally, or at least that is what they claim. In this comparison we will look at how both systems work and how difficult it is to develop a Skill for each of them.
Alexa
First, we will take a look at Alexa by Amazon. This side of the comparison is already quite well documented online, so I will only describe it briefly.
How Alexa Works
As this is a trade secret of Amazon, not much detail can be found. However, we can look at the structure of the interaction. If you start a Skill on Alexa, Amazon's own voice recognition (speech-to-text, STT) and synthetic voice (text-to-speech, TTS) are used. The logic behind the answers and interactions is mostly hosted on Amazon's Lambda servers. You CAN host it yourself; however, most developers do not want to pay for the traffic of strangers, especially if the Skill becomes popular. This is why Amazon provides the Lambda functions for "free" if you host an Alexa Skill. The only downside is that creating the account requires a credit card.
As you can see in the image above, you only have a single real interaction point with the whole ecosystem. Your code goes into the Lambda server, you connect it with your Alexa Skill, and that’s it. The rest is handled by Amazon.
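To give an idea of what your code actually receives: for every interaction, Alexa sends a JSON request envelope to the configured endpoint, and your Lambda function answers with a JSON response. Heavily abbreviated (the session and context objects are omitted), and with a hypothetical intent from the hospital example, such a request looks roughly like this:
{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "PatientAgeIntent",
      "slots": {
        "name": {
          "name": "name",
          "value": "John"
        }
      }
    }
  }
}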
How Do You Create an Alexa Skill?
As stated above, I will only give a short overview, as it is already well documented online. First, you will need an Amazon Developer Account. There, you can create a new Skill, which requires an Invocation Name. This is the phrase that Amazon will use to start your Skill, which is like an app on Alexa. There are different ways to interact with the end user, but most likely they will "enter" your app if any form of dialog is used. For example, if your invocation name is 'hospital app', users can say "Hey Alexa, start hospital app" and Alexa will start your Skill.
Next, there are different Intents, basically the things you can do with your Skill. An example could be "What is the age of patient X?". As you can see, we need a variable, in this case the name of a patient. For Alexa, these variables are called Slots. They can be defined in the Amazon Developer Dashboard and use different preset types, like names or numbers.
As a result, you get a language model that your code has to interact with. You will find this model as JSON in your Skill Dashboard. Sounds nice; however, you do not have any code yet.
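For illustration, a minimal interaction model for the hospital example could look roughly like this (the intent name PatientAgeIntent and the chosen built-in slot type are my own picks for this sketch, not something Amazon prescribes):
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "hospital app",
      "intents": [
        {
          "name": "PatientAgeIntent",
          "slots": [
            { "name": "name", "type": "AMAZON.FirstName" }
          ],
          "samples": [
            "what is the age of patient {name}"
          ]
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        }
      ],
      "types": []
    }
  }
}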
To kickstart your code, a tool created by Amazon might help.
If you visit alexa.design/codegenerator, you can take your created JSON and paste it into the left box. After clicking "Generate Code", you will get JavaScript code for a Node.js project.
Request Handler
The next code example shows a simple handler for an intent. This HelpIntent_Handler is a special handler, as it is required for every Skill. As you can see, the function handle has access to all Slots, as well as a responseBuilder, which is used to answer the user if they ask for help.
const AMAZON_HelpIntent_Handler = {
    canHandle(handlerInput) {
        // Only handle requests for the built-in AMAZON.HelpIntent.
        const request = handlerInput.requestEnvelope.request;
        return request.type === 'IntentRequest' && request.intent.name === 'AMAZON.HelpIntent';
    },
    handle(handlerInput) {
        const request = handlerInput.requestEnvelope.request;
        const responseBuilder = handlerInput.responseBuilder;
        let sessionAttributes = handlerInput.attributesManager.getSessionAttributes();

        let say = `You can ask me "What age is patient X?" or "Name all patients".`;

        // Speak the help text and keep the session open for a follow-up question.
        return responseBuilder
            .speak('Okay, I will try to help you. ' + say)
            .reprompt('Try again. ' + say)
            .getResponse();
    },
};
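For completeness: the generated project also contains the boilerplate that registers all handlers and exposes the Lambda entry point. It looks roughly like the following sketch based on the ask-sdk-core package (the exact list of handlers depends on your interaction model):
const Alexa = require('ask-sdk-core');

// Wire the handlers into the skill and export the Lambda handler.
exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        AMAZON_HelpIntent_Handler
        // ...plus the handlers for your own intents
    )
    .lambda();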
Connecting Both Parts
Once you have uploaded your code to an AWS Lambda function, you get the function's ARN. This has to be inserted into the Amazon Developer Skill Dashboard under 'Endpoint' (that page also shows the Skill ID, which you add to the Lambda function as an Alexa Skills Kit trigger). If all of this was successful, you can try out your Skill with the virtual Alexa in the Dashboard. If that does not work for you, you might also need a real Alexa device.
Mycroft
Mycroft is fundamentally different. As an open-source alternative with on-device logic, development is an order of magnitude easier. However, it is far from perfect. First of all, the provided Raspberry Pi image can break, stop working, or crash at any moment. And if you try to use a USB headset to interact with it, good luck: fighting the installed drivers and forcing the audio input/output onto a USB device in a terminal, without proper documentation, was horrible. However, after the summer school, Mycroft AI announced their second device, the Mycroft Mark II, which you still cannot buy in 2021, but which might improve the development experience in the future.
Getting started with Mycroft
That said, if you get it running, the development process is way easier than creating an Alexa Skill. First of all, you will develop everything in Python. Nice!
Secondly, all Skills are just folders on the device, located in /opt/mycroft/skills. If you want to create a new Skill, you only have to create a new folder called mySkill.mycroftai. In there, you have to create a tree of files, as shown here:
mySkill.mycroftai
    vocab
        en-us
            patient.voc      (For invocations like 'Hey Mycroft, name all patients.')
            patient.intent   (For invocations like 'Hey Mycroft, what is the age of patient X?')
    dialog
        en-us
            answer.dialog    (For answers like 'The patient is Y years old')
    __init__.py              (For the logic)
And that is everything necessary to set up a new Skill! It will even hot reload if you change some code.
Utterances & Intents
Mycroft has two ways of interaction. First of all, you won't "enter" a Skill, because that is not a thing with Mycroft; you can hop between Skills at any moment. This can also be a disaster, for example when Mycroft updated its own weather Skill to recognize basically everything as a request, which is exactly what happened at the summer school. However, as everything is local, you can simply modify or delete the offending Skill and all is good!
An Utterance is basically an invocation without a slot, for example 'Hey Mycroft, name all patients.'. An invocation with a Slot is called an Intent, and here the fun really begins. As I mentioned earlier, the weather Skill matched everything. How, you ask? Well, as it is open source, you can see that the matching is done with regular expressions, without limitations.
For example, the Skill mentioned above would have a file patient.intent in the vocab folder that includes the line What is the age of patient {name}?
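To make this concrete, such a patient.intent file is just a plain text file listing example sentences, one per line, with {name} marking the slot (these exact wordings are only an illustration):
What is the age of patient {name}
How old is patient {name}
Tell me the age of patient {name}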
The Code
Your __init__.py might then look something like this (with a placeholder lookup standing in for the real patient data):
from adapt.intent import IntentBuilder
from mycroft.skills.core import MycroftSkill, intent_file_handler

class PatientSkill(MycroftSkill):
    def __init__(self):
        super(PatientSkill, self).__init__(name="PatientSkill")

    @intent_file_handler("patient.intent")
    def handle_patient_intent(self, message):
        # The {name} slot from patient.intent arrives in the message data.
        name = message.data.get('name')
        # Look up the age of the patient; a real Skill would query a
        # database or an API here.
        age = self.look_up_age(name)
        summary = "Patient " + name + " is " + str(age) + " years old."
        self.speak(summary)

    def look_up_age(self, name):
        # Placeholder so the example runs end to end.
        return 42

def create_skill():
    return PatientSkill()
Well, this is extremely easy, isn’t it? And that is the whole point. Because there is no overhead from hosting all the code on a server, development is significantly sped up.
Comparison of both systems
To compare both systems, I will list the positive and negative sides of each.
Alexa
Pros:
- Incredibly robust interaction system
- Excellent voice recognition
Cons:
- All your data is transferred to Amazon servers, including your code
- Complicated setup
- Bad documentation by Amazon
Mycroft
Pros:
- Open Source
- Good voice recognition
Cons:
- Slow processing and recognition
- Still in early development
- Bad documentation
As you can see, neither system is perfect. However, I hope you could see how easy it is to develop a Mycroft Skill. Especially in a privacy-sensitive environment, local-only processing might be worth it. If you would like to try it out, you can use my code from my GitHub:
github.com/lucasvog/mycroft-medical-example
Thanks for reading!