During a recently concluded 12-month study of the Alexa Skills Store review process, academics said they managed to smuggle 234 policy-breaking Alexa skills (apps) into the official Alexa store.
The study's results are actually worse than they look, because the academics tried to upload 234 policy-breaking apps and managed to get all of them approved without serious difficulty.
"Surprisingly, we successfully certified 193 skills on their first submission," the research team wrote this week on a website detailing their findings.
The research team said that 41 Alexa skills were rejected during the first submission, but they eventually got them on the official store after a second try.
The purpose of this peculiar research project was to test Amazon's skills review process for the Alexa Skills Store, the web portal where users go to install apps for their Alexa device.
Over the past few years, prior academic work [1, 2, 3, 4] revealed that research teams had no difficulty uploading malicious Alexa skills to the official store, which they used to run their experiments.
With each project, researchers warned Amazon that the skills review process was insufficient, Amazon promised to do better, and then new research would come out months later, showing that researchers were still able to upload malicious skills regardless of Amazon's promises.
Placing policy-breaking skills in the kids category
During this experiment, the research team put together an ensemble of 234 Alexa skills that violated basic Amazon policies.
These were apps that weren't overtly malicious, but merely provided prohibited information in response to user questions, or collected private information by asking Alexa users for their names and other personal details.
The research team uploaded the apps to the Alexa Skills Store and got them approved and certified for the kids section of the store, where policies should be more strictly enforced than in other sections.
Example Alexa skills the research team got listed on the kids section include:
- An Alexa skill that provided instructions on how to build a firearm silencer (hidden inside a kids crafts skill)
- An Alexa skill recommending the usage of a recreational drug (hidden inside a kids desert facts skill)
- An Alexa skill pushing advertising (hidden inside a geography facts skill)
- An Alexa skill collecting children's names (hidden inside a storytelling skill)
- An Alexa skill collecting health data (hidden inside a healthcare skill)
The academic team cited several reasons why they were able to publish all their policy-violating skills on the official store:
- Inconsistency in checking - Researchers said that different skills breaking the same policy received different feedback from reviewers, suggesting that reviewers weren't viewing or applying Amazon policies in the same way across submissions.
- Limited voice checking - Reviewers did limited checking of the skill's voice commands and its code. This allows threat actors to publish malicious apps on the official store just by delaying the initial malicious responses, enough to bypass the short review process.
- Overtrust placed on developers - Researchers said that Amazon seems to naively trust skill developers and will approve skills based on answers developers provide in forms submitted during the skill review process. This allowed the researchers to claim that their app didn't collect user information, something that Amazon never verified during the actual review.
- Humans are involved in certification - The research team said that the inconsistency in various skill certifications and rejections led them to believe that skill certification largely relies on manual testing, as some issues could have been detected by automated systems.
- Negligence during certification - The review process wasn't thorough enough to detect obvious policy-breaking skills.
- Possibly outsourced and not conducted in the US - Based on skill review timestamps, some reviews appear to have been conducted by non-native English speakers or by reviewers not familiar with US laws.
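To illustrate the researchers' point that some issues could have been caught automatically, and why a short review can miss a delayed response, here is a minimal Python sketch. It is not Amazon's review system and not the researchers' tooling: the patterns, the function names (flag_responses, scan_skill) and the sample skill are all hypothetical, and a real checker would need far more than a few keyword patterns.

```python
import re

# Hypothetical patterns for prompts that ask a user for personal details.
# These are illustrative assumptions, not rules taken from Amazon's policies.
PERSONAL_DATA_PATTERNS = [
    re.compile(r"\bwhat(?:'s| is) your (?:full )?name\b", re.IGNORECASE),
    re.compile(r"\bhow old are you\b", re.IGNORECASE),
    re.compile(r"\bwhere do you live\b", re.IGNORECASE),
]


def flag_responses(responses):
    """Return the responses that match any personal-data pattern."""
    return [
        text for text in responses
        if any(p.search(text) for p in PERSONAL_DATA_PATTERNS)
    ]


def scan_skill(responses_by_turn, max_turns=None):
    """Scan a skill's responses turn by turn.

    responses_by_turn: one list of response strings per conversation turn.
    max_turns: how many turns the reviewer exercises (None means all).
    """
    if max_turns is None:
        turns = responses_by_turn
    else:
        turns = responses_by_turn[:max_turns]
    flagged = []
    for turn in turns:
        flagged.extend(flag_responses(turn))
    return flagged


# Made-up storytelling skill that only asks for a name on the third turn.
sample_skill = [
    ["Welcome to story time!"],
    ["Once upon a time there was a dragon."],
    ["Before we go on, what is your name?"],
]

print(scan_skill(sample_skill, max_turns=2))  # a short review sees nothing
print(scan_skill(sample_skill))               # a full scan flags the prompt
```

In this toy example, a review that stops after two turns flags nothing, while scanning every turn catches the request for a name.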
Review of current kids skills
After conducting their research, the academic team removed their malicious skills, to avoid having users accidentally stumble across them and install them on their devices.
However, the research team also wanted to know if other bad skills had made it onto the official Alexa Skills Store in the past. They did this by selecting 2,085 negative reviews from skills listed in the kids category, and identifying the 825 Alexa skills on which they were posted.
"Through dynamic testing of 825 skills, we identified 52 problematic skills with policy violations and 51 broken skills under the kids category," researchers said.
This included Alexa skills that were suspected of collecting user information, skills that included ads, or skills that promised various compensations for positive reviews on the Alexa store.
Amazon disagrees with the study but promises to do better
In an email today, Amazon disagreed with the report's findings, citing additional processes that are involved in the review of child-directed skills that the research team didn't take into consideration.
This included additional audits for child-centered skills that take place after skills are listed and certified on the official store and a skill monitoring system that scans skill responses for inappropriate content.
Since the "bad" apps were removed immediately after getting certified, these additional systems didn't get to kick in.
"Customer trust is our top priority and we take violations of our Alexa Skill policies seriously," an Amazon spokesperson told ZDNet.
"We conduct security and policy reviews as part of skill certification and have systems in place to continually monitor live skills for potentially malicious behavior or policy violations. Any offending skills we identify are blocked during certification or quickly deactivated.
"We are constantly improving these mechanisms and have put additional certification checks in place to further protect our customers. We appreciate the work of independent researchers who help bring potential issues to our attention."
Whether these new certification checks will make a difference remains to be seen, most likely during a future round of research.
Additional details are available in a paper titled "Dangerous Skills Got Certified: Measuring the Trustworthiness of Amazon Alexa Platform" [PDF] that was presented this week at the FTC's PrivacyCon 2020 virtual conference.
The research team also ran similar tests on the Google Assistant store, but said that Google's certification process performed much better.
"While Google does do a better job in the certification process based on our preliminary measurement, it is still not perfect and it does have potentially exploitable flaws that need to be tested more in the future," researchers said.
"In total, we submitted 273 policy-violating actions that are required by Amazon/Google, and observe if they can pass the certification. As a result, 116 of them got approved. We submitted 85 actions for kids and got 15 approved; for other categories, 101 actions approved among 188 actions."
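For readers who want to compare the two platforms, the figures quoted in this article can be turned into approval rates with a few lines of Python. This is only a back-of-the-envelope sketch using the counts reported above; the approval_rate helper and the dictionary layout are our own, not something taken from the paper.

```python
def approval_rate(approved, submitted):
    """Return the share of submissions that were approved, in percent."""
    return 100.0 * approved / submitted


# Counts as reported in the article: (approved, submitted).
alexa = {
    "first submission": (193, 234),
    "after resubmission": (234, 234),
}
google = {
    "kids": (15, 85),
    "other categories": (101, 188),
}

# The per-category Google counts should add up to the quoted totals.
google_total = (
    sum(approved for approved, _ in google.values()),
    sum(submitted for _, submitted in google.values()),
)
assert google_total == (116, 273)

for label, (approved, submitted) in alexa.items():
    print(f"Alexa, {label}: {approval_rate(approved, submitted):.1f}%")
for label, (approved, submitted) in google.items():
    print(f"Google, {label}: {approval_rate(approved, submitted):.1f}%")
print(f"Google, overall: {approval_rate(*google_total):.1f}%")
```

By these numbers, roughly 82% of the Alexa skills passed on the first attempt and all of them passed eventually, while about 42% of the Google actions were approved overall, and under 18% of those in the kids category.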
Here is an example of Assistant actions (apps) that were approved during tests, collecting children's names: