Perhaps you’ve heard about our brilliant artificially intelligent future: self-driving cars, voice-based interfaces, instant translation, self-service chatbots – all based on software that simplifies and automates the complexities of life in the information age. It's a market that's predicted to grow to as much as $40 billion worldwide by 2020; when you add machine learning, that number is closer to $125 billion.
Did you ever wonder what makes those Artificial Intelligence (AI) systems so smart?
It’s people. Big crowds of people. And when it comes to crowds, quality counts.
It Takes a Village To Raise an AI. Just like a human child, every advanced AI system needs to be “trained” before it can interact properly and respond appropriately in different situations. Kids need parents, teachers, books and experiences. AIs need data sets, machine learning systems and actual human beings to interact with.
To a great extent, the quality of the AI system once it's deployed -- its ability to understand and interpret ordinary language, its ability to make appropriate decisions from clues and context, and its accuracy in anticipating the needs of human users -- depends on the quality of interaction during its initial development phase.
This is enough of a problem that data scientists spend 90% of their time ensuring data quality. And while companies like Apple, Google and Microsoft are investing billions in creating next-generation AI experiences like Siri, hardly anyone has a solution for getting those AIs up to speed with high-quality interactions. That’s a huge bottleneck for AI systems that depend on massive amounts of data to shape their learning and interactions with humans.
Building a Data Refinery. Data scientist Dr. Daniela Braga has been working on this problem for more than a decade as a researcher for Microsoft and VoiceBox. In 2015, she and a partner launched DefinedCrowd, a Seattle-based startup that promises a unique approach to improving the performance and dependability of AI systems faster, at lower cost, and with more controlled processes. And today the company announced the launch of its first product.
“Imagine you have an expensive, high-performance Ferrari sports car,” says Braga, CEO of DefinedCrowd. “You don’t want to put crude oil in the tank; it will ruin the engine. You need refined, high-test gasoline to really make it go. AI systems are the same way. If you start with bad data, you will get bad results.”
DefinedCrowd, she explains, is a refinery for data.
Like an oil refinery, DefinedCrowd uses a multi-step process to ensure that AI systems receive the highest-quality input and the most efficient training.
“First we apply data science,” explains Braga. Specifically, the company developed templates and processes to streamline the training process, focusing on critical steps and concepts in a systematic way.
Then come the crowds. DefinedCrowd has built a network of qualified experts and students to interact with AI systems according to the templated procedures. They call this "Crowd-as-a-Service," and it combines the scale, flexibility and cost advantages of crowdsourcing with the ability to precisely target the right people for the right tasks.
Finally, DefinedCrowd applies machine learning algorithms to turbo-charge the knowledge-acquisition and logic processes.
Quality in, quality out. If you wonder why the process of training an AI system matters, look no further than Tay, the chatbot that Microsoft turned loose on Twitter in March 2016 for crowdsourced optimization at scale. It took less than 24 hours of exposure to an “undefined” crowd to turn Tay’s innocent conversation into a stream of racist hatred.
DefinedCrowd says they eliminate the uncertainties of mass crowdsourcing by recruiting selected individuals and assigning them a clear role in the training process. The data science and machine learning layers ensure data quality and process integrity, resulting in significantly faster system optimization with far less impact on the client’s data science resources and staff. Considering how tight the labor market for data scientists is, that productivity boost by itself is a huge benefit.
From idea to product. Since DefinedCrowd opened its doors in August 2015, it has attracted attention from customers including Google; Nuance Communications, the global leader in speech solutions; VoiceBox; and GBO, a hot Bay Area startup.
Today the company announced the alpha release of its Software-as-a-Service (SaaS) platform for enterprise customers. According to the company, the alpha version focuses on data workflows for speech technologies and natural language processing (NLP).
The platform is optimized for AI and machine learning (ML) applications and enables enterprise data scientists to collect and enrich data from scratch or to have their own data cleaned and structured. “Instead of spending months building engineering infrastructure to collect and label data, with the new platform from DefinedCrowd, data scientists can select from a series of specialized built-in data workflows that will guide them through a smart combination of humans in the loop and machine learning models in 46 languages to deliver back high quality structured data in a faster turnaround time than typically experienced,” said Braga.
In the alpha version, the data workflows released include speech data collection, speech transcription and validation, data collection and semantic tagging for chatbot building, sentiment tagging, and semantic annotation.
Braga says she hopes the release of the new product, the first on the company's roadmap, will help companies develop, optimize and deploy their AI and ML solutions faster, so that those tools can help us poor humans cope with the complexities of the digital world more easily and make better decisions.
People helping machines help people. That sounds like an idea that might just draw a crowd.