Meet Kelly Davis, the Manager/Technical Lead of the machine learning group at Mozilla. His work at Mozilla includes developing an open speech recognition system with projects like Common Voice and Deep Speech (which you can help contribute to). Read on to learn about his passion for physics and machine learning, how he envisions the future of AI, and the advice he offers to young people looking to enter the field.
We interviewed Kelly as part of AI4ALL’s Role Models in AI series, where we feature the perspectives of people working in AI.
As told to Nicole Halmi of AI4ALL by Kelly Davis; edited by Panchami Bhat
NH: How did you decide to get a bachelor’s and pursue a PhD in physics? Were you interested in the field at a young age, or did you discover it in college? And how did you come to specialize in machine learning?
KD:
Physics, and understanding our world at a deeper level, fascinated me endlessly when I was young. The more I learned about the world, the more I retained this sense of awe.
In the late 90s, during the first internet boom, I was living in Washington DC, and my friends and I could dimly see the future of technology: that computers would be able to talk to us, understand us, and hold conversations with us. Instead of developing a start-up directly tied to that, we decided to start an art collective. We’d create installation pieces, and for these pieces to interact with gallery goers, we ended up learning about neural networks and machine learning. That’s sort of the first part of how I got interested in machine learning.
Later, in 2011, I created a start-up with another friend from the German Research Center for Artificial Intelligence (DFKI). The start-up created AI agents to answer common knowledge questions using the web as a data source. Machine learning was our secret weapon, as we needed machines to learn to do things that we couldn’t do, and to write code we didn’t have the capacity to write. We delved deeply into machine learning technologies to create this agent.
Then I joined Mozilla in 2015 as part of Firefox OS, a smartphone operating system Mozilla was working on at the time. My role was to create a virtual assistant, much like Siri but for Firefox OS. Unfortunately, Firefox OS didn’t pan out as everyone would’ve liked. However, in trying to create this virtual assistant, I recognized there were gaping holes in the open source community for speech recognition and associated speech datasets. So, I worked on trying to patch these holes. Initially, the machine learning group was focused on creating a speech recognition engine and also on collecting data that we could open source to actually train this engine. Because of our success, we now have more freedom to look at projects ranging from automatic summarization to speech synthesis to conversational agents.
Can you describe what you do as a machine learning researcher at Mozilla? What does a typical day look like for you? What kind of projects do you work on?
I’m a manager of the machine learning group, so a lot of what I do is tending to the flock. I wake up early because my focus is best in the morning. I usually start my day reading and writing research papers, taking my dogs for a walk around the Spree River (which runs through the middle of Berlin), eating breakfast, and answering emails.
After that, there’s a myriad of things I work on, starting with one-on-one meetings with my team. I check their progress on projects, and provide support with any barriers they’re encountering in their work. We also do an internal journal club, where every week someone presents a research paper that they’ve found interesting.
I also meet with Mozilla’s external partners. Partners may want to help pool data resources for Common Voice, or talk about the internationalization effort that’s going on now at Common Voice. There are also internal groups at Mozilla that are using our software, and I may meet with them to see what we can change or improve, or to understand new machine learning technologies that we can supply.
Can you talk about Project Common Voice? What is the goal of this project? Why is it important? Can people still contribute?
Common Voice and Deep Speech are our two-pronged approach to actually opening up speech recognition. Deep Speech in particular is about opening speech recognition algorithms and associated models to the world. We’ve basically created our own speech recognition engine from scratch, using the TensorFlow machine learning framework.
One of the big problems in training such a speech recognition system is there’s not enough data that’s in the open. Existing data is controlled by a few big companies, and open datasets are basically only available in English. Data available for purchase isn’t sufficient to produce a production-quality speech recognition engine.
Common Voice is addressing this data problem. Individuals can record themselves reading sentences, and we’ll save that data. Alternatively, individuals can listen to another person reading a sentence out loud, and then verify whether or not that person accurately spoke the sentence displayed. The dataset of sentences and associated audio that we’re collecting from Common Voice is used to train speech recognition engines.
We’re starting with English but we’re going to expand into other languages. Then we’ll be able to create open speech recognition engines and open models in various languages and various accents, irrespective of gender, sex, or age. We’re opening up speech to the world.
What has been the proudest or most exciting moment in your work so far?
When our speech recognition system became super-human. We benchmark our speech recognition engine against a particular dataset. The dataset has been tested on humans, so we know that around 5.8% of words are incorrectly understood by humans for this particular dataset. Just recently our speech recognition system actually beat this human benchmark, with 5.6% of words incorrectly understood.
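The percentages quoted above read like word error rates, though the interview does not name the metric or the dataset, so treat that as an assumption. As a minimal sketch of how such a figure is commonly computed, the hypothetical function below counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, then divides by the number of reference words. It is an illustration only, not Mozilla’s benchmarking code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and exact, case-sensitive word matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
    print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a rate of 5.6% means roughly 56 word errors per 1,000 reference words, compared with about 58 for the human transcribers mentioned above.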
Where do you see AI making the biggest impact in the next 5 years? What are some of the important things people should be doing to create a positive future for AI?
It’s become harder to answer the five-year question. A lot of AI and deep learning work has become very successful. Twenty years ago it was easier to see 5 years into the future, whereas now it’s becoming harder and harder because things that seemed impossible are now becoming possible.
AlphaGo is a concrete example of that. Two years ago, people generally thought that creating an algorithm to actually beat the top professional in Go was a 10-year problem. They thought, “Oh, in 10 years we might be able to do this.” However, a year later, there was AlphaGo and it had beaten the best professionals at Go. It’s becoming harder to predict what’s going to happen 5 years out because of this compounding of AI progress.
To create a more positive future for AI, we need to expand and diversify the pool of talent that’s working on AI.
A problem that people encounter in AI, or in any research field for that matter, occurs when there is a unified view of particular problems. It’s not conducive to finding solutions if everyone does the same frontal attack and they’re all failing. However, with a diversity of viewpoints and ideas within a particular field, problems become more easily solvable. This is core to the work of Mozilla’s Open Innovation team, which is designing projects, infrastructure, and incentives to best allow for collaborative, multiperspective problem solving.
Another reason for diversification is that there’s a widening economic gap, especially in the US. It’s clear that AI has become, and will become, more prevalent, and people working on AI will have relatively well-paid jobs. One way to narrow the economic gap is to diversify the talent pool in AI.
What advice do you have for young people who are interested in AI who might just be starting their career journeys?
Two pieces of advice. First: learn the basics, and learn them well. Make sure you know linear algebra, probability theory, information theory, and algorithms, because these fundamental units are used again and again in the work you’ll be doing for years to come.
Second: learn how to learn. Throughout your career you’ll be called upon to learn new things, whether algorithms, technologies, or machine learning techniques. If you take the time to understand how you learn best, the effort you invest will pay off and your knowledge will compound over time.
About Kelly
Kelly Davis studied Mathematics and Physics at MIT, then went on to graduate work in Superstring Theory/M-Theory, working with “Genius Grantees” (MacArthur Fellows) such as Daniel Friedan, Nathan Seiberg, and Stephen Shenker. He then went on to code, joining a startup that eventually went public in the late 90s. After that he decided to move to Berlin and come on board a startup working on natural language understanding.
In 2002 he joined the Max Planck Institute for Gravitational Physics, where he worked on software systems used to help simulate black hole mergers. Jumping the fence again, he went back to industry and worked at Mental Images/NVIDIA writing 3D rendering software. After that he worked on natural language understanding at a startup, 42, creating a system, based on IBM’s Watson, that is able to answer general knowledge questions. Kelly joined Mozilla in 2015, where he now leads the machine learning group working on STT, TTS, NLU, and various other machine learning problems. He’s based in the Mozilla Berlin office.
Follow along with AI4ALL’s Role Models in AI series on Twitter and Facebook at #rolemodelsinAI. We’ll be publishing a new interview with an AI expert on Wednesdays this winter. The experts we feature are working in AI in a variety of roles and have taken a variety of paths to get there. They bring to life the importance of including a diversity of voices in the development and use of AI.