Episode 3: How Machine Learning powers noise cancelation software Krisp

On the 3rd episode, Tigran Petrosyan hosted Davit Baghdasaryan - the co-founder and CEO of Krisp, the world’s number one noise-canceling app powered by AI. Before Krisp, Davit used to lead product security for Twilio product lines. Here, he shared the founding story of Krisp, how they turned their AI innovation into a product that is now loved by millions of users and businesses. Davit also introduced the latest Krisp 2.0 release and how its transcription technology allows users to easily generate summaries of their conversations and take quick actions following them.

TIGRAN PETROSYAN: Hello, everyone. I'm very excited to welcome you to the next episode of the Real-Life AI podcast series. Today I'm very excited to have Davit Baghdasaryan as my guest. Davit is the co-founder and CEO at Krisp AI. 

I have known Krisp for a while already, and it's particularly exciting for me since I've been seeing the development of the company firsthand. I’ve been seeing how quickly it became one of the key players in the audio and audio AI space, how it is widespread all over the world right now, and how much that influenced us as a company to start and see its success. 

So, Davit, welcome.

DAVIT BAGHDASARYAN: Thanks, Tigran. It's a pleasure to be on this podcast, and it's a pleasure to be your guest, in particular. I'm excited to see what we will be talking about today. 

TIGRAN PETROSYAN: Perfect. Yeah, just to get started, can you share your founding story and how you decided to start Krisp? 

DAVIT BAGHDASARYAN: Yeah, absolutely. I used to work at Twilio, so I was living in San Francisco, and I was traveling a lot to Armenia, where I am originally from. And every time I was in Yerevan, Armenia, during the summers, and if you haven't been to Yerevan in the summertime, in the evenings, you don't want to be at home; you want to be outside because of the great weather and everything. But my team was in San Francisco, so because of the time zone difference, all my morning Pacific meetings were evening time in Yerevan. So I always ended up wishing there was a button I could click that would give me some privacy and would hide where I'm calling from. So that was like the natural need. I had a pain I needed to solve. So I thought to myself, why don’t people use AI for this? It was back in 2016 when I met my soon-to-be co-founder, Arto, and I shared this idea with him, and apparently, he was looking for an interesting idea in AI and voice to start working on. So there was a really good match at that point, and that's how it all started. So, Arto started working on this with our Chief Scientist, Stepan, and at some point a year after that, we decided to start the formal company and pursue this journey. 

TIGRAN PETROSYAN: Yeah, super exciting. I know Arto as well because we were childhood friends. We started school together, and it was interesting to see all this as we were discussing the idea and he was sharing it with me. And, of course, as with any idea, initially, it seemed quite impossible or quite difficult to achieve. Can you share, first, what exactly Krisp does, what the original goal was at the beginning, and how you made the impossible possible with time? 

DAVIT BAGHDASARYAN: Yeah, absolutely. So Krisp removes background noise from conversations like this, from online meetings. Basically, it uses AI to differentiate what's background noise and what's the human voice and separates them from each other in real time. And we pioneered this technology in the industry in 2017. Before that, state-of-the-art noise cancellation was very multi-microphone dependent, so hardware dependent. What we have done is remove the hardware dependency by doing everything in software using AI. And when you do it with AI, the accuracy improves, and more difficult noises can be entirely removed, unlike if you do it with multiple microphones. That's the problem we were trying to solve, but the funny thing is that we didn't have any experience in audio, so we decided to approach the problem with AI only, which I think a lot of the audio experts considered to be a silly idea at that point. So I always joke that if we had more experience in audio, we probably wouldn't have started this company because we would think this is either a really difficult problem or that doing it through AI is a bad idea. Sometimes, knowing more harms you more. 

TIGRAN PETROSYAN: Yeah, I can certainly relate to that. When we started our company straight out of the Ph.D., we had absolutely no clue about how difficult it is to build a business. I mean, having technology is one thing, and then there is building the business, product, customers, fundraising, and really the chances of succeeding in an area which is super hot, knowing that big tech companies are building it as well. That I can certainly relate to because the more you know, the more difficult it becomes. And then, magically, it works when you have a very strong focus on what you do, what you want to achieve, and going beyond what is possible. Can you share a little more about how you're leveraging AI with Krisp and why you feel like your technology is the best in the market? Because I know that there are other noise-canceling technologies. Google Meet and Zoom are also trying to incorporate something along those lines. Yeah, can you share more about it? 

DAVIT BAGHDASARYAN: Yeah. So from day one, we were very serious about the quality of this technology. The difficulty when it comes to voice and audio is that our ears are very sensitive to any artifacts or any excessive suppression that you might introduce. So while it's easy to get some early results in noise cancellation with AI, what really is difficult is making it usable at production quality. So, that's the gap, right? There is a big gap there. When you process voice, obviously, you want to understand whether you have impacted the voice itself or not, right? Because if you degrade the voice, it's not going to be usable, and it's really hard not to degrade the voice because of the difficulty of the DSP and math involved here. So there are some metrics that the audio industry, or the voice industry, has been using for many years. And I even remember very early on, when we had tiny funding, we found this great company in Germany, Head Acoustics, which was the best in the world when it comes to these tests. We partnered with them to run tests and understand what our objective metrics are for voice degradation and noise cancellation. From there on, we have been constantly running these metrics and also running the competing technologies through these metrics. And that's what sort of gives us a lot of confidence that what we are doing is very high quality and is the best out there in the industry. You might think that it's noise cancellation, and we do it once, and that's it, but that's not really the case. This is almost the 6th year, and we continue improving this technology because there are so, so many noises that we have in real life. Some of the noises are very close to the human voice, so it's really difficult to differentiate them, and what's also very difficult is if you want to separate your voice, the primary speaker's voice, from a background voice that is very close. That's another sort of variation of the problem that we have solved and continue improving on. So the topic is really sophisticated, but we have very strong tooling for comparison, for benchmarks, and so on. 

TIGRAN PETROSYAN: Yeah, I can absolutely relate because I remember when the pandemic just hit, and we were just getting started working from home. I believe Krisp was known before that, but I felt like this was a real inflection point where so many people started working from home. And then, of course, at home, there's so much noise. The dogs are barking, kids are crying, and outside there is some construction, and any kind of noise was so annoying to hear, not only for the other side but also for me as well. And that was kind of a game changer for me when I started working from home, and I really felt like I hadn't even left the office. And I can imagine this also brought you a lot of users right away. Can you share how that experience was? Because suddenly you have triple, 10x more users in a very short time. How did you handle this load? Generally, this is kind of a startup question rather than a technology question. 

DAVIT BAGHDASARYAN: Sure, yeah, it's interesting. We launched our product (Beta), but actually, it was like, really, our first time live. It went live in June 2019, so almost eight months later or nine months later, COVID-19 hit, and a month prior to COVID-19, we had hired our first salesperson. So we were sort of ready, but we were not ready, really. We were just lucky, in a sense, that we had some structure in place when this all started, right? And you're right, when people went home during COVID-19, there was no solution for removing noise, and a lot of people started looking for such solutions, and we already had, like, SEO in place, so they found us. We were sort of ready for that, right? And the nature of Krisp is that you don't know right now that I'm using this unless I start showing it or promoting it by word of mouth, right? You won't know. And a lot of people started doing that because it was a very magical experience. They had never experienced it. They started to tell others, their colleagues, and their partners, and post on social media, and all of that brought a lot of new sign-ups for Krisp. And not only sign-ups: we were given four or five awards in 2020, like AI companies to watch, best 50 AI companies, right? Or Cloud AI companies, I don't remember the exact name of the award. And then the Webby Award, and then one of the most innovative companies out there, and then a Gartner Cool Vendor. There was a lot of love coming from the industry, and it was awesome, right? It was a great time. If you're building a startup, that's what you want, right? You feel your startup’s growth, and yes, everything grew, like 10x and more, and a lot of things broke, right? One interesting example that I vividly remember is that we signed our first B2B deal within three days. It was our first B2B deal. I think it was like $50,000 or something, and, like, a month prior to that, we had hired our first salesperson. A lot of interesting stuff happened in 2020, for sure. 

TIGRAN PETROSYAN: Yeah. That's very exciting. So, let's touch a little bit on the users. Can you share more about who the main users of Krisp are on the B2B side and on the B2C side? 

DAVIT BAGHDASARYAN: Yeah, and that's a good question. I mean, when COVID-19 hit, we didn't even know who our users should be. Like, we were not clear yet. We had some understanding of it. Like, they were like remote workers, which was a thing, but a very new thing. And it became a thing after COVID-19, right? Or, you know, like there were a lot of online teachers that were happily using Krisp, but was that a segment that we had to focus on? Right? Pretty much because noise cancellation is such a universal problem, everyone needs that. Everyone has that pain. Apparently, some people have higher pain than others, which was very difficult to understand during COVID-19 because everyone was having a big pain. So that's good and bad because it's a large audience, but at the same time, it doesn't let you focus. Having an ICP is so important for the focus of the company, especially in B2B, well, even if you're in B2C, that's the case. So, like today, Krisp has a lot of traction in call centers. When I talk B2B, clearly, a lot of pain in call centers. This hasn't been a new problem in call centers. It has accelerated because all the old solutions were just impossible because the agents were working from home, right? So that's a big use case. It continues to grow for us. And salespeople, like people who have external calls, have higher pain, because they care more about professional calls and so on. But we have a lot of usage for internal communication within the companies as well, right? Whether those are just developers talking to each other or pretty much any team that is doing online calls internally. 

TIGRAN PETROSYAN: Yeah. It's so cool to see how widespread the applications can be for such technology. At first sight, it might look simple, but I really understand how difficult it actually is to achieve such clean noise cancellation, which I'm so proud you guys have achieved and continue to improve. But what I want to focus more on next is Krisp 2.0. I've seen a very exciting announcement lately. Maybe I will let you elaborate more on what Krisp 2.0 is and take it from there. 

DAVIT BAGHDASARYAN: Sure. Yeah. So look, the way Krisp works, it's an app that you install on your laptop, and then you configure it as your microphone and speaker, right? You can do it at the system level, or you can choose it from a target application like Podcastle that we are using right now, or like Zoom or Teams, or any other one. So out of the box, Krisp supports hundreds of such applications which work with the microphone and are compatible with Krisp. And we always thought that, all right, we are within the conversation. We are making the conversation better, making communication more effective by removing the noise. And as a company, our mission is to make all communication, all voice communication, more effective. So what is the next thing that we want to do? And clearly, when you think about communication, voice communication is essential for humanity, right? Everything starts with voice, right? Whether that's business or other, especially in business communication. And you can break it into two parts: the first part is what happens during the meeting, and the second is post-meeting, right? Why post-meeting? Because there are certain things you discuss and there are action items, there are follow-ups to that meeting. So we are considering both parts of voice communication. When you zoom into voice communication itself, there are so many things right there that can make our communication better. That starts from, you know, the clarity of the audio and voice, the quality of voice, and goes to understanding accents, understanding the language of the other person. 

And then, it goes to the effectiveness of the conversation, which is how you say things: are you to the point, or are you too verbose? Right? So there are so many things you can improve by not just reporting back to the user but making suggestions while the conversation is happening. Very interesting space. All of that is super interesting for Krisp, right? Everything in that space is super interesting for us, again, because it sort of goes hand-in-hand with our mission. Now, we always wanted to get into the content of the conversation so that we can create more value and help our users better, and the way we have designed Krisp, and this is pretty important, we have designed our technology to run on a device. We never wanted to see the voice recordings or voice conversations, audio conversations, on our servers; that's another thing that is very special about our technology, right? It runs on your device and does all this magic. Now, having said all that, we started working on transcription technology and speech-to-text technology a while ago, right? And the criterion for us was to do the transcription on a device rather than sending the audio to our cloud and doing it there like others are doing in the industry. It's a really difficult problem to do it on a device, and I'm not even sure if there are other companies doing this in near real time. And that's what we shipped a week ago, right? Or ten days ago. So basically, we have incorporated speech-to-text technology inside Krisp. And now, when I’m speaking in a conversation or in a call like this, all the conversations are being transcribed for me. We are not recording the meeting; we are pretty much not touching the audio, right? Or we're not saving the audio in any way. But once the meeting ends, we generate the summary and discussion items for the conversation, and we do it in such a seamless UX where you don't have to add a bot to your call, which is a weird, awkward experience, right? It just happens. I don't do anything, and all my conversations are summarized for me for my later use. Yeah, that's what we launched ten days ago, and we have big, big plans that we're going to add on top of this down the road. 

TIGRAN PETROSYAN: Yeah, this is so fascinating. I can see this problem for myself. Taking notes while speaking and while listening to the speaker and understanding what they say and reflecting is a really difficult problem, and I was facing that. And then having those notes properly managed on paper or in some documents, so that I'm able to find the right info and summarize. This was really a nightmare for me, and as I started leveraging Krisp more and more lately, I could see how much of a life-changing event that was for me. So I can imagine so many people starting to use it and really seeing that big change in their lives with this transcription, note-taking, and summarizing. Can you share a little bit more about the summarization part? Are you leveraging any generative AI applications, and how are you leveraging them, since this is such an exciting direction of AI lately? 

DAVIT BAGHDASARYAN: Yeah, absolutely. So we are using the latest LLMs for this, and the quality is surprising. We always had this vision that we need to summarize the calls and generate notes and everything. A year ago, we thought the industry was far away from this, and we were even considering starting to build our own technology here and seeing whether we could make any progress or not. But within the last six months, that has changed. I think after ChatGPT, people realized more of the possibilities, right? So I would say that I'm very surprised by the quality that it produces. You need to spend a lot of time on proper prompt engineering and everything to sort of tune it to do this right for this use case. But the quality is just amazing, and I can see that it's going to get better and better over time. Again, as I said, with proper prompt engineering and newer versions, so yeah. And furthermore, I think you can use LLMs for more than just summarization; they're just super clever technologies, you know, that are still going to create a lot and surprise a lot of people with how many new things they can do. So we have big plans around the LLMs and are super excited about that. 

TIGRAN PETROSYAN: If it's not a secret, can you share some of the big plans? Next steps for Krisp?

DAVIT BAGHDASARYAN: Yeah, as I said, everything related to making online conversation more effective is interesting to us, right? And there are certain things that you can count on, like understanding different accents is a pain, and understanding different languages is a pain. Generating really high-quality summaries, action items, and follow-ups is a pain. And then multiply that by a team when you are in a company and scale it to the company's pains. Right? I think I just shared a ten-year roadmap with you of what you can build both for individuals and for enterprises. And doing this across the company and different devices, different use cases, there's a lot here. 

TIGRAN PETROSYAN: Yeah. This is certainly so exciting, to see already what we can do with noise cancellation, the meeting assistance, transcription, the summary, and just imagining all that is coming up in the next few years is mind-blowing. Of course, we are all so mind-blown by ChatGPT and other generative AI tech, and seeing that coming to those technologies is just going to make our lives so much easier. And I want to thank Davit and Krisp so much for making this a big part of our lives and making our lives easier, making it in a way that is very user-friendly, very secure for us, for our data, and then constantly pushing the boundaries to levels that were not possible before. I'm very excited to have Davit as my guest in this Real-Life AI podcast. 

Just to summarize, today we've talked about Krisp, its founding story, the technology around noise cancellation, Krisp 2.0, how it not only cancels the noise but also transcribes the voice, creates some summaries and some follow-up steps for you, and then what comes next in the future. So stay tuned with Krisp, use Krisp, and make sure that your life gets better with Krisp. Thanks, Davit. 

DAVIT BAGHDASARYAN: Thanks a lot, Tigran. Bye.