Episode 1: AI talks with Trevor Darrell from UC Berkeley
TIGRAN PETROSYAN: Hello, everyone. I'm excited to welcome you all to our new podcast series called Real Life AI. I'm your host, Tigran Petrosyan, the co-founder and CEO at SuperAnnotate. We're building end-to-end training data infrastructure, the backbone for various machine learning applications. In our podcast, we bring prominent people from various industries and universities to talk about the real-life applications of AI, and these are generally the folks who are actually at the frontier, building those applications. And today, I'm very excited to have Professor Trevor Darrell as our guest, a prominent figure in the machine learning and computer vision space. He's a professor on the faculty of the CS and EE divisions at UC Berkeley. A few things to mention are that he founded and co-leads Berkeley’s Artificial Intelligence Research lab, called the BAIR lab, the Berkeley DeepDrive (BDD), and also the recently launched BAIR Commons program in partnership with Facebook, Google, Microsoft, Amazon, and other partners. So, very excited to have you, Trevor, today. How are you doing?
TREVOR DARRELL: It's great to be here. I'm doing well, and thanks for inviting me.
TIGRAN PETROSYAN: Yeah. Where are you at the moment?
TREVOR DARRELL: I'm in Berkeley, California, at home, near my lab, and I have a nice view of San Francisco out the window right now.
TIGRAN PETROSYAN: Oh, yeah. I remember when we were part of the Berkeley accelerator program called SkyDeck. It was such an amazing view from there, and just the whole opportunity of being in Berkeley was so fascinating because, as a startup, when we were just getting started, the whole Berkeley atmosphere, the collaboration, the very inspiring views you get of the bay, it's just incredible. Can you share a few words on why UC Berkeley? I see you have been a professor there for 15 years already, and you basically were from Stanford before that. Why UC Berkeley?
TREVOR DARRELL: Well, I did teach at Stanford very briefly, but I was mostly at MIT as a professor, and then I moved to Berkeley. And it's not just because of the view, although I think the view is part of it and the weather is part of it. The quality of life in the Bay Area, both personally and professionally, is quite high. I think, as many people know, the ecosystem of tech here is very invigorating. Yeah. And it's just a great place to be.
TIGRAN PETROSYAN: Yeah, I can fully agree with that. I think as a startup, when we were just getting started, just being in that atmosphere, with Berkeley's machine learning and especially computer vision lab, when we saw the people who were working there, and we got the opportunity of actually getting to know them and actually working with them, especially when we met, about four years back, that was such a big boost for us as a company to get started. So this was such an exciting time, and I can't agree more on what an exciting place Berkeley is to get inspired, create, and build the innovation and the next generation of anything, whether AI or anything else. You're engaged in so many things, from professorship, advising a lot of companies and startups, and leading all the labs like BAIR and BDD, and I'm sure you're also supervising a lot of students. How are you managing to do all that together? How are you managing that time?
TREVOR DARRELL: And I have two teenage and preteen-age kids at home, so another big chunk of time that's really important. Yeah, I think that's the big challenge in life, how to allocate time. And the best part is when you work with good people and good organizations, hopefully, you can trust them to often get a lot done. So the best part about collaborating with Berkeley students and Berkeley startups is that you can operate at a high level often and sometimes dive really deep down to get into some specific technical issue. But yeah, it's really a pleasure to be able to have the leverage of all the strong people in this ecosystem.
TIGRAN PETROSYAN: Yeah, absolutely. Let's talk a little bit about probably one of the main focuses, I guess, in the university, the Berkeley Artificial Intelligence Research lab, BAIR. Can you share more about what they do? What is BAIR about?
TREVOR DARRELL: Yeah, sure. BAIR is an affiliation of faculty at Berkeley who are interested in AI, and it's one of the mechanisms we use to engage in industry outreach. You mentioned the BAIR Commons program and the Berkeley DeepDrive program, both of which are BAIR initiatives through which we engage, collaborate with, and get sponsorship from the industry. The sponsorship and the funds are very important, so we can buy the GPUs and acquire the space that we sit in. But actually, I'd say the biggest point of leverage is trying to enable actual collaboration between top industry labs and startups and AI researchers in academia, at Berkeley, in particular. The best research right now… it's hard to do the best research alone in academia, and I think it's even hard often to do the best research alone in the industry. There's something about the synergy between the resources that industry offers and the kind of rich idea space and youthful energy that comes out of academia that if we can collaborate together, we really can have an impact. So that's been, I think, the most important part of the BAIR programs from my perspective.
TIGRAN PETROSYAN: Yeah, great. If you think about some projects from BAIR, what do you think are the ones that really have this very short-term real-life AI application right now that you can specify?
TREVOR DARRELL: Going back a few years, as BAIR was being founded or in the precursor years, there was the development of convolutional networks for vision, and kind of bringing them out at industrial strength through the Caffe framework. Fully convolutional networks, some of the early vision-language models, those all came out of BAIR. Other labs also had some similar efforts at the same time, of course; it was a very rapid advance. The fully convolutional networks, especially in the autonomous driving space, I think had a clear impact and sort of evolved into U-Nets. U-Nets are still in use, as is the ConvNeXt model, which has come out in the last few years, again out of a collaboration through the BAIR Commons program. These models are still at the core of many supervised and unsupervised learning systems that we see moving forward with the rapid pace of advances in the field right now.
TIGRAN PETROSYAN: Yeah, it's really interesting. I'm guessing so many listeners probably would not guess that a lot of the foundation of machine learning algorithms or libraries like Caffe were originally from BAIR. And I guess many of you have heard about Berkeley DeepDrive, which also originates from BAIR. Can you maybe share more about that? What’s its story, how it started, and what’s its mission?
TREVOR DARRELL: Yeah, I mean, the mission is what I said earlier, which is to bring together the faculty at Berkeley who are interested in this area across several departments. I mean, most of us or many of us are in computer science or electrical engineering, but there are other departments as well, including folks who are interested in neuroscience and psychology. And to have a place for us to interact together and for us to interact with industry, that really didn't exist at Berkeley prior to the faculty getting together and creating the structure. And we now have our own lab. We have a new space in a new building, also with excellent views. I encourage you to drop by and join us. I think the act of bringing ourselves together in order to engage with the industry actually creates synergies inside the university at the same time. So it's just been a very productive structure, and it kind of puts the point on a collaborative style of research. I think actually one of the things that exemplifies Berkeley research and BAIR research is our collaborative nature. Many of our faculty work together on papers. We see this in the industry too. Papers now have 30 authors on them, but for quite a while, Berkeley and BAIR AI researchers, I think, have been a bit ahead of the curve in terms of how many projects were cross-group. A vision group student and a robotics group student would work together on a paper, or students across several groups. It's not quite the academic silo that we used to have in academia. And I think at Berkeley, we’re ahead of the trend in that regard, and it's served us well.
TIGRAN PETROSYAN: Yeah, it's so exciting to see that, you know, research and university, especially in industrial applications, are coming together in programs like BDD. Can you share more about what, specifically, what kind of problems it solves for industry or generally for research and the public?
TREVOR DARRELL: Yeah, I think BDD has been and continues to be the home of much of the perception and robotics research at Berkeley. And we focused in the early years of that program on core models for pixel labeling in autonomous driving scenes. Is this pixel a car? Is this pixel a person? Also, of course, detection models in concert with pixel labeling. And I think many of the competitive architectures arose at Berkeley or near Berkeley, and certainly there’s a very active literature in that area. And hand in hand with new models came new data. I think one of the key things that BDD achieved, and maybe one of its more lasting products, has been the data set by Professor Fisher Yu, who's now a professor at Zurich. He was a postdoc at Berkeley, leading much of BDD for several years. He led the data collection effort and data labeling effort in collaboration with other partners who provided the data. Nexar, which is also a company that I've advised and continue to advise, provided dash cam data. And so we collected this, we labeled it, we made it available to the academic community. And, of course, all of the industry has its own data sets, often closed, and BDD may no longer be the largest data set out there, although I think in certain aspects, it may still have that title. But having an open, academic-centered data set still turns out to be highly important as we explore both supervised labeling and increasingly semi-supervised and unsupervised labeling ideas.
TIGRAN PETROSYAN: Yeah, definitely. We at SuperAnnotate very much appreciate such datasets. Of course, that further accelerates the push for autonomous driving, and certainly, a lot of companies in this industry also very much appreciate that. One of the other areas I've seen… It's interesting in computer vision that you have so many applications, especially from satellite imagery. I've seen a project you're working on in BAIR on SAR. Can you share more about that and what kind of problem it solves? What is SAR imaging? Since this is more of a real-life AI application, I want to touch on some of the cases that can get the general public really excited as well.
TREVOR DARRELL: Yeah, as many people know, there's a great need for geospatial imaging and aerial image interpretation for a variety of applications: land management, maybe disaster response, responding to wildfires. But also, the events in Europe and Ukraine have been on many people's minds, and many people at Berkeley wanted to explore what kind of applications were relevant in that space. SAR is synthetic aperture radar technology. It's an increasingly deployed sensor system that's now available through commercial imagery. It's an imaging-like modality. It's radar, though. And just as automotive radar is somewhat different from what we call electro-optical or traditional camera sensors on cars, people haven't had the same kind of machine learning technology for radar sensors that they've had for electro-optical sensors. So we just took a deep learning multimodal approach and showed how the combination of supervised and unsupervised learning models using masked reconstruction techniques could learn a fused model, a joint model of SAR and electro-optical aerial or geospatial imagery, and perform tasks like damage assessment: “Has a building been destroyed?” or “Are there vehicles on the road?”. Questions like that are hard to answer from traditional imagery alone. Often some place is covered in clouds, and obviously, visible light doesn't penetrate clouds, but radar does. So if you have a multimodal model that is able to process SAR imagery, and you can train it jointly with visible data that you've labeled for things that you're interested in, and you can leverage unsupervised learning so that you need orders of magnitude fewer labels than you otherwise would, suddenly you can achieve a capability for processing geospatial imagery day or night or in various weather conditions. So there's a blog post out of the BAIR lab that you can look at and read, which released this kind of model, I think, in April of this year.
TIGRAN PETROSYAN: To think about how this kind of technology can really give so much analytics for such hot topics, especially in areas like wars, it is really fascinating how universities can engage in these kinds of applications. Before, people would think, oh, it's more industrial or military, but there's so much work from universities that can help there. So it's really fascinating to see. I know that you're also involved in companies building computer vision, like the autonomous checkout system Grabango, as a consultant scientist. What are the key problems that computer vision is solving there?
TREVOR DARRELL: I think that the problem is understanding the indoor environment involving shoppers and products in contrast to trying to respond to a disaster or scenario where you have a camera or an imaging sensor that might be quite far away and has to deal with a completely unknown environment. Here you've got in a store the same camera and similar lighting conditions day in and day out so you can get much more signal and a very precise understanding of the movement and activities in an environment. So that's the type of computer vision challenge that Grabango solves and does that to try and create very positive shopping experiences for people to avoid having to stand in lines because… I hate standing in lines, and so I like to avoid that whenever possible.
TIGRAN PETROSYAN: Yeah, that's such a cool application of computer vision. How far do you think we are for the full-scale implementation of this kind of technology? I have seen Amazon Go type of stores being up here and there, but do you feel like it's coming soon as a widespread application?
TREVOR DARRELL: I think so. I'm not going to speculate on the exact timescale. And there are important differences between the Grabango technology and other competitors' technology, in particular in terms of the ease of deployment and certain business case issues, the cost of the sensors and the requirement for retrofitting and things like that, that I think are going to make it very easy in the future for environments to become visually enabled. But it's coming, and I think in the future, we can have places where we don't need to have human beings just sitting at a cash register. They can supervise these systems and hopefully make better use of their time, with more productive environments, both in a business sense and in a personal sense.
TIGRAN PETROSYAN: Yeah, makes sense. I can't wait to have that in every store. [laughing] It's really going to save so much time. And then maybe one last question from me. Data science has become such a hot job or area of research right now, and so many people are getting into it. What would be your advice for new data scientists or the ones that just want to start in that field?
TREVOR DARRELL: That's a great question. I think, find great mentors and learning environments. If you have the ability to access top labs like Berkeley or Oxford, or European labs, that's great. But if you don't, actually, it's been so democratized now that you can work hard and find good local mentors and follow the latest blogs that are coming out of the open-source community; it's just striking how much is happening and how quickly it's going to evolve. Yeah, do good work. And I think the other thing I'll say is there's a trend that has always come and gone in our field, and that trend is whether the great innovations in AI happen in these enormous industry labs with rows and rows of hot iron churning away, boiling the oceans with GPUs now or supercomputers in the past. And those labs often do make innovations, and we can see some that have come out in the last year on diffusion models, for example. But then, every once in a while, there's an innovation that just comes out of a group of two or three people with maybe $10,000 worth of equipment or on the order of that. And that was the original revolution, when GPUs came into AI and deep learning through the Krizhevsky model. Suddenly people said, “Oh, we can do more than play games on this hardware.” But there was a time when no one was using GPUs for AI, and it was around ten years ago or less that that happened. Prior to that, everyone thought you had to have 10,000 CPUs in a data center to do AI, and now you have to have 10,000 GPUs. But the game is changing. And even now, I've just seen a few papers, and some of the hottest models on image generation came out of a small group in Germany that now has a startup around it, but it wasn't necessarily coming out of an enormous company. So even small teams can still have a big impact, I think, either in academia or in startups. And I encourage everyone to be a part of this future revolution.
TIGRAN PETROSYAN: This is really great advice. Thanks, Trevor. It was such a great pleasure talking to you, and I'm really excited about this opportunity. And, yeah, for our listeners, this has been the first episode of the Real Life AI podcast. Our guest was Professor Trevor Darrell. We've talked about a lot of applications they're working on at UC Berkeley, the BAIR lab, and BDD, and some other applications he's involved in, like cashier-less checkout systems with Grabango, and SAR images. Such a wide range of topics we've covered within just 20 minutes. Thanks again, Trevor, and have a great day, and I'm really excited to chat again next time. Thank you.
TREVOR DARRELL: Yeah, thanks so much, and it's always great fun to talk to you.
TIGRAN PETROSYAN: Thanks.