Visual captions: Using large language models to augment video conferences with dynamic visuals
Google AI Blog
by Google AI
2h ago
Posted by Ruofei Du, Research Scientist, and Alex Olwal, Senior Staff Research Scientist, Google Augmented Reality
Recent advances in video conferencing have significantly improved remote video communication through features like live captioning and noise cancellation. However, there are various situations where dynamic visual augmentation would be useful to better convey complex and nuanced information. For example, when discussing what to order at a Japanese restaurant, your friends could share visuals that would help you feel more confident about ordering the “Sukiyaki”. Or when talking ab...
AVFormer: Injecting vision into frozen speech models for zero-shot AV-ASR
Google AI Blog
by Google AI
4d ago
Posted by Arsha Nagrani and Paul Hongsuck Seo, Research Scientists, Google Research
Automatic speech recognition (ASR) is a well-established technology that is widely adopted for various applications such as conference calls, streamed video transcription and voice commands. While the challenges for this technology are centered around noisy audio inputs, the visual stream in multimodal videos (e.g., TV, online edited videos) can provide strong cues for improving the robustness of ASR systems; this is called audiovisual ASR (AV-ASR). Although lip motion can provide strong signals for speech r...
Retrieval-augmented visual-language pre-training
Google AI Blog
by Google AI
5d ago
Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team
Large-scale models, such as T5, GPT-3, PaLM, Flamingo and PaLI, have demonstrated the ability to store substantial amounts of knowledge when scaled to tens of billions of parameters and trained on large text and image datasets. These models achieve state-of-the-art results on downstream tasks, such as image captioning, visual question answering and open vocabulary recognition. Despite such achievements, these models require a massive volume of data for training and end up with a tre...
Large sequence models for software development activities
Google AI Blog
by Google AI
6d ago
Posted by Petros Maniatis and Daniel Tarlow, Research Scientists, Google
Software isn’t created in one dramatic step. It improves bit by bit, one little step at a time (editing, running unit tests, fixing build errors, addressing code reviews, editing some more, appeasing linters, and fixing more errors) until finally it becomes good enough to merge into a code repository. Software engineering isn’t an isolated process, but a dialogue among human developers, code reviewers, bug reporters, software architects and tools, such as compilers, unit tests, linters and static analyzers. Today we d...
Foundation models for reasoning on charts
Google AI Blog
by Google AI
1w ago
Posted by Julian Eisenschlos, Research Software Engineer, Google Research
Visual language is the form of communication that relies on pictorial symbols outside of text to convey information. It is ubiquitous in our digital life in the form of iconography, infographics, tables, plots, and charts, extending to the real world in street signs, comic books, food labels, etc. For that reason, having computers better understand this type of media can help with scientific communication and discovery, accessibility, and data transparency. While computer vision models have made tremendous progress usi...
Barkour: Benchmarking animal-level agility with quadruped robots
Google AI Blog
by Google AI
1w ago
Posted by Ken Caluwaerts and Atil Iscen, Research Scientists, Google
Creating robots that exhibit robust and dynamic locomotion capabilities, similar to animals or humans, has been a long-standing goal in the robotics community. In addition to completing tasks quickly and efficiently, agility allows legged robots to move through complex environments that are otherwise difficult to traverse. Researchers at Google have been pursuing agility for multiple years and across various form factors. Yet, while researchers have enabled robots to hike or jump over some obstacles, there is still no genera...
Differentially private clustering for large-scale datasets
Google AI Blog
by Google AI
1w ago
Posted by Vincent Cohen-Addad and Alessandro Epasto, Research Scientists, Google Research, Graph Mining team
Clustering is a central problem in unsupervised machine learning (ML) with many applications across domains in both industry and academic research more broadly. At its core, clustering consists of the following problem: given a set of data elements, the goal is to partition the data elements into groups such that similar objects are in the same group, while dissimilar objects are in different groups. This problem has been studied in math, computer science, operations research and stati...
Google Research at I/O 2023
Google AI Blog
by Google AI
1w ago
Posted by James Manyika, SVP Google Research and Technology & Society, and Jeff Dean, Chief Scientist, Google DeepMind and Google Research
Wednesday, May 10th was an exciting day for the Google Research community as we watched the results of months and years of our foundational and applied work get announced on the Google I/O stage. With the quick pace of announcements on stage, it can be difficult to convey the substantial effort and unique innovations that underlie the technologies we presented. So today, we’re excited to reveal more about the research efforts behind some of the many ex...
Resolving code review comments with ML
Google AI Blog
by Google AI
2w ago
Posted by Alexander Frömmgen, Staff Software Engineer, and Lera Kharatyan, Senior Software Engineer, Core Systems & Experiences
Code-change reviews are a critical part of the software development process at scale, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ~60 minutes of active shepherding time between sending changes fo...
Making ML models differentially private: Best practices and open challenges
Google AI Blog
by Google AI
2w ago
Posted by Natalia Ponomareva and Alex Kurakin, Staff Software Engineers, Google Research
Large machine learning (ML) models are ubiquitous in modern applications: from spam filters to recommender systems and virtual assistants. These models achieve remarkable performance partially due to the abundance of available training data. However, these data can sometimes contain private information, including personally identifiable information, copyrighted material, etc. Therefore, protecting the privacy of the training data is critical to practical, applied ML. Differential Privacy (DP) is one of the m...
