Hindsight PRIORs for Reward Learning from Human Preferences
Apple Machine Learning Research · 10h ago
Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent's trajectory behaviors, where one of the major goals is to reduce the number of human feedback queries. While the binary labels are a direct comment on the goodness of a trajectory behavior, there is still a need to resolve credit assignment, especially when feedback is limited. We propose PRIor On Rewards (PRIOR), which learns a forward-dynamics world model to approximate a priori selective attention over states, which serves as a means to perform credit assignment …
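As a rough illustration of the idea in this abstract, here is a minimal, hypothetical PyTorch sketch (module names, shapes, and the weighting scheme are assumptions, not the paper's implementation): a forward-dynamics model's attention over states is reused as a per-state prior when aggregating a learned reward over a trajectory.

```python
# Hypothetical sketch (not the paper's code): a forward-dynamics model exposes
# attention weights over states, which are reused as a per-state prior when
# aggregating a learned reward over a trajectory.
import torch
import torch.nn as nn

class DynamicsWithAttention(nn.Module):
    """Predicts next states from a trajectory window and exposes the attention
    it places on individual states."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Linear(state_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, state_dim)

    def forward(self, states):                        # states: (B, T, state_dim)
        h = self.embed(states)
        ctx, weights = self.attn(h, h, h)              # weights: (B, T, T)
        next_state_pred = self.head(ctx)
        prior = weights.mean(dim=1)                    # avg attention received per state: (B, T)
        return next_state_pred, prior

def trajectory_return(reward_net, states, prior):
    """Weight per-state rewards by the dynamics-derived prior before summing,
    e.g. inside a Bradley-Terry preference loss."""
    rewards = reward_net(states).squeeze(-1)           # (B, T)
    return (prior * rewards).sum(dim=1)                # (B,)
```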
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Apple Machine Learning Research · 1w ago
Contrastive pretraining of image-text foundation models, such as CLIP, has demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream tasks. However, these models use large transformer-based encoders with significant memory and latency overhead, which poses challenges for deployment on mobile devices. In this work, we introduce MobileCLIP, a new family of efficient image-text models optimized for runtime performance, along with a novel and efficient training approach, namely multi-modal reinforced training. The proposed training …
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
Apple Machine Learning Research · 2w ago
Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding more parameters) to improve their predictive power may not be viable for real-world tasks. In this work, we propose a new loss, Streaming Anchor Loss (SAL), to better utilize the given learning capacity by encouraging the model to learn more from essential frames. More specifically, SAL and its focal variations dynamically modulate the frame-wise cross-entropy …
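As a loose illustration of the kind of loss described here, the following is a hypothetical frame-weighted cross-entropy in PyTorch; the distance-based weighting rule is an assumption for illustration, not the published SAL formulation.

```python
# Hypothetical sketch: a frame-weighted cross-entropy where frames judged
# "essential" (here, frames near a positive-class frame; an illustrative rule,
# not the published SAL formulation) contribute more to the loss.
import torch
import torch.nn.functional as F

def weighted_frame_ce(logits, targets, frame_weights):
    """logits: (B, T, C); targets: (B, T) int64; frame_weights: (B, T)."""
    per_frame = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (frame_weights * per_frame).mean()

def distance_based_weights(targets, positive_class=1, decay=0.5):
    """Toy weighting: frames closer to any positive-class frame get larger weight."""
    B, T = targets.shape
    idx = torch.arange(T, dtype=torch.float)
    dist = torch.full((B, T), float(T))
    for b in range(B):
        pos_idx = idx[targets[b] == positive_class]
        if pos_idx.numel() > 0:
            dist[b] = (idx[None, :] - pos_idx[:, None]).abs().min(dim=0).values
    return 1.0 + torch.exp(-decay * dist)              # weights in (1, 2]
```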
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Apple Machine Learning Research · 2w ago
A Multi-signal Large Language Model for Device-directed Speech Detection
Apple Machine Learning Research · 3w ago
We present an architecture for device-directed speech detection that treats the task as a text-generation problem. We use a multi-modal fusion approach that combines acoustic information from the recorded audio waveform with text and confidence information obtained from an automatic speech recognition system. The audio waveform is represented as a sequence of continuous embeddings by an audio encoder and presented as a prefix token to a pretrained large language model (LLM). We demonstrate that using multi-modal information within LLMs yields equal error rate improvements over text-only and …
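A minimal sketch of the audio-as-prefix idea described above, assuming a Hugging Face-style causal LM that accepts `inputs_embeds`; the class and projection below are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch, assuming a Hugging Face-style causal LM that accepts
# `inputs_embeds`; not the paper's architecture.
import torch
import torch.nn as nn

class AudioPrefixLM(nn.Module):
    def __init__(self, llm, audio_dim: int, llm_dim: int):
        super().__init__()
        self.llm = llm                                 # pretrained causal LM
        self.project = nn.Linear(audio_dim, llm_dim)   # audio -> LLM embedding space

    def forward(self, audio_embeds, text_input_ids):
        # audio_embeds: (B, Ta, audio_dim) from an audio encoder
        # text_input_ids: (B, Tt) ASR hypothesis / confidence tokens
        prefix = self.project(audio_embeds)                      # (B, Ta, llm_dim)
        text_embeds = self.llm.get_input_embeddings()(text_input_ids)
        inputs = torch.cat([prefix, text_embeds], dim=1)         # (B, Ta+Tt, llm_dim)
        # The LM is then trained to generate a decision token
        # (e.g. "directed" vs. "not directed") after this prefix.
        return self.llm(inputs_embeds=inputs).logits
```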
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Apple Machine Learning Research · 1M ago
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identify several crucial design lessons. For example, we demonstrate that, for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple …
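A toy sketch of what sampling pre-training batches from such a mixture could look like; the source names and ratios below are placeholders, not the values studied in the paper.

```python
# Toy sketch of sampling pre-training batches from a data mixture; the source
# names and ratios are placeholders, not the values studied in the paper.
import random

DATA_MIX = {
    "image_caption": 0.45,            # paired image-caption data
    "interleaved_image_text": 0.45,   # documents with interleaved images and text
    "text_only": 0.10,                # text-only corpora
}

def sample_source(mix=DATA_MIX, rng=random):
    sources, weights = zip(*mix.items())
    return rng.choices(sources, weights=weights, k=1)[0]

# Usage: batch = next(loaders[sample_source()])
```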
MotionPrint: Ready-to-Use, Device-Agnostic, and Location-Invariant Motion Activity Models
Apple Machine Learning Research · 1M ago
Wearable sensors have permeated people's lives, ushering in impactful applications in interactive systems and activity recognition. However, practitioners face significant obstacles when dealing with sensing heterogeneities, requiring custom models for different platforms. In this paper, we conduct a comprehensive evaluation of the generalizability of motion models across sensor locations. Our analysis highlights this challenge and identifies key on-body locations for building location-invariant models that can be integrated on any device. For this, we introduce the largest multi-location …
Corpus Synthesis for Zero-shot ASR Domain Adaptation using Large Language Models
Apple Machine Learning Research · 1M ago
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be fine-tuned on data from these domains. However, target-domain data is usually not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech …
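The pipeline reads as three stages, sketched below with hypothetical injected callables (none of these names come from the paper): an LLM drafts in-domain sentences, a controllable TTS model speaks them, and the ASR model is fine-tuned on the resulting synthetic pairs.

```python
# High-level sketch of such a pipeline with hypothetical injected callables
# (none of these names come from the paper).
from typing import Callable, Iterable, List, Tuple

def synthesize_adaptation_corpus(
    generate_text: Callable[[str, int], List[str]],   # LLM: (prompt, n) -> sentences
    synthesize_speech: Callable[[str], bytes],        # TTS: sentence -> audio
    domain_prompt: str,
    num_sentences: int = 1000,
) -> List[Tuple[bytes, str]]:
    sentences = generate_text(domain_prompt, num_sentences)
    return [(synthesize_speech(s), s) for s in sentences]

def adapt_asr(
    finetune: Callable[[Iterable[Tuple[bytes, str]]], None],  # ASR fine-tuning step
    corpus: List[Tuple[bytes, str]],
) -> None:
    # No real target-domain text or speech is required at this point.
    finetune(corpus)
```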
Randomized Algorithms for Precise Measurement of Differentially-private, Personalized Recommendations
Apple Machine Learning Research · 1M ago
This paper was accepted at the 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence. Personalized recommendations form an important part of today's internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be …
Vision-Based Hand Gesture Customization from a Single Demonstration
Apple Machine Learning Research · 1M ago
Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient usage of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from one demonstration. We employ transformers and …
