Patchscopes: Inspect Hidden Representations of Neural Networks!
MLOps Newsletter
by Bugra Akyildiz
1d ago
Articles: Patchscopes is a new framework from Google for inspecting the hidden representations of language models. Language models such as BERT and GPT-3 have become increasingly powerful and widely used in natural language processing tasks; however, their inner workings are often opaque, making it challenging to understand how they process and represent language. Patchscopes aims to address this challenge by providing a comprehensive and standardized approach to analyzing the hidden representations of language models. Understanding Hidden Representations: The hidden representa…
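The core operation is "patching": take a hidden state computed for one prompt, inject it into the forward pass of another prompt, and read off what the model decodes from it. Below is a minimal, hypothetical sketch of that idea using GPT-2 as a small stand-in model and Hugging Face transformers forward hooks; it is not the official Patchscopes code, and the layer and position choices are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

SOURCE_LAYER = TARGET_LAYER = 6  # arbitrary choices for this sketch

# 1) Source pass: keep the hidden state of the last token at SOURCE_LAYER.
src_ids = tok("The Eiffel Tower is located in", return_tensors="pt").input_ids
with torch.no_grad():
    hidden_states = model(src_ids, output_hidden_states=True).hidden_states
# +1 because hidden_states[0] is the embedding output, not a block output.
patch_vector = hidden_states[SOURCE_LAYER + 1][0, -1].clone()

# 2) Target pass: overwrite the last token's state at TARGET_LAYER with that vector.
def patch_hook(module, inputs, output):
    output[0][0, -1] = patch_vector  # GPT-2 blocks return a tuple; [0] is the hidden states
    return output

handle = model.transformer.h[TARGET_LAYER].register_forward_hook(patch_hook)
tgt_ids = tok("That place is in the country of", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(tgt_ids).logits
handle.remove()

# What does the model decode from the patched representation?
print(tok.decode(logits[0, -1].argmax().item()))
```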
Llama3 is out and it is awesome!
MLOps Newsletter
by Bugra Akyildiz
1w ago
This week’s newsletter is completely focused on Llama and its ecosystem, as Llama3 was released last week! Articles: Llama3 is out and available for public consumption in two different sizes (8B and 70B). The model architecture changes are the following: Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language more efficiently; grouped query attention (GQA) is adopted across both the 8B and 70B sizes; and models are trained on sequences of 8,192 tokens, with a mask to ensure self-attention does not cross document boundaries. The training data changes are the following: Llama…
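As a concrete illustration of that document-boundary masking, here is a minimal sketch (my own, not Meta's training code) that combines a causal mask with a same-document constraint for a packed sequence:

```python
import torch

def document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) integer document id for each token in the packed sequence.
    Returns a (seq_len, seq_len) boolean mask; True means attention is allowed."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two documents of lengths 3 and 2 packed into one 5-token sequence.
mask = document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
print(mask.int())  # token 3 (start of doc 1) cannot attend to tokens 0-2 from doc 0
```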
Pinterest's Text to SQL system through LLMs!
MLOps Newsletter
by Bugra Akyildiz
2w ago
Generative Recommenders: We have recently open-sourced our next-generation recommender system, Generative Recommender; check it out, give it a star, and fork it if you are interested in contributing! The paper that accompanies the code is also available. If you have questions or comments, please send them my way as well! Now, back to our regular programming. Articles: Pinterest wrote a blog post on generating SQL queries from text. A core part of data work at Pinterest involves writing SQL queries to analyze data and solve analytical problems. However, the Pinterest team identified that translating analytical questions…
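The general text-to-SQL pattern is schema-plus-question prompting. A minimal, hypothetical sketch of that flow is below; `call_llm` is a stand-in for whatever chat-completion client is used, and the prompt wording is illustrative rather than Pinterest's actual template.

```python
from typing import Callable

PROMPT_TEMPLATE = """You are a data analyst. Given the table schemas below,
write a single SQL query that answers the question.

Schemas:
{schemas}

Question: {question}

SQL:"""

def text_to_sql(question: str, schemas: str, call_llm: Callable[[str], str]) -> str:
    """Build the prompt, call the model, and return the generated SQL string."""
    prompt = PROMPT_TEMPLATE.format(schemas=schemas, question=question)
    return call_llm(prompt).strip()

# Example usage with a dummy model so the sketch is self-contained:
fake_llm = lambda prompt: "SELECT country, COUNT(*) FROM users GROUP BY country;"
print(text_to_sql("How many users do we have per country?",
                  "users(id BIGINT, country VARCHAR, created_at DATE)",
                  fake_llm))
```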
DSPy through a RAG System
MLOps Newsletter
by Bugra Akyildiz
3w ago
Articles: Databricks wrote an article on how DSPy can be used to build AI systems, specifically Retrieval-Augmented Generation (RAG). I have covered DSPy in multiple previous newsletters: one, two, and three. The last one is the following: Going back to the main article, DSPy uses programming instead of prompting to achieve better results. It accomplishes this by allowing users to define the components of their AI systems by composing layers; for this application, a retrieval layer and a generation layer. These layers are then used together to create a program that can answ…
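A minimal DSPy-style RAG module looks roughly like the following, adapted from the pattern in DSPy's documentation (a sketch, not the Databricks article's exact code): a retrieval layer and a generation layer composed into one program.

```python
import dspy

class RAG(dspy.Module):
    def __init__(self, num_passages: int = 3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)                        # retrieval layer
        self.generate = dspy.ChainOfThought("context, question -> answer")   # generation layer

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

# Usage assumes a language model and retriever have been configured first, e.g.:
# dspy.settings.configure(lm=<your LM>, rm=<your retrieval model>)
# print(RAG()(question="What does DSPy replace prompting with?").answer)
```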
Pinterest introduces LinkSage, Google combines Neural Networks with Bayesian theory
MLOps Newsletter
by Bugra Akyildiz
1M ago
Articles: Pinterest wrote an article on LinkSage, which allows them to do off-site content understanding, taking on the following problem: Challenges of Understanding Off-Site Content: Understanding off-site content is challenging because Pinterest doesn't have direct control over the content or the way it is structured. This makes it difficult to use traditional techniques like natural language processing (NLP) to understand the content. LinkSage addresses this challenge by using a GNN to learn a representation of off-site content that can be used for tasks such as ranki…
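As a toy illustration of the GNN idea (not Pinterest's LinkSage implementation), the sketch below updates each node's embedding from its own features plus the mean of its neighbours' features, which is how off-site pages can inherit signal from the Pins and boards that link to them.

```python
import torch
import torch.nn as nn

class MeanAggregationLayer(nn.Module):
    """One round of message passing: concat(own features, mean of neighbour features)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); adjacency: (num_nodes, num_nodes) with 0/1 entries
        degree = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        neighbour_mean = (adjacency @ x) / degree
        return torch.relu(self.proj(torch.cat([x, neighbour_mean], dim=-1)))

# Example: 4 nodes (e.g. Pins and off-site pages) with 8-dim features and a small link graph.
x = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 1, 0, 0]], dtype=torch.float)
print(MeanAggregationLayer(8, 16)(x, adj).shape)  # torch.Size([4, 16])
```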
Mamba and DSPy explained!
MLOps Newsletter
by Bugra Akyildiz
1M ago
Articles: We covered Mamba as an introduction in one of the previous newsletters. This article from Gradient expands a lot more on the advantages and technical details, with a comparison to Transformers. Limitations of Transformers (it should now be obvious to everyone, as discussed in the previous newsletter!): Quadratic Bottleneck: Transformers use an attention mechanism that allows every token to look at every other token in the sequence, leading to a quadratic increase in training time complexity for long sequences. Memory Bottleneck: Storing information about all previous tokens requires…
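A toy sketch of the two bottlenecks (my own illustration, not from the article): full self-attention materializes an n-by-n score matrix, while a recurrent/SSM-style update carries only a fixed-size state from token to token.

```python
import torch

n, d = 1024, 64
x = torch.randn(n, d)

# Attention: the score matrix alone is n x n, so compute and memory grow quadratically in n.
scores = (x @ x.T) / d ** 0.5
print("attention score matrix:", scores.shape)  # torch.Size([1024, 1024])

# Recurrence: one fixed-size state updated per token, so compute grows linearly in n
# and the memory carried between steps stays O(d) regardless of sequence length.
A = torch.randn(d, d) * 0.01
B = torch.randn(d, d) * 0.01
state = torch.zeros(d)
for t in range(n):
    state = A @ state + B @ x[t]
print("recurrent state:", state.shape)  # torch.Size([64])
```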
X.ai releases Grok-1!
MLOps Newsletter
by Bugra Akyildiz
1M ago
Articles: X.ai released Grok's first model and its weights in a very short blog post. The model is JAX-based and available on GitHub; it uses a mixture-of-experts design on top of a Transformer-based architecture. The Eagle 7B model is also available as open source; it is an excellent and very efficient model that builds on top of RWKV. But what is RWKV? RWKV (pronounced RWaKuV) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT Transformer (parallelizable). RWKV is an open-source, non-profit group under the Linux Foundation. Supported by our s…
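To make the "RNN that trains like a Transformer" claim concrete, here is a toy sketch (not the actual RWKV formulation) showing that a linear recurrence can be evaluated either step by step (cheap inference) or all at once with a matrix of decay weights (parallel training):

```python
import torch

a, n = 0.9, 8            # decay factor and (toy) sequence length
x = torch.randn(n)

# Sequential (inference-style): O(1) state carried across steps, s_t = a*s_{t-1} + x_t.
s_seq = torch.zeros(n)
state = torch.tensor(0.0)
for t in range(n):
    state = a * state + x[t]
    s_seq[t] = state

# Parallel (training-style): s_t = sum_{i<=t} a^(t-i) * x_i, computed for all t at once.
t_idx = torch.arange(n)
decay = a ** (t_idx.unsqueeze(1) - t_idx.unsqueeze(0)).float()  # a^(t-i)
decay = torch.tril(decay)                                       # only i <= t contributes
s_par = decay @ x

print(torch.allclose(s_seq, s_par, atol=1e-5))  # True: both give the same states
```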
Representation Engineering for Control Vectors
MLOps Newsletter
by Bugra Akyildiz
1M ago
Articles: Vgel wrote a blog post on representation engineering, focusing on control vectors in LLMs. If you are interested and want to learn about AI safety and how to customize an already trained LLM, this post goes over a couple of different ways of doing so. A control vector is a vector (technically a list of vectors, one per layer) that you can apply to model activations during inference to control the model's behavior without additional prompting. This is conceptually very powerful, as it can customize the behavior of the model at inference time without changing the tra…
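A minimal sketch of applying such vectors at inference time is below, assuming a GPT-2-style model from Hugging Face transformers. The control vectors here are zero placeholders; in the blog post they are derived from activation differences between contrasting prompts.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
hidden_dim = model.config.hidden_size
num_layers = len(model.transformer.h)

# Placeholder control vectors (all zeros, so generation is unchanged in this sketch).
control = {layer: torch.zeros(hidden_dim) for layer in range(num_layers)}
strength = 4.0

def make_hook(vec):
    def hook(module, inputs, output):
        output[0].add_(strength * vec)  # shift every position's residual-stream activation
        return output
    return hook

handles = [block.register_forward_hook(make_hook(control[i]))
           for i, block in enumerate(model.transformer.h)]

ids = tok("I feel", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=10)
for h in handles:
    h.remove()
print(tok.decode(out[0]))
```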
Google open-sources Gemma (2B and 7B parameter models)
MLOps Newsletter
by Bugra Akyildiz
2M ago
Articles: Google launches their new model series called Gemma. After the disastrous rollout of Gemini, they wanted to open-source smaller models for the community to try out, and to drive more adoption of GCP (Google Cloud Platform) among people trying out the models. This is an excellent strategy: it provides different options to the community while also offering solutions for enterprises and end users, like ChatGPT. The main features of the Gemma model family are the following: They release models in two different sizes, Gemma 2B and Gemma 7B. Each size is released with pre-trained and inst…
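Trying Gemma locally is a few lines with Hugging Face transformers. The sketch below assumes the hub ID google/gemma-2b-it for the instruction-tuned 2B model, and that you have accepted the Gemma license on the Hub and are logged in.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed hub ID; gated behind the Gemma license
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain grouped query attention in one sentence."
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```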
Small Language Models (SLM): Phi-2!
MLOps Newsletter
by Bugra Akyildiz
2M ago
Articles: Microsoft Research introduces Phi-2, a surprisingly powerful small language model (SLM) with only 2.7 billion parameters. Despite its size, Phi-2 surpasses much larger models on various benchmarks, highlighting the potential of SLMs. Key Differences: Superior Performance: Phi-2 outperforms larger models (7B and 13B parameters) on language reasoning and coding tasks, even without alignment or fine-tuning. Efficiency: Its smaller size requires fewer computational resources, making it more accessible and practical for diverse applications. Customizability: SLMs like Phi-2 are eas…
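A quick sketch of running Phi-2 locally, assuming the Hugging Face hub ID microsoft/phi-2; it also checks the roughly 2.7B parameter count mentioned above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed hub ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e9:.2f}B parameters")  # roughly 2.7B

inputs = tok("Write a Python function that reverses a string.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))
```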
