- 111 videos
- 4,360 views
Xiaol.x
Hong Kong
Joined 22 May 2010
X: x.com/xiaolGo
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1. However, current Vision-Language Models (VLMs) often struggle to perform systematic and structured reasoning, especially when handling complex visual question-answering tasks. In this work, we introduce LLaVA-CoT, a novel VLM designed to conduct autonomous multistage reasoning. Unlike chain-of-thought prompting, LLaVA-CoT independently engages in sequential stages of summarization, visual interpretation, logical reasoning, and conclusion generation. This structured approach enables LLaVA-CoT to achieve marked...
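The four sequential stages described in the abstract can be sketched as a simple staged pipeline, where each stage's output is appended to the context seen by the next. This is only an illustration of the idea; the stage names follow the abstract, and `stage_fn` is a hypothetical stand-in for a real VLM call.

```python
def llava_cot_pipeline(question, image_desc, stage_fn):
    """Sketch of a four-stage structured reasoning pipeline in the spirit
    of LLaVA-CoT. `stage_fn(stage, context)` is a placeholder for a real
    vision-language model invocation."""
    stages = ["summary", "caption", "reasoning", "conclusion"]
    context = f"Question: {question}\nImage: {image_desc}"
    outputs = {}
    for stage in stages:
        out = stage_fn(stage, context)
        outputs[stage] = out
        # Tag each stage's output so later stages can see the structure.
        context += f"\n<{stage}>{out}</{stage}>"
    return outputs
```

The key design point, as contrasted with ordinary chain-of-thought prompting, is that the stages are fixed and explicit rather than left to the model to improvise.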
8 views
Videos
Loss-to-Loss Prediction: Scaling Laws for All Datasets
6 views · 19 hours ago
While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream tas...
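The loss-to-loss idea above can be illustrated with a minimal fit: assume loss on dataset B relates to loss on dataset A by a power law and fit it by least squares in log-log space. This is a simplified stand-in for the paper's functional form (which uses shifted power laws), not the authors' exact method.

```python
import math

def fit_power_law(losses_a, losses_b):
    """Fit loss_b ≈ c * loss_a**k by ordinary least squares on logs."""
    xs = [math.log(a) for a in losses_a]
    ys = [math.log(b) for b in losses_b]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    log_c = my - k * mx
    return math.exp(log_c), k

def predict(c, k, loss_a):
    """Predict the loss on dataset B from an observed loss on dataset A."""
    return c * loss_a ** k

# Toy data: losses on B exactly follow loss_b = 1.5 * loss_a**0.8.
train_a = [3.2, 2.9, 2.5, 2.1]
train_b = [1.5 * a ** 0.8 for a in train_a]
c, k = fit_power_law(train_a, train_b)
```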
Understanding LLM Embeddings for Regression
11 views · 19 hours ago
With the rise of large language models (LLMs) for flexibly processing information as strings, a natural application is regression, specifically by preprocessing string representations into LLM embeddings as downstream features for metric prediction. In this paper, we provide one of the first comprehensive investigations into embedding-based regression and demonstrate that LLM embeddings as feat...
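The pipeline the abstract describes (string → embedding → metric prediction) can be sketched with a toy embedding and a k-nearest-neighbor regressor. The hash-based `toy_embed` below is a hypothetical stand-in for a real LLM embedding call, and k-NN is just one simple downstream regressor.

```python
import hashlib
import math

def toy_embed(text, dim=16):
    """Stand-in for an LLM embedding: hash character trigrams into a
    fixed-dimensional unit vector. A real pipeline would call an
    embedding model here instead."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def knn_regress(query, train_texts, train_targets, k=3):
    """Predict a metric for `query` as the mean target of its k most
    similar training strings in embedding space (cosine similarity)."""
    q = toy_embed(query)
    sims = []
    for text, target in zip(train_texts, train_targets):
        e = toy_embed(text)
        sims.append((sum(a * b for a, b in zip(q, e)), target))
    sims.sort(reverse=True)
    top = sims[:k]
    return sum(t for _, t in top) / len(top)
```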
Star Attention: Efficient LLM Inference over Long Sequences
5 views · 20 hours ago
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is p...
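One way to picture the block-sparse first phase is as an attention mask where each token attends within its own block plus a shared anchor block, causally. The details below (anchor block at position 0, causal restriction) are assumptions for illustration; the paper's exact pattern may differ.

```python
def star_attention_mask(seq_len, block_size):
    """Build a boolean mask where mask[i][j] is True if query token i may
    attend to key token j during blockwise context encoding. Each block
    can be encoded on a separate host, which is where the communication
    savings come from."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        block = i // block_size
        for j in range(seq_len):
            same_block = (j // block_size) == block
            anchor = (j // block_size) == 0  # shared anchor block
            if (same_block or anchor) and j <= i:  # causal
                mask[i][j] = True
    return mask
```

For example, with `seq_len=8` and `block_size=2`, token 5 attends only to tokens 0, 1 (anchor) and 4, 5 (its own block), rather than all six predecessors.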
Attamba: Attending To Multi-Token States
6 views · 20 hours ago
When predicting the next token in a sequence, vanilla transformers compute attention over all previous tokens, resulting in quadratic scaling of compute with sequence length. State-space models compress the entire sequence of tokens into a fixed-dimensional representation to improve efficiency, while other architectures achieve sub-quadratic complexity via low-rank projections or sparse attenti...
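The fixed-dimensional compression that the abstract attributes to state-space models can be shown with the simplest possible case, a single-channel linear recurrence: memory stays O(1) in sequence length, unlike attention's growing key/value cache. This is a toy illustration, not any particular SSM architecture.

```python
def ssm_scan(inputs, a=0.9, b=0.1):
    """Minimal scalar state-space recurrence h_t = a*h_{t-1} + b*x_t.
    The entire history is compressed into the single scalar state h."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(h)
    return outputs
```

An impulse fed in at t=0 decays geometrically through the state, which is exactly the lossy-but-constant-size memory the abstract contrasts with full attention.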
InfAlign: Inference-aware language model alignment
4 views · 1 day ago
Language model alignment has become a critical step in training modern generative language models. The goal of alignment is to finetune a reference model such that the win rate of a sample from the aligned model over a sample from the reference model is high, subject to a KL divergence constraint. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, ...
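Best-of-N, one of the inference-time algorithms the abstract names, is easy to sketch: draw N candidates and keep the one the reward model scores highest. `sample_fn` and `reward_fn` below are hypothetical placeholders for a real sampler and reward model.

```python
import random

def best_of_n(sample_fn, reward_fn, n=4, rng=None):
    """Inference-time Best-of-N: draw n candidate samples and return the
    one with the highest reward. No retraining of the policy happens
    here; all the work is at decoding time."""
    rng = rng or random.Random(0)
    candidates = [sample_fn(rng) for _ in range(n)]
    return max(candidates, key=reward_fn)

# Toy demo: "samples" are integers, reward favors larger values.
pick = best_of_n(lambda rng: rng.randint(0, 9), lambda x: x, n=16)
```

The paper's point is that alignment training should account for such procedures, since the distribution of a Best-of-N output differs from the raw policy's.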
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
19 views · 1 day ago
The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLM. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning i...
An analytic theory of creativity in convolutional diffusion models
9 views · 1 day ago
We obtain the first analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we ident...
Finding Missed Code Size Optimizations in Compilers using LLMs
8 views · 18 hours ago
Compilers are complex, and significant effort has been expended on testing them. Techniques such as random program generation and differential testing have proved highly effective and have uncovered thousands of bugs in production compilers. The majority of effort has been expended on validating that a compiler produces correct code for a given input, while less attention has been paid to ensur...
Predicting the Performance of Black-box LLMs through Self-Queries
6 views · 18 hours ago
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. While a great deal of work in the field uses internal representations to interpret model behavior, these representations are inaccessible when given solely black-box access through an API. In this paper, we extract features of LLMs in a black-box manner by using follow-up pro...
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
6 views · 18 hours ago
Can LLMs pick up language structure from examples? Evidence in prior work seems to indicate yes, as pretrained models repeatedly demonstrate the ability to adapt to new language structures and vocabularies. However, this line of research typically considers languages that are present within common pretraining datasets, or otherwise share notable similarities with these seen languages. In contra...
Easing Optimization Paths: a Circuit Perspective
12 views · 18 hours ago
Gradient descent is the method of choice for training large artificial intelligence systems. As these systems become larger, a better understanding of the mechanisms behind gradient training would allow us to alleviate compute costs and help steer these systems away from harmful behaviors. To that end, we suggest utilizing the circuit perspective brought forward by mechanistic interpretability....
Optimization Algorithm Design via Electric Circuits
28 views · 18 hours ago
We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the c...
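The two-stage structure described above (continuous-time dynamics that converge, then an automated discretization) can be illustrated with the simplest instance: gradient flow dx/dt = -∇f(x), whose forward-Euler discretization is plain gradient descent. This is not the paper's RLC construction, only the discretization idea it builds on.

```python
def discretize_gradient_flow(grad, x0, step=0.1, iters=100):
    """Forward-Euler discretization of the continuous dynamics
    dx/dt = -grad(x). Each Euler step x <- x - step * grad(x) is one
    gradient-descent update, so the discrete iterates inherit the
    convergence of the continuous flow for small enough steps."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Minimize f(x) = (x - 3)^2 with grad f(x) = 2*(x - 3); the flow
# converges to the minimizer x* = 3.
x_star = discretize_gradient_flow(lambda x: 2 * (x - 3), x0=0.0)
```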
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
10 views · 18 hours ago
Large language models (LLMs) have demonstrated their remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than outcome supervision. In this work, we study using Monte Carlo Tree Search (MCTS) to generate process supervision data with LLMs themselves for training them. We sa...
Titans: Learning to Memorize at Test Time
10 views · 20 hours ago
Longhorn: State Space Models are Amortized Online Learners
25 views · 20 hours ago
Gated Delta Networks: Improving Mamba2 with Delta Rule
81 views · 1 day ago
Better & Faster Large Language Models via Multi-token Prediction
90 views · 1 day ago
Multi-Head Latent Attention and Multi-token Prediction in Deepseek v3
172 views · 1 day ago
A Survey on Large Language Model based Autonomous Agents
98 views · 1 day ago
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
20 views · 14 days ago
Shaping AI's Impact on Billions of Lives
21 views · 14 days ago
Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases
73 views · 14 days ago
Meta-Learning in Neural Networks: A Survey
17 views · 14 days ago
RWKV-7: A Revolutionary Large Language Model Architecture
44 views · 14 days ago
pi0: A Vision-Language-Action Flow Model for General Robot Control
40 views · 14 days ago
AI generated?
it's NotebookLM haha
AWESOME, I WANT TO BE YOUR STUDENT. TEACH ME MORE WITH THIS KIND OF STUFF. THANK YOU
This format is much better than the default conversation provided from NotebookLM.
"AND" not "A&D" :) thanks for the long video though
Hey, super interesting video! Where did you source the audio from? What is the podcast name?
Huh, the spoken text was made with NotebookLM, correct? But well, nicely made to soften "dry" input. Thanks.
"think, and make judgements like we do" except we don't know how "We do" it ourselves which stands to argue that AI will never have human intelligence, because we will teach it to surpass that flawed system from the get go.
This is a fascinating philosophical observation about artificial intelligence and human cognition. Let me break this down:

1. The paradox you're highlighting is quite profound:
- We try to make AI "think like humans"
- Yet we don't fully understand how human thinking works
- We can't perfectly replicate something we don't fully comprehend

2. However, I'd suggest a slight reframing:
- Rather than trying to replicate human intelligence exactly
- We're creating a different kind of intelligence that can complement human capabilities
- Like how airplanes don't fly exactly like birds, but achieve flight through different means

3. Regarding "surpassing a flawed system":
- Human cognition isn't necessarily flawed; it's incredibly sophisticated
- It evolved to be highly effective for our survival and development
- But it does have limitations and biases
- AI systems can potentially avoid some human cognitive limitations while having their own different constraints

The key insight might be that we don't need to fully understand or replicate human intelligence to create useful AI systems. Just as we developed powered flight without fully replicating bird flight, we can develop AI systems that think effectively in their own way, complementing rather than copying human intelligence.
Haha, this is so much fun. There's even a straight-man sidekick, just like a crosstalk comedy routine 🤣
I liked the format and the AI talk, so thanks for putting out good content. I have tested the Marco-o1 model; it is good and fast, pretty much usable. The Alibaba team is doing a good job on open-source development. We are very lucky to have these models released. Keep up the great content. Subbed to the channel.
Wow! Is this a real conversation or AI-generated?