- 111 videos
- 4,360 views
Xiaol.x
Hong Kong
Joined 22 May 2010
X: x.com/xiaolGo
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1. However, current Vision-Language Models (VLMs) often struggle to perform systematic and structured reasoning, especially when handling complex visual question-answering tasks. In this work, we introduce LLaVA-CoT, a novel VLM designed to conduct autonomous multistage reasoning. Unlike chain-of-thought prompting, LLaVA-CoT independently engages in sequential stages of summarization, visual interpretation, logical reasoning, and conclusion generation. This structured approach enables LLaVA-CoT to achieve marked...
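The four sequential stages described in the abstract can be sketched as a simple staged pipeline, where each stage's output is appended to the context seen by the next. This is only an illustration of the idea; the stage names follow the abstract, and `stage_fn` is a hypothetical stand-in for a real VLM call.

```python
def llava_cot_pipeline(question, image_desc, stage_fn):
    """Sketch of a four-stage structured reasoning pipeline in the spirit
    of LLaVA-CoT. `stage_fn(stage, context)` is a placeholder for a real
    vision-language model invocation."""
    stages = ["summary", "caption", "reasoning", "conclusion"]
    context = f"Question: {question}\nImage: {image_desc}"
    outputs = {}
    for stage in stages:
        out = stage_fn(stage, context)
        outputs[stage] = out
        # Tag each stage's output so later stages can see the structure.
        context += f"\n<{stage}>{out}</{stage}>"
    return outputs
```

The key design point, as contrasted with ordinary chain-of-thought prompting, is that the stages are fixed and explicit rather than left to the model to improvise.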
8 views
Videos
Loss-to-Loss Prediction: Scaling Laws for All Datasets
6 views · 19 hours ago
While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream tas...
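The loss-to-loss idea above can be illustrated with a minimal fit: assume loss on dataset B relates to loss on dataset A by a power law and fit it by least squares in log-log space. This is a simplified stand-in for the paper's functional form (which uses shifted power laws), not the authors' exact method.

```python
import math

def fit_power_law(losses_a, losses_b):
    """Fit loss_b ≈ c * loss_a**k by ordinary least squares on logs."""
    xs = [math.log(a) for a in losses_a]
    ys = [math.log(b) for b in losses_b]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    log_c = my - k * mx
    return math.exp(log_c), k

def predict(c, k, loss_a):
    """Predict the loss on dataset B from an observed loss on dataset A."""
    return c * loss_a ** k

# Toy data: losses on B exactly follow loss_b = 1.5 * loss_a**0.8.
train_a = [3.2, 2.9, 2.5, 2.1]
train_b = [1.5 * a ** 0.8 for a in train_a]
c, k = fit_power_law(train_a, train_b)
```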
Understanding LLM Embeddings for Regression
11 views · 19 hours ago
With the rise of large language models (LLMs) for flexibly processing information as strings, a natural application is regression, specifically by preprocessing string representations into LLM embeddings as downstream features for metric prediction. In this paper, we provide one of the first comprehensive investigations into embedding-based regression and demonstrate that LLM embeddings as feat...
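The pipeline the abstract describes (string → embedding → metric prediction) can be sketched with a toy embedding and a k-nearest-neighbor regressor. The hash-based `toy_embed` below is a hypothetical stand-in for a real LLM embedding call, and k-NN is just one simple downstream regressor.

```python
import hashlib
import math

def toy_embed(text, dim=16):
    """Stand-in for an LLM embedding: hash character trigrams into a
    fixed-dimensional unit vector. A real pipeline would call an
    embedding model here instead."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def knn_regress(query, train_texts, train_targets, k=3):
    """Predict a metric for `query` as the mean target of its k most
    similar training strings in embedding space (cosine similarity)."""
    q = toy_embed(query)
    sims = []
    for text, target in zip(train_texts, train_targets):
        e = toy_embed(text)
        sims.append((sum(a * b for a, b in zip(q, e)), target))
    sims.sort(reverse=True)
    top = sims[:k]
    return sum(t for _, t in top) / len(top)
```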
Star Attention: Efficient LLM Inference over Long Sequences
5 views · 20 hours ago
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is p...
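One way to picture the block-sparse first phase is as an attention mask where each token attends within its own block plus a shared anchor block, causally. The details below (anchor block at position 0, causal restriction) are assumptions for illustration; the paper's exact pattern may differ.

```python
def star_attention_mask(seq_len, block_size):
    """Build a boolean mask where mask[i][j] is True if query token i may
    attend to key token j during blockwise context encoding. Each block
    can be encoded on a separate host, which is where the communication
    savings come from."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        block = i // block_size
        for j in range(seq_len):
            same_block = (j // block_size) == block
            anchor = (j // block_size) == 0  # shared anchor block
            if (same_block or anchor) and j <= i:  # causal
                mask[i][j] = True
    return mask
```

For example, with `seq_len=8` and `block_size=2`, token 5 attends only to tokens 0, 1 (anchor) and 4, 5 (its own block), rather than all six predecessors.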
Attamba: Attending To Multi-Token States
6 views · 20 hours ago
When predicting the next token in a sequence, vanilla transformers compute attention over all previous tokens, resulting in quadratic scaling of compute with sequence length. State-space models compress the entire sequence of tokens into a fixed-dimensional representation to improve efficiency, while other architectures achieve sub-quadratic complexity via low-rank projections or sparse attenti...
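The fixed-dimensional compression that the abstract attributes to state-space models can be shown with the simplest possible case, a single-channel linear recurrence: memory stays O(1) in sequence length, unlike attention's growing key/value cache. This is a toy illustration, not any particular SSM architecture.

```python
def ssm_scan(inputs, a=0.9, b=0.1):
    """Minimal scalar state-space recurrence h_t = a*h_{t-1} + b*x_t.
    The entire history is compressed into the single scalar state h."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(h)
    return outputs
```

An impulse fed in at t=0 decays geometrically through the state, which is exactly the lossy-but-constant-size memory the abstract contrasts with full attention.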
InfAlign: Inference-aware language model alignment
4 views · 1 day ago
Language model alignment has become a critical step in training modern generative language models. The goal of alignment is to finetune a reference model such that the win rate of a sample from the aligned model over a sample from the reference model is high, subject to a KL divergence constraint. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, ...
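Best-of-N, one of the inference-time algorithms the abstract names, is easy to sketch: draw N candidates and keep the one the reward model scores highest. `sample_fn` and `reward_fn` below are hypothetical placeholders for a real sampler and reward model.

```python
import random

def best_of_n(sample_fn, reward_fn, n=4, rng=None):
    """Inference-time Best-of-N: draw n candidate samples and return the
    one with the highest reward. No retraining of the policy happens
    here; all the work is at decoding time."""
    rng = rng or random.Random(0)
    candidates = [sample_fn(rng) for _ in range(n)]
    return max(candidates, key=reward_fn)

# Toy demo: "samples" are integers, reward favors larger values.
pick = best_of_n(lambda rng: rng.randint(0, 9), lambda x: x, n=16)
```

The paper's point is that alignment training should account for such procedures, since the distribution of a Best-of-N output differs from the raw policy's.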
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
19 views · 1 day ago
The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLM. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning i...
An analytic theory of creativity in convolutional diffusion models
9 views · 1 day ago
We obtain the first analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we ident...
Finding Missed Code Size Optimizations in Compilers using LLMs
8 views · 18 hours ago
Compilers are complex, and significant effort has been expended on testing them. Techniques such as random program generation and differential testing have proved highly effective and have uncovered thousands of bugs in production compilers. The majority of effort has been expended on validating that a compiler produces correct code for a given input, while less attention has been paid to ensur...
Predicting the Performance of Black-box LLMs through Self-Queries
6 views · 18 hours ago
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. While a great deal of work in the field uses internal representations to interpret model behavior, these representations are inaccessible when given solely black-box access through an API. In this paper, we extract features of LLMs in a black-box manner by using follow-up pro...
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
6 views · 18 hours ago
Can LLMs pick up language structure from examples? Evidence in prior work seems to indicate yes, as pretrained models repeatedly demonstrate the ability to adapt to new language structures and vocabularies. However, this line of research typically considers languages that are present within common pretraining datasets, or otherwise share notable similarities with these seen languages. In contra...
Easing Optimization Paths: a Circuit Perspective
12 views · 18 hours ago
Gradient descent is the method of choice for training large artificial intelligence systems. As these systems become larger, a better understanding of the mechanisms behind gradient training would allow us to alleviate compute costs and help steer these systems away from harmful behaviors. To that end, we suggest utilizing the circuit perspective brought forward by mechanistic interpretability....
Optimization Algorithm Design via Electric Circuits
28 views · 18 hours ago
We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the c...
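The two-stage structure described above (continuous-time dynamics that converge, then an automated discretization) can be illustrated with the simplest instance: gradient flow dx/dt = -∇f(x), whose forward-Euler discretization is plain gradient descent. This is not the paper's RLC construction, only the discretization idea it builds on.

```python
def discretize_gradient_flow(grad, x0, step=0.1, iters=100):
    """Forward-Euler discretization of the continuous dynamics
    dx/dt = -grad(x). Each Euler step x <- x - step * grad(x) is one
    gradient-descent update, so the discrete iterates inherit the
    convergence of the continuous flow for small enough steps."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Minimize f(x) = (x - 3)^2 with grad f(x) = 2*(x - 3); the flow
# converges to the minimizer x* = 3.
x_star = discretize_gradient_flow(lambda x: 2 * (x - 3), x0=0.0)
```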
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
10 views · 18 hours ago
Large language models (LLMs) have demonstrated their remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than outcome supervision. In this work, we study using Monte Carlo Tree Search (MCTS) to generate process supervision data with LLMs themselves for training them. We sa...
Titans: Learning to Memorize at Test Time
10 views · 20 hours ago
Longhorn: State Space Models are Amortized Online Learners
25 views · 20 hours ago
Gated Delta Networks: Improving Mamba2 with Delta Rule
81 views · 1 day ago
Better & Faster Large Language Models via Multi-token Prediction
90 views · 1 day ago
Multi-Head Latent Attention and Multi-token Prediction in Deepseek v3
172 views · 1 day ago
A Survey on Large Language Model based Autonomous Agents
98 views · 1 day ago
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
20 views · 14 days ago
Shaping AI's Impact on Billions of Lives
21 views · 14 days ago
Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases
73 views · 14 days ago
Meta-Learning in Neural Networks: A Survey
17 views · 14 days ago
RWKV-7: A Revolutionary Large Language Model Architecture
44 views · 14 days ago
pi0: A Vision-Language-Action Flow Model for General Robot Control
40 views · 14 days ago
AI generated?
it's NotebookLM haha
AWESOME, I WANT TO BE YOUR STUDENT. TEACH ME MORE WITH THIS KIND OF STUFF. THANK YOU
This format is much better than the default conversation provided from NotebookLM.
"AND" not "A&D" :) thanks for the long video though
Hey, super interesting video! Where did you source the audio from? What is the podcast name?
Huh, the spoken text was made with NotebookLM, correct? But well, nicely made to soften "dry" input. Thanks.
"think, and make judgements like we do" except we don't know how "We do" it ourselves which stands to argue that AI will never have human intelligence, because we will teach it to surpass that flawed system from the get go.
This is a fascinating philosophical observation about artificial intelligence and human cognition. Let me break this down:

1. The paradox you're highlighting is quite profound:
- We try to make AI "think like humans"
- Yet we don't fully understand how human thinking works
- We can't perfectly replicate something we don't fully comprehend

2. However, I'd suggest a slight reframing:
- Rather than trying to replicate human intelligence exactly
- We're creating a different kind of intelligence that can complement human capabilities
- Like how airplanes don't fly exactly like birds, but achieve flight through different means

3. Regarding "surpassing a flawed system":
- Human cognition isn't necessarily flawed; it's incredibly sophisticated
- It evolved to be highly effective for our survival and development
- But it does have limitations and biases
- AI systems can potentially avoid some human cognitive limitations while having their own different constraints

The key insight might be that we don't need to fully understand or replicate human intelligence to create useful AI systems. Just as we developed powered flight without fully replicating bird flight, we can develop AI systems that think effectively in their own way, complementing rather than copying human intelligence.
Haha, this is so much fun. There's even a straight-man sidekick, just like a crosstalk comedy routine 🤣
I liked the format and the AI talk, so thanks for putting out good content. I have tested the Marco-o1 model; it is good and fast, pretty much usable. The Alibaba team is doing a good job on open-source development. We are very lucky to have these models released. Keep up the great content. Subbed to the channel.
Wow! Is this a real conversation or AI-generated?