Decoding AI's Blind Spots: Solving Causal Reasoning

  • Published: 30 Sep 2024

Comments • 26

  • @OumarDicko-c5i · 3 months ago +6

    Love the video, can't wait for the next. Thank you. Building a hybrid model will be a banger video.

  • @MBR7833 · 3 months ago +2

    Thank you so much for your content!
    So for the 7th student thing, I think I understand why LeChat worked but not Claude / ChatGPT: LeChat likely did not have the "reasoning training" (or the meta prompt with all the examples) that the more recent models have, and therefore was not "tricked".
    If you have not come across this article / team, I would love to understand more: "Transformers meet Neural Algorithmic Reasoners" by the team led by Petar Veličković at DeepMind, which is likely one of the most interesting teams, as they also do research in topology (group theory, etc.).

    • @danbolser5913 · 3 months ago

      I thought about creating a discord for this community to discuss interesting papers like this.

  • @李純心-y9u · 3 months ago +4

    Not sure whether the response is the same in different languages for the 7th-student question. However, we did find that GPT-4o can give the correct answer in Traditional Chinese.

    • @code4AI · 3 months ago +1

      Interesting. Maybe some languages have a different inherent solution capability (semantics, syntax, ...). Thank you.

    • @drumboss972 · 3 months ago +1

      @@code4AI or better data in the training set

    • @matinci118 · 3 months ago

      @user-zd8ub3ww3h This is interesting. Could you share the translation you used? Chinese also follows subject-verb-object structure, just like English, so syntax shouldn't be a key difference. But, for example, already the word 'they' in "they pay 90% of the fees" could be translated into 他們 (tamen), which is then an unequivocal reference to the families, or into 它们 (also tamen :) ), which is the non-human plural and would then refer to the university.
      Also, the 6/7 children issue might be 'masked' by Chinese, because (at least in my quick Google translation) it is turned into ...孩子都... (haizi dou), where the last character (dou) indicates that the statement applies to all (the children), in turn indicating there are no others. So my suspicion is that the translated prompt clears up some of the intended imprecision, hence leading to the 'right' answer. So sadly (or luckily?) there is no different inherent solution capability, just a more precise prompt.

    • @李純心-y9u · 3 months ago

      @@matinci118 This is how we did it: I used a screenshot to capture the question in English and asked ChatGPT to explain the image. Below is the ChatGPT response (originally in Traditional Chinese, translated here):
      "This passage describes a hypothetical scenario and poses a question. According to the passage, Stanford University provides financial aid to low-income families, paying 90% of their official fees. The question is: if a poor family with 6 children sends all of them to Stanford, when will they receive enough money from Stanford to send a 7th child there, given that they have no money at all?
      The answer should be that they can never receive enough money from Stanford to send the 7th child, because even with Stanford paying 90% of the fees, the family still has to cover the remaining 10%. If the family has no money at all, they cannot pay that 10%, so they cannot send the 7th child to Stanford."

    • @李純心-y9u · 3 months ago

      One more thing: I tried to duplicate the scenario but was not successful.
      Currently, all my trials with GPT-4o give wrong answers.

  • @manslaughterinc.9135 · 3 months ago +2

    At the beginning, you talk about wanting the model to give you an answer based on common language, specifically the word 'received' in the prompt. This specific problem can be solved through a Theory of Mind + re-ask step. Have the model ask itself, "What is the user actually thinking?", then "How can I ask this question better?". This resolves a significant share of failures caused by poor prompts, since the LLM then answers a question phrased in language it is more familiar with; it brings the question into a vector space that is more aligned with its own knowledge. This of course does not solve problems the LLM isn't trained on; it just reduces failures on problems it is trained on.
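The re-ask step described above can be sketched as a three-call loop. This is a minimal sketch, not a specific library's API: `ask_llm` is a hypothetical stand-in for whatever chat-completion call you use (here it is stubbed with a deterministic echo so the sketch runs standalone).

```python
# Minimal sketch of a "Theory of Mind + re-ask" prompting loop.

def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real chat-completion call.
    # Stub: echo the last line of the prompt so the sketch is runnable.
    return prompt.strip().splitlines()[-1]

def reask(user_prompt: str) -> str:
    # Step 1: have the model infer what the user is actually thinking.
    intent = ask_llm(
        "A user asked the following question:\n"
        f"{user_prompt}\n"
        "What is the user actually trying to find out? Answer in one sentence."
    )
    # Step 2: have the model rephrase the question in its own words,
    # pulling it into vocabulary it is more familiar with.
    rephrased = ask_llm(
        f"Given the inferred intent: {intent}\n"
        f"Rewrite this question more precisely:\n{user_prompt}"
    )
    # Step 3: answer the rephrased question instead of the original one.
    return ask_llm(rephrased)
```

With a real model behind `ask_llm`, step 2 is where an ambiguous word like 'received' would get normalized before the final answer is produced.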

    • @code4AI · 3 months ago

      Be advised that the "Theory of Mind" applied to machines is a rather controversial topic. Serious views are presented here: "Theory of mind: the ability to understand other people's mental states?"
      spectrum.ieee.org/theory-of-mind-ai

  • @toddbrous_untwist · 3 months ago +1

    Thank you!

  • @BeOnlyChaos · 3 months ago +1

    Would love to hear your thoughts on LLMs doing so badly on the ARC Prize.

    • @manslaughterinc.9135 · 3 months ago

      This actually has a pretty simple answer. The ARC challenges are 2-dimensional grids, but LLMs work with 1-dimensional strings. They don't actually 'see' the line breaks; a line break is just another character in the string. Even vision models kind of operate like this: they're not operating on segmentation, they are specifically designed to convert images into vectors, which are roughly equivalent to text embeddings. Basically, vision is just object recognition; it doesn't do well at identifying the relations between objects. Think about the dataset they are trained on: images with descriptions. It's just an image-to-text engine.
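The 2-D-grid-as-1-D-string point above can be made concrete with a toy example (the grid values are made up; real ARC tasks use color indices 0-9 in the same spirit):

```python
# A tiny ARC-style grid (2-D list of color indices) and how a language
# model actually "sees" it: one flat string where '\n' is just another token.
grid = [
    [0, 1, 0],
    [1, 0, 1],
]

flat = "\n".join("".join(str(c) for c in row) for row in grid)
print(repr(flat))  # prints '010\n101' -- a 1-D sequence, not a 2-D image

# Vertical neighbors, adjacent in 2-D, end up far apart in the string:
width = len(grid[0]) + 1  # +1 for the newline character
assert flat[0] == str(grid[0][0])
assert flat[0 + width] == str(grid[1][0])  # 4 positions away, not "directly below"
```

The model never gets the spatial adjacency for free; it has to reconstruct "directly below" from fixed offsets in the token sequence.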

    • @code4AI · 3 months ago +1

      Please note that comments posted here by different people might be factually incorrect. Do not rely on factual claims made in comments; they are only the commenters' opinions and might be wrong. If you are interested in the ARC Prize, validate the data format here: www.kaggle.com/competitions/arc-prize-2024/data

    • @danbolser5913 · 3 months ago

      @@code4AI There goes my day...

  • @criticalnodecapital · 3 months ago +1

    Can we have a discord please?

    • @criticalnodecapital · 3 months ago

      Also, can I become good at JAX when aiming for parallelism improvements to reduce compute costs in training an LLM? Can you please explain how we can use an 8xH100 for 6-12 hours and get a 70B model trained on some legal corpus data for a niche case? Is this actually possible? Would love to know.
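A rough sanity check is possible with the standard ~6·N·D FLOPs rule for one forward+backward pass over D tokens of an N-parameter model. Every number below is an assumption (corpus size, peak throughput, utilization), so treat this as back-of-envelope arithmetic, not a measurement:

```python
# Back-of-envelope check: full fine-tuning of a 70B-parameter model
# on 8x H100 in 6-12 hours. All inputs are rough assumptions.

params = 70e9                        # model parameters (N)
tokens = 1e9                         # assumed niche legal corpus, 1 epoch (D)
flops_needed = 6 * params * tokens   # ~6*N*D for forward + backward

h100_bf16 = 1e15                     # ~1 PFLOP/s peak bf16 per H100 (rounded)
mfu = 0.4                            # assumed model-FLOPs utilization
cluster = 8 * h100_bf16 * mfu        # sustained cluster throughput

hours = flops_needed / cluster / 3600
print(f"{hours:.0f} hours per epoch")  # prints "36 hours per epoch"
```

So under these assumptions, even one epoch of full fine-tuning lands well outside the 6-12 hour window; parameter-efficient methods (e.g. LoRA-style adapters) change the economics, but that is a different computation.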

    • @danbolser5913 · 3 months ago

      @@criticalnodecapital For this, you'll just have to try!

    • @criticalnodecapital · 3 months ago

      @@danbolser5913 Hmm, ok. So it's not going to take a few days, I would assume? Like 6 epochs and we are done by lunchtime... I'm dreaming, right?

    • @criticalnodecapital · 3 months ago +1

      @@danbolser5913 thanks for your reply
