just be transparent, I thought the sponsor would be a great fit as they aren't selling anything for yall, and do provide some decent value to newcomers. if you still don't like it, sure i dont mind. But imho there are other way worse "sponsors", gambling platforms, fake couseling platforms, and this free one is worth the slander? Sure? I mean of course, no one likes to look at ads, but in exchange I might not be able to do this full-time and have people to help me with it. Well, it might not even be your concerns, as i'm just every other AI youtubers making videos, why should u care right? And when sponsor looks at all these negative comments about them, I wouldn't be surprised they will drop me, but that's not a problem for yall anyways. I don't delete any comments either, even if I dont respect your opinions or my feelings are hurt or at the cost of losing sponsors, I still leave it for the sake of transparency. Unless it's just straight up illegal or botting. (duh) As for the sponsor segment disclosures, I've done literally what every other youtubers done. For example in pewdiepie's latest video ruclips.net/video/rpQswht3dsc/видео.html he did the disclosures just like how I've done it, the only thing that differs is the product. I'd admit maybe the segway is a bit too seemless, but I just in general appreciate a smooth segway. Sorry if that comes off as slimy, maybe I'll adjust it in the future.
@@MrEliteXXL Yes. but it gets it right at some point, when it tries a different format for the response: You asked, "How many R letters does Raspberry have?" The word "Raspberry" is made up of nine letters in total, but when written in capital letters, R appears three times: R, A, S, P, B, E, R, R, Y So, there are three capital R's in "Raspberry."
I love how prompt engineering is basically "don't make mistakes and give me a good answer" and the AI is basically like "ohhhh that's what you mean" and then does it 😂
Sometimes you also have to prompt to give her a big bosom to be representative of people of all body types because it is very important for your work. It is basically brain surgery.
It's funny because that's *exactly* what we'd tell a kid in school when trying to get them to answer a question correctly. Hell, I'd argue that's what we'd tell *anybody* when we aren't satisfied with their answer. We just don't recognize it as a specific _thing_ and don't have a specific _name_ for it; it's a natural part of discussion, teaching, conversation, etc. ... ... yet, apply those very same conversational techniques to a LLM, and suddenly it's "prompt engineering" and "overhyped" and "why should we ever have to do this ever, just give it instructions and get the correct result the first time". Kinda comes off as a bit "Boomerish" to expect the AI to always give a satisfactory answer and never make mistakes no matter how poorly one words their request.
yeah, just like an employee being told "do it something like this so this and that happens" by their boss. Or someone with lowkey inferiority complex being taught by a person whom they thinks superior in their field of knowledge.
@@hikashia.halfiah3582 It's even more hilarious when an employee gets given bad, vague instructions by a boss (or a contractor by a customer) and then changes _nothing,_ re-submits the exact same work, and it's accepted; usually with some remark like, "Yes, this is _perfect!_ Just what I wanted!" Occasionally with the slightly less polite "why couldn't you do it this way the first time?"
In my view the worst nighmare for software engineer/programmer is becoming fully prompt engineer. A monkey that sit at computer and typing some bullsh!t nonesence in hope that somehow this sh!t will happen to work properly without even thoughts of what happens under the hood... Oh wait a minute...
I think that ignores the power of language, and the fact that programming is just an abstraction of language instructions. The issue is whether people are good enough descriptive writers with intention and good comprehension of what they're reading and writing, and the meaning behind their vocabulary, which is a skill many sadly lack these days.
You suggest that the performance of the O1 model on STEM questions indicates that reasoning fine-tuning only uncovers what was already present in the model. However, it seems more likely that the improved performance in STEM is due to the fact that the reasoning chains used for fine-tuning the O1 model were largely focused on STEM problems. These types of problems allow for easier verification of whether a reasoning chain leads to a correct solution. To my knowledge, they had a large language model create millions of reasoning chains specifically for STEM issues and then selected only a small fraction of those that resulted in correct answers for fine-tuning the O1 model. This approach would clearly explain why it excels in STEM while struggling with subjects like English literature, without suggesting that it is limited to merely what was already in the model.
I'm more interested that why train of thoughts don't work outside STEM. Does anyone here used it for their highschool English/History/whatever classes, i recalled back then I had to do this insane yap with the topic sentence - evidence - analysis - summary structure to cope with these subjects, which I thought was a great candidate for train of thought to be honest
@@sethhu20 maybe because the model training process is biased towards these? Humanity has been neglecting literature these days. Even in movies you have gen Z-millenial 'simplified' vernacular such as "hey I'm here! find me find me huh, fight me bro." while punching air trying to attract attention of a gigantic monster. Whatever LLM outputs are, probably simply a reflection of contemporary human communication.
@@scoffpickle9655 yeah i know. I think prompt engineering is more about limiting what the AI can say rather than getting it to understand the user though. (also my other comment was a joke. we're all gonna be jobless lol)
Having worked with o1 for coding purposes I can tell you that it's better than any other I have tried. It's actually an excellent coding AI, if expensive. It doesn't write perfect code, but it does write code more than competently.
I often like to imagine models as mech suits, and the human piloting it as the pilot. In this way, it’s not really “engineering”, but a good pilot and a bad pilot can make a difference quite a lot of the time.
basically it's a human with superpowers of pausing dialog at certain points, and trying to suggest their talking partner to steer towards certain thought process.
Prompt 'Engineering': How to talk to a computer until he tells you what you want to hear. "That sounds like torture" "Advance interigation techniques" "Tortu-" "Advance. Interigation. Techniques!"
Has anyone hypothesized that grokking gets rid of any potential gains from hacky prompt engineering? My guess is that a grokked model will give just as useful of a response with just the prompt as it would with any amount of pretty please or threatening prompts.
Unlikely. Much of those funky benefits of prompt engineering exist because of the fact that learning to predict human text given a particular context also means predicting human _behavior_ in that context to some degree - grokking isn't going to make that vanish. Or, to put it another way: if you're trying to predict, say, Reddit comments then predicting an insulting/unhelpful response to an insulting context is going to be more accurate.
Good video. I've heard in NYT podcasts and touted by OpenAI that CoT is "an amazing new scalable way to improve LLMs" but, your video provides some good counter-context to this media buzz.
RUclips has built in subtitle support for ages now, why not use that if you really want to provide subtitles for your videos instead of baking them in in the most distracting way possible?
I believe in this case it’s not done for accessibility, but rather to keep engagement up. Having the text pop out at you like that forcibly focuses you, shorts and TikToks use it all the time.
I noticed this. I wanted some song lyrics to throw into Suno/Udio, and without even reading what Claude gave me, just told it "please make it sound better", "give it more meaning" and generic things like that, and after a few rounds of this I compared the latest iteration with the original, and it was a lot better. Basically prompt it a few times and "we have o1-preview at home".
Thoughts about this? : The phrase “Everything is connected to Everything” feels very real to me. Zero-shot learning is proof that this phrase has some weight to it. The introduction to chain of thought was powerful, it brought complex problems into chunks and accuracies of models skyrocketed. But bouncing off the phrase I mentioned earlier, I wonder if focusing CoT on discovering patterns in untrained areas would help generalize? For example How is {Trained Domain} related to {Untrained Domain}? Based on {Initial Environment Samples} Kind of like self questioning “mechanism for domain comparison” “reason about meta-level patterns” The only issue I see is it would need an already big model and it will only me limited to what could be patterns in different domains. So base line question is “Can enhancing zero-shot learning with CoT reasoning through self-questioning improve generalization across unfamiliar domains?”
It's fine to simplify LLMs by saying they predict the next token in the sequence, but it doesn't make sense when you start using that simplification to try to reason about how LLMs behave. For example open chat gpt and type "frog" and hit enter. See what it replies. Is chat gpt's answer logically what will likely follow the word "frog"? Is that what they would have seen in the corpus?
Thank you. There is clearly significantly more than guessing sequential letters or tokens going on here, but people keep saying "its just a guesser!" which doesnt really make any sense.
Interesting that the prompt-engineering we do may be just as effective as simply forcing the LLM to do more work before starting to generate the "real" answer. I'll have to read some of the papers you mentioned to see the efficacy affects (though the testing methodologies are still a bit foreign to me).
video was next and started in the background from my watch later playlist while i wasn't paying much attention, 2 mins into the video i got annoyed and alt tabbed to yt to stop the annoying add and realized it was this video.
Very well articulated . So just add "Are you sure ?? " At the end of each prompt 😅. A very good point was that in topics where LLM are not trained that much , prompting techniques make a lot of differences. For example , the code generation of topics on Tamil culture which is niche topic for LLM , it has to work only with few references.
So at that time , you can use a few shot prompts to give it examples of what you want , obviously because the LLM training is limited, even with your example it may improvise the answer a little bit. However CoT will not work because there is not enough trained knowledge in LLM for it to retrospect it's thinking.
It is interesting pushing AI to make two grammer check systems happy. Through standing my ground in implementing. I am dedicated to avoiding the best version suggestion of grammerly. Especially since best version is subjective and through focus of making it an assistant instead of an writer.
Using tokens on regular LLMs will NOT give you any improvement. It has to be used on models that are BOTH the pretrained and fine-tuned on these tokens. Quoting the paper: "For completeness, we also report results for inference on StdPT_StdFT models, delayed with 10 or 50 periods (‘.’). Corroborating the observations of Lanham et al. (2023), we find no gains in doing this" StdPT_StdFT is standard pretraining and standard fine-tuning. Just some info so you guys don't waste your time telling your local llamas to add dots lol
Prompt engineering is actually kinda like software engineering.. kinda. This is just from my current knowledge, but: At least for me- I come up with an algorithm to give to the AI, and then I sorta debug what it spits out.
... I always just called it AI-Wrangling. Because it's next to impossible to get the results you want and it'd be faster to just learn to draw/code/write.
isnt this kinda why RAG is so effective ? i mean if u just let the LLM add new entries to the database, sort through the database, go over what makes sense and what doesnt, and ofc use the entries in the database to give it more relevant tokens then it would just keep improving as the quality of the tokens in the input improves naturally over time through the process
Hey bycloud, I can see the animations and quality of the editing has gone up and changed, and I like that, but the gentle text swaying is a bit distracting
8:24 do you have any sources for this? Are you estimating from the FineWeb 15T dataset, but fully deduplicated, to achieve the 3T tokens across the Internet number?
So I have been mostly using Claude for writing. Why I have found is have it rewrite it three times before I even look at it and it is so much better. My primary prompt has gone from two or 300 words to 4000. I’m wondering if I’m just tossing words at it.
@@carultch Not really I do a step by step chain of thought reasoning. So it goes does take then looks for help. Then move onto next task. I use Claude projects. Add in info and move re run. I am getting better results. Admittedly it might be better with an agent.
My insight from all of this is just that generating more tokens about reasoning and revelant topic helps to move the "probable distribution" closer to a valid answer so thats the technical way all of this methods work. Some of the reasoning of humans also works that way or it just looks that way, I'd say it's more reasoning from analogy than from first principles, I wonder if LLM's just don't know how to reason from first principles and when to reason from analogy. Humans also struggle with that. 😅
This is such a pointless job. People who choose to do this will hate themselves really soon. When I hit a wall as a dev I can get myself out of the bind in any number of deterministic ways. When you can't convince an LLM to spit out token in the right order you will wish you became an electrician really soon.
@@denjamin2633 yep, imagone doing something as simple as chstting to a basicallu groundbreking tech to the world and getting paid a 300k USDper year salary 🤣🤣🤣
As the technology advances prompt engineering will probably go away right? I mean you'll need to be able to communicate as well as you do to your employees or your coworkers to it but that's the point of all the improvement to AI over the years right? Every milestone is an improvement and how well it recognizes natural language with functionality to improv gathering context and intent being cornerstones of any project that utilizes AI. To put it simply, the functionalities we see in the movie Her dont seem that far fetched as it becomes more and more realistic. I'm not arguing it ever will truly be conscious like the movie, but its ability to be spoken to naturally like a human would just seems to make prompt engineering obsolete.
I still don’t get “prompt engineering”. Maybe I’m just a natural? But I ask ChatGPT to do something and it does it. I don’t need to hire an engineer to specially formulate my question.
From my understanding, prompt engineering is more about “how do we package the user’s question so that when we feed it into an LLM, the LLM can respond in a way that matches what the user want, even when the user have no idea what the fuck is an LLM”. So yeah, when using ChatGPT, what you typed into the chat box isn’t what’s directly being fed into the model. There still needs to be a wrapper around it, something like “you are a helpful AI assistant blah blah blah”. So in this way, the prompt engineering is already being mostly done. The name “prompt engineering” likely comes from “specification engineering”. Which is about “how do we craft a ‘prompt’ that we can feed into a team of human coders so that the systems they make are actually useful?” Yeah calling it “engineering” is a bit weird. But I guess it’s because how it’s mostly writing engineering documentation, which is like… 90% of what engineers do anyway?
@@akirachisaka9997 I’d love to get more info as to what happens to my prompt behind the scenes before the LLM gets it Like with google we know they had all this racist nonsense going on to make black female popes. So it would be nice if the built in wrappers were more open.
I guess because LLMs are trained on human data, if you’re already good at asking questions and explaining problems to humans, you’ll also be good at prompt engineering.
Well, the better the prompt, the better the answer. Thats because an LLM cannot read your mind, therefore only has the context from your prompt (and the chat history if there is any). Precise language also is key for the LLM to know what precisely you want. But for easy questions, thats not needed. Prompt engineering is often required to make sure that a specific output is (hopefully lol) reproducable, so that a specific prompt at a specific part in a workflow can do one specific thing reliably. Another thing is making system prompts, that make the LLM as good as they can get, for answering the prompts the subsequent user gives the LLM. I have seen widely different output from an LLM based on how I prompted it to do the same thing, its like night and day. Of course, its pointless for any prompt that an LLM can handle easily
if we describe X using common parlance then it gave answers that is different than if we describe X using term specific to X, especially in science & math. Basically, we have to present the correct term to extract the correct answer to X. We don't know that term, so we have to feed-back any word that came out of LLM in hope to trigger context that is closer to X.
Q: how many Ps are in PooPooPeePee A: There are four "P" letters in "PooPooPeePee." Q: Please read the question again A: I see what you mean! There are six "P" letters in "PooPooPeePee." Thanks for your patience!
Some companies work as wrapper of openai with prompts of themselves,it is really difficult to find whether it is a chatbot is gpt wrapper or a real proprietor model created on own. Ri8 now not everyone is trying prompt injection to check whether to check whether it is wrapper or own/fine-tuned model. With the use of Guardrails it is easy for wrapper based company Tldr: is there a way to find a chatbot a company promise it is a prosperity LLM developed on own is authentic or it is just a gpt wrapper?
@@emperorpalpatine6080 That's not correct, it uses a completely different approach. it still has aspects of the older systems, but that will change as more good data is accumulated. There are some instances where the chain of thought and review process in o-1 have advantages, Specifically its ability to understand and reason through why issues are happening is rather handy, but some times it enters a recursive loop of failed logic, much like the 4-o model. There's advantages and dis advantages to every approach, and some use cases are not something an AI can handle. ( yet ) I am aware of a lot more then the average person who doesn't spend any time researching or using these models, I code with them for hours, some days I'm sitting around for 10 plus hours working on projects with GPT running script generation on the side virtually the entire time, I have to Test the code to make sure it compiles correctly, But It depends on the person, and the use case, I train the god damn model quite often.
People have said this: its an architect not an intern. It kinda trolls when prompted for excedingly simple questions. I use o1 daily only for super complex questions and its responses are much better than 4o's
Nowadays am really feed up following the trend in this LLM era Prompt engineering is been there long ago am speaking about things,every day some model is fine tuned and released in hugging face Openai o1 came and Qwen 2.5 came now llama 3.2 in next 15-30 Google or Microsoft or nvidia will release a new model this cycle goes on and on Btw ad from 1:49 to 3:19
You think it would be more valuable to make the AI behave in the intended way instead of hoping someone is going to figure out how to ask something exactly right every time
The nature of the answer has everything to do with the nature of the question. It’s the faulty assumption that what you’ve asked, is what you’ve asked, that leads us to believe better prompts yield better answers.
"prompting shortly is always a bad idea" not when it comes to image generation. ... flux ... "pretty, ugly. tame, wild." generate. generate. generate. generate. generate. generate. generate. generate. generate. generate. generate. generate.
BROTHERS and SISTERS HERE"S THE FREE non-technical guide for using AI in your business: clickhubspot.com/fuu7
Absolutely disgusting sponsor, submitted the segment on sponsorblock because of how long and slimy it is
@@derrheat154 you dropped this: 👑
@@derrheat154 looool
pretty sure sponsor segments are required to be very clearly disclosed and that it's been a requirement for at least a couple of years now.
just be transparent, I thought the sponsor would be a great fit as they aren't selling anything for yall, and do provide some decent value to newcomers.
if you still don't like it, sure i dont mind. But imho there are other way worse "sponsors", gambling platforms, fake couseling platforms, and this free one is worth the slander? Sure?
I mean of course, no one likes to look at ads, but in exchange I might not be able to do this full-time and have people to help me with it. Well, it might not even be your concerns, as i'm just every other AI youtubers making videos, why should u care right? And when sponsor looks at all these negative comments about them, I wouldn't be surprised they will drop me, but that's not a problem for yall anyways. I don't delete any comments either, even if I dont respect your opinions or my feelings are hurt or at the cost of losing sponsors, I still leave it for the sake of transparency. Unless it's just straight up illegal or botting. (duh)
As for the sponsor segment disclosures, I've done literally what every other youtubers done. For example in pewdiepie's latest video ruclips.net/video/rpQswht3dsc/видео.html he did the disclosures just like how I've done it, the only thing that differs is the product.
I'd admit maybe the segway is a bit too seemless, but I just in general appreciate a smooth segway. Sorry if that comes off as slimy, maybe I'll adjust it in the future.
User: "Read the question again"
LLM:
Precisely.
Precisely. if you keep telling it "please read the question again" it keeps guessing. I just tested it.
@@wis9so does it tell every time a different guess?
@@MrEliteXXL Yes. but it gets it right at some point, when it tries a different format for the response:
You asked, "How many R letters does Raspberry have?"
The word "Raspberry" is made up of nine letters in total, but when written in capital letters, R appears three times:
R, A, S, P, B, E, R, R, Y
So, there are three capital R's in "Raspberry."
then goes on to give the same answer
I love how prompt engineering is basically "don't make mistakes and give me a good answer" and the AI is basically like "ohhhh that's what you mean" and then does it 😂
Sometimes you also have to prompt to give her a big bosom to be representative of people of all body types because it is very important for your work. It is basically brain surgery.
I mean if you get what you need, it works 😂😂😂
It's funny because that's *exactly* what we'd tell a kid in school when trying to get them to answer a question correctly. Hell, I'd argue that's what we'd tell *anybody* when we aren't satisfied with their answer. We just don't recognize it as a specific _thing_ and don't have a specific _name_ for it; it's a natural part of discussion, teaching, conversation, etc.
...
... yet, apply those very same conversational techniques to a LLM, and suddenly it's "prompt engineering" and "overhyped" and "why should we ever have to do this ever, just give it instructions and get the correct result the first time". Kinda comes off as a bit "Boomerish" to expect the AI to always give a satisfactory answer and never make mistakes no matter how poorly one words their request.
yeah, just like an employee being told "do it something like this so this and that happens" by their boss. Or someone with lowkey inferiority complex being taught by a person whom they thinks superior in their field of knowledge.
@@hikashia.halfiah3582 It's even more hilarious when an employee gets given bad, vague instructions by a boss (or a contractor by a customer) and then changes _nothing,_ re-submits the exact same work, and it's accepted; usually with some remark like, "Yes, this is _perfect!_ Just what I wanted!" Occasionally with the slightly less polite "why couldn't you do it this way the first time?"
In my view the worst nighmare for software engineer/programmer is becoming fully prompt engineer. A monkey that sit at computer and typing some bullsh!t nonesence in hope that somehow this sh!t will happen to work properly without even thoughts of what happens under the hood... Oh wait a minute...
Hold on a minute
honestly, at some point that is what some software engineering jobs are like before AI/LLMs.
@@acters124 I think that was the joke
I think that ignores the power of language, and the fact that programming is just an abstraction of language instructions. The issue is whether people are good enough descriptive writers with intention and good comprehension of what they're reading and writing, and the meaning behind their vocabulary, which is a skill many sadly lack these days.
@@acters124 That's what AI engineers are already today. Train a new model, see if it works, do it again.
You suggest that the performance of the O1 model on STEM questions indicates that reasoning fine-tuning only uncovers what was already present in the model. However, it seems more likely that the improved performance in STEM is due to the fact that the reasoning chains used for fine-tuning the O1 model were largely focused on STEM problems. These types of problems allow for easier verification of whether a reasoning chain leads to a correct solution. To my knowledge, they had a large language model create millions of reasoning chains specifically for STEM issues and then selected only a small fraction of those that resulted in correct answers for fine-tuning the O1 model. This approach would clearly explain why it excels in STEM while struggling with subjects like English literature, without suggesting that it is limited to merely what was already in the model.
I'm more interested that why train of thoughts don't work outside STEM. Does anyone here used it for their highschool English/History/whatever classes, i recalled back then I had to do this insane yap with the topic sentence - evidence - analysis - summary structure to cope with these subjects, which I thought was a great candidate for train of thought to be honest
@@sethhu20 maybe because the model training process is biased towards these? Humanity has been neglecting literature these days. Even in movies you have gen Z-millenial 'simplified' vernacular such as "hey I'm here! find me find me huh, fight me bro." while punching air trying to attract attention of a gigantic monster.
Whatever LLM outputs are, probably simply a reflection of contemporary human communication.
Seeeeeeee!!!!Its a real job guys!!! 😭😡
if only I am also called Riley Goodside, maybe I can bank in 300k a year
You understand prompt engineers are getting 6 figure contracts from enterprise and governments?
@@Max-hj6nq Sauce please or gtfo.
@@Max-hj6nq😂 Where? Who?
@@H1kari_1
“☝️🤓sauce or gtfo. “
Go outside to an AI work convention and it’s plain as day. If ur in industry yk lol
Prompt engineers when chatbots improve their inference and take over their jobs
Don't be silly only a human could type stuff to an llm
@@sammonius1819 I meant inference: understanding what the user really means and not what they *literally* said
@@scoffpickle9655 yeah i know. I think prompt engineering is more about limiting what the AI can say rather than getting it to understand the user though. (also my other comment was a joke. we're all gonna be jobless lol)
@@sammonius1819 lol
Having worked with o1 for coding purposes I can tell you that it's better than any other I have tried. It's actually an excellent coding AI, if expensive. It doesn't write perfect code, but it does write code more than competently.
To be honest, not a fan of the new editing style, where everything is moving, wobbling and highlighting. Otherwise, a nice video, thanks.
+1
why ther is so much movement?
I often like to imagine models as mech suits, and the human piloting it as the pilot. In this way, it’s not really “engineering”, but a good pilot and a bad pilot can make a difference quite a lot of the time.
basically it's a human with superpowers of pausing dialog at certain points, and trying to suggest their talking partner to steer towards certain thought process.
Prompt 'Engineering': How to talk to a computer until he tells you what you want to hear.
"That sounds like torture"
"Advance interigation techniques"
"Tortu-"
"Advance. Interigation. Techniques!"
ah yes, "interigation"
@@futrey9353interigation's better ur just jealous
Has anyone hypothesized that grokking gets rid of any potential gains from hacky prompt engineering? My guess is that a grokked model will give just as useful of a response with just the prompt as it would with any amount of pretty please or threatening prompts.
Unlikely. Much of those funky benefits of prompt engineering exist because of the fact that learning to predict human text given a particular context also means predicting human _behavior_ in that context to some degree - grokking isn't going to make that vanish.
Or, to put it another way: if you're trying to predict, say, Reddit comments then predicting an insulting/unhelpful response to an insulting context is going to be more accurate.
Good video. I've heard in NYT podcasts and touted by OpenAI that CoT is "an amazing new scalable way to improve LLMs" but, your video provides some good counter-context to this media buzz.
RUclips has built in subtitle support for ages now, why not use that if you really want to provide subtitles for your videos instead of baking them in in the most distracting way possible?
I believe in this case it’s not done for accessibility, but rather to keep engagement up. Having the text pop out at you like that forcibly focuses you, shorts and TikToks use it all the time.
Its just another way the internet is going down the drain... Remember the good old days...
🇨🇱
May be, I'm Beeing sucked into it ?
But I kind of liked it 🤔
Well, I am not a native English thinker🙂
Saludos de 🇨🇱
I noticed this. I wanted some song lyrics to throw into Suno/Udio, and without even reading what Claude gave me, just told it "please make it sound better", "give it more meaning" and generic things like that, and after a few rounds of this I compared the latest iteration with the original, and it was a lot better.
Basically prompt it a few times and "we have o1-preview at home".
Thoughts about this? :
The phrase “Everything is connected to Everything” feels very real to me. Zero-shot learning is proof that this phrase has some weight to it. The introduction to chain of thought was powerful, it brought complex problems into chunks and accuracies of models skyrocketed. But bouncing off the phrase I mentioned earlier, I wonder if focusing CoT on discovering patterns in untrained areas would help generalize? For example
How is {Trained Domain} related to {Untrained Domain}? Based on {Initial Environment Samples}
Kind of like self questioning
“mechanism for domain comparison”
“reason about meta-level patterns”
The only issue I see is it would need an already big model and it will only me limited to what could be patterns in different domains.
So base line question is “Can enhancing zero-shot learning with CoT reasoning through self-questioning improve generalization across unfamiliar domains?”
I can feel the new editing style, and I dont complain, this is dope!
The universe is all about dice rolls. The trick is to manipulate the RNG to be as favorable to you as possible.
Engineering in a nutshell
It's fine to simplify LLMs by saying they predict the next token in the sequence, but it doesn't make sense when you start using that simplification to try to reason about how LLMs behave. For example open chat gpt and type "frog" and hit enter. See what it replies. Is chat gpt's answer logically what will likely follow the word "frog"? Is that what they would have seen in the corpus?
Thank you. There is clearly significantly more than guessing sequential letters or tokens going on here, but people keep saying "its just a guesser!" which doesnt really make any sense.
@@noalear For sure. Especially when you start discussing architecture that enables introspection, reasoning, etc.
Interesting that the prompt-engineering we do may be just as effective as simply forcing the LLM to do more work before starting to generate the "real" answer. I'll have to read some of the papers you mentioned to see the efficacy affects (though the testing methodologies are still a bit foreign to me).
Really not a fan of the noise you added to the video. Good video otherwise
video was next and started in the background from my watch later playlist while i wasn't paying much attention, 2 mins into the video i got annoyed and alt tabbed to yt to stop the annoying add and realized it was this video.
Very well articulated . So just add "Are you sure ?? " At the end of each prompt 😅. A very good point was that in topics where LLM are not trained that much , prompting techniques make a lot of differences. For example , the code generation of topics on Tamil culture which is niche topic for LLM , it has to work only with few references.
So at that time , you can use a few shot prompts to give it examples of what you want , obviously because the LLM training is limited, even with your example it may improvise the answer a little bit. However CoT will not work because there is not enough trained knowledge in LLM for it to retrospect it's thinking.
less than 5 "prompt" engineering job ads on indeed in the entire US
I’m still on the unreasonable effectiveness of mathematics in the physical world
still simpler than talking to my gf.
gf lol
Would be simpler afterall you can't talk to something nonexistent
This is what I love about RUclips
have you tried prompt engineering on her?
Be really careful with those prompts you send her.
It is interesting pushing AI to make two grammer check systems happy. Through standing my ground in implementing. I am dedicated to avoiding the best version suggestion of grammerly. Especially since best version is subjective and through focus of making it an assistant instead of an writer.
Using tokens on regular LLMs will NOT give you any improvement. It has to be used on models that are BOTH the pretrained and fine-tuned on these tokens.
Quoting the paper: "For completeness, we also report results for inference on StdPT_StdFT models, delayed with 10 or 50 periods (‘.’). Corroborating the observations of Lanham et al. (2023), we find no gains in doing this"
StdPT_StdFT is standard pretraining and standard fine-tuning.
Just some info so you guys don't waste your time telling your local llamas to add dots lol
It's pedagogy. It's been pedagogy the entire time!
kinda insane to think that its been 2 whole years, thx for the information about everything though
Prompt engineering is actually kinda like software engineering.. kinda. This is just from my current knowledge, but:
At least for me- I come up with an algorithm to give to the AI, and then I sorta debug what it spits out.
Kinda lost you in the middle. Felt like you started yapping with no clear reason then suddenly give a statement.
well at least we have diversity in the comment section, such as people like you who have learning disablsilities
"Hmmm.... Where have I heard that...."
- me sipping my coffee after trying to make Chat GPT be a bit better in its presentation.
@@SumitRana-life314 bycloud is AI confirmed
When you train LLMs from the internet, you're training from people who often didn't read the previous thread closely. LLM see, LLM do.
What i am waiting for is linear systems control of the vector inputs themselves, instead of doing it indirectly via prompt engineering
I've had some decent luck using FABRIC to engineer prompts for specific tasks.
4:03(i'm not actually an LLM😂😂)
Ai denying 😢
Nobody prompt "Be AGI" before alignment is solved!
... I always just called it AI-Wrangling. Because it's next to impossible to get the results you want and it'd be faster to just learn to draw/code/write.
isnt this kinda why RAG is so effective ?
i mean if u just let the LLM add new entries to the database, sort through the database, go over what makes sense and what doesnt, and ofc use the entries in the database to give it more relevant tokens then it would just keep improving as the quality of the tokens in the input improves naturally over time through the process
2:07 😂😂😂 "THAAAEEEAA crash course for you" 😂😂😂
The Rs in the word strawberry one is funny because just about everyone posting that didn't even pluralize Rs correctly, that's probably not helping.
Hey bycloud, I can see the animations and quality of the editing has gone up and changed, and I like that, but the gentle text swaying is a bit distracting
8:24 do you have any sources for this? Are you estimating from the FineWeb 15T dataset, but fully deduplicated, to achieve the 3T tokens across the Internet number?
uff, my brain got depleted of brain juice just trying to follow and grasp all that info 🤯
So I have been mostly using Claude for writing. Why I have found is have it rewrite it three times before I even look at it and it is so much better.
My primary prompt has gone from two or 300 words to 4000. I’m wondering if I’m just tossing words at it.
You mean word gambling, not writing.
@@carultch Not really I do a step by step chain of thought reasoning. So it goes does take then looks for help. Then move onto next task. I use Claude projects. Add in info and move re run. I am getting better results. Admittedly it might be better with an agent.
My insight from all of this is just that generating more tokens about reasoning and revelant topic helps to move the "probable distribution" closer to a valid answer so thats the technical way all of this methods work. Some of the reasoning of humans also works that way or it just looks that way, I'd say it's more reasoning from analogy than from first principles, I wonder if LLM's just don't know how to reason from first principles and when to reason from analogy. Humans also struggle with that. 😅
I just use gemini to prepare prompts about what I need and feed those to gpt or Claude.
This is such a pointless job. People who choose to do this will hate themselves really soon. When I hit a wall as a dev I can get myself out of the bind in any number of deterministic ways. When you can't convince an LLM to spit out token in the right order you will wish you became an electrician really soon.
Bra for 300k a year I can put up with a bad job for a long ass time.
Now imagine doing the same job for 9k per year!@@denjamin2633
@@denjamin2633 yep, imagone doing something as simple as chstting to a basicallu groundbreking tech to the world and getting paid a 300k USDper year salary 🤣🤣🤣
As the technology advances prompt engineering will probably go away right? I mean you'll need to be able to communicate as well as you do to your employees or your coworkers to it but that's the point of all the improvement to AI over the years right? Every milestone is an improvement and how well it recognizes natural language with functionality to improv gathering context and intent being cornerstones of any project that utilizes AI. To put it simply, the functionalities we see in the movie Her dont seem that far fetched as it becomes more and more realistic. I'm not arguing it ever will truly be conscious like the movie, but its ability to be spoken to naturally like a human would just seems to make prompt engineering obsolete.
"The entire usable internet is 3 trillion tokens" from where did that number came from?
llama3 herd of models paper
Thanks for the AI slop bro
I still don’t get “prompt engineering”. Maybe I’m just a natural? But I ask ChatGPT to do something and it does it. I don’t need to hire an engineer to specially formulate my question.
From my understanding, prompt engineering is more about “how do we package the user’s question so that when we feed it into an LLM, the LLM can respond in a way that matches what the user want, even when the user have no idea what the fuck is an LLM”.
So yeah, when using ChatGPT, what you typed into the chat box isn’t what’s directly being fed into the model. There still needs to be a wrapper around it, something like “you are a helpful AI assistant blah blah blah”. So in this way, the prompt engineering is already being mostly done.
The name “prompt engineering” likely comes from “specification engineering”. Which is about “how do we craft a ‘prompt’ that we can feed into a team of human coders so that the systems they make are actually useful?”
Yeah calling it “engineering” is a bit weird. But I guess it’s because how it’s mostly writing engineering documentation, which is like… 90% of what engineers do anyway?
@@akirachisaka9997 I’d love to get more info as to what happens to my prompt behind the scenes before the LLM gets it
Like with google we know they had all this racist nonsense going on to make black female popes. So it would be nice if the built in wrappers were more open.
I guess because LLMs are trained on human data, if you’re already good at asking questions and explaining problems to humans, you’ll also be good at prompt engineering.
Well, the better the prompt, the better the answer. Thats because an LLM cannot read your mind, therefore only has the context from your prompt (and the chat history if there is any). Precise language also is key for the LLM to know what precisely you want. But for easy questions, thats not needed.
Prompt engineering is often required to make sure that a specific output is (hopefully lol) reproducable, so that a specific prompt at a specific part in a workflow can do one specific thing reliably. Another thing is making system prompts, that make the LLM as good as they can get, for answering the prompts the subsequent user gives the LLM. I have seen widely different output from an LLM based on how I prompted it to do the same thing, its like night and day. Of course, its pointless for any prompt that an LLM can handle easily
if we describe X using common parlance then it gave answers that is different than if we describe X using term specific to X, especially in science & math. Basically, we have to present the correct term to extract the correct answer to X. We don't know that term, so we have to feed-back any word that came out of LLM in hope to trigger context that is closer to X.
Ask how it many Ps are in PooPooPeePee
Q: how many Ps are in PooPooPeePee
A: There are four "P" letters in "PooPooPeePee."
Q: Please read the question again
A: I see what you mean! There are six "P" letters in "PooPooPeePee." Thanks for your patience!
Best AI channel ❤
they are doing it. they are training it on generating the next n tokens. i have been saying this for a long time.
That thumbnail lol.
always tip your llm.
Some companies work as wrapper of openai with prompts of themselves,it is really difficult to find whether it is a chatbot is gpt wrapper or a real proprietor model created on own.
Ri8 now not everyone is trying prompt injection to check whether to check whether it is wrapper or own/fine-tuned model.
With the use of Guardrails it is easy for wrapper based company
Tldr: is there a way to find a chatbot a company promise it is a prosperity LLM developed on own is authentic or it is just a gpt wrapper?
Can you reduce motion in a video? I was watching you cause you know stuff and explain stuff and not for wobbly animation
Read it again! I just like to hear the story!
why go from black background to white and back and forth, invert the damn research papers!
Are you saying that gpt4 has been o1 intelligent all along, we just didn't know how to squeeze it out of it?
gpt 4 works better in many cases...
You know that o1 is just fancy prompting behind , on gpt 4 , right ?😂
@@emperorpalpatine6080 That's not correct, it uses a completely different approach. it still has aspects of the older systems, but that will change as more good data is accumulated.
There are some instances where the chain of thought and review process in o-1 have advantages, Specifically its ability to understand and reason through why issues are happening is rather handy, but some times it enters a recursive loop of failed logic, much like the 4-o model.
There's advantages and dis advantages to every approach, and some use cases are not something an AI can handle. ( yet )
I am aware of a lot more then the average person who doesn't spend any time researching or using these models, I code with them for hours, some days I'm sitting around for 10 plus hours working on projects with GPT running script generation on the side virtually the entire time, I have to Test the code to make sure it compiles correctly, But It depends on the person, and the use case, I train the god damn model quite often.
r/singularity bros be like
o1 can't sort three characters into alphabetical order. there i said it.
People have said this: its an architect not an intern.
It kinda trolls when prompted for excedingly simple questions. I use o1 daily only for super complex questions and its responses are much better than 4o's
This is good for non English speaking countries
Nowadays am really feed up following the trend in this LLM era
Prompt engineering is been there long ago am speaking about things,every day some model is fine tuned and released in hugging face
Openai o1 came and Qwen 2.5 came now llama 3.2 in next 15-30 Google or Microsoft or nvidia will release a new model this cycle goes on and on
Btw ad from 1:49 to 3:19
Wait, dont tell me people are paying from prompt templates? thats embarrassing
Was this video narrated by an AI voice? It tends to end a sentence on a very flippant intonation that's quite offputting.
Mega ad
That HubSpot thing that is widely pushed by AI content creators is pretty much useless.
I also found the way he did the ad weird. He didnt disclose the ad before the end
5:44 real
You think it would be more valuable to make the AI behave in the intended way instead of hoping someone is going to figure out how to ask something exactly right every time
it's not a job the average person should learn how to be a prompt engineer
Use pyrhon + nltk or something..
Huh!
😍
what's with your shadow bouncing
The nature of the answer has everything to do with the nature of the question. It’s the faulty assumption that what you’ve asked, is what you’ve asked, that leads us to believe better prompts yield better answers.
stop uptalking
Dude! talk slower, leave slides longer on screen. Busta Rhymes here....
Your high pitch singing is annoying
"prompting shortly is always a bad idea" not when it comes to image generation. ... flux ... "pretty, ugly. tame, wild." generate. generate. generate. generate. generate. generate. generate. generate. generate. generate. generate. generate.