Just love your whole approach to AI and coding in general
I've been curious about this topic. I really appreciate how you approached the evaluation. I would have liked to see an n of 5 for each example to limit errors related to model entropy.
Great comparison.
Something to consider is to break down the scores by model. Why?
To see if there are preferences of format by model.
E.g. we know that Anthropic likes XML and that format might be the best for their models. That does not mean that this holds true for other models.
True
Thanks for all your hard work! You do such a great job brother. Appreciate you very much.
Shouldn't it be possible to layer a deterministic MD-to-XML converter in your prompting process? Then you, as a human, could still work in MD while your LLMs get the XML they crave.
Absolutely possible, but not as easy as you'd think at first blush. For example, the XML tags you choose have information in them, telling them "what the thing is" that you're wrapping in the tag, whereas in markdown all you really have is "sections" and various types of divisions. I can say this as an experienced programmer who tried to create a Markdown-based parser for exactly this purpose. It's *way* harder to cleanly interpret semantic divisions when all you have to work with is stuff like blank lines.
@BTFranklin I don't think XML *has* to have more information, and for this particular test I assume it doesn't. But if the XML prompts he's using do indeed provide more information than the markdown ones, doesn't that invalidate these results as a measure of format (and only format) effectiveness?
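A minimal sketch of the deterministic MD-to-XML layering idea above, assuming headings delimit sections. Using the (slugified) heading text as the tag name is one simple way to keep a semantic label rather than ending up with anonymous "sections"; all names here are illustrative, not from the video.

```python
import re

def md_headings_to_xml(md: str) -> str:
    # Split on '# Heading' lines; the heading text becomes the tag name,
    # so the semantic label survives the conversion.
    sections = []  # list of (tag_name, body_lines)
    for line in md.splitlines():
        m = re.match(r"#+\s+(.+)", line)
        if m:
            tag = re.sub(r"[^a-z0-9]+", "_", m.group(1).lower()).strip("_")
            sections.append((tag, []))
        elif sections:
            sections[-1][1].append(line)
    parts = []
    for tag, body in sections:
        text = "\n".join(body).strip()
        parts.append(f"<{tag}>\n{text}\n</{tag}>")
    return "\n\n".join(parts)
```

So `# Role` followed by body text becomes `<role>…</role>`, and you keep authoring in markdown.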
Incredible value, please more of this type of content
Great content! Would have been nice to also compare YAML.
YAML is nice for toying around but is an awful format once you start using it seriously; do a Google search for "yaml sucks" and you'll see. I regret having adopted it in some projects.
I’m with you! I started with YAML and then moved to some mix of that and TOML/XML. That would be fun to have a central leaderboard for prompt format performance tracking based on different metrics like here!
This is an excellent, detailed analysis. Highly appreciated, sir. Subbed.
Always great insights, need to give promptfoo a shot!
One of the best videos I have seen regarding all things LLMs. Do you think the results from 4o-mini replicate with 4o, 4-turbo and gpt4?
Your videos always really help, great work.
What a great video and unexpected outcome. I've been using MD but am swapping to XML for complex persona instructions. Great video!
Fascinating. I've been using raw text with small JSON elements where structure was needed in AutoGen-based flows. Works really well. JSON does get brittle when there's too much of it, though. I'm not shocked that the whole prompt in JSON wasn't great.
That being said, definitely going to try some xml.
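The "mostly raw text, small JSON islands" pattern described above can look something like this (a hypothetical example; the constraint names and values are made up, not from the video):

```python
import json

# JSON only where structure genuinely helps; everything else stays raw text.
constraints = {"max_words": 120, "tone": "neutral", "format": "bullet_list"}

prompt = (
    "Summarize the meeting notes below for an executive audience.\n"
    "Follow these constraints exactly:\n"
    + json.dumps(constraints, indent=2)
    + "\n\nNotes:\n(notes go here)"
)
```

The small JSON block stays easy for the model to parse, while the brittleness of a fully JSON-encoded prompt is avoided.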
XML is what I've been using since day 1. 😊
I started using markdown, but after looking over the Anthropic workbench I started using XML. Haven't looked back.
Great tests. Which open 8B or 9B model is the best with long context? In my tests Gemma2 q4_k_m performs quite well.
Great setup. Please evaluate the Gemini Flash. Capabilities of these low cost workhorse models are the most important edge cases to understand.
On top of that, it could be interesting to provide an XSD (XML Schema Definition) so that the response format is fully predictable.
This is what I’ve been looking to test myself. I suspected Markdown wasn’t performing well. I asked llama3.1 what it prefers, and it gave me XML.
Good use of markdown-to-XML converters, so we can conveniently write the prompt in markdown, then send it as XML to the LLM.
Please, a basic video about LLMs: how to deploy them, expected uses of local LLMs... I think it would be interesting for building a small company run by them.
do you have this code on github? would love to play around with it myself
Amazing content as always thank you.
When JSON is the worst performing format. Feels bad, man. I will keep this in mind... I never would have guessed that it handles XML so well, but then again most of the data is raw text and HTML, which looks like XML because of the tags, so I see why LLMs would be good at understanding and generating with it.
Have you thought about mixing XML tags into your markdown prompt? Like Claude Sonnet does in the prompt generator?
Really useful video!
Dan, is there a way to get access to the files you used in this video? I don't have coding knowledge and am learning about prompt scripting. From the video and the files you ran, it comes across as if you have a methodology for writing your scripts that could help me develop my own scripts following your examples.
Do tab indentation and newlines really matter when using XML tags? 🤔
In my testing of Llama 3.1 8B for instruction following, I find it severely lacking compared with Codestral. Llama 3.1 8B was unable to return a simple yes or no response. It always included a fluffy explanatory response (which was correct but not requested). YMMV.
There are a couple of things you missed. To make this video actually useful, you need to experiment more.
- 1. You missed YAML; it's a dark horse and I've had stellar results with it.
- 2. Use something harder, like tool calling.
- 3. Try instructions that are system-prompt heavy.
- 4. Try prompts that put the instructions as the very last thing the model sees.
- 5. Use the seed param.
- 6. Use an automation that changes the temp by 0.1 for each call.
I have to say I'm a bit disappointed with the video. I mean, I kind of get it, but I want to see these models tested on the bleeding edge of what they can do; I want to see you dialling in that last couple of percent of performance. They're so much more powerful than the examples in the video.
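The seed and temperature-sweep suggestions above can be sketched as a small harness. `call_model` here is a hypothetical stand-in, not a real client; actual providers expose `seed` and `temperature` parameters under their own API names.

```python
def call_model(prompt: str, seed: int, temperature: float) -> str:
    # Hypothetical stand-in; replace with a real provider API call.
    return f"response(seed={seed}, temp={temperature})"

def temperature_sweep(prompt: str, seed: int = 42, n: int = 5, step: float = 0.1):
    # Hold the seed fixed and step temperature by 0.1 per call, so score
    # differences come from sampling temperature rather than run-to-run noise.
    results = []
    for i in range(n):
        temp = round(i * step, 1)  # 0.0, 0.1, 0.2, ...
        results.append((temp, call_model(prompt, seed, temp)))
    return results
```

Combined with scoring each response, this is the kind of automation the comment is asking for.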
Do you share the results in any other format?
My approach is XML for the title tags, and inside I write in markdown.
It works and it's still really human-readable.
Full XML isn't the best to read.
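The hybrid style described above (XML tags as section delimiters, markdown for the human-readable bodies) might look like this hypothetical example:

```python
# XML delimits the sections; markdown keeps the bodies readable.
prompt = """<instructions>
- Answer in markdown bullets.
- **Bold** any figures you cite.
</instructions>

<document>
## Background
(document text goes here)
</document>"""
```

You get XML's unambiguous section boundaries without giving up markdown's readability inside each section.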
best prompt format is l337 sp33ch
Markdown and XML, hands down, for reports. Markdown converted to vectors.