I would be curious to see a variation of NL-to-Format tested in a single generation instead of two consecutive ones. Meaning, a prompt like: "Reply to this (thinking step by step, plus all other instructions). Then include a JSON version of the response in the following format: { ... }."
From what I'm seeing, it seems to improve the overall reasoning quality while keeping JSON for parsing in industrial applications. It would be nice to have it formally tested and benchmarked properly.
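For anyone who wants to try this, here is a minimal sketch of the single-generation idea, under a few assumptions: the model writes its reasoning as plain text first and puts the JSON object last. The function names, the prompt wording, and the sample response are all hypothetical, and no model is actually called here.

```python
import json


def build_single_pass_prompt(question: str, format_hint: str) -> str:
    """Ask for free-form reasoning first, then a JSON version of the answer.

    The wording is illustrative only, not taken from the paper.
    """
    return (
        f"{question}\n\n"
        "Think step by step and explain your reasoning in plain text.\n"
        "Then include a JSON version of the response in the following format:\n"
        f"{format_hint}"
    )


def extract_last_json_object(text: str):
    """Return the last top-level JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    found = None
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON at this brace (e.g. braces in the reasoning text).
            pos = start + 1
            continue
        if isinstance(obj, dict):
            found = obj
        pos = end  # skip past the parsed object, including nested ones
    return found


# Hypothetical model output: reasoning first, JSON last.
sample_response = (
    "There are 3 boxes with 4 apples each, so 3 * 4 = 12 apples.\n"
    "Removing 2 leaves 10.\n\n"
    '{"answer": 10, "unit": "apples"}'
)

parsed = extract_last_json_object(sample_response)
print(parsed)
```

Taking the last object rather than the first is a deliberate choice: the reasoning part may contain braces or small JSON snippets, and the final answer is expected at the end. A production parser would probably also want schema validation and a retry path when nothing parses.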
My understanding is that Gemini 1.5 Pro and GPT-4o/GPT-4 were specially trained on constrained structured output (CSO). Furthermore, Gemini Flash doesn't support JSON mode, and when I tested OpenAI structured output with ChatGPT 3.5, it didn't work. I haven't tested Claude's JSON support enough to comment. So the paper's results don't apply to the latest state-of-the-art CSO models such as GPT-4o and Gemini 1.5 Pro. I agree with the paper's conclusion in the case of the low-end models; it's a result anyone who has built AI applications with these models has witnessed.
Thank you for the informative video.
Thanks for sharing your experience. We definitely need to constantly monitor performance for this specifically. The benchmarks are also not representative of all real-world tasks, and the authors mention that in the discussion section.
You choose papers well.
I try to, based on audience interest. It's hard to choose sometimes with so many papers coming out every day.
Many thanks!
Of course the JSON-restricted prompt performed worse, as it removed chain-of-thought.