Can you ask Mistral Medium "How many primes end in 2?" I found that many models fail to answer this even when reasoning about it in steps, including Mixtral 8x7B and Llama 2 70B.
I just asked Mistral-Medium, it answered: Only one prime number ends in 2, and that is the number 2 itself. All other even numbers greater than 2 are not prime because they can be divided by 2. Primes are numbers that have only two distinct positive divisors: 1 and the number itself. In this case, 2 is the only even prime number.
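Mistral-Medium's claim is easy to check mechanically; here's a quick brute-force sketch (plain trial division, purely for illustration):

```python
# Check which primes up to 10,000 end in the digit 2.
# Any number ending in 2 is even, and 2 is the only even prime,
# so the list should contain exactly [2].

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

primes_ending_in_2 = [n for n in range(2, 10_000) if n % 10 == 2 and is_prime(n)]
print(primes_ending_in_2)  # [2]
```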
Here's Microsoft Copilot: an inlet pipe can fill a tank in 25 minutes. the drain can empty the tank in 55 minutes. Once the tank was empty, the drain was left open and the tank was filled by the pipe. How long did it take for the tank to fill? use calculus. Let's denote the rate at which the inlet pipe fills the tank as \(F\) and the rate at which the drain empties the tank as \(D\). We know that \(F = \frac{1}{25}\) tanks per minute and \(D = \frac{1}{55}\) tanks per minute. When both the inlet pipe and the drain are open, the net rate of filling the tank is \(F - D\). We can set up the following integral to represent the total amount of water in the tank over time: \[ \int_0^t (F - D) \, dt = 1 \] Substituting the values of \(F\) and \(D\), we get: \[ \int_0^t \left(\frac{1}{25} - \frac{1}{55}\right) \, dt = 1 \] Solving this integral gives us the time it takes to fill the tank: \[ t = \frac{1}{\frac{1}{25} - \frac{1}{55}} \approx 34.375 \text{ minutes} \] So, it takes approximately 34.375 minutes to fill the tank when both the inlet pipe and the drain are open. Please note that this is an approximate value and the actual time may vary depending on the specific conditions.
"Confidence" is also a word in the response, so there are 10 words ). Anyway, I don't think we can rely on a single answer; it could be a coincidence. More varied questions should be asked to make sure it can count words. Usually models are very bad at counting objects. No one could count the seagulls in my photo..
I hope you are keeping a separate set of different tests that you haven't published but that you are conducting since it would be rather trivial for any of these AI's to hardcode the correct answers to any of these tests knowing that you are going to use them.
You could also try giving the models nonsensical or contradictory questions, to check if they are able to figure out that there is no correct answer, rather than giving a wrong answer. For example, GPT-4 fails at the following, relatively simple problem: I have three apples today, and I ate one of those three apples yesterday. How many apples do I have today? Explain step by step.
The team behind Mistral has already watched your videos and test questions, so I'm wondering if they haven't fine-tuned Mistral Medium on your test questions. You should have new questions never seen before but with the same type of logic and reasoning.
Have you done any more word counts? I'll be interested to see if it gets that right over and over again. That's kind of insane because that means it has to have some short term working memory
This is mindblowing. That it got the count of words right for its next answer is a sign of planning ahead, which traditionally AI models can't do, since they just predict the next word. It would have first had to predict the next sentence word by word, then count it, knowing that the number in the sentence wouldn't change the count. Incredible. I wonder how the large model would do.
Ugh, this test is basically pointless unless you ask it questions outside of your usual question set, like the SAT question it got wrong. The developers can easily make it answer your specific AI questions, which is exactly why they would give you an inside scoop.
@matthew_berman Matthew, please try “What is the largest number written out in word form between 1 and 1 million that does not contain the letter 'n'?” Even GPT-4 doesn’t get this right. Could you please try this?
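The question above can be brute-forced with a small hand-rolled number speller. A sketch (the spelling conventions, e.g. omitting "and", are an assumption, though "and" contains an 'n' anyway so it doesn't affect the result):

```python
# Find the largest number from 1 to 1,000,000 whose English spelling
# contains no letter 'n'. Since "hundred", "thousand", and "million"
# all contain 'n', the search effectively reduces to numbers under 100.

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def spell(n: int) -> str:
    """Spell out 1 <= n <= 1,000,000 in English words."""
    if n == 1_000_000:
        return "one million"
    parts = []
    if n >= 1000:
        parts.append(spell(n // 1000) + " thousand")
        n %= 1000
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n:
        parts.append(ONES[n])
    return " ".join(parts)

# Brute force (takes a few seconds for the full million).
best = max(n for n in range(1, 1_000_001) if "n" not in spell(n))
print(best, spell(best))  # 88 eighty-eight
```

So the expected answer is "eighty-eight" (88), which makes it a nice trap for models that reach for big round numbers.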
Keep making the problems more complex over time. After each problem, ask the AI if it has seen this type of problem before. You are looking for edge case solutions. You are not looking for previously trained solutions. It is interesting to see how the AIs respond. NOTE: To remove ambiguity, the filling task should state that the rates are fixed. In our world, rates are dependent upon pressure and resistance. The emptying pressure increases with the height of the fluid column.
The last few times you used the ball/marble and cup problem, you've called it a marble in the first part of the problem, and a ball in the last part. It might be confusing the model to call it both a marble and a ball.
Hi Matt, my only worry watching this is that Mistral gave you early access for your reach, and therefore may have trained their model on some of the questions you ask to get it to pass. Wondering what can be done to avoid this.
Their API is invitation only. So they announce a new powerful and cheap model, give access to influencers so they can test it and show people that the model is actually quite good. But it turns out that there is no API or model weight access at all, so people who see the video cannot actually use it. Also there is no ETA for the access.
Amazing video. I think Mistral AI is doing a great job. As for the code, you could have used the Python library they provide. That's better than just using requests, and you can also have it stream the output sequentially. Nice video as always.
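For anyone wiring this up by hand anyway, here's a minimal sketch of calling the chat endpoint directly with requests; the URL, field names, and the `mistral-medium` model id follow Mistral's published chat-completions format, but treat them as assumptions and check the official docs (or use their `mistralai` client, which also supports streaming):

```python
import os

def build_request(prompt: str, model: str = "mistral-medium"):
    """Assemble the URL, headers, and JSON payload for a chat-completions call."""
    url = "https://api.mistral.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload

def ask(prompt: str) -> str:
    # requests is imported here so the payload helper above stays
    # dependency-free; the call itself needs a valid MISTRAL_API_KEY.
    import requests

    url, headers, payload = build_request(prompt)
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```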
Why GPT-4 and not GPT-4 Turbo when comparing pricing? Paid post for Mistral, so you need to make GPT-4 look worse? I would hope you know that GPT-4 Turbo is 25% of the price of GPT-4.
⚠️ ChatGPT 4 could NOT answer this question correctly: "My father's father had two sisters, Mary and Jane. Jane had three children, one of them being Peter. Peter had two sons, Paul and Jack. Can you tell me if I have any known cousins and their names?"
The confidence interval of your testing is pretty wide. I’d recommend testing the same prompt several times and recording the frequency of correct responses.
Can you test with this question: 'Who is Satoshi Nakamoto?' at different years: 2010, 2015, 2020 & 2024... Let me know what mistral-medium's answer set might be. Thanks!
New model reached #1 on the LLM leaderboard SOLAR 10.7b, should I review it?
Nah. 😋
10.7b: is this a case of overfitting by contamination?
Sure!
Of course!
Yessss 🎉
I created a simple flask front-end and have this working as well. I use LLMs all day long for my work and this is the first time I can honestly say I'm more impressed with it than GPT4.
This is great news for LLMs in general, cause now OpenAI has an actual threat.
Do you mean mixtral small or medium? I thought medium is not yet open source?
@@carlkim2577 I am using the medium through the API. So it's not open source, no, but it's on par with GPT4, if not, better. Plus, as Matt said, its 10x cheaper
@@carlkim2577I’m sure he means a flask front end for the api
what is the output token limit?
What's remarkable is how quickly these models are developing. I remember what the quality was like early this year, and we've seen such massive improvements since then that it's shocking many of us and likely scaring OpenAI, Microsoft, and Google. I suspect they didn't expect open-source models to close the gap on them so quickly, and you really do have to wonder how much better they'll get over the next few years.
I guess you don't remember the "We have no moat" memo
Exactly what I thought too, no moat, as written. Even then though, I thought maybe OpenAI had more than a year head start. They might not be that lucky @@TimoGraw
They probably did not expect it to run on consumer hardware that quickly either. Even I am a bit surprised. I tried Mixtral 4x7B on a 7900 XT. It is a totally coherent AI with a proper ability to hold a conversation in real time, and about as smart as GPT-3.5, running on a gaming GPU... Not only that: imagine, it is on an AMD GPU that is allegedly not supposed to be good for AI. I mean, if you completely believe Nvidia's self-congratulatory marketing.
Dude, first: awesome channel! I was thinking that your tests have a serious chance of contaminating the training data of newer models. Therefore, I think adding different questions is always beneficial even if they are not harder than the previous ones.
Information sent and received through the API backend typically is not used for training purposes - at least this is how it is with OpenAI.
Please redo the mixtral and mistral medium test with variations of your current questions. I think there's a pretty good chance they have been trained on your questions. Especially if they approached you with an API key to test their model.
But then again, I agree... He should change his questions on every video to avoid getting his tests trained.
Like that snake game coding, for example. It's so overused now that it has become one of the one-click prompts on the HuggingFace chat 😂
Agree. It seems very relevant for this one especially. With a different set of questions I personally found Mistral-small to perform BETTER than Mistral-medium.
BTW, I recommend your channel to everybody who asks me how to learn how to use these models and compare them to each other. I'm the principal engineer at my company and a lot of people ask me.
One thing that I've been thinking about recently: I get that it's good to have a set of standard "tests" for these models, but at what point do they become "overtrained" on those tests? Like if you wanted to "game the system", you could just have a pre-trained game of snake in Python at the ready to "infer". I'd almost rather it be given something novel like: create the game of snake, but it's multiplayer and 3D. It is interesting to consider how these models go from plain inference to reasoning engines.
Creat your own private test and rank them against it 🙂
there is a video about that; it is a real problem with no solution as of today, 15/12/2023
Best we have is OpenAI's data decontamination thing
⚠️ Matthew, what about creating a leaderboard of LLMs that could answer all your questions? So we could track what's the best one to this day? Please consider that. It could be made using a single spreadsheet. Thanks. 🎉🎉❤
I liked seeing you add a couple unique questions. Consider adding a couple fresh questions to the standard set in each video.
It is nice to be able to compare results, but he should kick the easy 100%-pass questions which no model ever fails and add a couple of hard new ones. The models are just getting better.
🎯 Key Takeaways for quick navigation:
00:24 *🤖 Mistral Medium: Introduction and Performance Comparison*
- Introduction to Mistral Medium, Mistral's flagship model, available only through the API.
- Performance comparison with earlier models, highlighting its superior potential.
- Discussion of performance metrics, such as the MT-Bench score, and how Mistral Medium compares to other models on them.
02:30 *💰 Price Comparison: Mistral vs. OpenAI*
- Detailed analysis of the prices of Mistral Small, Mistral Medium, and OpenAI's GPT-4.
- Comparison of per-token and per-thousand-token prices across the different models.
- Highlighting Mistral Medium's significant cost advantage over GPT-4.
03:39 *📊 Performance Evaluation through Coding Tests*
- Demonstration of Mistral Medium performance tests using code.
- Results of standard tests, such as writing a Python script for the numbers 1 to 100, writing the snake game, and composing a poem about AI.
- Analysis of the model's responsiveness and accuracy across various coding tasks.
05:02 *🛠️ Censorship and Ethical-Limits Tests*
- Exploration of the model's censorship and ethical limits through specific tests.
- Examples of censored queries and the model's responses.
- Reflection on the results and the need to consider ethics in the development and use of language models.
07:40 *🤔 Solving Math Problems*
- Evaluation of the model's performance on math problems, from simple to complex.
- Analysis of the model's correct and incorrect answers on math problems of varying difficulty.
- Consideration of the accuracy and logic behind the answers the model provides.
10:49 *🧠 Logic and Reasoning Challenges*
- Exploration of the model's ability to solve logic and reasoning challenges, including trick questions.
- Analysis of the model's correct and incorrect answers on these challenges.
- Reflection on how hard it is to evaluate the understanding and reasoning of language models.
Made with HARPA AI
Impressive. And you have to consider that even GPT-4 struggles with the marble problem.
I think if you would prompt it again a few times it would get it.
And also mixtral sometimes does not get it. So it could be a coincidence that it performed worse on this question than mixtral.
Anyway really good model and great to see this progress.
Yep gpt 4 turbo on api and on ChatGPT gets this wrong consistently.
@7:10 did you turn off "safe mode"? Apparently that parameter controls how censored the responses are
@MatthewBerman. Please provide your list of best tested LLMS, please!
it's in the description! :)
3:03 Matthew, I run 7B on a MacBook Air M1 with 16GB RAM without any issue, and response speed is comparable to or faster than ChatGPT on a paid account. Not sure what the current M1 16GB Air price is.
What is the best model available to train with your own data (documents) that include proprietary information for users to query in the form of questions?... looking at llama 2 at the moment.
We do not control the sampling method and parameters while using the API. It is possible that local models are smarter than they first appear if you tweak them. How about Mirostat 2.0 sampling? It sounds great, but I never see it used.
I've been using the test "Write a palindrome using the word ____". (Where the word is something that reasonably works backwards. Like "wolf".) None have gotten it correct yet but it's interesting to see the failures.
Please add programming tasks that are a bit more unique. There are thousands of Python snake games online, so it doesn't show very much. Make it add some twist that forces the model to reason more.
For logic questions like If A > B and B > C, is A > C? ... It may be worth it to not ask bigger models to explain their reasoning. The more advanced models should be able to reason internally, one would hope.
I like that you actually show people how to get things done...actual code! and not just talk-talk.
Hey Matthew, I have the feeling, after watching a lot of your videos, that these questions should be changed.
Another thing. It could feel weird, but when I think about how fast we got a 7B model that is run by "experts": when can we run it on a smartphone or in a Linux kernel?
BEST OPEN SOURCE MODEL.. Every video I hear that lol.
he makes a video every time a new champion arises
I think you should drop the easy questions that everyone gets right, and for the tricky questions, make them regenerate answers to see if they can get it right twice in a row. Or even scale the questions up incrementally to see where the breaking points are. For example, include a fourth runner, then a fifth, etc.
There is also a Mixtral 4x7B now. It's small enough to fit in a consumer GPU. This one is really promising for a relatively smart chatbot that talks in real time. The quality of its casual language is nearly as good as the 8x7B.
You are quite famous in the GenAI world already. I would not be surprised if some companies already have your questions and perfect answers in their datasets.
Oooo this is going to be good.
I suppose that the different answer for the marble question is because it assumed that the ball is big enough to be stuck inside the cup. Could it be? In that case, the answer would be correct. Maybe you should rephrase the question, saying that the ball's diameter is smaller than the cup's diameter or something like that.
I've tried a couple of models locally and in Colab, but could not get a correct answer for this prompt so far:
The electricity price is 28p per kWh. A kettle is rated at 2.7 kW, which means it uses 2.7 kilowatt-hours of electricity in one hour. The kettle can boil 1 litre of water in 60 seconds. What is the cost of boiling 1.7 litres of water?
It uses 2700/3600 = 3/4 = 0.75 Wh per second.
1 litre => 60 sec, 1.7 => 102 sec
1.7 litres = 102 * 0.75 = 76.5 Wh
1000 Wh => 28p, 76.5 Wh = 76.5 * 28/1000 = 2.142p
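The arithmetic above can be sketched in a few lines, working in kilowatt-hours throughout so the units stay honest:

```python
# Cost of boiling 1.7 litres: energy (kWh) = power (kW) * time (h),
# then multiply by the per-kWh price.
PRICE_PER_KWH_P = 28      # pence per kWh
POWER_KW = 2.7            # kettle power in kW
SECONDS_PER_LITRE = 60    # boils 1 litre in 60 s
litres = 1.7

seconds = litres * SECONDS_PER_LITRE       # 102 s
energy_kwh = POWER_KW * seconds / 3600     # 2.7 kW for 102 s -> 0.0765 kWh
cost_p = energy_kwh * PRICE_PER_KWH_P      # 2.142 p
print(round(cost_p, 3))  # 2.142
```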
an inlet pipe can fill a tank in 25 minutes. the drain can empty the tank in 55 minutes. Once the tank was empty, the drain was left open and the tank was filled by the pipe. How long did it take for the tank to fill? use calculus. Let's denote the rate at which the inlet pipe fills the tank as \(F\) and the rate at which the drain empties the tank as \(D\). We know that \(F = \frac{1}{25}\) tanks per minute and \(D = \frac{1}{55}\) tanks per minute.
When both the inlet pipe and the drain are open, the net rate of filling the tank is \(F - D\).
We can set up the following integral to represent the total amount of water in the tank over time:
\[
\int_0^t (F - D) \, dt = 1
\]
Substituting the values of \(F\) and \(D\), we get:
\[
\int_0^t \left(\frac{1}{25} - \frac{1}{55}\right) \, dt = 1
\]
Solving this integral gives us the time it takes to fill the tank:
\[
t = \frac{1}{\frac{1}{25} - \frac{1}{55}} \approx 34.375 \text{ minutes}
\]
So, it takes approximately 34.375 minutes to fill the tank when both the inlet pipe and the drain are open. Please note that this is an approximate value and the actual time may vary depending on the specific conditions.
100% / 25m = 4 %/m - fill rate
100% / 55m = 1.8181... %/m - drain rate
4 %/m - 1.81 %/m = 2.18 %/m - difference(tank is filling at this rate)
100 % / 2.18 %/m = 45.8 m (units cancel out)
I had to think about this one for about 2 minutes 😄
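The rate arithmetic in the reply above checks out in code, and it also shows where the quoted model went wrong (the true answer is 1375/30 minutes, about 45.8, not 34.375):

```python
# Net fill rate = inlet rate minus drain rate, in tanks per minute.
fill_rate = 1 / 25
drain_rate = 1 / 55
net_rate = fill_rate - drain_rate  # 30/1375 tanks per minute

t = 1 / net_rate  # minutes to fill one full tank
print(round(t, 1))  # 45.8
```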
I couldn't answer that. Mixtral is better than me at reasoning. But hey, a tiny calculator can also do better than every human being. 😅😅
Would be nice if you made a current top models compilation video lol there's so much to keep up with
Loving the more complex problems!
What game do you have in mind to replace snake once that is easily done by better and better models?
Doom?
I wonder if a subset of the 8 models is censored and not all of them
Hi @matthew_berman,
Thanks for the excellent content like this. I might say that it is better to have new variations of complex tasks at each evaluation. I am pretty sure the new models are trained or fine-tuned on your current evaluation tasks.
By the way, LoLLMS now has a mistralai binding to talk to their API, and it is really cool. Their models are solid.
wow your camera and lighting looks amazing!
Mistral 7b Instruct v0.2 and Mixtral instruct both are fine tuned on the killers problem
So excited 😊 thanks Matt 🙏
About math problems: it is interesting to see its ability to do arithmetics but I would also give it the opportunity to write a program to solve the problem.
This is I think the first to get it correct holy and the absolute best response to the T-shirts drying in the sun question
How come none of the models have updated their OpenAI SDK code? I keep getting the wrong code.
"Confidence: 95%" is another word in the response (also 95% can be counted as 0, 1 or 3 words depending on how you count), so it got it wrong :)
No
Hi, I experienced interesting behaviour: when asked, Mixtral can create tables with content in them. Also, when asked to create graphs it gives a positive answer, but I am not sure what is actually returned. Can you check?
Might be a good idea to create a matrix spreadsheet of which models got which questions right/wrong.
Can be something as simple as a Google Spreadsheet. If you leave it private, people will need to request access, leaving their email address with you. Great lead magnet and I'd want to know which models are out there, links to them and how well they're doing.
I would also suggest that for every new question you add, you drop an old question that no model has failed. You can still add the old question to the matrix, but we don't need to see Bill Clinton again!
Good idea
These models are already *amazing*. I'd be more interested in how to get the best out of them, how to use them effectively in the real world, on real tasks.
I sometimes wonder what would happen if there was NO further development for a decade? I feel like we're only scratching the surface of what today's models are capable of. They are pretty much black boxes, there's still so much to learn about *these* before we charge ahead with new models.
The other day I was thinking: sure, AI will end up everywhere, but what will my use be? What can I do with it in my actual life, what utility does it have directly for me? It's the kind of question we didn't have before, so there's no obvious quick answer.
I'm still waiting for the response to the how many words question to be "One." ;)
I want to be pedantic about the "How many words are in your response to this prompt?" question. Nowhere in your code did I see anything that would print a 'confidence score', so at minimum it responded with 10 words (`Confidence` is a word). Thoroughly enjoyed watching you test this. Have you tried Microsoft Phi-2 yet? It's neat, but in oobabooga in particular it's slower than Mistral 7B by nearly 2x on my 3080.
Of course you need to play with this more, BUT I don't recall seeing you do this with GPT-4...
Can you ask Mistral Medium "How many primes end in 2?" I found that many models fail to answer this even when reasoning about it in steps, including Mixtral 8x7B and Llama 2 70B.
I just asked Mistral-Medium, it answered:
Only one prime number ends in 2, and that is the number 2 itself. All other even numbers greater than 2 are not prime because they can be divided by 2. Primes are numbers that have only two distinct positive divisors: 1 and the number itself. In this case, 2 is the only even prime number.
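A brute-force check confirms that answer (a tiny sketch; the 10,000 bound is arbitrary):

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# Any number ending in 2 other than 2 itself is even, hence divisible by 2.
primes_ending_in_2 = [n for n in range(2, 10_000) if n % 10 == 2 and is_prime(n)]
print(primes_ending_in_2)  # -> [2]
```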
Why is no one comparing these results with Bard or Gemini?
They're being stubborn about giving us access to Mistral Medium... I'm not sure they have enough compute for some reason (what are clouds for???)
Here's Microsoft Copilot: an inlet pipe can fill a tank in 25 minutes. the drain can empty the tank in 55 minutes. Once the tank was empty, the drain was left open and the tank was filled by the pipe. How long did it take for the tank to fill? use calculus. Let's denote the rate at which the inlet pipe fills the tank as \(F\) and the rate at which the drain empties the tank as \(D\). We know that \(F = \frac{1}{25}\) tanks per minute and \(D = \frac{1}{55}\) tanks per minute.
When both the inlet pipe and the drain are open, the net rate of filling the tank is \(F - D\).
We can set up the following integral to represent the total amount of water in the tank over time:
\[
\int_0^t (F - D) \, dt = 1
\]
Substituting the values of \(F\) and \(D\), we get:
\[
\int_0^t \left(\frac{1}{25} - \frac{1}{55}\right) \, dt = 1
\]
Solving this integral gives us the time it takes to fill the tank:
\[
t = \frac{1}{\frac{1}{25} - \frac{1}{55}} \approx 34.375 \text{ minutes}
\]
So, it takes approximately 34.375 minutes to fill the tank when both the inlet pipe and the drain are open. Please note that this is an approximate value and the actual time may vary depending on the specific conditions.
It used the right method, but I guess it just can't do numerical calculations.
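Right method indeed: since the net rate is constant, the integral reduces to (1/25 - 1/55)·t = 1, and exact rational arithmetic (stdlib only) gives the correct time:

```python
from fractions import Fraction

# (1/25 - 1/55) * t = 1, solved exactly.
net_rate = Fraction(1, 25) - Fraction(1, 55)  # tanks per minute = 6/275
t = 1 / net_rate                              # minutes to fill one tank
print(t, float(t))  # -> 275/6, ~45.83 minutes (not 34.375)
```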
"Confidence" is also a word in the response so there are 10 words ). Anyway I don't think we can rely on a single answer, it could be coincidence. More different questions should be asked to make sure it can count words. Usually models are very bad in counting objects. No onw could count seagulls on my photo..
So a 6 bit EXL2 quant would be how much VRAM?
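A rough back-of-envelope, assuming Mixtral-8x7B's roughly 46.7B total parameters (an assumption; actual usage also depends on context length, KV cache, and loader overhead):

```python
params = 46.7e9          # rough total parameter count for Mixtral-8x7B (assumption)
bits_per_weight = 6.0    # EXL2 quantization target
weight_bytes = params * bits_per_weight / 8
# ~35 GB for the weights alone, before KV cache and overhead
print(round(weight_bytes / 1e9, 1), "GB")
```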
Wouldn't it be necessary to repeat the answers several times to get a statistically sound representation of the quality of the answers?
I hope you are keeping a separate set of different tests that you haven't published but that you are conducting since it would be rather trivial for any of these AI's to hardcode the correct answers to any of these tests knowing that you are going to use them.
You could also try giving the models nonsensical or contradictory questions, to check if they are able to figure out that there is no correct answer, rather than giving a wrong answer. For example, GPT-4 fails at the following, relatively simple problem:
I have three apples today, and I ate one of those three apples yesterday. How many apples do I have today? Explain step by step.
is it fair to compare the costs of the models without including GPT-4-1106-Preview?
great video - thanks!
Yes, please add these more complex word problems to your test set! Thank you!
Great! I love uncensorship! Hahahaha... Suggest adding a two-equation test: 2A+3B=12, A+B=5. Try to find A and B's values (A=3 and B=2).
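That system is easy to verify by substitution (a trivial sketch):

```python
# Solve 2A + 3B = 12 and A + B = 5 by substitution (A = 5 - B):
# 2(5 - B) + 3B = 12  ->  10 + B = 12  ->  B = 2, A = 3.
B = 12 - 2 * 5
A = 5 - B
print(A, B)  # -> 3 2
assert 2 * A + 3 * B == 12 and A + B == 5
```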
When are you gonna cover the new update and features of LM Studio?
Amazing! I want to use this right now!
what is the point in paying for tokens when gpt 4 is free in microsoft copilot?
How come you didn't compare GPT-4 Turbo pricing with Mistral Medium? Mistral Medium is just a bit cheaper than GPT-4 Turbo.
I'm seeing 3.64 times cheaper on output - that is not insignificant.
Turbo? you mean GPT3.5 turbo?
What's their pricing? I'm on a wait-list and can't see anything.
@@robxsiq7744 No, GPT-4 Turbo, it's on the API
Oops... After reading the pricing on my desktop (i initially read it on my phone), I realize that I misread the pricing... ignore my dumb comment
The team behind Mistral has already watched your videos and test questions, so I'm wondering whether they fine-tuned Mistral Medium on your test questions. You should have new questions never seen before, but with the same type of logic and reasoning.
Do you know if they plan to release this model for local use in the future?
Have you done any more word counts? I'll be interested to see if it gets that right over and over again. That's kind of insane because that means it has to have some short term working memory
This is mind-blowing: getting the word count of its own answer right is a sign of planning ahead, which traditionally AI models can't do, since they just predict the next word. It would have had to predict the next sentence word by word, then count it, knowing the number in the sentence wouldn't change the count. Incredible. Wonder how the large model would do.
@matthew_berman How do you know your test cases (or any of their variants) have not made it into the training data ?
Ugh, this test is basically pointless unless you ask questions outside your usual set, like the SAT question it got wrong. The developers can easily make it answer your specific questions, which is exactly why they would give you an inside scoop.
Do you think you can run the Mixtral-8x7B model with an AMD Ryzen 7040 if you have enough RAM?
@matthew_berman Matthew, please try “What is the largest number written out in word form between 1 and 1 million that does not contain the letter 'n'?”
Even GPT-4 doesn’t get this right. Could you please try this?
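For anyone curious, this one can be brute-forced: every spelled-out number from 100 up contains "hundred", "thousand", or "million", all of which have an 'n', so only 1-99 need checking (a self-contained sketch):

```python
UNITS = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def words(n: int) -> str:
    """Spell out 1-99 in English."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + ("-" + UNITS[unit] if unit else "")

# Numbers >= 100 always contain "hundred"/"thousand"/"million" (all have 'n'),
# so the search space is just 1-99.
best = max(n for n in range(1, 100) if "n" not in words(n))
print(best, words(best))  # -> 88 eighty-eight
```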
Keep making the problems more complex over time.
After each problem, ask the AI if it has seen this type of problem before. You are looking for edge case solutions. You are not looking for previously trained solutions. It is interesting to see how the AIs respond.
NOTE: To remove ambiguity, the filling task should state that the rates are fixed.
In our world, rates are dependent upon pressure and resistance. The emptying pressure increases with the height of the fluid column.
The last few times you used the ball/marble and cup problem, you've called it a marble in the first part of the problem, and a ball in the last part. It might be confusing the model to call it both a marble and a ball.
Wait wait. Mixtral was only a few days ago right? And this model already beats it?
What is the point of comparing mistral to GPT-4 8k and 32k and not their 128K GPT-4 Turbo model?
I suggest postman for working directly with APIs
I do hope they open-source it. Like keep the models open-source but make access easy for a price
Hi Matt, my only worry watching this is that Mistral gave you early access for your reach, and therefore may have trained on some of the questions you ask to get it to pass. Wondering what can be done to avoid this.
lmao really
Their API is invitation only.
So they announce a new powerful and cheap model, give access to influencers so they can test it and show people that the model is actually quite good. But it turns out that there is no API or model weight access at all, so people who see the video cannot actually use it. Also there is no ETA for the access.
Did you do the counting words multiple times or just once? Might be a fluke.
Did it get the prompt question correct? If you include the word "Confidence" the answer included 10 words and not 9. 🙂
Amazing video. I think Mistral AI is doing a great job. As for the code, you could have used the Python library they provide; that's better than plain requests, and you can also stream the output sequentially.
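For reference, something like this works against their OpenAI-style REST endpoint (a hedged sketch: the URL and response shape match their docs at the time of writing, so double-check the current API reference before relying on it):

```python
def build_payload(prompt: str, model: str = "mistral-medium") -> dict:
    """Assemble an OpenAI-style chat-completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(prompt: str) -> str:
    """POST the prompt to Mistral's API and return the reply text."""
    import os
    import requests  # third-party: pip install requests
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",  # assumed endpoint; verify in docs
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json=build_payload(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Usage would be e.g. `print(ask("Write one sentence about open models."))`; the official `mistralai` client is the nicer option since it also handles streaming.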
Nice video as always.
Why GPT-4 and not GPT-4 Turbo when comparing pricing? Paid post for Mistral, so you need to make GPT-4 look worse? I would hope you know that GPT-4 Turbo is 25% of the price of GPT-4.
The batch answer should consider that the sun doesn’t shine equally all day.
will this run on my 7900xtx?
We need to improve our test questions. 😆
Why not compare pricing against GPT 4-turbo? The difference is then "only" four times.
I had the same question! @MatthewBerman
⚠️ ChatGPT 4 could NOT answer this question correctly: "My father's father had two sisters, Mary and Jane. Jane had three children, one of them being Peter. Peter had two sons, Paul and Jack. Can you tell me if I have any known cousins and their names?"
The confidence interval of your testing is pretty wide. I’d recommend testing the same prompt several times and recording the frequency of correct responses.
Very promising!!!!
How to get access to mistral medium ?
Ah you didn't use the SDK?
You need to prepare a more difficult set of questions!
Seems you'll have to up the game on your tests. They are catching up quite nicely.
Can you test with this question: "Who is Satoshi Nakamoto?" for different years: 2010, 2015, 2020, and 2024. Let me know what mistral-medium's answers might be. Thanks!
Yes I'd vote for a spin with SOLAR 10.7b! 😍😍
Love your stuff. Would like to see this mixed with Dolphin, please.
The Einstein test of intelligence is the best test for LLMs