I think a more symbolically grounded environment approach to getting smaller models to train themselves on better problem-solving processes would work quite well in a variety of domains.
So I'm really glad to see rStar-Math serve as the proof of concept for such an approach on sub-10B models.
Was waiting for this one 🥳
Great job! Interesting paper :)
If humanity survives, 2025 is going to be great.
One of the reasons self-evolution performs suboptimally on other datasets could be that the best answers are already ranked highest, so forcing answers makes it worse.
While the rStar-Math approach is intellectually interesting and proves what's possible, it's not particularly accessible for most practitioners. Thanks for the update anyway.
Shoutout to the "lean-star" researchers. Shoutout to Microsoft too; their research takes it to the next level, but lean-star ignited the engine.