thanks for the content
I'm using ensembles of simple feedforward networks to approximate posterior distributions of a very noisy dataset. It feels like a knife digging into my leg trying to determine appropriate ensemble sizes with slower libraries, but this seems like a savior. Thanks!
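(A minimal sketch of the ensemble idea above in JAX, since that's what the video covers: `jax.vmap` over a stacked set of parameters lets one jit-compiled call evaluate every ensemble member at once, which makes sweeping the ensemble size cheap. The network shapes and ensemble size here are made up for illustration.)

```python
import jax
import jax.numpy as jnp

def init_mlp(key, in_dim=8, hidden=32, out_dim=2):
    """One small feedforward net; params are a pytree of arrays."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (in_dim, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, out_dim)) * 0.1,
        "b2": jnp.zeros(out_dim),
    }

def mlp(params, x):
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

# Stack a whole ensemble along a leading axis, then vmap over it.
ensemble_size = 16
keys = jax.random.split(jax.random.PRNGKey(0), ensemble_size)
ensemble_params = jax.vmap(init_mlp)(keys)  # every leaf gains a (16, ...) axis

# Map over the params axis only; the input batch is shared by all members.
batched_mlp = jax.jit(jax.vmap(mlp, in_axes=(0, None)))
x = jnp.ones((64, 8))                    # one shared input batch
preds = batched_mlp(ensemble_params, x)  # (16, 64, 2): per-member predictions
print(preds.shape)
```

Changing `ensemble_size` is then just a matter of splitting more keys; no Python-level loop over members ever runs.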
Hi,
Can you please make a video on physics-informed neural networks (PINNs)?
Love it!!!!!!!!!!!!!!
What are your thoughts on using FLAX for complicated architectures and dynamic architecture variations? I primarily use PyTorch for these tasks, and it seems like the functional-programming style of JAX/FLAX may work well for very regular architectures but break down for anything too complex. And then what about throwing AMP (automatic mixed precision) into the mix to improve training speed?
Those are the main reasons I have avoided migrating workflows over to FLAX, though I do find PyTorch to be slower than it should be.
Also, how would you expect FLAX to compare to something like TensorRT for transformers (a FLAX transformer on an NVIDIA GPU vs. a TensorRT transformer)?
A lot of the new AI models I have seen in the literature (from renowned sources that know how to code optimally) use JAX/FLAX, simply because of the costs and time frames associated with training. Second: JAX/FLAX is optimized for Google's TPU clusters, and the cooperation agreement with NVIDIA will give NVIDIA access to this new tech (IPR), which NVIDIA will use in its latest GPUs.
@@code4AI Do you mean that they have the manpower to spend on optimizing JAX/FLAX? Or that they want to cost-optimize their training runs?
I could understand the former, but the latter would be a bigger motivator (from the perspective of a poor researcher in academia with limited funding/compute). All of the recent architectures I have worked with are highly irregular, though, utilizing if-else blocks in the torch forward pass.
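(On the if-else point: a minimal sketch, assuming the branch depends on the data. A Python `if` on a traced value fails under `jax.jit`, but `jax.lax.cond` expresses the same data-dependent branch; both branch functions shown here are made up for illustration.)

```python
import jax
import jax.numpy as jnp

@jax.jit
def forward(x):
    # `lax.cond` traces both branches once and selects at runtime,
    # so the compiled function stays valid for any input values.
    return jax.lax.cond(
        jnp.mean(x) > 0.0,
        lambda v: v * 2.0,  # branch taken when the predicate is True
        lambda v: -v,       # branch taken otherwise
        x,
    )

print(forward(jnp.array([1.0, 2.0])))    # mean > 0 -> doubles
print(forward(jnp.array([-1.0, -2.0])))  # mean <= 0 -> negates
```

The caveat: both branches must produce the same output shapes and dtypes. Branches that change the module structure or output shape still need Python-level control flow outside `jit`, which is exactly where PyTorch's eager mode stays more flexible.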
But doesn’t PyTorch now have something equivalent?
They try to replicate and learn from JAX, but it makes a difference whether you retrofit your old system or build it from scratch for performance.
Can you make a video about the Orca LLM? I'm very confused about why it's not open source yet...
Already online: ruclips.net/user/shortskrCY9-R_qkA?feature=share
@@code4AI "this model is so good, so we will never publish it"
It is beneficial to reduce Microsoft's compute costs, because if ORCA is trained on a subset of GPT-4 outputs, then ORCA will be enough (in intelligence and compute capacity) for a lot of people who are just playing with GPT-4 on their phone to write an email. And every million US$ you save (as Microsoft) goes directly to ..... new free services for the community ..... or was it MS corporate profit?
@@code4AI Orca could be beneficial for all, meaning for our competitors too, so maybe let's better keep it just for us, said someone at M$. Maybe. 😥