An interesting extension of custom AI models: lots of new quantization schemes can't be run at full efficiency on conventional GPUs or CPUs. BitNet b1.58 is one example. It depends on extremely efficient ternary operations, which, in contrast, can be accelerated quite aggressively on FPGAs and ASICs.
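To illustrate why ternary weights suit custom hardware: when every weight is -1, 0 or +1, a dot product needs no multipliers at all, only adds, subtracts and skips. The sketch below is a plain-Python illustration under that assumption (the function name `ternary_dot` is hypothetical); on an FPGA or ASIC the same idea becomes simple adder trees instead of DSP multipliers.

```python
def ternary_dot(weights, activations):
    """Dot product where weights are restricted to {-1, 0, +1}.

    No multiplication is performed: +1 adds the activation,
    -1 subtracts it, and 0 skips it entirely.
    """
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x  # +1 weight: add the activation
        elif w == -1:
            acc -= x  # -1 weight: subtract the activation
        elif w != 0:
            raise ValueError(f"non-ternary weight: {w}")
        # 0 weight: nothing to compute (free sparsity)
    return acc


print(ternary_dot([1, 0, -1, 1], [3, 5, 2, 7]))  # 3 - 2 + 7 = 8
```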
For binary and ternary models, FPGAs have a huge advantage over GPUs because they can exploit XOR operations. The problem is that accuracy drops badly below the 4-bit mark.
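A minimal sketch of the XOR trick for the binary case, assuming weights and activations are both in {-1, +1} and packed one per bit (bit = 1 for +1, bit = 0 for -1). Matching bits contribute +1 and mismatching bits contribute -1, so the dot product of two n-element vectors is n - 2 * popcount(a XOR b). The helper names are hypothetical; in hardware the XOR and popcount map directly onto LUTs.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 for +1, 0 for -1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError(f"non-binary value: {v}")
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed n-element {-1, +1} vectors via XOR + popcount."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # positions where signs differ
    return n - 2 * mismatches                     # matches - mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # 1 - 1 - 1 + 1 = 0
```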
@beetlebox4858 In the case of BitNet specifically, the accuracy actually isn't terribly far off full precision when the model is trained with quantization-aware training as described in the paper. The main issue is having to train the model from scratch, but it does work pretty well. And, as you noted, it's very efficient on FPGAs.
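For reference, the BitNet b1.58 paper ternarizes weights with an "absmean" quantizer: scale the weights by their mean absolute value, then round and clip to {-1, 0, +1}. The sketch below shows only that forward-pass weight quantizer on a flat list of floats; real quantization-aware training also quantizes activations and passes gradients through the rounding with a straight-through estimator, neither of which is shown here. The function name and the `eps` default are assumptions for illustration.

```python
def absmean_ternarize(weights, eps=1e-5):
    """Quantize float weights to {-1, 0, +1} with an absmean scale.

    gamma = mean(|w|); each weight becomes RoundClip(w / (gamma + eps), -1, 1).
    Returns the ternary weights and gamma (multiply by gamma to dequantize).
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps  # eps avoids division by zero for all-zero weights
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, gamma


q, gamma = absmean_ternarize([0.9, -0.05, -1.2, 0.3])
print(q)  # [1, 0, -1, 0]
```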
I'm always interested in applying current trends to the core, very base hardware. Last year, for example, I built a custom Lambda image that had lower latency than an 8 GB EC2 instance with a GPU. Thanks for the video.
EC2 actually has FPGA instances, the F1 instance type, that you can experiment with.
@beetlebox4858 Thanks!
Currently using the Brevitas/FINN frameworks to deploy a Tiny-YOLO model on a Kria board! Love the hardware design challenge of it all.
Sounds very cool. We are looking at developing a Vitis AI and MLOps video as well.
@beetlebox4858 Looking forward to it, and thanks for putting the work into these videos, by the way! High-quality stuff!