Hi, could you make some videos on cryptographic countermeasures against fault injection by electromagnetic, clock, laser, or voltage means?
❤❤❤Nice
Subscribed and liked. Very nice hack!
Thank you for the informative video. I think a fairer example would be one where you want to run the sequential operation with a few million different (sequential) seeds and not just one sequential loop. Even if the GPU takes 40 times longer, if it can do a few thousand of these in parallel then it may still be more efficient for iterating multiple instances of RC4 or even AES.
A long time ago you mentioned it takes you maybe 30 days to determine a 40-bit RC4 key with a CPU, but when I run multiple instances in parallel on a single CPU I find it takes a maximum of 48 hours (so 1 day on average). I have recently started to investigate the GPU approach, and your code examples will be a useful starting point for me to see if this can be reduced even further.
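To make the "many independent sequential jobs" idea concrete, here is a minimal C++ sketch, not anyone's actual tool from this thread. It assumes a known-keystream check: `rc4_keystream` is textbook RC4, while `search_slice`, `key_matches`, `kNotFound`, the big-endian key layout, and the 4-byte comparison are hypothetical choices made for illustration. Each key is still processed sequentially; the parallelism comes from giving every process (or GPU thread) its own slice of keys.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Textbook RC4: key scheduling (KSA), then n bytes of keystream (PRGA).
void rc4_keystream(const std::uint8_t* key, std::size_t key_len,
                   std::uint8_t* out, std::size_t n) {
    assert(key_len > 0);
    std::array<std::uint8_t, 256> s;
    for (int i = 0; i < 256; ++i) s[i] = static_cast<std::uint8_t>(i);
    std::uint8_t j = 0;
    for (int i = 0; i < 256; ++i) {
        j = static_cast<std::uint8_t>(j + s[i] + key[i % key_len]);
        std::swap(s[i], s[j]);
    }
    std::uint8_t a = 0, b = 0;
    for (std::size_t k = 0; k < n; ++k) {
        a = static_cast<std::uint8_t>(a + 1);
        b = static_cast<std::uint8_t>(b + s[a]);
        std::swap(s[a], s[b]);
        out[k] = s[static_cast<std::uint8_t>(s[a] + s[b])];
    }
}

// Hypothetical sentinel returned when a slice contains no matching key.
constexpr std::uint64_t kNotFound = ~std::uint64_t{0};

// Hypothetical check: does 40-bit key k (stored big-endian in 5 bytes)
// produce the 4 expected keystream bytes?
bool key_matches(std::uint64_t k, const std::uint8_t* expected) {
    std::uint8_t key[5];
    std::uint8_t ks[4];
    for (int i = 0; i < 5; ++i)
        key[i] = static_cast<std::uint8_t>(k >> (8 * (4 - i)));
    rc4_keystream(key, 5, ks, 4);
    return ks[0] == expected[0] && ks[1] == expected[1] &&
           ks[2] == expected[2] && ks[3] == expected[3];
}

// One independent job: scan the half-open slice [first, last) and return
// the first matching key, or kNotFound.
std::uint64_t search_slice(std::uint64_t first, std::uint64_t last,
                           const std::uint8_t* expected) {
    for (std::uint64_t k = first; k < last; ++k)
        if (key_matches(k, expected)) return k;
    return kNotFound;
}
```

One process per slice would simply call `search_slice` with its own `[first, last)` bounds. A 4-byte comparison can yield false positives over a full 40-bit space, so a real search would confirm candidates against more data.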
What kind of CPU and how many threads? 48 hours is impossible using a standard "home CPU".
@@49rekcaH I don't think I said it is a standard home CPU but it is a consumer laptop that I have in my home! Although, it is slightly higher specification as I use it for various radio, coding and video projects. My machine uses a 12th Gen Intel(R) Core(TM) i9-12900HK 2.90 GHz processor with 32GB RAM under Windows 10 and I have written very efficient C++ code optimised specifically for this task. I normally run about 40 instances in parallel with a custom batch file which creates the parallel processes that churn through the required combinations. As I said, it does take up to 48 hours (maybe very slightly more) and completely ties up my machine but unless the key is somewhere near the very end then it obviously will finish long before that so it works well for me.
I'm not sure if I should be offended or complimented by the fact that you don't appear to believe this to be possible. As a real-world comparison, maybe you can advise how long your own code takes to execute? I wrote this tool by myself and for myself (and I use it for educational and research purposes only). The basic algorithm is fairly straightforward and I'm sure there are many many people who have written their own versions like me (including Louis obviously). I have over four decades of coding experience and I believe that getting it to work efficiently is what takes the skill which is why I was interested in this video.
Anyway, I am simply using my personal experiences to support the point that it would be interesting to build on the concepts in this video and better explore more parallel CPU processing versus parallel GPU processing to compare these approaches more fairly when dealing with large volumes of potential keys. The video does a great job of explaining the different restrictions that apply to the potential approaches but could go a bit further about how this might work best specifically in a PMR Research context where there is better opportunity for parallelisation than may be suggested in the video.
@@Cagiest No offense, I did not mean to hurt you with my previous post. By "home CPU" I meant a regular processor, not a server one (like Xeon or EPYC). I am still learning new things on my own. In my case I tested it on an i7-11850H, writing in C, and I wasn't even close to your 48 hour result. Looks like there is room for improvement. May I ask why you are running ~40 instances (CPU-bound threads?) on your CPU? Isn't that inefficient?
No offense taken and thanks for your reply!
No individual process takes 100% CPU, in an OS like Windows anyway. And most processors do some elements of parallel processing within their chips so there is an advantage to running more than one instance of an algorithm. A single individual thread definitely runs much faster by itself but if I ran them one after another then it would probably take over a month. I experimented to see where I got the best balance and after about 40 threads I started to get diminishing returns. In reality I use 42 or 43 threads so that I can get through 6 smaller runs in total giving me 256 instances of a program which processes 16777215 keys at a time...
On an i5 that I have, it was much, much slower and I couldn't run more than a few threads, but memory is probably a big factor there also.
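The slicing described here can be sketched as follows, assuming the 256 instances divide a 40-bit key space evenly into contiguous ranges. `Slice`, `make_slices`, and `batch_sizes` are hypothetical names, and the sketch does not try to reproduce the 16777215-key batching mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open key range [first, last) handled by one program instance.
struct Slice {
    std::uint64_t first;
    std::uint64_t last;
};

// Split a key space of 2^bits keys into `count` equal contiguous slices.
// Assumption: `count` divides the key space exactly (true for 256 and 2^40).
std::vector<Slice> make_slices(unsigned bits, std::uint64_t count) {
    const std::uint64_t total = std::uint64_t{1} << bits;
    assert(count > 0 && total % count == 0);
    const std::uint64_t size = total / count;
    std::vector<Slice> out;
    for (std::uint64_t i = 0; i < count; ++i)
        out.push_back({i * size, (i + 1) * size});
    return out;
}

// Spread `slices` jobs over `runs` consecutive batches as evenly as possible.
std::vector<unsigned> batch_sizes(unsigned slices, unsigned runs) {
    assert(runs > 0);
    std::vector<unsigned> sizes(runs, slices / runs);
    for (unsigned i = 0; i < slices % runs; ++i) ++sizes[i];
    return sizes;
}
```

With 256 slices over 6 runs, `batch_sizes` gives four batches of 43 and two of 42, which lines up with the "42 or 43 threads" figure; each slice of a 40-bit space then holds 2^32 keys.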
Anyway, we are probably getting slightly off-topic and I don't want to detract from the great content that Louis is providing here...
@49rekcaH Inspired by the original contents of the video and the discussions above, I have made a CUDA version of my code to run on my graphics card GPU for comparison and it runs a little quicker overall than the CPU-based version. It is only a first stab though so I hope to optimise it further. So even though the instructions run a lot slower, there are theoretically around 1000 kernels actively running at any given time so it balances out and gets through about 1/256 of the keyspace in 11 minutes.
The great news though is that it only slightly slows down the CPU version if that is also running at the same time on the same laptop and so I can actually split the workload and do half on each now which gets me down under 24 hours maximum elapsed (so 12 hours on average).
The actual configuration is 40 CPU processes in parallel, each taking 1/256 of the keys, executed 3 times in succession, which searches 120/256 of the keys, and then 1 GPU process also in parallel doing 1/256 of the keys, which I execute 136 times to search the remaining 136/256 of the keys. And I can run that with one overall batch file. Again, all of this is just for research and educational purposes!
The graphics card in my laptop is just a built-in Nvidia RTX 3080 which is in addition to the CPU spec mentioned above. Also, theoretically I could even add an additional external GPU (using Thunderbolt or USB-C) and reduce the overall GPU execution time by 50% again if desired.
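As a quick check of that accounting, here is a small C++ sketch. `Plan`, `slices_covered`, and `gpu_hours` are hypothetical names, and the per-slice GPU time is passed in as a parameter because "about 11 minutes" is only an approximate figure.

```cpp
#include <cassert>

// Hypothetical description of the split: CPU processes per round,
// number of CPU rounds, and number of sequential GPU rounds.
struct Plan {
    unsigned cpu_procs;
    unsigned cpu_rounds;
    unsigned gpu_rounds;
};

// Number of 1/256 slices covered by the CPU and GPU shares together.
unsigned slices_covered(Plan p) {
    return p.cpu_procs * p.cpu_rounds + p.gpu_rounds;
}

// Elapsed hours for the GPU share, given minutes per 1/256 slice.
double gpu_hours(Plan p, double minutes_per_slice) {
    return p.gpu_rounds * minutes_per_slice / 60.0;
}
```

For `Plan{40, 3, 136}` the CPU covers 120 slices and the GPU 136, giving all 256. Taking 11 minutes per slice literally, the GPU share comes to roughly 25 hours, so a per-slice time slightly under 11 minutes (about 10.6) would be needed to stay inside 24 hours.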
@LouisErigHERVE I don't know if that informs the question that you ask, "CPU ou GPU, lequel va calculer le plus vite ?" ("CPU or GPU, which will compute faster?"). Maybe the answer is actually "CPU et GPU!" ("CPU and GPU!"). I don't mean to undermine your conclusions in this regard and no offense is intended. Thank you for the information and background that you have provided. I hope that the fact that you have prompted some research and discussion around this question is a positive thing.
Do you know astron25?
waiting for the English version