I have a question about the memory allocation model.
In the past I worked on a system that had to process streams of medical imaging data. At some point we tried something akin to a single thread per core, with the obvious approach to memory: evenly splitting the whole system memory across the threads. That did not work well for us, because the processing elements vary a lot in size and memory consumption is not even linear for most of the algorithms we had to use (just as an example, an X-ray image can be 16 MB, while a tomosynthesis can be 2 GB). Memory consumption per element varied from a bit over 64 MB to 12 GB in some cases. If we wanted to be able to handle the large cases, we ended up wasting a lot of memory on the small ones, at an obviously unacceptable level.
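To make the waste concrete, here is a minimal sketch (not our actual code, and the core count is made up) of why a static even split has to be sized for the worst case:

```python
# Static per-core split: every slice must fit the largest possible element,
# even though most elements are tiny. Numbers below are illustrative only.

CORES = 16
WORST_CASE_MB = 12 * 1024       # largest element we saw (~12 GB)
TYPICAL_MB = 64                 # typical small element (~64 MB)

# Memory that must be reserved so any core can take the worst case.
reserved_mb = CORES * WORST_CASE_MB

# Memory actually used when all cores happen to hold small elements.
typical_usage_mb = CORES * TYPICAL_MB

print(f"reserved: {reserved_mb / 1024:.0f} GB")                  # 192 GB
print(f"typically used: {typical_usage_mb / 1024:.1f} GB")       # 1.0 GB
print(f"utilization: {typical_usage_mb / reserved_mb:.2%}")      # ~0.52%
```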
We eventually changed to a model with artificial "pages" of memory that each thread could request at the start of a new task, trying to fulfill its needs before it started. The problem then became the obvious issues: a core/thread sometimes stalling because there were not enough pages for what it wanted to do, and the whole bag of threading problems arising all over again.
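For reference, a minimal sketch of that "pages" model as I understood it: a shared pool of fixed-size pages, with each worker acquiring what it needs up front. The page size, pool size, and task sizes are made up; the point is the blocking acquire, which is exactly where a core can stall.

```python
import threading

PAGE_MB = 256

class PagePool:
    def __init__(self, total_pages: int):
        self.free = total_pages
        self.cond = threading.Condition()

    def acquire(self, pages: int):
        # Blocks (stalls the calling core) until enough pages are free.
        with self.cond:
            while self.free < pages:
                self.cond.wait()
            self.free -= pages

    def release(self, pages: int):
        with self.cond:
            self.free += pages
            self.cond.notify_all()

def worker(pool: PagePool, task_mb: int):
    pages = -(-task_mb // PAGE_MB)        # pages needed, rounded up
    pool.acquire(pages)                   # may stall while big tasks hold the pool
    try:
        pass                              # ... process the element ...
    finally:
        pool.release(pages)

pool = PagePool(total_pages=64)           # 16 GB pool of 256 MB pages
tasks_mb = [64, 2048, 12288, 64]          # X-ray, tomo, worst case, X-ray
threads = [threading.Thread(target=worker, args=(pool, mb)) for mb in tasks_mb]
for t in threads: t.start()
for t in threads: t.join()
```

With this shape, a single huge element can hold most of the pool while small tasks queue up behind it, which is the stalling behavior described above.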
That led me to the conclusion that modern streaming architectures are a bit limited in options when dealing with a domain where the data/tasks are not homogeneous (at least within the same order of magnitude). What is your view/opinion on that problem? Is there some other obvious solution that I missed?
Had a very similar problem while working on a different system... Yeah... heterogeneous payloads are evil, we couldn't come up with a silver bullet for that...