Super-Simple Tasker -- The Hardware RTOS for ARM Cortex-M, Part-2

  • Published: 25 May 2023
  • Super-Simple Tasker (SST) is a preemptive, priority-based RTOS kernel fully compatible with Rate Monotonic Analysis/Scheduling (RMA/RMS). It was initially published in 2006. This video presents a unique hardware implementation of a modern version of SST for ARM Cortex-M (M0, M0+, M3, M4, M7, etc.).
    This is Part-2 of the talk delivered at Embedded Online Conference 2023, see:
    embeddedonlineconference.com/...
    In this second part, you will see how SST works internally and, specifically, how it maps to the hardware of ARM Cortex-M. You'll also see how SST compares to the traditional FreeRTOS kernel running the same "Blinky-Button" application on the same STM32 NUCLEO-L053 board.
    Resources
    All code presented in this video is available on GitHub in the SST repository:
    github.com/QuantumLeaps/Super...
    David M. Cummings, "Managing Concurrency in Complex Embedded Systems", www.state-machine.com/doc/Cum...
    Quantum Leaps, video: • Beyond the RTOS - Part...
    Quantum Leaps, video: • Beyond the RTOS - Part...
    Quantum Leaps, video: • Beyond the RTOS - Part...
    Quantum Leaps, video: • Beyond the RTOS Part-4...
    #rtos #arm #embedded #realtime #programming
  • Science

Comments • 31

  • @wegi9621
    @wegi9621 1 year ago +1

    And... the Cortex-M3 and Cortex-M4 cores implement up to 240 IRQs. It really doesn't matter that, for example, an STM32F3 only has 91 IRQs defined. So going forward you can use up to 240 IRQs/threads; your SST can use the IRQs from 92 to 240 that the vendor doesn't install. It is enough to add vector 92 to the vector table, add the corresponding ISR routine to the code, then set its NVIC priority and enable it. Finally, you can invoke "NVIC_SetPendingIRQ(92);" and use such a PHANTOM IRQ in SST as a non-blocking thread. I can confirm - IT WORKS.

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +1

      That's an interesting point. Often the Cortex-M silicon actually supports many more IRQ vectors than those officially documented. Therefore, it is often possible to use the undocumented IRQs at the end of the vector table. However, this needs to be TESTED and VERIFIED experimentally. --MMS

  • @almari3954
    @almari3954 1 year ago +3

    Another great series! Thanks Miro!
    So:
    - we have an RTOS whose main 'feature' is preemptive context switching
    - this context switching helps with meeting the deadlines in systems to which RMA applies (e.g., periodic and independent tasks)
    - tasks in most systems nowadays are not independent, so our system is not amenable to RMA
    - so we have a mechanism - preemptive context switching - which is not without overhead, and additionally we need another mechanism - mutexes - that mostly eliminates the benefit of having context switches
    Is this reasoning correct?

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +4

      No, I don't think that your reasoning is correct. The same arguments could be made against any traditional RTOS, because all of them are motivated by the RMS/RMA method and support preemptive, priority-based scheduling. As to task independence, the event-driven paradigm helps immensely because (as explained in the presentation) events can replace sharing of resources. (Resources should be encapsulated inside tasks, which become a "broker/manager" of the resource. Other tasks send events to the broker task rather than interacting with the resource directly.) The whole point of an "event-driven kernel" is to provide deterministic and thread-safe event exchange. Occasionally, as a last resort, you might need to share a resource among concurrent tasks, in which case you must use a mutual-exclusion mechanism. But to this end, there are non-blocking mechanisms compatible with non-blocking kernels. One good resource here is the "Stack Resource Policy (SRP)" (see ieeexplore.ieee.org/document/128747 ). The SRP can be easily added to the Hardware SST for Cortex-M (by utilizing the BASEPRI register present in ARMv7-M+ architectures). I just ran out of time to explain the mutual-exclusion aspects in the video. --MMS
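      A minimal sketch of the "broker/manager" idea described above (the event type, signal names, and posting call are made up for illustration only; this is not the SST API): one task owns the SPI peripheral, and other tasks never touch the driver directly; they only post request events to the broker.

      #include <stdint.h>

      /* Hypothetical types and functions, for illustration only. */
      typedef struct { uint16_t sig; uint8_t data; } Event;
      enum { SPI_WRITE_REQ_SIG = 1, SPI_DONE_SIG };

      extern void spi_start_write(uint8_t byte);   /* driver call, used only by the broker */
      extern void SpiBroker_post(Event const *e);  /* thread-safe enqueue to the broker task */

      /* The broker task encapsulates the SPI resource; only it touches the driver. */
      void SpiBroker_dispatch(Event const *e) {
          switch (e->sig) {
          case SPI_WRITE_REQ_SIG:
              spi_start_write(e->data);   /* exclusive access, so no mutex is needed */
              break;
          case SPI_DONE_SIG:
              /* optionally post a confirmation event back to the requester */
              break;
          default:
              break;
          }
      }

      /* Client tasks never call the driver; they only post request events. */
      void Client_requestWrite(uint8_t byte) {
          Event const req = { SPI_WRITE_REQ_SIG, byte };
          SpiBroker_post(&req);
      }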

  • @ms396
    @ms396 9 months ago

    Hi, first off: awesome videos! They remind me that understanding scheduling is truly understanding how everything works in embedded...
    As I understand it, the main difference to the QPC preemptive kernel is the deeper usage of the NVIC as the scheduler? On the downside, this is not as portable?
    And regarding the activation of tasks with separate ISR functions: on Cortex-M4, could you put the task functions in a RAM array and delegate all SST-related ISRs to one single ISR, which uses the VECTACTIVE field of SCB->ICSR to index this array? Maybe it would make the coding style a bit more flexible, at the cost of a little less performance. I think a similar approach is possible on AVR, because you have space for 2 instructions to jump into the single "SST dispatcher ISR".
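    For reference, a minimal sketch of the single-dispatcher idea asked about above (SCB->ICSR and SCB_ICSR_VECTACTIVE_Msk are standard CMSIS names; the table and handler names are made up, and this is not part of SST):

    #include "stm32f4xx.h"   /* any CMSIS device header that defines SCB */

    typedef void (*TaskFn)(void);

    /* RAM table of task functions, indexed by IRQ number (hypothetical). */
    static TaskFn sst_taskTable[32];

    /* One shared handler installed at every SST-related vector. */
    void SST_CommonHandler(void) {
        /* VECTACTIVE holds the active exception number; IRQ0 corresponds to 16 */
        uint32_t const irqn = (SCB->ICSR & SCB_ICSR_VECTACTIVE_Msk) - 16U;
        TaskFn const fn = sst_taskTable[irqn];
        if (fn != 0) {
            fn();   /* run the task's handler to completion */
        }
    }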

  • @windward2818
    @windward2818 8 months ago

    The run-to-completion paradigm is embraced in some rather efficient foreground/background RTOSs that are architected as a solution to the needs of a particular application. Meaning the system is architected to meet the real-time deadlines of a well-defined system (with detailed, well-documented requirements that also include testing), and as such is not presented as a general-purpose approach like FreeRTOS or SST.
    However, there are similarities in how the on-board interrupt controllers are used to control program flow in foreground/background systems. The video demonstrates that hardware flow control is very efficient without the need for an entire context switch, and raises the question of whether a task, viewed as an object, really needs to be sequestered when no supervisory memory management unit is being used.
    Unfortunately, architecting systems that carefully utilize the available ISRs or core hardware like the NVIC requires a great amount of discipline in organizing the routines, and training for those who may support the application. If we add software features (feature creep) outside of the original software specification, the implementation can get rather fragile.
    The added traditional challenge is how the run-to-completion processes share data with background tasks and communicate events. How do we manage shared resources, and do they really need to be shared if the tasks are organized properly?
    And as you know, as we go round and round conceptually striving for optimizations like SST, we begin to see patterns in how real-time systems are architected to optimize the RTOS by leveraging the available core hardware.
    We also start to make rather basic decisions about how a system might run. For example, only one process or task may own a resource, whereby the information is then provided to all the other tasks in a coherent manner, and only while not blocking.
    You touched on one aspect that is often initially ignored in RTOS design: what happens to the system when we run out of time to complete tasks? How does the system performance degrade? And exactly how does this degradation manifest itself in actual system behavior?
    We may then ask why the core doesn't have many more hardware features for monitoring real-time behavior built in. It is hard to be general-purpose and optimized at the same time. Or are we saying that solutions are made at the system level, which requires formulating the overall best approach to hardware/software integration, and as such, is code portability even viable?

  • @ashrafkamel1287
    @ashrafkamel1287 1 year ago

    If we have a button module with several button instances in it, then in SST we would post every event from every button to every task that would ever need a button event, even once.
    What do you recommend for such a case?

  • @dmrsim94
    @dmrsim94 1 year ago +1

    Hello, in general, when is it better to use bare-metal programming instead of an RTOS?

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +3

      This is a very complex question because it all depends on the understanding of "bare-metal" and "RTOS". If by "bare-metal" you mean the "foreground/background" architecture as explained in lesson-21 ruclips.net/video/AoLLKbvEY8Q/видео.html , then I actually don't recommend using it directly. Also, if by "RTOS" you mean the traditional blocking RTOS, as presented in lessons 22-28, see ruclips.net/p/PLPW8O6W-1chyrd_Msnn4LD6LBs2slJITs , I don't recommend using it directly either. The architecture that I actually use and recommend is the event-driven model based on the "Active Object" pattern, see ruclips.net/p/PLPW8O6W-1chx8Y7Oq2gOE0NUPXmQxu2Wr . This architecture can work with a quite wide range of schedulers, including schedulers similar to a "superloop", a traditional RTOS, or a non-traditional RTOS like SST. I plan to explain the whole spectrum of schedulers in the upcoming lessons. Stay tuned! --MMS

  • @ashrafkamel1287
    @ashrafkamel1287 1 year ago

    Having these two properties:
    - preemptive priority-based scheduling
    - uses the main stack
    makes this applicable to a wide range of microcontrollers.
    Very impressive.
    And instead of the interrupt pend bit, it would be a flag bit.

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +1

      The particular hardware implementation shown in the video is specific to ARM Cortex-M only. But the general idea of a non-blocking, run-to-completion RTOS kernel is indeed applicable to a wide range of microcontrollers. One example is the QK RTOS kernel, described in Chapter 10 of my book "Practical UML Statecharts in C/C++, 2nd Ed." www.state-machine.com/psicc2 . --MMS

    • @ashrafkamel1287
      @ashrafkamel1287 1 year ago

      @@StateMachineCOM The only special aspect of the ARM NVIC is the faster transitions between nested interrupts.
      I see no reason that makes SST not applicable to other architectures as well, IMO.
      Maybe I can share a port to dsPIC MCUs, for example.

    • @StateMachineCOM
      @StateMachineCOM  1 year ago

      @@ashrafkamel1287 The SST for Cortex-M is not just about "faster transitions between nested interrupts". The NVIC implements the whole preemptive *scheduling* in hardware. For other CPUs, you'd need to do the scheduling in *software*. In fact, that is precisely what the "legacy SST" from 2006 did. And also, the QK kernel that I mentioned before is a software implementation of the SST idea. From my experience, the software scheduling is not trivial, but sure, please share your SST port to dsPIC MCUs! --MMS

  • @coderhex1675
    @coderhex1675 1 year ago

    Is there any documentation/ebook related to SST? For example, I always read the FreeRTOS PDF on their website.
    Is my understanding correct that SST basically utilizes/repurposes the NVIC for run-to-completion (RTC) tasks?

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +1

      Yes, I also initially thought that the Hardware-SST is "basically utilizing/repurposing the NVIC". But this "repurposing" is NOT trivial, because you need the whole event-driven infrastructure (events + event queues), and you need to know when to pend the IRQs from software. Regarding the documentation, there is no PDF comparable to the FreeRTOS one yet, but there is an explanation of the main concepts in the form of the original SST article (see "legacy SST" in ruclips.net/video/PTcauYl994A/видео.html ). Such things take time (and it took years before the first FreeRTOS PDF was released). But SST is available on GitHub under a permissive open-source license. We'll see what the open-source community will contribute... --MMS
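      A minimal sketch of what "pending the IRQs from software" means here (the queue layout and names are made up for illustration and are not the actual SST code on GitHub; the CMSIS calls are standard):

      #include <stdint.h>
      #include "stm32l0xx.h"               /* CMSIS device header: NVIC_SetPendingIRQ(), PRIMASK intrinsics */

      /* Hypothetical task with an event queue, bound to one otherwise-unused IRQ. */
      typedef struct {
          uint16_t  queue[8];              /* ring buffer of event signals */
          uint8_t   head;
          IRQn_Type irq;                   /* the task's dedicated IRQ number */
      } SstTask;

      void SstTask_post(SstTask *me, uint16_t sig) {
          uint32_t const primask = __get_PRIMASK();
          __disable_irq();                 /* brief critical section around the queue */
          me->queue[me->head] = sig;
          me->head = (uint8_t)((me->head + 1U) % 8U);
          __set_PRIMASK(primask);
          NVIC_SetPendingIRQ(me->irq);     /* let the NVIC "schedule" the task */
      }
      /* The ISR attached to me->irq drains the queue and runs the task's event
         handler to completion at the task's NVIC priority. */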

  • @davidlauzon1207
    @davidlauzon1207 9 months ago

    How would you go about determining the required stack size for SST?
    -In a traditional RTOS, it's usually straightforward to empirically evaluate individual tasks for their worst-case usage; but if everything in SST is on the same stack, how do you know you've seen the worst case?
    -If using tools to analyze the code, do you find that they get lost in the seemingly recursive nature of the scheduler calls?

    • @StateMachineCOM
      @StateMachineCOM  9 months ago

      The stack usage analysis in a single-stack kernel like SST is more straightforward than in a traditional blocking RTOS kernel. There are NO "seemingly recursive" scheduler calls. Every task can occur on the stack at most once. The worst-case stack use happens when all of the used priority levels happen to nest (from the lowest to the highest priority level). From the hardware SST implementation with the NVIC, it should be apparent that the stack usage is the same as with nested prioritized interrupts. So, you could just as well pose your question to the hardware designers at ARM Ltd., who would surely have noticed if NVIC interrupt nesting could be recursive and unbounded. I would trust them that this is not the case... --MMS
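      A back-of-the-envelope sketch of the bound implied above, with made-up numbers: every used priority level nests at most once, so the worst case is the sum of each task's deepest call-chain usage plus one Cortex-M exception frame (8 words without FPU context) per preemption level.

      #include <stdint.h>

      #define NUM_TASKS       3U
      #define EXC_FRAME_BYTES 32U   /* 8-word basic Cortex-M exception frame */

      /* Hypothetical per-task worst-case stack usage, e.g. from call-graph analysis. */
      static uint32_t const task_stack_use[NUM_TASKS] = { 120U, 200U, 96U };

      uint32_t worst_case_stack(void) {
          uint32_t total = 0U;
          for (uint32_t i = 0U; i < NUM_TASKS; ++i) {
              total += task_stack_use[i] + EXC_FRAME_BYTES;
          }
          return total;   /* 120 + 200 + 96 + 3*32 = 512 bytes in this example */
      }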

    • @davidlauzon1207
      @davidlauzon1207 9 months ago

      Thanks for the reply!
      I agree - there is nothing truly unbounded here.
      I remember playing with "SST" based on the Embedded Systems magazine article from many years ago, and in that software implementation, any thread that posted an event would end up calling the scheduler, and of course the scheduler could run any thread. So, when I asked my linker to provide stack usage information, it got completely lost.
      I think you are right that tools would have a better chance at determining stack usage with the hardware assisted system in your video. That is very interesting. Thank you again for sharing this!

    • @StateMachineCOM
      @StateMachineCOM  9 months ago

      @@davidlauzon1207 Yes, the "legacy SST," as it is now called, launches every task from a scheduler (a function). But even with this software implementation, the scheduler can launch only a task (again, a function call) of a higher priority than the current one. This ensures that each task can nest at most once on the stack. The linker does not "know" about that rule, so it cannot statically determine the worst-case stack use. --MMS
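      A minimal sketch of that rule (not the actual legacy-SST code; the helper functions are made up): the software scheduler launches only tasks of strictly higher priority than the current one, so each priority level can appear on the single stack at most once.

      #include <stdint.h>

      /* Hypothetical helpers, for illustration only. */
      extern uint8_t highest_ready_prio(void);           /* 0 when nothing is ready */
      extern void    run_task_to_completion(uint8_t p);  /* a plain function call */
      extern void    irq_disable(void);
      extern void    irq_enable(void);

      static uint8_t volatile sst_currPrio = 0U;         /* 0 == idle level */

      /* Called with interrupts disabled, e.g. right after posting an event. */
      void SST_schedule(void) {
          uint8_t const savedPrio = sst_currPrio;
          uint8_t p;
          while ((p = highest_ready_prio()) > sst_currPrio) {  /* only HIGHER priority */
              sst_currPrio = p;
              irq_enable();
              run_task_to_completion(p);   /* nests on the one stack, at most once per level */
              irq_disable();
          }
          sst_currPrio = savedPrio;
      }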

  • @Xeenych
    @Xeenych 9 months ago

    I'm trying to use your SST implementation, but I've encountered a problem.
    Assume you have a blinker-button project.
    Instead of a blinker AO I have an ADC AO. It has an event queue and is attached to some unused interrupt. It also has a START_SIG.
    Upon receiving START_SIG, the ADC AO starts an ADC measurement with an interrupt on ADC completion. The measurement takes a significant time, and during that time no new measurement can be started. The ADC is busy.
    Imagine there are two or even more START_SIG events in the ADC AO event queue.
    What should it do upon processing the second event? How should the ADC AO wait for ADC completion? There is no delay_ms() or yield() function.

    • @StateMachineCOM
      @StateMachineCOM  9 months ago

      You seem used to blocking calls, like delay() or yield(). So perhaps a good approach for you would be to step back and ask yourself what would happen if you *had* delay() or yield(). It turns out that after START_SIG your blocking task waits for the equivalent of the ADC_COMPLETE event (only that event can unblock your traditional task). So, most of the time, other START_SIG events are silently ignored. If this is the case (and you can live with that), your non-blocking SST task can ignore START_SIG as long as the ADC is busy (ignoring an event is easy--just do nothing for that event). On the other hand, you might also decide that you cannot ignore START_SIG events, in which case you *can* do something about it. For example, you could count the START_SIG events in the busy state. After you receive the ADC_COMPLETE signal, you exit the busy state and can look at your counter. If it is not zero, you can decrement the counter and initiate another ADC conversion. You can repeat the cycle until your counter drops to zero. This is just an example, but it illustrates that the event-driven approach gives you *more control* over what you want to do, precisely because your tasks remain *responsive* to all events. Using blocking tasks is like sticking your head into the sand and pretending that other "unexpected" or "inconvenient" events didn't occur. --MMS
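      A minimal sketch of the "count the START_SIG events in the busy state" idea described above (the signal names and the dispatch function are made up for illustration; this is not the SST API):

      #include <stdint.h>
      #include <stdbool.h>

      enum { START_SIG = 1, ADC_COMPLETE_SIG };

      extern void adc_start_conversion(void);   /* hypothetical ADC driver call */

      static bool    adcBusy    = false;
      static uint8_t pendingCnt = 0U;           /* START_SIGs received while busy */

      void AdcTask_dispatch(uint16_t sig) {
          switch (sig) {
          case START_SIG:
              if (!adcBusy) {
                  adcBusy = true;
                  adc_start_conversion();
              }
              else {
                  ++pendingCnt;                 /* remember the request for later */
              }
              break;
          case ADC_COMPLETE_SIG:
              if (pendingCnt > 0U) {
                  --pendingCnt;
                  adc_start_conversion();       /* stay busy and serve the backlog */
              }
              else {
                  adcBusy = false;              /* back to idle */
              }
              break;
          default:
              break;
          }
      }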

  • @jayakrishnanharikumaran3758
    @jayakrishnanharikumaran3758 1 year ago +1

    How could we handle resource sharing and access using SST?
    For example: if we are sending data over a DMA channel, we need to wait until the transfer is complete before raising the next request to the DMA.
    If the DMA write is handled by TASK1 -> it can initiate the write when an event is raised.
    When another event is sent to the queue, TASK1 needs to wait for the previous write to finish before acting on the new request.

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +3

      You ask two *different* questions: #1 "How to handle resource sharing in SST?" and #2 "How to wait for the write to finish?". In this comment, I try to answer question #1 about resource sharing. The formal treatment of this is provided in the paper "Stack Resource Policy (SRP)" by T.P. Baker (you can google for it). The SRP can be quite easily added to SST in the form of selective scheduler locking up to a specified priority ceiling. Specifically, you can use the BASEPRI CPU register available in the ARMv7-M architecture (Cortex-M3 and higher). This will be officially added to SST on GitHub soon. --MMS
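      A minimal sketch of such a selective scheduler lock, assuming the standard CMSIS intrinsics __get_BASEPRI()/__set_BASEPRI() (the ceiling encoding and the function names are simplified for illustration; this is not the official SST code mentioned above):

      #include <stdint.h>
      #include "stm32f4xx.h"   /* any ARMv7-M CMSIS device header (defines __NVIC_PRIO_BITS) */

      /* Raise BASEPRI to the resource's ceiling, which masks all exceptions of the
         same or lower urgency (numerically greater-or-equal NVIC priority value);
         higher-urgency ISRs/tasks are not affected. */
      static inline uint32_t SST_lock(uint32_t ceilingPrio) {
          uint32_t const saved = __get_BASEPRI();
          __set_BASEPRI(ceilingPrio << (8U - __NVIC_PRIO_BITS));
          __ISB();             /* ensure the new BASEPRI takes effect before continuing */
          return saved;
      }

      static inline void SST_unlock(uint32_t saved) {
          __set_BASEPRI(saved);   /* restore the previous locking level */
      }

      /* Usage: uint32_t s = SST_lock(CEILING); ...access the shared resource...; SST_unlock(s); */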

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +2

      Regarding your question #2 "How to wait for the write to finish", SST tasks cannot block and wait for something. Instead, when the DMA is done, it will interrupt the CPU and generate another event for TASK1. In the meantime, TASK1 must decide what to do with the requests to send more data. TASK1 can either defer the request events (store them for later) or perhaps just discard them. When the DMA-done event arrives, TASK1 might send the deferred event. --MMS

    • @jayakrishnanharikumaran3758
      @jayakrishnanharikumaran3758 1 year ago

      @@StateMachineCOM Thank you very much, Miro. Now it is clear what should happen here. The task can keep logging the events while a transfer-ongoing flag is set [within the task], and when the DMA-complete event is received, the task can check the deferred events [in a local queue] and initiate a new DMA transfer. The reason I am saying a local queue is that SST always removes the event from the queue before scheduling the event handler. I am not sure if a cleaner way would be to quietly store the event back into the same queue from the event handler [if a transfer is ongoing] without using the SST_POST function, to avoid a private queue within the task.
      Thank you once again for your generosity.

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +2

      @@jayakrishnanharikumaran3758 Yes, you got it. The task must "remember" the context to keep responding to events depending on its internal state (in your case, either DMA-ready or DMA-not-ready). In the simplest case, you can get away with using a simple boolean "flag". But this quickly becomes unmanageable for more complex states. A very elegant solution is to use a *state machine* to "remember" the context, which I mention in the video. So state machines could (and should!) be added to SST!

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +2

      Regarding deferring events, you cannot re-post them to the main event queue of the task, because this would make the task ready-to-run and would create an endless loop. So, you need to use a *different* queue to hold the deferred events. This is described in the "Deferred Event" design pattern: www.state-machine.com/doc/Pattern_DeferredEvent.pdf . This feature is not yet available in SST.
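      A minimal sketch of such a separate deferral queue (a tiny ring buffer with made-up names; as noted above, this feature is not yet in SST):

      #include <stdint.h>
      #include <stdbool.h>

      typedef struct { uint16_t sig; uint8_t data; } Event;

      /* Separate "deferral" queue, distinct from the task's main event queue. */
      static Event   deferQ[4];
      static uint8_t deferHead = 0U, deferCnt = 0U;

      bool defer_event(Event const *e) {           /* called while the DMA is busy */
          if (deferCnt < 4U) {
              deferQ[(deferHead + deferCnt) % 4U] = *e;
              ++deferCnt;
              return true;
          }
          return false;                            /* deferral queue full: drop or assert */
      }

      bool recall_event(Event *e) {                /* called on the DMA-done event */
          if (deferCnt > 0U) {
              *e = deferQ[deferHead];
              deferHead = (uint8_t)((deferHead + 1U) % 4U);
              --deferCnt;
              return true;                         /* the caller re-posts *e to the task */
          }
          return false;
      }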

  • @prasadghole9976
    @prasadghole9976 1 year ago

    Can we overcome the limitation of interrupt availability by using a software exception/interrupt with a parameter, to extend the number of events?

    • @StateMachineCOM
      @StateMachineCOM  1 year ago +1

      No, we cannot "parameterize" a single IRQ or an exception to serve multiple SST tasks. This is because the whole "scheduling" happens in the NVIC hardware, so for every SST task the NVIC must deal with a separate IRQ vector with a unique IRQ number. But I don't even think that this is a big limitation in practice. Small M0/M0+ MCUs indeed have a relatively small number of IRQ vectors, but then they also run simpler applications. Bigger M7/M33 MCUs have well over a hundred IRQ vectors, and in practice over half of them go unused in any given application. In the bigger Cortex-M MCUs, you can have 50-100 SST tasks. Please also remember that an event-driven task can typically do more than a traditional RTOS task, because an event-driven task is responsive and extensible. --MMS

    • @wegi9621
      @wegi9621 1 year ago +1

      @@StateMachineCOM I didn't check how many IRQs the Cortex-M0/M0+ core implements, but vendors like STM32 usually don't fill all of those slots. On a Cortex-M4 I checked that you can use the IRQs not implemented by the vendor, from the last implemented slot up to the limit described in the ARM documentation. This is a golden gift for SST. You simply need to extend the vector table and add the ISR routines. Please check - it works. You can use NVIC_SetPendingIRQ(your_number) if the ARM core has it implemented.
      F429 example:
      #include "stm32f4xx.h"   /* CMSIS device header */
      /* BB() (bit-banding access) and gpio_pin_cfg() are project-specific helpers,
         not shown here; they just access/configure the GPIO pins. */
      #define led1_bb      BB(GPIOG->ODR, PG13)
      #define led2_bb      BB(GPIOG->ODR, PG14)
      #define PHANTOM_IRQn ((IRQn_Type)91)   /* unused IRQ slot repurposed as a "task" */

      volatile uint32_t delay;

      int main(void) {
          RCC->AHB1ENR |= RCC_AHB1ENR_GPIOGEN;
          gpio_pin_cfg(GPIOG, PG13, gpio_mode_out_PP_LS);
          gpio_pin_cfg(GPIOG, PG14, gpio_mode_out_PP_LS);
          SysTick_Config(16000000/4);        /* SysTick every 250 ms @ 16 MHz */
          NVIC_EnableIRQ(PHANTOM_IRQn);      /* vector table extended with Phantom_IRQHandler at slot 91 */
          while (1) {
              delay = 4;
              while (delay) {                /* wait ~1 s */
              }
              NVIC_SetPendingIRQ(PHANTOM_IRQn);   /* "post" to the phantom IRQ */
          }
      } /* main */

      void SysTick_Handler(void) {
          if (delay) { --delay; }
          led1_bb ^= 1;                      /* toggle LED1 on every SysTick */
      }

      void Phantom_IRQHandler(void) {
          led2_bb ^= 1;                      /* toggle LED2 each time the phantom IRQ is pended */
      }