#26 RTOS Part-5: What is "real-time"? Preemptive, priority-based scheduling

Quantum Leaps, LLC

26K views

Published: Sep 7, 2018
In this fifth lesson on RTOS I'll finally address the real-time aspect in the "Real-Time Operating System" name. Specifically, in this lesson, you will augment the MiROS RTOS with a preemptive, priority-based scheduler, which can be mathematically proven to meet real-time deadlines under certain conditions.
Articles mentioned in the lesson:
---------------------------------
C. L. Liu and James W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment", Journal of the ACM 20(1):46-61, January 1973
www.researchgate.net/publicat...
Lui Sha, Mark H. Klein, John B. Goodenough, "Rate Monotonic Analysis for Real-Time Systems"
Technical Report CMU/SEI-91-TR-6 ESD-91-TR-6
www2.informatik.uni-stuttgart....
------
Resources:
Companion web page for this video course:
www.state-machine.com/quickstart
GitHub repository for projects for this video course:
github.com/QuantumLeaps/moder...
Transcript of this lesson:
www.state-machine.com/course/...
Music credits:
The background music comes from:
www.bensound.com/royalty-free...
Science

Comments • 78

@StateMachineCOM 2 years ago ⁺²
NOTE: The source code, as presented in the video, might cause compilation errors with the newer MDK-ARM / uVision toolsets. This is because the underlying compiler in MDK-ARM has been changed to "Compiler-6", while the older "Compiler-5" is now considered obsolete. The updated code that compiles cleanly with "Compiler-6" is available from the companion website at:
www.state-machine.com/video-course
and from GitHub:
github.com/QuantumLeaps/modern-embedded-programming-course
@saeidreza6736 5 years ago ⁺²¹
Excellent course on embedded systems programming. Extremely well prepared, excellent presentation, completely covers all important subjects. Thanks a lot, Miro, for the great job. Eagerly waiting for your new lectures.
@YourCRTube 4 years ago ⁺³
You are an amazing teacher. The way you mix theory, live examples and hands-on exercises is unmatched.
@jameszhu3584 2 years ago ⁺²
I like the lecture style: question leading! Basically you are tackling a question when you are learning. Excited to see the next course that is "Advance a decade in technology". New problems appear when new technologies emerge. Thank you, Miro.
@mengnan24 5 years ago ⁺⁵
Thank you! Best materials covering RTOS on youtube!
@fawadahmad2616 3 years ago ⁺³
This is the best lesson in the series so far. Thank you for the lessons :)
@HemalChevli 5 years ago ⁺¹
Wow, this is hands down the best series on embedded programming, can't wait for the next one!!
@jonpinkley2844 5 years ago ⁺²
Thanks for such a great series. Well edited to the essential points. One of the few series I don't watch at 2x.
@alaouchicheabdellah4986 3 years ago ⁺¹
The explanations are fascinating, this series of courses could easily replace a University course, and it should.
@SSB2706 3 years ago ⁺²
Another wonderful video, I don't have words to admire you Miro sir!
@mndk1011 5 years ago ⁺¹
Super valuable video!! Love the concrete example with live logical analyzer results!
@alirezasadeghi2560 5 years ago ⁺²
Amazing again, learnt a lot from this great RTOS series
@goranjosipovic353 3 years ago ⁺¹
Thank you Miro for this amazing course.
@yosmanyhernandezsanchez9760 5 years ago ⁺¹
Simply amazing. Fantastic work.
@neoyang4965 5 years ago ⁺²
Hello Dr. Miro, I hope your video course will continue to be updated. I got a lot from this series course. Thank you very much. Eagerly waiting for your new lectures.
@davidh8693 5 years ago
It's updated xD
@breedj1 5 years ago ⁺¹
Best course I've seen on this topic
@isahilliogluu 3 years ago
Another excellent and academic level lecture. Thank you a lot Mr tutor.
@hardrockkunu 5 years ago ⁺¹
Gr8 work Miro. looking towards new lectures.
@dsbros3591 5 years ago ⁺¹
I am very glad that I found this channel.
Awesome, clear, crisp explanation from basic to advanced.
Thanks a lot. Please keep making these videos. They help us a lot.
PS: Can you also make tutorials on programming Nordic's nRF BLE modules?
@takismarkopoulos5639 4 years ago ⁺¹
In my opinion this was the greatest of all great lessons so far. Let me point out the visual example of the missed deadline at 07:46 and the references to the RMA/RMS papers.
@ovais217 2 years ago
Miro --> Because he's the hero Gotham needs, but not the one it deserves right now !!
@nikhilsp13 4 years ago ⁺²
Legendary.
@jeswanthkumar4291 5 years ago ⁺¹
Awesome, thanks a lot.
@amrssrrrdr9254 5 years ago ⁺¹
Thanks a lot Miro for the great job , ssssssssssuuuuuuuuuuppppppppppppeeeeeeeeerrrrrrrrrrrrrr
@abominabletruthman 4 years ago ⁺¹
Superb lessons, Miro! You are among the best teachers that I have learned from.
Couple of questions about this video:
1. Is there any particular use case or efficiency gain for having a dedicated OS_delayedSet variable instead of calculating it as needed through the bitwise complement of OS_readySet, e.g., workingSet = ~OS_readySet?
2. I see that we are meeting the deadline of having blinky_1 execute every 2ms, but my understanding is that we are not meeting the deadline of having blinky_2 execute every 54ms because it is being preempted by blinky_1 to take longer than 3.6ms (about 9ms), followed by an additional 50ms delay. Wouldn't our blinky_2 hard-coded delay value of 50 ticks have to be changed dynamically for each period to something less than 50 in order to achieve this?
@StateMachineCOM 4 years ago ⁺²
Ad 1: The OS_delayedSet bitmask is not for efficiency, but to store the threads that are currently delayed. It has different semantics than ~OS_readySet, which means threads that are NOT ready to run. At this point, the only threads that are not ready to run are the delayed threads, but already in the next lesson you will see many other reasons why a thread might not be ready (e.g., a semaphore).
Ad 2: You are right that the simplistic *relative* delay() operation is not the best way of arranging for a thread to run at a specific interval. Some RTOSes provide more sophisticated delay() operations, where you can specify the absolute tick on which the delay will expire. In future lessons, I will talk about event-driven programming, where you don't have blocking delays, but instead you have time events that trigger processing. In particular, you can specify *periodic* time events, which are a much more stable basis for truly periodic thread execution. --MMS
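The difference between a relative and an absolute delay described in "Ad 2" can be sketched with a small host-side simulation. This is only an illustration under stated assumptions: time is counted in whole ticks, the work is assumed to take 4 ticks (including preemption), and the names run_relative() and run_absolute() are hypothetical and not part of MiROS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical figures: 50-tick nominal period, 4 ticks of work per cycle. */
enum { PERIOD_TICKS = 50, WORK_TICKS = 4, CYCLES = 3 };

/* Relative delay: the wait starts only after the work has finished,
 * so each cycle lasts WORK_TICKS + PERIOD_TICKS ticks. */
static void run_relative(uint32_t start[CYCLES]) {
    uint32_t now = 0U;
    for (int i = 0; i < CYCLES; ++i) {
        start[i] = now;        /* the thread begins its work */
        now += WORK_TICKS;     /* the work itself */
        now += PERIOD_TICKS;   /* delay(PERIOD_TICKS), relative to "now" */
    }
}

/* Absolute delay: the wake-up tick is derived from the previous wake-up
 * tick, so the time spent working does not stretch the period. */
static void run_absolute(uint32_t start[CYCLES]) {
    uint32_t now = 0U;
    uint32_t next_wake = 0U;
    assert(WORK_TICKS < PERIOD_TICKS); /* otherwise the thread overruns */
    for (int i = 0; i < CYCLES; ++i) {
        start[i] = now;
        now += WORK_TICKS;
        next_wake += PERIOD_TICKS;
        if (now < next_wake) {
            now = next_wake;   /* delay until the absolute tick next_wake */
        }
    }
}
```

With these assumed numbers the relative version starts its cycles at ticks 0, 54, 108, while the absolute version starts them at ticks 0, 50, 100.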
@yisong6665 5 years ago
Hi everyone! I have encountered a problem: Keil uVision fails to recognize the Cortex-M device in the JTAG chain. Does anyone have a clue? Thank you so much.
@diptopal 5 years ago ⁺²
How do you do these awesome presentations? What graphics tool do you use? The zooming, panning etc.? Also FYI Code Composer Studio version 6 and below are completely free now with full unlimited license.
@StateMachineCOM 5 years ago
The presentations are prepared with screencast-o-matic.com/ --MMS
@anup619thapa 3 years ago
@MMS: Since you have used a macro function LOG2(x), I have a follow-up question. Why not use macro functions for smaller functions (for example, like the LedOn() and LedOff() functions)? Someone once told me to avoid using true functions for small code sections because the function call overhead is greater than the code itself. I'm curious about your thoughts on this topic, and when would you recommend picking one over the other?
@MaxxG94 5 years ago ⁺¹
I have been learning to program ARM Cortex-M4 microcontrollers by learning to write to the corresponding registers as well as using the TivaWare Peripheral Driver Library API functions. It seems the latter is quicker to program in, however I'm sure there are many reasons to program using registers and data sheets. Can anyone explain the advantages?
Thanks in advance.
@StateMachineCOM 5 years ago ⁺⁴
Many silicon vendors provide software libraries for their chips. Texas Instruments provides TivaWare for the TivaC MCU family. Similarly, STMicroelectronics provides STM32Cube for their STM32 family. These libraries are supposed to provide a higher level of abstraction to access the chips. But the problem is that often the level of abstraction is still very low, and now you also need to learn the new libraries. Worse, the functions in such libraries hide the actual register access, so you have to jump through this layer of indirection to get to the registers and compare the access to the datasheet of the part. In summary, I believe that CMSIS provides just the right balance between programming convenience and portability, but without hiding the registers. --MMS
@josephzhang1797 3 years ago
At 23:52, you used the oscilloscope to monitor the actual thread usage in the OS. My question is: as an architect, what is a good tool to monitor this OS usage during the planning phase? With more and more complicated mechanisms involved, such as semaphores, an oscilloscope would be insufficient to serve the need.
@StateMachineCOM 3 years ago
Toggling pins in software and watching them with a logic analyzer is a relatively primitive, but effective method. Other methods include: software tracing (e.g., see www.state-machine.com/qtools/qpspy.html ), where you produce an output of timestamped "records". Another method is to use a standard debugger with breakpoints and some special registers in the CPU to measure timing. For instance, the ARM Cortex-M CPU has the special cycle-counter register that can be inspected in the debugger. That way you can precisely measure the number of clock cycles between breakpoints (e.g., see www.embedded-computing.com/articles/measuring-code-execution-time-on-arm-cortex-m-mcus ) --MMS
@MrHatemyname 4 years ago
16:55 Is the if-else really necessary? If the CLZ works fine also for Idle thread (no other thread is ready and the result of LOG2 is 0), then you should be able to omit the if-else and right away assign the OS_next variable as you do in the else clause.
@StateMachineCOM 4 years ago
Yes, this would work as well, because LOG2(0)==0. The if-else might be a bit more efficient, though. --MMS
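The equivalence discussed in this exchange can be checked with a host-side sketch. It assumes that bit (n-1) of the ready set marks the thread of priority n as ready and that priority 0 is the idle thread. The function log2_u32() is a hypothetical portable stand-in for the lesson's LOG2() macro (a plain loop rather than the CLZ instruction), and the two pick_ functions are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for LOG2(): returns 0 for x == 0, otherwise
 * 1 + the index of the most significant 1-bit (so log2_u32(1) == 1). */
static uint32_t log2_u32(uint32_t x) {
    uint32_t n = 0U;
    while (x != 0U) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Variant with the explicit idle check (the if-else from the question). */
static uint32_t pick_with_check(uint32_t ready_set) {
    if (ready_set == 0U) {
        return 0U;             /* nothing ready: run the idle thread */
    }
    return log2_u32(ready_set);
}

/* Variant without the check: relies on the LOG2 of zero being zero. */
static uint32_t pick_without_check(uint32_t ready_set) {
    return log2_u32(ready_set);
}
```

Both variants return the same priority for every ready set, including the empty one. The choice between them only affects how much work is done on the idle path, not the result.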
@TheShizzle36 4 years ago
Thanks very much for these helpful videos!
I have one question that I am confused about. In timing diagrams such as the one at 24:41, we see context switches happening between system timer interrupts... How does that happen? I thought the scheduler could switch tasks only when called from the system timer interrupt.
@TheShizzle36 4 years ago ⁺¹
Ah, we are also calling the scheduler from the delay function! I believe this is the answer to my earlier question...
@StateMachineCOM 4 years ago ⁺³
The preemptive, priority-based RTOS kernel must guarantee that at all times the system runs the highest-priority thread that is ready to run. A thread can be made ready to run in just two circumstances: when an interrupt unblocks a thread and when another thread blocks or unblocks a thread. Therefore the RTOS scheduler must be invoked in both such situations: after *every* interrupt (the system clock tick being just one of many) and after every RTOS call that can block or unblock a thread. The delay() function blocks the calling thread, so it must call the scheduler to check if any *other* thread is ready to run. I hope this makes sense... --MMS
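The two scheduler call sites named in this reply can be sketched as a small host-side model. Everything here is an assumption made for illustration: the names (sched, thread_start, delay, tick) are hypothetical, bit (p-1) stands for priority p, priority 0 is the idle thread, and critical sections and the actual context switch are left out.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_THREAD = 32 };

static uint32_t ready_set;                /* bit (p-1) set: priority p is ready */
static uint32_t delayed_set;              /* bit (p-1) set: priority p is delayed */
static uint32_t timeout[MAX_THREAD + 1];  /* remaining ticks, indexed by priority */
static uint32_t current;                  /* priority now running; 0 means idle */

/* Portable stand-in for LOG2(): 0 for x == 0, else 1 + index of the top bit. */
static uint32_t log2_u32(uint32_t x) {
    uint32_t n = 0U;
    while (x != 0U) { x >>= 1; ++n; }
    return n;
}

/* Scheduler: always select the highest-priority ready thread. */
static void sched(void) {
    current = log2_u32(ready_set);
}

static void thread_start(uint32_t prio) {
    assert((prio >= 1U) && (prio <= MAX_THREAD));
    ready_set |= 1U << (prio - 1U);
    sched();                    /* a new thread became ready */
}

/* Called by the running thread: it blocks itself, so the scheduler
 * must run to find another thread to execute. */
static void delay(uint32_t ticks) {
    assert(current != 0U);      /* the idle thread must never block */
    assert(ticks > 0U);
    uint32_t bit = 1U << (current - 1U);
    timeout[current] = ticks;
    ready_set &= ~bit;
    delayed_set |= bit;
    sched();
}

/* Called from the clock tick interrupt: it may unblock threads,
 * so the scheduler must run afterwards. */
static void tick(void) {
    uint32_t work = delayed_set;
    while (work != 0U) {        /* visit only the delayed threads */
        uint32_t prio = log2_u32(work);
        uint32_t bit = 1U << (prio - 1U);
        if (--timeout[prio] == 0U) {
            ready_set |= bit;
            delayed_set &= ~bit;
        }
        work &= ~bit;
    }
    sched();
}
```

In this model a switch to a lower-priority thread happens inside delay(), between ticks, and a switch back to a higher-priority thread happens at the end of tick().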
@TheShizzle36 4 years ago
@StateMachineCOM Yes, this clears it up. Thanks ☺️
@DarkOrje 5 years ago ⁺¹
Hi Miro! Could you please provide the transcript (the notesXY or lessonXY.txt file) without the video time stamps, as you did for all the earlier lessons? From time to time I rely on reading them because English isn't my native language. The time stamps bloat the files and make them harder to read.
@StateMachineCOM 5 years ago ⁺²
No problem. The transcript of lesson26 without time stamps has been uploaded to the companion page at www.state-machine.com/quickstart/ . --MMS
@matthiasbaunach4123 5 years ago
Hi Miro, is there any description available how to setup the logic analyzer? Which PICO hardware is used in the videos?
@StateMachineCOM 5 years ago ⁺¹
The logic analyzer screen was generated with the PicoScope 2206B MSO. This is one of the simplest PicoScopes that works with the standard PicoScope 6 software. The setup is done according to the manuals available online at: www.picotech.com/oscilloscope/2000/picoscope-2000-manuals . I hope this helps. --MMS
@matthiasbaunach4123 5 years ago
Perfect, thanks!
@user-fb2pf4tn5d 9 months ago ⁺¹
The debugger gets stuck at a line "BKPT 0x00" in the disassembly, and never proceeds to the main function. I don't think it's a problem with the code, because even with the code from the lesson 25 (which KeilMDK was able to debug successfully) it's getting stuck. Any ideas why this might be happening?
@StateMachineCOM 9 months ago ⁺²
In such situations, your best course of action is to back up your work and download the project for this lesson from state-machine.com/video-course or from github.com/QuantumLeaps/modern-embedded-programming-course . These projects should work, and you need to confirm this with your setup. Once you have a working project, you can try to investigate how it is different from your attempts. The differences could be not just with source code, but also with project options, debugger settings, etc. --MMS
@user-fb2pf4tn5d 8 months ago
@StateMachineCOM Managed to fix it. Indeed, the debugger settings were the source of this problem. Thanks a lot for the suggestion.
@jonpinkley2844 5 years ago
Miro,
I have a question regarding what is discussed between 16:12 and 17:06.
What is the practical advantage of checking for the special case of the idle condition? I realize that the check takes little time, but since the LOG2 macro already handles the case for zero efficiently, it seems that the conditional checking could just be removed. Am I missing something?
@TheAkatran 5 years ago
That is a very good question.
The Cortex-M3/M4F Instruction Set Technical User's Manual shows that you don't have to check for the idle condition, BUT the GCC documentation, Other Builtins 6.57 ( gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html ), says: "Built-in Function: int __builtin_clz (unsigned int x) - Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined."
So this might be the reason why he checks for the idle condition!
@jonpinkley2844 5 years ago
Thanks for the pointer to the GCC info. After looking at it, it appears that these are generic built-ins meant to work with multiple architectures, so perhaps there are some instruction sets where the CLZ type instruction ( en.wikipedia.org/wiki/Find_first_set ) is undefined. So if using GCC on the Cortex-M3/M4F, it may be best to use the asm functionality.
Here is a link to another discussion about CLZ and CMSIS. This even has Miro's comments in it.
community.arm.com/iot/embedded/f/discussions/4016/how-can-clz-equivalent-be-achieved-on-cortex-m0-where-this-instruction-is-missing
Since the M0 doesn't have a CLZ instruction, and to implement in software is not efficient, checking for zero (which may be a very common case) would definitely be worthwhile on the M0.
While googling, I found another interesting link (slides only) but it specifically mentions the CLZ for use in scheduling.
www.silabs.com/documents/public/presentations/ew-2018-arm-cortex-m-and-rtos-are-meant-for-each-other.pdf
@TheAkatran 5 years ago
Thank you Jon for all the info provided. It is really helpful!
Because this specific uC has the CLZ instruction and the compiler supports it, my thinking is aligned with yours.
The if-then could be removed, keeping only the else part, as the CLZ-based LOG2 would return 0 if the input is 0.
The only probable reason I can think of for this check is that the code is meant to be portable across other platforms (compilers) rather than other uCs, because if CLZ is not implemented at all, then the else statement would definitely fail.
From your link to Find_first_set - Tool and library support ( en.wikipedia.org/wiki/Find_first_set#Tool_and_library_support ) I notice that gcc CLZ returns 0 if the input is 0, but Clang would return "undefined".
Maybe Miro can clarify this.
@StateMachineCOM 5 years ago ⁺³
The explicit check for the idle condition is made for efficiency, because it is the most frequently executed path through the code. Regarding the CLZ instruction, it is available in Cortex-M3/M4/M7, but is NOT available in Cortex-M0/M0+. If you google for "CLZ for Cortex-M0", you will find a couple of implementations, some of them mine. One of the fastest is hand-optimized assembly for the closely related LOG2() operation, which is located in the qpc folder. Please look at the end of the file `qpc\ports\arm-cm\qxk\arm\qxk_port.c`. I would like to challenge anybody to come up with a faster implementation than this one. --MMS
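For cores without the CLZ instruction, one common software approach to LOG2() is a few range comparisons followed by a small lookup table. The sketch below is a generic C illustration of that idea, not the hand-optimized assembly from qxk_port.c mentioned above. The names log2_lut() and log2_ref() are hypothetical, and the return convention (0 for zero, otherwise 1 + index of the top bit) is an assumption carried over from the LOG2(0)==0 remark earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Software LOG2 without CLZ: narrow the value down to 4 bits with three
 * comparisons, then finish with a 16-entry lookup table. */
static uint32_t log2_lut(uint32_t x) {
    static uint8_t const lut[16] = {
        0U, 1U, 2U, 2U, 3U, 3U, 3U, 3U,
        4U, 4U, 4U, 4U, 4U, 4U, 4U, 4U
    };
    uint32_t n = 0U;
    if (x >= 0x10000U) { x >>= 16; n += 16U; }
    if (x >= 0x100U)   { x >>= 8;  n += 8U;  }
    if (x >= 0x10U)    { x >>= 4;  n += 4U;  }
    return n + lut[x];          /* x is now in the range 0..15 */
}

/* Straightforward reference used only to cross-check the table version. */
static uint32_t log2_ref(uint32_t x) {
    uint32_t n = 0U;
    while (x != 0U) { x >>= 1; ++n; }
    return n;
}
```

The table version takes the same number of steps for every input, which keeps the scheduler's timing predictable even when the hardware instruction is missing.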
@rajeshkumar-yv9ht 5 years ago ⁺¹
Dear Samek, please explain how the for-loop time is 1.2ms and the period of thread 2 is 54ms. Thank you.
@StateMachineCOM 5 years ago ⁺¹
Actually, the CPU utilization of the blinky2 thread is 3 times longer than blinky1 (3*1.2ms == 3.6ms) and it waits for 50 clock ticks (50ms). Therefore, the total period of blinky2 is 50ms + 3.6ms ~= 54ms. In contrast, blinky1 runs for 1.2ms and waits for one clock tick (to the next clock tick). Therefore its period is 2ms. --MMS
@rajeshkumar-yv9ht 5 years ago
@StateMachineCOM Dear Samek, how is the blinky1 thread time 1.2ms, or how do I calculate the CPU load of the for-loop? I can't figure it out. I am confused, since the CPU clock is 50MHz. I am thinking of LED on = (1/50M) and LED off = (1/50M), so (2/50M)*1500, but it comes out as 0.6 microseconds. Please clarify. Thank you.
@StateMachineCOM 5 years ago ⁺⁴
Rajesh: First, the CPU utilization of blinky1 is everything that happens in one pass through the while (1) loop. So, if you look closely, you will see a for-loop with 1500 iterations, each one turning the green LED on and off. Second, one line of C code does NOT equal one line of machine code, as you incorrectly assume. In fact, most lines of C code compile to multiple machine instructions. And finally, one machine instruction does NOT equal one clock cycle, either. In fact, most machine instructions take more than one CPU clock cycle to execute. For example, LDR/STR instructions take at least 2 cycles. A branch takes 4 cycles (or more) if taken, and 1 cycle when not taken. So, in the end, it is quite hard to estimate execution time from looking at the C code. Instead, you need to MEASURE it, for example with a logic analyzer, as shown in this video. If you do this, you will clearly see that blinky1 takes 1.2ms to go through the aforementioned for() loop. --MMS
@rajeshkumar-yv9ht 5 years ago
@StateMachineCOM Thanks, I understand clearly now.
@zhenbosun7038 3 years ago ⁺¹
@StateMachineCOM Dear Samek, I am not very clear about the T value. If "blinky1 runs for 1.2ms and waits for one clock tick (to the next clock tick)" gives 2ms, then for the blinky2 thread, if it also waited for 1 clock tick (the next clock tick), T = 4ms, right? If yes, then waiting for 50 clock ticks should give T = 49 + 3.6 ~= 53ms. Thank you.
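The timing figures quoted earlier in this thread (blinky1: 1.2ms of CPU time every 2ms; blinky2: 3.6ms of CPU time roughly every 54ms) can be checked against the utilization bound from the Liu and Layland paper cited in the video description. This is a worked sketch under assumptions: both threads are treated as strictly periodic with deadlines equal to their periods, and about 54ms is taken as the period of blinky2.

```latex
% Total CPU utilization of the two threads (figures from the thread above)
U = \frac{C_1}{T_1} + \frac{C_2}{T_2}
  = \frac{1.2\,\mathrm{ms}}{2\,\mathrm{ms}} + \frac{3.6\,\mathrm{ms}}{54\,\mathrm{ms}}
  \approx 0.600 + 0.067 = 0.667

% Liu and Layland sufficient bound for n threads, evaluated for n = 2
U_{\mathrm{bound}}(n) = n\left(2^{1/n} - 1\right), \qquad
U_{\mathrm{bound}}(2) = 2\left(\sqrt{2} - 1\right) \approx 0.828

% Comparison
U \approx 0.667 < U_{\mathrm{bound}}(2) \approx 0.828
```

Since the utilization is below the bound, a rate-monotonic priority assignment (the shorter-period blinky1 at the higher priority) is sufficient to meet both deadlines under these assumptions. The bound is a sufficient condition only; a task set above it is not necessarily unschedulable.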
@rommio3223 4 years ago ⁺¹
t != (OS_thread*)0)
What is meant by "thread is in use"? Do you mean the thread is currently running, or that the thread has been started?
@StateMachineCOM 4 years ago ⁺¹
The OS_thread[] array is organized in such a way that OS_thread[prio] == (OSThread *)0 when the priority 'prio' is not used, and otherwise it is used. "Used" means here that the thread has been started. --MMS
@rommio3223 4 years ago
@StateMachineCOM Thank you very much
@tea-noodle 3 years ago ⁺¹
Isn't the delayed set just the bitwise not of the ready set?
@StateMachineCOM 3 years ago
The delayed-set is the complement of the ready-set only if a time delay is the only way a thread can block. At this early stage of the kernel development this might be the case, but later other blocking mechanisms could be added (e.g., semaphores). At that point the delayed-set won't be the complement of the ready-set anymore. For these reasons it is better to keep the delayed-set separate. --MMS
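The point about the two sets diverging can be illustrated with a small host-side sketch. All names here are hypothetical (KernelSets, sema_set, block_on_delay, block_on_sema), three threads are assumed, and the semaphore set merely stands in for whatever blocking mechanism a later lesson adds.

```c
#include <assert.h>
#include <stdint.h>

enum { N_THREADS = 3 };
#define USED_MASK ((1U << N_THREADS) - 1U)  /* bits of the threads in use */

typedef struct {
    uint32_t ready_set;    /* bit (p-1) set: priority p is ready to run */
    uint32_t delayed_set;  /* bit (p-1) set: priority p waits for a timeout */
    uint32_t sema_set;     /* hypothetical: priority p waits on a semaphore */
} KernelSets;

static uint32_t bit_of(uint32_t prio) {
    assert((prio >= 1U) && (prio <= N_THREADS));
    return 1U << (prio - 1U);
}

static void block_on_delay(KernelSets *k, uint32_t prio) {
    k->ready_set &= ~bit_of(prio);
    k->delayed_set |= bit_of(prio);
}

static void block_on_sema(KernelSets *k, uint32_t prio) {
    k->ready_set &= ~bit_of(prio);
    k->sema_set |= bit_of(prio);
}

/* The threads that are NOT ready: the complement of the ready set,
 * limited to the priorities actually in use. */
static uint32_t not_ready(KernelSets const *k) {
    return ~k->ready_set & USED_MASK;
}
```

As long as delaying is the only way to block, not_ready() equals delayed_set. As soon as one thread blocks on the hypothetical semaphore, the two differ, and a tick handler working from the complement would wrongly touch the timeout of a thread that is not delayed at all.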
@tea-noodle 3 years ago
@StateMachineCOM Thank you for the explanation, and the coursework. If I get an entry level job into embedded systems, it's because of this course. I'll owe you a nice bottle of wine (if you drink).
@ahbarahad3203 11 months ago
What's the point of using a delayed set? At the end of the day your complexity is O(1); at most you will ever have to iterate over 32 indices. Adding the delayed set and all that just increases complexity for no reason? Somebody enlighten me.
@StateMachineCOM 11 months ago
The purpose of the algorithm shown in the video is *deterministic* performance. Specifically, the presented algorithm has O(1) performance characteristics with just 2 machine instructions (taking advantage of the CLZ instruction specifically provided for that purpose). In contrast, a naive iteration over all possible active objects would have O(n) characteristics. Iteration can easily take a few hundred instructions, so its worst-case performance is two orders of magnitude worse. One 32-bit bitmask (the delayed set) is a good tradeoff for achieving this kind of improvement, especially in code that is executed so frequently. --MMS
@mehdisafar8565 5 years ago
Excellent
