aaaaa00 said: If you want, you can separate the address spaces with the MMU, then you have things called processes. On console platforms they're not generally used, but all modern OSes on SMP architectures support that abstraction if you want.
set up separate address spaces and then use IPC? sorry, but you just blew a chunk of the potential advantage smp had over the cellular design - the oh-so-precious total memory coherency. congratulations.
This is true of SPEs too. I'm still not clear how a DMA properly synchronizes SPE execution to the PPE. If the PPE wants a job executed, how does it signal to the SPE it wants the job to be done? If the SPE is already busy, how does the PPE add it to the SPE's job queue? How does it get a signal back from the SPE that the job is complete? How does it know where to collect the results from?
i'm not aware of the actual implementation either, but i think we can safely assume hardware-run queues and interrupt notifications, can't we?
All of these operations require two threads to touch some sort of data structure, hence there has to be some sort of locking protecting those data structures, right?
yes, with locking most likely between _two_ threads running on the PPE. as opposed to many more on the xecpu.
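for what it's worth, here's roughly what that handshake looks like over the hardware mailbox channels - a sketch in the style of ibm's cell sdk (libspe2 on the ppe side, spu_mfcio.h intrinsics on the spu side), untested, with the program handle name and the 32-bit job address made up for the example:

[code]
/* ppe side - compiled with ppu-gcc, linked against libspe2.
   error handling elided; 'spu_job' is a hypothetical embedded spu binary,
   and passing the job's effective address as a 32-bit mailbox word is a
   simplification. */
#include <libspe2.h>

extern spe_program_handle_t spu_job;

int run_job(unsigned int job_ea) /* effective address of a job descriptor */
{
    spe_context_ptr_t ctx = spe_context_create(0, NULL);
    unsigned int entry = SPE_DEFAULT_ENTRY;
    unsigned int status;

    spe_program_load(ctx, &spu_job);

    /* "how does the ppe signal the spe": write the job's address into the
       spe's inbound mailbox - a hardware fifo, no software lock involved */
    spe_in_mbox_write(ctx, &job_ea, 1, SPE_MBOX_ALL_BLOCKING);

    /* blocks this ppe thread until the spu program stops */
    spe_context_run(ctx, &entry, 0, NULL, NULL, NULL);

    /* "how does it get a signal back": the spu left a completion code
       in its outbound mailbox */
    spe_out_mbox_read(ctx, &status, 1);

    spe_context_destroy(ctx);
    return (int)status;
}

/* spu side - compiled separately with spu-gcc */
#include <spu_mfcio.h>

int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
    unsigned int job_ea = spu_read_in_mbox(); /* blocks until the ppe writes */

    /* dma the descriptor at job_ea into local store, do the work,
       dma the results back out ... (elided) */

    spu_write_out_mbox(0); /* completion notification back to the ppe */
    return 0;
}
[/code]

the mailboxes are hardware fifos, which is why no lock is needed just to hand one job over; the locking aaaaa00 is talking about only appears once you put a software-managed job queue in front of this.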
Consider the case in which you have 3 threads and 1 CPU.
The high priority thread is waiting on a resource held by the low priority thread. There is a medium priority thread consuming all cycles on CPU0. Every timeslice, the OS will examine the ready to run threads, notice that the medium priority thread is ready to run, and dispatch it. Because the low priority thread is never given a timeslice, it will continue to hold the lock, blocking the high priority thread from ever completing.
This is the priority inversion problem.
thanks for the lecture. now reconsider what i wrote to you in my previous post and try to comprehend it (because if you had done that the first time you wouldn't have written the above).
Now consider the case in which you have 3 threads and 2 CPUs.
The high priority thread is waiting on a resource held by the low priority thread. There is a medium priority thread consuming all cycles on CPU0. Every timeslice, the OS will examine the ready to run threads, notice that the medium priority thread is ready to run, and dispatch it onto CPU0. Then it will see that the low priority thread is ready to run, and dispatch it onto CPU1.
says who? who said the lower priority thread was cpu-agnostic? you assume too much. how about cpu affinities? or do you conveniently exclude them from the scheme?
Because the low priority thread is given a timeslice, it will run, and will eventually release the lock, unblocking the high priority thread.
no. the only condition stated in the problem was that the middle priority thread competes for cpu with the lower priority one; you cannot assume there's conveniently a spare processor where you can run the latter. please try to look at this problem more seriously.
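to make it concrete, here's that exact setup in win32 terms, with affinity masks pinning all three threads to cpu 0 so the 'spare processor' escape hatch is gone. untested sketch; note that nt's scheduler deliberately hands random priority boosts to starved threads - precisely an os-level band-aid for this very problem - so the stall below may eventually unstick itself:

[code]
#include <windows.h>
#include <stdio.h>

static CRITICAL_SECTION lock;
static volatile LONG low_done = 0;

static DWORD WINAPI low_fn(LPVOID p)
{
    EnterCriticalSection(&lock);
    for (volatile long i = 0; i < 200000000; ++i) {} /* "work" under the lock */
    LeaveCriticalSection(&lock);
    low_done = 1;
    return 0;
}

static DWORD WINAPI med_fn(LPVOID p)
{
    while (!low_done) {} /* cpu hog that never blocks */
    return 0;
}

static DWORD WINAPI high_fn(LPVOID p)
{
    EnterCriticalSection(&lock); /* stuck behind low, which med is starving */
    printf("high priority thread finally got the lock\n");
    LeaveCriticalSection(&lock);
    return 0;
}

int main(void)
{
    InitializeCriticalSection(&lock);

    HANDLE low  = CreateThread(NULL, 0, low_fn,  NULL, CREATE_SUSPENDED, NULL);
    HANDLE med  = CreateThread(NULL, 0, med_fn,  NULL, CREATE_SUSPENDED, NULL);
    HANDLE high = CreateThread(NULL, 0, high_fn, NULL, CREATE_SUSPENDED, NULL);

    /* the point under dispute: all three pinned to cpu 0, so
       "dispatch the low priority thread onto CPU1" is off the table */
    SetThreadAffinityMask(low,  1);
    SetThreadAffinityMask(med,  1);
    SetThreadAffinityMask(high, 1);

    SetThreadPriority(low,  THREAD_PRIORITY_LOWEST);
    SetThreadPriority(med,  THREAD_PRIORITY_NORMAL);
    SetThreadPriority(high, THREAD_PRIORITY_HIGHEST);

    ResumeThread(low);
    Sleep(50);          /* let low grab the lock first */
    ResumeThread(high); /* high now blocks on the lock... */
    ResumeThread(med);  /* ...and med starves low on cpu 0 */

    WaitForSingleObject(high, INFINITE);
    return 0;
}
[/code]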
If you have exactly 6 threads assigned to 6 hardware contexts which are executing concurrently, you will never get a priority inversion problem -- a thread that holds a resource will continue to execute until it releases the resource, allowing the other thread to acquire the resource. There is no starvation precisely because the OS scheduler is not involved and there is no prioritization of threads going on, they're all running concurrently.
yes, in the best possible theoretical setup. in practice, though, those 6 threads will have _some_ contention somewhere among themselves, thus some of them will be blocked now and then, thus you may want to get _some_ lower-priority work done on those hw contexts during that time (only if you care about good cpu utilization, of course), thus you will get the os scheduler involved, and thus we get back to the place we started from with this problem.
It's stupid to blow 1000s of cycles on a software context switch for a high performance thread. For high performance threads you want to dedicate a hardware thread to it. For low performance threads, just pack them all onto one hardware thread to avoid scheduling them with your high performance threads and messing them up.
alright. so it turns out you did not totally neglect thread affinities after all. that's actually very good. now you can get back to the priority inversion problem and reconsider it.
You can create 5 threads, then use SetThreadAffinityMask() to lock them to the 5 hardware threads that the system provides. From then on, those 5 threads will always execute on those 5 hardware threads and the system will never move them to other cores or cause them to preempt each other in software. Then you create all your low performance threads, lock them to the remaining hardware thread, and those get software threaded automatically by the OS scheduler.
well, good luck with parallelizing those 5 threads so that they never rendezvous. chances are they will, in which case you may want to utilize their hw contexts somehow.
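for reference, the 5+1 arrangement described above comes down to something like this (win32 sketch, untested; the worker functions and thread counts are placeholders):

[code]
#include <windows.h>

/* placeholder workers - stand-ins for real game/application code */
static DWORD WINAPI perf_worker(LPVOID arg) { /* hot loop */ return 0; }
static DWORD WINAPI misc_worker(LPVOID arg) { /* background work */ return 0; }

int main(void)
{
    HANDLE perf[5];
    for (int i = 0; i < 5; ++i) {
        perf[i] = CreateThread(NULL, 0, perf_worker, NULL,
                               CREATE_SUSPENDED, NULL);
        /* one hardware thread each: logical processors 0..4 */
        SetThreadAffinityMask(perf[i], (DWORD_PTR)1 << i);
        ResumeThread(perf[i]);
    }

    /* everything low-performance shares logical processor 5 and gets
       software-scheduled there by the os */
    for (int i = 0; i < 8; ++i) {
        HANDLE h = CreateThread(NULL, 0, misc_worker, NULL,
                                CREATE_SUSPENDED, NULL);
        SetThreadAffinityMask(h, (DWORD_PTR)1 << 5);
        SetThreadPriority(h, THREAD_PRIORITY_BELOW_NORMAL);
        ResumeThread(h);
    }

    Sleep(INFINITE);
    return 0;
}
[/code]

and this is where the contention point comes in: once pinned like this, the five performance threads never preempt each other in software, but the moment two of them rendezvous on a lock, the blocked one's hardware thread sits idle unless something else is handed to it - which is the utilization question above.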
You can vary this ratio or arrangement however you like. You can create 3 threads and lock them to the 3 cores and not use SMT or software scheduled threads at all. 4 threads, two on one, and one each on the remaining. Or 10 threads, 5 on the hardware threads, and the remaining 5 software scheduled on the last hardware thread. You have the flexibility to set it up however you want.
you have the flexibility, yes. our original argument was about 'ease' and 'correctness' and their relation, though. we never questioned the flexibility of smp multithreading - it's flexible, alright.
Regarding cache line evictions, remember that going out to main memory is going to cost 100s of CPU cycles. So the first cache eviction that thread A hits will cause it to block and allow B to execute -- it will be 100s of CPU cycles before the data comes back from memory and actually kicks out the cache line that B needs, if it kicks it out at all, which isn't likely: the cache will probably choose a colder cache line to evict than the one B just spent a hundred CPU cycles working with.
In any case, cache eviction is a performance issue, not a correctness issue. You don't have to worry about this when you're just trying to get your multithreaded code to work -- it will run just fine, but slowly.
sorry, i thought it was you who brought into this argument the many things you have to worry about with smp 'deviations' - say, accessing memory locales under numa.
You can always go back later and clean it up, optimize it, rearrange your data structures, add the prefetching and cache locking and whatever, whereas with stuff like DMAs and LS and asymmetric threads you have to get it all right up front -- all of it has to be working before your program can start doing any useful work.
aha. same with smp. try doing useful work with priority inversions ; )