Ricardo Galli explains how NASA and VxWorks engineers worked together to fix a problem with the Mars Pathfinder and build a binary software patch that was then sent by radio to Mars, making it the most distant update made to an operating system.










Ricardo Galli translates.
Otherwise, the translation is very interesting.
One has to distinguish between "citing" the source and "translating" it. Ricardo's post did the former.
Whether it is "citing" or "translating", the fact is that the information is veeeery brief and not very precise. Explain a bit more, for example: what faults made the update necessary?
Spot the differences.
Unfortunately, he forgot to cite all his sources.
http://inst.eecs.berkeley.edu/~ee249/fa07/RTOS_Sched.pdf
What happened
• The Mars Pathfinder probe lands on Mars on July 4th 1997
• After a few days the probe experiences continuous system resets as a result of a detected critical (timing) error
Software Architecture
• Cyclic scheduler @ 8 Hz
• The 1553 bus is controlled by two tasks:
– Bus scheduler: bc_sched computes the bus schedule for the next cycle by planning transactions on the bus (highest priority)
– Bus distribution: bc_dist collects the data transmitted on the bus and distributes them to the interested parties (third priority level)
– A task controlling entry and landing is at the second priority level; there are also other tasks and idle time
• bc_sched must complete before the end of the cycle to set up the transmission sequence for the upcoming cycle.
– In reality, bc_sched and bc_dist must not overlap
[Figure: timeline of one cycle showing bc_sched, bc_dist, other tasks and the active bus, ordered by priority]
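The timing rule described above can be written down as a small check. This is only a sketch: the 8 Hz rate and the two task names come from the slides, while the function name `cycle_ok` and all millisecond values are hypothetical and chosen for illustration.

```python
CYCLE_HZ = 8
CYCLE_MS = 1000 / CYCLE_HZ  # an 8 Hz cyclic scheduler gives 125 ms per cycle


def cycle_ok(dist_end_ms, sched_start_ms, sched_end_ms):
    """Check one cycle against the two constraints stated in the slides.

    Times are offsets in milliseconds from the start of the cycle
    (hypothetical values, not measured on the real system).
    """
    # bc_dist must be finished before bc_sched starts (no overlap).
    no_overlap = dist_end_ms <= sched_start_ms
    # bc_sched must complete before the end of the cycle.
    sched_in_time = sched_end_ms <= CYCLE_MS
    return no_overlap and sched_in_time


print(CYCLE_MS)
print(cycle_ok(dist_end_ms=60, sched_start_ms=100, sched_end_ms=120))
print(cycle_ok(dist_end_ms=110, sched_start_ms=100, sched_end_ms=120))
```

The second call models the failure case: bc_dist is still running when bc_sched starts, which is the condition that triggered the reset.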
The problem
• The select mechanism creates a mutual exclusion semaphore to
protect the “wait list” of file descriptors
• The ASI/MET task had called select, which had called pipeIoctl(),
which had called selNodeAdd(), which was in the process of giving the
mutex semaphore. The ASI/MET task was preempted and semGive()
was not completed.
• Several medium priority tasks ran until the bc_dist task was activated.
The bc_dist task attempted to send the newest ASI/MET data via the
IPC mechanism which called pipeWrite(). pipeWrite() blocked, taking
the mutex semaphore. More of the medium priority tasks ran, still not
allowing the ASI/MET task to run, until the bc_sched task was
awakened.
• At that point, the bc_sched task determined that the bc_dist task had
not completed its cycle (a hard deadline in the system) and declared
the error that initiated the reset.
• ASI/MET acquires control of the bus (shared resource)
• Preemption of bc_dist
• Lock attempted on the resource
• bc_sched is activated, bc_dist is in execution after the deadline
• bc_sched detects the timing error of bc_dist and resets the system
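The sequence above can be replayed in a toy simulation of fixed-priority preemptive scheduling with a plain mutex (no priority inheritance). This is a sketch, not the flight software: the task names follow the slides, but the `Task` and `simulate` names, the tick counts, the release times and the numeric priorities are all assumptions. Larger numbers mean more urgent here, which is a convention of this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # sketch convention: larger number = more urgent
    release: int    # tick at which the task becomes ready
    steps: list     # each step costs one tick: "take", "run" or "give"
    pc: int = 0     # index of the next step to execute

    def done(self):
        return self.pc >= len(self.steps)


def simulate(tasks, deadline):
    """Run until the deadline tick; return the name of the task run in each tick."""
    owner = None  # task currently holding the mutex
    trace = []
    for tick in range(deadline):
        runnable = []
        for task in tasks:
            if task.release > tick or task.done():
                continue
            # A task about to "take" is blocked while another task owns the mutex.
            if task.steps[task.pc] == "take" and owner is not None and owner is not task:
                continue
            runnable.append(task)
        if not runnable:
            trace.append("idle")
            continue
        current = max(runnable, key=lambda t: t.priority)
        step = current.steps[current.pc]
        if step == "take":
            owner = current
        elif step == "give":
            owner = None
        current.pc += 1
        trace.append(current.name)
    return trace


asi_met = Task("ASI/MET", priority=1, release=0, steps=["take", "run", "give"])
medium = Task("medium", priority=2, release=1, steps=["run"] * 7)
bc_dist = Task("bc_dist", priority=3, release=2, steps=["take", "run", "give"])

trace = simulate([asi_met, medium, bc_dist], deadline=8)
print(trace)
# bc_sched would perform this check when it wakes up at the deadline.
print("ok" if bc_dist.done() else "reset")
```

In this run ASI/MET takes the mutex in tick 0 and never runs again, because the medium priority task occupies every remaining tick. bc_dist stays blocked on the mutex the whole time, so the check at the deadline reports a reset.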
The Solution
• After debugging on the Pathfinder replica at JPL, engineers identified the cause of the malfunction as a priority inversion problem.
• Priority Inheritance was disabled on pipe semaphores
• The problem did not show up during testing, since the schedule was never tested using the final version of the software (where medium priority tasks had higher load)
• The on-board software was updated from Earth and semaphore parameters (global variables in the selectLib()) were changed
• The system was tested for possible consequences on system performance or other possible anomalies but everything was OK
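The effect of enabling priority inheritance can be sketched with a small mutex class. This is not the VxWorks semaphore implementation: the class `InheritanceMutex`, the `pick` helper and the numeric priorities are hypothetical, and larger numbers mean more urgent in this sketch.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority  # effective priority; larger = more urgent


class InheritanceMutex:
    """Mutex that can lend a blocked waiter's priority to the current owner."""

    def __init__(self, inherit=True):
        self.inherit = inherit
        self.owner = None
        self.waiters = []

    def take(self, task):
        """Return True if acquired; otherwise queue the task and return False."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        if self.inherit and task.priority > self.owner.priority:
            self.owner.priority = task.priority  # boost the owner
        return False

    def give(self, task):
        """Release the mutex, drop any boost, hand over to the most urgent waiter."""
        assert task is self.owner, "only the owner may give the mutex"
        task.priority = task.base_priority
        self.owner = None
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            self.owner = nxt
        return self.owner


def pick(ready):
    """Fixed-priority scheduler decision: run the most urgent ready task."""
    return max(ready, key=lambda t: t.priority)


asi_met, medium, bc_dist = Task("ASI/MET", 1), Task("medium", 2), Task("bc_dist", 3)
mutex = InheritanceMutex(inherit=True)
mutex.take(asi_met)   # ASI/MET enters the critical section
mutex.take(bc_dist)   # bc_dist blocks, so ASI/MET inherits its priority
chosen = pick([asi_met, medium])
print(chosen.name)
mutex.give(asi_met)   # boost is dropped and bc_dist becomes the owner
print(mutex.owner.name, asi_met.priority)
```

With `inherit=True` the scheduler picks ASI/MET ahead of the medium priority task, so the critical section finishes and bc_dist gets the mutex. Constructing the mutex with `inherit=False` leaves ASI/MET at its base priority, and the medium task wins instead.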
And it goes on and on, with diagrams and copy-pastes from other documents...
http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html
http://feanor.sssup.it/~pj/rtos-arezzo/2005/mars_explorer.pdf
http://www.mvps.org/st-software/Movie_Collection/images/7775f.jpg
There you go, that makes it easier to understand.