Lessons Learnt from the Challenger Disaster

Oct 5th 2004

From The Pleasure of Finding Things Out

by Richard Feynman

Following the destruction of the Challenger Space Shuttle on January 28, 1986, a commission was formed, led by former Secretary of State William P. Rogers. Among its members was the physicist Richard Feynman. His report was almost suppressed by the commission, as it was seen as embarrassing to NASA. In an extract from this report, he talks about the differences between bottom-up and top-down design.

The usual way that such engines are designed (for military or civilian aircraft) may be called the component system, or bottom-up design. First it is necessary to thoroughly understand the properties and limitations of the materials to be used (for turbine blades, for example), and tests are begun in experimental rigs to determine those. With this knowledge larger component parts (such as bearings) are designed and tested individually. As deficiencies and design errors are noted they are corrected and verified with further testing. Since one tests only parts at a time, these tests and modifications are not overly expensive. Finally one works up to the final design of the whole engine, to the necessary specifications. There is a good chance, by this time, that the engine will generally succeed or that any failures are easily isolated and analyzed because the failure modes, limitations of materials, etc., are so well understood. There is a very good chance that the modifications to the engine to get around the final difficulties are not very hard to make, for most of the serious problems have already been discovered and dealt with in the earlier, less expensive, stages of the process.

The Space Shuttle Main Engine was handled in a different manner, top down, we might say. The engine was designed and put together all at once with relatively little detailed preliminary study of the materials and components. Then when troubles are found in the bearings, turbine blades, coolant pipes etc., it is more expensive and difficult to discover the causes and make changes. For example, cracks have been found in the turbine blades of the high pressure oxygen turbopump. Are they caused by flaws in the material, the effect of oxygen atmosphere on properties of the material, the thermal stresses of startup or shutdown, the vibration and stresses of steady running, or mainly at some resonance at certain speeds etc.? How long can we run from crack initiation to crack failure, and how does this depend on power level? Using the completed engine as a test bed to resolve such questions is extremely expensive. One does not wish to lose entire engines in order to find out how a failure occurs. Yet, an accurate knowledge of this information is essential to acquire a confidence in the engine reliability in use. Without detailed understanding, confidence cannot be attained.

A further disadvantage of the top-down method is that, if an understanding of a fault is obtained, a simple fix, such as a new shape for the turbine housing, may be impossible to implement without a redesign of the entire engine.

Any seasoned game developer will see the analogies between the processes described and our work. Thankfully, we do not find ourselves in situations where people’s lives depend on our decisions, but the complexity and expense of game development certainly have a lot in common with medium to large engineering projects.

I obviously favour a bottom-up approach to game design. Game technology is generally already developed in a bottom-up manner, as coders tend to be au fait with the techniques Feynman describes. What designers need to do, I think, is examine this type of process and apply it to their field of work.

The materials of game design are obviously the primary low-level elements of interaction in the game. These fast, simple actions and reactions form the basis of all the player’s experiences and as such should be given as much respect and time as the basic materials in an engineering project. Feynman uses the term ‘experimental rigs’, which is an idea that can be directly applied to game development. It is usually quite easy to create a small prototype of an interaction or action in a cheap and cheerful manner in order to stress-test your concept. This might be a ‘blue room’ to test character movement, or a shooting gallery to determine weapon accuracy cones. It could be a text-based app that simulates population growth or character development.
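As an illustration of how small such a rig can be, here is a minimal sketch of a text-based population growth prototype. The discrete logistic growth rule, the function names and the parameter values are all assumptions chosen for the example, not anything prescribed by a particular game.

```python
def step_population(population: float, growth_rate: float, capacity: float) -> float:
    """Advance the population by one tick using discrete logistic growth.

    Growth slows as the population approaches the (assumed) carrying capacity.
    """
    return population + growth_rate * population * (1.0 - population / capacity)


def run_rig(start: float, growth_rate: float, capacity: float, ticks: int) -> list[float]:
    """Return the population after each tick, with the starting value first."""
    history = [start]
    for _ in range(ticks):
        history.append(step_population(history[-1], growth_rate, capacity))
    return history


if __name__ == "__main__":
    # Print a crude text bar chart so the growth curve can be judged by eye.
    results = run_rig(start=10.0, growth_rate=0.5, capacity=100.0, ticks=10)
    for tick, pop in enumerate(results):
        print(f"tick {tick:2d}: {'#' * round(pop / 2)} {pop:.1f}")
```

Because the whole rig is a couple of functions, a designer can change the growth rate or capacity and rerun it in seconds, which is the point of testing the ‘material’ in isolation.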

As with the aeronautical engineers, the next stage is to use those selected materials in what Feynman calls ‘larger component parts’. These are usually objects in which two or more smaller components interact (for instance bearings in a bearing housing, or a sprocket threaded to a spindle). Our game design equivalents would also be instances where low-level actions and interactions are combined. For example, the ‘blue room’ may get an enemy dropped into it, or a series of pickups or weapons placed. The shooting gallery could be combined with character movement, or the weapon attached to a vehicle. The text-based population growth app could be combined with the interactions of individual population members in combat or mating.
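To make the idea of a ‘larger component’ concrete, the following sketch combines two small elements, growth and combat, that could each have been tried on their own first. Everything here is hypothetical: the names `grow`, `skirmish` and `simulate`, the rule that each side loses at most a tenth of the opposing side's size, and the starting numbers are assumptions made purely for illustration.

```python
import random


def grow(population: int, birth_rate: float) -> int:
    """Element 1: simple proportional growth, rounded down."""
    return population + int(population * birth_rate)


def skirmish(attackers: int, defenders: int, rng: random.Random) -> tuple[int, int]:
    """Element 2: each side loses a random number of members,
    at most a tenth of the opposing side's size."""
    attacker_losses = min(attackers, rng.randint(0, defenders // 10))
    defender_losses = min(defenders, rng.randint(0, attackers // 10))
    return attackers - attacker_losses, defenders - defender_losses


def simulate(red: int, blue: int, birth_rate: float, ticks: int,
             seed: int = 0) -> list[tuple[int, int]]:
    """The combined component: growth and combat interact on every tick."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    history = [(red, blue)]
    for _ in range(ticks):
        red, blue = grow(red, birth_rate), grow(blue, birth_rate)
        red, blue = skirmish(red, blue, rng)
        history.append((red, blue))
    return history


if __name__ == "__main__":
    for tick, (red, blue) in enumerate(simulate(red=100, blue=80, birth_rate=0.05, ticks=10)):
        print(f"tick {tick:2d}: red={red:4d} blue={blue:4d}")
```

If the combined run behaves oddly, `grow` and `skirmish` can still be called on their own, so the fault can be narrowed down to the interaction rather than the elements.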

As soon as we begin to combine elements in a bottom-up manner, we can have a high degree of confidence in the solidity of the individual elements, and be pretty sure that any anomalies are now the result of the interactions between elements, rather than intrinsic problems with the elements themselves. Because the system examined is simple, we can get to the root of the issue, resolve it, and find ourselves with another confident, solid system.

Now, as we move up a level, it is possible to look at actual component parts of the game as a whole: levels, environments, stages, tracks and so on. We know we have solid, dependable systems that are fun to use, easy to understand and elegant to control. As we build levels, we can be pretty sure that if the game suddenly loses something, this is the result of the level content, rather than the systems. Because we have proven our low-level elements, we can avoid the all-too-familiar meeting where it is decided the missions aren’t fun, therefore we need to examine the game systems. If we isolate each stage of the process, like a scientist, we can much more accurately determine the cause of the problem.

Finally we have robust elements, interacting correctly, and set in enjoyable, entertaining environments. We can bring the levels together and examine the game as a whole. Hopefully any changes needed to the flow and structure of the missions/stages/tracks will be evident here, and we will not need to edit individual missions, or add or remove low-level systems. We may need to cut content, to ship on time or to make the game consistent, but we can be sure that those dropped levels were good individually, just not appropriate for the entire game object.

Just as Feynman says, the disadvantages of top-down design are rather familiar – expensive re-iteration, a dangerous lack of knowledge about the interaction of subsystems, and inflexibility in the design when problems are found. All of these things are costly and demoralising, though thankfully in our case not life-threatening. If you follow the bottom-up approach, however, I believe they are really quite easy to fix.
