November 2021 – Journal of the System Safety Society

This morning I was sitting in the pre-dawn quiet enjoying a cup of coffee and reading a little book about some of the Buddha’s teachings called the Abhidharma.  (The Abhidharma is an ancient philosophy concerning the nature of mind.)  I am not a “scholar” of Buddhism (or any other “ism” for that matter), but now and then I enjoy contemplating things along these lines, or perhaps modern physics – they are both just about as difficult to grasp.

The house had a bit of a chill, so I decided to make a fire in the wood stove instead of just turning on the heater.  I went out to the wood stack to get firewood, gathered some kindling and proceeded to make a fire – or so I thought.  I laid the fire with care: I positioned a bit of crumpled newspaper as an ignition source, then carefully arranged the logs and kindling.  I lit the fire, which flared up nicely, and settled back down to read my book.

After a few minutes I noticed that my nice cheery fire was out!  I got up and found that my newspaper had burned up nicely, but that was about all that happened.  So I proceeded to put more paper in, rearranged my kindling, and tried again.  This time it got to the point where I could hear the satisfying crackle of wood burning.  I was sure of success – only to discover shortly afterward that while the logs had ignited, they quickly died down to glowing embers.  I had failed once again.

Not wanting to give up at this point I went back to the stove to see if I could find a better solution.  While looking at my layout I recalled a lesson that my older brother had given me decades ago, “The logs need to be close to a partner log so the air flows briskly between them and the heat of the fire radiates to their partner log.”  I realized that I had positioned the logs too far from their “buddy” logs for this to occur.  I moved one of the logs about a half inch closer to the other and went back to my chair to see what would happen.  Within a very short amount of time I had a really pleasant fire – meeting my intention perfectly.  About that time my wife came into the room and complimented me on building such a “romantic” fire!  I killed two birds with one stone that time around.

You might wonder what this has to do with System Safety, or my normal TBD offering. 

I realized that the fire building exercise might be an almost perfect analogy to what I have been hoping to foster within the International System Safety Society (ISSS), the System Safety profession or anything else applicable to this journal.  Let me try to explain the connection.

For the past few decades I have been hoping to do something to help the ISSS grow into an organization that is as important and influential as I know it should be.  I am convinced that the SS process is highly effective and efficient at reducing risks while adding important fiscal and social value to products and systems of all kinds.  I believe that it is the duty/role of the ISSS to foster that process and help expand it into all industries and processes, worldwide.  The dual approach of integrating engineering and management practices into the process of designing, implementing and “fielding” products and systems holds the promise of a better, safer, more environmentally appropriate future.  In short, I think it is a BIG deal.

However, over the years I have noticed a rather disappointing trend whereby we (the ISSS) continually go through waves of enthusiasm and discouragement.  Our history seems to be littered with groups of people, and individuals, who take up the task of “reinvigorating” (or perhaps vigorating) the ISSS, of expanding the scope into many industries, or otherwise promoting and providing training that matches the potential importance of the process.  Things get started, excitement builds to “do something”, meetings are met, papers are written – and then it dies down once again.  Our membership grows to over a thousand individuals, and then decreases back to a few hundred.  (It is my firm opinion that to properly reflect the importance of the approach the membership should be in the tens of thousands, rather than a few hundred.)

This brings me back to my experience with my wood stove.  As with the stove, we work at gathering the fuel, laying the fire and putting in the starter and kindling that we think are necessary; we light the process and watch it blossom for a little while – and then die out again.  I have watched this happen three or four times in the past thirty years; it is a frustrating and disappointing cycle.  We keep looking for better logs, better fire starting materials, better kindling – we get out the bellows in an attempt to blow fire into the society – but with little on-going success.

Perhaps we have gathered the correct materials, perhaps we have them ready to go, perhaps we haven’t been wrong in our overall approach – perhaps we just need to make a small adjustment so that we create a chimney between the forces of supply and demand.  There is an obvious demand for the kinds of things that we do, hence the plethora of standards and guidelines based loosely upon the “system safety approach”.  These are created in many industries around the world – but they keep getting it wrong because while they like the ideas – they don’t see the entire picture of what it is we do.  They take pieces and parts of the process, but not the whole thing.  There is a supply of people (our members and those in the profession who are no longer members of the Society) with the skills and knowledge to make it happen – but they are unable to find effective ways to work with the demand.  We (the ISSS) are perfectly situated to provide training, expertise, mentoring leading to the skills and knowledge required to meet the demand.  However, we have been unable to get past the “hump” of making that happen.  Perhaps we just need to find the right thing(s) to shift a very small amount to get the fire burning vigorously – finally giving off the light and heat that we are offering to the world community.

I don’t know what that might be; it is not clear to me what “logs” need to be rearranged to bring this about – but finding them might be much more effective than our cycle of gathering the wood, laying the fire, and watching it dwindle.  Maybe we have the fire already lit – we just need to find a way to let the air get to the fire, and for the fire to bridge the gap.

Collapse of Champlain Towers South

The September 2, 2021 edition of the New York Times newsletter “The Morning” that landed in my inbox had an interesting article concerning the collapse of the Champlain Towers South Condo that left 98 people dead.  While that event kept showing up on the daily news I couldn’t help but wonder what in the world went wrong.  Is there an important “System Safety” lesson to be learned?  If so, what might it be?  The System Safety Society is supporting a NIOSH sponsored effort to explore ways to better implement “safety through design” for construction projects.  The Times article provided a few enticing tidbits that might be worth mulling over a bit with regard to improving safety through design.

My first reaction was that perhaps it was the after-effect of using a type of high early strength concrete that was popular in the late 60’s because it gains strength early, maximizing the probability of passing the required 28-day strength test and enhancing the profits of contractors by reducing the risk of concrete failing the tests a month after it was placed.  I researched this material for my father (who was a county road inspector) around 1968 because he had become concerned about its potential for long term deterioration in strength.  Portland cement achieves substantial strength within the first few weeks, but takes many years to reach maximum strength.  I found that the high early strength concrete peaks within weeks.  However, this type of concrete has a nasty characteristic: it is prone to unpredictably and rapidly losing most of its strength, resulting in a history of catastrophic collapses.  The ancient Romans were aware of this failure mode and stopped using it thousands of years ago.  Unfortunately, my cursory on-line search over the last weekend has failed to identify this particular material or its poor history and I can’t recall what it is called.  So that line of reasoning is not likely to serve me well in my considerations about System Safety.  However, if they happened to have used this material, and if my recollections of its structural properties over time are correct, it certainly could fit into something that perhaps would have been found and avoided by a system safety effort.  For now I will leave that trail hanging as pure speculation.

The Times article brought up a number of other possibilities that are perhaps more germane to the subject.  They reported on a number of problems and speculated that perhaps they either caused the collapse, or contributed to the magnitude of the problem.  It appears that this event was most likely a chain of events that started with the failure of one structural element, transferring the load it was supporting to other elements, thereby overstressing them to the point of failure, which then transferred that load to still other elements in a kind of domino effect.  The question of “cause” gets down to which element triggered the event and why, as well as why the overload of one element could overload other elements to the point of failure of the entire structure.  It is sometimes thought that many parallel structural elements provide safety through redundancy; perhaps they instead decrease safety by adding many additional failure opportunities.
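
The domino effect described above can be illustrated with a toy model.  The sketch below is not a structural analysis and says nothing about the actual building; it assumes a hypothetical row of columns that share a total load equally, with made-up capacities and a made-up load, and the function name is my own invention.  When a column fails, its share is spread over the survivors, which may then fail in turn.

```python
def columns_left_standing(capacities, total_load):
    """Toy equal-load-sharing model (hypothetical; not a structural analysis).

    Each standing column carries an equal share of total_load.  Any column
    whose capacity is below its share fails, and the load is then shared
    among the remaining columns.  Returns the number of columns still
    standing once no further failures occur.
    """
    standing = list(capacities)
    while standing:
        share = total_load / len(standing)
        survivors = [c for c in standing if c >= share]
        if len(survivors) == len(standing):
            return len(standing)  # stable: nobody is overloaded
        standing = survivors      # redistribute and check again
    return 0                      # every column failed


total_load = 800  # arbitrary units, assumed for illustration

# Nine strong columns and one weak one: the weak column fails, the rest hold.
print(columns_left_standing([100] * 9 + [70], total_load))  # 9

# Same weak column, but less margin in the others: everything comes down.
print(columns_left_standing([85] * 9 + [70], total_load))   # 0
```

In this toy case the same single weak element is either a local failure or the trigger for a total collapse, depending only on how much spare capacity its neighbors have.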

The building has five main structural features in its design: (1) a foundation consisting of a grid of driven piles, capped with (2) a below-surface parking garage; (3) a grid of pillars supporting the next level, which consists of (4) the multistory building and (5) a large ground-level deck featuring a large swimming pool.

The Times article listed a number of “findings” that have turned up so far.  I have no way to judge the veracity of the information, nor can I judge whether or not they even include the “cause” of the collapse.  I offer them here by way of discussion of a much more general problem concerning the appropriate scope and depth of system safety efforts when applied to a building of this sort.  A partial list of problems they discussed includes:

  • During a 2018 inspection it was noted that the piles had problems with water intrusion.
  • During the 2018 inspection, abundant cracking and crumbling of the support columns, beams and walls were noted in the underground parking garage.
  • Large planters (weighing tens of thousands of pounds) that were not specified in the design drawings were installed on the deck.  The article speculated that these may have overstressed the structure because they were not included in the designs.
  • Several beams supporting the deck in the vicinity of the planters shown in the original designs were not included in the building.
  • The deck was designed to be flat, without the slope needed to ensure drainage.
  • The waterproofing material on the surface of the deck had deteriorated and been replaced in a manner that trapped moisture rather than repelling it.
  • One corner of the deck appeared to have little, or no, reinforcing steel.
  • Columns in the immediate vicinity of a cave-in that may have started the collapse punched through the deck at the locations with little reinforcing steel.
  • The design of the columns called for splices being made at a particular height, creating regions within the columns that had too much steel for the amount of concrete.  This resulted in sections of the columns that were weaker than the design calculations indicated.
  • The rebar in the slab was located very close to the surface of the slab (3/4”), perhaps resulting in less than optimal performance (some engineers contend that the second layer of concrete rectified this deficiency).
  • It appears that the number of horizontal reinforcing rods that connected the deck to the columns was less than shown in the design.
  • There was an extra penthouse added to the top of the building that was not in the original design.
  • The 2018 inspection identified numerous locations on balconies with exposed reinforcing rods and crumbling concrete.
  • Water was leaking through the roof.  Repairs on the roof were under way the day before the collapse.
  • Large amounts of water were observed pouring into the underground garage, along with chunks of debris, the day before the event.
  • Video footage indicated that the collapse appears to have started with a hole caving in on the deck, then the rest of the deck collapsing, rapidly progressing to the middle section of the building, followed by the other wings.  The sections that remained standing appear to have been supported by the elevator shafts.
  • There was speculation that perhaps a vehicle ran into, and damaged, one of the support columns.

I have no idea which, if any, of these issues caused, or triggered, the collapse.  However, this rather extensive list got me to wondering which of these fall under the purview of “system safety”, and therefore which would have been avoided given a strong “design for safety” (system safety) effort.

My experience has been that “design for safety” in the construction industry tends to be limited to designing for safety during construction, perhaps extending to operating and maintenance personnel during normal operational phases of the project.  I have heard little about the safety of the design to meet its performance expectations (or safety requirements), especially under conditions of foreseeable change (such as installing landscaping features on a deck with a swimming pool).  The assumption seems to be that the design engineers/architects and building contractors take care of the safety of the building as a functioning system through sound engineering practices, dedicated high quality contractors and expert building inspection services.  Unfortunately, it seems that many problems during the “operational phase” of large structures, such as the collapse of the Champlain Towers South Condo, can be tied directly back to problems in the design, construction and/or inspection.

I leave it up to you to make a judgment about which of the listed shortcomings, if any, could have contributed to the collapse, and which would likely have been avoided with a strong system safety effort.  It appears that many of the issues had to do with incorrectly following the design and then not catching the deviations during inspections.  Perhaps these are outside of the scope of system safety – or maybe they are within scope.  There are other issues, such as the addition of large planters that changed the loading above and beyond what was specified in the design documentation.  Perhaps this is the kind of “change” that should result in calling the design team, including system safety, back in for a “change review” process.  There was probably a discussion about this change before it was made; I wonder if the people knowledgeable about the design were included in the decision.

I found this disaster to be fertile ground for considering when, or how, system safety expertise should be included in the process, and what sort of issues that effort is likely to uncover or identify.  It is interesting to speculate which of these problems could have been identified and mitigated during the design process, and later during reviews of proposed changes.  I think this is an important consideration if we are to positively impact the direction of the current attempts to introduce “safety through design” concepts into construction projects – safety through design has to reach much further than just the construction activities.  It needs to include the users, public, long term structural integrity (including the effects of foreseeable modifications), and the environment.  It is my opinion that achieving safety throughout the life of a design requires an effective system safety effort.

March 2021 TBD

I have been noticing a definite “uptick” in the number of industry groups that are talking about the benefits of system safety.  Many of them don’t know that they are “inventing” an approach that has been successfully used for almost 100 years on millions of projects with a combined value of tens of trillions of dollars.  It seems that many of these groups believe they came up with the “new” idea that designing safety into projects is better, less expensive and results in fewer false starts than traditional safety approaches – not to mention that it is also more effective in reducing accident losses.

System safety is an engineering process that starts as early as practical and continues throughout the project’s life until there is no longer value in continuing.   Conceptually system safety consists of three simple steps: (1) Identify potential hazards, (2) Control the risks associated with those identified hazards to acceptable levels, and (3) Repeat.  Over the past ninety or so years, the system safety profession has developed many tools and techniques to assist with that process.  It isn’t something that needs to be “invented”, it is something that can be learned. I am happy that people believe they invented an important new approach because that might finally result in them “buying in” to the concepts and the processes that have been shown to be highly effective in reducing accident rates and associated costs. 

There are a few places where a conversation with system safety engineers could keep industries new to the ideas from going down some unproductive, and disappointing, paths.  System safety engineering has been in the business long enough that it has a lot of history and experience experimenting with ideas that just don’t work out.  One of the really big ideas that keeps coming up is that “risk assessments” can be used to determine what is “safe enough” under the misunderstanding that “risk” (safety risk) is quantifiable.  It seems like it should be quantifiable since it is expressed in terms of “probability” (a number) and “severity” (severity is not a number unless it is translated into a numerical value such as dollars).  “Severity” clearly does not mean dollars lost; it means something more closely aligned with pain and suffering than economic cost.  For example, how much is a lost finger worth?  For the person paying for a lost finger it is commonly valued at around $2000.  I wonder what it is “worth” to the loser of the finger in terms of immediate pain and suffering plus the lost capability for the rest of their life.  While converting this kind of severity to dollars makes it easy to settle an insurance claim, I don’t believe it accurately reflects the meaning of “severity.”

The reality is that neither probability nor severity can be accurately determined in the messy and very cloudy “crystal ball” used to predict the future event(s) associated with any design decision.  For one thing, there is almost always a range of outcomes, each of which has a different probability of occurrence, even if they appear to be identical.  For another thing, assigning a dollar value to an outcome is arbitrary at best, capricious at worst.  Knowing how to properly add up the combinations of potential outcome multiplied by probability is fraught with difficulties that take more effort and research than is normally available.  Even assuming that it is possible to make this determination, the cost of accurately predicting the “risk” associated with any decision or design feature is so high that it is only attempted in cases of extreme risk, and then only to the point that everyone agrees that it is “good enough” to be used to guide a decision.
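
To make the “add up the outcomes multiplied by their probabilities” step concrete, here is a minimal sketch.  The two dollar figures are the ones used in this column (about $2000 for a lost finger, about $150,000 for a death); the probabilities are purely hypothetical numbers that I made up for illustration, and the function name is my own.  It is only meant to show how strongly the resulting “risk” number depends on inputs that nobody really knows.

```python
def expected_loss(outcomes):
    """Sum of probability * dollar value over a list of (probability, dollars)."""
    return sum(probability * dollars for probability, dollars in outcomes)


# Two equally defensible guesses (hypothetical) for the same design decision.
optimistic_guess = [(1e-3, 2_000), (1e-4, 150_000)]   # finger, death
pessimistic_guess = [(2e-3, 2_000), (1e-3, 150_000)]  # finger, death

low = expected_loss(optimistic_guess)
high = expected_loss(pessimistic_guess)

print(f"optimistic:  ${low:,.2f} per exposure")
print(f"pessimistic: ${high:,.2f} per exposure")
print(f"ratio:       {high / low:.1f}x")
```

Both sets of probabilities could be argued for in a design review, yet the computed “risk” differs by roughly a factor of nine, which is why such a number is a poor stand-in for a decision.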

The idea that risk is somehow quantifiable, and can therefore be used as the sole (or major) element in making safety decisions, has resulted in many questionable decisions.  It certainly “feels” good to use a number as a surrogate for making a decision.  After all, this approach eliminates the cloud of being responsible for making a “bad” decision.  The risk acceptance criteria were made long before the actual situation was known, therefore they are somehow judged to be “dispassionate” and therefore “correct.”  However, once an undesired outcome occurs the problem of whether or not the correct criteria were used comes home to roost.

Some common examples come to mind.  One thing that has always amazed me is the prevalence of railroad grade crossings in the United States.  These are those places where vehicles drive across railroad tracks with the only “protection” being either a sign, a sign plus a pair of flashing lights, or perhaps those signals plus the addition of a couple of thin wooden arms that block the traffic lane – but with sufficient space to easily drive around the arms to get across the railroad track even though the lights are flashing, and the arms are down.  That situation results in about 2000 collisions per year resulting in about 200 deaths in the United States.  There are also a number of spectacular collisions each year where a car started to cross with an “all clear” signal, but failed to complete the crossing before a train crashed into them.  This of course could be prevented by eliminating all such crossings, thus eliminating the event of a highway vehicle being on a railroad track.  It would be expensive, but Europe has managed to do that – they don’t allow railroad crossings.  It is all about the “value” of the risks involved.  Railroads aren’t liable for the cost of accidents like these unless a signal has malfunctioned – therefore they put all of their efforts and money into making sure the signals don’t fail.  As long as the driver has been “warned,” the responsibilities of actions to avoid the hazard are judged to rest with the driver.  In addition, my guess is that the “irresponsible” driver is also liable for costs that their “error” caused to the railroad. 

Is the cost of solving the problem worth the costs of the lives and injuries?   In the United States, the decision has been that the cost of the negative outcomes is “acceptable.”  I am not sure who it is acceptable to, but clearly those that are in a position to make that decision have done so.  The same answer was not reached in many other countries around the world.  Those countries have decided that the risks are not acceptable and it is the responsibility of the rail owners and the municipalities to pay for the protection as part of the costs of running their business, instead of injured parties paying in terms of their deaths and injuries (and destroyed vehicles).  The decision is about “risk” but it is not just about a number, it is more about a philosophy or point of view.  We have made a decision that it is not “feasible” to eliminate the risks of death at railroad crossings (the first priority in the hierarchy of system safety control); it is only feasible to implement the lesser levels of the hierarchy, placing the onus on the driving public to be “careful.” 

Another example that I find interesting has to do with the risk of falling when working on roofs.  In the United States about 150 deaths per year are caused by construction workers falling from roofs.  OSHA considers this to be within the top 10 “avoidable” causes of death in the construction industry.  I don’t have the statistics, but my guess is that perhaps ten times as many of the falls result in serious injury, but not death.  On a population basis we know that there are about 150 deaths per year.  We also know that each death costs the insurance companies about $150,000.  So, the cost of “risks” to the industry might be expressed as $22,500,000 per year.  The AGC (Associated General Contractors of America) has 27,000 members, representing a portion of the licensed contractors.  My guess is that there are perhaps 50,000 contractors that do at least some of their activities on roofs where workers are potentially exposed to fall hazards.  That means that the average contractor’s exposure to roof fall hazards is perhaps $450 per year (“shared” among them as part of the cost of insurance).   If labor costs $50/hour, that is about 9 hours per year per contractor. If they spend more than $450 (including lost productivity) enforcing fall protection provisions it is costing them more than it is saving.  The correct action to control their financial risk is to do nothing.
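
The arithmetic in the paragraph above can be laid out as a short sketch.  Every input is one of the rough figures or guesses stated in the text (150 deaths per year, about $150,000 per death, perhaps 50,000 exposed contractors, $50/hour labor); the function and parameter names are my own.

```python
def roof_fall_exposure(deaths_per_year, cost_per_death, contractors, labor_rate):
    """Return (industry cost, cost per contractor, break-even labor hours)."""
    industry_cost = deaths_per_year * cost_per_death
    per_contractor = industry_cost / contractors
    break_even_hours = per_contractor / labor_rate
    return industry_cost, per_contractor, break_even_hours


industry_cost, per_contractor, break_even_hours = roof_fall_exposure(
    deaths_per_year=150,     # figure quoted in the text
    cost_per_death=150_000,  # approximate insurance cost quoted in the text
    contractors=50_000,      # the author's guess
    labor_rate=50,           # dollars per hour, assumed in the text
)

print(f"industry cost per year:   ${industry_cost:,}")       # $22,500,000
print(f"per contractor per year:  ${per_contractor:,.0f}")   # $450
print(f"break-even labor hours:   {break_even_hours:.0f}")   # 9
```

Changing the guessed number of contractors moves the break-even figure proportionally, but under any plausible value it stays at a handful of labor hours per year.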

The risks in terms of cost and severity are biased in favor of not providing protection.  OSHA recognizes this problem; therefore they institute a system of fines for gross violations of the standards.  Thus the “risks” to the contractor are the risks of getting caught not following the regulations, rather than the actual risks associated with falling.  This turns the “risk acceptance” criteria once again into a social issue rather than a safety issue.  We have taken the position that it is acceptable to kill 200 people a year in railroad crossing accidents, but not acceptable to kill 150 people a year from falling off of roofs.  I am not making a judgment here, I am just pointing out that risk acceptance decisions are not as simple as just knowing the risk in terms of probability and severity.

It occurs to me that perhaps it is not feasible (and maybe not even possible) to design out the fall hazard when working on sloped roofs.  OSHA has a number of requirements (laws) concerning the use of various types of fall protection devices, mostly depending upon various systems of belts, harnesses, lanyards and ropes.  The problems with these are manifold, not the least of which is that it is extremely difficult (and dangerous) to do the necessary work while wearing these protective devices.  It is difficult and expensive to provide adequate attachment points (particularly for existing roofs), the ropes get in the way and create a lot of “working around” problems that put people in harm’s way, they make it hard and slow to maneuver, and they do a poor job of protecting people from falling.  All of these problems, plus more, result in extremely low compliance with the regulations.  I often stop to watch how this is being implemented and find that either no fall protection provisions are provided – or, the rope lanyards are clipped directly back to the worker’s belt so that they are just an extra thing to carry around.  They are often not connected to anything.

I have little hope of any system of fall protection devices or equipment being capable of providing anything near “continuous” fall protection on sloped roofs.  I think the only real solution is to avoid walking on sloped roofs.  There are a number of possible ways to accomplish this.  One solution might be to provide a mechanical device (crane type equipment) that provides a protected, level surface to work from.  This sounds good until you realize that the surface being accessed would have to be lower than the working surface, making it very difficult and inefficient to do work.  Maybe some sort of robot could be developed to do the work in place of people.  With all of the advances in drones and autonomous machines, maybe something useful, and affordable, will be developed – but this seems unlikely to me.

Perhaps the only “good” solution is to avoid building sloped roofs.  Sloped roofs are only used because they are the accepted “style” in the United States.  There are many countries, and many places in the American Southwest, where the accepted style is a flat roof, usually with a parapet around the edges.  People use these rooftop spaces for many purposes besides keeping rain out of the house.  They have gardens and entertain on their roofs.  There is much value in terms of adding usable space, while virtually eliminating the risk of falling from the roof.  At one time sloped roofs made sense because available roofing materials were inherently “leaky” in the horizontal position; their function depended upon “shingling” the materials to let water run off.  That is no longer a necessity; there are many cost effective solutions for constructing flat roofs.

Once again, the risk acceptance criterion is more of a social issue than a technical one.  Is it “worth” changing to flat roofs?  Who knows, it would save a few hundred deaths a year – but those that are saved are normally unknown strangers.  The decision is an esthetic one associated with the look of a flat roof versus a sloped one; it is not about costs, risks, reduction in deaths or anything else.  It is all about what the building looks like.

The point of this is to act as a “warning” to those that are new to the field of system safety (reducing risks to an acceptable level through design) that “risk” assessment and risk “acceptance” are not easily defined processes, and they do NOT remove people from the onus of having to make risk acceptance decisions.  The values that are created in the risk assessment process are useful to provide some information about the level of risk involved, but that information is far too sketchy and poorly understood to be usable as the risk acceptance criteria.  At best it is a means to communicate an engineering judgment to the risk acceptors; at worst it is an unsupported guess.  It is just another piece of information that, used in conjunction with many other pieces of information, can provide assistance to the “risk acceptors” as to whether or not they can move forward with the design decisions.  System safety is a very powerful tool, but it does not answer the difficult question of “is it safe enough?”