Written by Dennis Haugh, Dave Morgan and Ron Scott

Everyone in the United States has now become painfully aware of how computer models can impact their lives. We can no longer ignore their existence. Every citizen needs a rudimentary understanding—not of modeling itself, but of when it can be trusted and when it cannot.

There is a fundamental question of knowing the future. In 2007, *The Black Swan* explored the bounds of our knowledge in a world of uncertainty and introduced the “ludic fallacy”—the mistake of treating the tidy, known odds of games as a guide to real-world uncertainty.^{[1]} This paper puts some of the concepts from that book into the context of computer modeling.

The response to the Covid-19 pandemic has largely been based upon the published predictions from models like the ones constructed by the Imperial College^{[2]} and the Institute for Health Metrics and Evaluation (IHME).^{[3]} Had these models not produced the fear they did, the impact of the pandemic would have been reduced significantly.

# What is Modeling?

Models mimic something in the real world in order to generate key metrics for eliminating uncertainty.^{[4]} The simplest of models can be formulated by a spreadsheet, but most models are more complicated.

## Randomness

The first building block of most models is a *pseudo*-random number generator. The generation must be repeatable, yet the output must be spread as if it were truly random.^{[5]} A coin with evenly weighted sides has fifty-fifty odds of landing on one side or the other.^{[6]}

Likewise, dice that aren’t “loaded” will exhibit a one-sixth probability of showing any one of six sides. Dividing the generator’s output range into six equal subranges easily handles this simplistic model.
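Both cases reduce to splitting the unit interval into equal subranges. A minimal sketch in Python (the function names and seed here are illustrative, not from the text):

```python
import random

def coin_flip(u):
    """Map a uniform draw in [0, 1) to a coin face by halving the range."""
    return "heads" if u < 0.5 else "tails"

def die_roll(u):
    """Map a uniform draw in [0, 1) to a die face via six equal subranges."""
    return int(u * 6) + 1

# Seeding the generator makes the "random" sequence repeatable.
rng = random.Random(42)
u = rng.random()
print(coin_flip(u), die_roll(u))
```

Because the generator is seeded, every run produces the same sequence of draws, which is exactly the repeatability the text calls for.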

Putting games like solitaire into a computer is loosely like handling a sequence of proverbial “forks in the road” where each path is weighted by a probability. Slot machines and most other games in casinos have been replaced by such computer models.

## Queueing

The next building block of modeling is the waiting line, like the line for a bank teller or a store checkout. If customers arrive more frequently than they can be served, the line grows indefinitely. The overall system must provide at least as much service capacity as customer demand, and a model mimics the balance between the arrival and the servicing of customers.

Most models are networks of queues that mimic the behavior of a system. Customers in each queue are simulated via a distribution which produces an average *arrival rate*. The distribution of the time it takes to service a customer produces a mean *service rate*.^{[7]} Getting any reflection of reality with a model requires getting accurate representations for these distributions.
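As a concrete sketch, a single-teller M/M/1 queue can be simulated with the Lindley recursion; the rates, seed, and customer count below are illustrative assumptions, not values from the text:

```python
import random

def simulate_mm1(arrival_rate, service_rate, n_customers, seed=1):
    """Average waiting time in an M/M/1 queue via the Lindley recursion:
    W[n+1] = max(0, W[n] + S[n] - A[n+1])."""
    rng = random.Random(seed)
    wait, total_wait = 0.0, 0.0
    for _ in range(n_customers):
        total_wait += wait
        service = rng.expovariate(service_rate)   # time to serve this customer
        next_gap = rng.expovariate(arrival_rate)  # time until the next arrival
        wait = max(0.0, wait + service - next_gap)
    return total_wait / n_customers

# Stable system: customers arrive half as fast as they can be served.
avg = simulate_mm1(arrival_rate=0.5, service_rate=1.0, n_customers=100_000)
print(avg)
```

Queueing theory predicts an average wait of rho/(mu − lambda) = 1.0 for these rates, so a long simulation run should land near that value; an arrival rate at or above the service rate would make the average grow without bound.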

# Constraints and Sources of Error

A model designer is always presented with a seemingly limitless set of choices. The real world is too complex to model extensively, so simplifying assumptions are always made. The problem is selecting the best set of variables for the model. Taleb refers to this as *tunneling*, and it is an inescapable problem with modeling.^{[8]} There are two fundamental problems:

- The chosen variables might not affect real-world behavior as believed.
- There are always “hidden” variables left out that might have a profound real-world effect.

Taleb refers to the latter problem as the *distortion of silent evidence*. The more complicated the real-world system, the larger the body of silent evidence grows relative to the variables included in the model. This means the error of omission increases with the complexity of the model.

## Garbage In, Garbage Out

The accuracy of a model depends upon the data it ingests. If the data are meaningless or flat-out wrong, the output of the model will be neither accurate nor precise. Even worse, there is no such thing as bug-free software.^{[9]} Software that is not peer reviewed cannot be trusted, but even peer reviews are not foolproof.^{[10]}

## Accuracy and Precision

The utility of the model imposes tradeoffs between accuracy and precision. A model that is more precise may be less accurate. This point was reinforced in 1999 by the econometrics “M3 Competition” run by Spyros Makridakis. The conclusion was that “statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.”^{[11]}
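The tradeoff can be seen in a toy curve-fitting experiment (a sketch only; the data, noise level, and polynomial degrees are illustrative assumptions, not from the M3 Competition). A ninth-degree polynomial is more “precise” in that it passes nearly through every observed point, yet when asked to forecast beyond the observed window it is far less accurate than a straight line:

```python
import numpy as np

rng = np.random.default_rng(0)

# The underlying process is a simple trend; the observations carry noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.5, x_train.size)

# Held-out points just beyond the observed window (a forecasting task).
x_test = np.linspace(1.05, 1.25, 5)
y_test = 2.0 * x_test

def rmse(coeffs, x, y):
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

simple = np.polyfit(x_train, y_train, 1)    # straight line
complex_ = np.polyfit(x_train, y_train, 9)  # chases every noisy point

print("in-sample:", rmse(simple, x_train, y_train), rmse(complex_, x_train, y_train))
print("forecast: ", rmse(simple, x_test, y_test), rmse(complex_, x_test, y_test))
```

The complex model wins in-sample and loses badly out-of-sample, which is the Makridakis finding in miniature: statistical sophistication bought precision on the past, not accuracy about the future.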

There are at least four reasons a model runs into a problem with refinement:

- Lack of measurements. We just don’t know detailed metrics.
- Ambiguity of the data collected. Real-world events are not necessarily black and white, but data fed into a model must be. A vivid example of this phenomenon is determining cause of death in the face of a pandemic like Covid-19.
- We might not understand the underlying mechanism, but we can measure its effect. In this case, increased fidelity would increase the distortion of silent evidence.
- Compounding error. Some variables are derived from others. This can create more error than a model that simply uses empirical data with less detail.^{[12]}

## Forecasting

It seems hard to believe that in the modern era of “science” people would still believe in crystal balls, yet model forecasting is exactly that. The designer of an electric circuit knows both the signal and the noise when he deals with signal-to-noise ratios. When a model is used for forecasting, it is seeking the signal while the distortion of silent evidence remains unknown.

Part two of *The Black Swan* is entitled “We Just Can’t Predict” and chapter ten is entitled “The Scandal of Prediction.”^{[13]} The titles pretty much say it all; however, Taleb presents concrete examples of how badly experts stink at making predictions.^{[14]} It is in our nature to demand certainty, but the longer the term we seek, the greater the *forecast degradation*. To understand the phenomenon, watch how weather forecasts change as the day in question gets closer.

The tracking of hurricanes shows the process best. The forecast path of a hurricane starts near the Equator. As the hurricane proceeds westward, more lines appear, showing the paths generated by successive model runs. As the storm advances, the inaccuracies of the modeling become apparent.

Modeling is curve fitting. We *might* have some luck with short-term forecasting; it depends upon how well we guess the instantaneous slope of the curve, its rate of change, and the direction of that change. The further we get from the present, the faster the dynamics of real change drive the forecast wrong.
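Forecast degradation can be sketched deterministically; the accelerating process and the linear forecaster below are illustrative assumptions, not from the text. Fit a straight line to the recent past of a process that is quietly accelerating, and the forecast error grows with the horizon:

```python
def linear_fit(ts, ys):
    """Ordinary least-squares line y = a*t + b, computed from first principles."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    a = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / \
        sum((t - mean_t) ** 2 for t in ts)
    return a, mean_y - a * mean_t

def process(t):
    return t * t            # the real world is accelerating

history = list(range(10))   # we only observed t = 0..9
a, b = linear_fit(history, [process(t) for t in history])

# Absolute forecast error at horizons 1, 5, and 20 steps ahead.
errors = {h: abs((a * (9 + h) + b) - process(9 + h)) for h in (1, 5, 20)}
print(errors)
```

The one-step-ahead forecast is tolerably wrong; twenty steps out it is wildly wrong, because the fitted slope captured only the recent past of the curve. Where the tolerable horizon ends depends, as the text argues, on the dynamics of the underlying process.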

# Utility

Models are useful whenever their inaccuracies can be tolerated. Without further understanding, the results from a model should be viewed at best as a coin toss. There should always be a statement of confidence in a range of results, and we should always be wary of *epistemic arrogance*.^{[15]}

Computer engineering models for incremental improvements are driven by standard workloads that fix the arrival distributions in advance. By so doing, the predicted improvements can later be verified. The customers who make the purchase decisions, after all, want results from the real products.

# Summary

When used for forecasting, models will never get it exactly right. The question of their utility rests in identifying how wrong they will be. Knowing the underlying assumptions is imperative to understanding the confidence in the model. Good decision-making should accept the results of any model with adequate skepticism.

At best, models are only good for short-term forecasting, and then the question becomes where the fuzzy line between short term and long term lies. The answer depends upon the dynamics of the underlying process. But the results should always ring true to a practitioner—not an expert. In most cases, the intuition of someone who is competent in the field is probably as good as anything.

*Dennis Haugh is a former Air Force officer, mathematician and computer scientist. He has created computer models for some of the world’s premier computer companies. He is the author of the books The Road to Americanism and Political Vertigo.*

*Dave Morgan is a passionate believer in locality of control, personal liberty and good Scotch. He works as a Software Architect in the DoD contracting world.*

*Dr. Ron Scott is a retired Air Force colonel, combat pilot, and Director of the Pentagon’s Air Force Operations Center; retired university professor and principal scientist for Applied Research Associates, Inc.*

^{[1]} Taleb, Nassim Nicholas. The Black Swan: The Impact of the Highly Improbable. Random House, 2010. Pg. 122.

^{[2]} Denim, Sue. “Code Review of Ferguson’s Model.” Lockdown Sceptics, May 10, 2020. https://lockdownsceptics.org/code-review-of-fergusons-model/.

^{[3]} Ferguson, Neil. “Report 9: Impact of Non-Pharmaceutical Interventions (NPIs) to Reduce COVID-19 Mortality and Healthcare Demand,” March 16, 2020.

^{[4]} *Ibid.*

^{[5]} *Ibid.*

^{[6]} Which side comes up is determined by dividing the range between zero and one in half.

^{[7]} Such queueing systems are described by a fairly simple technique referred to as Kendall notation. It is in the form of *A/B/c*, where “A” describes the arrival distribution of customer, “B” describes the service time distribution, and “c” is the number of servers. Modeling a single teller is usually done as an M/M/1 system. Adding another teller creates an M/M/2 system. A “general” (unknown) distribution is referred to by a “G”. At a very crude, high level, modeling the healthcare system is a G/G/c system, where “c” is the number of available hospital beds in the whole system.

^{[8]} Taleb, Nassim Nicholas. The Black Swan: The Impact of the Highly Improbable. Random House, 2010. Pg. 50.

^{[9]} In the 60s, Donald Knuth famously published a one-page piece of code as an example of “bug free” in the *Journal of the ACM.* By the time all the reviews were done, a dozen bugs were found – and there’s no guarantee there weren’t more.

^{[10]} Denim, Sue. “Code Review of Ferguson’s Model.” Lockdown Sceptics, May 10, 2020. https://lockdownsceptics.org/code-review-of-fergusons-model/.

^{[11]} Taleb, Nassim Nicholas. The Black Swan: The Impact of the Highly Improbable. Random House, 2010. Pg. 154.

^{[12]} “Odds of Hospitalization, Death With COVID-19 Rise Steadily With Age: Study.” U.S. News & World Report. U.S. News & World Report, March 31, 2020. https://www.usnews.com/news/health-news/articles/2020-03-30/odds-of-hospitalization-death-with-covid-19-rise-steadily-with-age-study.

^{[13]} Taleb, Nassim Nicholas. The Black Swan: The Impact of the Highly Improbable. Random House, 2010. Pg. 135.

^{[14]} *Ibid*. Pg. 158.

^{[15]} *Ibid.* Pg. 138.