
Robert Cox,* Bruce L. Bauer,+ and Thomas Smith#
ABSTRACT
An intercomparison of four mesoscale numerical prediction models that could lead to the selection of a model for use in the theater of operations by United States Air Force (USAF) meteorological personnel is described. Mesoscale numerical prediction models have matured, and recent advances in computer hardware make this a realizable objective. Two studies were launched to determine if a mesoscale model could be used operationally in theater and to select the model that produced the best forecast under simulated operational conditions. Of prime concern was not whether the model could produce reliable forecasts in data-rich areas, but how well the models operated and thus produced forecasts in data-sparse areas. The first study did an overall review of the available mesoscale numerical weather prediction models resulting in a general ranking of the models by expected forecast ability and operational maturity. At the conclusion of this study it became apparent that a more in-depth analysis was needed to distinguish among the higher-ranking models. Thus, this study was initiated.
This study compared four models for quality of forecasts in different climate regions of the world. Two are considered state-of-the-art models that could easily be made operational: the Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model 5 (MM5) and the Colorado State University Regional Atmospheric Modeling System (RAMS). The third model was the Navy Operational Regional Prediction System Version 6 (NORAPS6), the navy's operational regional forecast model. The fourth model is the current USAF mesoscale model, the Relocatable Window Model (RWM), which was used to provide a baseline of the current USAF capability. The models were scored by comparing the forecast values with observations. The relative ranking of the models varied with parameter, but overall, the rank order was RAMS, MM5, NORAPS6, and RWM. The score disparity between the models was not large.
1. Introduction
Air Force Weather (AFW) identified a set of "theater" weather forecasting requirements (a theater is nominally a rectangular region 2800-6000 km on a side) that could be met by a central or distributed forecasting system (Headquarters USAF 1992). The United States Air Force (USAF) currently produces theater forecasts centrally at the Air Force Global Weather Central (AFGWC) using the Relocatable Window Model (RWM), a modified version of the Quasi-Lagrangian Nested Model (Mathur 1983), developed at the National Centers for Environmental Prediction (NCEP), formerly the National Meteorological Center (NMC).
Central forecast production allows the use of large central computer systems and allows the USAF to focus its numerical weather prediction (NWP) personnel at one location. The advantages of producing a forecast in theater are the ability to use locally generated observations, production of forecast products tailored for the local unit(s), and reduced demand on existing communication systems. Military units often operate in regions without normal meteorological observations and generate their own data or make use of existing friendly force data. The use of such data, much of which is not reported centrally, could improve the forecast product. Locally generated forecasts would also be responsive to local needs in terms of forecast times, resolution, and regions. These forecasts would supplement standard centrally generated forecast products. The increased resolution inherent in mesoscale modeling significantly increases the amount of data produced. Production in theater reduces the demands on communication capacity, which may be in use by other elements in theater and can be very busy during an active conflict. However, it increases the computational demands in theater. The requirements levied on the military meteorologist call for accurate, reasonably fine-grained forecasts that can be produced on existing hardware in the field. In response to this situation, the USAF began development of the Combat Weather System (CWS), which was to be an integral part of a command and control system for the theater commander. The challenge facing the CWS development was the selection of a forecast model that could be operated in theater on available workstation-class computers.
To meet this challenge, AFW and the Defense Special Weapons Agency (DSWA), formerly the Defense Nuclear Agency, sponsored a project to select a Theater Forecast Model (TFM). There were two main objectives of the TFM effort. The first objective was to identify a candidate numerical weather prediction model, suitable for theater use, from currently available models. Once the model was selected, it was to be repackaged to meet the needs of forces in theater. The search to select the TFM was conducted in two phases. The first was a USAF-sponsored study performed by the Dynamics Research Corporation (DRC) and published as the Combat Weather System Technical Alternatives Study (DRC 1993). The second study is the focus of this paper and was the forecast skill comparison of a select few of the models identified in the DRC study.
The DRC study was based on published literature, interviews with developers, and used a set of previously developed customer requirements (DRC 1992). The literature searches led to a preliminary list of 31 candidate models, and the study considered 10 in some detail. The review resulted in a ranking of the models based on the technical forecast requirements and a somewhat harder to define operational maturity.
Since the DRC study, several other modeling systems have been developed, or are in development, that offer promise for numerical weather prediction. For example, the Advanced Regional Prediction System (ARPS) not only operates on Cray-class machines and has variations for workstations and clusters, but has also been ported to a parallel processing machine (Droegemeier et al. 1995). Two of the models considered in this study, the Mesoscale Model 5 (MM5) and the Regional Atmospheric Modeling System (RAMS), have subsequently been adapted to run on the new multiprocessor workstations. As the state of the art progresses, it is imperative that studies like this one be repeated.
The DRC study concluded that no model could currently meet every requirement, but models could approach the required accuracy if observational data with appropriate spatial and temporal resolution were available. It also found that the available models spanned the development spectrum and support base. Most of the models are under continual development and are primarily used as research tools. A few models are currently operational, such as AFGWC's RWM and the Navy Operational Regional Prediction System Version 6 (NORAPS6). The report recommended the selection of one of three models: MM5, NORAPS6, or RAMS. The report noted "These three models are designed for and are good at meso-α and meso-β scales and have, or will have, some capability at meso-γ scale. They are relocatable and have been employed in various locations (including the Tropics) worldwide. They are robust and they all exceed RWM baseline by a large margin. They all have strong, continuing support bases. They have, or will have, options that can be used in various situations or permit tailoring to available computer power" (DRC 1993).
This study began where the DRC study left off. The focus would be on the three candidates identified and would compare their forecast skills in several different regions of the world under different forecast scenarios and computational platforms. The results would provide valuable information about the model's forecast skill in a wide variety of operational situations. This paper presents the results of that comparison. Four models were compared and rated based on relative forecast accuracy. RWM was included in the study to provide a reference to the current capability. To justify a change, one of the three alternative models would have to exceed the RWM's forecast skill.
A previous comparison study focused on the ability of mesoscale prediction models to make accurate forecasts (Pielke and Pearce 1994). As noted in that study, "modelers were free to decide their own initialization and data assimilation procedures, so that, inevitably, the results represent differing levels of sophistication of external forcing. Some modelers were able to carry out a series of experiments to determine the most appropriate values of some assigned model parameters. . ." (Busch et al. 1994).
The guiding principle behind the current study was to perform a pseudo-operational comparison under standard conditions using identical data. Expected field operational constraints on forecast area, resolution, and data available were imposed. Once the models were set up, no changes were made in their configuration, and they remained the same for all forecasts, independent of the geographical region. The models used were obtained from the sponsors of the models and reflect the "standard" model available at the time the study began. Since many of the parameterizations used in the models were scale dependent, careful selection of model options was required. These selections and output from the first case were reviewed by the models' authors to ensure that the correct options had been selected. Two of the models, MM5 and RAMS, were under active development, and new algorithms and capabilities were being added as they were completed and tested. Since the focus of this study was to operate the models as if they were operational, no attempt was made to access these research capabilities during the evaluation of the models. The infusion of new technology into operations would require further extensive testing and evaluation.
2. Data handling and case selection
The USAF requirement to DSWA was for a model that could be run in theater. This implied a requirement to size the forecast resolution such that the model could be run on available workstations. The horizontal grid spacing used was 46 km to ensure the models would run in a reasonable time. This horizontal spacing was the maximum desired in the USAF requirements. The workstation used was a UNIX IBM RS/6000 Model 370. Since each candidate model could be run on a UNIX-based mainframe Cray, the decision was made to use the Cray Y-MP at Los Alamos National Laboratory to speed completion of the study. All testing was performed on these two platforms.
The configuration of models and input datasets were the same on both the workstation and the Cray. The models and options selected are described in the next section. In general, the most applicable set of physics parameterizations were selected to allow each model to produce the best results given the resolution restrictions of the study. The models were not optimized for speed. Each model has a data ingestion program or routine associated with it. In discussions with AFW personnel, it was decided to test the model "system." As advances in data ingestion and analysis are made, those advances would be evaluated and implemented through a technology improvement program. These ingestion routines often manipulate the data and boundary conditions to accommodate the needs of the model. To prevent problems associated with such potential manipulation, each model's native data ingestion routine was used to construct the initial fields from the observations and first-guess fields.
Each of the data ingestion routines was outwardly similar in that it analyzed file data containing surface and upper-air observations onto a first-guess field obtained from a global model. The use of the native ingestion routines added another level of complexity to the comparison, as the ingestion and analysis routines were sometimes not as current as the model itself. Since two data sources were used in this study, routines were written to convert the raw data into a format acceptable to each model's data assimilation routines. Some modification of the data ingestion routines' file-reading code was required, especially for NORAPS6, as that model was designed to read the navy's integrated data structure. The only other modifications were to the output routines, and these were restricted to removing any smoothing of the output gridded data fields. The internal gridded data were not affected, and the basic forecast module remained as received.
Data were obtained for three, 3-day periods in five regions. Two independent 36-h forecasts were made during each 3-day period. The first forecast was initialized at 0000 UTC, at the beginning of the first day, and the second at 1200 UTC, a day and a half into the 3-day period. Each forecast was 36 h in duration, and verification was accomplished every 12 h (Fig. 1). Verification and scoring were accomplished by comparing the forecast data to the observations. Two statistical scoring techniques were used and are discussed later in this paper.
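The forecast and verification schedule just described can be sketched in a few lines. This is a minimal illustration in Python; the start date is taken from case two below, and the function name is ours:

```python
from datetime import datetime, timedelta

def verification_times(init, length_h=36, every_h=12):
    """Return the verification times for one forecast run
    (initialization plus every 12 h out to 36 h)."""
    return [init + timedelta(hours=h) for h in range(0, length_h + 1, every_h)]

# A 3-day period starting at 0000 UTC on day 1 (16 August 1994, from case two).
day1 = datetime(1994, 8, 16, 0)
run1 = verification_times(day1)                        # initialized 0000 UTC, day 1
run2 = verification_times(day1 + timedelta(hours=36))  # initialized 1200 UTC, day 2

assert len(run1) == 4 and len(run2) == 4               # verified at 0, 12, 24, 36 h
assert run2[-1] == day1 + timedelta(hours=72)          # ends with the 3-day period
```

The two runs together thus span the full 3-day period, with the second initialized a day and a half after the first.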
The first test, Atmospheric Variability Experiment-Severe Environmental Storms and Mesoscale Experiment (AVE-SESAME) I, was used to verify correct operation of the models on both the Cray and the workstation (Table 1). This case was run to ensure that each model was implemented and operated properly. Each model was run on the workstation and Cray in exactly the configuration used to operate them during the remainder of the tests. This allowed validation of the procedures for operating each model and provided an extensive dataset for comparing their forecast abilities. It also provided specific information about model portability, any performance degradation on a workstation, and the integrity of each model, that is, whether it produces the same results on the workstation as on the supercomputer. AVE-SESAME I took place on 10-11 April 1979. A vigorous short wave over Colorado induced strong cyclogenesis centered over eastern Colorado. The resultant strong surface warm-air advection over the lower Midwest, combined with cold-air advection aloft and a favorable wind shear pattern, spawned a major tornado outbreak in Texas and Oklahoma. General thunderstorm activity was also observed over the Midwest.
The observational data (surface and rawinsonde) were obtained for the initial and each 12-h forecast time from the National Center for Atmospheric Research (NCAR). The NCEP (or NMC) global analysis 2.5° gridded data fields were also obtained for these times from the NCAR data archives.
The remaining cases were composed of two test periods, one with data taken in August 1994 and one with data taken in November 1994.
The objective for case two (summer 1994) was to compare the forecast skill in four regions: Alaska, Central America, Korea, and the Middle East. Two of the regions were selected as potential operational regions and two to provide operationally interesting forecasting challenges.
The period from 0000 UTC 16 August 1994 to 0000 UTC 19 August 1994 was selected for this comparison. From the 500-mb hemispheric maps (Fig. 2), a short-wave trough crossed over the Korean Peninsula by 0000 UTC 18 August and a separate short wave passed over and exited Alaska by 0000 UTC 19 August. Observational and gridded data were again obtained from the NCAR archives. The same time period was used for the Middle East and Central America, although no preferred days were noted in these latter two regions.
Case three data were collected in November 1994 from AFGWC. Data were obtained from AFGWC in real time, in a fashion analogous to the way they would be obtained in an operational situation. Since a data cutoff time was employed, the number of observations is generally smaller than in case two. The 2.5° gridded data were obtained for each forecast time from AFGWC's Global Spectral Model (GSM). Two GSM forecasts were used, paralleling the forecast times used in this study. These forecasts were used by the models' analysis routines to set the boundary conditions at times after forecast initialization. Observational data were used only in the initial analysis for each forecast and for scoring. The same four regions or theaters were investigated.
Maps prepared from the gridded data files for 1200 UTC 14 November 1994 (500 mb, Fig. 3, and surface) showed that a strong trough had passed over Korea, leaving a prominent high-pressure system with considerable thermal contrast across the peninsula. An impressive surface low was present just off the Alaska coast, and a strong temperature gradient existed across the state. Relatively benign conditions existed over the Middle East. The weather highlight for these geographic areas was a strong cyclone off the north coast of Cuba that produced mainly northeast flow aloft over the Central America theater.
3. Model descriptions
a. General
Each model was configured using the best parameter options available in the standard distributed version that met the requirements set for this study. Experts on each model were consulted on the configuration (discussed later). Candidate models can be grouped by their current use. NORAPS6 and RWM were used in operational settings and had few options to select different physical algorithms. Their options were more related to types of data output available. MM5 and RAMS have more of a research background and had numerous optional physical algorithms. In addition, they have many of the same types of options for data assimilation and output. The four candidate models have existed for some time, or are new upgrades of existing models.
RAMS, MM5, and NORAPS6 have much in common: they are all three-dimensional, primitive equation, relocatable, regional mesoscale models; they all use staggered grids, terrain-following vertical coordinates, and four-dimensional data assimilation using nudging; and they include many similar parameterizations for subgrid mixing, cumulus, and radiation. There are some differences, particularly between NORAPS6 and the other two models.
The NORAPS series of models, developed by the navy for their own operational use, focused on ocean-area forecasts. This distinction is reflected in the models; NORAPS6 models sea ice, but RAMS and MM5 do not. RAMS and MM5 both model numerous types of land use, but NORAPS6 has no land-use model.
Two of the models that were tested (NORAPS6 and RWM) do not have options for a nonhydrostatic equation set. A nonhydrostatic model is critical in the simulation of a variety of buoyancy-driven atmospheric features. For the grid scale used in this study (46 km), however, neither buoyancy-driven nor oscillatory circulations would be resolved. A large thunderstorm may only be 10-20 km in diameter, and since an NWP model needs a circulation to span at least four grid spaces to handle it adequately, even runs with a grid spacing of about 10 km will not be sufficiently fine to resolve these circulations. A lee wave train (an example of an oscillatory circulation) usually has a horizontal wavelength of about 20 km, so these will not be resolved well either. At the resolution used, the potential advantages of a nonhydrostatic model are minimized.
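The four-grid-space rule of thumb invoked above reduces to simple arithmetic. A sketch, using the feature sizes cited in the text:

```python
def min_resolvable_km(dx_km, n_spaces=4):
    """Smallest circulation (km) a grid can handle adequately, using the
    rule that a feature must span at least four grid spaces."""
    return n_spaces * dx_km

# At the 46-km spacing of this study, nothing smaller than ~184 km is
# resolved, so a 10-20-km thunderstorm or a 20-km lee wave is far below
# the threshold.
assert min_resolvable_km(46) == 184

# Even a 10-km grid cannot resolve a large (10-20 km) thunderstorm:
assert min_resolvable_km(10) == 40
```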
The objective of these tests was to compare forecasts produced by the various models. To accomplish these comparisons, the computational domains were made as similar as possible. There was no common map projection available for all the models, so projections were chosen that were appropriate for each theater and as similar as possible. RAMS uses a rotated polar-stereographic projection that "rotates" the pole point of the projection to the center of the computational domain and allows the use of the same projection with a minimum of distortion. MM5, NORAPS6, and RWM used two projections to minimize distortion. The three models used a Mercator projection for the Central American region and a Lambert conformal projection for the remaining areas. The Mercator projections differed slightly, as MM5 and NORAPS6 are hardwired to use the equator as the standard latitude, and RWM uses 22.5°N. The same computational domain or grid was used in all the tests discussed. A 71 x 71 computational grid with nominal 46-km spacing was used. The computational regions are displayed in Figs. 4a-e.
As noted above, the candidate forecast models can be operated with a number of different options: hydrostatic, nonhydrostatic, sponge boundary, transmissive boundary, etc. As these tests were more concerned with capability than with speed, the most sophisticated level of parameterization available was selected based on the problem set and the developers' recommendations. [The model configurations were reviewed by Hodur (1995) for NORAPS6, Chen (1995) for MM5, Tremback (1995) for RAMS, and Crasner (1995) for RWM.] Although four-dimensional data assimilation was an available option, it was not used in the comparison tests. Each model is described briefly below.
b. Fifth-generation Pennsylvania State University-NCAR Mesoscale Model
MM5 is a regional-scale primitive equation model that can be configured hydrostatically or nonhydrostatically (Grell et al. 1993; Gill 1993). It uses a terrain-following coordinate in pressure and solves its finite-difference equations with a time-split scheme using the leapfrog operator. Multiple, moving, overlapping nesting capability exists with two-way interactivity and predefined nest ratios of 3:1. Its boundary-layer physics package can be either a simple bulk aerodynamic parameterization or a more detailed scheme based upon a revised version of Blackadar's Planetary Boundary Layer (PBL) model (Zhang and Anthes 1982). The atmospheric radiation option provides longwave and shortwave schemes that interact with the atmosphere, including cloud and precipitation fields as well as with the surface (Dudhia 1989).
Large-scale and convective precipitation modules were included in the model, and large-scale processes were treated explicitly. Marshall-Palmer size distributions were assumed for rain and snow, and solid and liquid water were allowed to coexist. Options for deep cumulus convection include parameterizations based on Kuo (1974) and a modified Arakawa-Schubert (1974) scheme that includes moist convective-scale downdrafts (Grell et al. 1991).
MM5 has polar-stereographic, Lambert conformal, and Mercator projection options. The Lambert conformal projection was used for the continental United States, Korea, the Middle East, and Alaska domains, and the Mercator projection for the Central America region. MM5 permits any number of vertical levels to be used. The imposed resolution constraints limited these to 23 layers for these tests. The vertical levels selected are (σ_p): 1.00, 0.99, 0.98, 0.96, 0.93, 0.89, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, and 0.0. The options selected are summarized in Table 2.
MM5 has a long history dating back to the mid-1970s at The Pennsylvania State University (PSU) and continuing through the present with development at NCAR, PSU, and other locations. At the time of this study, no code was available for the "preprocessor" data ingestion routines that would allow them to work on a workstation. As a result, the model was only run on a Cray computer. NCAR indicated at that time that they were developing such code, and a fully workstation-capable version should now be available.
c. Navy Operational Regional Atmospheric Prediction System Version 6
NORAPS6 is a regional-scale, primitive equation, hydrostatic model that uses a split-explicit time integration scheme to predict dynamic and thermodynamic variables. Its grids use a terrain-following vertical pressure coordinate system. Leapfrog time and space differencing are used with a fourth-order advection scheme. Surface fluxes are modeled according to Deardorff (1972). The PBL is assumed to be well-mixed, so all model levels within the PBL are replaced by average values through the PBL each time the model physics routines are called; presently, the model physics routines are called every nine dynamic time steps.
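The well-mixed PBL treatment described above amounts to replacing every level inside the PBL with the PBL-mean value each time the physics routines are called. A minimal sketch (the profile values and PBL-top index are illustrative, not NORAPS6 code):

```python
def mix_pbl(profile, pbl_top):
    """Replace all levels at or below index `pbl_top` with their mean,
    mimicking the well-mixed PBL assumption."""
    layer = profile[:pbl_top + 1]
    mean = sum(layer) / len(layer)
    return [mean] * len(layer) + profile[pbl_top + 1:]

# Illustrative potential-temperature profile (K), lowest level first,
# with the PBL top at the third level:
theta = [300.0, 301.0, 302.0, 305.0, 310.0]
assert mix_pbl(theta, pbl_top=2) == [301.0, 301.0, 301.0, 305.0, 310.0]
```

Levels above the PBL top are left untouched; only the boundary layer is homogenized.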
Large-scale and convective precipitation modules were included in the model, and evaporation of falling precipitation is allowed to occur. Its deep cumulus convection scheme was based on the Kuo scheme, which allows deep moist convection to occur when the moisture convergence exceeds some specified threshold. Cloud temperatures and mixing ratios follow a moist adiabat, thereby representing a nonentraining cloud (Hodur 1987).
NORAPS6 uses polar-stereographic, Lambert conformal, Mercator, spherical, and Cartesian projection options. The Lambert conformal projection was used for the continental United States, Korea, the Middle East, and Alaska domains and the Mercator projection for the Central America region. The model currently uses 36 vertical levels in its operational calculations, although the navy previously used 21. These 21 levels guided the selection of the levels used for these tests (R. Hodur 1994, personal communication). The vertical levels selected are (σ_p): 1.00, 0.98, 0.95, 0.915, 0.875, 0.825, 0.775, 0.725, 0.675, 0.625, 0.55, 0.44, 0.33, 0.22, 0.145, 0.095, 0.065, 0.035, 0.015, 0.005, and 0.000. Because NORAPS6 was developed as an operational model, there are very few options in the input data controlling the physics. Options selected are presented in Table 2.
No significant changes were required to the model as received. Minor changes in the data ingestion program were required to replace navy-unique data queries. NORAPS6 was well-written and had very good internal documentation. NORAPS6 made liberal use of Cray FORTRAN extensions and required the use of either a Cray or a FORTRAN 90 compiler. The port to a UNIX workstation was easily accomplished. NORAPS6 was developed by the navy and is scheduled to be replaced by the Coupled Ocean Atmosphere Mesoscale Prediction System (COAMPS) model.
d. Regional Atmospheric Modeling System
RAMS is a regional-scale primitive equation model that can be configured hydrostatically or nonhydrostatically (Tremback and Walko 1994). The model uses a terrain-following coordinate in height, and its finite-difference equations are solved with a time-split scheme using an optional hybrid time-differencing operator. Under this hybrid operator, the leapfrog operator is used on the velocity and pressure variables, while the "forward-in-time" operator is reserved for all other variables. Multiple, moving grid nesting capability exists with two-way interactivity.
RAMS's boundary-layer physics package is based on similarity theory. Its atmospheric radiation option provides longwave and shortwave schemes that interact with the atmosphere, including cloud and precipitation fields, as well as with the surface. Large-scale and convective precipitation modules were included in the model, and large-scale processes are treated explicitly. A range of options exists for specifying how the microphysics are modeled, and mixed-phase microphysics is allowed. There were also two options for the deep cumulus convection scheme. The options used in this study are given in Table 2.
RAMS has rotated polar-stereographic and Cartesian options. The rotated polar stereographic projection was used. It rotates the "pole" point of the polar-stereographic projection to any point on the globe, typically the center of the model domain, minimizing distortion. The vertical levels selected are shown in Table 3.
RAMS development dates back to the early 1970s, with a rewrite started in 1986. RAMS is copyrighted by Colorado State University.
e. Relocatable Window Model
RWM is a regional-scale primitive equation model that uses a quasi-Lagrangian advection scheme to predict u and v wind components, potential temperature, surface pressure, and specific humidity. It employs a single, unstaggered horizontal grid and uses a terrain-following vertical coordinate system.
RWM (Englehart et al. 1993) uses a basic boundary layer physics package in which bulk transfers of sensible and latent heat are allowed over the ocean when the sea surface temperature is greater than the air temperature, and surface friction is modeled with a terrain-dependent drag coefficient (Kopp et al. 1994). Large-scale and convective precipitation modules were included in the model, and evaporation of falling precipitation is allowed to occur. Its deep cumulus convection scheme is based on the Kuo scheme, which allows deep moist convection to occur when a saturated parcel originating in the lower atmosphere can rise through at least two contiguous vertical layers above the lifting condensation level. RWM did not account for land surface processes, diagnosed cloud fraction, or solar/terrestrial radiative processes until the recent inclusion of the Swedish physics package (Unden 1982).
RWM has polar-stereographic, Lambert conformal, and Mercator projections available. The Lambert conformal projection was used for the continental United States, Korea, the Middle East, and Alaska domains and the Mercator projection for the Central America region. The vertical levels used were the same as those used for NORAPS6. This model was only run on the workstation with grid-dependent quantities prepared by USAF personnel at AFGWC.
RWM is a modified version of the Quasi-Lagrangian Nested Model (Mathur 1983), developed at NCEP in the late 1970s and early 1980s. Two versions were developed: one with and one without the ability to add a nested grid. RWM is the version without this capability. Only recently have a diurnal cycle and surface physical parameterizations been included in the model. Several of the model initialization routines run only on a Cray under the COS operating system and have not been ported to any other platform. To our knowledge, AFGWC is currently the only user and maintainer of this model.
4. Results
The objective of this study was to support the selection of an operational model that would predict the weather conditions associated with an event, such as a bomb release or the release of toxic material. The performance of each model was evaluated using several different error measures that reflect this objective. Generally, this was accomplished by comparing the data from the model to available observations at each forecast time. In all cases, scoring was accomplished on only the central 61 x 61 portion of the 71 x 71 computational grid. The approximately 230-km (5 gridpoint) band around the outside of the computational grid was not used for scoring, to minimize potential boundary-induced errors. Two types of scoring statistics were generated. The first used mean differences at standard levels, and the second compared individual differences to a fixed set of criteria.
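Restricting the scoring to the central 61 x 61 points of the 71 x 71 grid is a simple trimming operation. A sketch, where `field` is a hypothetical 2D forecast field stored as a list of rows:

```python
def scoring_region(field, band=5):
    """Drop a `band`-point frame from a 2D field so that
    boundary-influenced points are excluded from scoring."""
    return [row[band:-band] for row in field[band:-band]]

field = [[0.0] * 71 for _ in range(71)]   # one level of a 71 x 71 forecast grid
inner = scoring_region(field)
assert len(inner) == 61 and len(inner[0]) == 61

# At 46-km spacing, the discarded frame is 5 * 46 = 230 km wide.
assert 5 * 46 == 230
```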
a. Root-mean-square rank order
The root-mean-square (rms) rank order computed the rms errors for a given pressure level and forecast time. Statistics were computed at the surface and at 10 pressure levels: 1000, 850, 700, 500, 400, 300, 250, 200, 150, and 100 mb. The forecast values were compared to the corresponding observational data. Statistics were generated at initialization and at 12-, 24-, and 36-h forecast times. Two scores were kept for each model, one for the best comparison at the time and level and one for the worst comparison. Procedurally, once a forecast was accomplished, the computed data were interpolated to the observation points. Software that came with each model was used if it was available, or a routine was written to accomplish this conversion. Output from each of the models was a gridded data field written under a standard projection geometry. Since each model's forecast was made on similar grids, any bias due to the interpolation process should be nearly equal. A number of statistical parameters could have been and were calculated. Only two are presented here, the root-mean-square error and the root-mean-square vector error:

rmse = [ (1/n) Σ (φ_p − φ_o)² ]^(1/2)

rmsve = [ (1/n) Σ ((u_p − u_o)² + (v_p − v_o)²) ]^(1/2)

where φ is a scalar variable, u and v are wind components, n is the total number of observations, a subscript p denotes a predicted value, and a subscript o denotes an observed value.
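The two measures follow directly from their definitions. A sketch; the forecast-observation pairs are illustrative values, not study data:

```python
from math import sqrt

def rmse(pred, obs):
    """Root-mean-square error of a scalar variable."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def rmsve(u_p, u_o, v_p, v_o):
    """Root-mean-square vector error of the wind components."""
    return sqrt(sum((up - uo) ** 2 + (vp - vo) ** 2
                    for up, uo, vp, vo in zip(u_p, u_o, v_p, v_o)) / len(u_o))

# Two illustrative observation points:
assert rmse([10.0, 12.0], [11.0, 11.0]) == 1.0
assert abs(rmsve([3.0, 0.0], [0.0, 0.0],
                 [0.0, 4.0], [0.0, 0.0]) - sqrt(12.5)) < 1e-12
```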
b. Rmse performance by region
These rmse statistics were calculated for each model for the model-predicted variables: temperature, relative humidity, and vector wind. This approach rates each model's general performance without regard to any accuracy requirements for the various quantities. However, it does raise questions about which quantities should be compared. The set chosen was based on an attempt to define a relatively independent set of quantities of interest to military planners. However, sea level pressure and geopotential height could also have been used. Basing the comparison instead on independently derived accuracy criteria avoids having to define such a comparison set; that approach is presented in the next section.
For each of the three cases, case one (AVE-SESAME-1), case two (August 1994), and case three (November 1994), the model results were rank-ordered from the one that did best to the one that did worst for each of the three variables. Then a score was tabulated for each model indicating the total number of times it gave the best result and the total number of times it gave the worst result. The data from each case were consolidated, so that all verification times through the 36-h forecast have been combined within a theater. The consolidation was accomplished in an attempt to get a grand overall score reflective of model performance independent of any single parameter. Table 4 summarizes the results of this rank ordering.
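The best/worst tally described above can be sketched as follows (a hypothetical illustration; the dictionary layout, model names, and score values are placeholders, not data from the study):

```python
from collections import Counter

def tally_best_worst(scores):
    """Tally how often each model scores best and worst.

    scores: {variable: {model: rmse}}, where a lower rmse is better.
    Returns a (best, worst) pair of Counters keyed by model name.
    """
    best, worst = Counter(), Counter()
    for by_model in scores.values():
        ranked = sorted(by_model, key=by_model.get)  # ascending error
        best[ranked[0]] += 1
        worst[ranked[-1]] += 1
    return best, worst
```

Applied across all cases, variables, and verification times, such a tally yields the kind of grand overall score summarized in Table 4.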
As mentioned above, there are many ways to define a scoring strategy using this technique. Only a few of the quantities were scored. The three variables were chosen to be a relatively independent set, but other sets could be easily established. RAMS had the highest number of "best" agreements with data and the fewest "worst" agreements. An alternate scoring method was developed based on the accuracy desired.
c. Accuracy criteria performance
The weather data accuracy criteria (Table 5) were established by the USAF and DSWA for this study. These criteria were used as a measure of each model's forecast skill. Initially, all quantities were grouped together for each case and region to provide a net total of how often each model satisfied the criteria, as was done in the previous comparison. Table 6 expresses each model's performance as the percentage of times that its predicted value is within the error bounds specified.
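The percentage-within-criteria score in Table 6 amounts to the following computation (a minimal sketch; the function name and arrays are illustrative assumptions):

```python
import numpy as np

def percent_within(pred, obs, criterion):
    """Percentage of forecast-observation pairs whose absolute error
    is within the specified accuracy criterion."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float))
    return 100.0 * np.mean(err <= criterion)
```

One such percentage would be computed per model, quantity, case, and region, then grouped as described above.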
RAMS consistently produced better results for the criteria in areas other than the continental United States. There are potentially several reasons for these differences. One reason could have been the skill of the analysis routine used to develop the initial gridded fields from the observations. The next analysis was accomplished to test this hypothesis.
d. Performance by forecast hour
In this section, the relative performance by forecast hour is examined. The stratification was done only by hour, not by region/hour. The relatively large number of surface observations, compared to upper-air observations, was responsible for developing two tables. Table 7 presents the surface comparison and Table 8 the upper-air comparison for each forecast period.
For the surface parameters at the analysis time (0 h), two distinct groups appear, with MM5 and RAMS posting the superior scores and the other two models in the second group. (The comparison at 0 h provides a score for the initial fields prepared by each model's analysis routine. The same data were supplied to each model.) However, by 12 h, a significant portion of the MM5 and RAMS advantage had disappeared. These two models were still the top performers, with RAMS performing better than MM5 at each time for most variables. Only in predicting sea level pressure did MM5 regularly outperform RAMS. NORAPS6 and RWM performed about the same throughout the forecast period except for dewpoint depression. NORAPS6's and RWM's predictions of dewpoint depression were very poor initially but became competitive at later times. The poor dewpoint analysis by NORAPS6 is easily explained: NORAPS6 used the humidity data from the global gridded data and did not use the relative humidity data from the observations to generate the initial fields. Each model's skill at 12, 24, and 36 h was relatively stable, with a small degradation with time. The performance scores (Table 7) are lower for the surface comparisons than for the corresponding upper-air comparison (Table 8). These differences can probably be attributed to two factors. First, even though several of the models have "surface" output options, there will be differences between the actual observation height and the computed height. There was no standard method used. For instance, RAMS used the lower-cell parameters without correction, except for sea level pressure. This cell center is at 46 m, and velocities at this point should have a small positive bias, as the observations are usually taken at 10 m. The remaining models varied in how each computed the surface data. These output differences were probably obscured by the subgrid variability present in the real world.
The observed data were affected by local structures, terrain features, trees, etc. With a grid size in excess of 40 km, none of these local features could be modeled.
A similar forecast accuracy time response can be seen in the upper-air comparison; however, all the scores are higher. Subgrid influences are much less of a factor above the surface. For each parameter and forecast time, RAMS produced the predictions that most often met the criteria. NORAPS6 often produced more accurate predictions than MM5 in the upper-air calculations. However, there were cases when every model had difficulty accurately predicting a parameter; specifically, the predictions of dewpoint depression, where the highest forecast accuracy seen was 29%. The initialization of the dewpoint depression varied widely in accuracy, and this may be reflected in these scores. This lack of consistency in upper-level moisture observations is most likely the reason for this poor performance. (Dewpoint was only validated at points where relative humidity data existed. The lack of these data at some locations above 300 mb is reflected in the reduced number of observations.)
The statistical data, when depicted by forecast time (Tables 7 and 8), indicate that RAMS and MM5 have better analysis routines. However, the statistical advantage of the better analysis is not obvious by 12 h into the forecast period. The scores at 12-36 h were relatively consistent, and the ranking of the models remained the same. Statistically, there appears to be no compelling reason to use a better analysis routine to initialize the forecasts. However, the largely nonlinear atmospheric processes represented by these models make this statement somewhat risky. During a software upgrade for NORAPS6, a minimal impact in forecast accuracy was noted as a result of improving the analysis (Hodur 1987). This somewhat corroborates the statistical results and is probably the reason NORAPS6 produces such balanced accuracy. There are more technically advanced analysis routines that all models could use to improve the initial accuracy and potentially the overall results. The above data suggest that the forecast model itself determines the accuracy score after some start-up period.
e. Performance in Korea
Analyzing the performance of these models in detail is difficult, as the amount of data available exceeds what can be presented in this paper. Therefore, a single example was selected. The Korean region in August 1994 (case two) has the largest single set of observations and is an area of potential interest to the USAF. This case will be presented in some detail.
Figures 5b-e present a comparison of the 24-h surface wind fields forecast by each of the models. An analysis of the NMC data at the same time (1200 UTC 18 August 1994) provides a comparison (Fig. 5a). Figure 2 shows the upper-air pattern at 0000 UTC. The entire computational grid is displayed, not just the central scored region. The richness of detail generated by the models is immediately evident. The analysis shows high pressure dominated the Korean region, with a northeasterly flow over the Sea of Japan east of Korea. Anticyclonic winds were seen south and west of Korea. Each of the models indicated that the northeasterly flow over the Sea of Japan was significantly modified by the mountains running up the Korean Peninsula. The NORAPS6 (Fig. 5c) and RWM (Fig. 5e) forecasts were in close agreement, while MM5 (Fig. 5b) produced easterly winds over the Asian continent instead of southerly. RAMS (Fig. 5d) produced a southwesterly rather than northwesterly flow in the Sea of Japan. This implied that RAMS was moving the system slower than the other models. The data analysis showed a large low-pressure system moving in from the west. All the models captured the entry of this system. RAMS and NORAPS6 overpredicted a low development near the northeastern corner of the domain, while MM5 produced a low-pressure region over the southeastern part of the domain that was not evident in the analysis. A more detailed look at the error statistics provides some insight into the scores presented previously.
The mixed situation described above was actually reflected in the statistics. The computed error statistics can be presented to reflect the distribution of the errors. These distributions can provide more information on the models' prediction capabilities. If the errors are grouped in bins of twice the error criterion, histograms can be developed. The central bin spans plus or minus the criterion; the next spans one to three times the criterion, and so on. Figures 6a-e and 7a-e present the distribution of the errors with separate surface and upper-air comparisons. The bins are labeled with their center points in criteria units. (The high wind speed comparisons are not shown for the surface because of the relatively few occurrences.)
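The binning scheme just described can be sketched as follows (an illustrative assumption of how such histograms might be built; the function name, bin count, and clipping of outliers into the outermost bins are choices made here, not details from the study):

```python
import numpy as np

def criteria_histogram(pred, obs, criterion, nbins=5):
    """Bin forecast errors in widths of twice the accuracy criterion.

    The central bin spans +/- one criterion, the next spans one to
    three criteria, and so on. Errors beyond the outermost edges are
    clipped into the end bins. Returns (bin centers in criteria
    units, counts per bin).
    """
    err = (np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)) / criterion
    half = nbins // 2
    # Edges at odd multiples of the criterion: ..., -3, -1, 1, 3, ...
    edges = np.arange(-2 * half - 1, 2 * half + 2, 2)
    counts, _ = np.histogram(np.clip(err, edges[0], edges[-1]), bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2  # ..., -2, 0, 2, ... in criteria units
    return centers, counts
```

A symmetric distribution peaked in the central bin would indicate an unbiased forecast; a shifted peak would reveal the kinds of biases discussed next.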
A look at the error statistics suggests that no Achilles' heel exists for these models. All models generate reasonable error distributions, and there are no gross biases. The surface comparisons indicate RAMS has a bias toward predicting too high a sea level pressure, while MM5 and RWM predict slightly low temperatures. All of the models show a slight positive bias in surface wind velocity that is, at least partially, due to using values at altitudes higher than 10 m. The corresponding upper-air comparison shows a negative bias, which may be due to averaging peaks over the cell. The prediction of wind direction was much better away from the irregularities associated with the surface. The prediction of dewpoint was much better at the surface, where more data exist. Above the surface, NORAPS6 and RWM exhibit a strong tendency to overstate the dewpoint depression, while MM5 slightly understates the value. The remaining differences between the models show up as the width of the distribution curves.
f. Run times
The models were all run on single processor machines. While the authors are aware of the great advances in multiprocessor machines and the cost of such machines coming down, the run times in Table 9 are given only as an example of the execution time required for these forecasts.
5. Conclusions
A model comparison was accomplished in an attempt to determine which current mesoscale forecast model best satisfies USAF accuracy requirements under prescribed resolution constraints.
Those constraints were levied on the study based on the need to operate a mesoscale forecast model in a particular theater as opposed to producing a forecast at a centralized facility. By operating the model in theater, indigenous meteorological observations could be used that otherwise would not be available. However, this configuration requires the use of a workstation forecast model by personnel who are not generally experts at numerical weather prediction. The study focused on using a relatively modest, by today's standards, single-processor workstation to produce forecasts typical of those desired by the USAF. The results demonstrate this is a practical approach, especially with the relatively low-cost, high-performance workstations now available. To demonstrate worldwide capability, tests were made comparing output to observations in five completely different regions of the world. Models were also run during different seasons of the year to ensure the model not only produced the best results for one region/season, but produced the most reliable forecast anywhere in the world.
Overall, the RAMS model had the highest scores, with the MM5 model next. RWM, the current USAF model, had the lowest scores. Scoring of the models was by region, accuracy requirements, and forecast times. The by-region ranking was accomplished to determine how well each model was able to forecast the desired parameter. A direct comparison of model-generated results with observational data for all time periods indicated statistically that RAMS forecasts were closer to observed values most often, with MM5 following next. A further stratification of the region data was accomplished. USAF accuracy requirements were used, and model outputs were compared to them. Once again, RAMS showed a statistical advantage. However, the score disparity between the models was not large. It was shown that all models had difficulty with some parameters.
Surface predictions statistics clearly indicated two distinct groups with MM5 and RAMS having the higher scores. Even though MM5 and RAMS showed superior results, they had some problems with certain parameters. For example, RAMS had some difficulty predicting sea level pressure. It was shown in the upper air that RAMS consistently produced the most cases that met the required criteria.
Histogram plots of the surface and upper-air parameters show all models produced reasonable error distributions with no significant bias. One area where all models had some difficulty was wind speed. For the surface, there was a positive bias, and for the upper air, a very slight negative bias (at wind speeds greater than 10 m s⁻¹). The hydrostatic models had a positive bias with upper-air dewpoint depression, and MM5 displayed a negative bias for both upper-air and surface dewpoint depression.
Overall, this study provided an evaluation of model performance in different regions of the world during different seasons. As horizontal resolution requirements change, the need for nonhydrostatic computations becomes greater. For its transport and dispersion program, DSWA has decided to use RAMS to provide high-resolution meteorological forecasts. It is being run routinely on an IBM workstation at resolutions of 10 km (or less) over some of the regions described in this study.
Acknowledgments. This work was supported under Contract DNA001-94-C-0096 with the DSWA. The authors appreciate the efforts of the various model authorities in participating in this study by reviewing the parameter settings and output from the continental United States forecasts. The resolution ground rules precluded the models from performing optimally, and their cooperation under these conditions is very commendable.
References
Arakawa, A., and W. Schubert, 1974: Interaction of a cumulus cloud ensemble with the large scale environment. Part I. J. Atmos. Sci., 31, 674-701.
Busch, N., W. Klug, R. Pearce, and P. White, 1994: Comments on statistical results. Mesoscale Modeling of the Atmosphere. Meteor. Monogr., No. 47, Amer. Meteor. Soc., 155-156.
Deardorff, J. W., 1972: Parameterization of the planetary boundary layer for use in general circulation models. Mon. Wea. Rev., 100, 93-106.
Droegemeier, K., M. Xue, K. Johnson, M. O'Keefe, A. Sawdey, G. Sabot, S. Wholey, and K. Mills, 1995: Design and implementation of a scalable-parallel stormscale numerical weather prediction model. High Performance Computing: Problem Solving With Parallel and Vector Architectures, G. Sabot, Ed., Addison-Wesley, 45-92.
Dudhia, J., 1989: Numerical study of convection observed during the winter monsoon experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077-3107.
Dynamics Research Corporation (DRC), 1992: Theater forecast model customer requirements. Tech Rept. E-21458U, 25 pp. [Available from ESC Headquarters, Air Force Material Command, Hanscom AFB, MA 01731.]
-, 1993: Combat weather systems technical alternative study, "interim report." Tech Rept. E-383U, 164 pp. [Available from ESC Headquarters, Air Force Material Command, Hanscom AFB, MA 01731.]
Englehart, L. M., R. L. Hughes, and K. J. Lunn, 1993: Functional description AFGWC/SYSM. Tech. Rept., 84 pp. [Available from Air Force Global Weather Central, Offutt AFB, NE 68113.]
Gill, D. O., 1992: A user's guide to the Penn State/NCAR mesoscale modeling system. NCAR Tech. Note 381+IA, National Center for Atmospheric Research, Boulder, CO, 233 pp.
Grell, G. A., Y. H. Kuo, and R. Pasch, 1991: Semi-prognostic tests of cumulus parameterization schemes in the middle latitudes. Mon. Wea. Rev., 119, 5-31.
-, J. Dudhia, and D. R. Stauffer, 1993: A description of the fifth generation Penn State/NCAR mesoscale model. NCAR Tech. Note 398+IA, 122 pp. [Available from National Center for Atmospheric Research, P.O. Box 3000, Boulder, CO 80303.]
Haagenson, P. L., J. Dudhia, D. R. Stauffer, and G. Grell, 1993: The Penn State/NCAR mesoscale model (MM5) source code documentation. NCAR Tech. Note 392, 204 pp. [Available from National Center for Atmospheric Research, P.O. Box 3000, Boulder, CO 80303.]
Hodur, R. M., 1987: Evaluation of a regional model with an update cycle. Mon. Wea. Rev., 115, 2707-2718.
Headquarters, United States Air Force (HQ USAF), 1992: Operational requirements document for combat weather system, USAF ORD 211-89-1, Directorate of Weather, DCS Plans, HQ USAF, Washington, DC, 62 pp.
Kopp, T. J., T. J. Neu, and J. M. Lanicci, 1994: A description of Air Force Global Weather Central's surface temperature model. Preprints, 10th Conf. on Numerical Weather Prediction, Portland, OR, Amer. Meteor. Soc., 435-437.
Kuo, H. L., 1974: Further studies of the parameterization of the effect of cumulus convection on large-scale flow. J. Atmos. Sci., 31, 1232-1240.
Mathur, M., 1983: A quasi-Lagrangian regional model designed for operational weather predictions. Mon. Wea. Rev., 111, 2087-2098.
Pielke, R., and R. Pearce, Eds., 1994: Mesoscale Modeling of the Atmosphere. Meteor. Monogr., No. 47, Amer. Meteor. Soc., 156 pp.
-, and R. L. Walko, 1994: RAMS technical manual (draft). ASTeR Division of MRC, 45 pp.
-, and R. F. A. Hertenstein, 1994: RAMS Version 3 User's Guide. ASTeR Division of MRC, 121 pp.
Unden, P., 1982: The Swedish Limited Area Model. Swedish Meteorological and Hydrological Institute Rep., 35 pp.
Zhang, D. L., and R. A. Anthes, 1982: A high-resolution model of the planetary boundary layer: Sensitivity tests and comparisons with SESAME-79 data. J. Appl. Meteor., 21, 1594-1609.
*National Defense University, Fort McNair, Washington, D.C.
+Mission Research Corporation, Huntsville, Alabama.
#Defense Special Weapons Agency, Alexandria, Virginia.
Corresponding author address: Dr. Bruce L. Bauer, Mission Research Corporation, 6703 Odyssey Dr., Suite 101, Huntsville, AL 35806. E-mail: bbauer@hiwaay.net
In final form 16 May 1997. 1998 American Meteorological Society