PQPF VERIFICATION SYSTEM User/Administration Manual
Introduction
The Probabilistic Quantitative Precipitation Forecast Verification
System (PQPFVS) is a system of software programs designed to
calculate calibration and informativeness scores for Probabilistic
Quantitative Precipitation Forecasts (PQPFs).
Audience
The system as it currently stands is a prototype system and was not
designed as a production system. In particular the user interface of the
system remains primitive. The design of a suitable interface has been
deliberately left as further work and research.
Therefore, administrators of this system should be comfortable using C++
in a Unix environment. Users willing to live with the minimal interface
provided need only be comfortable issuing Unix shell commands.
System Elements and Implementation
The system was implemented on a Hewlett
Packard 715/80 workstation running HP-UX 10.20. The software was written
in C++ using the egcs-2.91.66 g++ compiler
(documentation here).
The database used is MySQL version 3.22.27
(documentation here).
Additionally, the getbasin library (sorry, no documentation that I know of)
written by Wray Mills
is used for extracting basin information and the NWS APPS_DEFAULTS system
(documentation here) is used for specifying system configuration values.
All forecasts and observations are stored as gzipped XMRG files. The XMRG
format is
described here.
Since the files are gzipped, the zlib library is needed for decompressing
the files
(documentation here). It should be noted
that compressed files are used because XMRG files compress at a ratio of
approximately 50 to 1 and disk space is somewhat limited. If this is not
desirable then the Grid class must be changed.
Finally the source code is needed. It is available for
download here.
This can be installed in any convenient location. We'll call this
location verify_loc.
Distribution
The distribution of source code contains several different directories.
These directories are listed and described below.
- src - This directory holds the make file and all C++ implementation files.
- inc - This directory holds all of the header (definition) files.
- bin - This directory holds the binary executable generated by compilation.
- etc - This directory holds various files needed for execution.
- tools - This directory holds various useful tools.
- db - This directory holds the db definition files.
Requirements and Design
The system design document can be found
here. This document gives an overview of
the primary classes involved in the software.
The system requirements can be found here.
This list of requirements is what was initially designed for and does not
necessarily reflect the current state of the system.
Set Up
Once all of the system elements described above are in place, the first
thing to do is create the database. The makeSchema.ksh script can be found
in verify_loc/db. It will create the tables and indices and insert the
constant data found in the database.
The next step is to load all of the data. The tool
verify_loc/tools/loadDB.pl can be used for this or it may be done
manually. All loadDB.pl requires is a list of XMRG files with their fully
specified paths, one file per line. It also requires that the files be
named correctly and that they be gzipped. The naming formats are as follows:
- For observations: xmrgMMDDYYYYHRz.gz
- For forecasts: X_YYYYMMDDHRz.gz.
X can be one of:
- pop_p_i
- x25_p_i
- x50_p_i
- z1_p_i
- z2_p_i
- z3_p_i
- YYYY specifies the 4 digit year.
- MM specifies the (zero padded, if necessary) 2 digit month.
- DD specifies the (zero padded, if necessary) 2 digit day.
- HR specifies the (zero padded, if necessary) 2 digit hour.
The hour is expressed in military or 24 hour time, not 12 hour am/pm time.
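As an illustration of the two naming formats, here is a minimal sketch that builds file names from date parts. The helper names (observationFileName, forecastFileName, isForecastPrefix) are hypothetical and are not part of the PQPFVS source or of loadDB.pl; only the formats and the list of X values come from this manual.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: true if prefix is one of the X values listed above.
bool isForecastPrefix(const std::string& prefix) {
    static const char* const kPrefixes[] = {
        "pop_p_i", "x25_p_i", "x50_p_i", "z1_p_i", "z2_p_i", "z3_p_i"};
    for (const char* p : kPrefixes) {
        if (prefix == p) return true;
    }
    return false;
}

// Hypothetical helper: observation name of the form xmrgMMDDYYYYHRz.gz.
std::string observationFileName(int year, int month, int day, int hour) {
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    assert(hour >= 0 && hour <= 23);  // 24 hour time
    char buf[64];
    std::snprintf(buf, sizeof buf, "xmrg%02d%02d%04d%02dz.gz",
                  month, day, year, hour);
    return buf;
}

// Hypothetical helper: forecast name of the form X_YYYYMMDDHRz.gz.
std::string forecastFileName(const std::string& prefix,
                             int year, int month, int day, int hour) {
    assert(isForecastPrefix(prefix));
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    assert(hour >= 0 && hour <= 23);  // 24 hour time
    char buf[64];
    std::snprintf(buf, sizeof buf, "_%04d%02d%02d%02dz.gz",
                  year, month, day, hour);
    return prefix + buf;
}
```

Note that the two formats order the date parts differently: observations use MMDDYYYY while forecasts use YYYYMMDD.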
It should be noted that the loadDB.pl script is a Perl script that requires
the DBI module to be installed
(documentation here).
Once the database has been created and loaded with data, the next step
is to compile the application. To do this enter the verify_loc/src
directory. In this directory the Makefile must be edited to reflect the
actual locations of libraries and files. The specific variables that may
need to be changed depending on the target system configuration are:
- MYSQL_INC_PATH
- MYSQL_LIB_PATH
- BASIN_INC_PATH
- BASIN_LIB_PATH
- BASIN_UVA_LIBS
Once these changes have been made the code can be compiled simply by typing
"make" at the command prompt. If there are no errors in the compilation,
typing "make install" will copy the executable to the verify_loc/bin
directory.
Configuration
A variety of parameters can be configured in the APPS_DEFAULTS file to suit
different situations and requirements. They are:
- pqpfvs_wfo_bit_mask - This value specifies a full path and file name.
The file specified is a set of zeros and ones that determine whether a point is
within a WFO or not. This is used for squares to determine if points within
a forecast are valid. It can also be used to exclude certain areas so that
no calculations are performed for them. This is useful for incomplete data
sets.
- pqpfvs_valid_threshold - This is a value between 0 and 1. It is used
to check areas and sub-areas to make sure that enough of the points in
the specified area are valid points. All work to this point has used a
value of 0.75. It should also be noted that this value doesn't affect the
calculations of basins since basins are defined to be valid. If this number
is changed for squares the offsets specified here
will no longer be valid.
- pqpfvs_verify_basins_dir - This value is the path of the directory that holds the
files specifying the ???? that comprise each basin. The files used for
release 1.0 can be found in verify_loc/etc/basins so this is what the
variable should be set to.
- pqpfvs_verify_basins - This value specifies the file that lists the files
that contain the lists of ???? that comprise each basin. The file used for
release 1.0 can be found at verify_loc/etc/basins/basin_ids.
- pqpfvs_debug_level - This is the value used by the Debug class to determine
how much debug output is produced. Note that the value
specified is the integer value, NOT the enum label provided below for
clarity. Valid values are:
- 0 = NO_DEBUG - As the name implies, no debug is output.
- 1 = LOW_DEBUG - Minimal debug messages. Usually only unusual events will generate
output that will be seen here.
- 2 = MEDIUM_DEBUG - Quite a bit of debug. This will primarily consist of
method entry and exit statements. Useful for locating where problems occur.
- 3 = HIGH_DEBUG - Lots of information.
- 4 = MAX_DEBUG - Way more information than you can use. Iterations through
long loops are output here such as the iteration through a Riemann sum.
These debug levels map directly to the DebugLevel enum. It should be noted
that all messages at or below the specified level will be printed. So if
medium debug is specified, then both medium and low debug messages will be
printed.
- pqpfvs_riemann_delta - This value specifies how many deltas should be
used to divide up the range of integration for a Riemann sum. Testing has shown
1000 to be a reasonable value, but you're only limited by the speed of your
computer.
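To show what the pqpfvs_riemann_delta value controls, here is a minimal sketch of a Riemann sum over a fixed number of deltas. The function name riemannSum and the choice of the midpoint rule are assumptions made for illustration; the manual does not say which rule the PQPFVS code uses.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: midpoint Riemann sum of f over [a, b], with the
// range of integration divided into `deltas` equal pieces. `deltas` plays
// the role of pqpfvs_riemann_delta. The midpoint rule is an assumption.
template <typename F>
double riemannSum(F f, double a, double b, int deltas) {
    assert(deltas > 0);
    const double width = (b - a) / deltas;
    double sum = 0.0;
    for (int i = 0; i < deltas; ++i) {
        const double midpoint = a + (i + 0.5) * width;
        sum += f(midpoint) * width;
    }
    return sum;
}

// Example integrand: the integral of x^2 over [0, 1] is exactly 1/3.
inline double square(double x) { return x * x; }
```

Calling riemannSum(square, 0.0, 1.0, 1000) lands very close to 1/3, while 10 deltas gives a visibly coarser answer. More deltas cost more loop iterations, which is the speed trade-off mentioned above.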
A sample APPS_DEFAULTS file is provided. It can be found at:
verify_loc/etc/sample.APPS_DEFAULTS. All PQPFVS specific tokens are
noted within this sample.
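The rule described under pqpfvs_debug_level (a message is printed when its level is at or below the configured level) can be sketched as follows. The enum labels and integer values are taken from this manual; the shouldPrint function is hypothetical and may not match how the Debug class is actually written.

```cpp
#include <cassert>

// Labels and integer values as listed in this manual.
enum DebugLevel {
    NO_DEBUG = 0,
    LOW_DEBUG = 1,
    MEDIUM_DEBUG = 2,
    HIGH_DEBUG = 3,
    MAX_DEBUG = 4
};

// Hypothetical helper: decides whether a message tagged with messageLevel
// is printed when pqpfvs_debug_level is set to configuredLevel.
bool shouldPrint(DebugLevel messageLevel, int configuredLevel) {
    assert(configuredLevel >= NO_DEBUG && configuredLevel <= MAX_DEBUG);
    // A configured level of 0 prints nothing; otherwise every message at
    // or below the configured level is printed.
    return messageLevel != NO_DEBUG && messageLevel <= configuredLevel;
}
```

With the token set to 2, low and medium messages are printed and high and max messages are suppressed.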
Running
The executable is named "pqpfvs." Simply type "pqpfvs" followed by the
necessary arguments at the command line. All output will be to stdout. If
this isn't desirable the output can be redirected to a file in whatever
manner suits you. All errors are written to stderr, while all debug
messages are written to stdout.
Arguments
The pqpfvs executable provided with the system allows for two types of
verification to take place, either by square or by basin. In both cases the
date range over which forecasts are to be verified must be specified.
So the first arguments are the begin and end dates specified as follows
(note that this command is incomplete):
pqpfvs "YYYY-MM-DD HR:MI:SS" "YYYY-MM-DD HR:MI:SS"
- YYYY specifies the 4 digit year.
- MM specifies the (zero padded, if necessary) 2 digit month.
- DD specifies the (zero padded, if necessary) 2 digit day.
- HR specifies the (zero padded, if necessary) 2 digit hour.
The hour is expressed in military or 24 hour time, not 12 hour am/pm time.
- MI specifies the (zero padded, if necessary) 2 digit minute.
- SS specifies the (zero padded, if necessary) 2 digit second.
It is also possible to specify just the day without the time. In this case
the time is set to a default value of 00:00:00 (midnight). The command
(still incomplete) would then look like:
pqpfvs "YYYY-MM-DD" "YYYY-MM-DD"
It should be noted that if the dates both default to midnight then the
ForecastHandler and ObservationHandler that are first constructed must also
be constructed with the MIDNIGHT option. Because each PQPF covers the
24 hours from the previous noon to noon, this option is not
recommended. Also note that the quotation marks are necessary because
the Unix command line (at least with ksh) uses whitespace as an argument
delimiter. The software will throw an exception if the beginning date
is later than the end date.
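The date handling described above can be sketched as follows. This is not the PQPFVS parsing code: DateTime, parseDateArg, isLater and checkRange are hypothetical names, and the sketch checks only the shape of the argument, not whether the month, day or time values are in range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <tuple>

struct DateTime { int year, month, day, hour, minute, second; };

// Hypothetical helper: accepts "YYYY-MM-DD HR:MI:SS" or "YYYY-MM-DD".
// A date without a time defaults to 00:00:00 (midnight).
DateTime parseDateArg(const std::string& arg) {
    const char* s = arg.c_str();
    DateTime dt = {0, 0, 0, 0, 0, 0};
    int consumed = 0;
    if (std::sscanf(s, "%4d-%2d-%2d %2d:%2d:%2d%n", &dt.year, &dt.month,
                    &dt.day, &dt.hour, &dt.minute, &dt.second,
                    &consumed) == 6) {
        assert(static_cast<std::size_t>(consumed) <= arg.size());
        if (s[consumed] == '\0') return dt;  // nothing left over
    }
    consumed = 0;
    if (std::sscanf(s, "%4d-%2d-%2d%n", &dt.year, &dt.month, &dt.day,
                    &consumed) == 3) {
        assert(static_cast<std::size_t>(consumed) <= arg.size());
        if (s[consumed] == '\0') {
            dt.hour = dt.minute = dt.second = 0;  // default to midnight
            return dt;
        }
    }
    throw std::invalid_argument("unrecognized date: " + arg);
}

// True if a is strictly later than b.
bool isLater(const DateTime& a, const DateTime& b) {
    return std::tie(a.year, a.month, a.day, a.hour, a.minute, a.second) >
           std::tie(b.year, b.month, b.day, b.hour, b.minute, b.second);
}

// Mirrors the documented behaviour: a begin date later than the end date
// is an error. The exception type is an assumption.
void checkRange(const DateTime& begin, const DateTime& end) {
    if (isLater(begin, end)) {
        throw std::invalid_argument("begin date is later than end date");
    }
}
```

Because the shell splits arguments on whitespace, the full "YYYY-MM-DD HR:MI:SS" string only reaches a function like parseDateArg intact when it is quoted on the command line.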
The next parameter determines whether squares or basins are used for
evaluation. Simply type (this command is complete):
pqpfvs "2000-08-01 12:00:00" "2000-09-01 12:00:00" basins
or (this command is not yet complete):
pqpfvs "2000-08-01 12:00:00" "2000-09-01 12:00:00" squares
When squares is specified, 3 more arguments are required: the number of
grid points per side of the square, the number of points that offset the
squares from the top of the grid, and the number of points that offset the
squares from the left of the grid.
pqpfvs "2000-08-01 12:00:00" "2000-09-01 12:00:00" squares 19 13 8
This command calculates the calibration scores and informativeness scores
for squares of size 19x19 offset 13 from the top and 8 from the left for
the date range of August 1st through September 1st, 2000.
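To make the three square arguments concrete, here is a minimal sketch that lays squares over a grid given a side length, a top offset and a left offset. It assumes the squares sit edge to edge without overlapping and that partial squares at the bottom and right edges are dropped. The manual does not confirm either assumption, and the Square struct, the tileSquares function and the 100x100 grid size in the usage note are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one square: the row and column of its
// top-left grid point, and its side length in grid points.
struct Square { int top; int left; int side; };

// Hypothetical sketch: lay non-overlapping squares over a grid with
// gridRows x gridCols points, starting topOffset points down and
// leftOffset points in. Squares that would run off the grid are skipped.
std::vector<Square> tileSquares(int gridRows, int gridCols, int side,
                                int topOffset, int leftOffset) {
    assert(side > 0 && topOffset >= 0 && leftOffset >= 0);
    std::vector<Square> squares;
    for (int top = topOffset; top + side <= gridRows; top += side) {
        for (int left = leftOffset; left + side <= gridCols; left += side) {
            squares.push_back({top, left, side});
        }
    }
    return squares;
}
```

For example, tileSquares(100, 100, 19, 13, 8) corresponds to the arguments "squares 19 13 8" on a made-up 100x100 grid: the first square starts at row 13, column 8, and four squares fit in each direction.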
NOTE:
Remember that all output goes to stdout. This means if you want to save the
output to a file you should use the standard Unix tools to redirect the
output.
Extending the User Interface
As noted earlier the user interface for the PQPFVS remains
deliberately primitive.
Some of the features built into the software are only available to
those willing to edit C++ code. Specifically this
means altering the RunAll.cpp file, which contains the main loop for the
software, and reading the source code
(documentation here)
to see exactly what is available.
The software has been designed with the expectation of a user interface to
be added in the future. With this in mind, all of the data displayed
as text output by the "show" methods can be collected with the appropriate
"get" methods from the Validation objects for proper display. This course
of action assumes a C++ display mechanism.
The other display possibility is to call the show methods with the "COMPUTER"
enum. This will dump the results to stdout as usual but in a format that
should be easily parsed by most programming languages. This allows the
interface designer to use whatever tools are desired.
Happy Forecasting!