
The Benzene Case Study: Using Right-to-Know Data to Answer Important Environmental Policy Questions
Paper prepared for the ESP
Case Study #2, May 1999
by
Dr. James R. Lee
Ms. Anna Jung
Environment, Statistics and Policy (ESP) Project
American University

Executive Summary
This report presents a case study carried out by a (fictitious) organization that represents public environmental interests, in particular concern over the dangers of the chemical benzene. Its subject is benzene dumping in the Baltimore area. The BENZENE case study is predicated on the need to estimate environmental clean-up costs based on emissions aggregates for benzene-like compounds.
The report makes four recommendations:
1. Tell the user more about the technical aspects of using the data.
2. Make the data easier to use.
3. Create and make available time series data.
4. Provide a more useful context for the data.
I. The Problem
Public health and environmental advocates alike call for the clean-up of harmful chemicals such as benzene. This is a problem in Baltimore, where several industries produce benzene as a by-product. Not only is benzene a threat to human health, it is also a danger to the health of the Chesapeake Bay, since much of the benzene is washed or drained into the Bay, where it lies in deposits at the bottom. Before policy mechanisms for solving benzene pollution can even be considered, there needs to be an estimate of how much benzene has actually been released into the environment.
The purpose of this case study is to determine the extent to which the publicly available data released on the Web can be used to estimate magnitudes or aggregates of pollutants released into the environment. We want to know how much benzene has been dumped in the city of Baltimore, as far back in time as possible.

II. Research Approach
To estimate the level of benzene emissions, we used the RTK NET site to search for benzene releases. Since the Toxics Release Inventory (TRI) is the oldest of the datasets available, it gives the longest time line (1987-1996).
The TRI data on this site can be sorted by a specific chemical, such as benzene, which is not the case with most other RTK data sets. The TRI is also a more comprehensive database, since most of the others are limited to single-media reporting.
This search was run against RTK NET's (the Right-To-Know Network's) copy of EPA's TRI database. RTK NET is run by OMB Watch and Unison Institute at 1742 Connecticut Ave., NW, Washington DC, 20009 - Phone: 202-234-8494. The search was done on 02/09/1999. The input criteria are shown in Table 1.
Table 1: AREA REPORT (TRI DATA)
| Search used | Range |
| Zip Code | ALL |
| City | BALTIMORE |
| County | ALL |
| State | MD |
| Chemical | BENZ* [BENZENE, BENZOYL PEROXIDE] |
| CAS | ALL |
| Year | ALL |
| Level of Detail | HIGH |
| Output Type | Text |
| Sort Order | Facility name |
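The chemical criterion in Table 1 is a wildcard (BENZ*) that picked up both BENZENE and BENZOYL PEROXIDE. The sketch below illustrates that kind of prefix matching with Python's standard `fnmatch` module. It is only an illustration: how RTK NET actually evaluates the wildcard is not documented here, and every chemical name in the list other than the two reported in Table 1 is a hypothetical placeholder.

```python
from fnmatch import fnmatchcase

# Only BENZENE and BENZOYL PEROXIDE come from Table 1;
# the other names are hypothetical placeholders for illustration.
chemicals = ["BENZENE", "BENZOYL PEROXIDE", "TOLUENE", "XYLENE (MIXED ISOMERS)"]


def match_chemicals(names, pattern):
    """Return the names that match a shell-style wildcard, ignoring case."""
    pattern = pattern.upper()
    return [name for name in names if fnmatchcase(name.upper(), pattern)]


matched = match_chemicals(chemicals, "BENZ*")
print(matched)
```

A broad pattern of this kind is convenient, but it also means that aggregates built from the search cover benzene-like compounds rather than benzene alone.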
III. Results
This section discusses right-to-know data, where to find it, and how to get it.
A. Using Right-to-Know Data
We accessed environmental data made available to the public through "Right-to-Know" legislation. RTK.NET is the major private "Right-to-Know" Web site, run by a private organization affiliated with OMB Watch and the Unison Institute. RTK.NET uses information from EPA, so RTK data appears to cover about the same categories of reporting as Envirofacts.
Another level of sorting options must be made available to the user to make the system more usable. Further, many user choices could be expanded and other "user-friendly" features added. Here is a brief description of the process of getting data.
The RTK.NET Web site is very accessible, and the options for sorting by geographic locale are quite easy to use. Actually managing and utilizing the data is another matter. The system does have a nice feature that sends the data for the chosen geographic locale to your email address.
The data we received is an ASCII file, which you can easily save from Netscape email using the "save as" option. The file contains more than 100 categories and can be transmitted in either tab- or comma-delimited format. From there one can import the files into Quattro Pro or Excel (and presumably SPSS and SAS) for data analysis.
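As an illustration of the import step, the following is a minimal sketch of reading such a file when it is not known in advance whether it is tab- or comma-delimited. It uses only Python's standard `csv` module. The sample rows and the helper name `read_delimited` are hypothetical, and the delimiter check is a simple heuristic (a tab in the first line means tab-delimited), not a description of the RTK.NET file layout.

```python
import csv
import io


def read_delimited(text):
    """Parse tab- or comma-delimited text into a list of rows.

    Heuristic: if the first line contains a tab, treat the file as
    tab-delimited; otherwise fall back to commas.
    """
    first_line = text.splitlines()[0] if text else ""
    delimiter = "\t" if "\t" in first_line else ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample rows standing in for a downloaded file.
tab_text = "FACILITY A\tBALTIMORE\tMD\t1987\n"
comma_text = "FACILITY A,BALTIMORE,MD,1987\n"

rows_from_tabs = read_delimited(tab_text)
rows_from_commas = read_delimited(comma_text)
print(rows_from_tabs == rows_from_commas)
```

Checking the delimiter before importing avoids the situation where a whole record lands in a single spreadsheet column.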
B. What Does the Data Tell Us?
Most of the benzene releases occurred during the earlier part of the time period, and most of these releases were to the air (fugitive and stack). The benzene compounds may therefore have dispersed over a large area. However, most of that area would still lie within the watershed, so the benzene emissions might eventually wind up in the Chesapeake Bay.
Table 2: Data in pounds
| Year | Releases | Transfers | Total Production |
| 1987 | 54,000 | 11,800 | 65,800 |
| 1988 | 90,000 | 7,800 | 97,800 |
| 1989 | 80,500 | 8,800 | 89,300 |
| 1990 | 72,590 | 11,010 | 83,600 |
| 1991 | 68,794 | 7,799 | 76,593 |
| 1992 | 39,877 | 8,417 | 48,294 |
| 1993 | 4,388 | 65,595 | 69,983 |
| 1994 | 3,655 | 91,605 | 95,260 |
| 1995 | 1,947 | 106,830 | 108,777 |
| 1996 | 2,323 | 28,920 | 31,243 |
| Totals | 418,074 | 348,576 | 766,650 |
Between 1987 and 1996, over 400,000 pounds of benzene-related compounds were released from facilities within the city of Baltimore. These were mostly releases into the atmosphere. The vast majority of the transfers went out of state, with large transfer destinations in states such as New York, Kentucky, Pennsylvania, and South Carolina.
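The aggregates quoted from Table 2 can be re-derived from the yearly Releases and Transfers columns. The short sketch below does that arithmetic in Python, using the figures transcribed from the table. The cut-off year used for "the earlier part of the time period" (1992) is our own assumption for illustration, not a definition taken from the data.

```python
# (year, releases, transfers) in pounds, transcribed from Table 2.
rows = [
    (1987, 54_000, 11_800),
    (1988, 90_000, 7_800),
    (1989, 80_500, 8_800),
    (1990, 72_590, 11_010),
    (1991, 68_794, 7_799),
    (1992, 39_877, 8_417),
    (1993, 4_388, 65_595),
    (1994, 3_655, 91_605),
    (1995, 1_947, 106_830),
    (1996, 2_323, 28_920),
]

# Column totals over the whole period.
total_releases = sum(releases for _, releases, _ in rows)
total_transfers = sum(transfers for _, _, transfers in rows)

# Row totals recomputed as releases plus transfers.
yearly_totals = {year: releases + transfers for year, releases, transfers in rows}

# Share of all releases that occurred in 1987-1992 (assumed cut-off year).
early_releases = sum(releases for year, releases, _ in rows if year <= 1992)
early_share = early_releases / total_releases

print(total_releases, total_transfers, total_releases + total_transfers)
```

The recomputed column totals (418,074 pounds released and 348,576 pounds transferred, 766,650 pounds in all) agree with the Totals row, and the large early share supports the observation that most releases came in the first years of the period.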
C. Comparable and Additional EPA Data
Through examination of comparable and additional data, it is possible to provide context for the BENZENE case.
1. Comparable Data on the Web: Envirofacts
We attempted to access the same data set using EPA's Envirofacts (EF) Web system, as an initial effort to compare data sets on water discharges.
2. Additional Data on the Web
a. EPA's Surf Your Watershed
Data on water discharges is only part of the overall story in explaining the other factors present in the Pfiesteria outbreak on the Pocomoke River. To probe deeper into these other issues, we would also need to obtain data from Storet(x), which details the volume of water discharges and therefore provides context for the PCS discharge data. Watershed data (Surf Your Watershed) may also provide critical background information.
b. FedStat
We also used FedStat, a collection of databases from many Federal agencies, to seek out benzene dumping elsewhere.

IV. Recommendations
Here are four recommendations about improving public access to and use of right-to-know environmental data.
1. Tell the user more about the technical aspects of using the data.
There is simply not enough information available on the process of downloading and utilizing data from either Web site. We use Quattro Pro on the AU system. The default on the RTK.NET system is tab-delimited format, whereas Quattro Pro supports a comma-delimited format. We unfortunately discovered this the hard way. There should be an explanation of how to actually manage the data in various software packages, as well as introductory instruction in analyzing it. At the user end, there should be a user-friendly choice of downloading the data in readily accessible formats (for example, Quattro Pro, Excel, Word, etc.).
2. Make the data easier to use.
The data is presented in no apparent order, which leaves the user confused about how the information types are arranged. For example, the data fields in the downloaded files are not accompanied by their headers when imported, which means the headers must be imported from another file or typed in by hand. The e-mailed data set includes a hyperlink to the header categories, but this extra step is an additional obstacle for the user and another potential source of error in data use.
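The header problem described above can be handled programmatically once the field names have been obtained separately. The sketch below pairs each headerless row with a list of field names and stops with an error if a row has the wrong number of fields, which guards against the misalignment errors mentioned. The field names and the sample record are hypothetical and do not reproduce the actual RTK.NET header list.

```python
import csv
import io


def attach_headers(header_names, data_text, delimiter="\t"):
    """Pair each headerless data row with field names from a separate list."""
    reader = csv.reader(io.StringIO(data_text), delimiter=delimiter)
    records = []
    for line_number, row in enumerate(reader, start=1):
        # A count mismatch means the headers and the data do not line up.
        if len(row) != len(header_names):
            raise ValueError(
                f"line {line_number}: expected {len(header_names)} fields, "
                f"got {len(row)}"
            )
        records.append(dict(zip(header_names, row)))
    return records


# Hypothetical field names and one hypothetical record.
header_names = ["facility", "city", "year", "pounds"]
data_text = "FACILITY A\tBALTIMORE\t1987\t1200\n"

records = attach_headers(header_names, data_text)
print(records[0]["year"])
```

Failing loudly on a field-count mismatch is a deliberate choice here: a silently shifted column is much harder to notice later in a spreadsheet.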
3. There needs to be readily usable time series data available.
There should also be a means of separating data by time, as that is a feature that will be of constant concern. Data will naturally need to be examined in terms of periodicity. This information is determinable, but is not easily attainable in the current data offering on the Web sites.
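In the absence of ready-made time series, a user has to build one from the individual records. A minimal sketch of that step, assuming each record carries a reporting year and a quantity in pounds, is shown below. The field names `year` and `pounds` and the sample values are hypothetical and are not taken from the downloaded data.

```python
from collections import defaultdict


def sum_by_year(records):
    """Aggregate per-record quantities into one total per reporting year."""
    totals = defaultdict(int)
    for record in records:
        totals[int(record["year"])] += int(record["pounds"])
    # Return the years in chronological order.
    return dict(sorted(totals.items()))


# Hypothetical records, in the order they might arrive in a download.
sample_records = [
    {"year": "1988", "pounds": "500"},
    {"year": "1987", "pounds": "1200"},
    {"year": "1988", "pounds": "250"},
]

series = sum_by_year(sample_records)
print(series)
```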
4. Provide a more useful context for the data.
There is context for the data, but it is often at a level too far removed from the level of the data. In the Pfiesteria case study, there was a context, but the specific locations of the point-source data could not be linked to the ecosystem-level data of the context. There must be some differentiation of ecosystem levels and scopes to provide a link to the point-source data.
Appendix 1
PCS Data Field Explanations
Next Steps
We think one way to explore this case study further is to follow up on this trail of discovery by turning attention away from somewhat sophisticated use by researchers and toward the problems of providing accessible data that can be used. Therefore, we suggest the case study continue, but this time with a focus on the Pfiesteria case study within EPA itself. The case study could serve as a simulation to test the legal limits of the information.
We propose a project that will both educate users about and examine the "right-to-know" (RTK) consumer data that is now available on the Web. The Educating and Evaluating RTK project (EE-RTK) would use students in assessing and using right-to-know data. Not only would it provide valuable feedback on the use and misuse of the data, it could also serve as a basis for developing the elements of a class built around this subject.
Appendix 3
Nine Database Quality Questions
This case study constitutes a good basis from which to answer the "Nine Database Quality Questions" that form the basis for review of PCS and other EPA databases. Our approach to the subject is as scientists. We believe that the level of accuracy requires an assumption of proof, both for attaining reasonable scientific findings and for the legal reasons that flow from scientific findings, especially those based on statistics. We will answer these questions from the standpoint of an academic researcher using the data, one whose findings would therefore need to be sufficient to stand as expert testimony in a court case or as proof of a statistical relationship. We also assume that the data is publicly available and that its use began with a non-profit user of EPA data.
1. How Comprehensive is the Database?
Unknown. As a case study, comprehensiveness was antithetical to the scope of the research.
2. Can the Database Be Used for Spatial Analysis?
Maybe. There are spatial variables in the database. However, it is unknown whether they are geographically exact enough to establish cause and effect. Is the reported address the site of an event, the site of the nearest post office, or the corporate headquarters filing the report? Likewise, do municipal variables refer to the location of the event or of the government office responding to the request? Furthermore, distinct state-by-state reporting characteristics were found in this case and need to be addressed.
3. Can the Database be Used for Temporal Analysis?
No. Publicly available PCS contains inadequate data for even constructing a time series, a key finding of our report. This data does exist, but the data on the Web, and thus publicly available, has only limited time series indications. This is a function of both funding and protection of business and privacy interests.
4. How Consistent Are the Variables Over Space and Time?
Not enough. Time is distorted and space may be limited in the dataset.
5. Can Data Be Linked with Information from other Databases?
Absolutely. We were able to use PCS along with other data through the facility reports provided, although these were not publicly available data sets.
6. How Accurate are the Data?
We did not investigate this.
7. What are the Limitations?
Is the data that is now available on the Internet of sufficient quality for scientific examination? At the moment, the answer is no.
8. How Can I Get Information?
Any Internet account with a search engine can find the data. We did not investigate ordering the data by phone in hard copy.
9. Is There Documentation?
Yes, but not very accessible.

August, 1998
