
Columbia Journalism Review

May/June 1992

Technology

QUANTUM LEAPS
Computer Journalism Takes Off

by George Landau
Landau, a reporter for the St. Louis Post-Dispatch, is a specialist in computer-assisted journalism.

Two and a half years have passed since reporter Elliot Jaspin left the Providence Journal-Bulletin, moved to the Midwest, and began preaching the gospel of computer-assisted reporting. Jaspin is not alone in the field, but as founder of the Missouri Institute for Computer-Assisted Reporting (part of the University of Missouri's School of Journalism) he is generally credited with having been the first to promote the use of personal computers to analyze mainframe-sized databases.

His methods have caught on. While the precise number is hard to pin down, at least two dozen newspapers in the U.S. have a reporter who specializes in working with computer data, according to Jim Brown, executive director of the National Institute for Advanced Reporting, at the Indiana University School of Journalism in Indianapolis. And judging by attendance last April at the Indiana institute's conference on computer-assisted journalism, hundreds more reporters and editors from print and broadcast want to learn how to use a computer for a lot more than just word processing.

PC journalism has caught on quickly in newsrooms because PCs have evolved at a dizzying pace in recent years, while their price has dropped. These days, $6,000 can buy you a PC with more than enough storage capacity to handle files from most mainframes. Another $3,000 buys a nine-track tape drive to read the data from those mainframes. Finally, state-of-the-art software costs only about $800.

Here's what some of us ordinary reporters and editors have been able to accomplish in the last year or two using PCs to analyze data from mainframes:

GHOST VOTERS In an account of possible vote fraud in East St. Louis, Illinois, the St. Louis Post-Dispatch proved the existence of an afterlife. My colleague Tim Novak and I had been comparing a listing of voter addresses with a database of vacant lots, trying to gauge the extent of illegal registration. (With the city's registered voters outnumbering the voting-age population, this was no fishing expedition.)

In the middle of that project, another database we had long been seeking arrived: eleven years of Missouri's computerized death certificates, 1979-1989. We had haggled with the state health department for months to get those tapes. The bureaucrat in charge didn't understand the words "public record," and feared we would print a list of every AIDS victim since 1983.

What did Missouri death certificates have to do with East St. Louis? More than we expected. St. Louis, it turns out, is a popular place to die; the best and biggest hospitals are on the Missouri side of the Mississippi River. By having the computer scan those eleven years of death certificates -- 550,000 records altogether -- we derived a list of more than 1,000 East St. Louis residents who had died in Missouri.

Of those, 270 -- most of them dead for several years -- were still registered to vote. Two dozen had kept on voting from the grave. We started our story this way:

A man named Admiral Wherry, an army veteran who owned a barbecue pit and tire repair shop in East St. Louis, died more than two years ago.

But that didn't stop him from voting in the Illinois Democratic primary on March 20.
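The ghost-voter finding rests on two steps: filtering 550,000 death certificates down to East St. Louis residents, then matching those records against the voter roll. The Python sketch below shows one way to do that. It assumes, hypothetically, that both files carry a name and a birth date and that a match on both is good enough. The Post-Dispatch's actual matching rules are not described here, and the record types and sample people are invented for illustration.

```python
# A minimal record-linkage sketch: death certificates vs. a voter roll.
# Field names, the name + birth-date match key, and all sample records
# are hypothetical assumptions, not the newspaper's actual method or data.
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DeathRecord:
    name: str
    birth_date: date
    residence_city: str
    death_date: date


@dataclass(frozen=True)
class Voter:
    name: str
    birth_date: date
    last_voted: Optional[date]  # None if no ballot is on record


def normalize(name: str) -> str:
    """Uppercase and collapse whitespace so minor formatting differences match."""
    return " ".join(name.upper().split())


def dead_but_registered(deaths, voters, city="EAST ST. LOUIS"):
    """Return (voter, death record, voted_after_death) for each registered
    voter whose name and birth date match a death certificate from `city`."""
    # Step 1: scan every certificate, indexing only residents of the city.
    index = {}
    for d in deaths:
        if d.residence_city.upper() == city:
            index[(normalize(d.name), d.birth_date)] = d
    # Step 2: look up each voter in that index.
    matches = []
    for v in voters:
        d = index.get((normalize(v.name), v.birth_date))
        if d is not None:
            voted_after = v.last_voted is not None and v.last_voted > d.death_date
            matches.append((v, d, voted_after))
    return matches


deaths = [
    DeathRecord("John Doe", date(1930, 5, 1), "East St. Louis", date(1987, 3, 2)),
    DeathRecord("Jane Roe", date(1945, 1, 9), "St. Louis", date(1988, 6, 7)),
]
voters = [
    Voter("JOHN  DOE", date(1930, 5, 1), date(1990, 3, 20)),
    Voter("Jane Roe", date(1945, 1, 9), None),
]

for v, d, voted in dead_but_registered(deaths, voters):
    print(v.name, d.death_date, voted)
```

Only the first voter matches: the second death certificate lists a Missouri residence, so it never enters the index. In practice a reporter would still check each computer match against paper records before publishing, since different people can share a name and birth date.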

Since then, we've used the Missouri death certificates to identify coroners who repeatedly failed to investigate suspicious deaths, listing the cause as "unknown" when an autopsy might have revealed child abuse, elder abuse, or other evidence of homicide.

JUSTICE JAILED In Connecticut, reporter Brant Houston of The Hartford Courant had been working with a relatively small set of computerized court records when he learned of an intriguing, and much larger, database: a computer file kept by the state bail commission that listed defendants' bail, the highest crime of which the defendants were accused, any other charges pending, prior convictions, race, age, and sex.

Two months after Houston began working with a nine-track tape of bail commission data, the Courant published a three-part series describing the racial inequities in the state's bond system, the tendency of some judges to impose excessive bonds as pretrial punishment, and the failure of the bond system to assess its own fairness despite a 1981 legislative mandate that it do so.

Houston and reporter Jack Ewing's series started on June 16, 1991. Six days later they reported the response of Connecticut's chief justice: he ordered a comprehensive study of racial bias throughout the state courts.

Recently, in order to make databases available to as many reporters as possible, the Courant purchased several PCs and printed a directory of what's available in-house or on-line. Among the offerings are databases of state and federal campaign finances, industrial toxic emissions, death certificates, federal contracts, census data, and real estate records.

DEADLINE DEMOGRAPHICS Early last year, when the U.S. Census Bureau released the first detailed population counts from the 1990 census, the information wasn't available on paper. Months before printed reports would be ready, journalists with the right tools got the counts from nine-track magnetic tape, the format in which the bureau initially releases all its information.

San Francisco Chronicle special projects editor Judy Miller, who had attended one of Jaspin's seminars, was ready when the Census Bureau released the California counts in February. The Chronicle, of course, wasn't the only paper in line for a copy of the tape on the release date. Also present at the state data center in Sacramento were couriers from other California newspapers.

It was 9 P.M. when the tape reached Miller in San Francisco. "I knew exactly what the writers needed, what the graphics people needed," she says. "It took maybe five minutes." On the front page of the next day's Chronicle, sharing space with news of the allied thrust into Kuwait, was an account of California's "astonishing population changes."

Ramon G. McLeod and demographics editor Jim Schreiner wrote that "California is rapidly becoming a state in which minority groups will constitute a majority of the population." The story jumped to a page brimming with maps and charts listing ethnic populations in the city of San Francisco and every county in the state.

Miller says that while the state's other newspapers also had census stories that day, "We were one of the first California newspapers to print detailed census results." She says the experience taught her that "computer-assisted journalism doesn't have to mean it takes three months to do. It can also give you a competitive edge in a very tight, tight deadline situation."

Since those initial stories, Miller has bought a tape of more detailed census data, listing population by age, sex, household size, and family type (single mothers as opposed to married couples, for instance).

"I'd like to do some trend stories, where you use the numbers to get at news features," Miller says. "For example, the population of single dads increased. Now that men are having to deal with issues like child care, will we get better child care?

"We want to get behind the numbers and see how they reflect changes in society, how we live," she says.

RESOURCES There are two keys to the kingdom in this information age: access to data and the ability to analyze it. Neither requires mainframes; personal computers can deliver both.

With the addition of just a $70 modem, any PC can be used to explore the rapidly expanding universe of on-line databases. Services like Lexis/Nexis, Vu/Text, DataTimes, and Dialog offer more kinds of information than can be described here. Newspaper morgues, municipal real estate records, appellate court rulings, SEC filings, abstracts of obscure research journals -- it's all out there, waiting to be found.

On-line research can be expensive -- many services charge about $100 an hour -- but it's an unbeatable way to get background for a story. In an ideal newsroom, on-line research would be the responsibility of the reference library. But in newsrooms where libraries haven't risen to the challenge (or in newsrooms without librarians), reporters have to learn what there is and how to get it.

In addition to on-line services, data can come from such sources as government tapes or manually entered facts and figures gathered from paper sources or interviews. You might want to fetch census data by modem for analysis with mapping software, for example. Or you might only need to type a few hundred records into a database by hand.

Yet even then, having gathered data, you'll have solved only half the equation.

A well-equipped PC can swallow a database of almost any size. The trick is in getting the PC to digest the data. Here's a quick overview, with examples, of three different kinds of software that a reporter can use to analyze information:

Database Managers: This software allows you to take a database and do basic things like search, sort, and "group" data.

* Searching: with the right software, computers can find needles in haystacks without breaking a sweat. In a matter of seconds, for example, a PC could extract from a master database of death certificates a list of everyone who had died from brain cancer.

* Sorting: using the same database, you could have the computer list those brain cancer victims by zip code. You could also sort by name, age, weight, or whatever else is in the database.

* Grouping: if you wanted to identify the zip code with the most brain cancer deaths, you could tell the software to "group" on zip code, counting up the occurrences within each zip, then displaying the resulting list in descending order of frequency.

This isn't as hard as it may sound. Luckily, there's an elegantly simple but extremely powerful language we can use to get information from a computer database. Called Structured Query Language (SQL), it is being used increasingly in software for everything from mainframes down to PCs.

To master SQL you need to learn only a few rules governing the use of a few key phrases, and once you've learned SQL you can instantly use any software that employs it.
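Here is roughly what searching, sorting, and grouping look like in SQL. This sketch runs the queries through Python's built-in sqlite3 module rather than any of the PC packages named here, and the `deaths` table, its columns, and its four rows are hypothetical. The WHERE, ORDER BY, and GROUP BY clauses are standard SQL, though a given product's dialect (FoxPro's included) may differ in details.

```python
# Search, sort, and group with SQL, run through Python's sqlite3 module.
# The table layout and the rows are hypothetical sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deaths (name TEXT, age INTEGER, zip TEXT, cause TEXT)")
conn.executemany(
    "INSERT INTO deaths VALUES (?, ?, ?, ?)",
    [
        ("Adams", 61, "63101", "brain cancer"),
        ("Baker", 47, "63104", "brain cancer"),
        ("Clark", 72, "63101", "heart disease"),
        ("Davis", 55, "63101", "brain cancer"),
    ],
)

# Searching (WHERE) plus sorting (ORDER BY): brain cancer deaths by zip code.
search_sort = conn.execute(
    "SELECT name, zip FROM deaths WHERE cause = ? ORDER BY zip, name",
    ("brain cancer",),
).fetchall()

# Grouping (GROUP BY): count brain cancer deaths per zip, most frequent first.
grouped = conn.execute(
    "SELECT zip, COUNT(*) AS n FROM deaths WHERE cause = ? "
    "GROUP BY zip ORDER BY n DESC",
    ("brain cancer",),
).fetchall()

print(search_sort)  # [('Adams', '63101'), ('Davis', '63101'), ('Baker', '63104')]
print(grouped)      # [('63101', 2), ('63104', 1)]
```

The same three query patterns would work unchanged on a table of 550,000 rows. Only the time it takes to run them grows.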

Several brands of PC software with SQL are available; after testing a few, I'm happiest with FoxPro 2.0, from Fox Software in Perrysburg, Ohio. FoxPro is extremely fast, even with very large databases. It is also easy to learn and, once mastered, provides sophisticated and powerful programming tools that go beyond SQL.

Statistical packages: These are database managers with built-in formulas that can be used to isolate cause-and-effect relationships in a sea of variables. Such software would allow you to analyze court records, for example, to show a link between sentencing and defendants' race -- independent of defendants' age and sex, the kind of victim and crime, and the specific judges and attorneys involved.
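Statistical packages hold many variables constant at once, typically with techniques such as multiple regression. A cruder but easy-to-follow stand-in is stratification: compare groups only within the same category, so that, say, burglary sentences are never averaged together with robbery sentences. The Python sketch below controls for a single variable (the charge) in that way. The group labels and sentence lengths are invented sample data, not findings from any court.

```python
# Stratified comparison: mean sentence by defendant group *within* each charge.
# This controls for one variable only; the records are hypothetical.
from collections import defaultdict
from statistics import mean

cases = [
    # (charge, defendant_group, sentence_months)
    ("burglary", "group_a", 24),
    ("burglary", "group_a", 30),
    ("burglary", "group_b", 18),
    ("burglary", "group_b", 20),
    ("robbery", "group_a", 60),
    ("robbery", "group_b", 48),
]


def mean_by_stratum(rows):
    """Return {charge: {group: mean sentence}} so groups are compared only
    against defendants facing the same charge."""
    buckets = defaultdict(lambda: defaultdict(list))
    for charge, group, months in rows:
        buckets[charge][group].append(months)
    return {
        charge: {group: mean(vals) for group, vals in by_group.items()}
        for charge, by_group in buckets.items()
    }


for charge, means in sorted(mean_by_stratum(cases).items()):
    print(charge, means)
```

Adding more controls this way (charge, then age, then judge) quickly splits the data into cells too small to mean anything. That is where the regression tools in a statistical package come in.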

Philip Meyer, a journalism professor at the University of North Carolina at Chapel Hill, has been urging reporters to use statistical tools since 1973, when he published Precision Journalism. (A 1991 updated version, The New Precision Journalism, is available from the Indiana University Press.) Back then, you needed to master a mainframe, a slide rule, and statistical theory. These days, Meyer says, journalists can teach themselves statistics using PC software called SPSS/PC+ studentware.

But he suggests that journalists first master a traditional database manager, such as Paradox or dBASE.

Geographic Information Systems (GIS): This is a new breed of PC software that allows mapping of any data with a geographic component, from real estate records to census counts. Analyzing the 1990 census, for example, can be daunting if you don't have software to plot the hundreds of census tracts in a metropolitan area. With a GIS, the computer can instantly color each tract to reflect data values -- for example, red for tracts that saw increases in median income from 1980, blue for tracts that saw a decline.
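The red/blue shading rule is simple to state as code. In this Python sketch the tract IDs and income figures are hypothetical. The gray "no change" color is an added assumption, and the figures are compared as raw numbers with no adjustment for inflation. A GIS would apply a rule like this to every tract and then paint the map.

```python
# Map each census tract to a color from its change in median income.
# Tract IDs, incomes, and the gray "unchanged" bucket are assumptions.
def tract_color(income_1980: float, income_1990: float) -> str:
    change = income_1990 - income_1980
    if change > 0:
        return "red"   # median income rose
    if change < 0:
        return "blue"  # median income fell
    return "gray"      # no change


tracts = {
    "1001": (15000, 21000),
    "1002": (18000, 16500),
    "1003": (20000, 20000),
}
colors = {tract_id: tract_color(a, b) for tract_id, (a, b) in tracts.items()}
print(colors)  # {'1001': 'red', '1002': 'blue', '1003': 'gray'}
```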

Reporters can use a GIS to monitor the political redistricting process. Just trace the latest boundary proposal onto the screen and tell the computer to total up the ethnic populations for all the census "blocks" inside it.
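Underneath, "total up the blocks inside the boundary" is a point-in-polygon test. The Python sketch below uses the standard ray-casting method. It simplifies by representing each census block by a single centroid point instead of its full outline. The coordinates, block populations, and group labels are hypothetical, and commercial GIS packages may handle blocks that straddle a line differently.

```python
# Sum block populations whose centroid lies inside a proposed district.
# Using centroids instead of full block shapes is a simplifying assumption;
# all coordinates and counts are hypothetical.
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: cast a horizontal ray from (x, y) toward +x and count
    edge crossings; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def district_totals(boundary, blocks):
    """Add up population counts for every block whose centroid is inside."""
    totals = Counter()
    for (cx, cy), counts in blocks:
        if point_in_polygon(cx, cy, boundary):
            totals.update(counts)
    return totals


boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]  # the traced proposal
blocks = [
    ((2, 3), {"group_a": 120, "group_b": 40}),
    ((8, 9), {"group_a": 30, "group_b": 90}),
    ((15, 5), {"group_a": 500, "group_b": 10}),  # outside the boundary
]
print(dict(district_totals(boundary, blocks)))  # {'group_a': 150, 'group_b': 130}
```

Retracing the boundary and rerunning the totals is how a reporter could compare one redistricting proposal with the next.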

A good GIS can also plot anything that has a street address. In St. Louis, we used software called Atlas*GIS to "pin map" a database of 18,000 vacant lots. By merging the pin map with a map of the city's twenty-eight political wards, we were able to publish a ranking of vacant lots by ward.

Mastering the PC as a reporting tool requires four things: hardware, software, data, and diligence. There's still a learning curve to climb, but it's a lot less steep than it used to be. Computer skills are largely self-taught, but to get pointed in the right direction, reporters may want to take advantage of the following resources:

* The Missouri Institute for Computer-Assisted Reporting in Columbia, Missouri, offers week-long seminars that teach neophytes how to handle data in a variety of formats, including nine-track tape.

Elliot Jaspin, executive director of the institute, sells a software package called NineTrack Express that makes it easier for reporters to transfer data from mainframe tape to PC. The institute also publishes Uplink, a monthly newsletter on computer-assisted reporting. (For more information call 314-882-0684.)

* The National Institute for Advanced Reporting at Indiana University, Indianapolis, has an annual conference on computer-assisted reporting that draws hundreds of journalists from around the country for a weekend of inspiration and idea-swapping. It is usually held in March. (For more information call 317-274-2774.)
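Moving data from mainframe tape to a PC often involves two chores: splitting the stream into fixed-length records and translating characters from EBCDIC, the IBM mainframe character set, into the PC's own. The Python sketch below shows that conversion step using Python's built-in `cp037` (EBCDIC) codec. The 20-byte record layout and sample records are hypothetical, since real tapes arrive with the agency's own layout. This is not a description of how NineTrack Express works.

```python
# Decode fixed-length EBCDIC records like those often found on mainframe tape.
# RECORD_LENGTH, LAYOUT, and the sample rows are hypothetical assumptions.
RECORD_LENGTH = 20
LAYOUT = [("name", 0, 12), ("zip", 12, 17), ("age", 17, 20)]  # (field, start, end)


def decode_records(raw: bytes):
    """Split raw bytes into fixed-length records, decode each from EBCDIC
    (code page 037), and slice out the fields named in LAYOUT."""
    records = []
    for offset in range(0, len(raw) - RECORD_LENGTH + 1, RECORD_LENGTH):
        text = raw[offset:offset + RECORD_LENGTH].decode("cp037")
        records.append({field: text[s:e].strip() for field, s, e in LAYOUT})
    return records


# Build a sample "tape" by padding two records and encoding them as EBCDIC.
rows = [("SMITH JOHN", "63101", "42"), ("DOE JANE", "62201", "67")]
sample = b"".join(f"{n:<12}{z:<5}{a:>3}".encode("cp037") for n, z, a in rows)

print(decode_records(sample))
```

Once decoded, the records can be written out as ordinary text and loaded into a database manager for searching, sorting, and grouping.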