Mining for data

QUESTION: What’s the best way to obtain government data?

Answer by Fred Vallance-Jones

More and more reporters are incorporating computer-assisted reporting techniques into their work, and for many this means obtaining data from government agencies.

Sometimes this is as simple as downloading a data table from a website and loading it into a spreadsheet or database application.

But most databases are not online. Getting them means approaching the agency and asking.

This column is about the asking.

No doubt you have heard accounts of reporters waging long FOI battles with agencies, and indeed I have fought such battles. But the advice I’m going to offer here is to try to avoid these battles if you possibly can. In fact, I’m going to suggest that if you can avoid using the formal access to information process, you should do so.

In my years working in radio and print, I was able to obtain data informally many times.

Oddly enough, I learned to go this route when I found that many of my formal access requests would lead to extended negotiations anyway, such as when I asked for a series of environmental databases in Manitoba a decade ago.

At first, the idea of approaching an agency and asking for its internal data may seem a little daunting. But remember that the request may seem just as daunting to the agency.

Government officials are used to doling out information to the public and media in measured amounts, like a cook filling bowls in a cafeteria lineup. So when a reporter comes along and asks for the whole pot of soup, the bureaucrats get nervous. They will be asking themselves, why does she want it, what will she do with it, and how will it affect us? Most likely, their answers will veer toward worst-case scenarios, and as a result, they will dig in their heels.

This is why access requests for data so often grind down. While access laws give you a right to obtain records, the process promotes paper shuffling, not dialogue, and leaves plenty of time for officials to imagine all sorts of unfavourable outcomes.

A direct approach is often better.

The first step is to do some research. Once you know you want to obtain some data, find out everything you can about what the database is called, what data is stored in it, the database software used, the purposes for which the data is maintained, and who maintains and uses it. See if you can get a copy of the paper form(s) used to collect the data.

There are all sorts of sources for such details. They range from agency annual reports, websites and telephone directories, to public interest or advocacy organizations that deal with the agency, to your own news files, which may contain references to databases.

Once you know as much as you can, you will need to approach the agency directly.

This has to be done with care and diplomacy. I would suggest avoiding the media or public relations department, because these often work in an information tunnel, with only certain details approved for public release. Media relations people frequently have no authority to release anything beyond the official line, and tend to be steeped in a culture of information control rather than one of openness. Media relations officers are also unlikely to be particularly database-literate, and so may have no idea what you are talking about.

I would also suggest avoiding the line IT people at this point. They are used to being gatekeepers to data, even within their organizations. They are unlikely to have any inclination or authority to engage in discussions with an outsider.

Once you have opened the doors through an appropriate official, the IT people can be helpful in terms of solving specific technical problems, but they aren’t the best first point of contact.

The right place to aim is where control of the information resides. This is usually some middle-level manager responsible for the program area that collects and uses the data.

The reason I suggest approaching this kind of person is that he or she will have significant authority over the data and the people who administer it. The person will also have influence over more senior decision makers who may be called in to approve release.

A key tactic is to minimize the amount of information you give over the telephone. Your goal is to arrange a face-to-face meeting in which you can have a reasonable discussion of your request, and how the agency could fulfill it. Say something like, "I'd like to chat about how I could obtain some data from the Chinchilla database for a project I'm working on," or words to that effect.

Once you have agreement to sit down, make sure you have a good sense of which fields of data you need, the time period you need, what you would be prepared to have left out—names or street addresses for example—and what sort of format you need.

Then, once you are talking with the official, give enough of an explanation of what you are doing that you alleviate possible fears about how you might misuse the data. Obviously you don’t give away what story you may have in mind or explain every possible use, but you can explain generally that you wish to analyze the data to look at a broad question.

In my experience, officials will often offer at this point to provide you with a data summary or report. Politely explain that you prefer to work with raw data. I would often talk a little about computer-assisted reporting, and how it is an emerging kind of journalism in which journalists work with original source material. Often, once officials understood the kind of work I was doing, they respected it rather than feared it.

During this first meeting, if you have not already done so, ask for a copy of the list of fields in the database, their data types and lengths, and a general description of what is in each field. If your conversation is going well, this can often be arranged easily now.
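To see why the field list matters, consider that many agency extracts arrive as fixed-width text files, in which each field occupies a set number of characters and nothing in the file itself marks where one field ends and the next begins. The short Python sketch below is a hypothetical illustration, not any agency's actual format: it shows how a record layout (field names, data types and lengths) lets you slice such a file into usable columns. The field names, the YYYYMMDD date format and the sample record are all invented for the example.

```python
from datetime import date

# Hypothetical record layout: (field name, data type, length in characters).
# The layout an agency actually supplies would replace this list.
LAYOUT = [
    ("permit_id", "char", 8),
    ("issue_date", "date", 8),   # assumed to be stored as YYYYMMDD
    ("region_code", "char", 2),
    ("fee", "num", 7),           # assumed to be a whole number
]


def parse_record(line: str) -> dict:
    """Slice one fixed-width line into named, typed fields using LAYOUT."""
    record = {}
    pos = 0
    for name, dtype, length in LAYOUT:
        # Take exactly `length` characters, then move the cursor forward.
        raw = line[pos:pos + length].strip()
        pos += length
        if dtype == "num":
            record[name] = int(raw) if raw else None
        elif dtype == "date":
            record[name] = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8])) if raw else None
        else:
            record[name] = raw
    return record


# One invented 25-character record: 8 + 8 + 2 + 7 characters.
sample = "P000012320031115MB0001250"
print(parse_record(sample))
```

If any length in the layout is off by even one character, every field after the error shifts, which is one more reason to ask for accurate documentation rather than trying to guess the structure.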

If you are lucky, the official may arrange for the actual data to be copied for you. More likely, he or she will want to take your request to others, possibly legal counsel, higher-ups, or IT folks.

These other officials will likely raise various concerns, as this is what they are paid to do. You may be able to address these concerns directly with the officials involved, or through your contact. But if it looks like you are hitting a brick wall, see if you can arrange a meeting attended by the key officials involved and one or two colleagues from your organization. It can help to bring along one of your own higher-ups, to balance the power relationship in the room.

At this point you may have to ramp up the argument a little, demonstrating how you would have the right to access the information through a formal request. Make sure you know what you can and can’t get through the applicable access legislation, if the agency with which you are dealing is covered. You may also need to demonstrate your resolve to obtain the data, as some of the people in the room may just be hoping to make you go away.

Always remain civil and courteous, while at the same time adopting a level of insistence appropriate to the seniority of the officials with whom you are dealing.

The goal, of course, is to obtain agreement to allow you to obtain the data without any restrictions on its use. Generally, once this agreement has been reached, you will be able to discuss detailed technical issues with the IT staff. You can often initiate this by suggesting it would be easier if you could have a direct techno-talk conversation with an IT person rather than working through an intermediary who is not familiar with database lingo.

Obviously, how you go about your negotiations will vary depending on the organization with which you are dealing. A conversation at a small city hall could involve one or two people at most, while one with a large provincial department could bring in many more. But no matter what the level, you want to promote a constructive dialogue in which everyone can see their interests served.

It would be Pollyannaish to suggest that this approach will always work. In some cases, you will be dealing with extraordinarily sophisticated people who will, for whatever reason, not want to give you the data. They may fear what you will find out, may mistrust your intentions, or may simply be control freaks. Whatever the reason, you may be forced to file a formal access request.

The basic approach is simple.

Ask for the database by name; state whether you want all of the fields or just some (and if just some, name them); give the time period for which you want the data; and request all associated data codes and lookup tables, along with enough documentation to understand the structure and contents of the database.
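Lookup tables matter because many databases store short codes rather than plain-language values, so a data extract without its code tables can be close to unreadable. The Python sketch below is a hypothetical illustration: the column names, codes and district descriptions are invented, and real agency files will differ. It shows how a lookup table turns stored codes into readable values, and how to flag any code the lookup table does not explain.

```python
import csv
import io

# Hypothetical lookup table, as an agency might supply it: code -> description.
LOOKUP_CSV = """code,description
01,North District
02,Central District
03,South District
"""

# Hypothetical main extract, which stores only the codes.
DATA_CSV = """permit_id,region_code
P0000123,01
P0000124,03
P0000125,07
"""


def load_lookup(text: str) -> dict[str, str]:
    """Build a code -> description dictionary from a lookup-table CSV."""
    return {row["code"]: row["description"] for row in csv.DictReader(io.StringIO(text))}


def decode(data_text: str, lookup: dict[str, str]) -> list[dict[str, str]]:
    """Add a readable description to each record; flag codes missing from the lookup."""
    rows = []
    for row in csv.DictReader(io.StringIO(data_text)):
        row["region"] = lookup.get(row["region_code"], "UNKNOWN CODE")
        rows.append(row)
    return rows


for record in decode(DATA_CSV, load_lookup(LOOKUP_CSV)):
    print(record["permit_id"], record["region"])
```

Flagging unknown codes rather than silently dropping them is a deliberate choice: a code that appears in the data but not in the documentation is exactly the kind of question to take back to the agency.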

From here, the process will unfold as with any request, but there is a high likelihood of continued resistance. You may be told that the data cannot be copied, that it will cost a great deal to do so, or that it can be withheld under exemptions in the act. Knowing how the data is stored and what fields it contains, and having a detailed understanding of the particular access statute, will help you immensely as the process unfolds.

If you are denied access, and you cannot reach a satisfactory accommodation with officials, file an appeal with the appeal body in your jurisdiction. This not only puts pressure on the bureaucrats—they hate appeals because they tie up time and resources and hurt the overall access compliance record for the agency—but the investigator with the appeal body can become an important ally once they understand the merits of your request.

Generally, if you are asking for data that doesn't fall within one of the exemptions in the act, you will get what you are looking for. It just may take time, especially if that database hasn't been requested before. If it has, then another journalist may have fought the battles for you, and a procedure will have been established. Just ask for the same data in the same format, adjusting the time period as appropriate. A formal request may not even be necessary; a call to the access office may be enough.

The best approach is to avoid the access process altogether and try to obtain the data informally. So long as the agency doesn't see a huge downside in letting you have it (for example, if officials know that the program is in chaos and the data will reveal that), you have an excellent chance of getting what you want without a huge fight.

Good luck.

Fred Vallance-Jones, an award-winning reporter with both the Hamilton Spectator and CBC, is one of Canada's leading practitioners of computer-assisted reporting. He currently teaches journalism at the University of King's College in Halifax and is the Contributing Editor of the "Computer-assisted Reporting" J-Topic.