The emergence of the market for data-journalism skills

By David McKie

There has been a lot of distressing talk about the slow but sure death of journalism as we know it. With each passing week, there is news of yet another newspaper scaling back traditional content delivery and migrating much of that content to the Internet. In this space earlier this month, I introduced Kelly Toughill’s piece about some of the leading research into journalism’s foray into the digital universe, a space that has been likened to the wild, wild West.

Among all the uncertainty, there is a sliver of good news that we, and especially journalism schools, should pay attention to: the emergence of a market for a new breed of journalists who combine programming, computer-assisted or data journalism skills and, for lack of a better term, the low-tech, old-fashioned shoe-leather journalism of working the phones, conducting solid research, writing coherent stories and getting it right. I’ve had the pleasure of teaching students who possess this unique combination of skills. And they will have jobs for the foreseeable future.

This spring’s U.S. conference of the National Institute for Computer-Assisted Reporting was the largest ever, with a job board so large that even the organizers were surprised. Anyone who belongs to the NICAR listserv knows that job postings are becoming more frequent.

Here in Canada, The Globe and Mail, Global, the CBC, the Edmonton Journal, the Ottawa Citizen, and other outlets have journalists using these skills to produce interesting work that will be featured in the upcoming edition of Media.

I was at a gathering earlier this month in San Francisco called the Logan Symposium, where a journalist from ProPublica, the Pulitzer Prize-winning non-profit online newsroom, made a pitch for more data journalism in a talk that I wished had lasted longer. A student at the Berkeley Graduate School of Journalism, which hosted the symposium, argued in a blog post that there is a disconnect between what schools are teaching and the emerging data journalism skills that a growing number of employers are seeking.

I’ve chosen to run Glen McGregor’s article in large part because he exemplifies this hybrid, combining computer programming skills with the old-fashioned ones. Glen uses many of these skills to tell original stories, including the one he discusses in his column in the upcoming Media magazine. If you ever wondered how he ended up telling the story that combined navigable waters, federal Conservative ridings and the playgrounds of the rich and famous, like actress Goldie Hawn, then please keep reading.

Glen will also be part of a team — comprised of yours truly; Fred Vallance-Jones, who teaches data and investigative journalism at University of King’s College; and Karen Li, ESRI Canada’s technical solutions specialist — that will be offering a data journalism workshop at Carleton University on the weekend of May 4 as part of the Canadian Association of Journalists’ annual conference. There’s still room, but I wouldn’t delay because it’s filling up fast.

In the meantime, enjoy Glen’s article.

Protecting playgrounds for the rich: The Conservative government’s omnibus budget bill gave “cottages” in certain Conservative ridings special treatment

By Glen McGregor, for Media Magazine

The Conservative government angered environmentalists last fall when it introduced changes to a law that protects lakes and rivers from development. The Navigable Waters Protection Act is one of the oldest statutes in the country.

The law required federal approval for construction on any body of water large enough to float a canoe. Changes to it were introduced with some stealth through the federal omnibus budget bill. 

The government argued that the law needlessly constrained development of small projects like bridges or docks by wrapping them up in federal paperwork. Environmentalists charged that rescinding the key provisions of the law effectively removed an important level of environmental protection for Canadian lakes and rivers.

The bill removed the requirement for federal approval from all bodies of water except for a select few that were listed in a schedule in the bill.  It itemized 97 lakes and 62 rivers and canals to which the federal oversight would continue to apply and gave the approximate longitude and latitude of each waterway listed.

At first glance, there seemed to be little logic to how these were chosen. The list of exemptions included three oceans and the massive Great Lakes, but also dozens of small lakes in cottage country. I wanted to see if there was any political pattern in the decision to choose only these lakes for special protection. To find out, I used an electronic mapping program called ArcGIS – a “Geographic Information System.” ArcGIS is an extremely powerful commercial product that has long dominated the GIS software sector. It is used by city planners, geographers, demographers and, increasingly, data journalists to develop stories.

GIS programs such as ArcGIS allow analysis of data based on their spatial location. The software can take census data and match it to provincial election voting results. It can compute the average distance between Tim Hortons franchises, highlight the neighbourhoods with the highest rates of residential break-ins, or show the correlation between city blocks with low incomes and the number of syringes found in nearby parks. I chose to analyze only the lakes named in the budget bill because the rivers tended to flow through so many ridings that the data would not show any trend. If there was an attempt to protect certain bodies of water for political purposes, I reasoned, it was more likely to show up in the selection of lakes.

I began by downloading an electronic map that contained thousands of Canadian lakes. The map, in the standard “shapefile” or .shp format, can be downloaded from GeoGratis, a website that provides free base maps of Canada. The next step was to identify which of these lakes on the map were named in the bill. I cut and pasted the list from an online version of the legislation and imported it into ArcGIS. The software then converted each of the longitude and latitude coordinates from this list to a point on the map. The points appeared as single dots overlaid on the map of the lakes. Using an ArcGIS function called “spatial join,” the software selected only the lake polygons that matched up with a dot representing a named lake.
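
For readers without ArcGIS, the idea behind this point-to-lake matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used for the story: the lake outlines and coordinates are invented, the rectangles stand in for real shapefile polygons, and the test is a standard ray-casting point-in-polygon check rather than ArcGIS's own spatial join.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray running east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(points, lakes):
    """Pair each named point with the first lake polygon containing it (None if no match)."""
    matches = {}
    for name, (lon, lat) in points.items():
        matches[name] = next(
            (lake for lake, poly in lakes.items() if point_in_polygon(lon, lat, poly)),
            None,
        )
    return matches


# Hypothetical lake outlines as (longitude, latitude) vertices.
lakes = {
    "Lake A": [(-79.8, 45.0), (-79.4, 45.0), (-79.4, 45.3), (-79.8, 45.3)],
    "Lake B": [(-79.3, 45.0), (-79.0, 45.0), (-79.0, 45.2), (-79.3, 45.2)],
}
# Hypothetical coordinates of the kind a schedule might list.
points = {
    "Lake A": (-79.6, 45.1),
    "Lake B": (-79.1, 45.1),
    "Lake C": (-78.0, 44.0),  # falls outside every polygon
}

matched = spatial_join(points, lakes)
print(matched)
```

A point that lands in no polygon comes back as `None`, which is one way an imprecise coordinate would show up in the results.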

In theory, this should have perfectly selected from the map of lakes only those that were named in the budget bill.  Unfortunately, as most data journalists come to learn, things rarely unfold so smoothly with these kinds of projects. The longitude-latitude coordinates listed in the budget bill were not that accurate. ArcGIS matched some of them to the wrong lakes based on these fuzzy locations. 

I exported this list of matched lakes to Microsoft Excel and wrote a quick formula to look for errors in the matching process. Where the name of the lake from the map didn’t match the name from the budget bill, I flagged the record for review. This began a laborious process of manually checking the location of each of these records by entering the coordinates in Google Maps and comparing the result with the map in ArcGIS.
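
The Excel formula itself is not shown, but the same kind of check can be sketched in Python. The rows, column names and lake names below are invented for illustration, and the choice to ignore case and extra spaces is an assumption about what counts as a trivial difference.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivial differences are not flagged."""
    return " ".join(name.casefold().split())


def flag_mismatches(rows):
    """Return rows where the bill's lake name differs from the matched map name."""
    return [
        row for row in rows
        if row["map_name"] is None
        or normalise(row["bill_name"]) != normalise(row["map_name"])
    ]


# Hypothetical export of the join: one row per lake named in the bill.
rows = [
    {"bill_name": "Lake Alpha", "map_name": "lake  alpha"},   # same lake, different typing
    {"bill_name": "Lake Bravo", "map_name": "Lake Alpha"},    # matched to the wrong lake
    {"bill_name": "Lake Charlie", "map_name": None},          # matched to nothing
]

to_review = flag_mismatches(rows)
for row in to_review:
    print(row["bill_name"], "->", row["map_name"])
```

Only the flagged rows then need to be checked by hand against a map.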

Once I was confident I had selected the correct lakes from the map, I imported another map into ArcGIS that represented the federal ridings. This map is available for free from Elections Canada, also via GeoGratis. I use this map a lot and had already matched up each riding on it with the name and political affiliation of the MP who held the seat.

With both the lakes and ridings on screen in ArcGIS, I ran the spatial join function again, this time selecting options that would identify the ridings that were contiguous with – immediately adjacent to – each of the lakes. Many lakes, particularly the larger ones, had shoreline in more than one riding. The program generated a tidy list of all the lakes, each with the names of every riding it touched, along with the MP and party affiliation. To better analyze this data, I exported it from ArcGIS back to Excel. Using an Excel feature called PivotTables, I generated a breakdown of this data based on party affiliation.
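
As a rough illustration of what a lake-to-riding join has to decide, the sketch below treats a lake and a riding as touching when any of their edges cross or meet, or when one shape lies wholly inside the other. The shapes are invented rectangles on a simple grid, and the geometry tests are textbook ones, not a description of how ArcGIS implements its options.

```python
def orientation(a, b, c):
    """Sign of the turn a -> b -> c: positive is counter-clockwise, 0 is collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, c):
    """For c collinear with a-b: True if c lies within the segment's bounding box."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_touch(p1, p2, q1, q2):
    """True if segment p1-p2 crosses or touches segment q1-q2."""
    d1, d2 = orientation(q1, q2, p1), orientation(q1, q2, p2)
    d3, d4 = orientation(p1, p2, q1), orientation(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # the segments properly cross
    return ((d1 == 0 and on_segment(q1, q2, p1)) or (d2 == 0 and on_segment(q1, q2, p2))
            or (d3 == 0 and on_segment(p1, p2, q1)) or (d4 == 0 and on_segment(p1, p2, q2)))


def contains(polygon, point):
    """Ray-casting point-in-polygon test."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in edges(polygon):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def edges(polygon):
    """Consecutive vertex pairs, closing the ring back to the first vertex."""
    return [(polygon[i], polygon[(i + 1) % len(polygon)]) for i in range(len(polygon))]


def polygons_touch(a, b):
    """True if the outlines meet, or one polygon sits entirely inside the other."""
    if any(segments_touch(p1, p2, q1, q2) for p1, p2 in edges(a) for q1, q2 in edges(b)):
        return True
    return contains(a, b[0]) or contains(b, a[0])


# Hypothetical ridings and lakes as rectangles on an arbitrary grid.
ridings = {
    "Riding 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Riding 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "Riding 3": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
lakes = {
    "Lake X": [(8, 4), (12, 4), (12, 6), (8, 6)],    # straddles Ridings 1 and 2
    "Lake Y": [(22, 2), (25, 2), (25, 5), (22, 5)],  # wholly inside Riding 3
}

touching = {
    lake: [name for name, outline in ridings.items() if polygons_touch(shape, outline)]
    for lake, shape in lakes.items()
}
print(touching)
```

The result is one list of ridings per lake, which is the shape of data a party breakdown can then be built from.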

The data showed that 90 per cent of the lakes had shoreline in ridings held by Conservative MPs, but only 20 per cent were contiguous with NDP ridings and six per cent with Liberal ridings.  (The numbers did not add up to 100 because many lakes are adjacent to more than one riding.)
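
The reason such percentages can exceed 100 when added together is easier to see with a small worked example. The lakes, ridings and parties below are invented, and the tally is a plain-Python stand-in for a pivot table that counts each lake once per party.

```python
from collections import defaultdict

# Hypothetical lake-riding pairs: one row for every riding a lake touches.
pairs = [
    ("Lake X", "Riding 1", "Party A"),
    ("Lake X", "Riding 2", "Party B"),
    ("Lake Y", "Riding 3", "Party A"),
    ("Lake Z", "Riding 1", "Party A"),
    ("Lake Z", "Riding 4", "Party A"),
    ("Lake W", "Riding 2", "Party B"),
]

lakes_by_party = defaultdict(set)
all_lakes = set()
for lake, riding, party in pairs:
    lakes_by_party[party].add(lake)  # a set counts each lake once per party
    all_lakes.add(lake)

# Share of all lakes that touch at least one riding held by each party.
share = {
    party: 100 * len(lakes) / len(all_lakes)
    for party, lakes in lakes_by_party.items()
}
for party, pct in sorted(share.items()):
    print(f"{party}: {pct:.0f} per cent of lakes")
```

In this made-up data, Lake X touches ridings held by both parties, so it is counted in both shares and the two figures sum to more than 100.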

Another analysis based on the same data showed that 68 protected lakes were located in Ontario but only four in Quebec. Odder still, a disproportionate number of the lakes fell within two Ontario ridings, both held by Conservatives. Many were in Treasury Board President Tony Clement’s riding of Parry Sound – Muskoka, which boasts some of the most expensive cottages in the country.

Some of the lakes protected in Clement’s riding have $5 million cottages perched on their shores and count Hollywood stars and NHL players as regular visitors. Under the new law, these lakes surrounded by affluent cottagers would continue to enjoy federal protection, while the vast majority of Canadian lakes would not. The government said it had used freight-movement statistics to determine which lakes to add to the protected list and also said it did a further "qualitative analysis" to consider the historical importance of each waterway. 

The GIS analysis paid off with a front-page story and questions in the House of Commons.