Recognising patterns is a key skill in computational journalism (image by Stanley Zimny)
1. What are the essential computational skills that a journalist should develop?
Firstly, an ability to recognise patterns, or structured information. Spreadsheets are explicitly ‘data’, but some of the most interesting applications of computational journalism come from someone seeing data where others don’t.
That might be patterns in thousands of documents, or the occurrence of colours in the pixels of images, or frequencies of words, frequencies in music, or relationships between people. All of these are potential sources of stories, and make investigations possible that otherwise wouldn’t be.
So I guess the other skill is the ability to find and adapt solutions – it’s not about memorising every possible Excel function or Python method, it’s often about standing on the shoulders of others who have tackled similar problems before you.
These ‘computational thinking’ skills (I’ve written more about them here) are important regardless of whether you are using Excel or programming to do your data journalism.
But in terms of tools, I’d say sorting, filtering, pivoting (aggregating totals by a certain category in your data), combining data, and calculating percentages/ratios are the 5 key practical skills.
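To make those five skills concrete, here is a minimal sketch in plain Python, using an invented dataset of council spending records (all names and figures are hypothetical, purely for illustration):

```python
from collections import defaultdict

# Hypothetical council spending records (invented for illustration)
spending = [
    {"department": "Housing",   "supplier": "Acme Ltd", "amount": 12000},
    {"department": "Transport", "supplier": "Roadco",   "amount": 8000},
    {"department": "Housing",   "supplier": "BuildIt",  "amount": 5000},
    {"department": "Transport", "supplier": "Roadco",   "amount": 15000},
]

# 1. Sorting: biggest payments first
by_size = sorted(spending, key=lambda row: row["amount"], reverse=True)

# 2. Filtering: only the Housing department's payments
housing = [row for row in spending if row["department"] == "Housing"]

# 3. Pivoting: aggregate total spend by department
totals = defaultdict(int)
for row in spending:
    totals[row["department"]] += row["amount"]

# 4. Combining: join each row to a (hypothetical) lookup table
dept_heads = {"Housing": "J. Smith", "Transport": "A. Jones"}
combined = [{**row, "head": dept_heads[row["department"]]} for row in spending]

# 5. Percentages: each department's share of total spend
grand_total = sum(totals.values())
shares = {dept: round(100 * amt / grand_total, 1)
          for dept, amt in totals.items()}
```

The same five operations map directly onto spreadsheet features (sort, filter, pivot table, VLOOKUP, and a ratio formula), which is why they transfer so well between Excel and code.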
2. What’s your top advice when you are dealing with data?
As a journalist, identify what sort of story you are looking to tease out as early as possible – otherwise you can spend hours just fiddling around with the data and going down various blind alleys.
Most of the time your story is about just one or two columns, so decide which ones first, then break down what you need to do with those (sort, filter, pivot, combine, clean etc).
Likewise, know when to leave the data behind and start picking up the phone or hitting the streets (or searching for background).
Data is great at directing you to a place or organisation, or identifying a problem, but once you’ve done that you need to speak to that organisation, or visit that place, or interview an expert on that problem, and someone affected by it.
Finally, retain your journalistic scepticism when looking at data.
Is it too good to be true? Does a term or field mean what you think it means? (And how can you clarify that?)
Does it need cleaning? Are there other sources you can consult or talk to?
There are many examples of data which is dirty or poorly compiled or manipulated or incomplete, but which is used as the basis for decisions – and that might end up being your story.
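One quick way to apply that scepticism is a sanity check on a key column before you trust it. A small sketch, using invented supplier names: inconsistent spelling, stray whitespace and blanks mean the ‘same’ supplier can be counted several times under different names.

```python
from collections import Counter

# Hypothetical supplier column with typical dirty-data problems:
# inconsistent case, trailing whitespace, and a blank entry
suppliers = ["Acme Ltd", "ACME LTD", "Acme Ltd ", "Roadco", "", "Roadco"]

# Raw counts treat each variant as a separate supplier
raw_counts = Counter(suppliers)

# Normalise: strip whitespace, standardise case, drop blanks
cleaned = [name.strip().lower() for name in suppliers if name.strip()]
clean_counts = Counter(cleaned)

# The gap between raw and cleaned counts is itself a red flag –
# and sometimes the story
print(len(raw_counts), "raw variants vs", len(clean_counts), "after cleaning")
```

Tools like OpenRefine do this kind of clustering and normalising at scale, but the principle is the same: compare what the data says before and after cleaning, and ask why they differ.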
3. What are the free tools/programs that would help journalists who are new to this area to learn computational journalism?
A spreadsheet program like Google Sheets or LibreOffice is still the basic tool for most data journalism.
For cleaning data OpenRefine is excellent – but you can also use some of its functionality in Workbench Data, which includes data analysis, combination, scraping and visualisation tools.
For visualisation Datawrapper is great for making charts and maps against a deadline. Infogram, Flourish and Tableau (which also does data analysis) are all worth exploring too.
But really I think there can be too much of a focus on tools and ‘which tool I need to learn’. It’s always best to decide on what you want to achieve first, and then look for the tool that will help you do that.
Some people find themselves drawn more to the visual side of data journalism, for example, while others are inspired by the potential of interactivity; others want to be able to manage the data that comes through using FOI; and some love the idea of using data skills to obtain the data in the first place. Each of those leads you to a different tool and skillset, and if you’re motivated by the story you’ll be much more likely to keep learning.