Sunday, June 22, 2014

Keeping an Eye and a Thumb on Entrenched City Halls

As stated in a prior blog post, there is no plan to turn this blog into a data journalism site. There is still, though, a great deal of commonality between the efforts of new community paradigms and data journalism, and partnerships would seem to be a natural fit. This is particularly true of new community paradigms efforts that work from the outside, engaging with entrenched political institutions over a protracted period of time, which is a more disruptive approach to establishing new community paradigms. Entrenched politicians often use anecdotal stories to support their decisions. Having a thorough analysis of the facts, especially those hidden from public view, could help leverage change in a community. Even if a data-based story is not of interest to larger publications, it could still be important to the community in which it is occurring.

The first challenge is finding out what data is available. This can depend upon how consistently open the government is to transparency and open data. Those which have an open approach can offer different methods of keeping current.

Some sites will offer “email alert” functionality. The Doing Journalism with Data: First Steps, Skills and Tools course offers UK government publications as an example; here is the American version. Others will offer RSS feeds (read more about RSS). The course features the Office for National Statistics in the UK, and here in the United States there is Data and Statistics.

This accessibility does not seem to be as prevalent, though, at the state and especially the local levels of government, and when it is available one needs to be sure that it provides untampered and unbiased information.

Even when the institution does not offer either email alerts or RSS feeds, there are alternative services available that can track changes to a page and send update notifications, with or without the willing cooperation of the institution.
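The idea behind such change tracking can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular service works: the function names and the sample page text are made up, and fetching the page and sending the notification are left out. It stores a fingerprint of the page text and compares it against a later copy.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, page_text: str) -> bool:
    """True when the current page text no longer matches the stored fingerprint."""
    return fingerprint(page_text) != previous_fingerprint


# Hypothetical snapshots of a city hall page, saved on two different days.
old_page = "Council agenda\n  Item 1: Budget"
new_page = "Council agenda\n  Item 1: Budget\n  Item 2: Zoning"

saved = fingerprint(old_page)
print(has_changed(saved, old_page))  # False
print(has_changed(saved, new_page))  # True
```

Normalizing whitespace before hashing is a design choice here, so that a page that was merely re-indented does not trigger an alert.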

There are also advanced Google search techniques that you can use to help determine what sort of information you want and where that information might be.

A Google search, though, can return too much information, so you may want to narrow it. Searching for an exact phrase in quotation marks, e.g. “crime statistics”, returns fewer results than crime statistics with no quotes. A minus sign excludes a term: compared with “crime statistics”, the search “crime statistics” -national will have a more local and regional focus. You can also broaden your search with the wildcard asterisk "*". Using * within an exact phrase, for example “crime statistics city of *”, will give you a list of websites of cities and their crime statistics. You have to experiment, though, and the number of results generated does not stay consistent.

You can also try searching within particular types of websites for specific information. Restricting a search for "community health” to a particular group of sites with the site: operator provides community health related websites in California. Using filetype:xls will give you results formatted as Microsoft Excel spreadsheets, filetype:doc as Microsoft Word documents and filetype:pdf as PDFs.

Searching for databases is a different matter, as the contents of a database might not be visible to the search engine. In that case you can use the search terms database "search by" instead of filetype. Databases, though, are the best means of organizing data for analysis if made efficiently accessible.

More advanced options are also available, including Google’s advanced search, the Google Guide advanced search operators page and the ability to combine different search operators.
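Combining operators is easier to see when the query is assembled piece by piece. The following Python sketch is only an illustration: the helper names are hypothetical, example.org is a placeholder domain, and the site: operator is included on the assumption that it is the kind of website restriction described above.

```python
from urllib.parse import urlencode


def build_query(phrase, exclude=(), site=None, filetype=None):
    """Combine an exact phrase with optional search operators into one query."""
    parts = [f'"{phrase}"']                      # exact phrase in straight quotes
    parts += [f"-{word}" for word in exclude]    # minus sign excludes a term
    if site:
        parts.append(f"site:{site}")             # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")     # restrict to one file format
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search address."""
    return "https://www.google.com/search?" + urlencode({"q": query})


local_crime = build_query("crime statistics", exclude=["national"])
print(local_crime)  # "crime statistics" -national

# example.org is a placeholder, not a real government domain.
health_sheets = build_query("community health", site="example.org", filetype="xls")
print(health_sheets)  # "community health" site:example.org filetype:xls
print(search_url(local_crime))
```

Keeping each operator as a separate argument makes it simple to experiment, which matters since the number of results changes from one combination to the next.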

Sometimes, however, when you cannot find information online, a possibly more confrontational approach is needed: using legal avenues to request it directly from government institutions through Freedom of Information or Right to Information laws. These are rules or regulations, such as environmental laws, that provide citizens access to particular types of information.

To understand these legal rules and regulations, check with sources for Freedom of Information or Right to Information laws, or connect with people fighting for these laws.

You should speak with the organization in question before making the request, both to check whether they hold the data and to anticipate possible exceptions and exemptions. Consider what judgements have been made on previous or similar requests.

The second challenge, beyond gaining access, is deciding how to analyze the type of information you are seeking. A story that follows change needs data collected over time, a story that compares items needs data on each item being compared, and sometimes you need data that can add context to an issue.

All of this depends, though, on how the institution maintains its information online and how it provides access to the public. As suggested above, maintaining the data in machine-readable form within databases is the best way of establishing a transparent open data environment.

If this is the case then you can request a related data dictionary. A data dictionary can be very useful if you are investigating, for example, crime, gender, victim and suspect information gathered by the police, or gifts or hospitality given to individuals by organizations, and that information is kept in a database.

Data dictionaries list the data codes used in a data set, which are designed specifically so that data sets using the same codes can be combined. The course provides the HES online Data Dictionary Inpatients (PDF) from Great Britain. Here is access to data dictionaries for the United States Centers for Disease Control and Prevention.
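A small Python sketch may make the role of shared codes clearer. Everything in it is invented for illustration: the codes, the labels and the two data sets are hypothetical and are not taken from the HES or CDC dictionaries.

```python
from collections import Counter

# Hypothetical data dictionary: code -> meaning.
SEX_CODES = {"1": "Male", "2": "Female", "9": "Not known"}

# Two hypothetical data sets that record the same field with the same codes.
hospital_records = [{"id": "H1", "sex": "1"}, {"id": "H2", "sex": "2"}]
clinic_records = [{"id": "C1", "sex": "2"}, {"id": "C2", "sex": "9"}]


def decode(records, field, dictionary):
    """Return copies of the records with the coded field replaced by its label."""
    return [
        {**record, field: dictionary.get(record[field], "Unknown code")}
        for record in records
    ]


# Because both sets use the same codes, they can be combined and decoded together.
combined = decode(hospital_records + clinic_records, "sex", SEX_CODES)
counts = Counter(record["sex"] for record in combined)
print(counts["Female"], counts["Male"], counts["Not known"])  # 2 1 1
```

Without the dictionary, the value "9" in the clinic data would be impossible to interpret, which is why it is worth requesting one alongside the data itself.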

Obviously, it would be easier if all government institutions organized their data this way. This provides a good rationale for supporting open data laws. Often, though, the data is not so neatly organized and must be extracted.

There are means of collecting and extracting data that are usually left to those focusing on data journalism. Still, understanding these methods so as to work better as, or along with, data journalists seems a good idea.

A primary means of extracting data is called scraping, using Google Drive spreadsheets or other online tools such as OutWit Hub, the Chrome extension Scraper or ScraperWiki. With a Google Drive spreadsheet, which is available to just about everyone, it is a matter of using the formula =ImportHtml("URL","query",index). The first argument is the address of the page of interest, in quotation marks; the second is the type of data you want, either table or list, also in quotation marks; and the third is the sequence number of the specific table or list that you want, which is not in quotes.

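As an example, the World Cup has been ongoing for the last few days, so the course provides a FIFA webpage that publishes information on football agents in Cyprus. The formula for extracting the first table on the page is =ImportHtml("","table",1), with the page's address inside the first pair of quotation marks. Be warned, though: you cannot just copy and paste the formula from a Pages or Word document or even a blog post, typically because these turn straight quotation marks into curly ones that the spreadsheet will not accept.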
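For readers curious about what a formula like ImportHtml does behind the scenes, here is a rough Python sketch using only the standard library. It is not Google's implementation: the class and function names are made up, the sample HTML is invented rather than taken from the FIFA page, and nested tables are not handled. Like the spreadsheet formula, it counts tables starting from 1.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every table on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return table number `index` (counting from 1) from an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Invented sample page; the names are placeholders, not real data.
sample = """
<table><tr><th>Agent</th><th>City</th></tr>
<tr><td>Agent One</td><td>City A</td></tr></table>
<table><tr><td>Other table</td></tr></table>
"""
print(import_html_table(sample, 1))
# [['Agent', 'City'], ['Agent One', 'City A']]
```

In practice the HTML would first be downloaded from the page's address; that step is omitted here so the sketch runs without a network connection.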

The New Community Paradigms wiki provides a variety of organizations, often through Facebook connections, outside the framework of institutional government that can provide knowledge and advice on a range of issues.

You need to map out what you’re interested in and which organizations, both governmental and non-governmental, deal with that area. You might also seek other experts or pro-amateurs who collect and publish data by thinking about the people affected by the issue that interests you.

If looking for data in specialized areas of knowledge then you need to understand the jargon or professional language used in that arena. Sometimes you have to pick up the telephone and ask someone.

Data journalism pursues in-depth analytical stories that may be beyond the capacity or interest of an organization focused on particular community issues. There may be times, though, when statistics or data on a certain issue are needed to make a case for change in a community. There are a number of different types of sources of information available to build a case for change.

1) National statistical organizations

2) Local, regional and national governments and departments

3) International bodies

4) Regulators and auditing bodies

5) Charities and non-profit institutions

6) Corporations

7) Professional bodies and unions

8) Open data initiatives in your field

Google’s Public Data page

The Guardian’s datastore


At some point, these efforts take on the aspects of a long-term campaign by an established and organized group rather than one-time issues. Maintaining a data library then becomes essential. The course suggests using social bookmarking services such as Delicious. This site uses the Diigo Group Page feature for new community paradigms to organize sites and related tags. You can also keep records of your sources in a master document (Google Docs, Microsoft Excel). Proper tagging allows you to find your data fast.
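For a group that prefers to keep its own records, a tagged data library can be as simple as the following Python sketch. It is only one possible arrangement: the names are hypothetical and the example.org addresses are placeholders rather than real sources.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """One entry in the data library."""
    title: str
    url: str
    tags: set = field(default_factory=set)


def add_source(library, title, url, tags):
    """Store a source with its tags lowercased so later searches are consistent."""
    library.append(Source(title, url, {tag.strip().lower() for tag in tags}))


def find_by_tags(library, *tags):
    """Return the titles of sources that carry every requested tag."""
    wanted = {tag.lower() for tag in tags}
    return [source.title for source in library if wanted <= source.tags]


library = []
# Placeholder entries; the addresses are not real sources.
add_source(library, "City budget 2014", "https://example.org/budget", ["Budget", "city hall"])
add_source(library, "Police crime reports", "https://example.org/crime", ["crime", "city hall"])

print(find_by_tags(library, "city hall"))
# ['City budget 2014', 'Police crime reports']
print(find_by_tags(library, "crime", "city hall"))
# ['Police crime reports']
```

Lowercasing the tags when a source is added is what keeps "Budget" and "budget" from becoming two separate tags, which is the kind of consistency proper tagging depends on.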

If starting to work as an organized group for change through new community paradigms, then the basic idea behind organizing your means of accessing, extracting, collecting and analyzing data in this fashion is the same as it is for data journalism.

"If you need to do something more than once, get a computer to do it for you," Charles Arthur (The Guardian).

Finally, even if the data is not available or not made accessible by a government institution or city hall, it still raises questions. Why is there a lack of data? What are the implications if certain data is not collected? What about data that exists but is not shared? These concerns still fall under the endeavor to create new community paradigms.

