You already have a good understanding of your current data ecosystem, meaning you know which datasets are available to you and where you can find them (if not, see this guidance). What you need now is additional data and information. This section will help you identify new, trusted data sources to address your data needs.
This section provides an overview of conventional and emerging data sources that can complement the sources you have already identified. Because you may be wondering how far you can trust these new sources, it also offers guidance on identifying reliable data sources.
A good first step is to create a consolidated list of the data needed and not yet available. The list should be informed by the earlier problem definition and data ecosystem mapping exercises. If the list is long, prioritizing the different data needs can help.
As an example, let’s assume you are working on a policy to improve the situation of people with disabilities in your country. You may already know how many people have a disability, what type of disability they have and what their background is (e.g. education level, income level). However, you lack data on their everyday lives: Can they access public spaces such as public transport and theaters? Is sufficient housing available for people with disabilities?
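Your data gaps are therefore the accessibility of public spaces and the availability of suitable housing for people with disabilities.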
As a reminder, below you will find the different types of data that you may be looking for.
The next question is, where to find these data?
The most established data sources are often the National Statistics Office (NSO) and national, regional and municipal governments and their affiliated agencies.
Linking back to the example above, the data on the number of people with a disability, the type of disability and their background may have been provided by your NSO.
You are likely aware of these sources and the data they provide; however, you may still find the following summary helpful:
More relevant when identifying new data sources are non-governmental, non-profit public sources:
New trends in data collection:
Citizen-generated data stems from the practice of involving the public (the non-scientific community) in gathering knowledge. While it can take different forms, it involves citizens collecting and sharing specific information for a dedicated, often non-profit, organization. Examples include volunteers assisting NASA in identifying clouds on Mars and Argentinians separating and documenting their solid waste in townships. This approach is becoming increasingly relevant as technology makes data collection easier. For political decision makers, it can offer significant benefits because more relevant data is collected at the local level, providing important insights for policy design.
Private sources can generally be split into two categories.
The first category includes companies (e.g. multinationals and start-ups) that collect data for their own operations. A standard example is social media companies, which own data on user interests that users have shared on their profiles or in their posts. Analysing this data can, for example, help you understand how ideas and movements related to disability spread on social media and how they are perceived.
The second category is service providers that specialize in data collection and analysis (for example, to conduct market studies).
While both may be relevant data suppliers, the first category provides more data that can be used for policymaking. Focusing on that category, the following types of companies may be relevant to you:
The data provided by these sources may be closed data, shared data or openly available data. To learn more about how to access them, see the next section on accessing and collecting data.
Identifying the data sources that suit your needs is often an iterative process. It usually requires checking in with your NSO to confirm what data they have available. In addition, someone will need to conduct brief research (desk research, expert interviews) on which non-governmental and private organizations are active in the relevant field. After that, talking directly with these actors has proven helpful for understanding their willingness to share data and the quality of their data.
In that process, there are two key considerations to keep in mind:
Reliable data, and thus trustworthy data sources, are essential for policymakers to create effective policies. The previous section on “identifying data gaps” provided guidance on ensuring a conventional data source is complete, timely, accurate and of sufficient quality.
With new data sources, however, knowing when to trust a source becomes more complex. These data are often collected for purposes other than informing policy, which raises questions about the data collection and management methods, the representativeness of the data and the privacy standards applied. Non-profits in particular may have limited resources for managing their data, which increases the risk of quality issues. What is more, new data sources often provide large datasets, which can amplify existing biases. For example, in countries where a large share of the population does not yet use the internet, data collected from online activity (such as social media) may not be representative. In addition to the measures outlined in the previous section, you may want to consider the following aspects, which build on this resource.
Are they reputable? Organizations that are larger or embedded in the local ecosystem can be trusted more easily. Confidentially asking trusted partners about an organization is another way to check its reputation. It is also crucial to critically question any political agenda the organization may have.
Do they have a track record of ethical data production? You can explore this by checking whether they have worked with other governments in the past, published open datasets or provided resources to help the public use data more ethically.
Do they have sufficient resources to answer your requests, provide insights into the data and continue collecting the data ethically? Some of this information may be available online; engaging with them over time will also give you deeper insight into their resources.
Are data being released in clean datasets? This question may be answered by the NSO or a data analyst on your team.
Has sufficient metadata and other contextual information been provided? There are a few key questions that you can ask the data provider and then double check their answers with your data experts:
Parts of the information may not be available publicly but should be provided by the data sources upon request. Ensuring you have this level of transparency with the data source will also be important for data analysis later, as comparing and combining data requires your team to have a good understanding of the methodology behind the datasets.
Once data has been shared, does the data source allow for feedback on datasets as well as independent verification of sensitive data? Ideally, the data source agrees to a continuous engagement where feedback provided by you is taken into consideration.
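A data team may find it useful to keep the trust questions above as a simple, shared checklist for each candidate source. The Python sketch that follows is one hypothetical way to do that: the class `SourceAssessment`, its field names and the helper `open_questions` are illustrative assumptions, not part of any UNDP tool. Each field records a yes/no answer to one of the questions, and the helper lists the criteria that are not yet met.

```python
from dataclasses import dataclass, fields


@dataclass
class SourceAssessment:
    """Yes/no answers to the trust questions for one candidate data source.

    Hypothetical structure: the field names are illustrative labels for the
    questions in this section, not an official schema.
    """
    name: str
    reputable: bool
    ethical_track_record: bool
    sufficient_resources: bool
    clean_datasets: bool
    metadata_provided: bool
    feedback_and_verification: bool


def open_questions(assessment: SourceAssessment) -> list[str]:
    """Return the criteria not (yet) met, in the order they were defined."""
    return [
        f.name
        for f in fields(assessment)
        if f.name != "name" and not getattr(assessment, f.name)
    ]


# Usage example with a hypothetical non-profit source.
ngo = SourceAssessment(
    name="Hypothetical disability NGO",
    reputable=True,
    ethical_track_record=True,
    sufficient_resources=False,
    clean_datasets=True,
    metadata_provided=False,
    feedback_and_verification=True,
)
print(open_questions(ngo))  # ['sufficient_resources', 'metadata_provided']
```

Recording answers as yes/no rather than as a numeric score keeps the checklist close to the questions as posed. An unmet criterion is then a topic for follow-up with the source, for example a request for metadata, rather than an automatic reason to discard it.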
After working through this section, you should have a list of (new) data sources that will provide more relevant data to address your policy problem. You may have to go back to the ecosystem mapping exercise and the “identifying data gaps” section to confirm that all data needs are covered.
In addition, it is important to have confidence in the data sources identified, meaning that they provide complete, timely, accurate and sufficient data, with the necessary quality and privacy measures in place, as outlined above.
It is important to remember that this is an iterative process. You may identify a private sector partner as a new data source now, only to realize later that the collaboration will not materialize. That is normal.
Mapping your data ecosystem and understanding where data and data producers might already exist in your work is an important step towards learning what data you may or may not have available to you already. The next step is to get access to the data or collect it yourself.