Mapping your data ecosystem has revealed cracks in what data is available, how it flows and where it flows, and you are now seeing some data gaps. Let’s take a closer look at them.
Data gaps form when quality data that is critical to formulating effective policies for the citizens is not readily available.
For instance, there may be a pressing need to address educational levels in your country. The educational outcomes across different geographical regions (counties) in the country vary drastically, and it may be intuitive for you to start analysing the data there. However, are you considering data on other important aspects that impact education levels - such as residence neighbourhood, socio-economic status, gender or specific cultural aspects in these regions? Are there any political and historical factors that are also shaping citizen preferences? Do you have quality data that can help you make effective, evidence-based decisions rooted in these granular aspects?
Data gaps lead to missed opportunities for creating effective policies as well as an inability to properly attribute impact on the ground. People who are most at risk of being left behind are the ones most affected by data gaps, since they are the most likely to be under-represented or missed in the data. This section addresses how you can identify the type(s) of data gaps specific to your problem that are preventing your policies from reaching their full potential. This is a crucial transition point from analysing the data problem ‘as is’ to drawing up a plan for bridging the gaps.
A good way to start is by looking at data gaps at the ‘source’, that is, at the initial stages where the data is produced. This typically involves looking carefully at the description of the data set being used, including the raw data structure, the design of survey questions (in the case of census or other national data sets) or a general analysis provided by the data producer.
Do you believe the data you are looking for is entirely missing?
Good to know: NSOs often possess more data than is disseminated to you. Start by talking with the relevant officials at your NSO before deciding on data gaps, to avoid duplication in solutions.
Commonly seen types of data gaps range from incomplete data and a lack of timeliness to inadequate data coverage for policy decisions and shaky data flows. Below are some attributes of data to keep in mind while identifying and classifying your data gaps.
Classifying your data gaps:
Unavailable or incomplete data is often the underlying reason why you are not yet able to leverage data for effective policy design. You have probably identified this unavailability at the ‘define the problem’ or ‘map the data ecosystem’ stage. For example, suppose you want to measure the impact of climate change on natural resources in order to formulate or strengthen your climate policy, but historical data on annual mean temperatures, precipitation values or forest water balance is not easily available. Can you formulate an effective climate policy that works for all regional contexts in your country with little or no information on the impacts of climate change in specific regions over the years?
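A completeness check of this kind can be sketched in a few lines of Python. This is only an illustration: the regions, years and temperature values are hypothetical, and the function name missing_years is an assumption made for this example, not part of any official tool.

```python
# Hypothetical records: (region, year, annual mean temperature in degrees C).
# None marks a value that was never recorded.
records = [
    ("North", 2018, 14.2),
    ("North", 2019, 14.5),
    ("North", 2020, None),
    ("South", 2018, 21.0),
    ("South", 2020, 21.6),
]


def missing_years(records, regions, years):
    """Return, for each region, the years that have no usable observation."""
    observed = {
        (region, year)
        for region, year, value in records
        if value is not None
    }
    return {
        region: [year for year in years if (region, year) not in observed]
        for region in regions
    }


# Check every region against the full period you need for your policy question.
gaps = missing_years(records, ["North", "South"], range(2018, 2021))
print(gaps)
```

Listing the gaps per region, rather than counting missing values overall, shows where the historical record is too thin to support region-specific decisions.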
Data timeliness issues arise from the lag between the moment a data point is collected and the time it is used in your decisions. Most federal policies draw on data from the national census and from economic and health surveys, among others. While these data sources provide the national coverage that is important for decision-making, in most countries this data is collected only once every ten years. Administering these surveys is very expensive and increasing the frequency is not feasible. However, it is imperative that policies account for the fact that data collected ten years ago may no longer be relevant today, even when supplemented with predictions, and especially in the post-COVID era.
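One simple way to make this lag visible is to compute the age of each data source on the date of your decision and flag anything older than a threshold you choose. The sketch below assumes hypothetical source names and collection dates, and a five-year threshold picked purely for illustration.

```python
from datetime import date

# Hypothetical sources with the date each was last collected.
sources = {
    "national census": date(2011, 3, 1),
    "household health survey": date(2019, 6, 15),
    "labour force survey": date(2023, 1, 10),
}


def stale_sources(sources, as_of, max_age_years):
    """Return sources older than max_age_years on the as_of date, with their age in years."""
    stale = {}
    for name, collected in sources.items():
        # Approximate age in years; 365.25 averages out leap years.
        age_years = (as_of - collected).days / 365.25
        if age_years > max_age_years:
            stale[name] = round(age_years, 1)
    return stale


flagged = stale_sources(sources, as_of=date(2024, 1, 1), max_age_years=5)
print(flagged)
```

The right threshold depends on how quickly the underlying situation changes: population counts may age more slowly than, say, employment or health indicators.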
Data accuracy is the degree to which data represents the real-world scenario and conforms to a verifiable source, i.e., the consistency of data with reality. Accurate data is essential for forecasting, planning, programme budgeting and strategy development in governments. Inaccurate data, by contrast, can lead to wrong decisions with serious unintended consequences. For example, education data typically involves data compiled from school districts on graduation rates, drop-out rates, test score averages and attendance rates. Education data is often used to measure the success of a state or a school district, and policies are evaluated and redesigned based on it. But there’s a problem. This information is not always reliable, and the fault lies in the way the data is collected (data entry), compiled and presented.
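Two basic accuracy checks follow from this definition: whether a value is plausible at all, and whether it agrees with a verifiable source. The sketch below illustrates both, assuming hypothetical district names, made-up graduation rates, an independent audit as the reference, and an arbitrary tolerance of two percentage points.

```python
# Hypothetical graduation rates (percent) as reported by districts,
# and the same rates taken from an independent audit.
reported = {"District A": 91.0, "District B": 78.5, "District C": 104.0}
audited = {"District A": 90.5, "District B": 70.0, "District C": 88.0}


def accuracy_issues(reported, audited, tolerance=2.0):
    """Flag values that are implausible or that disagree with the audited source."""
    issues = {}
    for district, value in reported.items():
        problems = []
        # Plausibility check: a rate must lie between 0 and 100 percent.
        if not 0 <= value <= 100:
            problems.append("outside 0-100 range")
        # Consistency check against the verifiable source, where one exists.
        reference = audited.get(district)
        if reference is not None and abs(value - reference) > tolerance:
            problems.append("differs from audit")
        if problems:
            issues[district] = problems
    return issues


issues = accuracy_issues(reported, audited)
print(issues)
```

Checks like these do not explain why a value is wrong, but they narrow down where data entry, compilation or presentation should be reviewed.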
Tips for ensuring data accuracy
The data you work with needs to be detailed, granular and disaggregated for the conditions of different sectors of society to be understood, for example, showing:
Not all of these details are equally relevant to every issue. Addressing education levels in a country may demand different levels of data disaggregation than addressing agricultural productivity. Similarly, other factors, such as mobile phone ownership and bank account ownership, are playing an increasingly important role in understanding the context of present-day problems such as multidimensional poverty. Even more important is how these different factors, taken together, can completely miss certain populations. For example, data from certain tribes in your country might not be easily available. Even with what you have, you may be missing representation of women or other sexes within these tribes. How, then, could you make your policies and programmes work for the entire population? A lack of granularity in these different aspects may create blind spots in your work.
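Blind spots of this kind can be found by counting how many records fall into each combination of characteristics you care about. The following sketch assumes hypothetical survey respondents described only by population group and sex, and treats any combination with fewer than min_count records as a potential blind spot; the names and the threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Hypothetical survey respondents: (population group, sex).
respondents = [
    ("Group A", "female"),
    ("Group A", "male"),
    ("Group A", "male"),
    ("Group B", "male"),
    ("Group B", "male"),
]


def thin_cells(respondents, groups, sexes, min_count=1):
    """Return the (group, sex) combinations with fewer than min_count records."""
    counts = Counter(respondents)
    return [
        cell
        for cell in product(groups, sexes)
        if counts[cell] < min_count
    ]


blind_spots = thin_cells(respondents, ["Group A", "Group B"], ["female", "male"])
print(blind_spots)
```

Note that looking at each characteristic separately would hide the problem here: both groups and both sexes appear in the data, yet one combination is entirely absent.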
While data gaps can exist in many shapes and forms, recent years have brought into the spotlight the gaps that result from women being consistently underrepresented or overlooked in data ecosystems. This gender data gap has led to a lack of knowledge about women’s living conditions. Acknowledging this challenge, UN Women’s global gender data programme, Women Count, in collaboration with the Inter-Secretariat Working Group on Household Surveys (ISWGHS), has produced the Counted and Visible: Toolkit to Better Utilize Existing Data from Household Surveys to Generate Disaggregated Gender Statistics. This resource may help you to bridge the gender data gap.
The Counted and Visible Toolkit provides recommendations and practical country examples on how to utilize existing data to generate disaggregated gender statistics.
Granularity of data is one of the biggest contributors to ensuring the quality and reliability of data. For more information on frameworks for ensuring data quality, see the data sources and reliability section.
Different aspects of your problem will have different levels of data maturity, so some gaps are more easily identifiable than others. However, one key test before moving forward in the process is to answer the question ‘Will I have all or most of the data I need to solve my problem if I am able to access the data identified?’ Sub-parts to this question may look like this:
Once you have identified your data gaps, the next step is to understand the feasibility of bridging them, given the limited resources and competing priorities you may have. Classifying your data gaps may be a good way to understand this feasibility.
The next section will take you through a number of resources, recommendations and examples on different types of data sources and how reliable data can be accessed, collected and used. However, at this stage, there are already a few resources that can be used to dive deep into your data gaps.