The volume of data generated every minute carries a risk: that this data will not be protected. Each time data is used or re-used for a different purpose, an individual’s right to privacy must be weighed against the rights of citizens and communities and the benefits for society more widely. If privacy is not considered when data is collected, re-used and shared, citizens’ trust will be undermined and a variety of regulations and laws may be broken. Citizens and the government could also be exposed to fraud and other harmful activities.
Many governments lack the necessary legislative frameworks and regulatory environments for data, which makes individuals more susceptible to fraud, hacking, phishing and identity theft. When using new data or re-using existing data for decision-making, it is especially important to consider the confidentiality, integrity and availability of that data.
The first question to ask yourself when planning to process personal data is “What is my reason or justification for processing this personal data?” This question matters because processing personal data is usually lawful only when there is a legal basis for it.
The GDPR, for instance, sets out what these potential legal bases are, namely: consent; contract; legal obligation; vital interests; public task; or legitimate interests. Processing personal data therefore may require that the data subject has consented to the processing.
According to the European Commission (EC), “a consent request needs to be presented in a clear and concise way, using language that is easy to understand, and be clearly distinguishable from other pieces of information such as terms and conditions. […] Consent must be freely given, specific, informed and unambiguous.”
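As a rough illustration of how the legal-basis question and the four consent qualities quoted above might be recorded before any processing starts, here is a minimal Python sketch. The names `LegalBasis`, `ConsentRecord` and `may_process` are hypothetical and do not come from the GDPR or any library. The check is deliberately simplified: in practice every legal basis has its own conditions, which this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LegalBasis(Enum):
    """The six legal bases listed in the text (hypothetical enum)."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record with one flag per quality named in the EC quote."""
    freely_given: bool
    specific: bool
    informed: bool
    unambiguous: bool  # e.g. a clear opt-in action by the data subject

    def is_valid(self) -> bool:
        # Consent counts only if all four qualities are present.
        return all((self.freely_given, self.specific,
                    self.informed, self.unambiguous))


def may_process(basis: Optional[LegalBasis],
                consent: Optional[ConsentRecord] = None) -> bool:
    """Simplified gate: a basis must be recorded, and if that basis is
    consent, a valid consent record must accompany it."""
    if basis is None:
        return False
    if basis is LegalBasis.CONSENT:
        return consent is not None and consent.is_valid()
    # Assumption: other bases are accepted as recorded; their own legal
    # conditions are outside the scope of this sketch.
    return True
```

Recording the basis as an explicit value, rather than leaving it implicit, makes it possible to refuse processing by default when no justification has been documented.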
The EC specifies the information that should be provided to data subjects when obtaining consent for processing personal data, including:
Also see this helpful checklist from the UK Information Commissioner’s Office on what to consider when asking for, recording and managing consent.
Usually, consent should meet the standard of an unambiguous indication by clear affirmative action (opt-in). As long as this standard is met, different ways of obtaining consent are possible, including:
In recent years, new approaches to informed consent have been developed to enable ongoing engagement and communication between individuals and the users of their data. Dynamic consent is one such example, mainly applied to facilitate participant engagement in clinical and research activities over time.
Authorized public purpose access (APPA) is another innovative method beyond the explicit, opt-in consent of individuals, promoting data flows while simultaneously protecting people’s rights.
Data privacy is deeply intertwined with data governance, as protecting data necessitates keeping data in secure locations and in the right hands. With the basics of data stewardship covered under the data governance section, we will focus here on keeping personal data private during public use.
Personal data – like names, demographic data and political beliefs – requires particular protections, so that individuals are not identified or targeted based on the data they provide. For instance, an individual should not be targeted for filling out a survey on who they plan to vote for in the next election. In this instance, the personal data must be kept private, and systems need to be developed to keep it that way.
The UK Office for National Statistics offers a helpful tool for outlining the five areas of safety that governments should address when using personal data for public use:
Concerns about privacy and individual protection arise when collecting personal data on subjects such as race, gender, age and socioeconomic status. The decision to collect, disseminate and use such data involves a constant tension between the public interest and the need for privacy. Although additional data points can help us better understand a situation, they may also compromise the anonymity of the data and expose individuals to re-identification risks. This is why all people, including public officials using personal data, must carefully consider these trade-offs when selecting which data to use.
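One common way to make this trade-off measurable is a k-anonymity style check: count how many records share each combination of potentially identifying attributes, and treat very small groups as a re-identification risk. The Python sketch below is only an illustration. The function names, the column names and the sample records are all invented for this example, and k-anonymity is one possible measure among several, not a technique prescribed by the text.

```python
from collections import Counter
from typing import Dict, List, Sequence


def _key(record: Dict[str, str], columns: Sequence[str]) -> tuple:
    # The combination of values that could single a person out.
    return tuple(record[c] for c in columns)


def smallest_group_size(records: Sequence[Dict[str, str]],
                        columns: Sequence[str]) -> int:
    """Size of the smallest group sharing the same values in `columns`.
    A result of 1 means at least one person is unique in the data."""
    groups = Counter(_key(r, columns) for r in records)
    return min(groups.values()) if groups else 0


def risky_records(records: Sequence[Dict[str, str]],
                  columns: Sequence[str], k: int) -> List[Dict[str, str]]:
    """Records whose group has fewer than k members."""
    groups = Counter(_key(r, columns) for r in records)
    return [r for r in records if groups[_key(r, columns)] < k]


# Invented sample data, for illustration only.
records = [
    {"age_band": "30-39", "gender": "F", "district": "North"},
    {"age_band": "30-39", "gender": "F", "district": "North"},
    {"age_band": "30-39", "gender": "M", "district": "North"},
    {"age_band": "30-39", "gender": "M", "district": "North"},
    {"age_band": "60-69", "gender": "F", "district": "South"},
    {"age_band": "60-69", "gender": "F", "district": "North"},
]

with_district = smallest_group_size(records, ["age_band", "gender", "district"])
without_district = smallest_group_size(records, ["age_band", "gender"])
```

In this invented data set, publishing all three attributes leaves two people unique (`with_district` is 1), while dropping the district raises the smallest group to two (`without_district` is 2). The extra data point adds analytical detail at the cost of anonymity.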
In addition to best practices that endeavor to keep data private, there are also more sophisticated emerging technologies, known collectively as Privacy Enhancing Technologies (PETs), which can better protect the privacy and security of data, especially when data are shared.
PETs work by limiting access to individual data, either by transforming it, encrypting it or storing it on a different system, while still enabling analysis. PETs can be an enabler for innovation by allowing for the safe sharing and processing of data. PETs can also enhance privacy in existing projects. However, PETs cannot fully address the privacy challenges of every data-sharing system and must be applied within a wider data privacy and protection infrastructure.
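To make "transforming data while still enabling analysis" concrete, the sketch below shows one transformation-based technique, differential privacy, applied to a simple count. This technique is chosen here as an example and is not singled out by the text above. The function names and the `epsilon` parameter are illustrative, and Python's `random` module is used only for readability: it is not cryptographically secure, so a real deployment would rely on a vetted library instead.

```python
import random
from typing import Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centred on 0.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values: Iterable[bool], epsilon: float,
                rng: Optional[random.Random] = None) -> float:
    """Count the True values, then add Laplace noise before release.

    A smaller epsilon means more noise and stronger privacy; a larger
    epsilon means less noise and a more accurate published figure.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if v)
    # Adding or removing one person changes a count by at most 1, so
    # the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Invented survey answers, for illustration only: 40 "yes", 60 "no".
answers = [True] * 40 + [False] * 60
published = noisy_count(answers, epsilon=0.5)
```

The analyst receives only `published`, a figure close to the true count, while no single respondent's answer can be confidently inferred from it. This mirrors the point made above: the technique supports analysis, but it has to sit inside a wider set of access and governance controls.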
A detailed overview of PETs is available from the Centre for Data Ethics and Innovation. The most common PETs include:
To determine which types of PETs may be beneficial in your upcoming projects, the Centre for Data Ethics and Innovation’s decision-tree Adoption Guide can be helpful (see illustration below).
Another comprehensive overview of relevant privacy-preserving techniques can be found in this Handbook from the Big Data UN Global Working Group.
Centre for Data Ethics and Innovation’s PET Adoption Guide
Data storage needs to be handled by database management systems (DBMS). A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database. A relational database management system (RDBMS) is used for creating, storing and connecting structured data and then rapidly retrieving it via a query language.
Rules can be applied for securing data, connecting data and enforcing referential integrity. This is vital to ensuring data quality, which cannot be guaranteed using unmanaged data stores such as Excel. The most widely used commercial DBMS are Microsoft SQL Server, Oracle, Sybase and IBM. There are also open-source examples including MySQL, MariaDB, MongoDB and PostgreSQL (UN DESA).
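The following minimal sketch shows what enforcing referential integrity looks like in practice. It uses SQLite through Python's built-in `sqlite3` module purely as a convenient stand-in for the systems named above. The `district` and `household` tables and the `try_insert` helper are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
# for the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE district (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE household (
           id INTEGER PRIMARY KEY,
           district_id INTEGER NOT NULL REFERENCES district(id),
           members INTEGER NOT NULL CHECK (members > 0)
       )"""
)
conn.execute("INSERT INTO district (id, name) VALUES (1, 'North')")
conn.execute("INSERT INTO household (district_id, members) VALUES (1, 4)")


def try_insert(district_id: int, members: int) -> bool:
    """Attempt to add a household; return False if a rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO household (district_id, members) VALUES (?, ?)",
            (district_id, members),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised for both the foreign-key rule and the CHECK rule.
        return False


accepted = try_insert(1, 3)         # district 1 exists, so this is stored
rejected_link = try_insert(99, 2)   # no district 99: referential integrity
rejected_value = try_insert(1, 0)   # zero members breaks the CHECK rule

# Connected data can then be retrieved with a query.
totals = conn.execute(
    """SELECT d.name, SUM(h.members)
       FROM household AS h JOIN district AS d ON d.id = h.district_id
       GROUP BY d.name"""
).fetchall()
```

A spreadsheet would silently accept the household pointing at a non-existent district. Here the database refuses it, which is the kind of quality guarantee the paragraph above refers to.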