Biographic data

Internationally recognized good practice for deciding which biographic data to collect is to specify the “minimum set” of identity attributes that uniquely represent an individual. In essence, the minimum set consists of the core attributes used to identify a person by most applications for most purposes. In addition to this data, certain other fields, such as biometric data (discussed in the next section), may also be collected to ensure statistical uniqueness and/or for later use in authentication. See Box 27 for examples of minimum data sets.

In some cases, particularly where identification and information systems have been historically weak and there are few reliable sources of data on individuals, countries may be tempted to use the building of a foundational ID system as an opportunity to collect large amounts of personal data for a variety of purposes (e.g., education status, marital status, household information, and income information needed for targeting a social program). In general, however, it is recommended to keep the number of data fields as close to the minimum set as possible. Increasing the number of attributes collected also increases:

  • Time and cost for registration. Collecting, and then vetting, many data fields increases the time it takes to register a person and is therefore a major contributor to the cost of ID systems. In addition, collecting many data fields decreases convenience and increases costs for individuals (e.g., more time spent queuing), which can create a barrier to registration.

  • Inaccuracy of data over time. Any data field that can change (e.g., address) requires additional procedures and cost to keep it up to date. Collecting more mutable data fields than necessary (e.g., education, occupation, household information) therefore increases the probability of inaccurate data and/or the frequency with which potentially costly updates must be made.

  • Risk to privacy and data protection. Collecting data without a clear use or purpose does not meet international standards on data protection and privacy, including the Fair Information Practice (FIP) principles that data collected must be proportional to the use case and fit for purpose. The more data collected, the greater the privacy risks if that data is compromised.
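
One way to hold a registration form to the minimum set is an allowlist that discards any field not explicitly approved. The following is a minimal sketch in Python under stated assumptions: the field names, the `MINIMUM_SET` constant, and the `minimize` helper are hypothetical and are not drawn from any particular ID system.

```python
# Hypothetical allowlist of biographic fields. A real system would derive this
# list from its legal framework rather than hard-coding it.
MINIMUM_SET = frozenset({"family_name", "given_name", "date_of_birth"})


def minimize(record: dict) -> dict:
    """Return a copy of `record` containing only fields in the minimum set."""
    return {key: value for key, value in record.items() if key in MINIMUM_SET}


submitted = {
    "family_name": "Doe",
    "given_name": "Jane",
    "date_of_birth": "1990-04-12",
    "occupation": "teacher",    # changes over time, not needed for identification
    "household_income": 1200,   # sectoral data, better held by the relevant program
}

stored = minimize(submitted)
print(sorted(stored))
```

Dropping fields at the point of collection, rather than filtering them later, means the extra data never has to be vetted, updated, or protected.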

In addition to the number of data fields collected, countries must also consider the implications of requiring certain biographic attributes, such as potentially sensitive data.

Box 27. Examples of minimum sets of personal data

The EU’s eIDAS Implementing Regulation (2015/1501) established a minimum set of unique identity attributes for an individual for the purposes of basic requirements for mutual recognition of digital identity schemes. Mandatory attributes include: (1) current family name(s), (2) current first name(s), (3) date of birth, and (4) a unique identifier which is as persistent as possible in time. Additional attributes include: (5) family name at birth, (6) first name at birth, (7) place of birth, (8) current address and (9) gender.

In India, to minimize the burden of registration and promote inclusion, the Aadhaar ID system limits the biographic information it collects to an individual’s (1) first name, (2) last name, (3) gender, (4) date of birth, and (5) address. Additional biometric fields used for deduplication and authentication include ten fingerprints, two iris scans, and a digital photo.

In Australia, the Trusted Digital Identity Framework: Attribute Profile (March 2019, version 1.4) defines core identity attributes as: (1) family name; (2) given name; and (3) date of birth. Other data can be collected by identity providers.

Source: Adapted from the ID Enabling Environment Assessment (IDEEA), Australian Government (2019).
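
As an illustration, the eIDAS attributes listed in Box 27 could be modeled as a record type that separates mandatory from additional fields. This is a sketch only: the class name, the field names, and the empty-string check are assumptions made for the example, not part of the regulation or of any implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MinimumDataSet:
    """Hypothetical record mirroring the eIDAS attributes in Box 27."""

    # Mandatory attributes
    current_family_name: str
    current_first_name: str
    date_of_birth: date
    unique_identifier: str
    # Additional attributes, collected only if needed
    family_name_at_birth: Optional[str] = None
    first_name_at_birth: Optional[str] = None
    place_of_birth: Optional[str] = None
    current_address: Optional[str] = None
    gender: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject blank values for the mandatory text attributes.
        for name in ("current_family_name", "current_first_name", "unique_identifier"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is mandatory and must not be empty")


person = MinimumDataSet("Doe", "Jane", date(1990, 4, 12), "ID-0001")
print(person.current_family_name, person.gender)
```

Giving the additional attributes a default of `None` makes it explicit that a record is complete without them.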

Sensitive biographic data

Although all PII can be considered “sensitive” data, certain biographic fields can be particularly sensitive, in the sense that they are personal in nature or might have a serious impact on the individual (ISO/IEC 29100 Privacy Framework). When collected or made public, such data could facilitate profiling or discrimination against a person or put them at serious risk of harm. Which attributes are deemed most sensitive will vary by context, but these typically include characteristics such as ethnicity, religion, sexual orientation, gender identity, health information, political opinions, and criminal convictions (see the IDEEA tool for further discussion).

Ideally, foundational ID systems intended to provide identification for general use should not collect and store this type of information because:

  • The risk to individuals is high

  • The utility of this data for general purposes is low

  • The ability of a foundational ID system to keep “sectoral” data accurate and up to date is lower than that of the agencies responsible for those sectors

  • Extra data fields can add significant cost

There are, of course, certain use cases for which these data are needed and collected as part of a functional ID system, such as a database used to target social transfers to an underprivileged group, or for electronic health records. In such cases, however, separation of purpose should be maintained so that sensitive data is collected and managed separately by an appropriate entity (e.g., the Ministry of Social Affairs, healthcare providers, etc.) rather than the foundational ID provider.

Furthermore, and in line with Principle 6, ID systems should not disclose this type of sensitive personal information except for pre-specified and authorized purposes. This means, for example, that these attributes should ideally not be encoded in ID numbers or printed on cards, as this makes them widely legible and is therefore a violation of privacy. In addition, access to individual-level sensitive data by other government actors should be prohibited (ideally) or severely limited and regulated. The decision to collect any sensitive data should be subject to a thorough risk assessment during the planning phase and reflected in the legal framework.
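
To make the point about ID numbers concrete, the sketch below generates a number from random digits only, so that nothing about the holder (date of birth, gender, place of registration) can be read from it. It is an illustration under stated assumptions: the 12-digit length, the function names, and the use of a Luhn check digit to catch typing errors are choices made for this example and do not describe how any particular ID system issues numbers. A real system would also need to check each new number against those already issued.

```python
import secrets


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def new_id_number(payload_length: int = 11) -> str:
    """Generate a random ID number that encodes no personal attributes."""
    payload = "".join(secrets.choice("0123456789") for _ in range(payload_length))
    return payload + luhn_check_digit(payload)


def is_valid(id_number: str) -> bool:
    """Check that the last digit matches the Luhn check digit of the rest."""
    return id_number.isdigit() and luhn_check_digit(id_number[:-1]) == id_number[-1]


number = new_id_number()
print(number, is_valid(number))
```

Because the digits are drawn from a cryptographically secure random source, two people registered on the same day or in the same place receive unrelated numbers.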

Box 28. Examples of policies regarding sensitive data

Under the EU’s GDPR, data regarding an individual’s racial or ethnic origin would be considered “special category data.” Given the sensitive nature of special category data, the GDPR provides for additional protections to ensure that the processing of such data is lawful. For example, to process special category data, an entity must identify both a lawful basis under Article 6 and a separate condition for processing special category data under Article 9.

In the United Kingdom, the Data Protection Act 2018 introduces additional safeguards in relation to special category data. For example, where processing for law enforcement purposes is “sensitive processing,” there must be an “appropriate policy document” in place which explains the procedures for securing compliance with the data protection principles and the periods for which personal data is likely to be retained.

Source: Adapted from the ID Enabling Environment Assessment (IDEEA).