Foundational ID systems may collect various types of data, as shown in Table 27. The choice of which attributes to collect is fundamental to an ID system’s inclusivity, utility, cost, and trustworthiness, including the extent to which it complies with data protection and privacy standards and good practices (see Figure 20). For example:
Which data are collected impacts who is likely to be excluded from identification (e.g., some people may not be able to provide certain biometrics).
The type of data collected will determine the uses and utility of the system for various purposes (e.g., certain use cases may require specific attributes).
At the same time, collecting more data than is needed, including sensitive attributes, increases the cost of registration, creates data protection risks, and decreases the reliability and accuracy of the system over time as non-static attributes (e.g., occupation, education, address) become out of date.
Key decisions regarding data include:
What biographic data will be collected and verified, including defining the minimum set of attributes necessary and how to handle sensitive data.
Whether biometric data will be collected, and if so, which types.
These decisions will go hand-in-hand with decisions made about the registration process to collect and proof identity data, the types of credentials and authentication mechanisms used, IT infrastructure including data storage, interoperability frameworks for data exchange, and the enabling legal framework and associated privacy and security controls adopted to govern and protect personal data.
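One way to make the "minimum set of attributes" decision concrete is to check each proposed attribute against the use cases that actually need it. The sketch below is illustrative only: the `Attribute` class, the `review_attributes` helper, the attribute names, and the two use cases are hypothetical, and which attributes count as sensitive is an assumption that would be set by the applicable legal framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    sensitive: bool = False  # assumption: flagged as sensitive by policy
    static: bool = True      # False if the value tends to change over time


def review_attributes(proposed, use_cases):
    """Split proposed attributes into those some use case needs and those none does.

    Returns (keep, drop, flagged), where `flagged` lists kept attributes that are
    sensitive or non-static and therefore need extra safeguards or update processes.
    """
    needed = set().union(*use_cases.values()) if use_cases else set()
    keep = [a for a in proposed if a.name in needed]
    drop = [a for a in proposed if a.name not in needed]
    flagged = [a for a in keep if a.sensitive or not a.static]
    return keep, drop, flagged


# Hypothetical attribute list and use cases, for illustration only.
proposed = [
    Attribute("name"),
    Attribute("date_of_birth"),
    Attribute("address", static=False),
    Attribute("occupation", static=False),
    Attribute("religion", sensitive=True),
]
use_cases = {
    "voter_roll": {"name", "date_of_birth", "address"},
    "social_transfer": {"name", "date_of_birth"},
}

keep, drop, flagged = review_attributes(proposed, use_cases)
print("keep:", [a.name for a in keep])
print("drop:", [a.name for a in drop])
print("needs safeguards or updates:", [a.name for a in flagged])
```

In this sketch, attributes that no use case requires are candidates for removal, while attributes that are kept but sensitive or likely to change are surfaced for a separate decision on safeguards and update procedures.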
Table 27. Types of data and evidence often collected by an ID system
|Type of data||Description||Examples||Purpose|
|Biographic||Biographic and other attributes of a person||Name, age, sex, address, nationality||Establishing a person’s basic identity attributes; can also be used for deduplication, but this can be inefficient and inaccurate (e.g., when many people have a similar name)|
|Biometric||Physical or behavioral attributes of a person||Fingerprints, irises, facial image, signature||Deduplication during identity proofing and/or as an authentication factor|
|Supporting evidence||Identity-related documents provided during the application process or vouched for by a trusted person||Birth certificate, passport, driving license, voter ID card, utility bill, testimony/letter by a local government official||Substantiating (“proofing”) a person’s identity during registration|
|Metadata (collected passively without input from end-user)||Information about data and/or its capture and use, including logging who has accessed the data and when||Name/ID of registration agent, time and location of registration, date/ID of official who accessed data, metadata of the biometric data, checksums||Controlling the quality of data entry, providing context for its collection, creating an audit trail of entry and use|
Source: Adapted from the Digital Identity Toolkit
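The metadata row of Table 27 (registration agent, time and location, access logging, checksums) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the identifiers, and the choice of SHA-256 over a canonical JSON encoding are not prescribed by the source, and a real system would also need tamper-resistant storage for the log.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_checksum(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def registration_metadata(record: dict, agent_id: str, location: str) -> dict:
    """Metadata captured passively at registration, without end-user input."""
    return {
        "agent_id": agent_id,
        "location": location,
        "registered_at": datetime.now(timezone.utc).isoformat(),
        "checksum": record_checksum(record),
    }


def log_access(audit_log: list, official_id: str, record_id: str) -> None:
    """Append who accessed which record, and when, to the audit trail."""
    audit_log.append({
        "official_id": official_id,
        "record_id": record_id,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    })


def is_unmodified(record: dict, metadata: dict) -> bool:
    """Check the record against the checksum stored at registration."""
    return record_checksum(record) == metadata["checksum"]


# Hypothetical record and identifiers, for illustration only.
record = {"name": "A. Example", "date_of_birth": "1990-01-01"}
meta = registration_metadata(record, agent_id="agent-017", location="office-3")

audit_log = []
log_access(audit_log, official_id="official-042", record_id="rec-0001")
```

Storing the checksum alongside the registration metadata lets a later quality check detect whether the biographic fields were altered after capture, while the audit log records each access separately from the identity data itself.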
Figure 20. Key considerations for the types of data collected
Certain groups may face technical or practical difficulties providing specific data (e.g., certain biometric modalities) and evidence (e.g., birth certificates or proof of nationality or immigration status), which may deter participation or create barriers to it.
Collecting large amounts of data increases information security risks and decreases accuracy and completeness over time as data become out of date.
Data protection standards require minimal data collection and purpose limitation to minimize risks to privacy and security (e.g., from cyberthreats, function creep, or unauthorized disclosure).
More data fields and strict evidence requirements lead to higher costs and longer registration timelines, including the time needed to validate the attributes.