Hosting options

Significant computing resources are needed to store and process identity data (e.g., in response to identity verification queries), and there are multiple options for hosting this infrastructure. Key decisions include:

  1. Who operates the physical facilities (“datacenters”) that house the IT infrastructure, providing power, cooling, physical security, network connectivity, and so on: the ID authority, another government agency, or some form of private sector provider.

  2. Whether the infrastructure itself is dedicated to the ID system (so-called “single-tenant”) or part of a pool of shared resources, available on-demand to multiple clients (so-called “multi-tenant” or “cloud” computing).

In particular, practitioners should evaluate the following solutions for ID-related hosting in light of system requirements and country context:

  • Dedicated datacenter operated by the ID authority. Some countries choose to host data and applications in-house through dedicated datacenters. This option gives full control over all components of the ID system, including physical facilities and access, hardware, software (operating systems, applications, technical services), and data. However, it also requires the ID authority to take on significant responsibility and all capital and operating expenses for those components, as well as to ensure the technical expertise needed to support ongoing operations and maintenance. It could also needlessly duplicate an existing shared datacenter operated by a central IT ministry or similar agency, where one exists.

  • Shared datacenter operated by another government agency. Another option is to use a datacenter run by a central IT ministry or similar agency that provides shared hosting services for multiple government agencies (and sometimes also the private sector). Costs for this solution are typically lower than running an in-house datacenter because of the scale advantages of sharing capital and operating costs. The operator will often offer additional services (such as server maintenance and patching, backup and restore, etc.) with similar economies of scale. There are security and resilience implications of the multi-tenant model, since the underlying infrastructure is shared with other clients. These are discussed in more detail below. A shared datacenter typically supports three broad models of provision:

    • Colocation. In this model, the shared datacenter provider offers space, power, physical security, and network connectivity. The ID authority provides, configures, and operates its own infrastructure (servers, storage). Accordingly, the authority bears the capital cost of its infrastructure, the allocated charge for power and space, and the staffing and running costs for operations and maintenance.

    • Managed hosting. In this model, the datacenter provides and operates the IT infrastructure as well as the physical facility where it resides, in a “single tenant” configuration designed for and dedicated to the ID authority. The provider will either charge an up-front capital cost for this infrastructure or use a leasing model with a specified minimum term. A regular service charge will cover operations and maintenance. Limited flexibility to increase or decrease the dedicated capacity over the term of the contract may be built in.

    • Government (“private”) cloud. In the cloud model, the datacenter operator is also responsible for all physical facilities and IT infrastructure but uses modern “virtualization” technologies to pool this infrastructure and make it available on a flexible, pay-by-usage basis to multiple clients (known as “Infrastructure-as-a-Service” or IaaS). Cloud operators often offer additional service layers such as databases, authentication services, or analytics platforms (so-called “Platform-as-a-Service” or PaaS). There is no up-front capital cost, and all the operating and maintenance costs of the underlying infrastructure are included in the pay-by-usage charges. This model is also extremely flexible, allowing the authority to provision (or de-provision) infrastructure capacity very rapidly and pay only for what it uses (with charges by the second in some cases). In return, the authority sacrifices control over the configuration of the underlying hardware and must choose from a menu of infrastructure configurations offered by the cloud provider, which may not support applications with specific infrastructure requirements.

  • Shared datacenter operated by commercial organizations. Private sector firms also offer infrastructure hosting on the same models just described: colocation, managed hosting, and multi-tenant cloud. They operate in very similar ways to shared government datacenters but are likely to serve a wider range of public and private sector clients, with a correspondingly wider range of services (some private sector operators also have datacenters tailored and restricted to government agencies). In addition, the physical location of the datacenters is in most cases outside the country, and the stored data may migrate across datacenters depending on a number of factors independent of the contracting authority. There are also so-called “hyperscale cloud” platforms (such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform) from specialists in the multi-tenant model that have multiple datacenters across continents, with high-capacity networks connecting them to each other and to the wider internet. Because of their scale, they can offer a very wide range of IaaS and PaaS options and can also replicate data and infrastructure across geographies to support highly resilient, highly available large-scale applications. As these platforms are open to any client able to pay, they are known as “public” clouds.

  • Hybrid approaches. It is possible to combine elements of the models described above in a hybrid solution. For example, a government datacenter could be used for core storage (e.g., sensitive and restricted data) and compute provision, with flexible hyperscale public cloud capacity added to meet peak demands on the system; or commercial managed hosting could be used for the current “production” solution, with cloud capacity used for development and testing of new features and applications. (A hybrid approach that combines clouds from multiple providers is also known as “multi-cloud”.) Hybrid hosting strategies are very common in commercial IT, as they have the potential to offer a “best of all worlds” solution. However, there is clearly added technical and commercial complexity in managing multiple infrastructure models with multiple providers; this needs to be balanced against the benefits.
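The hybrid split described above can be made concrete with a small routing sketch. Everything named here (the data categories, the backend labels, and the rule that sensitive records stay in country) is a hypothetical illustration, not a prescribed design.

```python
# Toy placement rule for a hybrid deployment: records subject to
# data-residency rules stay in the in-country government datacenter,
# bursty stateless workloads go to a public cloud, and everything else
# runs on managed hosting. Categories and backend names are hypothetical.
SENSITIVE = {"biometric_template", "health_record", "tax_record"}

def choose_backend(record_type: str, workload: str) -> str:
    if record_type in SENSITIVE:
        return "government-datacenter"   # data-residency requirement
    if workload == "burst":
        return "public-cloud"            # elastic peak capacity
    return "managed-hosting"             # steady, non-sensitive workloads

print(choose_backend("biometric_template", "steady"))  # government-datacenter
print(choose_backend("web_frontend", "burst"))         # public-cloud
```

In practice such rules would be driven by the country's data protection law and the contracts with each provider, not hard-coded categories.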

Some key differences between these options are summarized in Table 29.

Table 29. Comparison of data storage options

Each option is compared on CAPEX, OPEX, required staff, control over infrastructure, elasticity & flexibility, network & connectivity, and data location.

Dedicated, agency-owned datacenter
  • CAPEX: Most expensive; includes the cost of equipment and the datacenter facility (building, fire protection, power, etc.)
  • OPEX: Most expensive; covers equipment and datacenter expenses, including staff
  • Required staff: Datacenter, network, physical security, server/system administration, application/database administration, cybersecurity
  • Control over infrastructure: Full control over data and all components of the infrastructure
  • Elasticity & flexibility: No elasticity; least flexibility in provided services
  • Network & connectivity: Good LAN connectivity required (and good G-Net connectivity for data sharing)
  • Data location: On premises

Shared datacenter – colocation (government and private)
  • CAPEX: For the ID authority’s own colocated equipment
  • OPEX: For colocation charges and own equipment
  • Required staff: Server/system administration, application/database administration, cybersecurity
  • Control over infrastructure: Control over data and colocated equipment
  • Elasticity & flexibility: No elasticity; least flexibility in provided services
  • Network & connectivity: Good G-Net connectivity required
  • Data location: In country

Shared datacenter – managed hosting (government and private)
  • CAPEX: Typically borne by the datacenter provider, though this can vary by provider
  • OPEX: For managed services
  • Required staff: Application/database administration
  • Control over infrastructure: Limited, as provided by the contract
  • Elasticity & flexibility: Limited, as provided by the contract
  • Network & connectivity: Good G-Net connectivity required
  • Data location: Typically in country

Government cloud
  • CAPEX: None for the ID authority; costs are borne by the cloud operator
  • OPEX: For resource usage (pay-per-use model)
  • Required staff: Application/database administration
  • Control over infrastructure: Control over data and own applications
  • Elasticity & flexibility: Elastic; some flexibility in service availability
  • Network & connectivity: Good G-Net connectivity required
  • Data location: In country

Private-sector operated public cloud
  • CAPEX: None for the ID authority; costs are borne by the cloud operator
  • OPEX: For resource usage (pay-per-use model)
  • Required staff: Application administration
  • Control over infrastructure: Control over data
  • Elasticity & flexibility: Elastic; flexible service availability
  • Network & connectivity: Low latency required for business-critical systems (e.g., fintech)
  • Data location: Anywhere the provider operates datacenters

Hybrid cloud
  • CAPEX and OPEX: Depend on the mix of models combined
  • Required staff: Application/database administration
  • Control over infrastructure: Control over data and own applications
  • Elasticity & flexibility: Elastic; most flexibility in service availability
  • Network & connectivity: Good G-Net connectivity required
  • Data location: Sensitive data stored in country; other data stored in private-provider datacenters with a global scale/footprint

The appropriate choice of a data storage solution will depend on a number of factors, including:

  • Existing infrastructure and service providers: The viability of any particular data storage strategy or solution depends, foremost, on the availability of dedicated datacenters and trusted government- and/or private-sector-provided datacenters and cloud services. If an in-house datacenter does not already exist, building one will be a major expense for the ID program and may take considerable time (potentially several years, depending on procurement procedures, required authorizations, the existence of power lines, connectivity, etc.). At the same time, it may be the only option if shared datacenters and/or cloud services are not available or are undesirable for other reasons. When data storage is contracted to a government IT agency or a private-sector provider, it is essential that the service provider offer appropriate service-level agreements (SLAs) to meet the needs of the ID authority.

  • Storage and processing capacity and elasticity. Data storage services must have enough storage capacity and processing power to meet demand during high-volume start-up (e.g., mass registration), peaks in demand (e.g., around cash transfer distribution dates), and medium- to long-term growth (i.e., as a function of population size and the expansion of services). A primary advantage of cloud services is that they offer the flexibility to automatically add or remove computing resources as needed to meet requirements. In contrast, datacenters require purchasing enough equipment to handle spikes in usage; during low-volume periods, much of this equipment sits idle, in some cases for as much as 90-95 percent of the time. At the same time, depending on the technical specifications and service requirements, cloud hosting may not be the best solution for certain functions that require dedicated high-capacity computational processing power, such as an Automated Biometric Identification System (ABIS), which is extremely resource intensive. In these cases, an ID authority can host the functions that do not have such high processing requirements (e.g., the database, core software, and the authentication system) in the cloud, while the ABIS and its biometric libraries run in a smaller, dedicated datacenter.

  • Cost and budgeting: Datacenters have higher capital expenses (CAPEX) for the ID authority than cloud services, as well as higher ongoing staffing costs. As described above, optimal performance for a datacenter requires paying for equipment that often sits idle, while cloud services offer a pay-per-use model. At the same time, this means that the monthly or annual operating expenses (OPEX) of datacenters are more predictable than those of cloud services, which can be highly volatile due to fluctuations in activity. Therefore, although cloud computing may help ID systems optimize resources, it is only feasible if the authority’s budget and business model can accommodate variable expenses. Under that model, the budget allocated to the ID authority has to be negotiated and agreed with the government to ensure yearly appropriations sufficient to sustain the service.

  • Connectivity. Transferring and updating data in an ID database requires digital infrastructure with sufficient speed and reliability to connect to the datacenter and/or cloud. Private-sector cloud services in particular require fast, consistent network connections. If network infrastructure is unreliable or already highly utilized, cloud computing may overburden it, causing applications to crash or become inaccessible. In such situations, a private cloud on a dedicated line could be considered, but a public or hybrid cloud would be unviable without infrastructure upgrades. In the case of biometric verification, high-speed broadband connectivity is needed to ensure that the software applications used for matching perform robustly, reliably, and securely.

  • Control and location of data. All of the solutions described above provide control over data; however, they vary with regard to control over equipment and applications (highest with in-house datacenters, lowest with public clouds). In addition, with government-provided hosting solutions, including in-house datacenters, shared datacenters, and private clouds, data will remain within the national territory. In contrast, data stored in a public cloud or the public portion of a hybrid cloud may be stored in multiple locations abroad. Where a country prohibits the transfer and/or storage of certain data (e.g., health, tax, or personal data) abroad, these options will be unviable. A hybrid cloud could still be viable provided that data resides within the country and only additional services (e.g., anti-DDoS, load balancing, etc.) are contracted from the public cloud provider(s). At the same time, it is important to note that storing or transferring data abroad does not necessarily increase security and privacy risks, and in some cases keeping a back-up off-site (as Estonia does) could help mitigate the effects of severe data loss events. It is important to understand the backup and disaster recovery processes and policies used by the cloud provider so that the contracting agency is fully confident of its reliability, security, and privacy.

  • Application dependencies: Applications that depend on specific hardware, such as a particular chipset or an external device like a fingerprint reader, might not be a good fit for cloud-based services unless those dependencies are specifically addressed. Similarly, if an application depends on an operating system or set of libraries that cannot be used in the cloud, or cannot be virtualized, it cannot be moved to the cloud. In particular, public cloud operators generally provide very little customization for individual tenants, so development must focus on applications that can run in a cloud environment.

  • Security and data protection: Whatever the solution, the data host must have sufficient capacity (staff, policies, operating procedures, and technology) to protect personal data from unauthorized access, misuse, loss, or theft. This includes physical and cybersecurity measures and disaster recovery mechanisms. Putting data on a public server accessed over the open internet, as occurs in both public and hybrid cloud models, is inherently riskier than hosting in datacenters or private government clouds that are not connected to external networks, although connections to the ID system can still be configured via secure VPN channels. At the same time, major private-sector cloud providers typically have advanced protections against internal and external threats, follow security best practices, and maintain multiple datacenters that provide automatic backups. In contrast, many government cloud and datacenter providers may have smaller dedicated security teams. Nevertheless, it is a misconception that placing sensitive data and platforms in the cloud is by default more secure than local hosting; cybersecurity arrangements for cloud hosting are simply different from those made locally and need to be implemented just as carefully and by design. While placing data in the cloud and partially outsourcing cybersecurity may ultimately lead to a more secure platform in some countries (those with low cybersecurity capacity, inadequately secured datacenters, and weak processes), it would be inadvisable to assume so in every case. A reputable vendor’s arrangements, risk tolerance, and response procedures should be evaluated carefully and regularly, and aligned with good practices, before any data is uploaded.

    For a deeper assessment of when and how to use cloud computing for IT systems in general, see the World Bank’s Cloud Readiness Toolkit Assessment (World Bank 2016a).
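The idle-capacity point in the capacity-and-elasticity bullet above can be illustrated with a back-of-envelope calculation. The demand profile and capacity units below are invented purely for illustration; only the comparison logic is being demonstrated.

```python
# Back-of-envelope comparison of peak-provisioned ("fixed") capacity
# versus elastic pay-per-use capacity. All numbers are hypothetical,
# chosen only to illustrate the idle-capacity effect described above.

# Hourly demand over one day, in arbitrary "server units": mostly low
# load, with a short registration-driven peak.
demand = [2] * 20 + [40, 60, 60, 40]          # 24 hourly readings

fixed_capacity = max(demand)                  # a datacenter must cover the peak
fixed_hours = fixed_capacity * len(demand)    # capacity-hours paid for
elastic_hours = sum(demand)                   # cloud: pay only for usage

utilization = elastic_hours / fixed_hours
print(f"Fixed capacity-hours paid:   {fixed_hours}")
print(f"Elastic capacity-hours paid: {elastic_hours}")
print(f"Fixed-fleet utilization: {utilization:.0%} (idle {1 - utilization:.0%})")
```

Under these made-up numbers the peak-sized fleet is busy only about 17 percent of the time; real demand profiles would of course need to be measured before drawing any budgeting conclusion.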
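The connectivity bullet’s point about speed can likewise be checked with a rough transfer-time estimate. The payload size and link speeds below are assumed, illustrative values, not measured figures.

```python
# Rough upload-time estimate for a biometric verification payload over
# different network links. Payload size and uplink speeds are assumed
# values for illustration only.
payload_kb = 150                                   # e.g., a compressed fingerprint image
link_kbps = {"2G": 50, "3G": 2_000, "4G": 20_000}  # nominal uplink throughput

upload_seconds = {name: payload_kb * 8 / kbps for name, kbps in link_kbps.items()}
for name, secs in upload_seconds.items():
    print(f"{name}: ~{secs:.2f} s to upload {payload_kb} kB")
```

A 24-second upload on the slowest assumed link would make real-time verification impractical, which is why the text stresses fast, reliable connections for biometric matching.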

Figure 22. Key considerations for data storage

Reliability: Data must be secure and have adequate back-up and disaster recovery to prevent data loss.
Data Protection: Data storage solutions must provide adequate data protection and security measures to prevent unauthorized access and protect against cyberattacks.
Sustainability: Data storage choices will have a potentially large impact on start-up and/or operating costs; data storage solutions must be flexible enough to adapt to long-term needs.