eIAM Geo-Redundancy

Geo-Redundancy of eIAM

Which data center serves eIAM-Core is controlled via DNS entries. Normally, the hostnames (FQDNs) of eIAM-Core point to the load balancers in Primus; after a switch, they point to Campus. Because resolvers keep serving cached answers until the DNS time to live (TTL) expires, a switch causes an interruption. Currently, an interruption of about 30 minutes is to be expected during a switch.
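To make the TTL effect concrete, the following minimal sketch estimates how long a single resolver may keep returning the old target after a switch. The function names and the TTL of 1800 seconds are illustrative assumptions, not the configured eIAM values; the text above only states an expected interruption of about 30 minutes.

```python
from datetime import datetime, timedelta


def stale_until(cached_at: datetime, ttl_seconds: int) -> datetime:
    """Return the time until which a resolver may still serve a cached record."""
    return cached_at + timedelta(seconds=ttl_seconds)


def worst_case_outage(switch_at: datetime, cached_at: datetime, ttl_seconds: int) -> timedelta:
    """Time after the switch during which this resolver may still return the old target."""
    remaining = stale_until(cached_at, ttl_seconds) - switch_at
    # A record that expired before the switch causes no stale answers.
    return max(remaining, timedelta(0))


# Hypothetical values: record cached one minute before the switch, assumed TTL of 1800 s.
switch = datetime(2024, 1, 1, 12, 0, 0)
cached = switch - timedelta(minutes=1)
print(worst_case_outage(switch, cached, 1800))  # prints 0:29:00
```

Under these assumptions, the closer to the switch a record was cached, the closer the stale window gets to the full TTL.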

eIAM-Core comprises services 1–10 and 12 as listed under eIAM Services. The eIAM RP-PEP service (Service 11) is treated separately because, from a network perspective, the RP-PEP must lie in the communication path between the browser and the application backend.

Geo-Redundancy of Applications

Depending on the eIAM failure scenario, applications are affected differently. For geo-redundant applications, eIAM assumes that they have been set up according to the specifications in Chapters 4.3 and 5 of the IT guideline on implementing availability classes and geo-redundancy (German)
(Perma-Link: )

Integration Pattern BTB-Direct

Applications connected via STS-BTB access eIAM through defined hostnames. Depending on operational status, these point to components in Primus (the default) or in Campus (if Primus has failed completely as a data center). A switch of eIAM-Core is therefore fundamentally independent of, and transparent to, a possible switch of a geo-redundant application. Whether an application should or must also switch depends not on eIAM but on the incident itself (e.g. a data center failure).

Figure: STS-BTB-integrated geo-redundant application, active in Campus

Scenario	Application in Primus	Application in Campus	Geo-redundant Application (Primus + Campus)	Application outside Primus / Campus (Other DC, Public Cloud)	Application outside Primus / Campus, geo-redundant (in 2 Cloud Regions)
Failure of DC Primus (eIAM-Core switches to Campus)	Application fails	Logins still possible	Logins still possible. Application: If the application was active on Primus, it must also switch as part of application operations.	Logins still possible	Logins still possible
Application failure	Application fails	Application fails	Logins still possible. Application: Switches to the other DC	Application fails	Logins still possible. Application switches to another region (or is already active there)

Integration Pattern STS-PEP

Scenario	Application in Primus	Application in Campus	Geo-redundant Application (Primus + Campus)	Application outside Primus / Campus (Other DC, Public Cloud)	Application outside Primus / Campus, geo-redundant (in 2 Cloud Regions)
Failure of DC Primus (eIAM-Core switches to Campus)	Application fails	Logins no longer possible (*)	Logins no longer possible (*)	Logins no longer possible (*)	Logins no longer possible (*)
Application failure	Application fails	Application fails	Logins still possible. Application: Switches to the other DC	Application fails	Logins still possible. Application switches to another region (or is already active there)

* The ongoing migration from STS-PEP to STS-BTB resolves this situation.
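The "Failure of DC Primus" rows of the two tables above can be compared side by side as a small lookup. This is only a sketch that transcribes the table cells; the placement keys and function names are hypothetical labels introduced here for illustration.

```python
# Login status when DC Primus fails and eIAM-Core switches to Campus,
# transcribed from the STS-BTB and STS-PEP scenario tables.
# Placement keys are hypothetical short names for the table columns.
LOGIN_STATUS_ON_PRIMUS_FAILURE = {
    "STS-BTB": {
        "primus": "application fails",
        "campus": "logins possible",
        "geo_redundant": "logins possible",
        "other_dc": "logins possible",
        "other_dc_geo_redundant": "logins possible",
    },
    "STS-PEP": {
        "primus": "application fails",
        "campus": "logins not possible",
        "geo_redundant": "logins not possible",
        "other_dc": "logins not possible",
        "other_dc_geo_redundant": "logins not possible",
    },
}


def login_status(pattern: str, placement: str) -> str:
    """Look up the table cell for an integration pattern and application placement."""
    return LOGIN_STATUS_ON_PRIMUS_FAILURE[pattern][placement]


def placements_losing_logins(pattern: str) -> list:
    """List the placements for which logins are no longer possible."""
    statuses = LOGIN_STATUS_ON_PRIMUS_FAILURE[pattern]
    return [p for p, status in statuses.items() if status == "logins not possible"]


print(login_status("STS-BTB", "campus"))
print(placements_losing_logins("STS-PEP"))
```

The comparison shows why the migration mentioned in the footnote matters: with STS-PEP, every placement outside Primus loses logins in this scenario, while with STS-BTB none does.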

Integration Pattern RP-PEP

See also: Migration from RP-PEP to STS-PEP

In contrast to the STS-BTB integration, the RP-PEP integration creates a dependency between eIAM and the application, because the RP-PEP is an eIAM component placed in the communication path between the browser and the application. The RP-PEP must therefore always run in the same data center as the application itself. For geo-redundant applications with RP-PEP, eIAM always treats the RP-PEPs in both data centers as active, so that an application can switch independently of eIAM. RP-PEPs are not offered outside Primus / Campus.

Figure: RP-PEP-integrated geo-redundant application, active in Campus

Scenario	Application in Primus	Application in Campus	Geo-redundant Application (Primus + Campus)
Failure of DC Primus (eIAM-Core switches to Campus)	Application fails	Logins still possible	Logins still possible. Application: If the application was active on Primus, it must also switch as part of application operations.
RP-PEP or Load Balancer failure	Application fails	Application fails	Logins still possible. Application: Switches to the other DC as part of application operations
Application failure	Application fails	Application fails	Logins still possible

Fundamentally, application operations are responsible for deciding whether to switch or to wait for the issue to be resolved in the data center. The decision is based on a situational assessment and made in coordination with eIAM operations or, during a major incident, with incident management.

Switch by application operations for geo-redundant applications with RP-PEP
During integration, the eIAM SIE orders the front load balancers and hands their FQDN and CNAMEs over to application operations. Example scenario:

FQDN: app.amt.admin.ch
CNAME Primus: app.primus.amt.admin.ch
CNAME Campus: app.campus.amt.admin.ch
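As a small illustration of this naming scheme, the sketch below maps a CNAME target to the data center it belongs to, which can help confirm where the FQDN currently points. The function name is hypothetical, and the CNAME target is assumed to have been obtained by a separate DNS lookup that is not shown here.

```python
from typing import Dict, Optional

# CNAMEs from the example scenario above.
CNAMES = {
    "Primus": "app.primus.amt.admin.ch",
    "Campus": "app.campus.amt.admin.ch",
}


def data_center_for(cname_target: str, cnames: Dict[str, str]) -> Optional[str]:
    """Return the data center whose CNAME matches the given target, or None."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    normalized = cname_target.rstrip(".").lower()
    for data_center, cname in cnames.items():
        if normalized == cname.lower():
            return data_center
    return None


print(data_center_for("app.campus.amt.admin.ch.", CNAMES))  # prints Campus
```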

If application operations now wish to switch, the following approach can be taken (example: switch from Primus to Campus):

Open a Critical Incident in Remedy requesting a change that points the FQDN to the Campus CNAME.