Cloud and DevOps Improvements for Land Registry at Landgate
By James Bromberger
18th October 2019
In 2014 Akkodis started to create a custom replacement land registry automation solution for Landgate, the state government jurisdiction of land administration in Western Australia.
The existing system was a manual paper-based workflow for transaction processing, with a digital registry for holding the results of transactions, implemented using an Oracle database and a desktop application (written in Delphi).
Background
The New Land Registry (NLR) project sought to achieve several outcomes:
- Migrate from a paper-based workflow to a digital (on screen) workflow
- Automation of the digital workflow, with exception processing by humans
- Migration from physical infrastructure on premise in a data centre to deployment in cloud
- Automation of the deployment
- Automation of the testing of the deployment
- Increase in the security of the digital system
- Minimal time to value
- No interruption to service
This service went live with a minimal implementation in July of 2015, and incremental changes on a rapid time scale saw authoritative sources of data progressively migrated from the on-premise environment to the cloud environment over the following 18 months. By December 2016, the entire Land Registry, Power of Attorney Registry and other data sets were authoritative in the cloud.
You may read the previous AWS-authored case study of this online at: https://aws.amazon.com/solutions/case-studies/landgate/
Akkodis has continued to maintain and modernise this platform, expanding its capability, continually adjusting the security profiles, and ensuring that new solutions adopt always-improving best practice, such as in the use of development and security patterns, as well as leveraging new and evolving AWS Cloud capabilities.
The purpose of this document is to demonstrate the level of maintenance and modernisation activity that has gone into the service, outside of the major application feature roadmap.
Service Improvements 2015 – 2019
The initial configuration and preparation for go-live consisted of preparing the CI/CD DevOps pipeline for delivery of software to the production environment. This required scripting and templating the entire deployment, using the best practice of the time. That pipeline has since been maintained with newer versions of tooling and newer approaches, and has delivered hundreds of production deployments since the initial production release.
Disposable Virtual Machines
One key objective of the project was to treat all virtual machines (EC2) as disposable, such that they could be removed at any time, and no persistent data would be lost.
This “cattle, not pets” approach divided the infrastructure into long-term persistent data services and short-lived application servers and load balancers. The principles that enabled this were:
- All relational databases live on a persistent AWS RDS Database instance
- All file (object) storage lives on dedicated object storage services
- All logs are egressed from virtual machines to dedicated log retention (and inspection) services
- All application server deployments are 100% automated.
This presented some up-front work to create a robust deployment architecture, but soon paid back the investment in time. One of the early lessons was to adjust the number of virtual machines in use at the end of each day in non-production environments to zero, and conversely deploy new virtual machines each morning. This forced the issue on banning direct EC2 changes; they would not be persistent. The only route to making a change was via updating the automation and triaging a change through release management across all environments, all of which is a good thing. This also means there are no long-term persistent virtual machines; they are constantly replaced from known, secured master images.
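The overnight scale-to-zero practice described above can be sketched as a small scheduling rule. This is an illustration only: the environment names, the 07:00 and 19:00 boundaries and the function name are assumptions rather than details of the Landgate deployment, and in practice the result would be applied to an autoscale group instead of being returned from a function.

```python
from datetime import time

# Hypothetical environment names and working hours, chosen for illustration.
NON_PRODUCTION = {"development", "testing", "uat"}
DAY_START = time(7, 0)
DAY_END = time(19, 0)


def desired_instance_count(environment: str, now: time, daytime_count: int) -> int:
    """Return how many application servers an environment should be running.

    Production always keeps its capacity. Non-production environments drop
    to zero outside working hours, so every morning starts from freshly
    built instances and any manual change made the day before is lost.
    """
    if environment not in NON_PRODUCTION:
        return daytime_count
    if DAY_START <= now < DAY_END:
        return daytime_count
    return 0
```

Because yesterday's instances no longer exist, the only durable way to change a server is to change the automation that builds it.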
Prior to production, the only environments that had infrastructure deployed were the development, testing and UAT environments. Even the production network was not created until around 10 days before the target go-live date. There was no sunk capital, or elapsed time for physical infrastructure purchase ahead of service commissioning.
This method of disposable and templated infrastructure, and a CI/CD DevOps pipeline of repeatable actions between environments meant that introducing changes became very low risk, and has led to repeated incremental service improvements in a timely manner, with minimal interruption to service.
During the life of the service, the actual version of operating systems has been re-baselined multiple times, introducing newer versions of the host operating system, new kernels, TLS libraries, etc.
Each deployment to the service, done approximately every two weeks, has terminated (destroyed) the previous instances in service and replaced them with new instances. This has the immediate effect of removing any potential local changes, and avoids long-running instances and the hygiene work they would otherwise need over the longer term (years).
In 2019, these virtual machines migrated instance families, from the older AWS t2 to the newer t3 instance family. This reduced cost and increased performance across the fleet.
Database Maintenance
As with many applications, there is a relational database as a key component, and the initial database version used was Postgres 9.4. The DevOps team routinely adopted new minor versions proactively within the Postgres 9.4 branch, updating in a controlled manner from lower to higher environments, ensuring testing and validation of version changes.
It’s worth noting that the configuration of the database mandates the use of encryption both in-flight from application servers, as well as at-rest (on disk). Key and certificate management are automated and at no time does the DevOps team have access to the private keys involved.
In May of 2016, Postgres 9.5 became available, and the team worked to perform this major version upgrade with a similar amount of care, and for a period after that continued to track the current Postgres 9.5 minor release. In November 2016, Postgres 9.6 was available, and after suitable testing, the service was incremented to this version. Once more in December 2018, the team completed another major version upgrade of its core database, to Postgres 10.
Of course, there are many minor updates to the database version, and these are applied after testing to ensure the latest exploit mitigations and fixes are in place as early as possible.
This close adherence to the current versions of database, buried deep at the lowest layer of the application, means that known errors and bugs are addressed as quickly as possible. Akkodis deemed it unacceptable to be running software with known issues for which there are known fixes that the team has not yet applied. Security is always best served reasonably fresh.
Across this period Akkodis changed the instance type (CPU and memory) from m3, to r3, and then on to r4. In 2019, this RDS instance again moved from the older r4 instance type to a newer r5, showing a performance improvement of several-fold for I/O.
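The distinction between minor and major upgrades in the history above follows from PostgreSQL's version numbering: before version 10, the first two components (for example 9.4) identify the major version, while from version 10 onward the first component alone does. A minimal sketch of that rule follows; the function names are illustrative and not part of any Landgate tooling.

```python
def major_version(version: str) -> str:
    """Return the major-version part of a PostgreSQL version string."""
    parts = version.split(".")
    if int(parts[0]) >= 10:
        # From PostgreSQL 10 onward the first number is the major version.
        return parts[0]
    # Before 10, the first two numbers together form the major version.
    return ".".join(parts[:2])


def upgrade_kind(current: str, target: str) -> str:
    """Classify a version change as 'none', 'minor' or 'major'."""
    if current == target:
        return "none"
    if major_version(current) == major_version(target):
        return "minor"
    return "major"
```

Under this rule a move from 9.4.1 to 9.4.5 is a minor update, while 9.4 to 9.5 and 9.6 to 10 are both major upgrades that warrant fuller validation.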
Java Updates
A similar story exists for the Java runtime environment, initially Java 8 update 45 in 2015. The deployment installs the latest version of Java for each new server instantiated, which is typically every few days to weeks. It’s reassuring to view the Java Release history and see the set of updates with “Security fixes” tagged, knowing that this has been immediately addressed.
However, Java itself has not stood still. Java 9, and then Java 10 came and went, and a new Long-Term-Support version has debuted in Java 11. This new version is again a major version upgrade, and required additional validation of the change. In December of 2018, the Land Registry project moved to Java 11.
AWS Parameter Store, Secrets Manager
Previously, secrets and configuration were distributed in custom (but secure) ways to meet the workload requirement. With the launch of AWS Systems Manager Parameter Store and AWS Secrets Manager, which hold per-environment and per-instance-role secrets and configuration, the project adopted these services and moved away from the custom-rolled solutions.
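As a rough sketch of per-environment, per-role lookup, the helper below builds a hierarchical parameter name and reads it through a Systems Manager client. The path layout (/environment/role/name) and both function names are assumptions for illustration. The client is passed in by the caller (for example, one created with boto3.client("ssm")), and get_parameter with WithDecryption=True is the standard call for reading an encrypted parameter.

```python
def parameter_path(environment: str, role: str, name: str) -> str:
    """Build a hierarchical parameter name (hypothetical layout)."""
    return f"/{environment}/{role}/{name}"


def fetch_setting(ssm_client, environment: str, role: str, name: str) -> str:
    """Read one setting for an environment and instance role.

    ssm_client is expected to behave like a boto3 SSM client. Decryption
    happens on the service side, so no key material is handled here.
    """
    response = ssm_client.get_parameter(
        Name=parameter_path(environment, role, name),
        WithDecryption=True,
    )
    return response["Parameter"]["Value"]
```

Keeping the environment and role in the parameter name lets access policies grant each instance role only its own subtree.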
Encryption in Flight
There are five major areas when talking about encrypting data in flight:
- TLS Protocol version
- Key Exchange method
- Bulk Cipher and its options
- Message Authentication (MAC/checksum)
- Certificate type & chain of trust.
Maintaining protocols, ciphers (see the following sections) and keys is a continuous task over time (not “set and forget”).
TLS Protocols
Design of the initial deployment of the NLR platform in 2015 had already banned the use of “Early TLS” protocols (v1.0 and prior) for Land Registry application users. By comparison, the PCI DSS 3.2 standard, which required (credit) cardholder environments to adopt a similar stance, only came into force in July 2018, three years later.
By 2016, TLS 1.1 was also banned for all internal users, leaving only TLS 1.2 enabled, save for one external service provider who had not been able to step up to enable TLS 1.2 in their service. This external provider has since enabled TLS 1.2, and all services are now operated at TLS 1.2 as a minimum.
With the above Java 11 update, the service is now looking to start a TLS 1.3 rollout both internally and with external integration partners, working closely with the AWS service teams and external service providers.
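A protocol floor of this kind is usually a one-line setting in the TLS library. The sketch below uses Python's standard ssl module purely for illustration (the service itself runs on Java, and its actual configuration is not shown in this document); the function name is an assumption.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side TLS context that refuses anything below TLS 1.2."""
    context = ssl.create_default_context()
    # Connections that can only negotiate TLS 1.0 or 1.1 will fail the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Leaving the maximum version unset means TLS 1.3 is negotiated automatically wherever both ends support it, which suits a staged rollout.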
Key Exchange
The only key exchanges ever permitted on this environment have been ephemeral key exchanges, ensuring Forward Secrecy of all communications. This involved ensuring only Diffie-Hellman Ephemeral (DHE) or Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) exchanges were configured; over time Akkodis expects to drop DHE for just ECDHE key exchange.
Symmetric Bulk Ciphers
When it came to Bulk Ciphers, it was decided to only support AES with either 128- or 256-bit key sizes, and with either CBC or GCM mode (Cipher Block Chaining, or Galois/Counter Mode). In 2018 Microsoft announced that CBC was no longer secure, and the service has indicated to its integration partners that the intent is to deprecate CBC mode in the near future.
The NLR project has long prioritised the use of GCM, and will terminate support for CBC over time.
Message Authentication Code
Originally the service supported MAC using SHA-1, SHA2-256, and SHA2-384, but the service has now deprecated SHA-1 as it does not meet current acceptable minimal requirements for reliability and collision space.
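Taken together, the key exchange, bulk cipher and MAC rules above amount to a filter over TLS 1.2 cipher suites. The sketch below expresses that filter against IANA-style suite names such as TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256. It is an illustration of the stated policy, not the service's actual configuration, and the names Suite, parse_suite and is_permitted are assumptions.

```python
from dataclasses import dataclass

ALLOWED_KEY_EXCHANGES = {"ECDHE", "DHE"}  # ephemeral only, for forward secrecy
ALLOWED_KEY_BITS = {128, 256}             # AES key sizes
ALLOWED_MODES = {"GCM", "CBC"}            # CBC is slated for deprecation
ALLOWED_HASHES = {"SHA256", "SHA384"}     # plain "SHA" means SHA-1: rejected


@dataclass(frozen=True)
class Suite:
    key_exchange: str
    cipher: str
    key_bits: int
    mode: str
    hash_name: str


def parse_suite(name: str) -> Suite:
    """Parse a TLS 1.2 IANA name of the form TLS_<kx>_<auth>_WITH_<cipher>."""
    left, right = name.split("_WITH_")
    key_exchange = left.split("_")[1]
    cipher, bits, mode, hash_name = right.split("_")
    return Suite(key_exchange, cipher, int(bits), mode, hash_name)


def is_permitted(name: str) -> bool:
    """Return True only for suites that satisfy every rule above."""
    try:
        suite = parse_suite(name)
    except (ValueError, IndexError):
        # Names that do not fit the expected shape are rejected outright.
        return False
    return (
        suite.key_exchange in ALLOWED_KEY_EXCHANGES
        and suite.cipher == "AES"
        and suite.key_bits in ALLOWED_KEY_BITS
        and suite.mode in ALLOWED_MODES
        and suite.hash_name in ALLOWED_HASHES
    )
```

For GCM suites the trailing hash names the handshake PRF rather than a separate MAC, since GCM authenticates the data itself; the check still rejects SHA-1 in either position. Removing "CBC" from ALLOWED_MODES would model the planned deprecation.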
X509 TLS Certificate Handling
Improvements to the AWS Certificate Manager service meant the service has also swapped X509 certificates from being email-validated to DNS-validated. This means a certificate can have its corresponding authorisation key/value pair left in DNS perpetually, and the certificate is automatically re-authorised and deployed before expiry without human intervention. This is a clear win over the manual process exercised by many organisations, and removes the danger of missing a renewal before expiry.
Virtual Private Cloud Improvements
Migrate from Two to Three Availability Zones
The initial deployment of the Land Registry was designed to take advantage of the two Availability Zones (AZs) in the AWS Sydney Region. Each Availability Zone is one (or more) data centres, geographically distant from other AZs.
In February 2016, AWS added a third AZ to the Sydney Region, and the Land Registry migrated to using this without redeploying. A live migration was done on load balancers and autoscale groups to redistribute IPv4 addressing across a wider split (without redeploying the Virtual Private Cloud environment). This new distribution of resources across three AZs gave even more resilience to the service: on any complete failure of one AZ, the instantaneous demand for replacement instances across all customers at that point in time could be satisfied from more AZs, and during that initial AZ failure the service would continue to have active-active fault tolerance.
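One way to picture spreading an address range across Availability Zones is to carve the VPC range into equal blocks and hand one to each zone. The sketch below does this with Python's standard ipaddress module. The 10.0.0.0/16 range in the usage note and the function name are hypothetical, and the actual Landgate addressing plan is not described in this document.

```python
import ipaddress


def split_across_zones(vpc_cidr: str, zones: list) -> dict:
    """Assign each zone an equal, non-overlapping block of the VPC range.

    The range is divided into the smallest power-of-two number of blocks
    that covers every zone, so three zones receive three of four quarters
    and the fourth quarter stays in reserve.
    """
    network = ipaddress.ip_network(vpc_cidr)
    extra_bits = (len(zones) - 1).bit_length()
    blocks = network.subnets(prefixlen_diff=extra_bits)
    return dict(zip(zones, blocks))
```

For example, split_across_zones("10.0.0.0/16", ["ap-southeast-2a", "ap-southeast-2b", "ap-southeast-2c"]) assigns 10.0.0.0/18, 10.0.64.0/18 and 10.0.128.0/18 to the three zones.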
Private Access to S3, DynamoDB
Access to the S3 Object Store and DynamoDB services was optimised to include private interfaces using the Gateway Endpoints for the Virtual Private Cloud when this became available in Sydney, and as a consequence, S3 Buckets may now be locked down to specific VPCs.
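Locking a bucket to a VPC is typically expressed as a bucket policy that denies any request not arriving through that VPC's endpoint, using the aws:SourceVpc condition key. The sketch below generates such a policy document; the function name is an assumption, and any bucket name or VPC ID passed to it is a placeholder rather than a real Landgate identifier.

```python
import json


def vpc_only_bucket_policy(bucket_name: str, vpc_id: str) -> str:
    """Return a JSON bucket policy denying access from outside one VPC."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessFromOutsideVpc",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Matches every request that did not come via the named VPC.
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

A blanket deny like this also blocks console and administrative access from outside the VPC, so real policies often carve out specific exceptions.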
GuardDuty
The introduction of GuardDuty at the end of 2017 brought greater visibility to the environment, and also led to the team finally tackling another security design consideration: blocking outbound traffic. While the NLR platform had initially concentrated on very strict inbound controls for Security Groups (firewalling), it was the introduction of GuardDuty and its visibility that led the team to turn attention to stripping down the outbound rules. All UDP traffic outbound was removed; DNS and NTP (time synchronisation) were shifted to make use of the Link-Local network services. For TCP traffic, only known ports to known destination ranges were permitted.
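The resulting outbound policy can be modelled as a short allow-list check: no UDP at all, and TCP only to known ports in known ranges. In the sketch below the destination ranges, ports and function name are hypothetical examples, and the real rules live in Security Groups rather than application code.

```python
import ipaddress

# Hypothetical allow-list of (destination range, TCP port) pairs.
ALLOWED_TCP_EGRESS = [
    ("10.0.0.0/16", 5432),  # e.g. the database tier
    ("10.0.0.0/16", 443),   # e.g. internal HTTPS services
]


def egress_permitted(protocol: str, destination: str, port: int) -> bool:
    """Return True if an outbound flow matches the allow-list."""
    if protocol.lower() != "tcp":
        # All outbound UDP is removed; DNS and NTP use link-local services.
        return False
    address = ipaddress.ip_address(destination)
    return any(
        address in ipaddress.ip_network(cidr) and port == allowed_port
        for cidr, allowed_port in ALLOWED_TCP_EGRESS
    )
```

Anything not explicitly listed is dropped, which also makes unexpected outbound attempts stand out in monitoring.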
Dual Stack IPv4 and IPv6
When deployed, the Land Registry VPC operated using a private IPv4 address space. In 2018 this was extended to include IPv6, and the Land Registry now stands ready to talk to its integration partners, and the public, over both IPv4 and IPv6.
The adoption of IPv6 is a clear signal of the modern and forward-looking perspective of the Land Registry. While widespread adoption has been extremely slow (having started in the early 2000s), it represents yet another technology transition that the IT industry is undertaking.
Akkodis has now moved to start enabling dual stack across many other AWS cloud-based services that are internet facing.
Federation of Identity
When going live in 2015, the SAML Federation service was using a SHA-1 based signing algorithm, and during 2016 this was migrated to the stronger SHA-256. The configuration of federation certificates was also modified to monitor and auto-rotate when the authoritative SAML server rotates its keys (on a regular basis).
The Akkodis team also improved the application consumption of SAML as a Relying Party, implementing auto-refresh of the Identity Provider’s metadata document on a daily basis. This permits the automatic roll over of the certificates that SAML uses for signing assertions of identity.
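The daily refresh of Identity Provider metadata can be pictured as a small time-based cache: the document is re-fetched once it is older than a day, so rotated signing certificates are picked up without a redeploy. The sketch below is a generic illustration of that idea, not the application's implementation. The class name is an assumption, and the fetch function (which would download and validate the metadata) is supplied by the caller.

```python
import time

ONE_DAY_SECONDS = 24 * 60 * 60


class MetadataCache:
    """Hold the Identity Provider metadata and refresh it when stale."""

    def __init__(self, fetch, max_age_seconds=ONE_DAY_SECONDS, clock=time.monotonic):
        self._fetch = fetch              # callable returning the metadata document
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the cache can be tested
        self._document = None
        self._fetched_at = None

    def get(self):
        """Return the cached metadata, re-fetching it if it is too old."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._document = self._fetch()
            self._fetched_at = now
        return self._document
```

If a fetch raises an exception, the previous document and timestamp are left untouched, so the next call simply tries again.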
Independent Transaction Digital Attestation: Vanguard
In 2017 the Land Registry project started to pass a representation of land titles transaction data to the Federal Department of Industry and Innovation’s Vanguard Timestamping and Witnessing service to attest independently that the land transaction had taken place at a point in time.
This service takes the passed text, adds a timestamp to it, and then cryptographically signs that data before returning it to the service. This additional data file, consisting of a few hundred bytes, cannot be modified without invalidating the digital cryptographic signature of that data structure, thereby freezing the data in the file.
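The tamper-evidence property described above can be demonstrated with a small stand-in. Note the substitution: the sketch uses an HMAC with a shared key from Python's standard library, whereas a witnessing service would normally use a public-key signature so that anyone can verify without holding the signing key. The function names, record layout and key are assumptions, and nothing here reflects the Vanguard API.

```python
import hashlib
import hmac
import json


def attest(text: str, timestamp: str, key: bytes) -> dict:
    """Bind a timestamp to the text and sign the combined payload."""
    payload = json.dumps({"text": text, "timestamp": timestamp}, sort_keys=True)
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(record: dict, key: bytes) -> bool:
    """Return True only if the payload is unchanged since it was signed."""
    expected = hmac.new(key, record["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Changing even one character of the payload, whether in the text or the timestamp, makes verify return False, which is what freezes the record.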
Object Storage Improvements: S3
Every document lodged with Landgate for titles transactions ends up being scanned (very early after Lodgement) and stored in the Land Registry. These now total 14.8 million items, and over 680 GB of storage (which costs around US$17.07/month to durably store).
Numerous improvements to the Amazon Simple Storage Service (S3) have been adopted over time. Default encryption, object versioning, and more recently, the ability to Block Public Access as a major AWS Account-level control have all been enabled. None of these objects had ever been stored unencrypted, nor set as public, but these additional controls help ensure this continues to be the case, making auditing and inspection easier.
Conclusion
This service has now been in production for over four years, and it is as modern today as the day it was launched.
By leveraging a DevOps approach and mature revision control branching strategies, this service has high agility and frequent releases, making multiple small adjustments incrementally with minimal manual release overhead.
As a side effect of the excellence in delivery, this project has been a trigger for many other government jurisdictions and industry bodies across the Australian land jurisdictions to follow Landgate into the AWS Cloud, as well as encouraging them to step up to stronger and stricter security capabilities to protect some of our highest value assets.
This has been achieved by:
- DevOps and SaaS (Service Operator) approach to change
- Public Cloud
- Full automation of deployments
- Reliability and security architected into the product from the start
- Service team mentality, responsible for the full-stack deployment and operations from development to production
- Pragmatism.
This service has unified multiple teams, including Developers, Testers, Business Analysts, Programme Managers, and of course, our Customer.
As a result of this service, turnaround times on document processing have dropped from an average of 30 days to just 1.4 days, and digital processing from industry is done in as little as 10.8 seconds.