Sunday, November 29, 2020

Google Cloud Spanner - Globally distributed relational database service - Part-1

 An Introduction to Google Cloud Spanner Part 1 



Google Cloud Spanner is a globally distributed relational database service for massively scalable applications. It uses automatic sharding (the shards are called splits) and replication, is designed to scale up to millions of nodes and trillions of database rows, and promises up to 99.999% availability.

Cloud Spanner combines the scalability of a NoSQL database with the qualities of a relational database, including schemas, ACID transactions, and SQL queries.
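
To make the idea of automatic sharding more concrete, here is a minimal sketch of key-range splitting. It is a toy model, not Spanner's actual algorithm: the `SplitTable` class, the `max_rows_per_split` threshold, and the split-in-half policy are all assumptions made for illustration (Spanner decides when to split based on size and load, and keys are assumed unique here).

```python
import bisect


class SplitTable:
    """Toy key-range sharding: each split owns a contiguous range of keys."""

    def __init__(self, max_rows_per_split=4):
        self.max_rows = max_rows_per_split  # hypothetical threshold
        self.boundaries = []  # boundaries[i] is the lowest key of split i + 1
        self.splits = [[]]    # each split is a sorted list of keys

    def split_index(self, key):
        # The owning split is the number of boundaries that are <= key.
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key):
        i = self.split_index(key)
        bisect.insort(self.splits[i], key)
        if len(self.splits[i]) > self.max_rows:
            self._reshard(i)

    def _reshard(self, i):
        # Cut an oversized split in half and record the new boundary.
        rows = self.splits[i]
        mid = len(rows) // 2
        left, right = rows[:mid], rows[mid:]
        self.splits[i:i + 1] = [left, right]
        self.boundaries.insert(i, right[0])


table = SplitTable(max_rows_per_split=4)
for key in range(1, 11):
    table.insert(key)
print(table.splits)
```

The application only ever calls `insert`; the table decides on its own when a range has grown too large and needs to be divided, which is the property "automatic" refers to.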


Main building blocks of Spanner 


  • 1. Google’s network infrastructure – Provides global connectivity  

  • 2. TrueTime – Provides high accuracy and availability  

  • 3. Optimized Software Stack – Automatic resharding and Paxos implementation  


Features:


High availability 


Cloud Spanner provides multi-regional deployments with a 99.999% availability SLA and single-region deployments with a 99.99% availability SLA. Administrative functions such as replication and sharding are handled without requiring database outages.
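
To put those percentages in perspective, the short calculation below converts an availability figure into the maximum downtime it allows per year. It assumes a 365-day year and ignores how the SLA itself measures downtime, so treat the results as rough orders of magnitude. The function name is ours.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def max_downtime_minutes_per_year(availability_percent):
    """Return the downtime budget (in minutes per year) for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {max_downtime_minutes_per_year(sla):.2f} minutes/year")
```

So the extra "9" of a multi-regional deployment shrinks the yearly downtime budget from roughly 52.6 minutes to roughly 5.3 minutes.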


 

Instances 


Instances can be either regional or multi-regional. Spanner relies on TrueTime (a globally distributed clock with high accuracy and availability) to provide global consistency, and multi-regional instances replicate data across regions for higher availability.

 

Security 


Cloud Spanner provides enterprise-grade security through IAM integration, with permissions and access configurable for groups and users at the instance and database level.


Replication 


Replication is used for geographic locality and global availability. Writes are replicated using the Paxos distributed consensus protocol, which ensures that a transaction has been accepted by a majority of replicas before it is committed.
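
The majority rule can be sketched in a few lines. This is only the quorum check, not a Paxos implementation (no proposals, ballots, or retries), and the function name and the list-of-booleans input are assumptions made for illustration.

```python
def is_committed(replica_acks):
    """Return True when a majority of replicas acknowledged the write.

    replica_acks: one bool per replica, True if that replica durably
    accepted the write (hypothetical input format).
    """
    quorum = len(replica_acks) // 2 + 1
    return sum(replica_acks) >= quorum


print(is_committed([True, True, False]))   # 2 of 3 replicas: committed
print(is_committed([True, False, False]))  # 1 of 3 replicas: not committed
```

With three replicas, the write survives the loss of any single replica, because at least one of the two that acknowledged it is still available.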


ACID Transaction and SQL Query Support 


Cloud Spanner supports ACID transactions and querying with SQL.

 

Architecture of Spanner 


Spanner keeps a minimum of 3 copies of each shard per region, one in each zone. A shard is known as a split. When we create an instance with 1 node, 2 more nodes are created in other zones. The Paxos protocol elects one node as the leader at any given time, and the other nodes act as followers. Paxos helps maintain the minimum number of nodes and elects a new leader if the current one fails.
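
The failover behaviour can be illustrated with a deliberately simplified sketch. Real Paxos leader election involves leases and ballot numbers; here we assume a hypothetical rule where the first healthy zone (in alphabetical order) becomes leader, and only if a majority of zones is still reachable. The zone names and the `elect_leader` function are made up for this example.

```python
from typing import Dict, Optional


def elect_leader(zone_health: Dict[str, bool]) -> Optional[str]:
    """Pick a leader among healthy zones, or None if no majority is alive."""
    healthy = sorted(zone for zone, up in zone_health.items() if up)
    quorum = len(zone_health) // 2 + 1
    if len(healthy) < quorum:
        return None  # without a majority, no leader can be chosen safely
    return healthy[0]  # simplification: lowest zone name wins


zones = {"zone-a": True, "zone-b": True, "zone-c": True}
print(elect_leader(zones))   # zone-a leads while everything is healthy

zones["zone-a"] = False      # the leader's zone goes down
print(elect_leader(zones))   # zone-b takes over
```

If a second zone also fails, `elect_leader` returns `None`: with only one of three replicas left there is no majority, so the split cannot accept writes until another replica comes back.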


 


High accuracy and availability with TrueTime 


How does Spanner provide strong consistency across all nodes? The answer is TrueTime!

It's not a good idea to explain Spanner without diving into TrueTime.

TrueTime is a highly available, distributed clock that is provided to applications on all Google servers. It enables applications to generate monotonically increasing timestamps. The underlying hardware uses atomic clocks (together with GPS receivers) to keep time.

During each write operation, Spanner takes a TrueTime value and provides it to the leader node as the commit timestamp. Then, the data is replicated to the other nodes. Therefore, the timestamp is the same on all nodes.
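
The sketch below shows how a clock with a known error bound can yield monotonically increasing commit timestamps. It follows the "commit wait" idea from the published Spanner design: take the upper bound of the current time interval as the timestamp, then wait until that timestamp is guaranteed to be in the past. `FakeTrueTime`, its `epsilon` value, and `assign_commit_timestamp` are stand-ins invented for this example; the real TrueTime service is internal to Google.

```python
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float  # the true time is guaranteed to be >= earliest
    latest: float    # ... and <= latest


class FakeTrueTime:
    """Stand-in clock that reports time as an interval of width 2 * epsilon."""

    def __init__(self, epsilon=0.005, clock=time.monotonic):
        self.epsilon = epsilon  # assumed clock uncertainty, in seconds
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)


def assign_commit_timestamp(tt):
    """Choose a commit timestamp, then wait until it has definitely passed."""
    commit_ts = tt.now().latest
    while tt.now().earliest <= commit_ts:  # commit wait
        time.sleep(tt.epsilon / 10)
    return commit_ts


tt = FakeTrueTime(epsilon=0.002)
first = assign_commit_timestamp(tt)
second = assign_commit_timestamp(tt)
print(first < second)
```

Because each call returns only after its timestamp is certainly in the past, any transaction that starts afterwards receives a strictly larger timestamp, even if it runs on a different machine with a slightly different clock.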


Database replication  


One of the major issues Spanner has to address is database replication on a global scale, i.e., data should stay consistent when multiple users perform transactions across the globe. The Spanner team at Google addresses this by using the Paxos protocol, a distributed consensus algorithm (an algorithm that ensures that a change you make to a piece of data is agreed upon and propagated to its replicas).


When to choose Spanner? 


Let's take a close look at the flowchart below to understand when Cloud Spanner is the right choice.

[Flowchart: spanner-when-to-choose]


Cloud Spanner in a Nutshell  


Google Cloud Spanner is a cloud-native, enterprise-grade, fully managed relational database that offers high availability and horizontal scalability with consistent global ACID transactions. It delivers high performance and strong consistency across rows, regions, and continents, with an industry-leading 99.999% availability SLA.

 
