Sunday, 18 December 2016

9 Best Components of Apache Cassandra [Architecture]

Apache Cassandra's Architecture
Apache Cassandra is a NoSQL database designed to handle big data workloads. It stores data across multiple nodes with no single point of failure. Cassandra uses a peer-to-peer distributed architecture, and data is distributed among all the nodes in a cluster. The nodes exchange information with each other using the Gossip protocol.
Key structures of Cassandra
Cassandra has the following key components.

  1. Node:- A node is the place where data is stored. It is the basic component of Cassandra.
  2. Data Center:- A data center is a collection of related nodes. Data can be written to multiple data centers depending on the replication factor. However, a data center should never span physical locations.
  3. Cluster:- A cluster is a collection of one or more data centers.
  4. Commit Log:- Every write operation is written to the commit log for durability. After all the data in a commit log segment has been flushed to SSTables, the segment can be archived, deleted, or recycled.
  5. SSTable:- An SSTable (sorted string table) is an immutable data file that is written to disk sequentially. SSTables are maintained for each Cassandra table.
  6. Cassandra Keyspace:- A keyspace is a container for data. When you define a keyspace, you need to specify a replication strategy and a replication factor, i.e. the number of nodes that the data must be replicated to.
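  7. Column Family:- A column family is a container for an ordered collection of rows, organised as a map. Each row in the map provides access to a set of columns, which is itself represented by a sorted map.
  8. Row Key:- A row key, also known as the partition key, identifies a row and has a number of columns associated with it.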
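
The relationship between the commit log and SSTables described above can be sketched with a small toy model. This is only an illustration, not Cassandra's real implementation: the class name ToyTable, the flush_threshold parameter and the in-memory buffer (which Cassandra calls a memtable) are assumptions introduced for this example.

```python
class ToyTable:
    """Toy model of one table's write path (illustrative only)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory buffer: row key -> latest value
        self.sstables = []     # each "SSTable" is a sorted list of (key, value)
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, row_key, value):
        # Every write goes to the commit log first, then to memory.
        self.commit_log.append((row_key, value))
        self.memtable[row_key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the buffered rows as one sorted table that is never modified again.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        # The logged writes are now in an SSTable, so the log can be recycled.
        self.commit_log.clear()

    def read(self, row_key):
        # Newest data wins: check memory first, then SSTables from newest to oldest.
        if row_key in self.memtable:
            return self.memtable[row_key]
        for sstable in reversed(self.sstables):
            for key, value in sstable:
                if key == row_key:
                    return value
        return None
```

In this sketch the commit log only shrinks after a flush, which mirrors the rule that a commit log segment can be recycled once its data is in SSTables.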

2 Best Ways of Cassandra Data Replication [Replication]
The replication strategy of a keyspace determines which nodes hold the replicas for a given token range. The two main replication strategies are:
  • SimpleStrategy
  • NetworkTopologyStrategy

SimpleStrategy:- SimpleStrategy is used when you have just one data center. The partitioner decides which node stores the first copy of the data. The remaining replicas are then placed on the next nodes in the clockwise direction around the node ring, without regard to rack or data center.
For example, if replication_factor is 4, then four different nodes should store a copy of each row.
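
The clockwise placement can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each node owns exactly one token (no virtual nodes), the tokens are small integers chosen for readability, and the function name replicas_for_token is hypothetical.

```python
import bisect


def replicas_for_token(ring, token, replication_factor):
    """Return the node names that would hold replicas for a token.

    ring is a list of (token, node_name) pairs sorted by token.
    A node is taken to own the range ending at its own token (inclusive).
    """
    tokens = [node_token for node_token, _ in ring]
    # First node whose token is >= the data's token; wrap around past the end.
    start = bisect.bisect_left(tokens, token) % len(ring)
    # There cannot be more distinct replicas than there are nodes.
    count = min(replication_factor, len(ring))
    # Walk clockwise from the first replica.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for_token(ring, 30, 3))  # ['C', 'D', 'A']
```

With four nodes and a replication factor of 4, every node ends up holding a copy of every row.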

NetworkTopologyStrategy:- NetworkTopologyStrategy lets you set the replication factor for each data center independently, so it works well when a cluster spans more than one data center. Within a data center, it tries to store replicas on different racks. Cassandra uses snitches to discover the overall network topology. This information is used to efficiently route inter-node requests within the bounds of the replica placement strategy.
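
Both strategies are chosen when the keyspace is created. The helper below is a minimal sketch that only builds the CQL text for a CREATE KEYSPACE statement; it does not talk to a cluster and does not validate or escape its inputs. The function name create_keyspace_cql, the keyspace name demo and the data center names dc1 and dc2 are assumptions made for the example. Real data center names must match the ones reported by the snitch.

```python
def create_keyspace_cql(keyspace, replication_class, **settings):
    """Build a CREATE KEYSPACE statement as a string (illustrative helper)."""
    # The replication map always names the strategy class first.
    options = {"class": replication_class, **settings}
    # Keys are quoted; repr() quotes string values and leaves integers bare.
    pairs = ", ".join(f"'{key}': {value!r}" for key, value in options.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{pairs}}};"


# Single data center: one replication factor for the whole cluster.
print(create_keyspace_cql("demo", "SimpleStrategy", replication_factor=4))

# Several data centers: one replication factor per data center.
print(create_keyspace_cql("demo", "NetworkTopologyStrategy", dc1=3, dc2=2))
```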

An operation’s consistency level specifies how many of the replicas need to respond to the coordinator in order to consider the operation a success.

The following consistency levels are available:
  1. ONE :- Only a single replica must respond.
  2. TWO:- Two replicas must respond.
  3. THREE:- Three replicas must respond.
  4. QUORUM:- A majority of the replicas must respond, i.e. n/2 + 1 with n/2 rounded down, where n is the total number of replicas.
  5. ALL:- All of the replicas must respond.
  6. LOCAL_QUORUM:- A majority of the replicas in the local datacenter (whichever datacenter the coordinator is in) must respond.
  7. EACH_QUORUM:- A majority of the replicas in each datacenter must respond.
  8. LOCAL_ONE:- Only a single replica must respond. In a multi-datacenter cluster, this also guarantees that read requests are not sent to replicas in a remote datacenter.
  9. ANY:- A single replica may respond, or the coordinator may store a hint. If a hint is stored, the coordinator will later attempt to replay the hint and deliver the mutation to the replicas. This consistency level is only accepted for write operations.
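
The arithmetic behind these levels can be sketched as follows. This is an illustrative calculation, not driver code: the function names and the rf_by_dc dictionary (replication factor per data center, with hypothetical names dc1 and dc2) are assumptions made for the example.

```python
def quorum(replicas):
    """Majority of a replica count: n/2 rounded down, plus one."""
    return replicas // 2 + 1


def required_responses(level, rf_by_dc, local_dc):
    """Number of replica responses the coordinator waits for."""
    total = sum(rf_by_dc.values())
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "TWO":
        return 2
    if level == "THREE":
        return 3
    if level == "QUORUM":
        return quorum(total)
    if level == "ALL":
        return total
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A majority is needed in every data center, so the counts add up.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if level == "ANY":
        # A stored hint is enough, so no replica has to respond.
        return 0
    raise ValueError(f"unknown consistency level: {level}")


rf_by_dc = {"dc1": 3, "dc2": 3}
print(required_responses("QUORUM", rf_by_dc, "dc1"))        # 4
print(required_responses("LOCAL_QUORUM", rf_by_dc, "dc1"))  # 2
```

With three replicas in each of two data centers, QUORUM needs four responses across the cluster, while LOCAL_QUORUM needs only two from the coordinator's own data center.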
