Sunday, 18 December 2016

A Complete Introduction to Apache Cassandra database

What is Apache Cassandra?
Cassandra is a NoSQL database that offers high scalability and high availability without compromising performance. It supports replication across multiple datacenters and can manage very large sets of data.

As per Apache Definition
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

Cassandra History
  • Cassandra was first developed at Facebook, combining ideas from Google Bigtable and Amazon Dynamo, originally to power inbox search.
  • It was released as open source by Facebook in July 2008.
  • The Apache Incubator accepted Cassandra in March 2009.
  • Cassandra has been a top-level Apache project since February 2010.
  • At the time of writing, the latest version of Apache Cassandra is 3.9, released in September 2016.

Properties of Cassandra Database
The Cassandra database has the following properties.
  • Fault Tolerant: - Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Performant: - Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
  • Decentralized: - There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
  • Scalable: - Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
  • Durable: - Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
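
To make the "Fault Tolerant" and "Decentralized" points above more concrete, here is a minimal sketch of how a key can be placed on several identical nodes arranged in a ring. It is only an illustration under stated assumptions: the names Ring, token and live_replicas are made up for this post, MD5 is a stand-in hash (not what Cassandra uses by default), and each node gets a single position on the ring. It is not Cassandra's actual implementation.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical, simplified ring: one token per node, no datacenter awareness."""

    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, replication_factor: int):
        """Return the nodes holding copies of the key, walking clockwise."""
        count = min(replication_factor, len(self.entries))
        # First node whose token is >= the key's token; wraps around at the end.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(count)
        ]

    def live_replicas(self, partition_key: str, replication_factor: int, down):
        """Replicas that are still reachable when the nodes in `down` have failed."""
        return [
            n for n in self.replicas(partition_key, replication_factor)
            if n not in down
        ]


ring = Ring(["node1", "node2", "node3", "node4"])
owners = ring.replicas("user:42", 3)
print("replicas:", owners)
# Even if the first replica fails, two copies are still reachable.
print("still available:", ring.live_replicas("user:42", 3, down={owners[0]}))
```

Every node runs the same logic and no node is special, which is why losing one of them does not make the data unreachable as long as another replica is alive.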

Cassandra Database Vs Relational databases
  • Relational databases typically handle data arriving at moderate velocity, while Cassandra is designed for data arriving at high velocity.
  • In relational databases, data usually comes from one or a few locations, while in Cassandra data can come from many locations.
  • Relational databases manage structured data, while Cassandra manages structured, semi-structured and unstructured data.
  • A relational database deployment often has a single point of failure, while Cassandra has no single point of failure.
  • Relational databases support centralized deployments, while Cassandra supports decentralized deployments.
  • In relational databases, transactions are written at one location, while in Cassandra writes can be accepted at many locations.

Why use Cassandra Database
Cassandra provides the following features.
  • Gigabyte to petabyte scalability
  • Linear Scale Performance
  • No Single point of failure
  • Fault Detection and Recovery
  • Flexible and Dynamic Data Model
  • Data Protection
  • Tunable Data Consistency
  • Multi Data Center Replication
  • Data Compression
  • Cassandra Query Language (CQL)
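
The "Tunable Data Consistency" item in the list above comes down to simple arithmetic on the replication factor. The sketch below shows that arithmetic for the consistency levels ONE, QUORUM and ALL. The function names (quorum, replicas_required, read_sees_latest_write) are made up for this post and are not part of any Cassandra driver; it also only covers a single datacenter.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and write set must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, rf))
```

With a replication factor of 3, writing and reading at QUORUM touches 2 + 2 replicas, so at least one replica in every read has the latest write. Writing and reading at ONE is faster but gives no such guarantee, and that trade-off is what "tunable" refers to.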
