Cassandra : How is data deleted and what are Tombstones?

Data in Cassandra is replcated on more than one node. In another terms - ‘Same data is copied on more than one node’. Any change in that data should be made in all the copies of that data stored on various nodes of a cluster. Same is applied for deletes as well. If a data is deleted then it should be deleted from all the nodes. But what if any node which has a copy of that data is down. When that node will come up and will be asked for that data, it will provide that data and same data will be copied on other nodes once again. Such data is called Zombie record.

How does Cassandra make sure that if a data is deleted than it should be deleted from all the nodes even if any node where that data is stored, is down. Cassandra manages that by creating tombstones.

How does these tombstones help Cassandra? Now again think about the scenario in which a data is stored on three nodes. With Local_Quorum consistency and 3 replication factor and just 2 nodes are up and running, if DELETE requests came, data got deleted from 2 live nodes, while remained saved on one node which was down when DELETE request came. Now this third node comes up and a READ request comes. This third node returns the data but other two nodes say that we had data but now we have tombstones at the place of that data. Now, cassandra knows that if two nodes have tombstones that means that data was deleted and then it repairs data on that remaining third node and deletes from that node as well.

Now, we understand why Cassandra creates tombstones to flagged the data as deleted so that same message can be propegated to nodes which were down when DELETE request came. But another question here - Is DELETE is the only operation, in which Cassandra creates tombstone? Answer is ‘NO’.

Tombstones are created in following cases -

Using a CQL DELETE statement
Expiring data with time-to-live (TTL)
Using internal operations, such as Using materialized views
INSERT or UPDATE operations with a null value
UPDATE operations with a collection column

When a tombstone is created, it can be marked on different parts of a partition. Based on the location of the marker, tombstones can be categorized into one of the following groups. Each category typically corresponds to one unique type of data deletion operation.

Partition tombstones
Row tombstones
Range tombstones
ComplexColumn tombstones
Cell tombstones
TTL tombstones

All of these types are explained in details on this page.

The tombstones go through the write path, and are written to SSTables on one or more nodes. A key differentiator of a tombstone is a built-in expiration known as the grace period, set by gc_grace_seconds. At the end of its expiration period, the tombstone is deleted as part of the normal compaction process.

Having an excessive number of tombstones in a table can negatively impact application performance. Many tombstones often indicate potential issues with either the data model or in the application.

Partition tombstones

Partition tombstones are generated when an entire partition is deleted explicitly. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key.

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2014 AND race_name = '4th Tour of Beijing';

Row tombstones

Row tombstones are generated when a particular row within a partition is deleted explicitly. The schema has a composite primary key that includes both the partition key and the clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against both the partition key and the clustering key columns.

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2015 AND race_name = 'Giro d''Italia - Stage 11 - Forli > Imola' AND rank = 2;

Range tombstones

Range tombstones occur when several rows within a partition that can be expressed through a range search are deleted explicitly. The schema has a composite primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key, plus an inequality condition against the clustering key. Tip: If following this tutorial from the beginning, drop the rank_by_year_and_name table and then recreate it to populate the table with the necessary data.

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND rank > 1;

ComplexColumn tombstones

ComplexColumn tombstones are generated when inserting or updating a collection type column, such as set, list, and map.

Previously we created the cyclist_career_teams table. Run the following cqlsh command to insert data into that table.

INSERT INTO cycling.cyclist_career_teams (
     id,
     lastname,
     teams)
     VALUES (cb07baad-eac8-4f65-b28a-bddc06a0de23, 'ARMITSTEAD', { 
     'Boels-Dolmans Cycling Team','AA Drink - Leontien.nl','Team Garmin - Cervelo' } );

Cell tombstones

Cell tombstones are generated when explicitly deleting a value from a cell, such as a column for a specific row of a partition, or when inserting or updating a cell with null values, as shown in the following example.

INSERT INTO cycling.rank_by_year_and_name (
     race_year,
     race_name,
     cyclist_name,
     rank)
     VALUES (2018, 'Giro d''Italia - Stage 11 - Osimo > Imola', null, 1);

TTL tombstones

TTL tombstones are generated when the TTL (time-to-live) period expires. The TTL expiration marker can occur at either the row or cell level. However, DSE marks TTL data differently from tombstone data that was explicitly deleted. Even if a partition has only a single row (with no clustering key), the TTL mark is still made at the row level.

The following statement sets TTL for an entire row.

INSERT INTO cycling.cyclist_career_teams (
     id,
     lastname,
     teams)
     VALUES (
       e7cd5752-bc0d-4157-a80f-7523add8dbcd, 
       'VAN DER BREGGEN', 
       {
         'Rabobank-Liv Woman Cycling Team',
         'Sengers Ladies Cycling Team',
         'Team Flexpoint' 
        }
      ) 
     USING TTL 1;

The following statement sets TTL for a single cell.

UPDATE cycling.rank_by_year_and_name USING TTL 1
  SET cyclist_name = 'Cloudy Archipelago' WHERE race_year = 2018 AND 
  race_name = 'Giro d''Italia - Stage 11 - Osimo > Imola' AND rank = 1;