How to Implement the Apache Cassandra Driver for Python?

Apache Cassandra is a NoSQL database system that can handle huge amounts of data across multiple servers. Because data is replicated between servers, it stays available even if one of them fails, which makes Cassandra highly reliable and fault-tolerant.

It is widely used in industries such as finance, healthcare, and e-commerce. Big companies like Netflix, Facebook, and Uber use it to process millions of data requests daily. It is a great choice for real-time analytics, messaging systems, and IoT data storage.

Its architecture is what makes it unique. Traditional databases often follow a primary-secondary model, in which one main server controls the others. If the main server crashes, the whole system can fail.

Cassandra, by contrast, uses a peer-to-peer network in which every node is equal. It distributes data evenly across all nodes, and each node operates independently. This removes any single point of failure and lets every node accept reads and writes.
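
To make the idea of spreading data across equal nodes more concrete, here is a toy sketch of hash-based partitioning in Python. It is not Cassandra's implementation (Cassandra's default partitioner is Murmur3 and real clusters use virtual nodes); the node names, the MD5 hash, and the ToyRing class are assumptions chosen only for illustration.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a deterministic integer token (MD5 is for illustration only)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ToyRing:
    """A toy token ring: a node owns the tokens between the previous node's token and its own."""

    def __init__(self, nodes):
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        # Find the first node token >= the key's token; wrap around past the last node.
        index = bisect.bisect_left(self._tokens, token_for(partition_key))
        return self._ring[index % len(self._ring)][1]


ring = ToyRing(["node-a", "node-b", "node-c"])
placement = {f"product-{i}": ring.node_for(f"product-{i}") for i in range(1, 7)}
for key, node in placement.items():
    print(key, "->", node)
```

No node in this sketch is special: any of them can be removed, and only the keys it owned move to a neighbour. With a single token per node the split is rarely even, which is one reason real clusters assign many tokens to each node.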

Is Cassandra Right for Your Project?

It is great for applications that need fast data access, high availability, and seamless scaling. It works well for social media platforms, real-time monitoring systems, recommendation engines, and other applications that generate huge amounts of data.

Using Cassandra with Python

Cassandra is great at handling big data and Python makes it easy to interact with it. So, you can use them together to store and process data quickly without worrying about performance issues.

Python has libraries like cassandra-driver with which you can connect to Cassandra and perform various operations. You can easily query and update data with just a few lines of code.

Together, they let you build scalable applications that handle a lot of traffic, which is especially useful when you work with data distributed across multiple systems.

Getting Started with Apache Cassandra

Now let us get into the practical part: how to set up Cassandra and get a powerful database ready to handle your application’s data efficiently.

Setting Up Your Environment

To get started, you need a server running Ubuntu 22.04, deployed on a cloud platform such as Cantech.

After the server is set up, you need to create a non-root user with sudo access. With this, you can ensure security and prevent accidental system modifications.

Once the user is created, update the server packages so that everything is up to date.

Also, you need the Apache Cassandra database server installed.

Now, you are ready to configure your database.

Creating a Sample Database in Cassandra

Cassandra uses keyspaces to organize data. A keyspace is like a database in traditional relational database systems.

Steps to Create One –

Log into Cassandra using the cqlsh shell. Then, run –
CREATE KEYSPACE online_shop 
WITH REPLICATION = { 
    'class': 'SimpleStrategy', 
    'replication_factor' : 1 
};

Now, go to the new keyspace that you created.

USE online_shop;

Next, run the command below to create a table for product information. The table stores product details such as ID, name, and price, and the product_id column uniquely identifies each product.

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    product_name TEXT,
    retail_price DOUBLE
);

Below are the commands to insert sample data –

INSERT INTO products (product_id, product_name, retail_price) VALUES (1, 'T-shirt', 550);
INSERT INTO products (product_id, product_name, retail_price) VALUES (2, 'Leggings', 700);
INSERT INTO products (product_id, product_name, retail_price) VALUES (3, 'Crop tops', 600);

Check that the data was inserted successfully –

SELECT * FROM products;

Lastly, exit the database server with EXIT;

Connecting Cassandra to a Python Application

Install the cassandra-driver module to interact with Cassandra from a Python application. This driver lets the application connect to the Cassandra database and perform operations on it.

First, run the command below to create a project directory. Keeping the Python source files in their own directory keeps the application organized.

mkdir project && cd project

Install pip, the Python package manager, if it is not already installed –

sudo apt install -y python3-pip

Now, install the Cassandra driver for Python –

pip install cassandra-driver
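
With the driver installed, the cqlsh steps from the previous section can also be scripted. The sketch below assumes a Cassandra node reachable on localhost with default settings. SCHEMA_STATEMENTS and setup_schema are hypothetical names, and IF NOT EXISTS is an addition so that the script can be re-run safely.

```python
# The same keyspace and table as in the cqlsh section, with IF NOT EXISTS added.
SCHEMA_STATEMENTS = [
    """
    CREATE KEYSPACE IF NOT EXISTS online_shop
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """,
    """
    CREATE TABLE IF NOT EXISTS online_shop.products (
        product_id BIGINT PRIMARY KEY,
        product_name TEXT,
        retail_price DOUBLE
    )
    """,
]


def setup_schema(session):
    """Run each schema statement on an open driver session (hypothetical helper)."""
    for statement in SCHEMA_STATEMENTS:
        session.execute(statement)
    return len(SCHEMA_STATEMENTS)


# Usage against a real cluster (assumes a node on 127.0.0.1:9042):
#   from cassandra.cluster import Cluster
#   cluster = Cluster()
#   setup_schema(cluster.connect())   # connect without a keyspace; it is created here
#   cluster.shutdown()
```

Passing the session in as a parameter keeps the helper independent of how the connection is made, so the same function works with a local node or a remote cluster.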

Building a Python Module for Database Operations

To interact with the database efficiently, you need a reusable Python module that handles the queries. This module acts as a gateway to Cassandra and includes functions for inserting products into the database and retrieving them.

Create a file named cassandra_gateway.py with the command below, using a text editor like Nano –

nano cassandra_gateway.py

Next, add the following code inside the file –

from cassandra.cluster import Cluster
from cassandra.query import dict_factory

class CassandraGateway:
    def db_session(self):
        cluster = Cluster()
        session = cluster.connect('online_shop')
        return session

    def execute(self, json_data):
        db_session = self.db_session()
        query_string = "INSERT INTO products (product_id, product_name, retail_price) VALUES (?, ?, ?);"
        stmt = db_session.prepare(query_string)

        product_id = int(json_data["product_id"])
        product_name = json_data["product_name"]
        retail_price = json_data["retail_price"]

        prepared_query = stmt.bind([product_id, product_name, retail_price])
        db_session.execute(prepared_query)

        return self.query(product_id)

    def query(self, product_id=0):
        db_session = self.db_session()
        db_session.row_factory = dict_factory

        if product_id == 0:
            query_string = "SELECT product_id, product_name, retail_price FROM products;"
            stmt = db_session.prepare(query_string)
            prepared_query = stmt.bind([])
        else:
            query_string = "SELECT product_id, product_name, retail_price FROM products WHERE product_id = ?;"
            stmt = db_session.prepare(query_string)
            prepared_query = stmt.bind([int(product_id)])

        rows = db_session.execute(prepared_query)
        return list(rows)

Let’s see what each part of the file does –

from cassandra.cluster import Cluster – Imports the Cluster class from the Apache Cassandra driver for Python.

from cassandra.query import dict_factory – Imports the dict_factory function, which returns rows from the products table as dictionaries keyed by column name. This is useful when sending JSON responses.

Next is the CassandraGateway class, which contains three methods –

db_session(self) – Connects to the Apache Cassandra keyspace you set up earlier. It makes two calls –
o cluster = Cluster()
o session = cluster.connect('online_shop')

execute(self, json_data) – Uses db_session() to insert a row into the products table through these calls –
```
db_session.prepare(query_string)
stmt.bind([product_id, product_name, retail_price])
db_session.execute(prepared_query)
```
It uses a parameterized query (INSERT INTO products (product_id, product_name, retail_price) VALUES (?, ?, ?);) so that column values are bound safely instead of being concatenated into the query string.

query(self, product_id=0) – Runs a SELECT query on the products table and returns the result as a list (return list(rows)). With the default value 0, it returns all products. With a specific product_id, it returns only that product’s details (if product_id == 0: … else …).

Finally, save and close the file.
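
The gateway above opens a new cluster connection and prepares its statement again on every call, which is simple but adds overhead to each request. The driver is generally used with one long-lived session, so a possible variation is to connect once and cache prepared statements. SharedSessionGateway and its connect parameter are hypothetical names for this sketch; connect is any zero-argument callable that returns a driver session.

```python
class SharedSessionGateway:
    """Variation of CassandraGateway that connects once and prepares each query once."""

    INSERT_CQL = (
        "INSERT INTO products (product_id, product_name, retail_price) "
        "VALUES (?, ?, ?)"
    )

    def __init__(self, connect):
        # `connect` is a zero-argument callable returning a driver session,
        # for example: lambda: Cluster().connect('online_shop')
        self._connect = connect
        self._session = None
        self._prepared = {}

    def session(self):
        # Open the connection lazily on first use, then keep reusing it.
        if self._session is None:
            self._session = self._connect()
        return self._session

    def prepared(self, cql):
        # Prepare each distinct CQL string only once per gateway instance.
        if cql not in self._prepared:
            self._prepared[cql] = self.session().prepare(cql)
        return self._prepared[cql]

    def insert(self, product_id, product_name, retail_price):
        stmt = self.prepared(self.INSERT_CQL)
        # Passing parameters to execute() binds them to the prepared statement.
        self.session().execute(
            stmt, [int(product_id), product_name, float(retail_price)]
        )
```

To use this in the HTTP server, one gateway instance would be created at startup and shared by all requests instead of being created inside each handler call.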

Creating the Application’s Main File

Now, create a script with which users can insert and retrieve product data through a web server. It is the main script that will handle HTTP requests.

For that, create a new file –

nano index.py

Add the following code inside it –

import http.server
from http import HTTPStatus
import socketserver
import json
import cassandra_gateway

class HttpHandler(http.server.SimpleHTTPRequestHandler): 

    def do_POST(self):
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-type', 'application/json')
        self.end_headers()

        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        json_data = json.loads(post_data)

        db_gateway = cassandra_gateway.CassandraGateway()
        db_resp = db_gateway.execute(json_data)

        resp = {"data": db_resp}
        self.wfile.write(bytes(json.dumps(resp, indent=2) + "\r\n", "utf8"))

    def do_GET(self):
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-type', 'application/json')
        self.end_headers()

        product_id = 0
        if len(self.path.split("/")) >= 3:
            product_id = self.path.split("/")[2]

        db_gateway = cassandra_gateway.CassandraGateway()
        db_resp = db_gateway.query(product_id)

        resp = {"data": db_resp}
        self.wfile.write(bytes(json.dumps(resp, indent=2) + "\r\n", "utf8"))

httpd = socketserver.TCPServer(('', 8080), HttpHandler)
print("HTTP server started at port 8080...")

try:
    httpd.serve_forever()
except KeyboardInterrupt:
    httpd.server_close()
    print("You've stopped the HTTP server.")

This script runs an HTTP server on port 8080. The lines below start it and listen for incoming connections –

...
httpd = socketserver.TCPServer(('', 8080), HttpHandler)
print("HTTP server started at port 8080...")

try:
    httpd.serve_forever()
...

=> The HttpHandler(http.server.SimpleHTTPRequestHandler) class handles GET and POST requests. A GET request fetches product data, and a POST request inserts a new product into Cassandra.
=> These imports provide the HTTP functionality:
  import http.server
  from http import HTTPStatus
  import socketserver
  ...
=> These imports enable JSON formatting and the Cassandra database functions:
  ...
  import json
  import cassandra_gateway
  ...
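
The do_GET handler takes the third path segment as the product ID and the gateway passes it to int(), so a path such as /products/abc would raise a ValueError. One way to guard against that is to parse the path in a small helper first. parse_product_id is a hypothetical name, and returning None for an invalid ID is a design assumption of this sketch; the handler could then respond with HTTPStatus.BAD_REQUEST instead of calling the gateway.

```python
def parse_product_id(path):
    """Return 0 for 'all products', a positive int for one product, or None if invalid."""
    # Drop any query string, then split '/products/2' into ['products', '2'].
    segments = [part for part in path.split("?", 1)[0].split("/") if part]
    if len(segments) < 2:
        return 0  # e.g. '/products' -> list everything
    candidate = segments[1]
    # Accept plain ASCII digits only, and reject 0 because it means 'all products'.
    if candidate.isascii() and candidate.isdigit() and int(candidate) > 0:
        return int(candidate)
    return None  # e.g. '/products/abc'


for example in ("/products", "/products/2", "/products/abc"):
    print(example, "->", parse_product_id(example))
# /products -> 0
# /products/2 -> 2
# /products/abc -> None
```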

Testing the Application

Now, test if everything works.

Start the application with the command below. The output should look like this: HTTP server started at port 8080...

python3 index.py

Open a new terminal session and set up another SSH connection to the server with $ ssh root@SERVER-IP.
Then, list all products in the database –

curl -X GET http://localhost:8080/products

Fetch a specific product using its ID –

curl -X GET http://localhost:8080/products/2

Run the command below to insert a new product –

curl -X POST http://localhost:8080/ -H 'Content-Type: application/json' -d '{"product_id": 4, "product_name": "TRENCH COAT", "retail_price": 456.28}'

If it is working, you will see the inserted product in the database when you run the GET command again.
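
The same checks can be run from Python instead of curl. The sketch below uses only the standard library and assumes the server from index.py is listening on localhost:8080; build_insert_request and fetch_json are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumes index.py is running on this machine


def build_insert_request(product, base_url=BASE_URL):
    """Build the same POST request that the curl insert command sends."""
    body = json.dumps(product).encode("utf-8")
    return urllib.request.Request(
        base_url + "/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url):
    """Send a request (or GET a URL) and decode the JSON response body."""
    with urllib.request.urlopen(request_or_url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_insert_request(
    {"product_id": 4, "product_name": "TRENCH COAT", "retail_price": 456.28}
)
print(request.get_method(), request.full_url)
# POST http://localhost:8080/

# With the server running, send the insert and read the product back:
#   print(fetch_json(request))
#   print(fetch_json(BASE_URL + "/products/4"))
```

Building the request separately from sending it makes it possible to inspect the method, URL, and body before anything touches the database.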

Conclusion – Why Use Apache Cassandra?

Scalability is one of the biggest benefits of Cassandra. You can start with a single small server and add more as your data grows. To reduce capacity, you can remove nodes without taking the cluster offline.

Moreover, it handles read and write operations at high speed, so it is great for real-time applications. All in all, businesses that require continuous uptime prefer Cassandra because it keeps data available even when individual servers fail.