benginsternas/VertexFS

VertexFS: Distributed, Replicated, and Fault-Tolerant File System

A high-performance, cloud-native distributed file system engineered with C++20, gRPC, and Docker, orchestrated by Kubernetes.

It is designed to demonstrate advanced distributed-systems concepts, including data replication, strong consistency, high availability, and self-healing architectures.


Motivation

Modern distributed systems require scalable and fault-tolerant storage layers. VertexFS was built as a learning and demonstration project to explore how real-world distributed file systems handle:

  • Data replication and durability
  • Failure detection and recovery
  • Consistency guarantees
  • Cloud-native deployment patterns

Architecture Overview

VertexFS follows a Master-Worker architecture, decoupling metadata management from raw data storage.

  • Metadata Master (C++): Manages namespace, file-to-block mapping, and cluster health
  • Storage Nodes (C++): The workers; they store data blocks and handle replication
  • Client: CLI interface for interacting with the system
  • Communication Layer (gRPC): Efficient, low-latency RPC communication
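
To make the Master's role concrete, the file-to-block mapping described above can be sketched as a small in-memory table. This is a minimal illustration, not the project's actual code: the names `BlockId`, `NodeId`, `BlockLocation`, and `Namespace` are all hypothetical, and a real Master would also need persistence and locking.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not taken from the VertexFS sources.
using BlockId = std::uint64_t;
using NodeId = std::string;

// One block of a file and the storage nodes that hold a copy of it.
struct BlockLocation {
    BlockId id;
    std::vector<NodeId> replicas;
};

// In-memory namespace: file path -> ordered list of blocks.
class Namespace {
public:
    // Register (or overwrite) a file and its block layout.
    void AddFile(const std::string& path, std::vector<BlockLocation> blocks) {
        assert(!path.empty() && "file path must not be empty");
        files_[path] = std::move(blocks);
    }

    // Return the block list for a path, or std::nullopt if the file is unknown.
    std::optional<std::vector<BlockLocation>> Lookup(const std::string& path) const {
        auto it = files_.find(path);
        if (it == files_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::vector<BlockLocation>> files_;
};
```

In this sketch a client would ask the Master for the block list of a file and receive, for each block, the nodes that hold a replica.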

Architecture Diagram

graph TD
    Client -->|Upload/Download| Master
    Master -->|Metadata Ops| Client

    Master -->|Replication Commands| Worker1
    Master -->|Replication Commands| Worker2
    Master -->|Replication Commands| Worker3

    Worker1 --> Worker2
    Worker2 --> Worker3
    Worker3 --> Worker1

    Worker1 -->|Heartbeat| Master
    Worker2 -->|Heartbeat| Master
    Worker3 -->|Heartbeat| Master

Core Features

  • Strong Consistency: A write commits only after a quorum of ⌊N/2⌋ + 1 replicas acknowledges it
  • Fault Tolerance: Automatic failure detection via gRPC heartbeats
  • Self-Healing: Kubernetes restarts failed pods; Master re-replicates missing blocks
  • Replication: N-way replication across storage nodes
  • Container-Native: Optimized multi-stage Docker builds
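
Heartbeat-based failure detection, as listed above, usually comes down to remembering when each node was last heard from and flagging nodes that stay silent longer than a timeout. The sketch below assumes that approach; the class name `HeartbeatTracker` and the timeout value are hypothetical, and the gRPC plumbing that would deliver the heartbeats is left out.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical failure detector: tracks the last heartbeat seen per node.
class HeartbeatTracker {
public:
    explicit HeartbeatTracker(std::chrono::milliseconds timeout) : timeout_(timeout) {
        assert(timeout.count() > 0 && "timeout must be positive");
    }

    // Called whenever a heartbeat from `node` arrives.
    void RecordHeartbeat(const std::string& node, Clock::time_point now) {
        last_seen_[node] = now;
    }

    // Nodes whose most recent heartbeat is older than the timeout.
    std::vector<std::string> SuspectedDead(Clock::time_point now) const {
        std::vector<std::string> dead;
        for (const auto& [node, seen] : last_seen_) {
            if (now - seen > timeout_) dead.push_back(node);
        }
        return dead;
    }

private:
    std::chrono::milliseconds timeout_;
    std::unordered_map<std::string, Clock::time_point> last_seen_;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the detector deterministic and easy to unit-test.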

Consistency Model

  • Quorum-based write strategy
  • Reads are served from up-to-date replicas
  • Once a write is committed, no read returns stale data
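
The quorum arithmetic behind this model is small enough to write out. The helpers below are an illustrative sketch with hypothetical names (`QuorumSize`, `CanCommit`, `ToleratedFailures`); they only encode the majority rule of ⌊N/2⌋ + 1 acknowledgements and say nothing about how VertexFS actually collects them.

```cpp
#include <cassert>
#include <cstddef>

// Smallest majority of n replicas: floor(n / 2) + 1.
inline std::size_t QuorumSize(std::size_t n) {
    assert(n > 0 && "replication factor must be positive");
    return n / 2 + 1;  // integer division floors for unsigned values
}

// A write may commit once the number of acknowledgements reaches the quorum.
inline bool CanCommit(std::size_t acks, std::size_t n) {
    return acks <= n && acks >= QuorumSize(n);
}

// Number of replicas that can fail while a quorum is still reachable.
inline std::size_t ToleratedFailures(std::size_t n) {
    return n - QuorumSize(n);
}
```

With three replicas the quorum is two, so one replica can be down and writes still commit. Moving to four replicas raises the quorum to three without tolerating any additional failure, which is why odd replication factors are the usual choice.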

Failure Model

VertexFS is designed to tolerate:

  • Node crashes (fail-stop failures)
  • Pod restarts in Kubernetes
  • Partial network failures (best-effort handling)
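
After a fail-stop crash, the Master has to decide where to place new copies of blocks that lost a replica. One plausible way to plan that step is sketched below. The function `PlanReReplication` and its selection policy (take live nodes in sorted order) are assumptions made for illustration, not the project's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: choose nodes that should receive a new copy of a block.
// `replicas` are the nodes recorded as holding the block, `live_nodes` are the
// nodes with a recent heartbeat.
std::vector<std::string> PlanReReplication(const std::vector<std::string>& replicas,
                                           const std::set<std::string>& live_nodes,
                                           std::size_t replication_factor) {
    assert(replication_factor > 0 && "replication factor must be positive");

    // Replicas that are still reachable and can act as a copy source.
    std::set<std::string> healthy;
    for (const auto& node : replicas) {
        if (live_nodes.count(node) > 0) healthy.insert(node);
    }

    std::vector<std::string> targets;
    if (healthy.empty()) return targets;  // no surviving copy to replicate from

    // Add live nodes that lack the block until the target factor is restored.
    for (const auto& node : live_nodes) {
        if (healthy.size() + targets.size() >= replication_factor) break;
        if (healthy.count(node) == 0) targets.push_back(node);
    }
    return targets;
}
```

If every recorded replica is gone, the sketch returns an empty plan: the block cannot be recovered by copying, which is the case replication alone does not cover.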

Tech Stack

  • Language: C++20 (Abseil, Google Test)
  • RPC Framework: gRPC / Protocol Buffers
  • Containerization: Docker
  • Orchestration: Kubernetes (StatefulSets)
  • Build System: CMake

Client

The VertexFS client provides a simple CLI interface:

$ client --upload myfile.txt
$ client --download myfile.txt

Features:

  • File upload/download
  • Metadata interaction

Quick Start (Local Development)

1. Run with Docker Compose

git clone https://github.com/benginsternas/VertexFS.git
cd VertexFS
docker-compose up --build

This starts:

  • 1 Metadata Master
  • 3 Storage Nodes

2. Manual Build

mkdir build && cd build
cmake ..
make -j$(nproc)
./storage_node --port=50051

Kubernetes Deployment

Deploy Master

kubectl apply -f k8s/master-deploy.yaml

Deploy Workers

kubectl apply -f k8s/worker-statefulset.yaml

Reliability Tests

# Upload a file
client --upload myfile.txt

# Simulate failure
kubectl delete pod dfs-worker-0

# Verify availability
client --download myfile.txt

# Observe recovery
kubectl logs <master-pod>

Observability

  • Logs accessible via kubectl logs
  • Health monitoring via heartbeat system
  • Future: Prometheus & Grafana integration

Limitations & Future Work

  • Single Metadata Master (single point of failure)
  • No consensus protocol (e.g., Raft) implemented yet
  • No erasure coding (replication only)

Planned improvements:

  • Multi-master support
  • Dynamic load balancing
  • Improved failure handling
  • Metrics and monitoring stack

License

Distributed under the MIT License.


Engineered by Bengin Sternas
