Bastion is a lightweight, high-performance data ingestion gateway built in Rust. Validate, transform, and route your data before it touches your infrastructure. Send events to S3, Kafka, BigQuery, or any destination. Deploy anywhere — from cloud to Raspberry Pi.
Getting data into your data platform shouldn't require a PhD in distributed systems.
Real-time schema validation before data enters your pipeline. No more 3 AM surprises from malformed payloads or silent schema changes.
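As a rough illustration of what validating before ingestion can look like, here is a minimal sketch in Rust. It is not Bastion's actual implementation or API: the `Schema`, `Value`, `FieldType`, and `ValidationError` names are hypothetical, and it assumes a schema is simply a set of required fields with expected types.

```rust
use std::collections::HashMap;

/// Values an event field may hold (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
    Bool(bool),
}

/// Expected type of a field in the schema.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Str,
    Int,
    Bool,
}

/// Assumption: a schema is a map of required field names to expected types.
pub struct Schema {
    pub required: HashMap<String, FieldType>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    MissingField(String),
    WrongType { field: String, expected: FieldType },
}

impl Schema {
    pub fn new(fields: &[(&str, FieldType)]) -> Self {
        let required = fields.iter().map(|(n, t)| (n.to_string(), *t)).collect();
        Schema { required }
    }

    /// Checks every required field and collects all problems found,
    /// so a rejected event can be reported in full rather than one error at a time.
    pub fn validate(&self, event: &HashMap<String, Value>) -> Result<(), Vec<ValidationError>> {
        let mut errors = Vec::new();
        for (name, expected) in &self.required {
            match event.get(name) {
                None => errors.push(ValidationError::MissingField(name.clone())),
                Some(value) => {
                    let actual = match value {
                        Value::Str(_) => FieldType::Str,
                        Value::Int(_) => FieldType::Int,
                        Value::Bool(_) => FieldType::Bool,
                    };
                    if actual != *expected {
                        errors.push(ValidationError::WrongType {
                            field: name.clone(),
                            expected: *expected,
                        });
                    }
                }
            }
        }
        if errors.is_empty() { Ok(()) } else { Err(errors) }
    }
}
```

In this sketch, an event whose `user_id` arrives as a string instead of an integer is rejected at the gateway with a `WrongType` error, rather than surfacing later in the pipeline.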
A single binary under 20 MB. No JVM, no runtime dependencies, no GC pauses. Runs on a Raspberry Pi or a cloud VM. Built in Rust for when every millisecond matters.
Publish to multiple destinations in a single pass — Kafka clusters, S3, webhooks. No MirrorMaker. No cross-cluster replication. Data goes where it needs to from the start.
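The single-pass fan-out idea can be sketched as follows. This is an assumption about the general shape, not Bastion's real interface: the `Destination` trait, `MemorySink`, and `fan_out` are hypothetical names, and an in-memory sink stands in for real Kafka, S3, or webhook clients.

```rust
/// Hypothetical destination interface; real sinks (Kafka, S3, webhooks)
/// would implement it with their own clients.
pub trait Destination {
    fn name(&self) -> &str;
    fn send(&mut self, event: &str) -> Result<(), String>;
}

/// In-memory stand-in used here in place of a real sink.
pub struct MemorySink {
    pub label: String,
    pub received: Vec<String>,
    pub healthy: bool,
}

impl MemorySink {
    pub fn new(label: &str, healthy: bool) -> Self {
        MemorySink { label: label.to_string(), received: Vec::new(), healthy }
    }
}

impl Destination for MemorySink {
    fn name(&self) -> &str {
        &self.label
    }

    fn send(&mut self, event: &str) -> Result<(), String> {
        if !self.healthy {
            return Err(format!("{} unavailable", self.label));
        }
        self.received.push(event.to_string());
        Ok(())
    }
}

/// Delivers one event to every destination in a single pass.
/// A failing destination does not block the others; the names of the
/// destinations that failed are returned so the caller can retry them.
pub fn fan_out(event: &str, destinations: &mut [&mut dyn Destination]) -> Vec<String> {
    let mut failed = Vec::new();
    for dest in destinations.iter_mut() {
        if dest.send(event).is_err() {
            failed.push(dest.name().to_string());
        }
    }
    failed
}
```

The design choice illustrated here is that each destination is written to independently, so an outage in one sink shows up as an entry in the returned list instead of stalling delivery to the rest.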
Built-in medallion pipeline architecture. Raw data is validated (Bronze), cleaned and transformed (Silver), and enriched with business logic (Gold), all before storage.
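The three stages can be pictured as a chain of typed transformations. The sketch below is illustrative only: the record shape (an email and an amount in cents), the `high_value` rule, and all type and function names are assumptions invented for the example, not part of Bastion.

```rust
/// Bronze: the raw record, accepted only if it has the expected shape.
#[derive(Debug, PartialEq)]
pub struct Bronze {
    pub email: String,
    pub amount_cents: String,
}

/// Silver: cleaned and typed.
#[derive(Debug, PartialEq)]
pub struct Silver {
    pub email: String,
    pub amount_cents: u64,
}

/// Gold: enriched with a (hypothetical) business rule.
#[derive(Debug, PartialEq)]
pub struct Gold {
    pub email: String,
    pub amount_cents: u64,
    pub high_value: bool,
}

/// Validate: exactly two comma-separated fields, and the first looks like an email.
pub fn to_bronze(raw: &str) -> Option<Bronze> {
    let mut parts = raw.split(',');
    let email = parts.next()?;
    let amount = parts.next()?;
    if parts.next().is_some() || !email.contains('@') {
        return None;
    }
    Some(Bronze { email: email.to_string(), amount_cents: amount.to_string() })
}

/// Clean and transform: trim whitespace, normalize case, parse the amount.
pub fn to_silver(b: Bronze) -> Option<Silver> {
    let amount_cents = b.amount_cents.trim().parse::<u64>().ok()?;
    Some(Silver { email: b.email.trim().to_lowercase(), amount_cents })
}

/// Enrich: flag orders of 100.00 or more (an invented rule for illustration).
pub fn to_gold(s: Silver) -> Gold {
    Gold { high_value: s.amount_cents >= 10_000, email: s.email, amount_cents: s.amount_cents }
}

/// Runs all three stages; a record that fails any stage never reaches storage.
pub fn process(raw: &str) -> Option<Gold> {
    to_bronze(raw).and_then(to_silver).map(to_gold)
}
```

Because each stage has its own type, a record cannot reach the Gold step without having passed validation and cleaning first.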
Start ingesting on day one. Bastion writes clean Parquet files directly to S3 or GCS — no Kafka, no Spark jobs, no Glue pipelines to maintain. Add destinations as your stack grows.
Clean, structured data isn't just good engineering — it's what makes AI actually work. The validated, well-typed data that Bastion produces is exactly the context an LLM needs to query your data accurately. No preprocessing, no guesswork. Bastion doesn't just prepare your data for your stack — it prepares it for AI.
Bastion isn't a replacement for Kafka. It's the layer that sits in front, ensuring your data is clean and properly routed.
| | Bastion | REST Proxy | Kafka Connect | Custom |
|---|---|---|---|---|
| Memory footprint | ~20 MB | 512 MB+ (JVM) | 512 MB+ (JVM) | Varies |
| Schema validation | Built-in | Separate service | Limited | Manual |
| Data transformation | Bronze → Silver → Gold | None | SMTs (limited) | Manual |
| Multi-destination fan-out | Native | Single cluster | Single cluster | Manual |
| Edge deployable | ✓ | ✗ | ✗ | Depends |
| Parquet output | Native | ✗ | Requires Spark/Glue | Manual |
| Requires Kafka | No | Yes | Yes | Depends |
| Deployment | Single binary | JVM + Schema Registry | JVM + Kafka cluster | Varies |
Where we are and where we're heading.
Django-based dashboard for CRUD operations, schema versioning, authentication, and datasource management.
Full REST API with auto-generated OpenAPI specs via drf-spectacular.
High-performance HTTP server, real-time validation, Bronze → Silver pipeline, and micro-batching engine.
Native routing to Kafka, S3, webhooks, and custom destinations from a single ingestion point.
Local buffering when downstream is unavailable. Zero data loss, automatic catch-up on reconnect.
Pluggable Python workers for data enrichment — joins, API calls, ML inference — communicating via Apache Arrow IPC.
Prometheus metrics, structured logging, distributed tracing, and automatic circuit breaking for downstream failures.
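The local buffering item on the roadmap (hold events while downstream is unavailable, then catch up on reconnect) could look roughly like the sketch below. It is a simplified, in-memory illustration under stated assumptions: `BufferedSender` and its methods are hypothetical names, the `send` closure stands in for a real downstream client, and a production version would presumably persist the queue to disk to survive restarts.

```rust
use std::collections::VecDeque;

/// Hypothetical store-and-forward buffer: events queue up in arrival order
/// and are drained whenever the downstream accepts them again.
pub struct BufferedSender {
    pending: VecDeque<String>,
    capacity: usize,
}

impl BufferedSender {
    pub fn new(capacity: usize) -> Self {
        BufferedSender { pending: VecDeque::new(), capacity }
    }

    /// Queues the event, then tries to drain the queue.
    /// `send` returns true when the downstream accepted the event.
    /// When the buffer is full the event is handed back to the caller
    /// (backpressure) instead of being silently dropped.
    pub fn submit<F: FnMut(&str) -> bool>(&mut self, event: String, send: F) -> Result<(), String> {
        if self.pending.len() >= self.capacity {
            return Err(event);
        }
        self.pending.push_back(event);
        self.flush(send);
        Ok(())
    }

    /// Sends buffered events oldest-first and stops at the first failure,
    /// so ordering is preserved. Returns how many events were delivered.
    pub fn flush<F: FnMut(&str) -> bool>(&mut self, mut send: F) -> usize {
        let mut delivered = 0;
        while let Some(front) = self.pending.front() {
            if !send(front) {
                break;
            }
            self.pending.pop_front();
            delivered += 1;
        }
        delivered
    }

    pub fn pending(&self) -> usize {
        self.pending.len()
    }
}
```

An event is removed from the queue only after the downstream confirms it, which is what lets the buffer catch up in order once the connection returns.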
Be the first to know when Bastion is ready. No spam — just launch updates and early access.