Moving Data from Splunk to Amazon S3 Using Cribl Stream
As Splunk environments grow, so do data volumes—and so does your license bill. Many organizations want to offload data from Splunk to cheaper storage like Amazon S3, while still retaining the ability to search, report, or rehydrate data later.
Cribl Stream is a perfect fit for this job. You can drop it into your existing Splunk pipeline and fan out events to both Splunk and S3, or even send some data only to S3 for long-term retention.
In this post, we’ll walk through:
- The architecture: where Cribl sits between Splunk and S3
- Configuring Splunk to send data to Cribl
- Creating an S3 destination in Cribl
- Building a Cribl Route & Pipeline to move data into S3
- Example configs and code snippets you can adapt
1. Architecture: Where Cribl Fits
The most common pattern looks like this:
Forwarders → Cribl Stream → Splunk Indexers + S3
- Splunk Universal/Heavy Forwarders send data to Cribl instead of straight to Splunk.
- Cribl Stream receives the data, optionally transforms or reduces it.
- Cribl:
- Forwards a copy to Splunk (for hot search/detection).
- Sends a copy to Amazon S3 for long-term storage.
You get:
- Real-time analytics in Splunk
- Cheaper long-term retention in S3
- Ability to filter, mask, or reduce data before it hits expensive storage
2. Configure Splunk to Send Data to Cribl
On your Splunk forwarder (or HF), you’ll update outputs.conf so that data goes to Cribl as the “indexer”.
Example: outputs.conf on a Splunk Universal Forwarder
[tcpout]
defaultGroup = cribl_group
[tcpout:cribl_group]
server = cribl.company.local:9997
compressed = true
[tcpout-server://cribl.company.local:9997]
- cribl.company.local → Cribl Stream host
- 9997 → Port Cribl will listen on (we’ll configure that next)
Now, all the logs that used to go to Splunk indexers can be routed through Cribl. From there, Cribl can still forward to Splunk, plus S3.
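After editing outputs.conf, restart the forwarder so the new output group takes effect (on a typical install: $SPLUNK_HOME/bin/splunk restart).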
3. Configure a Source in Cribl for Splunk Data
In the Cribl UI:
- Go to Sources → Splunk TCP (or Splunk HEC if your forwarders send over HEC).
- Add a new Splunk TCP Source, e.g.:
- Name: splunk_tcp_9997
- Port: 9997
- TLS: Optional (recommended in production)
Now Cribl is listening for Splunk data on port 9997.
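Under the hood, Cribl stores this Source as YAML (typically in inputs.yml under your Cribl config directory). Here's a conceptual sketch in the same spirit as the pseudo-configs later in this post; treat the exact keys as illustrative:
inputs:
  splunk_tcp_9997:
    type: splunk        # Splunk TCP (S2S) listener
    host: 0.0.0.0
    port: 9997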
4. Configure an Amazon S3 Destination in Cribl
Next, we set up the S3 Destination to move data out of the pipeline.
- Go to Destinations → Amazon S3
- Click New Destination and configure:
- Name: s3_longterm_logs
- Bucket: my-company-logs-prod
- Region: us-east-1 (or your region)
- Path Template: something like
- logs/splunk/$index/%Y/%m/%d/%H/ (partitioning by index and hour matches the examples later in this post and avoids a flood of tiny per-minute objects)
- Format: ndjson or json (newline-delimited JSON is typical)
- Credentials: IAM Access Key / Role with write permission to the bucket
Save and Enable the destination.
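On the IAM side, the credentials Cribl uses mainly need to write into the bucket (plus list it if you want to verify from the same principal). A minimal example policy for the bucket and prefix used in this post; your environment may require additional permissions or conditions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CriblWriteArchive",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-company-logs-prod/logs/splunk/*"
    },
    {
      "Sid": "CriblListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-company-logs-prod"
    }
  ]
}
In production, prefer an instance profile or assumed IAM role over long-lived access keys.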
5. Create a Cribl Pipeline (Optional but Recommended)
Cribl Pipelines let you do things like:
- Drop noisy events
- Mask sensitive fields
- Normalize field names
- Reduce event size
For moving Splunk data to S3, a simple pipeline might:
- Drop unnecessary fields
- Add metadata for downstream analytics
- Serialize to JSON
Example Pipeline Functions (Conceptual)
In the UI: Pipelines → New Pipeline, name it to_s3_archive.
Add a few functions:
5.1 Eval (remove noisy fields)
Use the Eval function's Remove Fields setting to drop Splunk metadata you don't need in long-term storage (keep _raw if it still carries the original event text you want to archive):
// Eval function example (fields to remove)
["linecount", "punct", "splunk_server"]
5.2 Eval (add metadata / normalize fields)
Add a source_system tag and normalize index if you want. In the Eval function, add these as field name / value expression pairs:
source_system: 'splunk'
index: index || 'unknown'
5.3 Serialize (write as JSON for S3)
In the Serialize function, choose JSON as the output format for the fields you're keeping (or skip this function and rely on the S3 Destination's JSON/NDJSON data format setting); either way, what lands in S3 should be newline-delimited JSON that your downstream tools can read.
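For example, a single archived event might end up as one NDJSON line like this (field names and values are purely illustrative):
{"_time":1731672000,"index":"web","sourcetype":"access_combined","host":"web-01","source_system":"splunk","_raw":"GET /checkout HTTP/1.1 200 ..."}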
6. Configure a Route: Splunk → Pipeline → S3
Routes are where everything comes together: they decide which events match, which Pipeline processes them, and which Destination they are delivered to.
In Cribl UI:
- Go to Routes
- Add a New Route, e.g. splunk_to_s3_archive
Set:
- Filter: a JavaScript expression that decides which events go to S3.
- Example: only archive certain indexes:
- index == "web" || index == "app" || index == "infra"
- Pipeline: to_s3_archive
- Output: s3_longterm_logs (the S3 Destination from step 4)
This route means:
Any incoming events that match the filter will be processed by the to_s3_archive pipeline and written to s3_longterm_logs (your S3 bucket).
If you also want those events to keep flowing to Splunk, you can either:
- Turn off the Final flag on this Route so matching events fall through to a later Route that delivers to a Splunk Destination, or
- Point this Route at an Output Router Destination that fans events out to both S3 and Splunk.
7. Example Route + Destination Config (YAML-style)
Below is a pseudo-config version of what this looks like under the hood. Don’t paste this directly, but use it as a reference.
Route (YAML-style example)
routes:
  - id: splunk_to_s3_archive
    name: "Splunk Logs to S3 Archive"
    filter: 'index == "web" || index == "app" || index == "infra"'
    pipeline: "to_s3_archive"
    outputs:
      - "s3_longterm_logs"
S3 Destination (conceptual)
destinations:
  s3_longterm_logs:
    type: s3
    bucket: "my-company-logs-prod"
    region: "us-east-1"
    path: "logs/splunk/$index/%Y/%m/%d/%H/"
    format: "ndjson"
    compression: "gzip"
    credentials:
      access_key: "AKIA…"
      secret_key: "********"
8. Verifying Data in S3
Once the route is active and events are flowing:
- Go to your S3 bucket, e.g. s3://my-company-logs-prod/logs/splunk/
- You should see folders like:
- web/2025/11/15/12/
- app/2025/11/15/12/
Each directory contains gzipped JSON or ndjson files with your events.
You can then:
- Query them with Athena, Snowflake, Databricks, or Cribl Search.
- Use them for long-term compliance, forensics, or cost-efficient analytics.
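If you prefer the command line to the console, a quick spot check with the AWS CLI works too (assuming your local credentials can read the bucket; replace <some-object> with a file name from the listing, and note that the exact name and extension will vary):
aws s3 ls s3://my-company-logs-prod/logs/splunk/ --recursive | head
aws s3 cp s3://my-company-logs-prod/logs/splunk/web/2025/11/15/12/<some-object>.gz - | gunzip | head -n 5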
9. Add a Splunk Destination (Optional: Keep Splunk Online)
Most people don’t want to lose Splunk search capability. So typically you:
- Add a Splunk HEC or TCP destination back to your Splunk indexers.
- Create another Route that forwards events to Splunk as usual.
Example Route for Splunk:
routes:
  - id: splunk_hot
    name: "Splunk Hot Search"
    filter: 'true'   # All events, or a subset
    pipeline: "default"
    outputs:
      - "splunk_hec_prod"
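For completeness, here is a conceptual sketch of the matching Splunk HEC Destination in the same pseudo-config style (the URL, port, and token are placeholders; don't paste this directly either):
destinations:
  splunk_hec_prod:
    type: splunk_hec
    url: "https://splunk-idx.company.local:8088"
    token: "********"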
Now:
- Splunk keeps its hot/warm index for 30–90 days
- S3 keeps years of data for cheap
All controlled by Cribl.
10. Putting It All Together
To move data from Splunk to S3 using Cribl, you:
- Redirect Splunk traffic (from forwarders) to Cribl Stream.
- Set up an S3 destination in Cribl.
- Create a pipeline to reduce/normalize/serialize data.
- Configure a route that sends selected Splunk events to S3 (and optionally Splunk too).
- Verify in S3 and plug into your chosen analytics engine.
You now have:
- A cost-effective long-term retention strategy
- Fine-grained control over what goes to Splunk vs S3
- The ability to evolve your storage strategy without touching every forwarder or indexer again
