Moving Data from Splunk to Amazon S3 Using Cribl Stream
As Splunk environments grow, so do data volumes—and so does your license bill. Many organizations want to offload data from Splunk to cheaper storage like Amazon S3, while still retaining the ability to search, report, or rehydrate data later.
Cribl Stream is a perfect fit for this job. You can drop it into your existing Splunk pipeline and fan out events to both Splunk and S3, or even send some data only to S3 for long-term retention.
In this post, we’ll walk through:
- The architecture: where Cribl sits between Splunk and S3
- Configuring Splunk to send data to Cribl
- Creating an S3 destination in Cribl
- Building a Cribl Route & Pipeline to move data into S3
- Example configs and code snippets you can adapt
1. Architecture: Where Cribl Fits
The most common pattern looks like this:
Forwarders → Cribl Stream → Splunk Indexers + S3
- Splunk Universal/Heavy Forwarders send data to Cribl instead of straight to Splunk.
- Cribl Stream receives the data, optionally transforms or reduces it.
- Cribl:
- Forwards a copy to Splunk (for hot search/detection).
- Sends a copy to Amazon S3 for long-term storage.
You get:
- Real-time analytics in Splunk
- Cheaper long-term retention in S3
- Ability to filter, mask, or reduce data before it hits expensive storage
2. Configure Splunk to Send Data to Cribl
On your Splunk forwarder (or HF), you’ll update outputs.conf so that data goes to Cribl as the “indexer”.
Example: outputs.conf on a Splunk Universal Forwarder
[tcpout]
defaultGroup = cribl_group
[tcpout:cribl_group]
server = cribl.company.local:9997
compressed = true
[tcpout-server://cribl.company.local:9997]
- cribl.company.local → Cribl Stream host
- 9997 → Port Cribl will listen on (we’ll configure that next)
Now, all the logs that used to go to Splunk indexers can be routed through Cribl. From there, Cribl can still forward to Splunk, plus S3.
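After editing outputs.conf, restart the forwarder so the new output group takes effect (on a typical install: $SPLUNK_HOME/bin/splunk restart).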
3. Configure a Source in Cribl for Splunk Data
In the Cribl UI:
- Go to Sources → Splunk TCP (or Splunk HEC if your forwarders send over HEC).
- Add a new Splunk TCP Source, e.g.:
- Name: splunk_tcp_9997
- Port: 9997
- TLS: Optional (recommended in production)
Now Cribl is listening for Splunk data on port 9997.
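Under the hood, Cribl stores this Source as YAML (typically in inputs.yml under your Cribl config directory). Here's a conceptual sketch in the same spirit as the pseudo-configs later in this post; treat the exact keys as illustrative:
inputs:
  splunk_tcp_9997:
    type: splunk        # Splunk TCP (S2S) listener
    host: 0.0.0.0
    port: 9997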
4. Configure an Amazon S3 Destination in Cribl
Next, we set up the S3 Destination to move data out of the pipeline.
- Go to Destinations → Amazon S3
- Click New Destination and configure:
- Name: s3_longterm_logs
- Bucket: my-company-logs-prod
- Region: us-east-1 (or your region)
- Path Template: something like
- logs/splunk/$index/%Y/%m/%d/%H/ (partitioning by index and hour matches the examples later in this post and avoids a flood of tiny per-minute objects)
- Format: ndjson or json (newline-delimited JSON is typical)
- Credentials: IAM Access Key / Role with write permission to the bucket
Save and Enable the destination.
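On the IAM side, the credentials Cribl uses mainly need to write into the bucket (plus list it if you want to verify from the same principal). A minimal example policy for the bucket and prefix used in this post; your environment may require additional permissions or conditions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CriblWriteArchive",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-company-logs-prod/logs/splunk/*"
    },
    {
      "Sid": "CriblListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-company-logs-prod"
    }
  ]
}
In production, prefer an instance profile or assumed IAM role over long-lived access keys.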
5. Create a Cribl Pipeline (Optional but Recommended)
Cribl Pipelines let you do things like:
- Drop noisy events
- Mask sensitive fields
- Normalize field names
- Reduce event size
For moving Splunk data to S3, a simple pipeline might:
- Drop unnecessary fields
- Add metadata for downstream analytics
- Serialize to JSON
Example Pipeline Functions (Conceptual)
In the UI: Pipelines → New Pipeline, name it to_s3_archive.
Add a few functions:
5.1 Eval (remove noisy fields)
Use the Eval function's Remove Fields setting to drop Splunk metadata you don't need in long-term storage (keep _raw if it still carries the original event text you want to archive):
// Eval function example (fields to remove)
["linecount", "punct", "splunk_server"]
5.2 Eval (add metadata / normalize fields)
Add a source_system tag and normalize index if you want. In the Eval function, add these as field name / value expression pairs:
source_system: 'splunk'
index: index || 'unknown'
5.3 Serialize (write as JSON for S3)
In the Serialize function, choose JSON as the output format for the fields you're keeping (or skip this function and rely on the S3 Destination's JSON/NDJSON data format setting); either way, what lands in S3 should be newline-delimited JSON that your downstream tools can read.
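For example, a single archived event might end up as one NDJSON line like this (field names and values are purely illustrative):
{"_time":1731672000,"index":"web","sourcetype":"access_combined","host":"web-01","source_system":"splunk","_raw":"GET /checkout HTTP/1.1 200 ..."}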
6. Configure a Route: Splunk → Pipeline → S3
Routes are where everything comes together: they decide which events match, which Pipeline processes them, and which Destination they are delivered to.
In Cribl UI:
- Go to Routes
- Add a New Route, e.g. splunk_to_s3_archive
Set:
- Filter: a JavaScript expression that decides which events go to S3.
- Example: only archive certain indexes:
- index == "web" || index == "app" || index == "infra"
- Pipeline: to_s3_archive
- Output: s3_longterm_logs (the S3 Destination from step 4)
This route means:
Any incoming events that match the filter will be processed by the to_s3_archive pipeline and written to s3_longterm_logs (your S3 bucket).
If you also want those events to keep flowing to Splunk, you can either:
- Turn off the Final flag on this Route so matching events fall through to a later Route that delivers to a Splunk Destination, or
- Point this Route at an Output Router Destination that fans events out to both S3 and Splunk.
7. Example Route + Destination Config (YAML-style)
Below is a pseudo-config version of what this looks like under the hood. Don’t paste this directly, but use it as a reference.
Route (YAML-style example)
routes:
  - id: splunk_to_s3_archive
    name: "Splunk Logs to S3 Archive"
    filter: 'index == "web" || index == "app" || index == "infra"'
    pipeline: "to_s3_archive"
    outputs:
      - "s3_longterm_logs"
S3 Destination (conceptual)
destinations:
  s3_longterm_logs:
    type: s3
    bucket: "my-company-logs-prod"
    region: "us-east-1"
    path: "logs/splunk/$index/%Y/%m/%d/%H/"
    format: "ndjson"
    compression: "gzip"
    credentials:
      access_key: "AKIA…"
      secret_key: "********"
8. Verifying Data in S3
Once the route is active and events are flowing:
- Go to your S3 bucket, e.g. s3://my-company-logs-prod/logs/splunk/
- You should see folders like:
- web/2025/11/15/12/
- app/2025/11/15/12/
Each directory contains gzipped JSON or ndjson files with your events.
You can then:
- Query them with Athena, Snowflake, Databricks, or Cribl Search.
- Use them for long-term compliance, forensics, or cost-efficient analytics.
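If you prefer the command line to the console, a quick spot check with the AWS CLI works too (assuming your local credentials can read the bucket; replace <some-object> with a file name from the listing, and note that the exact name and extension will vary):
aws s3 ls s3://my-company-logs-prod/logs/splunk/ --recursive | head
aws s3 cp s3://my-company-logs-prod/logs/splunk/web/2025/11/15/12/<some-object>.gz - | gunzip | head -n 5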
9. Add a Splunk Destination (Optional: Keep Splunk Online)
Most people don’t want to lose Splunk search capability. So typically you:
- Add a Splunk HEC or TCP destination back to your Splunk indexers.
- Create another Route that forwards events to Splunk as usual.
Example Route for Splunk:
routes:
  - id: splunk_hot
    name: "Splunk Hot Search"
    filter: 'true'   # All events, or a subset
    pipeline: "default"
    outputs:
      - "splunk_hec_prod"
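For completeness, here is a conceptual sketch of the matching Splunk HEC Destination in the same pseudo-config style (the URL, port, and token are placeholders; don't paste this directly either):
destinations:
  splunk_hec_prod:
    type: splunk_hec
    url: "https://splunk-idx.company.local:8088"
    token: "********"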
Now:
- Splunk keeps its hot/warm index for 30–90 days
- S3 keeps years of data for cheap
All controlled by Cribl.
10. Putting It All Together
To move data from Splunk to S3 using Cribl, you:
- Redirect Splunk traffic (from forwarders) to Cribl Stream.
- Set up an S3 destination in Cribl.
- Create a pipeline to reduce/normalize/serialize data.
- Configure a route that sends selected Splunk events to S3 (and optionally Splunk too).
- Verify in S3 and plug into your chosen analytics engine.
You now have:
- A cost-effective long-term retention strategy
- Fine-grained control over what goes to Splunk vs S3
- The ability to evolve your storage strategy without touching every forwarder or indexer again
