
Moving Data from Splunk to Amazon S3 Using Cribl Stream

As Splunk environments grow, so do data volumes—and so does your license bill. Many organizations want to offload data from Splunk to cheaper storage like Amazon S3, while still retaining the ability to search, report, or rehydrate data later.

Cribl Stream is a perfect fit for this job. You can drop it into your existing Splunk pipeline and fan out events to both Splunk and S3, or even send some data only to S3 for long-term retention.

In this post, we’ll walk through:

  1. The architecture: where Cribl sits between Splunk and S3
  2. Configuring Splunk to send data to Cribl
  3. Creating an S3 destination in Cribl
  4. Building a Cribl Route & Pipeline to move data into S3
  5. Example configs and code snippets you can adapt

1. Architecture: Where Cribl Fits

The most common pattern looks like this:

Forwarders → Cribl Stream → Splunk Indexers + S3
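
Drawn out, the fan-out looks like this:

Forwarders ──► Cribl Stream ──► Splunk Indexers   (hot, searchable)
                    │
                    └─────────► Amazon S3          (long-term archive)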

You get:

  - One place to control where every event goes
  - Full-fidelity copies in cheap S3 storage for long-term retention
  - Lower Splunk license and indexer storage costs, since you only send Splunk what you need

2. Configure Splunk to Send Data to Cribl

On your Splunk forwarder (or HF), you'll update outputs.conf so that data goes to Cribl as the "indexer".

Example: outputs.conf on a Splunk Universal Forwarder

[tcpout]
defaultGroup = cribl_group

[tcpout:cribl_group]
server = cribl.company.local:9997
compressed = true

# Optional: per-server overrides for this Cribl worker go here
[tcpout-server://cribl.company.local:9997]

Now, all the logs that used to go to Splunk indexers can be routed through Cribl. From there, Cribl can still forward to Splunk, plus S3.
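
If you run more than one Cribl worker, list them all in the group and the forwarder will load-balance across them automatically (hostnames here are illustrative):

[tcpout:cribl_group]
server = cribl1.company.local:9997, cribl2.company.local:9997
compressed = true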


3. Configure a Source in Cribl for Splunk Data

In the Cribl UI:

  1. Go to Sources → Splunk TCP (or Splunk HEC if you’re using HEC). The Splunk TCP source speaks the Splunk-to-Splunk protocol your forwarders use.
  2. Add a new Splunk TCP Source, e.g.:

  - Address: 0.0.0.0 (listen on all interfaces)
  - Port: 9997

Now Cribl is listening for Splunk data on port 9997.
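
In the same YAML-style notation used later in this post, the source looks conceptually like this (pseudo-config, not an exact Cribl export):

sources:
  splunk_from_forwarders:
    type: splunk_tcp
    host: 0.0.0.0
    port: 9997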


4. Configure an Amazon S3 Destination in Cribl

Next, we set up the S3 Destination to move data out of the pipeline.

  1. Go to Destinations → Amazon S3
  2. Click New Destination and configure:

  - Bucket name: my-company-logs-prod
  - Region: us-east-1
  - Key prefix / partitioning: e.g. logs/splunk/…
  - Data format: JSON (newline-delimited)
  - Compression: gzip
  - Authentication: access key/secret, or an IAM role

Save and Enable the destination.
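
One setting worth a closer look is the partitioning expression, which controls how objects are laid out in the bucket. Cribl evaluates it as a JavaScript expression per event; this sketch assumes the Splunk index field survives your pipeline:

`${index || 'unknown'}/${C.Time.strftime(_time ? _time : Date.now() / 1000, '%Y/%m/%d/%H')}`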


5. Create a Cribl Pipeline (Optional but Recommended)

Cribl Pipelines let you do things like:

  - Drop unnecessary events or fields
  - Mask sensitive values
  - Enrich events with metadata
  - Reshape and serialize data for downstream tools

For moving Splunk data to S3, a simple pipeline might:

  1. Drop unnecessary fields
  2. Add metadata for downstream analytics
  3. Serialize to JSON

Example Pipeline Functions (Conceptual)

In the UI: Pipelines → New Pipeline, name it to_s3_archive.

Add a few functions:

5.1 Eval (remove noisy fields)

Cribl handles field removal with the Eval function’s Remove Fields setting. Drop fields you don’t need in long-term storage:

// Remove Fields (list of fields to drop)
["_raw", "linecount", "punct", "splunk_server"]

5.2 Eval (add metadata / normalize fields)

Add a source_system tag and normalize index if you want. In the Eval function’s Evaluate Fields section:

// Field            Value Expression
source_system   =   'splunk'
index           =   index || 'unknown'

5.3 Serialize (write as JSON for S3)

Set the Serialize function’s Type to JSON Object so each event becomes a JSON document; the S3 destination then writes them newline-delimited (NDJSON), which most downstream tools can read.
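
Put together, the pipeline looks conceptually like this (YAML-style pseudo-config in the spirit of section 7, not an exact Cribl export):

pipelines:
  to_s3_archive:
    functions:
      - id: eval            # remove noisy fields
        remove: ["_raw", "linecount", "punct", "splunk_server"]
      - id: eval            # add metadata / normalize
        add:
          source_system: "'splunk'"
          index: "index || 'unknown'"
      - id: serialize       # write each event as a JSON object
        type: json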


6. Configure a Route: Splunk → Pipeline → S3

Routes are where the magic happens.

In Cribl UI:

  1. Go to Routes
  2. Add a New Route, e.g. splunk_to_s3_archive

Set:

  - Filter: index == "web" || index == "app" || index == "infra"
  - Pipeline: to_s3_archive
  - Output: s3_longterm_logs

This route means:

Any events coming in that match the filter will be processed by the to_s3_archive pipeline and written to s3_longterm_logs (your S3 bucket).

If you also want them to still go to Splunk, you can either:

  - Turn off this route's Final flag, so matching events continue down the route table to a Splunk route, or
  - Add a second route (see section 9) that sends the same events to your Splunk destination.


7. Example Route + Destination Config (YAML-style)

Below is a pseudo-config version of what this looks like under the hood. Don’t paste this directly, but use it as a reference.

Route (YAML-style example)

routes:
  - id: splunk_to_s3_archive
    name: "Splunk Logs to S3 Archive"
    filter: 'index == "web" || index == "app" || index == "infra"'
    pipeline: "to_s3_archive"
    outputs:
      - "s3_longterm_logs"

S3 Destination (conceptual)

destinations:
  s3_longterm_logs:
    type: s3
    bucket: "my-company-logs-prod"
    region: "us-east-1"
    path: "logs/splunk/$index/%Y/%m/%d/%H/"
    format: "ndjson"
    compression: "gzip"
    credentials:
      access_key: "AKIA…"
      secret_key: "********"
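
Hardcoded keys are shown only to mirror the UI fields above; in production you would typically let Cribl pick up credentials from an attached IAM role instead of embedding secrets in config.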


8. Verifying Data in S3

Once the route is active and events are flowing:

  1. Open your bucket in the S3 console, or list it with the AWS CLI.
  2. Browse under the configured prefix: logs/splunk/<index>/<year>/<month>/<day>/<hour>/

Each directory contains gzipped JSON or NDJSON files with your events.
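
A quick spot check from the CLI (bucket and prefix match the example config above):

aws s3 ls s3://my-company-logs-prod/logs/splunk/ --recursive | head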

You can then:

  - Query the data in place with tools like Amazon Athena
  - Rehydrate selected data back into Splunk when you need to search it
  - Feed the archive into other analytics or SIEM tooling


9. Add a Splunk Destination (Optional: Keep Splunk Online)

Most people don’t want to lose Splunk search capability. So typically you:

  1. Add a Splunk HEC or TCP destination back to your Splunk indexers.
  2. Create another Route that forwards events to Splunk as usual.

Example Route for Splunk:

routes:
  - id: splunk_hot
    name: "Splunk Hot Search"
    filter: 'true'          # All events, or a subset
    pipeline: "default"
    outputs:
      - "splunk_hec_prod"

Now:

  - Hot, searchable data keeps flowing to your Splunk indexers.
  - An archive copy of the same events lands in S3 for cheap long-term retention.

All controlled by Cribl.


10. Putting It All Together

To move data from Splunk to S3 using Cribl, you:

  1. Redirect Splunk traffic (from forwarders) to Cribl Stream.
  2. Set up an S3 destination in Cribl.
  3. Create a pipeline to reduce/normalize/serialize data.
  4. Configure a route that sends selected Splunk events to S3 (and optionally Splunk too).
  5. Verify in S3 and plug into your chosen analytics engine.

You now have:

  - Splunk search over your hot data
  - A cheap, durable long-term archive in S3
  - One control point, Cribl, deciding exactly what goes where

Let us handle the secure, reliable movement of your data between systems.


Schedule a call
