# Sampling
Reduce log volume with intelligent sampling strategies
Sampling reduces log volume by selectively keeping a percentage of logs while maintaining visibility. It's essential for high-throughput applications where logging every event would overwhelm storage or processing capacity.
## Quick Start
### Basic Sampling
Sample logs by level:
```typescript
import { createLogger, samplingPlugin } from "cenglu";

const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates: {
        trace: 0, // Drop all trace logs
        debug: 0.1, // Keep 10% of debug logs
        info: 0.5, // Keep 50% of info logs
        warn: 1.0, // Keep all warnings
      },
      alwaysLogErrors: true, // Always keep errors and fatal
    }),
  ],
});

logger.debug("Debug message"); // 10% chance of being logged
logger.info("Info message"); // 50% chance of being logged
logger.error("Error message"); // Always logged
```

### Logger-Level Sampling
Configure sampling at logger creation:
```typescript
const logger = createLogger({
  service: "high-traffic-api",
  sampling: {
    rates: {
      debug: 0.1,
      info: 0.5,
    },
    defaultRate: 1.0,
  },
});
```

## Why Sampling?
### Problem: High-Volume Logging
```typescript
// Without sampling: 1 million requests = 1 million debug logs
app.use((req, res, next) => {
  logger.debug("Request received", {
    method: req.method,
    path: req.path,
  });
  next();
});

// With 10% sampling: 1 million requests = 100,000 logs
// Still provides visibility without overwhelming storage
```

### Benefits
- **Reduced storage costs** - Store 10-50x less data
- **Lower processing overhead** - Less CPU/memory spent on logging
- **Faster log analysis** - Smaller datasets to query
- **Maintain visibility** - Still catch issues with statistical confidence (see the extrapolation sketch below)
- **Control costs** - Predictable log volume
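Because sampling is statistical, counts observed in sampled logs can be scaled back up by the inverse of the sampling rate. A minimal sketch (the `estimateTrueCount` helper is illustrative, not part of cenglu):

```typescript
// Scale a count observed in sampled logs back to an estimate of the
// true event count. With rate 0.1, 100 sampled logs imply ~1,000 events.
function estimateTrueCount(sampledCount: number, rate: number): number {
  if (rate <= 0) throw new Error("rate must be > 0 to extrapolate");
  return Math.round(sampledCount / rate);
}

estimateTrueCount(100, 0.1); // ≈ 1000
```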
### When to Use Sampling
- ✅ High-throughput applications (>1000 req/sec)
- ✅ Debug/trace logs in production
- ✅ Non-critical info logs
- ✅ Cost-sensitive environments
- ❌ Error logs (always keep 100%)
- ❌ Compliance/audit logs (legal requirement)
- ❌ Critical business events
## Sampling Plugin
### Basic Configuration
```typescript
import { samplingPlugin } from "cenglu";

const logger = createLogger({
  plugins: [
    samplingPlugin({
      // Per-level rates (0-1)
      rates: {
        trace: 0, // 0% - drop all
        debug: 0.1, // 10% - keep 1 in 10
        info: 0.5, // 50% - keep 1 in 2
        warn: 1.0, // 100% - keep all
      },
      // Default rate for unspecified levels
      defaultRate: 1.0,
      // Always log errors (default: true)
      alwaysLogErrors: true,
      // Always log fatal (default: true)
      alwaysLogFatal: true,
      // Callback invoked when a log is dropped
      onDrop: (record) => {
        // Track dropped logs
        metrics.increment("logs.dropped", {
          level: record.level,
        });
      },
    }),
  ],
});
```

### Options
| Option | Type | Default | Description |
|---|---|---|---|
| `rates` | `Record<LogLevel, number>` | `{}` | Per-level sampling rates (0-1) |
| `defaultRate` | `number` | `1.0` | Default rate for unspecified levels |
| `alwaysLogErrors` | `boolean` | `true` | Always keep error logs |
| `alwaysLogFatal` | `boolean` | `true` | Always keep fatal logs |
| `random` | `() => number` | `Math.random` | Custom random function (for testing) |
| `onDrop` | `(record) => void` | - | Callback invoked when a log is dropped |
| `shouldSample` | `(record) => boolean` | - | Custom sampling function |
## Sampling Strategies
### Random Sampling (Default)
Each log has an independent chance of being kept:
```typescript
samplingPlugin({
  rates: { debug: 0.1 }, // Each debug log has a 10% chance
});

// Over 1,000 debug logs, expect ~100 to be kept
// Statistical distribution ensures a representative sample
```

**Pros:**
- Simple and fast
- Unbiased sampling
- Good statistical properties
**Cons:**
- Non-deterministic (same log may or may not be sampled)
- Can miss patterns if unlucky
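Conceptually, the rate-based decision is a single comparison per record. A minimal sketch of the technique (the `shouldKeep` helper and `LogLevel` union are illustrative, not cenglu's API):

```typescript
type LogLevel = "trace" | "debug" | "info" | "warn" | "error" | "fatal";

// Keep a record when a uniform random draw falls below its level's rate.
function shouldKeep(
  level: LogLevel,
  rates: Partial<Record<LogLevel, number>>,
  defaultRate = 1.0,
  random: () => number = Math.random,
): boolean {
  const rate = rates[level] ?? defaultRate;
  return random() < rate;
}

shouldKeep("debug", { debug: 0.1 }); // true ~10% of the time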
```

### Deterministic Sampling

The same log message always produces the same result (a sketch of the hashing technique follows the pros and cons below):
```typescript
import { deterministicSamplingPlugin } from "cenglu";

const logger = createLogger({
  plugins: [
    deterministicSamplingPlugin({
      rate: 0.1, // 10% sampling
      hashField: "msg", // Hash based on the message
      alwaysLogErrors: true,
    }),
  ],
});

// "User logged in" will always be sampled or never sampled
// Useful for consistent behavior across instances
```

**Hash fields:**

- `msg` - Hash based on the log message (default)
- `traceId` - Hash based on the trace ID (the same trace is always sampled/dropped)
- `correlationId` - Hash based on the correlation ID
**Pros:**
- Consistent across multiple instances
- Same log always sampled or dropped
- Good for debugging specific issues
**Cons:**
- Can miss certain patterns if they hash to "drop"
- Less statistically representative
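To make this concrete, here is a minimal sketch of hash-based sampling as the technique is commonly implemented, assuming an FNV-1a string hash; it illustrates the idea, not cenglu's actual internals:

```typescript
// FNV-1a 32-bit hash: stable across processes and restarts.
function fnv1a(str: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // Force unsigned 32-bit
}

// Keep a record when its hash falls inside the first `rate` fraction
// of the 32-bit space. Same input → same hash → same decision.
function deterministicKeep(fieldValue: string, rate: number): boolean {
  return fnv1a(fieldValue) / 0x100000000 < rate;
}

deterministicKeep("User logged in", 0.1); // Always the same boolean
```

Because the decision depends only on the hashed value, every instance of the application makes the same choice for the same input.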
### Custom Sampling
Implement complex sampling logic:
```typescript
samplingPlugin({
  shouldSample: (record) => {
    // Always sample important logs
    if (record.context?.important) {
      return true;
    }

    // Always sample errors from a specific service
    if (record.level === "error" && record.service === "payment") {
      return true;
    }

    // Sample 10% of everything else
    return Math.random() < 0.1;
  },
});
```

### Adaptive Sampling
Adjust sampling based on load:
```typescript
let currentRate = 1.0;
let requestsPerSecond = 0;

// Note: requestsPerSecond must be incremented once per request,
// or this window never sees any load — see the sketch below.
setInterval(() => {
  // Lower the sampling rate under high load
  if (requestsPerSecond > 1000) {
    currentRate = 0.1;
  } else if (requestsPerSecond > 500) {
    currentRate = 0.5;
  } else {
    currentRate = 1.0;
  }
  requestsPerSecond = 0;
}, 1000);

const logger = createLogger({
  plugins: [
    samplingPlugin({
      shouldSample: (record) => {
        // Errors and fatals are always logged
        if (record.level === "error" || record.level === "fatal") {
          return true;
        }
        return Math.random() < currentRate;
      },
    }),
  ],
});
```
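For the window above to see real traffic, each request must increment the counter; a minimal Express-style sketch (the middleware wiring is an assumption about your app, not part of cenglu):

```typescript
// Count each incoming request so the interval above can react to load.
app.use((req, res, next) => {
  requestsPerSecond++;
  next();
});
```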
### Context-Based Sampling

Sample based on log context:
```typescript
samplingPlugin({
  shouldSample: (record) => {
    // Always sample VIP users
    if (record.context?.userTier === "premium") {
      return true;
    }

    // Sample 50% of authenticated users
    if (record.context?.userId) {
      return Math.random() < 0.5;
    }

    // Sample 10% of anonymous traffic
    return Math.random() < 0.1;
  },
});
```

## Production Patterns
### High-Traffic API
```typescript
const logger = createLogger({
  service: "api",
  level: "info",
  plugins: [
    samplingPlugin({
      rates: {
        debug: 0.01, // 1% of debug logs
        info: 0.1, // 10% of info logs
        warn: 1.0, // 100% of warnings
      },
      alwaysLogErrors: true,
    }),
  ],
});

// 10,000 req/sec × 10% sampling = 1,000 logs/sec
// Manageable log volume while maintaining visibility
```

### Development vs Production
```typescript
const samplingRates =
  process.env.NODE_ENV === "production"
    ? { debug: 0.1, info: 0.5 }
    : { debug: 1.0, info: 1.0 }; // No sampling in dev

const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates: samplingRates,
      alwaysLogErrors: true,
    }),
  ],
});
```

### Per-Environment Configuration
```typescript
const getSamplingConfig = () => {
  switch (process.env.NODE_ENV) {
    case "production":
      return {
        rates: { debug: 0.01, info: 0.1 },
        alwaysLogErrors: true,
      };
    case "staging":
      return {
        rates: { debug: 0.1, info: 0.5 },
        alwaysLogErrors: true,
      };
    default: // development
      return {
        rates: { debug: 1.0, info: 1.0 },
      };
  }
};

const logger = createLogger({
  plugins: [samplingPlugin(getSamplingConfig())],
});
```

## Performance Impact
### Overhead
| Strategy | Time per log | Overhead |
|---|---|---|
| No sampling | 0.010 ms | baseline |
| Random sampling | 0.012 ms | +20% |
| Deterministic | 0.015 ms | +50% |
| Custom function | 0.013 ms | +30% |
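These figures are indicative and vary by machine and adapter; a rough harness to measure per-log cost yourself (the in-memory `bench` adapter is illustrative):

```typescript
import { createLogger, samplingPlugin } from "cenglu";

const kept: unknown[] = [];
const logger = createLogger({
  plugins: [samplingPlugin({ rates: { info: 0.1 } })],
  adapters: [{ name: "bench", handle: (record) => kept.push(record) }],
});

// Time N calls and report the average cost per log call.
const N = 100_000;
const start = performance.now();
for (let i = 0; i < N; i++) logger.info("bench message");
console.log(`${((performance.now() - start) / N).toFixed(4)} ms per log`);
```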
### Throughput Improvement

Without sampling:

- 10,000 logs/sec
- All logs processed

With 10% sampling:

- 1,000 logs/sec (90% reduction)
- 10x less storage
- 10x faster queries
- ~5% faster overall performance

## Best Practices
- **Sample early**: Use a low `order` value (default: 5) to drop logs before expensive processing
- **Always log errors**: Never sample error/fatal logs
- **Monitor drop rate**: Track how many logs are dropped
- **Tune rates**: Start conservative, increase sampling as needed
- **Test sampling**: Use deterministic sampling in tests
## Monitoring Sampling
### Track Dropped Logs
```typescript
let droppedCount = 0;
let totalCount = 0;

const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates: { debug: 0.1 },
      onDrop: (record) => {
        droppedCount++;
        totalCount++; // Dropped logs count toward the total
      },
    }),
    {
      // Counts kept logs so the drop rate below is meaningful;
      // runs after sampling, so only surviving records reach it.
      name: "count-kept",
      order: 10,
      onRecord: (record) => {
        totalCount++;
        return record;
      },
    },
  ],
});

// Report metrics every minute
setInterval(() => {
  if (totalCount > 0) {
    const dropRate = droppedCount / totalCount;
    console.log(`Drop rate: ${(dropRate * 100).toFixed(1)}%`);

    // Reset counters
    droppedCount = 0;
    totalCount = 0;
  }
}, 60000);
```

### Sampling Metrics
```typescript
import { samplingPlugin } from "cenglu";

const metrics = {
  dropped: 0,
  kept: 0,
  byLevel: {} as Record<string, { dropped: number; kept: number }>,
};

const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates: { debug: 0.1, info: 0.5 },
      onDrop: (record) => {
        metrics.dropped++;
        const level = record.level;
        if (!metrics.byLevel[level]) {
          metrics.byLevel[level] = { dropped: 0, kept: 0 };
        }
        metrics.byLevel[level].dropped++;
      },
    }),
    {
      name: "sampling-metrics",
      order: 10,
      onRecord: (record) => {
        metrics.kept++;
        const level = record.level;
        if (!metrics.byLevel[level]) {
          metrics.byLevel[level] = { dropped: 0, kept: 0 };
        }
        metrics.byLevel[level].kept++;
        return record;
      },
    },
  ],
});

// Expose a metrics endpoint
app.get("/metrics", (req, res) => {
  res.json(metrics);
});
```

## Testing with Sampling
### Disable Sampling in Tests
```typescript
const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates:
        process.env.NODE_ENV === "test"
          ? { debug: 1.0, info: 1.0 } // No sampling
          : { debug: 0.1, info: 0.5 }, // Normal sampling
    }),
  ],
});
```

### Mock Random Function
```typescript
import { test, expect } from "vitest";

test("samples 50% of logs", () => {
  let callCount = 0;
  const mockRandom = () => {
    callCount++;
    return callCount % 2 === 1 ? 0.6 : 0.4; // Alternates above/below 0.5
  };

  const logs: any[] = [];
  const logger = createLogger({
    plugins: [
      samplingPlugin({
        rates: { info: 0.5 },
        random: mockRandom,
      }),
    ],
    adapters: [{ name: "test", handle: (record) => logs.push(record) }],
  });

  logger.info("Log 1"); // 0.6 > 0.5 → dropped
  logger.info("Log 2"); // 0.4 < 0.5 → kept
  logger.info("Log 3"); // 0.6 > 0.5 → dropped
  logger.info("Log 4"); // 0.4 < 0.5 → kept

  expect(logs).toHaveLength(2);
});
```

### Test Deterministic Sampling
```typescript
import { deterministicSamplingPlugin } from "cenglu";

test("deterministic sampling is consistent", () => {
  const logs: any[] = [];
  const logger = createLogger({
    plugins: [
      deterministicSamplingPlugin({
        rate: 0.5,
        hashField: "msg",
      }),
    ],
    adapters: [{ name: "test", handle: (record) => logs.push(record) }],
  });

  // The same message should produce the same result
  logger.info("Message A");
  logger.info("Message B");
  logger.info("Message A"); // Same as the first
  logger.info("Message B"); // Same as the second

  const countA = logs.filter((l) => l.msg === "Message A").length;
  const countB = logs.filter((l) => l.msg === "Message B").length;

  // Each message is consistently sampled or not
  expect(countA === 0 || countA === 2).toBe(true);
  expect(countB === 0 || countB === 2).toBe(true);
});
```

## Troubleshooting
### Too Many Logs Dropped

**Problem:** Sampling is too aggressive, missing important logs

**Solutions:**

- Increase the sampling rate:

  ```typescript
  samplingPlugin({
    rates: { debug: 0.5 }, // Increase from 0.1 to 0.5
  });
  ```

- Add important-log markers:

  ```typescript
  samplingPlugin({
    shouldSample: (record) => {
      // Never drop important logs
      if (record.context?.important) return true;
      // Normal sampling for others
      return Math.random() < 0.1;
    },
  });
  ```

- Use higher rates for specific services:

  ```typescript
  samplingPlugin({
    shouldSample: (record) => {
      // Critical service - 100% sampling
      if (record.service === "payment") return true;
      // Others - 10% sampling
      return Math.random() < 0.1;
    },
  });
  ```
### Inconsistent Sampling

**Problem:** The same log is sometimes logged, sometimes not

**Solutions:**

- Use deterministic sampling:

  ```typescript
  deterministicSamplingPlugin({
    rate: 0.1,
    hashField: "msg",
  });
  ```

- Sample by trace ID:

  ```typescript
  deterministicSamplingPlugin({
    rate: 0.1,
    hashField: "traceId", // The entire trace is sampled/dropped together
  });
  ```
### Errors Being Sampled

**Problem:** Critical errors are being dropped

**Solutions:**

- Enable `alwaysLogErrors`:

  ```typescript
  samplingPlugin({
    rates: { error: 0.1 }, // Ignored: alwaysLogErrors takes precedence
    alwaysLogErrors: true, // Errors are always logged
  });
  ```

- Use custom sampling logic:

  ```typescript
  samplingPlugin({
    shouldSample: (record) => {
      // Always log errors
      if (record.level === "error" || record.level === "fatal") {
        return true;
      }
      // Sample everything else
      return Math.random() < 0.1;
    },
  });
  ```
### Performance Not Improved

**Problem:** Sampling doesn't reduce overhead

**Solutions:**

- Ensure the plugin runs early:

  ```typescript
  samplingPlugin({
    // ... options
  }); // Default order is 5 (very early)
  ```

- Check whether expensive work happens before sampling:

  ```typescript
  // Bad: expensive operation runs before sampling
  logger.debug("Debug", expensiveFunction()); // Always called

  // Good: check the level first
  if (logger.isLevelEnabled("debug")) {
    logger.debug("Debug", expensiveFunction()); // Only called if needed
  }
  ```
## Environment Variables
```bash
# Sampling rates by level
SAMPLING_RATE_TRACE=0
SAMPLING_RATE_DEBUG=0.1
SAMPLING_RATE_INFO=0.5
SAMPLING_RATE_WARN=1.0

# Default rate
SAMPLING_DEFAULT_RATE=1.0

# Always log errors
SAMPLING_ALWAYS_LOG_ERRORS=true
```

Usage:
```typescript
const logger = createLogger({
  plugins: [
    samplingPlugin({
      rates: {
        trace: parseFloat(process.env.SAMPLING_RATE_TRACE || "0"),
        debug: parseFloat(process.env.SAMPLING_RATE_DEBUG || "0.1"),
        info: parseFloat(process.env.SAMPLING_RATE_INFO || "0.5"),
        warn: parseFloat(process.env.SAMPLING_RATE_WARN || "1.0"),
      },
      defaultRate: parseFloat(process.env.SAMPLING_DEFAULT_RATE || "1.0"),
      alwaysLogErrors: process.env.SAMPLING_ALWAYS_LOG_ERRORS !== "false",
    }),
  ],
});
```