Feb 17, 2025

How to Build Reliable Mission Control Systems

Arisay A.

CEO

TL;DR: Mission-critical systems demand more than basic error handling. This guide covers using Effect.ts for typed error management, implementing wide events for observability, integrating Sentry for real-time monitoring, crafting user-friendly error messages, and maintaining platform stability through conservative update strategies.


The Cost of Silent Failures

At 3:47 AM, your payment system stops processing transactions. No alerts fire. No errors appear in your logs. Customers simply can’t complete their purchases. By the time you discover the issue during business hours, you’ve lost $50,000 in revenue and 200 customers have tweeted about your “broken” website.

This scenario plays out across companies every day. The culprit isn’t a lack of logging or error handling. It’s a lack of systematic reliability engineering. Traditional approaches treat errors as exceptions to handle when they occur. Mission control systems treat reliability as a first-class feature.

The fundamental shift: Stop asking “How do we fix bugs?” and start asking “How do we build systems where critical failures are impossible to miss?”

Architecture Overview: The Reliability Stack

[Figure: Mission Control Architecture. A comprehensive mission control system layers multiple reliability mechanisms.]

Building reliable systems requires five interconnected layers:

  1. Typed Error Handling: Effect.ts for explicit, trackable errors
  2. Real-Time Monitoring: Sentry integration with traces and context
  3. User-Centric Communication: Clear, actionable error messages
  4. Comprehensive Observability: Wide events for complete context
  5. Stability-First Operations: Conservative updates and graceful degradation

Let’s explore how these layers work together to create systems that fail gracefully, alert immediately, and recover automatically.


Layer 1: Typed Error Handling with Effect.ts

Traditional try-catch blocks are the silent killers of reliability. They hide error types from your type system, catch everything indiscriminately, and provide zero compile-time guarantees about error handling.

The Problem with Throw/Catch

// The type system lies to you
async function fetchUser(userId: string): Promise<User> {
  const response = await fetch(`/api/users/${userId}`);
  if (!response.ok) throw new Error("Request failed");
  return await response.json();
}

// TypeScript thinks this returns User
// But it can throw! When? What type? No one knows.

When something goes wrong, you’re left grep-ing through logs hoping to find a clue. The error might be a network timeout, a JSON parsing failure, or an authentication issue. But your code treats them identically.

Effect.ts: Errors as First-Class Citizens

Effect.ts introduces Effect<A, E, R>, a type that explicitly declares:

  • A: What success looks like
  • E: What can go wrong (typed)
  • R: What dependencies are needed

import { Effect, Data } from "effect";

// Define errors as data
class NetworkError extends Data.TaggedError("NetworkError")<{
  url: string;
  statusCode: number;
  retryable: boolean;
}> {}

class ParseError extends Data.TaggedError("ParseError")<{
  input: string;
  line?: number;
}> {}

// The signature tells the whole story
const fetchUser = (
  userId: string
): Effect.Effect<User, NetworkError | ParseError, never> =>
  Effect.gen(function* () {
    const response = yield* Effect.tryPromise({
      try: () => fetch(`/api/users/${userId}`),
      catch: (error) =>
        new NetworkError({
          url: `/api/users/${userId}`,
          statusCode: 0,
          retryable: true,
        }),
    });

    if (!response.ok) {
      // `return` makes the short-circuit explicit to readers and to the type checker
      return yield* Effect.fail(
        new NetworkError({
          url: `/api/users/${userId}`,
          statusCode: response.status,
          retryable: response.status >= 500,
        })
      );
    }

    const text = yield* Effect.promise(() => response.text());

    return yield* Effect.try({
      try: () => JSON.parse(text) as User,
      catch: (error) =>
        new ParseError({
          input: text.substring(0, 200),
        }),
    });
  });

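Now every caller of fetchUser sees NetworkError and ParseError in the type signature. The error channel only becomes never once both have been handled, so a forgotten failure case shows up at compile time instead of at runtime.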
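To make the exhaustiveness idea concrete without pulling in the library, here is a minimal dependency-free sketch. The names FetchUserError, describeFetchError, and assertNever are assumptions made for illustration. They mirror the _tag convention used by Data.TaggedError but are not part of Effect.

```typescript
// Plain discriminated unions standing in for the tagged errors above
type NetworkFailure = {
  readonly _tag: "NetworkError";
  readonly url: string;
  readonly statusCode: number;
  readonly retryable: boolean;
};

type ParseFailure = {
  readonly _tag: "ParseError";
  readonly input: string;
};

type FetchUserError = NetworkFailure | ParseFailure;

// Type-checks only while every member of the union has been handled
const assertNever = (value: never): never => {
  throw new Error(`Unhandled error: ${JSON.stringify(value)}`);
};

const describeFetchError = (error: FetchUserError): string => {
  switch (error._tag) {
    case "NetworkError":
      return error.retryable
        ? `Temporary network failure (${error.statusCode}) for ${error.url}`
        : `Permanent network failure (${error.statusCode}) for ${error.url}`;
    case "ParseError":
      return `Unreadable response starting with: ${error.input.slice(0, 20)}`;
    default:
      // Adding a third member to FetchUserError turns this line into a compile error
      return assertNever(error);
  }
};
```

Effect's own combinators give the same guarantee through the E type parameter. The switch version only shows what the compiler is checking on your behalf.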

Strategic Error Handling

Effect provides powerful combinators for handling errors based on their type and severity:

import { Effect, Schedule } from "effect";

// Retry only retryable network errors with exponential backoff
const resilientFetch = fetchUser("123").pipe(
  Effect.retry({
    schedule: Schedule.exponential("100 millis").pipe(
      Schedule.intersect(Schedule.recurs(3))
    ),
    while: (error) => error._tag === "NetworkError" && error.retryable,
  }),

  // Log the failure before falling back. Placed after orElse, this step
  // would only ever see errors from the cache lookup.
  Effect.tapError((error) =>
    Effect.logError("User fetch failed", { error, userId: "123" })
  ),

  // Fall back to cache on persistent failures
  // (fetchFromCache is assumed to be defined elsewhere)
  Effect.orElse(() => fetchFromCache("123"))
);

Key insight: Not all errors are equal. Expected errors like network timeouts or validation failures deserve recovery strategies. Defects like null pointer exceptions or logic errors should crash loudly and immediately. Effect distinguishes between them.
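As a rough illustration of what the retry policy above amounts to, the following dependency-free sketch computes the wait times and the retry decision by hand. It assumes the schedule doubles the delay on each attempt (a factor of 2). The helper names backoffDelays and shouldRetry are hypothetical and not part of Effect.

```typescript
// Delays produced by an exponential schedule capped at a fixed number of retries
const backoffDelays = (baseMs: number, maxRetries: number, factor = 2): number[] =>
  Array.from({ length: maxRetries }, (_, attempt) => baseMs * Math.pow(factor, attempt));

// Minimal shape of an error that may carry a retry hint
type RetryCandidate = { readonly _tag: string; readonly retryable?: boolean };

// Retry only while attempts remain and the error is a retryable network error
const shouldRetry = (error: RetryCandidate, attempt: number, maxRetries: number): boolean =>
  attempt < maxRetries && error._tag === "NetworkError" && error.retryable === true;

console.log(backoffDelays(100, 3)); // [ 100, 200, 400 ]
```

With a 100 ms base and three retries, the worst case adds 700 ms of waiting before the fallback runs, which is worth checking against your request timeout.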


Layer 2: Real-Time Error Monitoring with Sentry

Typed errors are powerful, but you still need visibility into what’s happening in production. That’s where Sentry comes in. It works not just as an error logger, but as a mission control dashboard.

Beyond Basic Error Reporting

Most Sentry integrations capture the error and call it a day. A mission control integration captures context, traces execution, and correlates across services.

import { Effect, Data, Layer } from "effect";
import * as Sentry from "@sentry/browser";

// Create a Sentry service that integrates with Effect
class SentryService extends Effect.Service<SentryService>()("SentryService", {
  effect: Effect.gen(function* () {
    return {
      captureException: (error: unknown, context?: Record<string, unknown>) =>
        Effect.sync(() => {
          Sentry.withScope((scope) => {
            if (context) {
              // setExtra accepts arbitrary values; setContext expects an object per key
              Object.entries(context).forEach(([key, value]) => {
                scope.setExtra(key, value);
              });
            }
            Sentry.captureException(error);
          });
        }),

      // startTransaction and TransactionContext belong to the v7 SDK;
      // v8 removed them in favor of Sentry.startSpan
      startTransaction: (context: Sentry.TransactionContext) =>
        Effect.sync(() => Sentry.startTransaction(context)),

      addBreadcrumb: (breadcrumb: Sentry.Breadcrumb) =>
        Effect.sync(() => Sentry.addBreadcrumb(breadcrumb)),
    };
  }),
}) {}

// Wrap operations with automatic Sentry reporting
const withSentryTracing = <A, E, R>(
  operationName: string,
  effect: Effect.Effect<A, E, R>
): Effect.Effect<A, E, R | SentryService> =>
  Effect.gen(function* () {
    const sentry = yield* SentryService;
    const transaction = yield* sentry.startTransaction({
      name: operationName,
      op: "mission-operation",
    });

    yield* sentry.addBreadcrumb({
      category: "operation",
      message: `Starting ${operationName}`,
      level: "info",
    });

    const startTime = Date.now();

    // try/catch cannot observe Effect failures inside Effect.gen, so success
    // and failure are handled with combinators instead
    return yield* effect.pipe(
      Effect.tap(() =>
        sentry.addBreadcrumb({
          category: "operation",
          message: `${operationName} completed`,
          level: "info",
          data: { duration: Date.now() - startTime },
        })
      ),
      Effect.tapError((error) =>
        sentry.captureException(error, {
          operation: operationName,
          duration: Date.now() - startTime,
        })
      ),
      // Close the transaction on success, failure, or interruption
      Effect.onExit((exit) =>
        Effect.sync(() => {
          transaction.setStatus(exit._tag === "Success" ? "ok" : "internal_error");
          transaction.finish();
        })
      )
    );
  });

Distributed Tracing Across Services

When a checkout request flows through your API gateway, user service, payment processor, and inventory system, you need to trace the entire journey:

// Propagate trace context through Effect services
const processCheckout = (
  orderId: string
): Effect.Effect<CheckoutResult, CheckoutError, SentryService> =>
  withSentryTracing("checkout-process", 
    Effect.gen(function* () {
      const sentry = yield* SentryService;

      // Each step adds breadcrumbs
      yield* sentry.addBreadcrumb({
        category: "checkout",
        message: "Validating order",
        data: { orderId },
      });
      const order = yield* validateOrder(orderId);

      yield* sentry.addBreadcrumb({
        category: "checkout",
        message: "Processing payment",
        data: { amount: order.total, method: order.paymentMethod },
      });
      const payment = yield* processPayment(order);

      yield* sentry.addBreadcrumb({
        category: "checkout",
        message: "Updating inventory",
        data: { items: order.items.map(i => i.sku) },
      });
      yield* updateInventory(order);

      return { order, payment };
    })
  );

Result: In Sentry, you see a complete timeline of the checkout process. If the payment fails, you know exactly which step failed, how long it took, and what data was involved. No more saying “It works on my machine.”


Layer 3: User-Centric Error Communication

When errors happen, your users need to know three things:

  1. What went wrong in plain language
  2. Why it happened without blaming them
  3. What to do next with clear actions

The Error Classification System

Not all errors deserve the same treatment. Build a classification system:

import { Data } from "effect";

interface UserError {
  message: string;
  action?: string;
  severity: "info" | "warning" | "error" | "fatal";
  shouldReport: boolean;
  errorCode: string;
}

// User-facing errors implement a common interface
class NetworkError extends Data.TaggedError("NetworkError")<{
  url: string;
}> {
  toUserError(): UserError {
    return {
      message: "We're having trouble connecting to our servers. This isn't your fault. There might be a temporary network issue.",
      action: "Please check your internet connection and try again in a few moments.",
      severity: "warning",
      shouldReport: true,
      errorCode: "NET_001",
    };
  }
}

class ValidationError extends Data.TaggedError("ValidationError")<{
  field: string;
  message: string;
}> {
  toUserError(): UserError {
    return {
      message: `We noticed an issue with the ${this.field} field: ${this.message}`,
      action: "Please review the highlighted field and try again.",
      severity: "info",
      shouldReport: false,
      errorCode: "VAL_001",
    };
  }
}

class PaymentDeclinedError extends Data.TaggedError("PaymentDeclinedError")<{
  reason: string;
}> {
  toUserError(): UserError {
    return {
      message: `Your payment couldn't be processed. ${this.reason}`,
      action: "Try a different payment method or contact your bank if the problem persists.",
      severity: "warning",
      shouldReport: false,
      errorCode: "PAY_003",
    };
  }
}

Converting Internal Errors to User Messages

Create a translation layer that converts technical errors into user-friendly messages:

const toUserError = (error: unknown): UserError => {
  // Effect tagged errors with toUserError method
  if (error && typeof error === "object" && "_tag" in error) {
    const taggedError = error as { _tag: string; toUserError?: () => UserError };
    
    if (typeof taggedError.toUserError === "function") {
      return taggedError.toUserError();
    }
  }

  // Fallback for unexpected errors
  return {
    message: "Something unexpected happened. Our team has been notified and is working on a fix.",
    action: "Please try again in a few moments. If the problem persists, contact support with error code: UNK_001.",
    severity: "error",
    shouldReport: true,
    errorCode: "UNK_001",
  };
};

UI Components for Error Display

import React from "react";

interface ErrorDisplayProps {
  error: UserError;
  onRetry?: () => void;
  onDismiss?: () => void;
}

const ErrorDisplay: React.FC<ErrorDisplayProps> = ({
  error,
  onRetry,
  onDismiss,
}) => {
  const severityStyles = {
    info: "bg-blue-50 border-blue-200 text-blue-800",
    warning: "bg-amber-50 border-amber-200 text-amber-800",
    error: "bg-red-50 border-red-200 text-red-800",
    fatal: "bg-red-100 border-red-300 text-red-900",
  };

  return (
    <div className={`rounded-lg border p-4 ${severityStyles[error.severity]}`}>
      <div className="flex items-start gap-4">
        <div className="flex-1">
          <h3 className="font-semibold text-base">{error.message}</h3>
          {error.action && (
            <p className="mt-1 text-sm opacity-90">{error.action}</p>
          )}
          <p className="mt-2 text-xs opacity-60 font-mono">
            Error code: {error.errorCode}
          </p>
        </div>
        <div className="flex gap-2">
          {onRetry && (
            <button
              onClick={onRetry}
              className="px-4 py-2 text-sm font-medium bg-white rounded border hover:bg-gray-50 transition-colors"
            >
              Try Again
            </button>
          )}
          {onDismiss && (
            <button
              onClick={onDismiss}
              className="px-4 py-2 text-sm font-medium rounded hover:bg-black/5 transition-colors"
            >
              Dismiss
            </button>
          )}
        </div>
      </div>
    </div>
  );
};

Best practices:

  • Never blame the user: “Your card was declined” → “The payment couldn’t be processed”
  • Always provide next steps: Every error message should include an action
  • Include error codes: Help support teams identify issues quickly
  • Match severity to styling: Info = blue, Warning = amber, Error = red
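The practices above can be checked mechanically, for example in a unit test that runs over every error your application defines. The following is a minimal sketch under stated assumptions: the reviewUserError helper, the phrase list, and the PREFIX_123 code format are illustrative choices, not rules taken from any library.

```typescript
type Severity = "info" | "warning" | "error" | "fatal";

interface UserFacingError {
  message: string;
  action?: string;
  severity: Severity;
  errorCode: string;
}

// Phrases that tend to put the blame on the user (illustrative, not exhaustive)
const BLAMING_PHRASES = ["you entered", "you failed", "you forgot"];

// Assumed code shape: three capital letters, underscore, three digits (NET_001)
const ERROR_CODE_PATTERN = /^[A-Z]{3}_\d{3}$/;

// Returns a list of best-practice violations; an empty list means the error passes
const reviewUserError = (error: UserFacingError): string[] => {
  const problems: string[] = [];
  const text = error.message.toLowerCase();

  if (BLAMING_PHRASES.some((phrase) => text.includes(phrase))) {
    problems.push("message blames the user");
  }
  if (!error.action || error.action.trim() === "") {
    problems.push("missing next step");
  }
  if (!ERROR_CODE_PATTERN.test(error.errorCode)) {
    problems.push("error code does not match PREFIX_123 format");
  }
  return problems;
};
```

A phrase list will never catch every blaming message, so treat a check like this as a safety net alongside copy review rather than a replacement for it.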

Layer 4: Wide Events for Comprehensive Observability

Traditional logging is broken. When a request fails, you grep through 50 log lines scattered across 5 services, trying to reconstruct what happened. Wide events, also called Canonical Log Lines, solve this by emitting one comprehensive event per request containing all relevant context.

Traditional Logging vs. Wide Events

Traditional approach: 6+ scattered log lines

[INFO] Request started: POST /api/checkout
[INFO] User authenticated: user_123
[INFO] Database query: SELECT * FROM carts...
[INFO] Payment processing started
[ERROR] Payment failed: card_declined
[INFO] Request completed in 1200ms

Wide event approach: One comprehensive JSON object

{
  "timestamp": "2025-01-15T10:23:45.612Z",
  "request_id": "req_8bf7ec2d",
  "trace_id": "trace_abc123",
  "service": "checkout-service",
  "method": "POST",
  "path": "/api/checkout",
  "duration_ms": 1200,
  "status": "error",
  "user": {
    "id": "user_123",
    "plan": "premium",
    "region": "us-east-1"
  },
  "payment": {
    "method": "card",
    "status": "failed",
    "error_code": "card_declined",
    "amount_cents": 9999
  },
  "database": {
    "query_time_ms": 45,
    "rows_affected": 3
  }
}

Implementing Wide Events with Effect

import { Effect, Ref, Context } from "effect";
import { randomUUID } from "crypto";

// Wide event accumulator context
class WideEventLogger extends Context.Tag("WideEventLogger")<
  WideEventLogger,
  {
    readonly set: (data: Record<string, unknown>) => Effect.Effect<void>;
    readonly emit: () => Effect.Effect<void>;
  }
>() {}

// Create a wide event logger for each request
const createWideEventLogger = (config: {
  service: string;
  version: string;
  environment: string;
}) =>
  Effect.gen(function* () {
    const eventRef = yield* Ref.make<Record<string, unknown>>({
      timestamp: new Date().toISOString(),
      request_id: randomUUID(),
      trace_id: randomUUID(),
      service: config.service,
      version: config.version,
      environment: config.environment,
      status: "in_progress",
    });

    return {
      set: (data: Record<string, unknown>) =>
        Ref.update(eventRef, (event) => ({ ...event, ...data })),

      emit: () =>
        Effect.gen(function* () {
          const event = yield* Ref.get(eventRef);
          const finalEvent = {
            ...event,
            timestamp: new Date().toISOString(),
          };
          
          // Send to your observability platform
          console.log(JSON.stringify(finalEvent));
        }),
    };
  });

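// Use in request handlers
const processCheckout = (
  cartId: string
): Effect.Effect<Order, CheckoutError, WideEventLogger> =>
  Effect.gen(function* () {
    const logger = yield* WideEventLogger;
    const startTime = Date.now();

    // Build the event throughout the request
    const handle = Effect.gen(function* () {
      yield* logger.set({
        operation: "checkout",
        cart_id: cartId,
      });

      // Fetch cart data
      const cart = yield* fetchCart(cartId);
      yield* logger.set({
        user_id: cart.userId,
        items_count: cart.items.length,
        subtotal_cents: cart.subtotal,
      });

      // Process payment
      const payment = yield* processPayment(cart);
      yield* logger.set({
        payment_id: payment.id,
        payment_processor: payment.processor,
        payment_duration_ms: payment.duration,
      });

      // Success!
      yield* logger.set({
        status: "success",
        order_id: payment.orderId,
      });

      return payment;
    });

    return yield* handle.pipe(
      // Record failures too; otherwise the requests you most need to debug
      // emit nothing (assumes CheckoutError is a tagged error)
      Effect.tapError((error) =>
        logger.set({ status: "error", error_type: error._tag })
      ),
      // Emit exactly one event per request, on success and on failure
      Effect.ensuring(
        Effect.gen(function* () {
          yield* logger.set({ duration_ms: Date.now() - startTime });
          yield* logger.emit();
        })
      )
    );
  });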
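The core rule of the pattern, one event per request regardless of outcome, does not depend on Effect. Here is a minimal dependency-free sketch of the same idea using try/finally. The names withWideEvent, EventBuilder, and Sink are hypothetical and exist only for this illustration.

```typescript
type WideEvent = Record<string, unknown>;
type Sink = (event: WideEvent) => void;

interface EventBuilder {
  set(data: WideEvent): void;
}

// Runs a handler and emits exactly one event, whether it returns or throws
const withWideEvent = <T>(sink: Sink, handler: (event: EventBuilder) => T): T => {
  const fields: WideEvent = {};
  const builder: EventBuilder = {
    set: (data) => {
      Object.assign(fields, data);
    },
  };
  const startedAt = Date.now();

  try {
    const result = handler(builder);
    builder.set({ status: "success" });
    return result;
  } catch (error) {
    builder.set({
      status: "error",
      error_message: error instanceof Error ? error.message : String(error),
    });
    throw error;
  } finally {
    // Runs on both paths, so a failed request still produces its event
    builder.set({ duration_ms: Date.now() - startedAt });
    sink({ ...fields });
  }
};
```

In production the sink would forward to your observability platform. Passing it in as a parameter also makes the emit-on-failure behavior easy to test.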

What to Include in Wide Events

A comprehensive wide event includes:

Request Context:

  • request_id, trace_id, span_id for correlation
  • method, path, status_code
  • duration_ms, timestamp
  • user_agent, ip_address (anonymized)

User Context:

  • user_id, user_plan, user_region
  • session_id, account_age_days

Business Context:

  • order_id, payment_id, cart_id
  • amount_cents, currency, items_count
  • feature_flags, experiment_variants

Performance Context:

  • db_query_time_ms, cache_hits
  • external_api_calls with latencies
  • memory_usage_mb

Error Context (on failures):

  • error_type, error_code, error_message
  • retry_count, is_retryable
  • fallback_used
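The list above calls for an anonymized ip_address without saying how. One common approach, shown here as an assumption rather than a recommendation from any specific standard, is to zero the last octet of an IPv4 address before it is written to the event. The anonymizeIpv4 helper is hypothetical, and IPv6 is deliberately left out of this sketch.

```typescript
// Zero the last octet of an IPv4 address so the stored value stays useful for
// coarse geography without pointing at a single client.
// Returns null for anything that is not a well-formed IPv4 address.
const anonymizeIpv4 = (ip: string): string | null => {
  const parts = ip.trim().split(".");
  if (parts.length !== 4) return null;

  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  if (octets.some((octet) => Number.isNaN(octet) || octet > 255)) return null;

  return `${octets[0]}.${octets[1]}.${octets[2]}.0`;
};
```

Returning null for unparseable input keeps raw values out of the event by default, so a malformed header never leaks through unmodified.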

Querying Wide Events

With wide events, complex queries become simple:

-- Find slow checkouts for premium users
SELECT *
FROM wide_events
WHERE service = 'checkout-service'
  AND user_plan = 'premium'
  AND duration_ms > 2000
  AND timestamp > now() - interval '1 hour'

-- Analyze payment failures by processor
SELECT 
  payment_processor,
  payment_method,
  COUNT(*) as failures,
  AVG(duration_ms) as avg_time
FROM wide_events
WHERE status = 'error'
  AND error_type = 'PaymentError'
GROUP BY payment_processor, payment_method

-- Compare feature flag performance
SELECT 
  feature_flags.new_checkout_ui,
  AVG(duration_ms) as avg_duration,
  100.0 * SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) / COUNT(*) as error_rate
FROM wide_events
WHERE operation = 'checkout'
GROUP BY feature_flags.new_checkout_ui
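For readers who want to see the arithmetic behind the feature flag comparison, here is the same aggregation over an in-memory array. This is a sketch only: the CheckoutEvent shape and the statsByFlag helper are assumptions based on the fields used in the queries above.

```typescript
interface CheckoutEvent {
  status: "success" | "error";
  duration_ms: number;
  feature_flags: { new_checkout_ui: boolean };
}

interface FlagStats {
  requests: number;
  avgDurationMs: number;
  errorRatePct: number;
}

// Group events by flag value, then derive average duration and error rate
const statsByFlag = (events: CheckoutEvent[]): Map<boolean, FlagStats> => {
  const totals = new Map<boolean, { requests: number; durationMs: number; errors: number }>();

  events.forEach((event) => {
    const key = event.feature_flags.new_checkout_ui;
    const current = totals.get(key) ?? { requests: 0, durationMs: 0, errors: 0 };
    current.requests += 1;
    current.durationMs += event.duration_ms;
    if (event.status === "error") current.errors += 1;
    totals.set(key, current);
  });

  const result = new Map<boolean, FlagStats>();
  totals.forEach((total, flag) => {
    result.set(flag, {
      requests: total.requests,
      avgDurationMs: total.durationMs / total.requests,
      // Multiply before dividing, as in the SQL, to keep the percentage readable
      errorRatePct: (100 * total.errors) / total.requests,
    });
  });
  return result;
};
```

Because every event carries the flag value alongside status and duration, no join against a separate experiment table is needed.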

Tools that excel with wide events:

  • Honeycomb: Purpose-built for high-cardinality, wide event querying
  • ClickHouse: Excellent performance for analytical queries on wide events
  • BigQuery: Great for long-term storage and complex analysis

Layer 5: Stability-First Operations

The final layer of reliability isn’t technical. It’s operational. Mission control systems stay stable through conservative practices, not aggressive updates.

The “If It Ain’t Broke” Philosophy

// package.json stability strategy
// (comments are for illustration only; a real package.json must be plain JSON)
{
  "dependencies": {
    // Major frameworks: Pin exact versions, update annually
    "react": "18.2.0",
    "effect": "3.0.0",
    
    // Security patches only, automated with review
    "axios": "1.6.0",
    
    // Dev tools: More flexible
    "typescript": "^5.2.0",
    "vitest": "^1.0.0"
  }
}

Rules for stability:

  1. Never update during critical periods: Blackout dates during product launches, sales events, or end-of-quarter
  2. Test in isolation: Staging environments that mirror production exactly
  3. Instant rollback: Deployments must be reversible within 60 seconds
  4. Canary releases: Roll out to 1% → 5% → 25% → 100% over 48 hours
  5. Feature flags over deployments: Enable features via configuration, not code changes
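The staged rollout in rule 4 needs a stable way to decide which users see the new version at each stage. A minimal sketch, assuming users are identified by a string ID: hash the ID into a fixed bucket so that anyone included at 1% is still included at 5%, 25%, and 100%. The helpers fnv1a, bucketFor, and inCanary are hypothetical names for this illustration.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, deterministic,
// and good enough for bucketing (not for security)
const fnv1a = (input: string): number => {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
};

// Map a user to a stable bucket in the range 0..9999
const bucketFor = (userId: string): number => fnv1a(userId) % 10000;

// A user is in the canary when their bucket falls below the rollout threshold
const inCanary = (userId: string, rolloutPercent: number): boolean =>
  bucketFor(userId) < rolloutPercent * 100;

// Stages from rule 4 above
const ROLLOUT_STAGES = [1, 5, 25, 100] as const;
```

Because the bucket depends only on the user ID, widening the rollout never flips an existing canary user back to the old version, which keeps their experience consistent across the 48 hours.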

Feature Flagging for Safety

import { Context, Effect, Layer } from "effect";

// Feature flag service
class FeatureFlags extends Context.Tag("FeatureFlags")<
  FeatureFlags,
  { readonly isEnabled: (flag: string) => Effect.Effect<boolean> }
>() {}

const FeatureFlagsLive = Layer.succeed(
  FeatureFlags,
  {
    isEnabled: (flag: string) =>
      Effect.sync(() => {
        // Check feature flag service (LaunchDarkly, etc.)
        return launchDarklyClient.variation(flag, false);
      }),
  }
);

// Kill switch pattern for new features
const withKillSwitch = <A, E, R>(
  flagName: string,
  newImplementation: Effect.Effect<A, E, R>,
  oldImplementation: Effect.Effect<A, E, R>
): Effect.Effect<A, E, R | FeatureFlags> =>
  Effect.gen(function* () {
    const flags = yield* FeatureFlags;
    const isEnabled = yield* flags.isEnabled(flagName);

    if (!isEnabled) {
      return yield* oldImplementation;
    }

    // try/catch cannot observe Effect failures inside Effect.gen, so the
    // fallback is expressed with catchAll
    return yield* newImplementation.pipe(
      Effect.catchAll((error) =>
        Effect.logError(`Kill switch activated for ${flagName}`, { error }).pipe(
          Effect.zipRight(oldImplementation)
        )
      )
    );
  });

Circuit Breakers and Graceful Degradation

When external services fail, don’t fail your entire system:

import { Effect, Schedule, Data } from "effect";

class CircuitBreakerOpen extends Data.TaggedError("CircuitBreakerOpen")<{
  service: string;
}> {}

// Circuit breaker prevents cascading failures
const withCircuitBreaker = (
  serviceName: string,
  options: {
    failureThreshold: number;
    resetTimeout: number;
  }
) => {
  let failures = 0;
  let lastFailureTime = 0;
  let state: "closed" | "open" | "half-open" = "closed";

  return <A, E, R>(
    effect: Effect.Effect<A, E, R>
  ): Effect.Effect<A, E | CircuitBreakerOpen, R> =>
    Effect.gen(function* () {
      // Check if circuit is open
      if (state === "open") {
        if (Date.now() - lastFailureTime > options.resetTimeout) {
          state = "half-open";
          failures = 0;
        } else {
          return yield* Effect.fail(
            new CircuitBreakerOpen({ service: serviceName })
          );
        }
      }

      // Execute with tracking
      return yield* effect.pipe(
        Effect.tap(() =>
          Effect.sync(() => {
            failures = 0;
            if (state === "half-open") state = "closed";
          })
        ),
        Effect.tapError(() =>
          Effect.sync(() => {
            failures++;
            lastFailureTime = Date.now();

            // A failed probe in half-open state reopens the circuit immediately
            if (state === "half-open" || failures >= options.failureThreshold) {
              state = "open";
            }
          })
        )
      );
    });
};

// Create the breaker once. A breaker built inside the request handler would
// start with fresh counters on every call and never open.
const recommendationBreaker = withCircuitBreaker("recommendation-service", {
  failureThreshold: 5,
  resetTimeout: 30000,
});

// Graceful degradation with fallback
const fetchRecommendations = (userId: string) =>
  fetchPersonalizedRecommendations(userId).pipe(
    // Time out first, so a slow service counts as a failure and reaches the fallbacks
    Effect.timeout("2 seconds"),
    recommendationBreaker,
    // Fallback to popular items on failure
    Effect.orElse(() => fetchPopularItems()),
    // Fallback to empty array if everything fails
    Effect.orElse(() => Effect.succeed([]))
  );
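The state transitions are the part of a circuit breaker that is easiest to get wrong, so it helps to isolate them from the I/O. The following dependency-free sketch does that with an injectable clock, which lets a test step through closed, open, and half-open without waiting. The createBreaker helper and its option names are assumptions for illustration and are not part of Effect.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold: number;
  resetTimeoutMs: number;
  now?: () => number; // injectable clock, defaults to Date.now
}

const createBreaker = (options: BreakerOptions) => {
  const now = options.now ?? Date.now;
  let state: BreakerState = "closed";
  let failures = 0;
  let openedAt = 0;

  return {
    state: (): BreakerState => state,

    // Returns false while the circuit is open and the reset timeout has not elapsed
    allowRequest: (): boolean => {
      if (state === "open" && now() - openedAt >= options.resetTimeoutMs) {
        state = "half-open"; // let a probe request through
      }
      return state !== "open";
    },

    recordSuccess: (): void => {
      failures = 0;
      state = "closed";
    },

    recordFailure: (): void => {
      failures += 1;
      // A failed probe reopens immediately; otherwise wait for the threshold
      if (state === "half-open" || failures >= options.failureThreshold) {
        state = "open";
        openedAt = now();
      }
    },
  };
};
```

A caller checks allowRequest before contacting the service and reports the outcome afterwards. Keeping one breaker instance per downstream service is what allows failures to accumulate across requests.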

Putting It All Together: A Mission Control Example

Here’s how all five layers work together in a real checkout flow:

import { Effect, Schedule } from "effect";

// One breaker shared by all requests, so failures accumulate across calls
const paymentBreaker = withCircuitBreaker("payment-service", {
  failureThreshold: 3,
  resetTimeout: 30000,
});

const missionControlCheckout = (
  cartId: string,
  paymentMethod: string
) =>
  Effect.gen(function* () {
    // 1. Initialize wide event logging
    const logger = yield* WideEventLogger;
    const sentry = yield* SentryService;

    yield* logger.set({
      operation: "checkout",
      cart_id: cartId,
      payment_method: paymentMethod,
    });

    const checkout = Effect.gen(function* () {
      // 2. Validate with typed errors
      const cart = yield* validateCart(cartId).pipe(
        Effect.tap((cart) =>
          logger.set({
            user_id: cart.userId,
            items_count: cart.items.length,
            total_cents: cart.total,
          })
        )
      );

      // 3. Process payment with circuit breaker
      const payment = yield* withSentryTracing(
        "payment-processing",
        processPayment(cart, paymentMethod).pipe(paymentBreaker)
      ).pipe(
        Effect.retry({
          // Cap the retries; an unbounded schedule would retry forever
          schedule: Schedule.exponential("100 millis").pipe(
            Schedule.intersect(Schedule.recurs(3))
          ),
          while: (e) => e._tag !== "CircuitBreakerOpen" && e.retryable,
        }),
        Effect.tap((payment) =>
          logger.set({
            payment_id: payment.id,
            payment_status: payment.status,
            payment_duration_ms: payment.duration,
          })
        )
      );

      // 4. Update inventory with kill switch
      yield* withKillSwitch(
        "new-inventory-system",
        updateInventoryV2(cart),
        updateInventoryV1(cart)
      );

      yield* logger.set({ status: "success" });
      return payment;
    });

    return yield* checkout.pipe(
      // Record and report any failure while logger and sentry are in scope
      Effect.tapError((error) =>
        Effect.gen(function* () {
          yield* logger.set({
            status: "error",
            error_type: error._tag,
            error_message: error.message,
          });

          // Report to Sentry if needed
          if (toUserError(error).shouldReport) {
            yield* sentry.captureException(error);
          }
        })
      ),
      // 5. Emit comprehensive wide event on success and on failure
      Effect.ensuring(logger.emit()),
      // Convert to a user-friendly error at the boundary
      Effect.mapError(toUserError)
    );
  });

What this gives you:

  • ✅ Type-safe error handling with forced error management
  • ✅ Real-time Sentry alerts with full context
  • ✅ User-friendly error messages with clear actions
  • ✅ Comprehensive wide events for debugging
  • ✅ Circuit breakers preventing cascading failures
  • ✅ Graceful degradation when services fail
  • ✅ Feature flags for safe rollouts

Conclusion: Reliability as a Culture

Building mission control systems isn’t about using the right tools. It’s about adopting a reliability-first mindset:

  • Errors aren’t exceptions. They’re expected outcomes with recovery strategies
  • Users aren’t blamed. They’re informed, guided, and supported
  • Updates aren’t exciting. They’re risky changes requiring careful rollout
  • Observability isn’t logs. It’s comprehensive context for every request

The systems you build today will handle millions of requests tomorrow. Build them like lives depend on it, because for your users’ businesses, they do.

Key Takeaways

  1. Use Effect.ts for typed, trackable error handling that forces you to handle failures
  2. Integrate Sentry deeply with traces, breadcrumbs, and rich context
  3. Craft user messages that explain, don’t blame, and always provide next steps
  4. Emit wide events: one comprehensive log per request beats 20 scattered logs
  5. Update conservatively: stability beats shiny new features every time

Start Building Today

Begin with one layer. Add Effect.ts to your next feature. Implement wide events for your checkout flow. Set up Sentry with proper tracing. Each step compounds into a system you can trust at 3:47 AM.

Your users and your sleep schedule will thank you.


Have questions about implementing mission control systems? Reach out to our engineering team. We’d love to help you build more reliable software.