Podlink: Real-Time Group Accountability with Next.js and Socket.IO
Next.js · Socket.IO · PostgreSQL · Real-time Systems


Problem Context

The goal was to create small accountability groups (4-6 people) focused on specific habits (quitting smoking, waking up early, etc.). The system needed:

  • Real-time messaging (to create a sense of presence)
  • Group state synchronization (typing indicators, online status)
  • Automated moderation via AI to reduce manual oversight
  • Persistence for messages and user progress

Constraint: This had to work reliably with the free tier limitations of hosting providers (Vercel, Neon Postgres).

Initial Approach

The first version used HTTP polling: the client fetched new messages every 2 seconds. It worked, but only barely.

What didn't work:

  • Polling delay felt sluggish (2-second lag before seeing new messages)
  • Server costs would scale poorly with more users
  • No way to show "User is typing..." without spamming requests
  • Connection state was ambiguous (was the user offline or just slow to poll?)

The system needed WebSockets for true real-time communication.
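For context, the v1 polling loop was roughly this shape. A minimal sketch: the /api/pods/[podId]/messages route and the after cursor are illustrative names, not the actual API.

// Roughly what the v1 client did: poll for new messages every 2 seconds.
type ChatMessage = { id: string; content: string; authorId: string };

let lastMessageId: string | null = null;

async function pollMessages(podId: string): Promise<ChatMessage[]> {
  const query = lastMessageId ? `?after=${encodeURIComponent(lastMessageId)}` : "";
  const res = await fetch(`/api/pods/${podId}/messages${query}`);
  if (!res.ok) return []; // swallow transient errors; the next tick retries

  const messages: ChatMessage[] = await res.json();
  if (messages.length > 0) {
    lastMessageId = messages[messages.length - 1].id;
  }
  return messages;
}

// Every new message is up to 2 seconds late, and every idle Pod still costs a request per tick.
setInterval(async () => {
  const fresh = await pollMessages("pod_123"); // hypothetical Pod id
  // ...append `fresh` to the chat UI
}, 2000);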

Design Decisions & Trade-offs

Real-Time Layer: Socket.IO on Next.js Custom Server

Next.js doesn't support WebSockets natively in the App Router, which left two options:

  1. Separate WebSocket server: More flexible but adds deployment complexity
  2. Custom Next.js server with Socket.IO: Keeps everything in one codebase

We chose Option 2 (custom server) because deployment simplicity mattered more than perfect separation of concerns.

Trade-off: This meant ejecting from Vercel's serverless model and self-hosting on Railway. We lost automatic scaling but gained persistent WebSocket connections.
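
A minimal sketch of what the custom server looks like, assuming the standard next and socket.io packages; the file name, port, and event names are illustrative.

// server.ts — boot Next.js and Socket.IO on the same HTTP server.
// Minimal sketch; error handling and auth are omitted.
import { createServer } from "node:http";
import next from "next";
import { Server } from "socket.io";

const dev = process.env.NODE_ENV !== "production";
const app = next({ dev });
const handle = app.getRequestHandler();

app.prepare().then(() => {
  // One HTTP server serves both Next.js pages and the WebSocket upgrade.
  const httpServer = createServer((req, res) => handle(req, res));
  const io = new Server(httpServer);

  io.on("connection", (socket) => {
    // Pod-level events (join, message, typing) get registered here.
    socket.on("join_pod", (podId: string) => socket.join(podId));
  });

  httpServer.listen(3000, () => {
    console.log("Ready on http://localhost:3000");
  });
});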

State Synchronization Challenge

Each "Pod" (group) needed to track:

  • Who is currently online
  • Who is typing
  • Unread message counts per user

Problem: A socket connection carries no user identity of its own, and a page refresh creates a brand-new socket, so the server loses the user's connection context.

Solution:

  • On connection, client sends user_id and pod_id
  • Server maintains an in-memory mapping: { socket_id -> {user_id, pod_id} }
  • On disconnect, remove mapping and broadcast updated online status

What went wrong initially: Rapid reconnections (user switching tabs, poor network) caused duplicate entries in the mapping. Solution: debounce disconnect events by 3 seconds before marking the user offline.
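
Sketched out, the mapping and the debounced disconnect look roughly like this. The 3-second window mirrors the description above; the identify and presence event names are illustrative.

// presence.ts — in-memory presence tracking for the Socket.IO server.
import type { Server } from "socket.io";

type Presence = { userId: string; podId: string };

// socket.id -> { userId, podId }, as described above; lost on restart by design.
const presenceBySocket = new Map<string, Presence>();
// userId -> pending "mark offline" timer, used to debounce rapid reconnects.
const offlineTimers = new Map<string, NodeJS.Timeout>();

export function registerPresence(io: Server) {
  io.on("connection", (socket) => {
    socket.on("identify", ({ userId, podId }: Presence) => {
      presenceBySocket.set(socket.id, { userId, podId });
      socket.join(podId);

      // A quick reconnect (tab switch, flaky network) cancels the pending "offline".
      const pending = offlineTimers.get(userId);
      if (pending) clearTimeout(pending);
      offlineTimers.delete(userId);

      io.to(podId).emit("presence", { userId, online: true });
    });

    socket.on("disconnect", () => {
      const presence = presenceBySocket.get(socket.id);
      presenceBySocket.delete(socket.id);
      if (!presence) return;

      // Debounce: only broadcast "offline" if the user hasn't reconnected within 3s.
      const timer = setTimeout(() => {
        const stillHere = [...presenceBySocket.values()].some(
          (p) => p.userId === presence.userId
        );
        if (!stillHere) {
          io.to(presence.podId).emit("presence", {
            userId: presence.userId,
            online: false,
          });
        }
        offlineTimers.delete(presence.userId);
      }, 3000);

      offlineTimers.set(presence.userId, timer);
    });
  });
}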

Database Schema: PostgreSQL with Prisma

The schema evolved significantly during development.

Initial schema (too simple):

model Message {
  id      String @id
  content String
  userId  String
  podId   String
}

This didn't handle:

  • Deleted users (orphan messages)
  • Pod disbanding (what happens to messages?)
  • Message reactions or edits

Final schema (more robust):

model Message {
  id        String   @id
  content   String
  createdAt DateTime @default(now())
  
  author    User     @relation(fields: [authorId], references: [id], onDelete: Cascade)
  authorId  String
  
  pod       Pod      @relation(fields: [podId], references: [id], onDelete: Cascade)
  podId     String
}

The onDelete: Cascade ensures that if a Pod is deleted, all messages are removed automatically. This prevented orphan data.

Schema migration challenge: Adding the cascade delete required a migration on production data. We had to manually reassign messages with missing podId before the migration could run.
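
The backfill itself was a one-off script along these lines. This is a hedged sketch: the fallback "archive" Pod and the decision to reassign rather than delete are assumptions, not the exact cleanup we ran.

// backfill.ts — clean up orphan messages before the cascade-delete migration.
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function reassignOrphanMessages(fallbackPodId: string) {
  const podIds = (await prisma.pod.findMany({ select: { id: true } })).map((p) => p.id);

  // Messages pointing at a Pod that no longer exists can't satisfy the new foreign key.
  const { count } = await prisma.message.updateMany({
    where: { podId: { notIn: podIds } },
    data: { podId: fallbackPodId },
  });

  console.log(`Reassigned ${count} orphan messages to ${fallbackPodId}`);
}

reassignOrphanMessages("pod_archive") // hypothetical fallback Pod id
  .finally(() => prisma.$disconnect());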

AI Moderation: Llama 3 via OpenRouter

Each Pod has an AI "member" powered by Llama 3. It's configured to intervene when:

  • Engagement drops (no messages for 2 hours)
  • Conflict arises (detected via sentiment analysis)

Configuration:

  • Temperature: 0.1 (consistency over creativity)
  • Max tokens: 150 (short, focused responses)
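
The call itself goes through OpenRouter's OpenAI-compatible chat completions endpoint. In the sketch below the model slug and system prompt are illustrative, but the temperature and token cap match the settings above.

// askPodAI.ts — ask the Pod's AI member for a short reply via OpenRouter.
type ChatTurn = { role: "system" | "user" | "assistant"; content: string };

async function askPodAI(recentMessages: ChatTurn[]): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "meta-llama/llama-3-70b-instruct", // illustrative Llama 3 slug
      temperature: 0.1, // consistency over creativity
      max_tokens: 150,  // short, focused responses
      messages: [
        {
          role: "system",
          content:
            "You are a supportive accountability-group member. Keep replies brief and encouraging.",
        },
        ...recentMessages,
      ],
    }),
  });

  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content as string;
}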

What didn't work: The AI sometimes misread sarcasm as conflict. We added a "humor flag" to messages where users can mark sarcasm, preventing false positives.

Implementation Notes

Connection Status UI

Users needed to know if they were truly connected or just seeing stale data.

Visual states:

  • 🟡 Connecting...
  • 🟢 Connected
  • 🔴 Disconnected

The status badge updates based on Socket.IO's built-in connection events. We added a "retry connection" button that appears after 10 seconds of disconnection.
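
On the client, the badge is driven by a small hook over Socket.IO's connection lifecycle events. This is a trimmed-down sketch: the 10-second window matches the description above, the hook name and state shape are illustrative.

// useConnectionStatus.ts — map Socket.IO lifecycle events to the badge states.
import { useEffect, useState } from "react";
import type { Socket } from "socket.io-client";

type Status = "connecting" | "connected" | "disconnected";

export function useConnectionStatus(socket: Socket) {
  const [status, setStatus] = useState<Status>("connecting");
  const [showRetry, setShowRetry] = useState(false);

  useEffect(() => {
    let retryTimer: ReturnType<typeof setTimeout> | undefined;

    const onConnect = () => {
      setStatus("connected");
      setShowRetry(false);
      if (retryTimer) clearTimeout(retryTimer);
    };

    const onDisconnect = () => {
      setStatus("disconnected");
      // Surface the "retry connection" button only after 10s of disconnection.
      retryTimer = setTimeout(() => setShowRetry(true), 10_000);
    };

    socket.on("connect", onConnect);
    socket.on("disconnect", onDisconnect);

    return () => {
      socket.off("connect", onConnect);
      socket.off("disconnect", onDisconnect);
      if (retryTimer) clearTimeout(retryTimer);
    };
  }, [socket]);

  // Manual reconnect, wired to the retry button.
  const retry = () => {
    setStatus("connecting");
    socket.connect();
  };

  return { status, showRetry, retry };
}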

Message Persistence vs Real-Time

Messages are:

  1. Sent via WebSocket immediately (for real-time feel)
  2. Persisted to Postgres in the background
  3. Acknowledged back to the sender once saved

Trade-off: If the database write fails (rare but possible), the message appears in the UI but isn't saved. We added a "pending" indicator that clears once the database confirms the write.
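
Server-side, the flow is: broadcast first, persist second, then acknowledge the sender so the client can clear its "pending" state. A sketch under those assumptions; the event names and Prisma call shape are illustrative.

// messages.ts — real-time broadcast first, persistence and ack after.
import { randomUUID } from "node:crypto";
import type { Server, Socket } from "socket.io";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

type IncomingMessage = { podId: string; authorId: string; content: string };

export function registerMessageHandler(io: Server, socket: Socket) {
  socket.on("send_message", async (msg: IncomingMessage, ack: (saved: boolean) => void) => {
    // 1. Fan out immediately so the Pod sees it without a database round-trip in the way.
    io.to(msg.podId).emit("new_message", { ...msg, pending: true });

    try {
      // 2. Persist in the background.
      await prisma.message.create({
        data: {
          id: randomUUID(),
          content: msg.content,
          authorId: msg.authorId,
          podId: msg.podId,
        },
      });
      // 3. Acknowledge the sender; the client clears its "pending" indicator here.
      ack(true);
    } catch (err) {
      console.error("Message persistence failed", err);
      ack(false); // client keeps the pending indicator and can offer a retry
    }
  });
}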

Results & Impact

The platform supports 40 active Pods with ~200 total users. Observations:

  • Average message latency: <100ms
  • Connection drop rate: ~2% (mostly due to poor mobile networks)
  • AI intervention is used in ~30% of Pods (more than expected)

What stayed hard:

  • Handling users who join multiple Pods (managing multiple WebSocket subscriptions)
  • Time zones for scheduled check-ins (storing user timezone was added later)
  • Scaling Socket.IO beyond a single server instance (not yet a problem but will need clustering)

What I Would Change

If I were to rebuild this with current knowledge:

  1. Use Supabase Realtime instead of Socket.IO: Would simplify state synchronization and leverage database subscriptions
  2. Add message queuing: Right now message persistence is synchronous. A queue would make failures more manageable.
  3. Better connection recovery: Currently, if a user disconnects and reconnects, they might see duplicate messages. Need idempotency keys.

Takeaways

Building real-time social apps taught me that low latency isn't just about speed—it's about perceived responsiveness. A 2-second delay in seeing a message kills conversation flow.

State synchronization is harder than it looks. The in-memory socket mapping worked for our scale, but it wouldn't survive a server restart; a proper production setup would need Redis or a similar shared store.

For anyone building WebSocket apps on Next.js: plan your deployment strategy early. The custom server requirement eliminates some hosting options, and that constraint affects architecture decisions.
