The challenge
Elysium manages a number of hosted client applications across a fleet of AWS instances. Day-to-day, that means engineers need quick answers to questions that span several different tools: What version is deployed on that server? Are there any known vulnerabilities in the current gem set? What is the status of this support ticket? Where is the documentation for this process?
The answers exist — but they live in different places. Redmine holds the support queue and time tracking. Mattermost is where the team communicates. Outline is the documentation wiki. Infrastructure status requires either SSH access or checking deployment logs. Getting a clear operational picture means switching between systems constantly, which is both slow and error-prone.
We also wanted to explore how AI could be integrated into real engineering workflows — not as a novelty, but as a tool that could actually read from and write to the systems the team uses every day. The operations platform became both a practical solution to the visibility problem and a proving ground for our AI integration approach.
Our approach
The platform is built on Rails 8 with Hotwire for real-time UI updates, and divides into two distinct but connected systems: infrastructure monitoring and AI-assisted operations.
Infrastructure monitoring
The monitoring layer — built around an internal module called Hookd — polls each registered application instance on a configurable schedule. Each poll makes an authenticated HTTPS request to the target server, which responds with structured status data: the current revision and branch, the Ruby version in use, the full Gemfile.lock, and any environment variables relevant to the deployment context.
That response is parsed and persisted as a typed snapshot against the instance record. The result is a live, timestamped view of what is actually running on each server — not what a deployment script says should be running. Engineers can see at a glance whether instances are on the expected revision, whether their dependency sets diverge, and when they were last updated.
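The parse-and-persist step can be sketched as plain Ruby. This is an illustrative stand-in, not the actual Hookd code: the field names (`revision`, `branch`, `ruby_version`, `gemfile_lock`) are assumptions based on the data described above, and the real system persists the snapshot via ActiveRecord rather than a `Struct`.

```ruby
require "json"

# Hypothetical typed snapshot of one poll result. In the real platform this
# would be an ActiveRecord model attached to the instance record.
StatusSnapshot = Struct.new(
  :revision, :branch, :ruby_version, :gemfile_lock, :polled_at,
  keyword_init: true
)

# Parse the JSON body returned by a status poll into a snapshot.
# Raises KeyError if a required field is missing, so a malformed
# response is surfaced rather than silently stored.
def parse_status(body, polled_at: Time.now)
  data = JSON.parse(body)
  StatusSnapshot.new(
    revision:     data.fetch("revision"),
    branch:       data.fetch("branch"),
    ruby_version: data.fetch("ruby_version"),
    gemfile_lock: data.fetch("gemfile_lock"),
    polled_at:    polled_at
  )
end
```

Keeping the parse step strict means a drifted or partial response shows up as a failed poll, which is easier to notice than a snapshot with silently missing fields.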
Vulnerability scanning runs as a separate background process via a bundle audit integration. Each instance's Gemfile.lock is checked against a vulnerability database; any findings are stored against the status record with CVE identifiers, criticality ratings, and remediation guidance. Security posture is visible alongside deployment status in the same interface.
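The core of that check can be illustrated with a tiny in-memory version. This is a sketch only: the real integration delegates to bundler-audit's advisory database, and the `Advisory` structure, CVE identifier, and helper name below are invented for illustration.

```ruby
require "rubygems" # for Gem::Version and Gem::Requirement

# Invented advisory record standing in for bundler-audit's advisory data.
Advisory = Struct.new(:gem_name, :patched, :cve, :criticality, keyword_init: true)

# Given locked gem versions (name => version string) and a list of advisories,
# return the advisories whose gem is present at an unpatched version.
def findings_for(locked_gems, advisories)
  advisories.select do |adv|
    version = locked_gems[adv.gem_name]
    next false unless version
    # Vulnerable if the locked version satisfies none of the patched ranges.
    adv.patched.none? do |req|
      Gem::Requirement.new(req).satisfied_by?(Gem::Version.new(version))
    end
  end
end
```

Each finding then carries everything the status record needs: the CVE identifier, the criticality rating, and (via the patched ranges) the remediation target.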
Status polling jobs are processed by SolidQueue and push updates to the browser in real time via Turbo Streams, so engineers see results as they arrive rather than waiting for a full page reload.
AI agent system
The second system is a multi-agent AI architecture built on top of AWS Bedrock via a private Elysium API gateway. Rather than a single general-purpose assistant, the platform uses five specialised agents, each with a defined scope, a calibrated temperature, and a set of callable tools:
- AssistantAgent — the conversational entry point, capable of delegating to the other agents as tools
- RedmineAgent — queries the support tracker, logs time entries, and adds private notes to issues
- MattermostAgent — fetches thread context and user information from the team's messaging system
- OutlineAgent — searches the documentation wiki, and can create or update documents directly
- StandupAgent / SummariseAgent — generate daily standup summaries and condense long conversation histories
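The delegation pattern at the top of that list can be sketched in plain Ruby. This is a hypothetical illustration of the shape, not the activeagent gem's actual API: the assistant exposes each specialised agent under a tool name and routes a model-chosen tool call to the matching agent.

```ruby
# Illustrative delegation: the assistant holds a registry of specialised
# agents and dispatches a tool call (as returned by the LLM) to one of them.
class AssistantAgent
  def initialize(tools)
    @tools = tools # e.g. { "redmine" => RedmineAgent.new, ... }
  end

  # tool_call is assumed to look like { name: "redmine", args: { ... } }.
  def dispatch(tool_call)
    agent = @tools.fetch(tool_call[:name]) do
      raise ArgumentError, "unknown tool: #{tool_call[:name]}"
    end
    agent.call(**tool_call[:args])
  end
end

# A stand-in specialised agent, for illustration only.
class EchoAgent
  def call(query:)
    "echo: #{query}"
  end
end
```

Failing loudly on an unknown tool name matters here: a hallucinated tool call from the model should surface as an error, not a silent no-op.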
The agents are implemented using activeagent, a Ruby gem that provides an ActiveRecord-like interface for LLM interactions. Each agent maintains a full conversation context through a polymorphic AgentContext model that tracks every message, every API call, and the running token count across the session. When a conversation approaches the model's context limit of 20,000 input tokens, the system automatically summarises older turns and compresses the history without losing continuity.
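The compression idea can be shown in miniature. This is a simplified sketch, not the AgentContext implementation: the token estimate is a crude characters-per-token heuristic, the summariser is a placeholder lambda where the real system calls an LLM, and the "keep the last four turns" policy is an assumption.

```ruby
# Simplified context window with automatic compression. When the estimated
# input-token count passes the limit, older turns are collapsed into a single
# summary message while the most recent turns are kept verbatim.
class ConversationContext
  LIMIT      = 20_000 # input-token budget, matching the figure above
  KEEP_TURNS = 4      # recent turns preserved verbatim (illustrative choice)

  def initialize(summariser: ->(turns) { "Summary of #{turns.size} earlier turns." })
    @turns = []
    @summariser = summariser
  end

  attr_reader :turns

  def add(role, text)
    @turns << { role: role, text: text }
    compress! if token_estimate > LIMIT
  end

  # Rough heuristic: ~4 characters per token. The real system would use the
  # model's own tokeniser counts.
  def token_estimate
    @turns.sum { |t| t[:text].length / 4 }
  end

  private

  def compress!
    older = @turns[0...-KEEP_TURNS]
    keep  = @turns.last(KEEP_TURNS)
    @turns = [{ role: "system", text: @summariser.call(older) }] + keep
  end
end
```

The key property is that compression happens inside `add`, so callers never have to think about the budget: the history simply stays under the limit while recent turns remain intact.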
Access to Redmine and Outline is scoped to individual user credentials. Each team member stores their own API keys against their profile (encrypted at rest via Active Record Encryption), so AI actions — logging time, creating notes, updating documents — happen under their identity rather than a shared service account. There is a full audit trail of what was done and by whom.
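In a Rails 7+ application this kind of per-user credential storage is a one-line declaration per attribute via Active Record Encryption. The model and attribute names below are illustrative, not the platform's actual schema:

```ruby
class User < ApplicationRecord
  # Encrypted at rest via Active Record Encryption (Rails 7+).
  # Attribute names are assumptions for illustration.
  encrypts :redmine_api_key
  encrypts :outline_api_key
end
```

Because `encrypts` handles encryption and decryption transparently at the attribute level, the rest of the application reads and writes these keys like any other column while the database only ever sees ciphertext.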
Mattermost integration works in both directions: the AI can read thread context to answer questions grounded in recent team conversation, and scheduled jobs post absence notifications and daily reminders to the relevant channels automatically.
The outcome
The platform gives Elysium engineers a single place to check the health of every hosted instance, track outstanding support work, and interact with the documentation system — without switching between four separate tools.
The AI layer goes beyond question-answering. Because the agents can write as well as read — logging a time entry to Redmine, creating a document in Outline, posting to a Mattermost channel — the platform acts as an operational assistant rather than a search interface. Engineers can ask it to record time against a ticket, draft documentation from a conversation, or summarise a thread, and the action happens immediately in the relevant system.
Building this platform internally has been valuable beyond the operational benefit. The multi-agent architecture, the activeagent abstraction, and the approach to per-user credential scoping have directly informed how we now approach AI integration projects for clients — with the understanding that AI tools work best when they can act across the systems a team already uses, under the identities of the people using them.