How I Built a RAG Pipeline for MTG Card Recommendations
Magic: The Gathering has over 27,000 unique cards. When you're building a Commander deck, the combinatorial space is staggering. I wanted to build something that could reason about card synergies — not just text-match, but actually understand interactions.
That's where RAG (Retrieval-Augmented Generation) comes in.
The Problem
Existing MTG recommendation tools fall into two camps:
- Keyword-based search — finds cards that mention the same mechanics, but misses emergent synergies
- Community-curated lists — great quality, but static and biased toward popular strategies
I wanted something in between: an AI that could retrieve relevant cards from the full catalogue and reason about why they work together.
Architecture Overview
The pipeline has four stages:
- Ingest — Bulk card data from Scryfall API, enriched with rulings and format legality
- Embed — Generate vector embeddings for each card using `text-embedding-3-small`
- Store — Index embeddings in a vector database (Supabase pgvector)
- Query — User asks a natural language question, we retrieve relevant cards, then pass them as context to an LLM
```
User Query
    ↓
Embedding (text-embedding-3-small)
    ↓
Vector Search (pgvector, top-k=20)
    ↓
Context Assembly (card text + rulings)
    ↓
LLM Response (Claude)
    ↓
Formatted Recommendations
```
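The vector-search step in the diagram above can be sketched as an in-memory cosine-similarity search standing in for pgvector's distance operator. This is a minimal sketch: the `Embedded` type and function names are illustrative, not from the project.

```typescript
// Illustrative types — the real pipeline stores these rows in pgvector.
type Embedded = { name: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k cards most similar to the query embedding (top-k=20 in the pipeline).
function topK(query: number[], cards: Embedded[], k = 20): Embedded[] {
  return [...cards]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

In production the same ranking is pushed down into Postgres, so the application code only assembles the query embedding and reads back the top rows.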
The Embedding Strategy
The naive approach — embed the card name and rules text — produced poor results. Cards with similar wording but different strategic roles would cluster together.
What worked better: embedding a synthetic description that combines the card's mechanical identity with its strategic role:
```typescript
function buildEmbeddingText(card: MtgCard): string {
  const parts = [
    card.name,
    card.type_line,
    card.oracle_text,
    `Mana cost: ${card.mana_cost}`,
    card.keywords.length > 0
      ? `Keywords: ${card.keywords.join(", ")}`
      : "",
    card.edhrec_rank
      ? `Commander popularity rank: ${card.edhrec_rank}`
      : "",
  ];
  // Drop empty fragments, then join into one sentence-like description.
  return parts.filter(Boolean).join(". ");
}
```

This gave us ~40% better retrieval relevance on our evaluation set.
RAG Transparency
One design principle I was firm on: show the retrieved documents. When the chatbot recommends a card, you can see exactly which cards were retrieved from the vector store and which the LLM added from its training data. This builds trust and makes the system debuggable.
In the UI, retrieved cards appear in a collapsible panel before the LLM's response. Users can verify the recommendations against the raw source material.
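One way to make that separation explicit in code is to carry a provenance flag on each recommendation. The interface and field names below are my own illustration of the idea, not the project's actual types:

```typescript
// Hypothetical response shape: every recommendation records whether it
// came from the vector store or was added by the LLM from training data.
interface Recommendation {
  card: string;
  reason: string;
  fromRetrieval: boolean;
}

interface RecommendationResponse {
  retrieved: string[]; // card names returned by the vector search
  recommendations: Recommendation[];
}

// Split recommendations by provenance for display in the UI panel.
function splitBySource(res: RecommendationResponse) {
  return {
    grounded: res.recommendations.filter((r) => r.fromRetrieval),
    modelAdded: res.recommendations.filter((r) => !r.fromRetrieval),
  };
}
```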
Rate Limiting and Cost Control
Running this as a public demo means cost control matters. My approach:
- Session-based rate limiting — 10 queries per session via a simple counter in a cookie
- Embedding cache — Common queries (commander staples, popular archetypes) hit a Redis cache before the embedding API
- Fallback — If the API budget is exceeded or the service is down, the UI switches to a pre-recorded demo video
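The session limit from the list above can be sketched as a pure check over the cookie's counter value (cookie read/write plumbing omitted; the function name is illustrative):

```typescript
// 10 queries per session, tracked as a plain counter in a cookie.
const MAX_QUERIES_PER_SESSION = 10;

function checkRateLimit(cookieCount: string | undefined): {
  allowed: boolean;
  nextCount: number;
} {
  const parsed = Number(cookieCount ?? "0");
  // Treat a missing or garbled cookie as a fresh session rather than a block.
  const count = Number.isNaN(parsed) ? 0 : parsed;
  return {
    allowed: count < MAX_QUERIES_PER_SESSION,
    nextCount: count + 1, // write this back to the cookie after a query
  };
}
```

A cookie counter is trivially resettable by the user, which is fine here: the goal is keeping honest demo traffic cheap, not hard abuse prevention.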
Total cost: roughly $5-8/month with moderate traffic.
What I Learned
- Embedding quality matters more than LLM choice. Swapping from GPT-4 to Claude made almost no difference in output quality. Improving the embedding text improved everything.
- RAG transparency builds trust. Users engaged more when they could see the retrieval step.
- pgvector is surprisingly capable. For a dataset under 50k documents, it handles similarity search with low latency and no operational overhead beyond a Postgres instance.
- Domain knowledge in the prompt beats general-purpose prompting. Telling the LLM about Commander-specific rules (colour identity, singleton format) dramatically improved recommendation quality.
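To make the last point concrete, here is the shape such a prompt might take. The exact wording the project uses isn't given in the post; this is an assumed sketch showing how Commander rules can be stated explicitly:

```typescript
// Illustrative system prompt encoding Commander-specific constraints.
function buildSystemPrompt(commanderColors: string[]): string {
  return [
    "You recommend Magic: The Gathering cards for Commander decks.",
    "Rules you must respect:",
    `- Colour identity: only suggest cards whose colour identity fits within [${commanderColors.join(", ")}].`,
    "- Singleton format: at most one copy of any card except basic lands.",
    "- Decks are exactly 100 cards, including the commander.",
  ].join("\n");
}
```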
You can try the live demo at /mtg-rag or browse the source code for the bulk scraper and recommendation engine.