From ChatGPT to Production: Reproducing a Dining Microapp Using Claude and OpenAI
Recreate a dining microapp using Claude + ChatGPT—starter repo, prompts, architecture and LLM-generated tests for rapid, production-ready prototypes.
Stop waiting weeks for prototypes — ship a dining microapp in days using Claude + ChatGPT
Decision fatigue, slow iteration cycles, and brittle integration points slow teams down. The rise of microapps — single-purpose, fast-built apps like Rebecca Yu’s Where2Eat — shows a better way: combine modern LLMs for logic, UX text and automated tests, then wire them into a reproducible starter repo. In this guide (2026 edition) you’ll get a ready-to-run architecture, code samples, prompt templates and a GitHub Actions workflow that reproduces a dining microapp that uses Claude and ChatGPT together for rapid prototyping and production readiness.
Why this matters in 2026
By late 2025 and into 2026 the ecosystems for large language models matured from “clever assistants” into platform components. Anthropic’s Cowork and Claude Code made it easy for non-developers to orchestrate file-system and automation tasks; OpenAI expanded multimodal and function-calling tooling for production workflows. That means teams can:
- Move from mock-to-prod in days with LLM-powered decision logic.
- Automate UX writing and localization with deterministic prompts and templates.
- Auto-generate test suites and CI verification from the same prompts that define behavior — keeping tests synchronized with intent.
What you’ll build (and why reproducibility matters)
This article recreates the dining microapp case study with a reproducible starter repo and an architecture diagram. The app suggests restaurants for a group of friends based on preferences, constraints (price, distance, cuisine), and conversation context. Key responsibilities given to LLMs:
- Conversation & UX text: Polished prompts, suggestions, and onboarding copy.
- Decision logic: Ranking restaurants and resolving tie-breakers.
- Test generation: Unit and integration tests auto-created from the spec and stitched into CI.
Architecture diagram (reproducible)
Below is a compact SVG architecture diagram you can copy into a repo README or design doc. It maps how frontend, backend, Claude and ChatGPT interact with external APIs and CI. If you want richer, editable diagrams for your README, consider visual editors that integrate infrastructure diagrams and docs-as-code like Compose.page.
Starter repo layout (reproducible)
Clone and run the following starter layout. The structure keeps LLM prompts and prompt tests in the repo so behavior is auditable.
repo-root/
├─ README.md
├─ package.json
├─ .env.example
├─ src/
│ ├─ client/ # React microapp
│ ├─ api/ # serverless endpoints (rank, search, webhook)
│ ├─ llm/ # prompt templates + adapters
│ │ ├─ claudeAdapter.js
│ │ ├─ openaiAdapter.js
│ │ └─ prompts/
│ ├─ tests/ # auto-generated tests live here
│ └─ lib/ # ranking functions, types
└─ .github/workflows/ci.yml
Quick start (run locally)
- git clone https://github.com/your-org/dining-microapp-starter.git
- cd dining-microapp-starter && npm install
- Copy .env.example to .env and populate OPENAI_API_KEY, CLAUDE_API_KEY, and PLACES_API_KEY.
- npm run dev (starts frontend on :3000 and serverless on :8787)
How we split responsibilities between Claude and ChatGPT
In this architecture we use both LLMs for complementary strengths:
- Claude — synthesis, summarization and policy-like reasoning. Use it for: generating summary cards, multi-party preference synthesis (e.g., "group likes sushi + cheap + under 20 mins"), and generating explanatory UX copy that remains consistent across locales.
- ChatGPT (OpenAI) — direct conversational turn-taking, real-time function-calling and deterministic outputs. Use it for: structured decision calls (via function-calling), step-by-step dialog management, and generating tests using explicit JSON schemas.
This ensemble reduces hallucination risk (Claude for human-friendly synthesis; ChatGPT for strict JSON outputs) and lets you tune temperature and system prompts independently.
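To keep those knobs independent, it helps to centralize per-model defaults that the two adapters read. Below is a minimal, hypothetical sketch of such a config module; the file name, model pins and copy are illustrative and not taken from the published starter repo.

// src/llm/config.js (illustrative) — per-model defaults consumed by
// claudeAdapter.js and openaiAdapter.js; the published repo may structure this differently.
export const llmConfig = {
  openai: {
    model: 'gpt-4o-mini',
    temperature: 0,            // decision points stay deterministic
    system: 'You are a strict ranking assistant that outputs valid JSON.'
  },
  claude: {
    model: 'claude-sonnet-4-5', // hypothetical pin; use whichever current model your team has validated
    temperature: 0.7,           // a little variety is acceptable for UX copy
    system: 'You write concise, friendly summaries and never invent facts about a venue.'
  }
};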
Example: ranking function using LLMs
We keep the ranking function deterministic by combining a local scoring function with an LLM “tie-breaker” that returns an ordered list only when scores are equal. The API flow:
- API fetches candidate restaurants from Places API.
- Local scoring calculates numeric score (distance, price, rating).
- If top scores are tied or near-tie, call ChatGPT with a function-call schema to return a final ranked list (strict JSON).
- If synopsis is needed for UI cards, call Claude to produce a 2-line human summary per place.
Sample local score (Node)
export function baseScore(place, prefs) {
  // simplistic: normalize rating, distance and price into a 0 to 1 range, then weight
  const rating = (place.rating || 3) / 5;
  const price = 1 - ((place.price_level || 2) / 4); // cheaper places score higher
  const dist = 1 - Math.min(place.distance_km / 10, 1);
  const cuisineBoost = (prefs.cuisines || []).includes(place.cuisine) ? 0.1 : 0;
  return Math.round((rating * 0.5 + price * 0.2 + dist * 0.25 + cuisineBoost) * 100);
}
ChatGPT function-call request (tie-breaker)
We use OpenAI function-calling to force a specific JSON output. Set temperature to 0.0 for deterministic ordering.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: { 'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`, 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'gpt-4o-mini',
temperature: 0.0,
messages: [
{ role: 'system', content: 'You are a strict ranking assistant that outputs valid JSON.' },
{ role: 'user', content: `Tie-break among these places: ${JSON.stringify(candidates)}` }
],
    functions: [{ name: 'rank_places', description: 'Return ordered ids', parameters: {
      type: 'object', properties: { orderedIds: { type: 'array', items: { type: 'string' } } }, required: ['orderedIds']
    }}],
    // Force the model to call rank_places so the reply is always structured JSON
    // (newer SDK versions express the same thing with `tools` / `tool_choice`).
    function_call: { name: 'rank_places' }
  })
});
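The snippet above only issues the request; before the ordering touches business logic you still need to unpack and sanity-check the function-call arguments. Here is a minimal sketch of that step (field access follows the Chat Completions response shape for the legacy functions parameter; variable names are illustrative):

const data = await response.json();
// With the legacy `functions` parameter, arguments arrive as a JSON string on message.function_call.
const call = data.choices?.[0]?.message?.function_call;
const args = call ? JSON.parse(call.arguments) : null;
if (!args || !Array.isArray(args.orderedIds)) {
  // Caller falls back to the local deterministic sort (see the fallback policy below).
  throw new Error('Tie-breaker returned no usable ordering');
}
const orderedIds = args.orderedIds;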
Claude usage: deterministic UX text
For human-friendly copy and summaries we keep prompts constrained and test outputs. Here’s an example call to the Anthropic Messages API to generate short summaries:
const res = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: { 'x-api-key': process.env.CLAUDE_API_KEY, 'anthropic-version': '2023-06-01', 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-sonnet-4-5', // pin whichever current Claude model your team has validated
    max_tokens: 80,
    messages: [{ role: 'user', content: `Summarize the following restaurant in 2 lines for a mobile card:\n\n${place.description}` }]
  })
});
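On the response side, the Messages API returns an array of content blocks. A small sketch of pulling the text out and enforcing the two-line constraint before it reaches the UI (variable names are illustrative):

const data = await res.json();
const summary = (data.content?.[0]?.text ?? '').trim();
// Hard cap at two lines, even if the model rambles; the generated tests assert the same limit.
const cardText = summary.split('\n').filter(Boolean).slice(0, 2).join('\n');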
Auto-generating tests from prompts
One of the most impactful 2026 practices is having the LLM produce test suites that map directly to the conversational spec. We ask ChatGPT to output Jest tests that assert ranking behavior and edge cases. The output is committed to src/tests/ and executed in CI.
Prompt example: generate tests
System: You will generate Jest tests for the ranking module. Output only a JS file with valid tests.
User: Given these cases (preferences, candidate lists), produce tests asserting that baseScore and tie-break function produce the expected ordering.
When generated, the tests are validated by a lint job and run in CI. If a generated test fails because the logic changed, the failing test becomes a signal that the prompt or code must be updated — creating a tight feedback loop.
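Concretely, the npm run generate-tests step can be a small script that feeds the spec to the model and writes the returned file into src/tests/. Here is one sketch of what such a script might look like; the paths and spec file name are hypothetical and not part of the published repo.

// scripts/generate-tests.mjs (illustrative) — refresh LLM-generated Jest tests.
import { readFile, writeFile } from 'node:fs/promises';

// Hypothetical spec file describing preferences, candidate lists and expected orderings.
const spec = await readFile('src/llm/prompts/ranking-test-cases.md', 'utf8');

const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    temperature: 0,
    messages: [
      { role: 'system', content: 'You will generate Jest tests for the ranking module. Output only a JS file with valid tests.' },
      { role: 'user', content: spec }
    ]
  })
});

const data = await res.json();
// In practice you may also need to strip markdown fences from the model output.
const testFile = data.choices?.[0]?.message?.content ?? '';

// Write generated tests where CI expects them; the lint job rejects anything that is not valid JS.
await writeFile('src/tests/ranking.generated.test.js', testFile);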
CI pipeline (GitHub Actions) — enforce reproducibility
Include this condensed workflow in .github/workflows/ci.yml:
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run generate-tests # uses LLM to refresh tests (optional safe-mode)
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
      - run: npm test
If you want a deeper read on how modern newsrooms and publishing teams wire CI and edge delivery into their release pipelines, see this practical writeup on newsroom delivery and billing strategies: Newsrooms built for 2026. For instrumenting LLM-generated tests and runtime observability, pair the CI flow with an observability playbook that covers everything from sequence diagrams to runtime validation.
Hardening for production
LLMs introduce variability. To productionize the microapp, apply these hardening steps:
- Enforce deterministic outputs — use function-calling with strict JSON schemas and low temperature for decision points.
- Cache and validate — cache LLM outputs and validate JSON against schemas before using them in the UI (see the validation sketch after this list).
- Safety filters — run a short deterministic safety check for any user-facing text (Claude outputs) to avoid hallucinated claims (e.g., “we can seat 6” should be verified against source data). See approaches to augmented oversight for supervised systems at the edge.
- Cost controls — fall back to local heuristics when the LLM quota is exhausted; track per-call spend with logging. For a broader look at pricing and consumption models in the cloud, read this piece on cloud cost optimization.
- Audit logs — store LLM prompts, responses and versioned model metadata for compliance and repro. For chain-of-custody strategies and legal-grade audit trails, see this guide on chain of custody in distributed systems.
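Here is the validation step from the list above as a minimal sketch, assuming the ajv package is a dependency; the schema mirrors the rank_places parameters used by the tie-breaker, and the helper name is illustrative.

// src/lib/validateRanking.js (illustrative) — reject malformed tie-breaker output before it reaches the UI.
import Ajv from 'ajv';

const ajv = new Ajv();
const validateRanking = ajv.compile({
  type: 'object',
  properties: { orderedIds: { type: 'array', items: { type: 'string' } } },
  required: ['orderedIds'],
  additionalProperties: false
});

export function assertValidRanking(payload) {
  if (!validateRanking(payload)) {
    // Surface the schema errors and let the caller fall back to the local heuristic.
    throw new Error(`Invalid ranking payload: ${ajv.errorsText(validateRanking.errors)}`);
  }
  return payload.orderedIds;
}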
Example: fallback policy (pseudocode)
async function getRankedPlaces(prefs) {
const candidates = await fetchPlaces(prefs);
const scored = candidates.map(c => ({...c, score: baseScore(c, prefs)}));
const top = pickTopN(scored, 5);
try {
const ordering = await callChatGptTieBreaker(top);
return ordering;
} catch (err) {
// fallback deterministic sort
console.warn('LLM tie-break failed', err);
return top.sort((a,b) => b.score - a.score);
}
}
Real-world example & lessons from Where2Eat
Rebecca Yu’s Where2Eat (case study) is a great microapp example: a focused problem, shipped quickly. Key lessons to replicate:
- Scope tightly. A single feature (group choice) turns into a credible product in days.
- Use the right LLM for the job. Synthesis vs. strict outputs — pick accordingly.
- Iterate prompts like code. Store them in source control and evolve with tests.
"Once vibe-coding apps emerged, I started hearing about people with no tech backgrounds successfully building their own apps." — Rebecca Yu
Advanced strategies (2026 trends)
Leverage these advanced approaches to push microapps toward production-grade systems:
- Agentic workflows for offline automation: New desktop apps and agent features (e.g., Anthropic Cowork-style tools) let you authorize file-system actions for multi-step automation: bulk test generation, dataset augmentation and data labeling — all driven by the same repo prompts. See practical field playbooks for edge and micro-event automation in this Field Playbook 2026.
- Prompt-as-contract: Treat prompts as contract files that define expected behavior. Combine with LLM-generated tests to create an auditable spec — similar principles to docs-as-code workflows. An illustrative prompt file appears after this list.
- Model ensembles for resilience: Use Claude for long-form analysis of multi-party inputs and ChatGPT for enforcing structured outputs; add a small local LLM or heuristic for fallback.
- Observability: Record per-call metadata (model, parameters, tokens, latency) in logs to tune cost vs. quality over time. Pair this with an observability playbook for workflow microservices to make runtime validation actionable (Observability for workflow microservices).
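To make prompt-as-contract concrete, a versioned prompt module might look like the sketch below. The file path, model pin and contract fields are hypothetical, shown only to make the idea tangible; the published repo may encode prompts differently.

// src/llm/prompts/summarizeCard.js (illustrative) — a prompt-as-contract module,
// versioned alongside the tests that assert its behavior.
export default {
  id: 'summarize-card',
  version: 3,
  model: 'claude-sonnet-4-5',   // hypothetical pin; bump deliberately and in review
  maxTokens: 80,
  system: 'You write neutral, two-line restaurant summaries for a mobile card.',
  template: (place) =>
    `Summarize the following restaurant in 2 lines for a mobile card:\n\n${place.description}`,
  // Contract: what the generated tests assert about the output.
  contract: {
    maxLines: 2,
    mustNotContain: ['guarantee', 'best in town'] // no unverifiable claims
  }
};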
Checklist: take this repo from prototype to production
- [ ] Add rate-limiting and request quotas on serverless endpoints.
- [ ] Instrument LLM calls with tracing and logging (OpenTelemetry); see the tracing sketch after this checklist.
- [ ] Add a gated review for prompts that change production behavior.
- [ ] Add periodic re-run of test generation in safe-mode to surface drift.
- [ ] Store LLM responses in an append-only audit log with redaction for PII.
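For the tracing item above, here is a minimal sketch of wrapping every LLM call in an OpenTelemetry span. It assumes @opentelemetry/api is installed and a tracer provider is configured elsewhere; the wrapper name and attributes are illustrative.

// src/lib/tracedLlmCall.js (illustrative) — record model, parameters and errors per call.
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('dining-microapp.llm');

export async function tracedLlmCall(name, attrs, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      span.setAttributes(attrs); // e.g. { 'llm.model': 'gpt-4o-mini', 'llm.temperature': 0 }
      return await fn();
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      throw err;
    } finally {
      span.end();
    }
  });
}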
Actionable takeaways
- Start with a tight scope: limit to one core flow (e.g., pick-a-restaurant) and wire LLMs for only the ambiguous decisions.
- Use function-calling: guarantee structured outputs for any decision that touches business logic.
- Keep prompts in code: version them, test them, and review diffs like code changes. (See docs-as-code patterns.)
- Automate tests via LLMs: they'll produce edge cases you might miss; fail-fast in CI to maintain trust.
Where to go next (repo resources)
In the accompanying starter repository you’ll find:
- Prompt templates for Claude and ChatGPT in src/llm/prompts.
- Adapters for both APIs (claudeAdapter.js, openaiAdapter.js).
- A safe-mode test generator that runs locally without committing outputs.
- Example GitHub Actions workflow to run generated tests and block merges on failures.
Final notes: ethics, cost and UX
Microapps scale quickly because they solve focused problems. But when you move from hobby to team usage, consider:
- Privacy: PII must not be sent to LLMs without consent and masking.
- Transparency: Inform users when recommendations include AI-generated content.
- Budget guardrails: Automated usage can balloon costs; use quotas and fallbacks. Read more about cloud cost strategies in cloud cost optimization.
Conclusion & call-to-action
Reproducing a dining microapp with Claude and ChatGPT shows how modern LLMs accelerate app creation while keeping production controls intact. Use the starter repo pattern in this article to instrument prompts, tests and CI, and treat prompts as first-class, versioned artifacts. If you want a jumpstart, clone the reproducible repository (link in the accompanying post), run the quickstart and open a PR with your first prompt tweak — your changes will generate tests automatically and prove the feature in CI.
Take action: Clone the starter repo, add your Places API key, and open a PR with one prompt improvement. We’ll review prompt diffs like code and merge only when the generated tests pass — that’s how microapps become production-ready.