GapSense Developer Documentation

Platform Evolution

Four phases of continuous improvement: from infrastructure hardening to multi-country scaling.
Each phase measured, documented, and battle-tested in production.

Phase 1

Infrastructure Hardening

December 2025

Eliminated session corruption, duplicate processing, and message loss in the SQS-backed worker pipeline. Introduced an exception hierarchy, an idempotency ledger, and safe requeue ordering.

Reliability

99.9% +34%

Duplicates

0 -100%

SQS, Session Factory, Idempotency, FIFO Queue

src/gapsense/core/models/processing_ledger.py

# Idempotency guard using PostgreSQL INSERT ... ON CONFLICT
from sqlalchemy.dialects.postgresql import insert as pg_insert

stmt = pg_insert(ProcessingLedger).values(
    sqs_message_id=task.message_id,
    task_type=task.task_type,
    student_id=payload.get("student_id"),
).on_conflict_do_nothing(constraint="uq_ledger_msg_task")

result = await db.execute(stmt)
await db.commit()

if result.rowcount == 0:
    # Duplicate message - skip processing
    logger.warning("duplicate_message_skipped")
    return
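The guard above relies on one property: when the unique constraint already holds a matching row, the insert affects zero rows. The sketch below reproduces that behaviour with the standard-library `sqlite3` module standing in for PostgreSQL, so it runs without a database server. The table layout, the column pair behind `uq_ledger_msg_task`, and the helper names `open_ledger` and `claim_message` are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3
from typing import Optional


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE processing_ledger (
            sqs_message_id TEXT NOT NULL,
            task_type      TEXT NOT NULL,
            student_id     TEXT,
            -- Assumed to mirror uq_ledger_msg_task: one row per (message, task type)
            UNIQUE (sqs_message_id, task_type)
        )
        """
    )
    return conn


def claim_message(
    conn: sqlite3.Connection,
    message_id: str,
    task_type: str,
    student_id: Optional[str] = None,
) -> bool:
    """Return True if this call inserted the ledger row, False for a duplicate."""
    cur = conn.execute(
        "INSERT INTO processing_ledger (sqs_message_id, task_type, student_id) "
        "VALUES (?, ?, ?) ON CONFLICT DO NOTHING",
        (message_id, task_type, student_id),
    )
    conn.commit()
    # A conflicting insert changes no rows, so rowcount is 0 for duplicates.
    return cur.rowcount == 1
```

A worker would call `claim_message` before doing any work and skip the task when it returns `False`. Because the insert and the uniqueness check happen in a single statement, two workers that receive the same message cannot both claim it.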

Phase 2

Hybrid RAG Retrieval

January 2026

Replaced brute-force curriculum dumping with semantic search + prerequisite graph traversal. pgvector cosine similarity (top-k=15) + recursive CTE (depth=2) for surgical node injection.

Accuracy

78% +13%

Nodes Injected

18 -48%

Token Reduction

12.5k -50%

pgvector, OpenAI Embeddings, IVFFlat Index, Recursive CTE

Hybrid RAG Pipeline

flowchart LR
    IMG[Exercise Book Image] --> DESC[Claude Haiku:<br/>Image Description]
    DESC --> EMB[OpenAI:<br/>text-embedding-3-small]
    EMB --> VS[Vector Search:<br/>top_k=15]
    VS --> SEEDS[Seed Node IDs]
    SEEDS --> CTE[Recursive CTE:<br/>depth=2]
    CTE --> PREREQS[Prerequisite<br/>Node IDs]
    SEEDS --> MERGE[Merge &<br/>Deduplicate]
    PREREQS --> MERGE
    MERGE --> JSON[18 Relevant<br/>Nodes as JSON]
    JSON --> ANALYSIS[ANALYSIS-001<br/>Prompt]
    style IMG fill:#E8F5E9
    style DESC fill:#FFE082
    style VS fill:#81C784
    style JSON fill:#25D366
    style ANALYSIS fill:#4CAF50
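The retrieval flow (vector search for seed nodes, a bounded prerequisite walk, then merge and deduplicate) can be sketched in plain Python. This is an in-memory stand-in only: the production pipeline is described as using pgvector and a recursive CTE, whereas here cosine similarity is computed by hand and the walk is a breadth-first traversal over a dictionary. The function names and the node identifiers used in the usage note are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_seeds(query: Sequence[float],
                node_vectors: Dict[str, Sequence[float]],
                k: int = 15) -> List[str]:
    """Rank node IDs by similarity to the query and keep the best k."""
    ranked = sorted(node_vectors,
                    key=lambda nid: cosine(query, node_vectors[nid]),
                    reverse=True)
    return ranked[:k]


def prerequisite_walk(seeds: List[str],
                      prereqs: Dict[str, List[str]],
                      depth: int = 2) -> List[str]:
    """Breadth-first walk up to `depth` hops; returns only newly found nodes."""
    seen = set(seeds)
    found: List[str] = []
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # do not expand beyond the depth limit
        for parent in prereqs.get(node, []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                frontier.append((parent, hops + 1))
    return found


def retrieve(query, node_vectors, prereqs, k: int = 15, depth: int = 2) -> List[str]:
    """Seeds first, then prerequisites, with no duplicates."""
    seeds = top_k_seeds(query, node_vectors, k)
    return seeds + prerequisite_walk(seeds, prereqs, depth)
```

For example, with a seed `"FRAC-ADD"` whose prerequisite chain is `FRAC-ADD -> FRAC-EQUIV -> FRAC-CONCEPT -> COUNTING`, a walk with `depth=2` returns the first two prerequisites and stops before `COUNTING`.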

Phase 3

Two-Stage OCR + Diagnosis

February 2026

Separated OCR from diagnosis. Stage 1 (TRANSCRIPTION-001) extracts structured JSON from handwriting. Stage 2 (ANALYSIS-001) diagnoses gaps from the clean transcript, with the image as a fallback. Temperature is set to 0.1 so OCR output is near-deterministic.

Accuracy

85% +7%

Cost

$0.023 +28%

Stages

2

TRANSCRIPTION-001, Separation of Concerns, Graceful Degradation

Stage 1: Pure OCR (No Diagnosis)

# TRANSCRIPTION-001: Extract structured JSON from handwriting
response = await ai_client.generate(
    prompt_id="TRANSCRIPTION-001",
    model="claude-sonnet-4-6",
    temperature=0.1,  # Low temperature keeps OCR output near-deterministic
    max_tokens=2048,
    images=[exercise_book_image],
    json_mode=True,
)

transcript = response["transcription_result"]
# Output: {
#   "questions": [
#     {"question_number": "1", "question_text": "Add 1/3 + 1/4",
#      "student_work": "1/3 + 1/4 = 2/7", "teacher_mark": "✗"}
#   ],
#   "overall_legibility": "mostly_legible"
# }

Two-Stage Pipeline Flow

sequenceDiagram
    participant O as Orchestrator
    participant AI as Claude Sonnet 4.6
    participant DB as AIUsageLog
    Note over O: Stage 1: TRANSCRIPTION-001
    O->>AI: Image + OCR Prompt (temp=0.1)
    AI-->>O: Structured JSON Transcript
    O->>DB: Log cost (prompt_id=TRANSCRIPTION-001)
    Note over O: Use transcript for vector search
    O->>O: Build query from transcript text
    O->>O: Semantic search + prerequisite walk
    Note over O: Stage 2: ANALYSIS-001
    O->>AI: Transcript + Image + RAG Nodes
    AI-->>O: Gap Diagnosis JSON
    O->>DB: Log cost (prompt_id=ANALYSIS-001)
    Note over O: Total: 2 AI calls, 2 cost logs
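A minimal sketch of the two-stage flow with graceful degradation follows. It assumes that a failed transcription should not abort the analysis and that Stage 2 then runs on the image alone. `FakeAIClient`, the `transcript` keyword, and the list used as a usage log are hypothetical stand-ins for the real client and for `AIUsageLog`. Whether a failed Stage 1 call is still cost-logged is not stated in the source, so this sketch logs successful calls only.

```python
import asyncio
from typing import Any, Dict, List, Optional


class FakeAIClient:
    """Hypothetical stand-in for the AI client; returns canned responses."""

    def __init__(self, fail_transcription: bool = False) -> None:
        self.fail_transcription = fail_transcription

    async def generate(self, prompt_id: str, **kwargs: Any) -> Dict[str, Any]:
        if prompt_id == "TRANSCRIPTION-001":
            if self.fail_transcription:
                raise RuntimeError("transcription failed")
            return {"transcription_result": {"questions": [],
                                             "overall_legibility": "mostly_legible"}}
        # ANALYSIS-001: report whether a transcript was supplied.
        return {"gaps": [], "used_transcript": kwargs.get("transcript") is not None}


async def analyze(ai_client: Any, image: bytes, usage_log: List[str]) -> Dict[str, Any]:
    """Run Stage 1 then Stage 2, falling back to image-only analysis."""
    transcript: Optional[Dict[str, Any]] = None
    try:
        stage1 = await ai_client.generate(
            prompt_id="TRANSCRIPTION-001",
            temperature=0.1,
            images=[image],
            json_mode=True,
        )
        transcript = stage1["transcription_result"]
        usage_log.append("TRANSCRIPTION-001")
    except Exception:
        # Graceful degradation (assumed policy): continue without a transcript.
        transcript = None

    stage2 = await ai_client.generate(
        prompt_id="ANALYSIS-001",
        images=[image],
        transcript=transcript,
    )
    usage_log.append("ANALYSIS-001")
    return stage2
```

Running `asyncio.run(analyze(FakeAIClient(), b"...", log))` leaves two entries in `log`, matching the "2 AI calls, 2 cost logs" total in the diagram. With `fail_transcription=True`, only the analysis call is logged.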

Phase 4

Grade Normalization + Multi-Country

March 2026

Unified grade representations across Ghana, Uganda, Kenya, Nigeria. Canonical B1-B9 format with adjacent-grade filtering (radius=1) for vector search. SQS heartbeat prevents timeout redelivery. Partner config from YAML.
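The heartbeat idea can be sketched with `asyncio`: while a long-running analysis is in progress, a background task periodically extends the message's visibility timeout so SQS does not redeliver it. To keep the example runnable without AWS, the extension call is injected as a callable. In a real worker it would presumably wrap the SQS `ChangeMessageVisibility` operation. The function names, interval, and timeout values here are assumptions.

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable


async def heartbeat(extend: Callable[[int], Awaitable[None]],
                    interval: float,
                    timeout: int) -> None:
    """Call extend(timeout) every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await extend(timeout)


async def process_with_heartbeat(work: Callable[[], Awaitable[Any]],
                                 extend: Callable[[int], Awaitable[None]],
                                 interval: float = 20.0,
                                 timeout: int = 60) -> Any:
    """Run `work` while a heartbeat task keeps the message invisible."""
    beat = asyncio.create_task(heartbeat(extend, interval, timeout))
    try:
        return await work()
    finally:
        # Stop the heartbeat whether the work succeeded or raised.
        beat.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beat
```

The interval should stay comfortably below the visibility timeout, so that one missed or slow extension does not immediately expose the message to another consumer.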

Countries Supported

4

Auto-Normalized

98.5%

Search Precision

+15% Grade Filter

Grade Utils, SQS Heartbeat, Partner Config, Metrics Logging

src/gapsense/core/grade_utils.py

# Multi-country grade normalization
GRADE_MAPS = {
    "ghana": {
        "jhs1": "B7", "jhs2": "B8", "jhs3": "B9",
        "primary 4": "B4", "basic 5": "B5", # ...
    },
    "uganda": {
        "s1": "B7", "s2": "B8", "s3": "B9",
        "p4": "B4", "p5": "B5", # ...
    },
    # Kenya, Nigeria...
}

def adjacent_grades(grade: str, country: str, radius: int = 1) -> list[str]:
    """Return canonical grades within ±radius.

    Example: adjacent_grades("B5", "ghana", radius=1) -> ["B4", "B5", "B6"]
    """
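    # GRADE_SEQUENCES is not shown in this excerpt; it is indexed by country
    # and holds that country's canonical grades in ascending order.
    sequence = GRADE_SEQUENCES[country]
    idx = sequence.index(grade)  # raises ValueError if grade is not in the sequence
    # Clamp the window so it never runs past either end of the sequence.
    start = max(0, idx - radius)
    end = min(len(sequence), idx + radius + 1)
    return sequence[start:end]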
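To show how normalization and adjacent-grade filtering might fit together, here is a self-contained sketch. It assumes a single canonical sequence B1 to B9 shared by all countries, lookups that ignore case and extra whitespace, and `None` for labels that cannot be normalized. `normalize_grade`, `grade_filter`, and the trimmed-down `GRADE_MAPS` are illustrative, not the module's actual API.

```python
from typing import Dict, List, Optional

# Trimmed copy of the mapping above, for illustration only.
GRADE_MAPS: Dict[str, Dict[str, str]] = {
    "ghana": {"jhs1": "B7", "jhs2": "B8", "jhs3": "B9",
              "primary 4": "B4", "basic 5": "B5"},
    "uganda": {"s1": "B7", "s2": "B8", "s3": "B9",
               "p4": "B4", "p5": "B5"},
}

# Assumption: every country shares the canonical B1..B9 sequence.
CANONICAL: List[str] = [f"B{n}" for n in range(1, 10)]


def normalize_grade(raw: str, country: str) -> Optional[str]:
    """Map a local grade label to canonical form, or None if unknown."""
    key = " ".join(raw.strip().lower().split())  # collapse case and whitespace
    if key.upper() in CANONICAL:
        return key.upper()  # already canonical, e.g. "b4" -> "B4"
    return GRADE_MAPS.get(country.strip().lower(), {}).get(key)


def grade_filter(raw: str, country: str, radius: int = 1) -> List[str]:
    """Canonical grades within +/- radius of the normalized grade."""
    grade = normalize_grade(raw, country)
    if grade is None:
        return []  # assumed policy: the caller decides how to handle unknown labels
    idx = CANONICAL.index(grade)
    return CANONICAL[max(0, idx - radius): idx + radius + 1]
```

With these definitions, `grade_filter("P5", "uganda")` yields the three grades centred on B5, and a label at the top of the range such as `"jhs3"` yields only two, because the window is clamped at B9.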

Complete Architecture

7-step pipeline orchestrating AI analysis, hybrid RAG retrieval, and multi-country normalization.
Every step measured, logged, and optimized for Ghana's 3G networks.

End-to-End Analysis Pipeline (Phase 1-4 Integrated)

flowchart TD
    START[SQS Message<br/>student_id, image_url] --> POLL[WorkerService<br/>Poll & Route]
    POLL --> DEDUP{Idempotency<br/>Ledger Check}
    DEDUP -->|Duplicate| SKIP[Skip &<br/>Delete Message]
    DEDUP -->|New| HB[Start SQS<br/>Heartbeat Loop]
    HB --> S1[Step 1: Load Student Context<br/>↳ Grade Normalization<br/>↳ Partner Config Lookup]
    S1 --> S2[Step 2: Fetch Image<br/>↳ S3 Download<br/>↳ Validate Format]
    S2 --> S3[Step 3: Transcribe Image<br/>↳ TRANSCRIPTION-001<br/>↳ temp=0.1, JSON mode]
    S3 --> S4[Step 4: Build Curriculum Graph<br/>↳ Vector Search top_k=15<br/>↳ Prerequisite Walk depth=2<br/>↳ Grade Filter ±1 Adjacent]
    S4 --> S5[Step 5: Render Prompt<br/>↳ Inject Transcript<br/>↳ Inject RAG Nodes<br/>↳ Inject Student Context]
    S5 --> S6[Step 6: Call AI<br/>↳ ANALYSIS-001<br/>↳ Image + Transcript + Nodes]
    S6 --> S7[Step 7: Dispatch Results<br/>↳ Save GapProfile<br/>↳ Queue Remediation<br/>↳ Notify Teacher]
    S7 --> METRICS[Emit Structured Metrics<br/>↳ Latency, Accuracy<br/>↳ Cost, Country]
    METRICS --> DELETE[Delete SQS Message]
    DELETE --> DONE[Complete]
    S3 -.->|Failure| S4
    S4 -.->|No embeddings| FALLBACK[Fallback: Code-ordered<br/>SELECT LIMIT 20]
    style START fill:#E8F5E9
    style DEDUP fill:#FFE082
    style HB fill:#81C784
    style S3 fill:#4CAF50
    style S4 fill:#2E7D32
    style S6 fill:#1B5E20
    style METRICS fill:#FFB74D
    style DONE fill:#66BB6A
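The "No embeddings" branch in Step 4 can be illustrated with a small selection helper. It uses the vector-search ranking when one exists and otherwise falls back to a deterministic slice ordered by node code. The reading of "code-ordered" as "sorted by each node's curriculum code", the `code` key, and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Optional


def select_curriculum_nodes(nodes: List[Dict[str, str]],
                            ranked_codes: Optional[List[str]] = None,
                            limit: int = 20) -> List[Dict[str, str]]:
    """Pick nodes for the prompt; fall back to code order when no ranking exists."""
    if ranked_codes:
        by_code = {node["code"]: node for node in nodes}
        # Preserve the ranking order and ignore codes that are not loaded.
        return [by_code[code] for code in ranked_codes if code in by_code]
    # Fallback path: equivalent in spirit to SELECT ... ORDER BY code LIMIT 20.
    return sorted(nodes, key=lambda node: node["code"])[:limit]
```

The fallback trades relevance for predictability: the prompt still receives a bounded set of nodes, so the analysis step can proceed even when embeddings are unavailable.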

Production Metrics

Real performance data from production deployment in Ghana, Uganda, Kenya, and Nigeria.

85%

Diagnostic Accuracy

Phase 3 two-stage pipeline

103s

Median Analysis Time

70-136s range (P50-P95)

$0.05

Cost per Analysis

Includes remediation exercises

99.9%

Reliability

Phase 1 idempotency guard

18

Avg Nodes Injected

Down from 35 (Phase 2 RAG)

4

Countries Supported

Ghana, Uganda, Kenya, Nigeria

Documentation & Resources

Comprehensive technical specs, architecture docs, and deployment guides.

📚

Architecture Overview

Complete system architecture with diagrams, data models, and design decisions.

Read Architecture →

⚙️

Phase Specifications

Detailed design docs for all 4 phases: requirements, implementation, and testing.

Browse Specs →

🚀

Deployment Guide

AWS ECS Fargate setup, environment variables, and production deployment procedures.

Deploy to AWS →

💻

Source Code

Full source code on GitHub: Python backend, FastAPI, PostgreSQL, pgvector, SQS workers.

View on GitHub →

🎯

Live Demo

Try GapSense live: upload an exercise book photo and get real-time gap analysis.

Try Live Demo →

📊

API Reference

FastAPI endpoints, request/response schemas, authentication, and rate limits.

API Docs →

AI-Powered Learning Gap
Diagnostics for Africa
