🚀 Built for UNICEF StartUp Lab Cohort 6

AI-Powered Learning Gap Diagnostics for Africa

From single-stage vision analysis to hybrid RAG + two-stage OCR.
Built for Ghana's 3G networks. Powered by Claude Sonnet 4.6.

85% Accuracy
70-136s Analysis Time
$0.05 Cost per Analysis
4 Countries

Try it in 60 seconds

Upload an exercise book photo and get AI-powered gap analysis + remediation exercises

$ curl -X POST https://gapsense.org/demo/api/upload-image \
-F "teacher_phone=+233501234567" \
-F "image=@exercise_book.jpg"
{"status": "success", "message": "Analysis started"}
# Get results at: /demo/reports/+233501234567

Platform Evolution

Four phases of continuous improvement: from infrastructure hardening to multi-country scaling.
Each phase measured, documented, and battle-tested in production.

Phase 1

Infrastructure Hardening

December 2025

Eliminated session corruption, duplicate processing, and message loss in the SQS-backed worker pipeline. Introduced an exception hierarchy, an idempotency ledger, and safe requeue ordering.

Reliability
99.9% +34%
Duplicates
0 -100%
SQS, Session Factory, Idempotency, FIFO Queue
src/gapsense/core/models/processing_ledger.py
# Idempotency guard using PostgreSQL INSERT ... ON CONFLICT
from sqlalchemy.dialects.postgresql import insert as pg_insert

stmt = pg_insert(ProcessingLedger).values(
    sqs_message_id=task.message_id,
    task_type=task.task_type,
    student_id=payload.get("student_id"),
).on_conflict_do_nothing(constraint="uq_ledger_msg_task")

result = await db.execute(stmt)
await db.commit()

if result.rowcount == 0:
    # Duplicate message - skip processing
    logger.warning("duplicate_message_skipped")
    return
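The guard above relies on a unique constraint: the first insert for a given (message, task) pair succeeds, and any redelivery inserts nothing. The sketch below reproduces that behaviour with the standard-library `sqlite3` module as a stand-in for PostgreSQL, so it runs without a database server. The table layout, the `claim_message` helper, and the task type names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def claim_message(conn: sqlite3.Connection, message_id: str, task_type: str) -> bool:
    """Return True if this (message, task) pair is new, False if already claimed."""
    cur = conn.execute(
        "INSERT INTO processing_ledger (sqs_message_id, task_type) VALUES (?, ?) "
        "ON CONFLICT (sqs_message_id, task_type) DO NOTHING",
        (message_id, task_type),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the conflict clause fired
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processing_ledger ("
    " sqs_message_id TEXT NOT NULL,"
    " task_type TEXT NOT NULL,"
    " UNIQUE (sqs_message_id, task_type))"  # plays the role of uq_ledger_msg_task
)

first = claim_message(conn, "msg-1", "analyze_image")   # new work
second = claim_message(conn, "msg-1", "analyze_image")  # redelivered duplicate
other = claim_message(conn, "msg-1", "send_report")     # same message, different task
print(first, second, other)
```

Because the check and the claim happen in a single statement, two workers that receive the same message at the same time cannot both see it as new.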
Phase 2

Hybrid RAG Retrieval

January 2026

Replaced brute-force curriculum dumping with semantic search plus prerequisite graph traversal: pgvector cosine similarity (top-k=15) selects seed nodes, and a recursive CTE (depth=2) walks their prerequisites, so only the relevant nodes are injected into the prompt.

Accuracy
78% +13%
Nodes Injected
18 -48%
Token Reduction
12.5k -50%
pgvector, OpenAI Embeddings, IVFFlat Index, Recursive CTE
Hybrid RAG Pipeline
flowchart LR
    IMG[Exercise Book Image] --> DESC[Claude Haiku:<br/>Image Description]
    DESC --> EMB[OpenAI:<br/>text-embedding-3-small]
    EMB --> VS[Vector Search:<br/>top_k=15]
    VS --> SEEDS[Seed Node IDs]
    SEEDS --> CTE[Recursive CTE:<br/>depth=2]
    CTE --> PREREQS[Prerequisite<br/>Node IDs]
    SEEDS --> MERGE[Merge &<br/>Deduplicate]
    PREREQS --> MERGE
    MERGE --> JSON[18 Relevant<br/>Nodes as JSON]
    JSON --> ANALYSIS[ANALYSIS-001<br/>Prompt]
    style IMG fill:#E8F5E9
    style DESC fill:#FFE082
    style VS fill:#81C784
    style JSON fill:#25D366
    style ANALYSIS fill:#4CAF50
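The retrieval flow above (vector search for seeds, a depth-limited prerequisite walk, then merge and deduplicate) can be sketched end to end with the standard library. This is a toy under stated assumptions: the node IDs, the 2-dimensional vectors, and the `prerequisites` table are invented for illustration, cosine ranking is done in Python rather than by pgvector, and SQLite stands in for PostgreSQL for the recursive CTE.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_seeds(query_vec, node_vecs, k):
    """Return the k node IDs most similar to the query (stand-in for pgvector)."""
    ranked = sorted(node_vecs, key=lambda nid: cosine(query_vec, node_vecs[nid]), reverse=True)
    return ranked[:k]


def prerequisite_ids(conn, seeds, max_depth=2):
    """Walk prerequisite edges up to max_depth hops with a recursive CTE."""
    if not seeds:
        return []
    marks = ",".join("?" for _ in seeds)
    sql = f"""
        WITH RECURSIVE walk(node_id, depth) AS (
            SELECT prereq_id, 1 FROM prerequisites WHERE node_id IN ({marks})
            UNION
            SELECT p.prereq_id, w.depth + 1
            FROM prerequisites AS p
            JOIN walk AS w ON p.node_id = w.node_id
            WHERE w.depth < ?
        )
        SELECT DISTINCT node_id FROM walk ORDER BY node_id
    """
    return [row[0] for row in conn.execute(sql, (*seeds, max_depth))]


# Hypothetical curriculum nodes with toy embeddings
node_vecs = {
    "fractions_add": [1.0, 0.0],
    "fractions_sub": [0.9, 0.1],
    "equivalent_fractions": [0.6, 0.4],
    "fractions_basics": [0.5, 0.5],
    "counting": [0.1, 0.9],
    "geometry_angles": [0.0, 1.0],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prerequisites (node_id TEXT, prereq_id TEXT)")
conn.executemany(
    "INSERT INTO prerequisites VALUES (?, ?)",
    [
        ("fractions_add", "equivalent_fractions"),
        ("fractions_sub", "equivalent_fractions"),
        ("equivalent_fractions", "fractions_basics"),
        ("fractions_basics", "counting"),  # three hops from the seeds, so excluded
    ],
)

seeds = top_k_seeds([1.0, 0.05], node_vecs, k=2)
prereqs = prerequisite_ids(conn, seeds, max_depth=2)
merged = list(dict.fromkeys(seeds + prereqs))  # dedupe while keeping order
print(merged)
```

The depth cap is what keeps the injected set small: `counting` is a transitive prerequisite of both seeds but sits one hop beyond the limit, so it never reaches the prompt.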
Phase 3

Two-Stage OCR + Diagnosis

February 2026

Separated OCR from diagnosis. Stage 1 (TRANSCRIPTION-001) extracts structured JSON from handwriting. Stage 2 (ANALYSIS-001) diagnoses gaps from the clean transcript, with the image as a fallback. Stage 1 runs at temperature 0.1 for near-deterministic OCR.

Accuracy
85% +7%
Cost
$0.023 +28%
Stages
2
TRANSCRIPTION-001, Separation of Concerns, Graceful Degradation
Stage 1: Pure OCR (No Diagnosis)
# TRANSCRIPTION-001: Extract structured JSON from handwriting
response = await ai_client.generate(
    prompt_id="TRANSCRIPTION-001",
    model="claude-sonnet-4-6",
    temperature=0.1,  # Low temperature for near-deterministic OCR
    max_tokens=2048,
    images=[exercise_book_image],
    json_mode=True,
)

transcript = response["transcription_result"]
# Output: {
#   "questions": [
#     {"question_number": "1", "question_text": "Add 1/3 + 1/4",
#      "student_work": "1/3 + 1/4 = 2/7", "teacher_mark": "✗"}
#   ],
#   "overall_legibility": "mostly_legible"
# }
Two-Stage Pipeline Flow
sequenceDiagram
    participant O as Orchestrator
    participant AI as Claude Sonnet 4.6
    participant DB as AIUsageLog
    Note over O: Stage 1: TRANSCRIPTION-001
    O->>AI: Image + OCR Prompt (temp=0.1)
    AI-->>O: Structured JSON Transcript
    O->>DB: Log cost (prompt_id=TRANSCRIPTION-001)
    Note over O: Use transcript for vector search
    O->>O: Build query from transcript text
    O->>O: Semantic search + prerequisite walk
    Note over O: Stage 2: ANALYSIS-001
    O->>AI: Transcript + Image + RAG Nodes
    AI-->>O: Gap Diagnosis JSON
    O->>DB: Log cost (prompt_id=ANALYSIS-001)
    Note over O: Total: 2 AI calls, 2 cost logs
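The two-stage flow, including graceful degradation when transcription fails, can be sketched with a stub client. Everything here is an assumption made for illustration: `FakeAIClient`, the `analyze` function, the `transcript` keyword, and the canned responses are hypothetical, and a plain list stands in for AIUsageLog.

```python
import asyncio


class FakeAIClient:
    """Stand-in for the real AI client; returns canned JSON per prompt ID."""

    def __init__(self, fail_transcription=False):
        self.fail_transcription = fail_transcription

    async def generate(self, prompt_id, **kwargs):
        if prompt_id == "TRANSCRIPTION-001":
            if self.fail_transcription:
                raise RuntimeError("transcription failed")
            return {"transcription_result": {"questions": [
                {"question_number": "1", "student_work": "1/3 + 1/4 = 2/7"}]}}
        # ANALYSIS-001: report whether a transcript was supplied
        return {"gaps": ["adding unlike fractions"],
                "used_transcript": kwargs.get("transcript") is not None}


async def analyze(ai_client, image, usage_log):
    """Run Stage 1 (OCR) then Stage 2 (diagnosis), logging each completed call."""
    transcript = None
    try:
        stage1 = await ai_client.generate(
            prompt_id="TRANSCRIPTION-001", images=[image], temperature=0.1
        )
        transcript = stage1["transcription_result"]
        usage_log.append("TRANSCRIPTION-001")
    except RuntimeError:
        pass  # degrade gracefully: Stage 2 falls back to the image alone

    stage2 = await ai_client.generate(
        prompt_id="ANALYSIS-001", images=[image], transcript=transcript
    )
    usage_log.append("ANALYSIS-001")
    return stage2


happy_log, degraded_log = [], []
happy = asyncio.run(analyze(FakeAIClient(), b"jpeg-bytes", happy_log))
degraded = asyncio.run(
    analyze(FakeAIClient(fail_transcription=True), b"jpeg-bytes", degraded_log)
)
print(happy_log, degraded_log)
```

On the happy path the log holds two entries, matching the "2 AI calls, 2 cost logs" note in the diagram; when Stage 1 fails, diagnosis still runs with the image only.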
Phase 4

Grade Normalization + Multi-Country

March 2026

Unified grade representations across Ghana, Uganda, Kenya, and Nigeria into a canonical B1-B9 format, with adjacent-grade filtering (radius=1) applied to vector search. An SQS heartbeat extends the visibility timeout during long analyses to prevent redelivery, and partner configuration is loaded from YAML.

Countries Supported
4
Auto-Normalized
98.5%
Search Precision
+15% Grade Filter
Grade Utils, SQS Heartbeat, Partner Config, Metrics Logging
src/gapsense/core/grade_utils.py
# Multi-country grade normalization
GRADE_MAPS = {
    "ghana": {
        "jhs1": "B7", "jhs2": "B8", "jhs3": "B9",
        "primary 4": "B4", "basic 5": "B5", # ...
    },
    "uganda": {
        "s1": "B7", "s2": "B8", "s3": "B9",
        "p4": "B4", "p5": "B5", # ...
    },
    # Kenya, Nigeria...
}

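# GRADE_SEQUENCES (not shown) maps each country to its ordered list of
# canonical grades.
def adjacent_grades(grade: str, country: str, radius: int = 1) -> list[str]:
    """Return canonical grades within ±radius.

    Example: adjacent_grades("B5", "ghana", radius=1) -> ["B4", "B5", "B6"]
    """
    sequence = GRADE_SEQUENCES[country]
    idx = sequence.index(grade)  # raises ValueError for a non-canonical grade
    start = max(0, idx - radius)  # clamp at the lowest grade
    end = min(len(sequence), idx + radius + 1)  # clamp at the highest grade
    return sequence[start:end]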
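Before `adjacent_grades` can be called, a free-text grade label has to be mapped to its canonical form. A minimal sketch of that lookup follows, using only the map entries shown above; the `normalize_grade` helper, its whitespace and case handling, and the choice to return `None` for unknown labels are assumptions rather than the project's actual behaviour.

```python
from typing import Optional

# Subset of the mappings shown above; the real tables cover more labels.
GRADE_MAPS = {
    "ghana": {"jhs1": "B7", "jhs2": "B8", "jhs3": "B9", "primary 4": "B4", "basic 5": "B5"},
    "uganda": {"s1": "B7", "s2": "B8", "s3": "B9", "p4": "B4", "p5": "B5"},
}
CANONICAL = [f"B{n}" for n in range(1, 10)]  # B1 .. B9


def normalize_grade(raw: str, country: str) -> Optional[str]:
    """Map a free-text grade label to canonical B1-B9, or None if unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if key.upper() in CANONICAL:
        return key.upper()  # already canonical, e.g. "b6" -> "B6"
    mapping = GRADE_MAPS.get(country.lower(), {})
    # Try the spaced form first ("primary 4"), then the compact form ("jhs1")
    return mapping.get(key) or mapping.get(key.replace(" ", ""))


print(normalize_grade("JHS 1", "Ghana"))
print(normalize_grade(" P5 ", "uganda"))
print(normalize_grade("form 2", "ghana"))
```

Returning `None` instead of guessing lets the caller decide how to handle a label that could not be normalized, for example by skipping the grade filter for that request.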

Complete Architecture

7-step pipeline orchestrating AI analysis, hybrid RAG retrieval, and multi-country normalization.
Every step measured, logged, and optimized for Ghana's 3G networks.

End-to-End Analysis Pipeline (Phase 1-4 Integrated)
flowchart TD
    START[SQS Message<br/>student_id, image_url] --> POLL[WorkerService<br/>Poll & Route]
    POLL --> DEDUP{Idempotency<br/>Ledger Check}
    DEDUP -->|Duplicate| SKIP[Skip &<br/>Delete Message]
    DEDUP -->|New| HB[Start SQS<br/>Heartbeat Loop]
    HB --> S1[Step 1: Load Student Context<br/>↳ Grade Normalization<br/>↳ Partner Config Lookup]
    S1 --> S2[Step 2: Fetch Image<br/>↳ S3 Download<br/>↳ Validate Format]
    S2 --> S3[Step 3: Transcribe Image<br/>↳ TRANSCRIPTION-001<br/>↳ temp=0.1, JSON mode]
    S3 --> S4[Step 4: Build Curriculum Graph<br/>↳ Vector Search top_k=15<br/>↳ Prerequisite Walk depth=2<br/>↳ Grade Filter ±1 Adjacent]
    S4 --> S5[Step 5: Render Prompt<br/>↳ Inject Transcript<br/>↳ Inject RAG Nodes<br/>↳ Inject Student Context]
    S5 --> S6[Step 6: Call AI<br/>↳ ANALYSIS-001<br/>↳ Image + Transcript + Nodes]
    S6 --> S7[Step 7: Dispatch Results<br/>↳ Save GapProfile<br/>↳ Queue Remediation<br/>↳ Notify Teacher]
    S7 --> METRICS[Emit Structured Metrics<br/>↳ Latency, Accuracy<br/>↳ Cost, Country]
    METRICS --> DELETE[Delete SQS Message]
    DELETE --> DONE[Complete]
    S3 -.->|Failure| S4
    S4 -.->|No embeddings| FALLBACK[Fallback: Code-ordered<br/>SELECT LIMIT 20]
    style START fill:#E8F5E9
    style DEDUP fill:#FFE082
    style HB fill:#81C784
    style S3 fill:#4CAF50
    style S4 fill:#2E7D32
    style S6 fill:#1B5E20
    style METRICS fill:#FFB74D
    style DONE fill:#66BB6A
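The "Start SQS Heartbeat Loop" step in the pipeline above can be sketched as a background task that keeps extending the message's visibility timeout while the analysis runs, then stops once the work finishes or fails. The function names, intervals, and the `fake_extend` callback are assumptions for illustration; in a real worker the callback would wrap something like boto3's `change_message_visibility` call.

```python
import asyncio


async def heartbeat(extend_visibility, interval_s, timeout_s):
    """Periodically extend the message's visibility timeout until cancelled."""
    while True:
        await asyncio.sleep(interval_s)
        await extend_visibility(timeout_s)


async def process_with_heartbeat(work, extend_visibility, interval_s=0.02, timeout_s=60):
    """Run `work` while a heartbeat task keeps the message invisible to other workers."""
    beat = asyncio.create_task(heartbeat(extend_visibility, interval_s, timeout_s))
    try:
        return await work()
    finally:
        beat.cancel()
        # Wait for the cancellation to finish without re-raising CancelledError
        await asyncio.gather(beat, return_exceptions=True)


extensions = []


async def fake_extend(timeout_s):
    # Hypothetical stand-in for the SQS call that extends the visibility timeout
    extensions.append(timeout_s)


async def slow_analysis():
    await asyncio.sleep(0.2)  # simulates an analysis that outlasts one interval
    return "done"


result = asyncio.run(process_with_heartbeat(slow_analysis, fake_extend))
print(result, len(extensions) >= 1)
```

Stopping the heartbeat in a `finally` block matters: if the analysis raises, the extensions stop and the message becomes visible again after the timeout, so it can be retried.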

Production Metrics

Real performance data from production deployment in Ghana, Uganda, Kenya, and Nigeria.

85%
Diagnostic Accuracy
Phase 3 two-stage pipeline
103s
Median Analysis Time
70-136s range
$0.05
Cost per Analysis
Includes remediation exercises
99.9%
Reliability
Phase 1 idempotency guard
18
Avg Nodes Injected
Down from 35 (Phase 2 RAG)
4
Countries Supported
Ghana, Uganda, Kenya, Nigeria

Documentation & Resources

Comprehensive technical specs, architecture docs, and deployment guides.

📚

Architecture Overview

Complete system architecture with diagrams, data models, and design decisions.

Read Architecture →
⚙️

Phase Specifications

Detailed design docs for all 4 phases: requirements, implementation, and testing.

Browse Specs →
🚀

Deployment Guide

AWS ECS Fargate setup, environment variables, and production deployment procedures.

Deploy to AWS →
💻

Source Code

Full source code on GitHub: Python backend, FastAPI, PostgreSQL, pgvector, SQS workers.

View on GitHub →
🎯

Live Demo

Try GapSense live: upload an exercise book photo and get real-time gap analysis.

Try Live Demo →
📊

API Reference

FastAPI endpoints, request/response schemas, authentication, and rate limits.

API Docs →