llm-up-chatbot

Project intent, architecture snapshot, and workspace conventions.

UP Diliman graduate student AI demo workspace for a retrieval-grounded chatbot that answers from curated registration notes, cites the supporting sections, and abstains when the notes do not support an answer.

Current State

This workspace is frozen around a simple, testable architecture:

  • curated markdown notes in docs/
  • a generated chunk index in data/chunks.json
  • lexical retrieval over those chunks
  • a local OpenAI-compatible chat model
  • a FastAPI browser UI at / with an institutional visual style and direct links to /readme and /registration
  • a JSON API at /chat
  • CSV logging of interactions in logs/questions.csv
  • benchmark reports under output_to_user/benchmark/

The app does not use embeddings, a vector database, live web search, or university system integrations.
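As a rough illustration of what lexical retrieval over the chunk index can look like, here is a minimal term-overlap sketch. The chunk schema (`id` and `text` keys), the function names, and the scoring rule are assumptions made for this example, not the app's actual implementation, and the sample chunks are made-up placeholder text.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk_text: str) -> int:
    """Sum the occurrences in the chunk of each distinct query token."""
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(chunk_text))
    return sum(counts[term] for term in query_terms)


def retrieve(query: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    """Return up to top_k chunks with a non-zero score, best first."""
    scored = [(score(query, chunk["text"]), chunk) for chunk in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so chunks with equal scores keep their index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Placeholder chunks (hypothetical schema and content, for illustration only).
chunks = [
    {"id": "a", "text": "Enlistment opens before registration week."},
    {"id": "b", "text": "Form 5 is printed after payment."},
]
top = retrieve("When does enlistment open?", chunks)
```

An empty result from `retrieve` is a natural signal for the abstain path, since no chunk shares a term with the question.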

Canonical Docs

  • registration.md - source reference notes for the assistant and benchmark
  • docs/upd_registration.md - retrieval-optimized main knowledge doc
  • docs/prerog.md - prerog logistics supplement
  • docs/form5.md - Form 5 logistics supplement
  • docs/end-to-end-architecture.md - runtime flow and architecture
  • docs/local-llm-setup.md - local model configuration
  • docs/benchmark-framework.md - benchmark contract and scoring rules
  • docs/benchmark-research-agenda.md - later research questions and label plan
  • design.md - frozen product direction
  • mvp-plan.md - frozen implementation snapshot

Local LLM Setup

Verified backend:

  • Server: llama-swap
  • Base URL: http://127.0.0.1:8080/v1
  • Default model alias: gemma-4-26b-q5-chat-vulkan

The default can be overridden with OPENAI_MODEL, but the workspace is tuned around the 26B chat model listed above.

Smoke Test

curl -sS http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma-4-26b-q5-chat-vulkan",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ],
    "temperature": 0.2,
    "max_tokens": 32
  }'

Expected result: a short assistant reply, such as "Hello!"

Environment Variables

export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_MODEL=gemma-4-26b-q5-chat-vulkan
export OPENAI_API_KEY=local-dev
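On the Python side, these variables could be read along the following lines. This is a sketch only: the function name `load_llm_settings` and the returned keys are hypothetical, and falling back to the verified backend values for the base URL and API key is an assumption. The README itself only states that OPENAI_MODEL overrides the default model.

```python
import os


def load_llm_settings(env=None) -> dict:
    """Read the LLM connection settings, falling back to the README values."""
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", "http://127.0.0.1:8080/v1"),
        "model": env.get("OPENAI_MODEL", "gemma-4-26b-q5-chat-vulkan"),
        "api_key": env.get("OPENAI_API_KEY", "local-dev"),
    }


settings = load_llm_settings({"OPENAI_MODEL": "another-alias"})
```

Passing a plain dict, as in the last line, overrides only the model alias and leaves the other two settings at their fallback values.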

Run

cd /projects/llm-up-chatbot
uv venv .venv
uv sync
uv run python ingest.py
uv run uvicorn app:app --reload --host 127.0.0.1 --port 8000

Smoke checks:

curl -sS http://127.0.0.1:8000/health
curl -sS http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"question":"Kailan ako maglo-lock ng enlistment?"}'
curl -sS http://127.0.0.1:8000/
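The /chat check can also be scripted with the Python standard library. The sketch below mirrors the curl call above (a POST with a JSON body containing `question`). The helper names are hypothetical, and the shape of the response is not documented here, so the reply is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

APP_URL = "http://127.0.0.1:8000"


def build_chat_request(question: str, app_url: str = APP_URL) -> urllib.request.Request:
    """Build the POST request that the curl smoke check sends to /chat."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{app_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, app_url: str = APP_URL, timeout: float = 60.0):
    """Send the question to a running app and return the decoded JSON reply."""
    request = build_chat_request(question, app_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the app running locally:
#   reply = ask("Kailan ako maglo-lock ng enlistment?")
#   print(reply)
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.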

Output Layout

Benchmark artifacts live under:

  • output_to_user/benchmark/current/
  • output_to_user/benchmark/archive/
  • output_to_user/benchmark/smoke/

The manifest at output_to_user/benchmark/manifest.json maps the previous artifact names to the normalized layout.

Behavior Notes

  • The assistant uses lexical retrieval over a small curated corpus.
  • The model is prompted to return a JSON support decision before answering.
  • The app answers when the context supports the question and returns a fixed abstain response when it does not.
  • The chatbot is intentionally conservative and should prefer abstention over guessing.
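The answer-or-abstain behavior described in these notes can be sketched as below. The JSON field names (`supported`, `answer`, `citations`), the abstain wording, and the function names are all assumptions for illustration; the actual prompt contract and fixed response live in the app code. The sketch abstains on anything it cannot parse or verify, which matches the stated preference for abstention over guessing.

```python
import json

# Hypothetical wording; the app's real fixed abstain response may differ.
ABSTAIN_MESSAGE = "I can't answer that from the registration notes."


def abstain() -> dict:
    """Build the fixed abstain result."""
    return {"answered": False, "answer": ABSTAIN_MESSAGE, "citations": []}


def decide(raw_model_output: str) -> dict:
    """Turn the model's JSON support decision into an answer or an abstention."""
    try:
        decision = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return abstain()  # unparseable output is treated as unsupported
    if not isinstance(decision, dict) or decision.get("supported") is not True:
        return abstain()
    answer = decision.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        return abstain()  # a "supported" decision with no answer text
    citations = decision.get("citations")
    if not isinstance(citations, list):
        citations = []
    return {"answered": True, "answer": answer.strip(), "citations": citations}


result = decide('{"supported": true, "answer": "Yes.", "citations": ["docs/form5.md"]}')
```

Requiring `supported` to be exactly `true` means values such as the string "yes" or a missing key fall through to the abstain path.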