llm-up-chatbot

A UP Diliman graduate student AI demo workspace for a retrieval-grounded chatbot. The chatbot answers from curated registration notes, cites the supporting sections, and abstains when the notes do not support an answer.

Current State

This workspace is frozen around a simple, testable architecture:

curated markdown notes in docs/
a generated chunk index in data/chunks.json
lexical retrieval over those chunks
a local OpenAI-compatible chat model
a FastAPI browser UI at / with an institutional visual style and direct links to /readme and /registration
a JSON API at /chat
CSV logging of interactions in logs/questions.csv
benchmark reports under output_to_user/benchmark/

The app does not use embeddings, a vector database, live web search, or university system integrations.
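The lexical retrieval step can be pictured with a minimal sketch. It is not the code in this workspace: the chunk fields (id, text), the term-overlap scoring, and the function names are assumptions made for illustration, and the real schema of data/chunks.json may differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk_text: str) -> int:
    """Term-overlap score: occurrences of each distinct query token in the chunk."""
    chunk_counts = Counter(tokenize(chunk_text))
    return sum(chunk_counts[token] for token in set(tokenize(query)))


def retrieve(query: str, chunks: list, k: int = 3) -> list:
    """Return up to k chunks with a non-zero score, best first."""
    scored = [(score(query, chunk["text"]), chunk) for chunk in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hypothetical chunks with placeholder text (not taken from the real notes).
chunks = [
    {"id": "form5-1", "text": "Placeholder text about Form 5 logistics."},
    {"id": "prerog-1", "text": "Placeholder text about prerog requests."},
]

print([chunk["id"] for chunk in retrieve("How do I get my Form 5?", chunks)])
```

An empty result from a retriever like this is a natural trigger for the abstain path, since no chunk shares any term with the question.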

Canonical Docs

registration.md - source reference notes for the assistant and benchmark
docs/upd_registration.md - retrieval-optimized main knowledge doc
docs/prerog.md - prerog logistics supplement
docs/form5.md - Form 5 logistics supplement
docs/end-to-end-architecture.md - runtime flow and architecture
docs/local-llm-setup.md - local model configuration
docs/benchmark-framework.md - benchmark contract and scoring rules
docs/benchmark-research-agenda.md - later research questions and label plan
design.md - frozen product direction
mvp-plan.md - frozen implementation snapshot

Local LLM Setup

Verified backend:

Server: llama-swap
Base URL: http://127.0.0.1:8080/v1
Default model alias: gemma-4-26b-q5-chat-vulkan

The default can be overridden with OPENAI_MODEL, but the workspace is tuned around the 26B chat model.

Smoke Test

curl -sS http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma-4-26b-q5-chat-vulkan",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ],
    "temperature": 0.2,
    "max_tokens": 32
  }'

Expected result: a short assistant reply such as "Hello!"

Environment Variables

export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_MODEL=gemma-4-26b-q5-chat-vulkan
export OPENAI_API_KEY=local-dev
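On the Python side, the three variables might be read as in the sketch below. The LLMSettings class and load_settings function are hypothetical names, not identifiers from this workspace, and falling back to the values documented above is an assumption about how the app behaves when a variable is unset.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_settings(env=None) -> LLMSettings:
    """Read the three variables, falling back to the documented local values."""
    env = os.environ if env is None else env
    return LLMSettings(
        # Strip a trailing slash so paths can be appended consistently.
        base_url=env.get("OPENAI_BASE_URL", "http://127.0.0.1:8080/v1").rstrip("/"),
        model=env.get("OPENAI_MODEL", "gemma-4-26b-q5-chat-vulkan"),
        api_key=env.get("OPENAI_API_KEY", "local-dev"),
    )


settings = load_settings()
print(settings.base_url, settings.model)
```

Passing a plain dict as env makes the loader easy to exercise without touching the real environment.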

Run

cd /projects/llm-up-chatbot
uv venv .venv
uv sync
uv run python ingest.py
uv run uvicorn app:app --reload --host 127.0.0.1 --port 8000

Smoke checks:

curl -sS http://127.0.0.1:8000/health
curl -sS http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"question":"Kailan ako maglo-lock ng enlistment?"}'
curl -sS http://127.0.0.1:8000/
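The /chat check can also be scripted with the Python standard library. This is a sketch that mirrors the curl call above. The helper names are hypothetical, and because the response schema is not documented here, the reply is returned as parsed JSON without assuming any field names.

```python
import json
import urllib.request

APP_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn command


def build_chat_request(question: str, base_url: str = APP_URL) -> urllib.request.Request:
    """Build the same POST that the curl smoke check sends to /chat."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = APP_URL, timeout: float = 60.0):
    """Send the question and return the decoded JSON reply (needs the app running)."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so this line is safe offline.
print(build_chat_request("Kailan ako maglo-lock ng enlistment?").full_url)
```

With the app running, ask("Kailan ako maglo-lock ng enlistment?") returns whatever JSON the /chat endpoint produces.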

Output Layout

Benchmark artifacts live under:

output_to_user/benchmark/current/
output_to_user/benchmark/archive/
output_to_user/benchmark/smoke/

A manifest at output_to_user/benchmark/manifest.json maps the previous artifact names to the normalized layout.

Behavior Notes

The assistant uses lexical retrieval over a small curated corpus.
The model is prompted to return a JSON support decision before answering.
The app answers when the context supports the question and returns a fixed abstain response when it does not.
The chatbot is intentionally conservative and should prefer abstention over guessing.
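The answer-or-abstain gate described above might look like the following sketch. The JSON field names (supported, answer), the abstain wording, and the function names are all assumptions made for illustration. The actual prompt contract and fixed response live in the app and are not reproduced here.

```python
import json
from typing import Optional

# Hypothetical wording; the app's real fixed abstain response may differ.
ABSTAIN_MESSAGE = "I cannot answer that from the registration notes."


def parse_support_decision(raw: str) -> Optional[dict]:
    """Extract the outermost JSON object from the model output, or None if invalid."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        decision = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None
    # Require an explicit boolean so a missing or malformed flag cannot pass.
    if not isinstance(decision, dict) or not isinstance(decision.get("supported"), bool):
        return None
    return decision


def respond(raw: str) -> str:
    """Answer only on a well-formed, supported decision; otherwise abstain."""
    decision = parse_support_decision(raw)
    if decision is None or not decision["supported"]:
        return ABSTAIN_MESSAGE
    answer = decision.get("answer")
    if isinstance(answer, str) and answer.strip():
        return answer
    return ABSTAIN_MESSAGE


print(respond('{"supported": false}'))
```

Treating every malformed or ambiguous model output as an abstention keeps the gate conservative, in line with the preference for abstaining over guessing.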