llm-up-chatbot
UP Diliman graduate student AI demo workspace for a retrieval-grounded chatbot that answers from curated registration notes, cites the supporting sections, and abstains when the notes do not support an answer.
Current State
This workspace is frozen around a simple, testable architecture:
- curated markdown notes in docs/
- a generated chunk index in data/chunks.json
- lexical retrieval over those chunks
- a local OpenAI-compatible chat model
- a FastAPI browser UI at / with an institutional visual style and direct links to /readme and /registration
- a JSON API at /chat
- CSV logging of interactions in logs/questions.csv
- benchmark reports under output_to_user/benchmark/
The app does not use embeddings, a vector database, live web search, or university system integrations.
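To make "lexical retrieval over those chunks" concrete, here is a minimal sketch of term-overlap scoring. It is not the workspace's actual retriever: the chunk fields (`id`, `text`), the sample records, the function names, and the scoring rule are all assumptions for illustration, and the real schema of data/chunks.json may differ.

```python
import re
from collections import Counter

# Hypothetical chunk records with placeholder text; the real
# data/chunks.json schema and contents may differ.
CHUNKS = [
    {"id": "upd_registration#enlistment", "text": "sample text about enlistment and registration dates"},
    {"id": "form5#printing", "text": "sample text about form 5 printing"},
    {"id": "prerog#requests", "text": "sample text about prerog requests"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query_tokens, chunk_text):
    """Sum the chunk's occurrence counts for each distinct query token."""
    counts = Counter(tokenize(chunk_text))
    return sum(counts[token] for token in set(query_tokens))


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks with a nonzero overlap score, best first."""
    query_tokens = tokenize(question)
    scored = [(score(query_tokens, chunk["text"]), chunk) for chunk in chunks]
    scored = [(s, chunk) for s, chunk in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

With this rule, a question that shares no tokens with any chunk returns an empty list, which is one natural point at which an app like this could abstain.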
Canonical Docs
- registration.md - source reference notes for the assistant and benchmark
- docs/upd_registration.md - retrieval-optimized main knowledge doc
- docs/prerog.md - prerog logistics supplement
- docs/form5.md - Form 5 logistics supplement
- docs/end-to-end-architecture.md - runtime flow and architecture
- docs/local-llm-setup.md - local model configuration
- docs/benchmark-framework.md - benchmark contract and scoring rules
- docs/benchmark-research-agenda.md - later research questions and label plan
- design.md - frozen product direction
- mvp-plan.md - frozen implementation snapshot
Local LLM Setup
Verified backend:
- Server: llama-swap
- Base URL: http://127.0.0.1:8080/v1
- Default model alias: gemma-4-26b-q5-chat-vulkan
The default can be overridden with OPENAI_MODEL, but the workspace is tuned around the default chat model listed above.
Smoke Test
curl -sS http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma-4-26b-q5-chat-vulkan",
    "messages": [
      { "role": "user", "content": "Say hello in one short sentence." }
    ],
    "temperature": 0.2,
    "max_tokens": 32
  }'
Expected result: a short assistant reply such as "Hello!".
Environment Variables
export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_MODEL=gemma-4-26b-q5-chat-vulkan
export OPENAI_API_KEY=local-dev
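On the Python side, these variables could be read with fallbacks along the following lines. This is a sketch, not the app's actual configuration code: `load_settings` and `chat_completions_url` are hypothetical names, and falling back to the values above when a variable is unset is an assumption.

```python
import os

# Fallback values mirror the exports shown above (assumed defaults).
DEFAULTS = {
    "OPENAI_BASE_URL": "http://127.0.0.1:8080/v1",
    "OPENAI_MODEL": "gemma-4-26b-q5-chat-vulkan",
    "OPENAI_API_KEY": "local-dev",
}


def load_settings(env=None):
    """Return settings from the environment, using DEFAULTS for unset or empty values."""
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


def chat_completions_url(settings):
    """Join the base URL with the chat completions path used in the smoke test."""
    return settings["OPENAI_BASE_URL"].rstrip("/") + "/chat/completions"
```

Passing a plain dict as `env` keeps the lookup easy to exercise without touching the real process environment.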
Run
cd /projects/llm-up-chatbot
uv venv .venv
uv sync
uv run python ingest.py
uv run uvicorn app:app --reload --host 127.0.0.1 --port 8000
Smoke checks:
curl -sS http://127.0.0.1:8000/health
curl -sS http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"question":"Kailan ako maglo-lock ng enlistment?"}'
curl -sS http://127.0.0.1:8000/
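The /chat check can also be scripted from Python using only the standard library. The sketch below mirrors the curl request body; `build_chat_request` and `ask` are hypothetical helper names, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request


def build_chat_request(question, base_url="http://127.0.0.1:8000"):
    """Build a POST request for /chat with the same JSON body as the curl check."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=60):
    """Send the question to a running app and return the decoded JSON response."""
    request = build_chat_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("Kailan ako maglo-lock ng enlistment?")` requires the uvicorn server from the Run step to be listening on port 8000.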
Output Layout
Benchmark artifacts live under:
- output_to_user/benchmark/current/
- output_to_user/benchmark/archive/
- output_to_user/benchmark/smoke/
A manifest in output_to_user/benchmark/manifest.json maps the previous artifact names to the normalized layout.
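As an illustration of how such a manifest might be consumed, the sketch below resolves a previous artifact name to its normalized location. The manifest schema is not documented here, so this assumes a flat JSON object of the form {previous_name: new_relative_path}; `resolve_artifact` and `load_manifest` are hypothetical names.

```python
import json
from pathlib import Path

BENCHMARK_ROOT = Path("output_to_user/benchmark")


def load_manifest(path=BENCHMARK_ROOT / "manifest.json"):
    """Read the manifest; assumes a flat {previous_name: new_relative_path} object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def resolve_artifact(previous_name, manifest):
    """Map a previous artifact name to its normalized path, or None if unlisted."""
    new_relative = manifest.get(previous_name)
    return None if new_relative is None else BENCHMARK_ROOT / new_relative
```

If the real manifest nests its entries or stores absolute paths, only `resolve_artifact` would need to change.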
Behavior Notes
- The assistant uses lexical retrieval over a small curated corpus.
- The model is prompted to return a JSON support decision before answering.
- The app answers when the context supports the question and returns a fixed abstain response when it does not.
- The chatbot is intentionally conservative and should prefer abstention over guessing.
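The answer-or-abstain gate described above can be sketched as follows. The decision field name (`supported`), the abstain text, and the function names are assumptions for illustration; the app's actual prompt contract and fixed abstain response are not shown in this README.

```python
import json

# Placeholder wording; the app's real fixed abstain response may differ.
ABSTAIN_RESPONSE = "I can't answer that from the registration notes."


def parse_support_decision(raw):
    """Return True only for a JSON object whose 'supported' field is exactly true."""
    try:
        decision = json.loads(raw)
    except (TypeError, ValueError):
        # Malformed or missing output is treated as unsupported.
        return False
    return isinstance(decision, dict) and decision.get("supported") is True


def respond(raw_decision, draft_answer):
    """Return the draft answer when supported, otherwise the fixed abstain text."""
    if parse_support_decision(raw_decision):
        return draft_answer
    return ABSTAIN_RESPONSE
```

Treating malformed or ambiguous decisions as unsupported keeps the gate conservative, in line with preferring abstention over guessing.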