# llamacpphtmld

A web interface and API for the LLaMA large language model, based on the llama.cpp runtime.
## Features
- Live streaming responses
- Continuation-based UI with support for interrupt, modify, and resume
- Configurable maximum number of simultaneous users
- Works with any LLaMA model, including Vicuna
- Bundled copy of llama.cpp; no separate compilation required
## Usage
All configuration should be supplied as environment variables:
```sh
LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
LCH_NET_BIND=:8090 \
LCH_SIMULTANEOUS_REQUESTS=1 \
./llamacpphtmld
```
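The variables are read at startup. As a rough sketch of the same pattern in Go (not the daemon's actual startup code, which lives in main.go; the defaults and the semaphore are illustrative assumptions), the configuration might be handled like this:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// getenv returns the value of key, or fallback when the variable is unset.
// Hypothetical helper for this sketch, not part of llamacpphtmld.
func getenv(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func main() {
	modelPath := os.Getenv("LCH_MODEL_PATH")
	if modelPath == "" {
		log.Fatal("LCH_MODEL_PATH must point to a ggml model file")
	}
	bind := getenv("LCH_NET_BIND", ":8090")                             // assumed default
	limit, err := strconv.Atoi(getenv("LCH_SIMULTANEOUS_REQUESTS", "1")) // assumed default
	if err != nil || limit < 1 {
		log.Fatal("LCH_SIMULTANEOUS_REQUESTS must be a positive integer")
	}

	// A buffered channel works as a semaphore to cap concurrent generations:
	// a handler sends into sem before generating and receives when done.
	sem := make(chan struct{}, limit)
	_ = sem

	fmt.Printf("model=%s bind=%s simultaneous=%d\n", modelPath, bind, limit)
}
```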
## API usage
```sh
curl -v -X POST -d '{"ConversationID": "", "APIKey": "", "Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'
```
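For programmatic use, a small Go client can send the same request and stream the reply. This is a sketch under two assumptions not confirmed above (check api.go): that an empty ConversationID starts a new conversation, and that the response body streams generated text incrementally.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"
)

// generateRequest mirrors the JSON fields from the curl example above.
type generateRequest struct {
	ConversationID string
	APIKey         string
	Content        string
}

func main() {
	body, err := json.Marshal(generateRequest{Content: "The quick brown fox"})
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Post("http://localhost:8090/api/v1/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Assumption: the server writes generated text to the response body as it
	// is produced, so copying to stdout shows output as it arrives.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```

Copying the body straight to stdout with io.Copy surfaces the live-streaming behaviour listed under Features without buffering the whole response.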
## License
MIT