llamacpphtmld

A web interface and API for the LLaMA large language model, built on the llama.cpp runtime.

Features

All configuration is supplied through environment variables, for example:

LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
	LCH_NET_BIND=:8090 \
	LCH_SIMULTANEOUS_REQUESTS=1 \
	./llamacpphtmld

The generate endpoint continues an existing conversation, streaming each new token back live until the LLM stops naturally (or until the MaxTokens cap is reached, if one was supplied).

Example request:

curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'

You can optionally supply ConversationID and APIKey string parameters; however, the server does not currently use them.
You can also optionally supply a MaxTokens integer parameter to cap the number of tokens the LLM generates.

License: MIT