
llamacpphtmld

A web interface and API for the LLaMA large language AI model, based on the llama.cpp runtime.

Features

  • Live streaming responses
  • Continuation-based UI
  • Supports interrupt, modify, and resume
  • Configurable limit on simultaneous requests (LCH_SIMULTANEOUS_REQUESTS)
  • Works with any LLaMA model including Vicuna
  • Bundled copy of llama.cpp, no separate compilation required

Usage

All configuration should be supplied as environment variables:

LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
	LCH_NET_BIND=:8090 \
	LCH_SIMULTANEOUS_REQUESTS=1 \
	./llamacpphtmld
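To launch the server from another Go program instead of a shell, the same three variables can be passed through the child process environment. This is a minimal sketch: the helper name newServerCmd is hypothetical, and the binary path, model path, bind address and request limit are simply the values from the shell example above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// newServerCmd (hypothetical helper) builds a command that runs the
// llamacpphtmld binary with its configuration supplied via LCH_* variables.
func newServerCmd(binary, modelPath, bind string, simultaneous int) *exec.Cmd {
	cmd := exec.Command(binary)
	// Inherit the current environment, then append the server settings.
	// For duplicate keys, os/exec uses the last value, so these take precedence.
	cmd.Env = append(os.Environ(),
		"LCH_MODEL_PATH="+modelPath,
		"LCH_NET_BIND="+bind,
		"LCH_SIMULTANEOUS_REQUESTS="+strconv.Itoa(simultaneous),
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	const binary = "./llamacpphtmld"
	// Bail out early if the server binary is not in the current directory.
	if _, err := os.Stat(binary); err != nil {
		fmt.Fprintln(os.Stderr, "server binary not found:", err)
		return
	}
	cmd := newServerCmd(binary, "/srv/llama/ggml-vicuna-13b-4bit-rev1.bin", ":8090", 1)
	// Run blocks until the server process exits.
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "llamacpphtmld exited:", err)
	}
}
```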

API usage

The generate endpoint streams new tokens live, continuing the supplied text of an existing conversation until the LLM stops naturally (or until the optional MaxTokens cap below is reached).

  • Usage: curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'
  • You can optionally supply ConversationID and APIKey string parameters, but the server does not currently use them.
  • You can optionally supply a MaxTokens integer parameter to cap the number of tokens the LLM generates.
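The same request can be made from Go. This is a sketch under stated assumptions, not a client shipped with llamacpphtmld: the names GenerateRequest, encodeRequest and streamGenerate are hypothetical; it assumes tokens arrive as plain text in the response body and that a successful response has status 200. Cancelling the context closes the connection on the client side; whether the server then stops generating is not specified here.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// GenerateRequest (hypothetical type) mirrors the documented JSON parameters.
// ConversationID and APIKey are accepted but currently unused by the server.
type GenerateRequest struct {
	Content        string `json:"Content"`
	ConversationID string `json:"ConversationID,omitempty"`
	APIKey         string `json:"APIKey,omitempty"`
	MaxTokens      int    `json:"MaxTokens,omitempty"` // 0 is omitted entirely
}

// encodeRequest serializes the body; empty optional fields are left out.
func encodeRequest(req GenerateRequest) ([]byte, error) {
	return json.Marshal(req)
}

// streamGenerate POSTs req to the generate endpoint under baseURL and copies
// the response body to w chunk by chunk as it arrives.
func streamGenerate(ctx context.Context, baseURL string, req GenerateRequest, w io.Writer) error {
	body, err := encodeRequest(req)
	if err != nil {
		return err
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost,
		baseURL+"/api/v1/generate", bytes.NewReader(body))
	if err != nil {
		return err
	}
	httpReq.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Assumption: any status other than 200 indicates a failed request.
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	// io.Copy writes each chunk as soon as it is read, so output appears live.
	_, err = io.Copy(w, resp.Body)
	return err
}

func main() {
	// Give up (interrupt client-side) after two minutes.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	req := GenerateRequest{Content: "The quick brown fox", MaxTokens: 64}
	if err := streamGenerate(ctx, "http://localhost:8090", req, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "generate:", err)
	}
}
```

Because MaxTokens uses omitempty, leaving it at zero sends only the fields you set, matching the curl example's minimal body.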

License

MIT

Changelog

2023-04-08 v1.0.0

  • Initial release