# llamacpphtmld
A web interface and API for the LLaMA large language AI model, based on the llama.cpp runtime.
## Features
- Live streaming responses
- Continuation-based UI
- Supports interrupt, modify, and resume
- Configure the maximum number of simultaneous users
- Works with any LLaMA model including Vicuna
- Bundled copy of llama.cpp, no separate compilation required
## Usage
All configuration should be supplied as environment variables:
```shell
LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
LCH_NET_BIND=:8090 \
LCH_SIMULTANEOUS_REQUESTS=1 \
./llamacpphtmld
```
### API usage
The `generate` endpoint will live-stream new tokens into an existing conversation until the LLM stops naturally.
- Usage: `curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'`
- You can optionally supply `ConversationID` and `APIKey` string parameters; however, these are not currently used by the server.
- You can optionally supply a `MaxTokens` integer parameter to cap the number of tokens generated by the LLM. See the example request after this list.
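
The sketch below shows one way to combine these optional fields in a single request. Placing `MaxTokens` and `ConversationID` alongside `Content` as top-level JSON fields is an assumption inferred from the basic example above, and `-N` simply disables curl's output buffering so streamed tokens appear as they arrive.

```shell
# Assumed request shape: MaxTokens and ConversationID as top-level JSON
# fields next to Content, mirroring the basic example above.
curl -N -X POST \
  -d '{"Content": "The quick brown fox", "MaxTokens": 64, "ConversationID": "demo-conversation"}' \
  'http://localhost:8090/api/v1/generate'
```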
## License
MIT