code.ivysaur.me/llamacpphtmld

Latest commit: mappu 252c809f92 webui: prevent zooming into the textarea (2023-04-09 18:46:11 +12:00)

File | Last commit | Date
doc | doc/README: initial commit | 2023-04-08 15:30:37 +12:00
.gitignore | gitignore | 2023-04-08 15:31:24 +12:00
api.go | api: raise default context size from 512->1024 | 2023-04-09 11:12:16 +12:00
cflags_linux_amd64.go | initial commit | 2023-04-08 15:30:15 +12:00
cflags_linux_arm64.go | cflags/arm64: fix mcpu flag syntax | 2023-04-08 16:07:58 +12:00
ggml.c | llama.cpp: commit upstream files (as of rev 62cfc54f77e5190) | 2023-04-08 15:30:02 +12:00
ggml.h | llama.cpp: commit upstream files (as of rev 62cfc54f77e5190) | 2023-04-08 15:30:02 +12:00
go.mod | go mod tidy | 2023-04-09 11:14:41 +12:00
go.sum | initial commit | 2023-04-08 15:30:15 +12:00
LICENSE | doc/license: add MIT license | 2023-04-08 15:30:32 +12:00
llama.cpp | llama.cpp: commit upstream files (as of rev 62cfc54f77e5190) | 2023-04-08 15:30:02 +12:00
llama.h | llama.cpp: commit upstream files (as of rev 62cfc54f77e5190) | 2023-04-08 15:30:02 +12:00
main.go | initial commit | 2023-04-08 15:30:15 +12:00
README.md | doc/README: changelog for v1.1.0 | 2023-04-09 11:14:47 +12:00
webui.go | webui: prevent zooming into the textarea | 2023-04-09 18:46:11 +12:00

README.md

llamacpphtmld

A web interface and API for the LLaMA large language model, based on the llama.cpp runtime.

Features

Live streaming responses
Continuation-based UI
Supports interrupt, modify, and resume
Configure the maximum number of simultaneous users
Works with any LLaMA model including Vicuna
Bundled copy of llama.cpp, no separate compilation required

Usage

All configuration should be supplied as environment variables:

LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
	LCH_NET_BIND=:8090 \
	LCH_SIMULTANEOUS_REQUESTS=1 \
	./llamacpphtmld
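
As a rough illustration of this configuration style, the following Go sketch reads the three variables shown above and validates them. The `config` struct, the `loadConfig` helper, and the choice to treat every variable as required are assumptions made for this example; the server's actual parsing and defaults may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// config is a hypothetical container for the documented settings;
// the real server may store them differently.
type config struct {
	ModelPath            string
	NetBind              string
	SimultaneousRequests int
}

// loadConfig reads the three LCH_* variables through the supplied lookup
// function (os.Getenv in normal use). Treating all three as required is an
// assumption of this sketch, not documented server behaviour.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		ModelPath: getenv("LCH_MODEL_PATH"),
		NetBind:   getenv("LCH_NET_BIND"),
	}
	if cfg.ModelPath == "" {
		return cfg, errors.New("LCH_MODEL_PATH is required")
	}
	if cfg.NetBind == "" {
		return cfg, errors.New("LCH_NET_BIND is required")
	}
	raw := getenv("LCH_SIMULTANEOUS_REQUESTS")
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return cfg, fmt.Errorf("LCH_SIMULTANEOUS_REQUESTS must be a positive integer, got %q", raw)
	}
	cfg.SimultaneousRequests = n
	return cfg, nil
}

func main() {
	// Demo only: report what would be loaded from the current environment.
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the parsing logic testable without touching the process environment.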

Use the GOMAXPROCS environment variable to control how many threads the llama.cpp engine uses.

API usage

The generate endpoint will live stream new tokens into an existing conversation until the LLM stops naturally.

Usage: curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'
You can optionally supply ConversationID and APIKey string parameters, but the server does not currently use them.
You can optionally supply a MaxTokens integer parameter to cap the number of tokens generated by the LLM.

License

MIT

Changelog

2023-04-09 v1.1.0

New web interface style that is more mobile-friendly and shows API status messages
Add default example prompt
Use a longer n_ctx by default

2023-04-08 v1.0.0

Initial release