code.ivysaur.me/llamacpphtmld

mappu 0c96f2bf6b api: support a MaxTokens parameter

2023-04-08 16:03:59 +12:00

doc

doc/README: initial commit

2023-04-08 15:30:37 +12:00

.gitignore

gitignore

2023-04-08 15:31:24 +12:00

api.go

api: support a MaxTokens parameter

2023-04-08 16:03:59 +12:00

cflags_linux_amd64.go

initial commit

2023-04-08 15:30:15 +12:00

cflags_linux_arm64.go

initial commit

2023-04-08 15:30:15 +12:00

ggml.c

llama.cpp: commit upstream files (as of rev 62cfc54f77e5190)

2023-04-08 15:30:02 +12:00

ggml.h

llama.cpp: commit upstream files (as of rev 62cfc54f77e5190)

2023-04-08 15:30:02 +12:00

go.mod

initial commit

2023-04-08 15:30:15 +12:00

go.sum

initial commit

2023-04-08 15:30:15 +12:00

LICENSE

doc/license: add MIT license

2023-04-08 15:30:32 +12:00

llama.cpp

llama.cpp: commit upstream files (as of rev 62cfc54f77e5190)

2023-04-08 15:30:02 +12:00

llama.h

llama.cpp: commit upstream files (as of rev 62cfc54f77e5190)

2023-04-08 15:30:02 +12:00

main.go

initial commit

2023-04-08 15:30:15 +12:00

README.md

doc/README: initial commit

2023-04-08 15:30:37 +12:00

webui.go

webui: synchronize context size value for clientside warning

2023-04-08 15:48:16 +12:00

README.md

llamacpphtmld

A web interface and API for the LLaMA large language AI model, based on the llama.cpp runtime.

Features

Live streaming responses
Continuation-based UI, supporting interrupt, modify, and resume
Configure the maximum number of simultaneous users
Works with any LLaMA model, including Vicuna
Bundled copy of llama.cpp, no separate compilation required

Usage

All configuration should be supplied as environment variables:

LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
	LCH_NET_BIND=:8090 \
	LCH_SIMULTANEOUS_REQUESTS=1 \
	./llamacpphtmld
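
For readers wiring up something similar, here is a minimal sketch of reading these three variables in Go. The variable names come from the example above. The `config` struct, the `loadConfig` function, and the validation rules are assumptions made for illustration and may differ from what `main.go` actually does.

```go
package main

import (
	"fmt"
	"strconv"
)

// config holds the three settings shown in the usage example.
// Hypothetical struct; the project's real layout is not shown here.
type config struct {
	ModelPath            string
	NetBind              string
	SimultaneousRequests int
}

// loadConfig reads settings through getenv (pass os.Getenv in a real
// program). The validation below is an assumption, not documented behaviour.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		ModelPath: getenv("LCH_MODEL_PATH"),
		NetBind:   getenv("LCH_NET_BIND"),
	}
	if cfg.ModelPath == "" {
		return cfg, fmt.Errorf("LCH_MODEL_PATH is not set")
	}
	n, err := strconv.Atoi(getenv("LCH_SIMULTANEOUS_REQUESTS"))
	if err != nil || n < 1 {
		return cfg, fmt.Errorf("LCH_SIMULTANEOUS_REQUESTS must be a positive integer")
	}
	cfg.SimultaneousRequests = n
	return cfg, nil
}

func main() {
	// A fixed environment mirroring the usage example, so the sketch
	// runs the same way everywhere.
	env := map[string]string{
		"LCH_MODEL_PATH":            "/srv/llama/ggml-vicuna-13b-4bit-rev1.bin",
		"LCH_NET_BIND":              ":8090",
		"LCH_SIMULTANEOUS_REQUESTS": "1",
	}
	cfg, err := loadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Taking `getenv` as a parameter keeps the parsing testable without touching the real process environment.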

API usage

curl -v -d '{"ConversationID": "", "APIKey": "", "Content": "The quick brown fox"}' -X POST 'http://localhost:8090/api/v1/generate'

License

MIT