diff --git a/README.md b/README.md
index cd0b502..284c86e 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,8 @@ A web interface and API for the LLaMA large language AI model, based on the [lla
 ## Features
 
 - Live streaming responses
-- Continuation-based UI, supporting interrupt, modify, and resume
+- Continuation-based UI
+- Supports interrupt, modify, and resume
 - Configure the maximum number of simultaneous users
 - Works with any LLaMA model including [Vicuna](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit)
 - Bundled copy of llama.cpp, no separate compilation required
@@ -23,9 +24,11 @@ LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
 
 ## API usage
 
-```
-curl -v -d '{"ConversationID": "", "APIKey": "", "Content": "The quick brown fox"}' -X 'http://localhost:8090/api/v1/generate'
-```
+The `generate` endpoint will live stream new tokens into an existing conversation until the LLM stops naturally.
+
+- Usage: `curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'`
+- You can optionally supply `ConversationID` and `APIKey` string parameters. However, these are not currently used by the server.
+- You can optionally supply a `MaxTokens` integer parameter, to cap the number of generated tokens from the LLM.
 
 ## License
 
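
For reviewers who want to exercise the documented `generate` endpoint without curl, here is a minimal Python sketch of the same request. It assumes the server is listening on `localhost:8090` as in the README example and that the response body is plain UTF-8 text streamed incrementally; the diff does not confirm either the framing or the encoding. The helper names `build_generate_request` and `stream_generate` are hypothetical and not part of the project.

```python
import codecs
import json
import urllib.request

# Assumption: same address as the curl example in the README.
GENERATE_URL = "http://localhost:8090/api/v1/generate"


def build_generate_request(content, max_tokens=None, url=GENERATE_URL):
    """Build a POST request for the generate endpoint (hypothetical helper)."""
    payload = {"Content": content}
    # MaxTokens is optional; only send it when the caller asks for a cap.
    if max_tokens is not None:
        payload["MaxTokens"] = int(max_tokens)
    # No Content-Type header is set explicitly; the README does not say
    # whether the server requires one.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def stream_generate(content, max_tokens=None, chunk_size=256):
    """Yield pieces of generated text as they arrive (hypothetical helper).

    Assumes the body is UTF-8 text; an incremental decoder keeps multi-byte
    characters intact when they are split across reads.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    req = build_generate_request(content, max_tokens)
    with urllib.request.urlopen(req) as resp:
        while True:
            # read1 returns whatever is available instead of waiting
            # for chunk_size bytes, which suits a live stream.
            chunk = resp.read1(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if text:
                yield text
        # Flush any bytes still buffered in the decoder.
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
```

With a server running, `for piece in stream_generate("The quick brown fox", max_tokens=32): print(piece, end="", flush=True)` would print text as it is produced. `ConversationID` and `APIKey` are left out because the README states the server does not currently use them.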