From bb60bb989f35ee1c8c6bf43ab8c4f85794e40787 Mon Sep 17 00:00:00 2001
From: mappu
Date: Sat, 8 Apr 2023 16:04:55 +1200
Subject: [PATCH] doc/README: update features + api docs

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index cd0b502..284c86e 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,8 @@ A web interface and API for the LLaMA large language AI model, based on the [lla
 ## Features
 
 - Live streaming responses
-- Continuation-based UI, supporting interrupt, modify, and resume
+- Continuation-based UI
+- Supports interrupt, modify, and resume
 - Configure the maximum number of simultaneous users
 - Works with any LLaMA model including [Vicuna](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit)
 - Bundled copy of llama.cpp, no separate compilation required
@@ -23,9 +24,11 @@ LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
 
 ## API usage
 
-```
-curl -v -d '{"ConversationID": "", "APIKey": "", "Content": "The quick brown fox"}' -X 'http://localhost:8090/api/v1/generate'
-```
+The `generate` endpoint will live stream new tokens into an existing conversation until the LLM stops naturally.
+
+- Usage: `curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'`
+- You can optionally supply `ConversationID` and `APIKey` string parameters. However, these are not currently used by the server.
+- You can optionally supply a `MaxTokens` integer parameter, to cap the number of generated tokens from the LLM.
 
 ## License
 
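
For illustration, here is a sketch of a single request combining the parameters documented above. The `MaxTokens` value of 64 is only an example, and curl's `-N` (no-buffer) flag is assumed here so the live-streamed tokens are visible as they arrive:

```
# Illustrative request; ConversationID and APIKey are accepted but not currently used by the server,
# and MaxTokens=64 is an arbitrary example cap on the number of generated tokens.
curl -N -v -X POST \
  -d '{"ConversationID": "", "APIKey": "", "Content": "The quick brown fox", "MaxTokens": 64}' \
  'http://localhost:8090/api/v1/generate'
```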