Llama Monitor
Stopped
⚙
Server
Chat
Logs
Model Preset
New
Edit
Copy
Delete
Reset
Port
Start
Stop
Inference
Prompt Speed
—
Generation Speed
—
Context (KV Cache)
—
Slots
—
Server Status
—
GPUs
GPU | Temp | Load | VRAM | Power | SCLK | MCLK
Clear
Send
Configuration
×
Server Paths
llama-server binary
Browse
Working Directory
Browse
GPU Environment
Architecture
GPU Devices
ROCm Path
Cancel
Save
Browse
×
↑ Up
Cancel
Select This Folder
New Preset
×
Model & Memory
Name
Name is required
Model Path
Browse
Model path is required
GPU Layers (-ngl)
no-mmap
mlock
Context & KV Cache
Context Size (-c)
KV Key Type (-ctk)
KV Value Type (-ctv)
Flash Attn (-fa)
(default)
auto
on
off
Batching & Slots
Batch Size (-b)
Micro-batch (-ub)
Parallel Slots (-np)
GPU Distribution
Tensor Split (-ts)
Split Mode
(default)
layer
row
Main GPU (-mg)
Threading
Threads (-t)
Threads Batch (-tb)
Rope Scaling
Leave empty to have YaRN applied automatically when the context size exceeds 262144. Set a value explicitly to override.
Rope Scaling
(auto)
yarn
linear
none
Freq Base
Freq Scale
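The rule in the Rope Scaling hint can be read as a small decision function. The following is only a sketch of that reading, not the monitor's actual code: the function name resolve_rope_scaling and its parameters are hypothetical, and it assumes an empty field means "no explicit choice".

```python
from typing import Optional

# Threshold taken from the hint text in the preset form.
AUTO_YARN_CONTEXT_THRESHOLD = 262144


def resolve_rope_scaling(context_size: int, rope_scaling: Optional[str]) -> Optional[str]:
    """Return the rope scaling mode to use, or None to keep the server default.

    Hypothetical helper: an explicit value (yarn, linear, none) always wins;
    otherwise YaRN is chosen only when the context exceeds the threshold.
    """
    if rope_scaling:  # non-empty explicit selection overrides the automatic rule
        return rope_scaling
    if context_size > AUTO_YARN_CONTEXT_THRESHOLD:
        return "yarn"
    return None
```

Under this reading, a context of exactly 262144 with the field left empty keeps the default, because the hint says "greater than" rather than "at least".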
Speculative Decoding
ngram-spec
N-gram Size
Draft Min
Draft Max
Draft Model (-md)
Advanced
Seed (-s)
System Prompt File
Extra Args
Cancel
Save
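Most preset fields carry the llama-server flag they correspond to in parentheses. One plausible way to turn a saved preset into a command line is sketched below. The preset key names and the build_args helper are hypothetical, and only flags that appear in the form labels are mapped. Fields whose flag the form does not show (model path, port, split mode, rope settings) are deliberately left out.

```python
from typing import Any, Dict, List

# Hypothetical preset keys mapped to the flags printed next to each field label.
FLAG_FOR_FIELD = {
    "gpu_layers": "-ngl",
    "context_size": "-c",
    "kv_key_type": "-ctk",
    "kv_value_type": "-ctv",
    "flash_attn": "-fa",
    "batch_size": "-b",
    "micro_batch": "-ub",
    "parallel_slots": "-np",
    "tensor_split": "-ts",
    "main_gpu": "-mg",
    "threads": "-t",
    "threads_batch": "-tb",
    "draft_model": "-md",
    "seed": "-s",
}


def build_args(preset: Dict[str, Any]) -> List[str]:
    """Build a flag list from a preset, skipping fields left empty in the form."""
    args: List[str] = []
    for field, flag in FLAG_FOR_FIELD.items():
        value = preset.get(field)
        if value is None or value == "":
            continue  # empty field: let llama-server use its own default
        args += [flag, str(value)]
    return args


example = build_args({"gpu_layers": 99, "context_size": 8192, "flash_attn": "on", "seed": ""})
print(example)
```

Skipping empty fields instead of passing a placeholder matches the "(default)" and "(auto)" options in the form, where no value means the server decides.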