Llama Monitor

Stopped

Model Preset Port

Inference

Prompt Speed

—

Generation Speed

—

Context (KV Cache)

—

Slots

—

Server Status

—

GPUs

GPU	Temp	Load	VRAM	Power	SCLK	MCLK

Configuration

Server Paths

llama-server binary

Working Directory

GPU Environment

Architecture GPU Devices

ROCm Path

New Preset

Model & Memory

Name (required)

Model Path (required)

GPU Layers (-ngl)

no-mmap

mlock

Context & KV Cache

Context Size (-c)

KV Key Type (-ctk)

KV Value Type (-ctv)

Flash Attn (-fa)

Batching & Slots

Batch Size (-b)

Micro-batch (-ub)

Parallel Slots (-np)

GPU Distribution

Tensor Split (-ts)

Split Mode

Main GPU (-mg)

Threading

Threads (-t)

Threads Batch (-tb)

Rope Scaling

Leave empty to have YaRN applied automatically when the context size exceeds 262144. Set a value explicitly to override.

Rope Scaling

Freq Base

Freq Scale
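
The auto-YaRN rule described in the hint above can be sketched as a small function. This is only an illustration of the stated rule, not the monitor's actual code: the function name `resolve_rope_scaling` and its parameters are hypothetical, and the only values taken from the form are the 262144 threshold and the "empty means auto" behavior.

```python
from typing import Optional

# Threshold quoted in the Rope Scaling hint text.
AUTO_YARN_THRESHOLD = 262144


def resolve_rope_scaling(context_size: int, rope_scaling: Optional[str] = None) -> Optional[str]:
    """Return the rope scaling mode to use, or None to leave it unset.

    Hypothetical helper: an explicit value always wins; an empty field
    falls back to "yarn" only when the context exceeds the threshold.
    """
    if rope_scaling:  # explicit setting overrides the automatic choice
        return rope_scaling
    if context_size > AUTO_YARN_THRESHOLD:
        return "yarn"
    return None  # field left empty and context is within the threshold
```

Note that the comparison is strict, matching the "context > 262144" wording: a context of exactly 262144 leaves rope scaling unset.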

Speculative Decoding

ngram-spec

N-gram Size

Draft Min

Draft Max

Draft Model (-md)

Advanced

Seed (-s)

System Prompt File

Extra Args
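
As a rough illustration of how a preset like the one above might be turned into a llama-server command line, here is a minimal sketch. The preset field names (`model_path`, `gpu_layers`, and so on), the `build_args` function, and the use of `-m` for the model path are assumptions made for this example; only the short flags in the mapping are taken from the form labels.

```python
import shlex

# Hypothetical preset field names mapped to the short flags shown in the form labels.
FLAG_FOR_FIELD = {
    "gpu_layers": "-ngl",
    "context_size": "-c",
    "batch_size": "-b",
    "micro_batch": "-ub",
    "parallel_slots": "-np",
    "threads": "-t",
    "threads_batch": "-tb",
    "seed": "-s",
}


def build_args(binary, preset):
    """Build an argument list for the server binary from a preset dict."""
    # Assumption: the model path is passed with -m; the form marks it as required.
    args = [binary, "-m", preset["model_path"]]
    for field, flag in FLAG_FOR_FIELD.items():
        value = preset.get(field)
        if value is not None and value != "":  # skip fields left empty
            args += [flag, str(value)]
    # Extra Args is treated as a free-form string and split shell-style.
    args += shlex.split(preset.get("extra_args", ""))
    return args
```

Under these assumptions, a preset with only a model path, 99 GPU layers, and a context size of 8192 would yield a list starting with the binary, followed by `-m`, `-ngl`, and `-c` with their values, and then any extra arguments.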