Training is driven through the optimization interface in ggml-opt.h. It handles dataset batching, gradient accumulation, forward and backward passes, and optimizer steps, so you can focus on defining your model graph.
Workflow

1. Select a loss type. Choose the loss function that matches your problem; the built-in options cover most supervised learning tasks. Use MEAN or SUM when your graph already computes a meaningful scalar loss and you only need the optimizer to minimize it.
2. Create a dataset. Allocate a dataset and populate its data and labels tensors with your training samples. See Datasets for full details.
3. Build a GGML graph. Define your model as a GGML computation graph with no_alloc = true. Use two separate contexts:
   - Parameters context: holds the model weights and the inputs tensor. Allocate this statically in your code; its data remains valid throughout training.
   - Compute context: holds all intermediate tensors. The optimizer reallocates this context automatically; do not read its tensor data directly.
   The second dimension of inputs and outputs is interpreted as the batch size (number of datapoints). Make sure it matches ndata_batch in your dataset batching.
4. Fit the model. Call ggml_opt_fit to run the full training loop. It handles shuffling, batching, gradient accumulation, validation splits, and epoch reporting. For more control over the loop (custom callbacks, per-batch metrics, or mid-epoch checkpointing), use ggml_opt_epoch instead.

ggml_opt_fit parameters
| Parameter | Description |
|---|---|
| backend_sched | Backend scheduler that controls which device(s) execute the compute graphs. |
| ctx_compute | GGML context holding the temporary (non-parameter) tensors of your graph. Must have been created with no_alloc = true. |
| inputs | Input tensor. Shape must be [ne_datapoint, ndata_batch]. |
| outputs | Output tensor. When labels are used, shape must be [ne_label, ndata_batch]. |
| dataset | Dataset created with ggml_opt_dataset_init. |
| loss_type | Loss function to minimize. |
| optimizer | GGML_OPT_OPTIMIZER_TYPE_ADAMW or GGML_OPT_OPTIMIZER_TYPE_SGD. |
| get_opt_pars | Callback invoked before each backward pass to supply optimizer hyperparameters. The userdata pointer passed to this callback points to the current epoch number (int64_t), which enables learning rate schedules. |
| nepoch | Number of full passes over the training portion of the dataset. |
| nbatch_logical | Number of datapoints between optimizer steps. Must be a multiple of the physical batch size (the second dimension of inputs). Values larger than the physical batch trigger gradient accumulation. |
| val_split | Fraction of the dataset reserved for validation. Must be in [0.0, 1.0); pass 0.0f to skip validation. |
| silent | When true, suppresses all progress output to stderr. |
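Putting the workflow steps and the parameters above together, a training call might look like the sketch below for a linear model. This is illustrative, not drop-in code: the argument order follows the table above, but signatures such as ggml_set_param differ between ggml versions, and backend setup, buffer allocation for the static context, dataset creation, and weight initialization are omitted. Verify every call against your copy of ggml-opt.h.

```c
// Sketch: train outputs = weights @ inputs with ggml_opt_fit.
// Assumes backend_sched and dataset already exist.
struct ggml_init_params ip = {
    /*mem_size   =*/ ggml_tensor_overhead() * 16,
    /*mem_buffer =*/ NULL,
    /*no_alloc   =*/ true, // metadata only; tensor data lives in backend buffers
};
struct ggml_context * ctx_static  = ggml_init(ip); // weights + inputs
struct ggml_context * ctx_compute = ggml_init(ip); // intermediates, reallocated by the optimizer

struct ggml_tensor * weights = ggml_new_tensor_2d(ctx_static, GGML_TYPE_F32, ne_datapoint, ne_label);
struct ggml_tensor * inputs  = ggml_new_tensor_2d(ctx_static, GGML_TYPE_F32, ne_datapoint, ndata_batch);
ggml_set_param(weights); // mark as trainable (older ggml versions also take a context argument)
// ... allocate ctx_static tensors on a backend buffer and initialize the weights ...

// The second dimension of inputs/outputs is the batch size.
struct ggml_tensor * outputs = ggml_mul_mat(ctx_compute, weights, inputs); // [ne_label, ndata_batch]

ggml_opt_fit(backend_sched, ctx_compute, inputs, outputs, dataset,
             GGML_OPT_LOSS_TYPE_CROSS_ENTROPY, GGML_OPT_OPTIMIZER_TYPE_ADAMW,
             ggml_opt_get_default_optimizer_params, /*nepoch =*/ 10,
             /*nbatch_logical =*/ ndata_batch, /*val_split =*/ 0.1f, /*silent =*/ false);
```

With nbatch_logical equal to the physical batch (ndata_batch), every physical batch triggers an optimizer step; passing a larger multiple would accumulate gradients across several forward/backward passes first.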
Static vs dynamic graph allocation
The optimizer context supports two graph allocation modes:

- Static allocation: set ctx_compute, inputs, and outputs on the ggml_opt_params struct before calling ggml_opt_init. The optimizer allocates the forward, gradient, and optimizer graphs once at initialization and reuses them for every evaluation. This is the mode used by ggml_opt_fit. Prefer static allocation when the graph topology is fixed across all batches.
- Dynamic allocation: leave those fields unset at initialization; the optimizer then allocates the graphs anew for each evaluation, which suits models whose graph topology changes between batches.

Build types

The build_type field on ggml_opt_params controls which graphs the optimizer constructs:
| Build type | Use case |
|---|---|
| FORWARD | Evaluation or inference; no gradients are computed. |
| GRAD | Compute gradients without applying an optimizer step. Useful for inspecting gradients or implementing custom update rules. |
| OPT | Full training: forward pass, backward pass, and optimizer parameter update. This is the default for ggml_opt_fit. |
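As an illustration of static allocation combined with a build type, setting up an evaluation-only context might look like the sketch below. The ggml_opt_default_params signature and field names here reflect recent ggml-opt.h headers and are assumptions to verify against your version.

```c
// Sketch: evaluation-only optimizer context with static graph allocation.
struct ggml_opt_params params = ggml_opt_default_params(backend_sched, GGML_OPT_LOSS_TYPE_MEAN);
params.ctx_compute = ctx_compute; // setting these three fields selects static allocation
params.inputs      = inputs;
params.outputs     = outputs;
params.build_type  = GGML_OPT_BUILD_TYPE_FORWARD; // skip the gradient and optimizer graphs

ggml_opt_context_t opt_ctx = ggml_opt_init(params);
// ... run forward evaluations, e.g. through ggml_opt_epoch ...
ggml_opt_free(opt_ctx);
```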
Optimizers
Configure AdamW and SGD, set learning rate schedules, and manage the optimizer context.
Datasets
Initialize datasets, populate tensors, shuffle data, and write custom epoch callbacks.
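As a preview of that page, creating and filling a dataset might look roughly like this. The ggml_opt_dataset_init signature shown (with explicit ggml_type arguments) matches recent ggml versions; older headers omit the type parameters, so check ggml-opt.h.

```c
// Sketch: allocate a dataset and copy samples into its tensors.
ggml_opt_dataset_t dataset = ggml_opt_dataset_init(
    GGML_TYPE_F32, GGML_TYPE_F32, ne_datapoint, ne_label, ndata, /*ndata_shard =*/ 1);

struct ggml_tensor * data   = ggml_opt_dataset_data(dataset);   // shape [ne_datapoint, ndata]
struct ggml_tensor * labels = ggml_opt_dataset_labels(dataset); // shape [ne_label, ndata]
// ... write your samples into data->data and labels->data ...

ggml_opt_dataset_free(dataset);
```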
