ggml provides two built-in optimizers: AdamW and SGD. Both are configured through the ggml_opt_optimizer_params struct and supplied to the optimizer context via a callback.
Optimizer types
- AdamW
- SGD
AdamW is the recommended default for most deep learning tasks. It maintains per-parameter first and second moment estimates and applies decoupled weight decay.
| Field | Description |
|---|---|
| alpha | Learning rate. Controls the step size applied to each parameter update. |
| beta1 | Exponential decay rate for the first moment (mean of gradients). Typical value: 0.9. |
| beta2 | Exponential decay rate for the second moment (uncentered variance of gradients). Typical value: 0.999. |
| eps | Small constant added to the denominator to prevent division by zero. Typical value: 1e-8. |
| wd | Weight decay coefficient. Applied directly to parameters (decoupled from the gradient update). Set to 0.0f to disable. |
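
To make the roles of these fields concrete, the textbook AdamW update for a parameter $\theta$ with gradient $g_t$ at step $t$ can be written as below. This is the standard formulation, offered as a reference sketch; it is an assumption that ggml's kernel follows it term for term. The symbols map to the fields as $\alpha$ = alpha, $\beta_1$ = beta1, $\beta_2$ = beta2, $\epsilon$ = eps, and $\lambda$ = wd.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1} \right)
\end{aligned}
$$

The weight decay term $\lambda\,\theta_{t-1}$ sits outside the moment estimates, which is what "decoupled" means here: setting $\lambda = 0$ reduces the update to plain Adam.
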
AdamW requires two additional momentum tensors (m and v) per trainable parameter tensor. This increases memory usage relative to SGD.
Optimizer params callbacks
The optimizer does not read ggml_opt_optimizer_params directly. Instead, it calls a ggml_opt_get_optimizer_params callback before each backward pass, allowing you to change hyperparameters dynamically during training (for example, to implement a learning rate schedule).
The userdata pointer carries arbitrary context to the callback. When using ggml_opt_fit, userdata is a pointer to the current epoch number (int64_t *).
Built-in callbacks
Use ggml_opt_get_constant_optimizer_params when you want to supply fixed hyperparameters without writing a custom callback:
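
The following is a minimal, library-free sketch of what a constant-params callback does, assuming it simply returns the struct that userdata points to. Every name prefixed with sketch_ is a hypothetical stand-in so the example compiles on its own; real code would include ggml-opt.h, use ggml_opt_optimizer_params, and assign ggml_opt_get_constant_optimizer_params to get_opt_pars with a pointer to the struct in get_opt_pars_ud.

```c
// Local stand-in mirroring the AdamW fields documented above.
// Assumption: the real struct lives in ggml-opt.h and may have more members.
struct sketch_optimizer_params {
    struct {
        float alpha; // learning rate
        float beta1; // first-moment decay rate
        float beta2; // second-moment decay rate
        float eps;   // numerical-stability constant
        float wd;    // weight decay, 0.0f disables it
    } adamw;
};

// Callback shape: receives the userdata pointer, returns params by value.
typedef struct sketch_optimizer_params (*sketch_get_params_fn)(void * userdata);

// Constant callback: userdata points at a params struct, returned unchanged.
struct sketch_optimizer_params sketch_constant_params(void * userdata) {
    return *(const struct sketch_optimizer_params *) userdata;
}
```

Because the callback dereferences userdata on every call, the struct it points to has to outlive the training run; a stack variable that goes out of scope before training finishes would leave a dangling pointer.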
Custom learning rate schedule
Because ggml_opt_fit passes a pointer to the current epoch as userdata, you can implement epoch-dependent schedules:
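
As an illustration, here is a sketch of a step-decay schedule that halves the AdamW learning rate every 10 epochs. It assumes epochs are counted from 1 and that userdata is the int64_t * described above. The struct and function names (prefixed sketch_), the base learning rate, and the decay interval are all hypothetical choices made so the example compiles without ggml; in real code the callback would return ggml_opt_optimizer_params from ggml-opt.h.

```c
#include <stdint.h>

// Local stand-in for the AdamW part of the optimizer params (assumption).
struct sketch_optimizer_params {
    struct {
        float alpha;
        float beta1;
        float beta2;
        float eps;
        float wd;
    } adamw;
};

// Step decay: epochs 1..10 use the base rate, 11..20 half of it, and so on.
struct sketch_optimizer_params sketch_step_decay_params(void * userdata) {
    const int64_t epoch = *(const int64_t *) userdata;

    struct sketch_optimizer_params p = {
        .adamw = { .alpha = 1e-3f, .beta1 = 0.9f, .beta2 = 0.999f,
                   .eps = 1e-8f, .wd = 0.0f },
    };

    // Number of completed 10-epoch blocks before the current epoch.
    const int64_t n_halvings = (epoch - 1) / 10;
    for (int64_t i = 0; i < n_halvings; ++i) {
        p.adamw.alpha *= 0.5f;
    }
    return p;
}
```

The callback keeps no state of its own: the learning rate is recomputed from the epoch number on every call, so repeated calls within one epoch always return the same value.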
ggml_opt_params struct
ggml_opt_params configures the full optimization context, including backend, loss, build type, and optimizer.
Use ggml_opt_default_params to get a struct with sensible defaults, then override individual fields:
| Field | Description |
|---|---|
| backend_sched | Defines which backends are used to construct and execute compute graphs. |
| ctx_compute | Compute context for static graph allocation. Leave NULL for dynamic allocation. |
| inputs / outputs | Input and output tensors for static graph allocation. Leave NULL for dynamic allocation. |
| loss_type | Loss function to minimize during training. |
| build_type | Controls which graphs are built: FORWARD, GRAD, or OPT. Default for training is OPT. |
| opt_period | Number of gradient accumulation micro-steps between optimizer parameter updates. |
| get_opt_pars | Callback to retrieve optimizer hyperparameters before each backward pass. |
| get_opt_pars_ud | Arbitrary pointer passed as userdata to get_opt_pars. |
| optimizer | Optimizer algorithm: ADAMW or SGD. |
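
The effect of opt_period can be sketched with a scalar toy model: gradients from consecutive micro-steps are accumulated, and the parameter is only updated once every opt_period micro-steps. The function below is a hypothetical, library-free illustration that uses a plain SGD step on a single float. It assumes gradients are summed; whether ggml sums or averages them, and how it treats leftover micro-steps, is not something this sketch tries to mirror.

```c
#include <stdint.h>

// Accumulates per-micro-step gradients and applies one SGD update every
// `opt_period` micro-steps. Returns the final parameter value and writes
// the number of updates performed to `n_updates`.
float sketch_accumulate_and_step(float w, const float * grads, int64_t n_micro,
                                 int32_t opt_period, float alpha,
                                 int64_t * n_updates) {
    float acc = 0.0f;
    *n_updates = 0;
    for (int64_t i = 0; i < n_micro; ++i) {
        acc += grads[i];                   // accumulate, no update yet
        if ((i + 1) % opt_period == 0) {   // end of an accumulation window
            w -= alpha * acc;              // one optimizer step
            acc = 0.0f;                    // start the next window
            ++*n_updates;
        }
    }
    return w; // gradients of an incomplete final window are not applied
}
```

With opt_period set to 1 every micro-step triggers an update; larger values trade update frequency for a larger effective batch without raising peak memory per micro-step.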
Context lifecycle
ggml_opt_reset with optimizer = false clears accumulated gradients and resets the loss scalar without discarding the optimizer’s internal momentum state. Pass true to perform a full reset, which is equivalent to starting a fresh training run with the same graph.