Train and evaluate a neural network on MNIST using ggml
The examples/mnist directory shows how to use ggml for both training and inference on the MNIST handwritten digit dataset. Two model architectures are provided: a fully connected network and a convolutional network.
Training in ggml is a work-in-progress and not production-ready. These examples are intended for learning purposes.
The evaluator prints a random test image as ASCII art, the model’s prediction for that image, and aggregate accuracy over the full test set:
For the fully connected network:

```
main: loaded model in 109.44 ms
mnist_model_eval: model evaluation on 10000 images took 76.92 ms, 7.69 us/image
main: predicted digit is 3
main: test_loss=0.066379+-0.009101
main: test_acc=97.94+-0.14%
```

For the convolutional network:

```
main: loaded model in 91.99 ms
mnist_model_eval: model evaluation on 10000 images took 267.61 ms, 26.76 us/image
main: predicted digit is 1
main: test_loss=0.047955+-0.007029
main: test_acc=98.46+-0.12%
```
Gradient accumulation is used during training via separate logical and physical batch sizes:
```c
#define MNIST_NBATCH_LOGICAL  1000 // datapoints per gradient update
#define MNIST_NBATCH_PHYSICAL  500 // datapoints processed in parallel
```
The logical batch size controls how often gradient updates are applied; the physical batch size controls parallelism and memory usage. Any combination works as long as the logical batch size is a multiple of the physical one, i.e. MNIST_NBATCH_LOGICAL % MNIST_NBATCH_PHYSICAL == 0.
The evaluation code can be compiled to WebAssembly using Emscripten:
```sh
# Copy model and test data into examples/mnist (symlinks do not work)
cp mnist-fc-f32.gguf examples/mnist/mnist-f32.gguf
cp data/MNIST/raw/t10k-images-idx3-ubyte examples/mnist/

# Build from the repo root
mkdir -p build-em
cd build-em
emcmake cmake .. -DGGML_BUILD_EXAMPLES=ON \
    -DCMAKE_C_FLAGS="-pthread -matomics -mbulk-memory" \
    -DCMAKE_CXX_FLAGS="-pthread -matomics -mbulk-memory"
make mnist
```
Serve the output files with a local HTTP server:
```sh
python3 examples/mnist/server.py
# Serving directory at http://localhost:8000
```
Open the link in your browser. Draw a digit on the canvas and the model predicts it, or click Random to pull an image from the test set.
Neural networks are susceptible to distributional shift. Digits that look significantly different from the MNIST training data (e.g. not centred) may be misclassified.