Crucible is Grimvane’s from-scratch LLM inference engine. It reads GGUF model files directly, implements the transformer forward pass against CUDA via CuPy, and handles tokenization and sampling — without wrapping Ollama, llama.cpp, vLLM, or any other inference framework.
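Crucible's actual loader isn't shown here, but reading GGUF directly starts with the file's fixed header (magic bytes, version, tensor count, metadata count). A minimal sketch of that first parse step, assuming the standard GGUF layout — the function name and return shape are illustrative, not Crucible's API:

```python
import struct
import tempfile

def read_gguf_header(path):
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # version is a little-endian uint32; the two counts are 64-bit integers
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}

# Build a tiny synthetic header to exercise the parser (not a real model file).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
    path = tmp.name

print(read_gguf_header(path))  # {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

After the header come the metadata key/value pairs and tensor descriptors, which is where a real loader recovers quantization types, tensor shapes, and data offsets.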
Crucible is designed for single-GPU local deployment. Its primary consumer is Project Blackbox, but any Grimvane project that needs local model inference can import it.