Crucible is Grimvane’s from-scratch LLM inference engine. It reads GGUF model files directly, implements the transformer forward pass against CUDA via CuPy, and handles tokenization and sampling — without wrapping Ollama, llama.cpp, vLLM, or any other inference framework.
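Crucible's actual loader isn't shown here, but reading GGUF directly starts with the file's fixed header (magic bytes, version, tensor count, metadata count). A minimal sketch of that first parse step, assuming the standard GGUF layout — the function name and return shape are illustrative, not Crucible's API:

```python
import struct
import tempfile

def read_gguf_header(path):
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # version is a little-endian uint32; the two counts are 64-bit integers
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}

# Build a tiny synthetic header to exercise the parser (not a real model file).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
    path = tmp.name

print(read_gguf_header(path))  # {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

After the header come the metadata key/value pairs and tensor descriptors, which is where a real loader recovers quantization types, tensor shapes, and data offsets.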
Crucible is designed for single-GPU local deployment. Its primary consumer is Project Blackbox, but any Grimvane project that needs local model inference can import it.