High-performance multi-tier caching for LLM workloads with P2P sharing, vector quantization, and intelligent prefetching.
NenCache is a multi-tier caching system designed specifically for LLM workloads. Built on static memory allocation and zero-allocation principles, it provides sub-millisecond access to cached embeddings, model outputs, and graph data, with intelligent peer-to-peer (P2P) sharing between instances.
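To make the multi-tier, static-allocation idea concrete, here is a minimal sketch of a two-tier cache that uses only fixed-size arrays (no heap allocation after construction) and promotes entries from the larger tier into the small, fast tier on a hit. All names (`TieredCache`, `Entry`, the capacities) are illustrative assumptions, not NenCache's actual API.

```rust
// Hypothetical sketch: two fixed-capacity, direct-mapped tiers.
// Hot entries migrate from L2 into L1 on access, so repeated lookups
// are served from the small fast tier. Not the real NenCache API.

const L1_CAP: usize = 4;   // small, fast tier
const L2_CAP: usize = 16;  // larger backing tier

#[derive(Clone, Copy)]
struct Entry {
    key: u64,
    value: [f32; 4], // stand-in for a cached embedding vector
}

struct TieredCache {
    l1: [Option<Entry>; L1_CAP],
    l2: [Option<Entry>; L2_CAP],
}

impl TieredCache {
    fn new() -> Self {
        // All storage is reserved up front; no allocation on the hot path.
        TieredCache { l1: [None; L1_CAP], l2: [None; L2_CAP] }
    }

    // Insert into the backing tier; direct mapping keeps this O(1).
    fn put(&mut self, key: u64, value: [f32; 4]) {
        let slot = (key as usize) % L2_CAP;
        self.l2[slot] = Some(Entry { key, value });
    }

    // Check L1 first; on an L2 hit, promote the entry into L1.
    fn get(&mut self, key: u64) -> Option<[f32; 4]> {
        let s1 = (key as usize) % L1_CAP;
        if let Some(e) = self.l1[s1] {
            if e.key == key {
                return Some(e.value); // fast-tier hit
            }
        }
        let s2 = (key as usize) % L2_CAP;
        if let Some(e) = self.l2[s2] {
            if e.key == key {
                self.l1[s1] = Some(e); // promotion to the fast tier
                return Some(e.value);
            }
        }
        None // miss in both tiers
    }
}

fn main() {
    let mut cache = TieredCache::new();
    cache.put(42, [0.1, 0.2, 0.3, 0.4]);
    assert_eq!(cache.get(42), Some([0.1, 0.2, 0.3, 0.4])); // L2 hit, promoted
    assert_eq!(cache.get(42), Some([0.1, 0.2, 0.3, 0.4])); // now served from L1
    assert_eq!(cache.get(7), None); // miss in both tiers
    println!("ok");
}
```

A real implementation would add eviction policy, associativity, and concurrency control, but the core design choice shown here (pre-reserved slots plus promotion between tiers) is what allows lookups to avoid allocation entirely.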