LLAMA.cpp and RAG Resources To Read

LLAMA.cpp

  • https://retr0.blog/blog/llama-rpc-rce
  • https://tekkix.com/articles/ai/2024/09/distributed-inference-llamacpp-via-rpc
  • https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md
  • https://huggingface.co/google/gemma-4-31B-it
  • https://onyx.app/self-hosted-llm-leaderboard

RAG

  • https://blog.premai.io/rag-chunking-strategies-the-2026-benchmark-guide/
  • https://docs.openwebui.com/troubleshooting/rag/
  • https://machinelearningplus.com/gen-ai/optimizing-rag-chunk-size-your-definitive-guide-to-better-retrieval-accuracy/
  • https://gemini.google.com/app/97b148ccb6e03fc1
  • https://community.openai.com/t/processing-large-documents-128k-limit/620347/9