Offline LLM Inferencing
When it comes to running LLMs locally (or offline, i.e., not in the cloud), there are several tools to choose from. Coming from “cloud-native” services, I was quite surprised by how popular these tools are. However, after watching cloud costs accumulate rapidly, I decided to do a little research on the subject.
Probably the best article on the subject is https://getstream.io/blog/best-local-llm-tools/ - a great one-stop shop for starting offline inferencing with open-source models.
Tools for Offline LLM Inferencing:
- Ollama
- Llamafile / Executable and Linkable Format (ELF)
- Jan
- LM Studio
- llama.cpp
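Of these, Ollama is probably the quickest way to get a first result: once the server is running and a model has been pulled, it exposes a local HTTP API on port 11434. Here is a minimal sketch, assuming Ollama is already running and a model named `llama3` has been pulled with `ollama pull llama3` (the model name is just an example):

```python
# Minimal sketch: call a locally running Ollama server (default port 11434).
# Assumes `ollama serve` is running and a model ("llama3" here, as an example)
# has been pulled with `ollama pull llama3`. Requires: pip install requests
import requests

def generate(prompt: str, model: str = "llama3") -> str:
    """Send a single non-streaming completion request to the local Ollama API."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Explain offline LLM inferencing in one sentence."))
```

Everything here runs on your own machine, so there is no per-token billing and no data leaves the box.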
Engineering excellence metrics
Engineering Excellence is the measurable efficiency of the dev pipeline described in PR lifecycle.
Sorry, this article is not finished yet, but the core idea should be visible from the metrics below (a rough computation sketch follows the list):
- Commits in all PRs
- Commits to all PRs ratio
- Updated Open PRs
- Closed (not Merged) PRs
- Abandoned PRs ratio
- Merged PRs
- Merged PRs Avg. Duration
- Merged PRs Additions (+lines)
- Merged PRs Deletions (-lines)
- Merged PRs Changed Files
- Merged PRs → main
- Merged PRs → main ratio
- AutoCRQ Deployments
- AutoCRQ PRs Lead Time
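As a rough illustration, several of these metrics (Merged PRs, Closed-not-Merged PRs, Abandoned PRs ratio, Merged PRs Avg. Duration, Merged PRs → main ratio) can be derived from the GitHub REST API listing of closed pull requests. The sketch below is illustrative only: the owner/repo names are placeholders, pagination is simplified to a single page, and the line/file-change counts (which require per-PR detail calls) are omitted.

```python
# Rough sketch of computing a few PR metrics from the GitHub REST API.
# OWNER/REPO are placeholders; a real run needs a token with repo read access
# and pagination over all result pages. Requires: pip install requests
import os
from datetime import datetime

import requests

OWNER, REPO = "example-org", "example-repo"   # placeholders, not a real repository
API = f"https://api.github.com/repos/{OWNER}/{REPO}/pulls"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def parse(ts: str) -> datetime:
    """Parse GitHub's ISO-8601 timestamps (e.g. '2024-01-01T12:00:00Z')."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Only the first page (up to 100 closed PRs) is fetched to keep the sketch short.
pulls = requests.get(API, headers=HEADERS,
                     params={"state": "closed", "per_page": 100}).json()

merged = [p for p in pulls if p.get("merged_at")]
abandoned = [p for p in pulls if not p.get("merged_at")]
merged_to_main = [p for p in merged if p["base"]["ref"] == "main"]
durations_h = [(parse(p["merged_at"]) - parse(p["created_at"])).total_seconds() / 3600
               for p in merged]

print(f"Merged PRs:               {len(merged)}")
print(f"Closed (not Merged) PRs:  {len(abandoned)}")
print(f"Abandoned PRs ratio:      {len(abandoned) / max(len(pulls), 1):.0%}")
print(f"Merged PRs -> main ratio: {len(merged_to_main) / max(len(merged), 1):.0%}")
print(f"Merged PRs Avg. Duration: {sum(durations_h) / max(len(durations_h), 1):.1f} h")
```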
PR lifecycle
In our group we follow a simple yet effective PR workflow: a poly-repo setup with a simplified GitHub process. Basically, we use a main branch with automatic staging deployment and the ability to promote a build to the production environment. The promotion script automatically creates a CRQ (a change record in ServiceNow), which we call an “auto-CRQ” since it doesn’t require manual steps (approvals) unless the site status is not green.
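A hedged sketch of what the auto-CRQ step of such a promotion script might look like, using ServiceNow’s standard Table API (`/api/now/table/change_request`). The instance URL, credentials, field values, and the site-status check are all illustrative assumptions, not our actual script.

```python
# Illustrative sketch of an "auto-CRQ" promotion step: create a ServiceNow
# change record via the standard Table API. The instance URL, credentials,
# field values, and the status check are assumptions for illustration only.
# Requires: pip install requests
import os
import sys

import requests

SNOW_INSTANCE = "https://example.service-now.com"   # placeholder instance URL
AUTH = (os.environ["SNOW_USER"], os.environ["SNOW_PASSWORD"])

def site_is_green() -> bool:
    """Placeholder health check; a real script would query monitoring/status APIs."""
    return True

def create_auto_crq(build_id: str) -> str:
    """Create a change_request record and return its CRQ number."""
    resp = requests.post(
        f"{SNOW_INSTANCE}/api/now/table/change_request",
        auth=AUTH,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        json={
            "short_description": f"Auto-CRQ: promote build {build_id} to production",
            "type": "standard",   # pre-approved change type, no manual approvals
        },
    )
    resp.raise_for_status()
    return resp.json()["result"]["number"]

if __name__ == "__main__":
    if not site_is_green():
        sys.exit("Site status is not green: manual approval required, skipping auto-CRQ.")
    print("Created", create_auto_crq(build_id="1.2.3"))
```

The point of the pattern is that a green site status turns the change record into pure bookkeeping, while anything less than green falls back to the normal manual approval path.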