Knowledge base / Microblog about software development related things by Hans-Peter Störr
Created 24-09-2023, last change 26-10-2023
Simon Willison has a very comprehensive command line tool, llm, for working with large language models (both ChatGPT and others, including local models). Highly recommended! These are just some notes about using LLM embeddings for search in local files, mostly following his blog entry about that, as well as this one.
For installation, see his blog. I use Homebrew on a Mac:
brew install llm
brew upgrade llm
llm install -U llm
llm install -U llm-sentence-transformers
llm sentence-transformers register --lazy -a minilm all-MiniLM-L12-v2
llm sentence-transformers register --lazy -a mpnet all-mpnet-base-v2
llm install -U llm-embed-jina
# llm aliases set minilm sentence-transformers/all-MiniLM-L12-v2
llm keys set openai
This registers the shortcuts minilm and mpnet for the respective models, and declares the models jina-embeddings-v2-small-en, jina-embeddings-v2-base-en and jina-embeddings-v2-large-en. All of them will be downloaded on first use.
Just print the embedding for something (OpenAI): llm embed -m ada-002 -c "something"
llm embed -m sentence-transformers/all-MiniLM-L6-v2 -c "Hello World"
Just omit the -d test.db if you would rather use the default llm embeddings database, which is stored in ~/Library/Application Support/io.datasette.llm/embeddings.db.
llm embed -d test.db -m minilm test hello -c "hello world"
llm embed -d test.db test helloagain -c "hello again"
llm similar -d test.db test -c "hello"
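As far as I know, llm similar ranks the stored entries by the cosine similarity between their embedding vectors and the embedding of the query text. A minimal Python sketch of that ranking follows; the three-dimensional vectors are made up and only stand in for real embeddings, and the function names are illustrative, not part of llm.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, entries, n=10):
    """Return up to n (id, score) pairs, most similar entry first."""
    scored = [(entry_id, cosine_similarity(query, vec))
              for entry_id, vec in entries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data (assumption): real embeddings have hundreds of dimensions.
toy_entries = {
    "hello": [1.0, 0.0, 0.0],
    "helloagain": [0.8, 0.6, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

for entry_id, score in rank_by_similarity(toy_query, toy_entries):
    print(f"{score:.3f}  {entry_id}")
```

A score close to 1 means the two texts point in nearly the same direction in embedding space, a score close to 0 means they are unrelated.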
llm embed-multi til -d til.db -m mpnet --store --files . '**/*.md'
llm similar til -d til.db -n 5 -c "how to sync content from JCR to AEM"
The --store argument for embed-multi stores the content along with the embeddings, but that makes the output of llm similar very hard to read.
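One way around that is to post-process the output. llm similar prints one JSON object per line. The sketch below assumes each object has the keys id, score and content (that is my reading of the output and may differ between versions) and reduces every hit to score, id and a short excerpt. The helper name summarize_hits and the sample lines are made up for illustration.

```python
import json


def summarize_hits(lines, excerpt_chars=60):
    """Turn JSON lines (assumed keys: id, score, content) into short rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hit = json.loads(line)
        content = hit.get("content") or ""  # content is null without --store
        excerpt = " ".join(content.split())[:excerpt_chars]  # collapse whitespace
        rows.append(f"{hit['score']:.3f}  {hit['id']}  {excerpt}".rstrip())
    return rows


# Made-up sample lines in the assumed output format.
sample_lines = [
    '{"id": "aem/sync.md", "score": 0.71, "content": "# Sync\\n\\nHow to sync content", "metadata": null}',
    '{"id": "misc/notes.md", "score": 0.42, "content": null, "metadata": null}',
]

for row in summarize_hits(sample_lines):
    print(row)
```

To use it in a pipeline, pass sys.stdin to summarize_hits instead of the sample list and pipe the output of llm similar into the script.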
Mac M1 and Llama-2 etc: https://github.com/simonw/llm-mlc
llm plugins in general: https://llm.datasette.io/en/stable/plugins/directory.html