An Emacs tool for searching and comparing notes.

hrs/docsim.el

docsim.el

https://melpa.org/packages/docsim-badge.svg https://img.shields.io/badge/License-GPL%20v3-blue.svg https://github.com/hrs/docsim.el/actions/workflows/test.yml/badge.svg?branch=main

A fast, local search engine for your text files. Query your notes and get a ranked list of the best matches.

How is this different from grep (or ripgrep, or ag, or…)?

Those tools are all great, and I use them all the time! But they search for literal text matches, usually line-by-line. That’s often what I want, but sometimes I want to know, “what notes are most similar to this query, or to this other note?”

If I search for “chunky bacon,” I still want to see documents that talk about “chunks of bacon.” And, below those, I probably want to see notes that discuss regular “bacon,” even if it’s not chunky.

docsim uses a few different information retrieval algorithms to provide a ranked list of text documents. It takes term frequency into account, weighs rare words more heavily, uses stoplists to ignore common terms, and (optionally) stems English words (so “spinning” can match “spinner”).
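To make the ranking idea concrete, here's a rough sketch of TF-IDF scoring with a stoplist, written in Emacs Lisp. It's only an illustration: the real work happens in the docsim Go tool, whose scoring may well differ, and stemming is left out entirely. Every name here (`my-stoplist`, `my-tokenize`, `my-score`, `my-rank`) is made up for this example.

```elisp
(require 'cl-lib)

(defvar my-stoplist '("the" "of" "a" "and" "is" "on")
  "Hypothetical stoplist, for illustration only.")

(defun my-tokenize (text)
  "Split TEXT into lowercase words, dropping anything in `my-stoplist'."
  (cl-remove-if (lambda (word) (member word my-stoplist))
                (split-string (downcase text) "[^[:alnum:]]+" t)))

(defun my-score (query doc docs)
  "Score tokenized DOC against tokenized QUERY, using DOCS as the corpus.
Each matching term contributes its count in DOC (term frequency)
times log(N / df), so terms found in fewer documents weigh more."
  (cl-loop for term in (delete-dups (copy-sequence query))
           for tf = (cl-count term doc :test #'equal)
           for df = (cl-count-if (lambda (d) (member term d)) docs)
           when (> tf 0)
           sum (* tf (log (/ (float (length docs)) df)))))

(defun my-rank (query texts)
  "Return TEXTS sorted from most to least similar to QUERY."
  (let* ((docs (mapcar #'my-tokenize texts))
         (q (my-tokenize query))
         (scored (cl-mapcar (lambda (text doc)
                              (cons (my-score q doc docs) text))
                            texts docs)))
    (mapcar #'cdr (sort scored (lambda (a b) (> (car a) (car b)))))))
```

Calling `(my-rank "chunky bacon" '("notes on the weather" "bacon and eggs" "chunky bacon recipe"))` puts the "chunky bacon" note first, the plain "bacon" note second, and the unrelated note last, because "chunky" is rarer in this tiny corpus than "bacon" and so counts for more.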

It’s also slower and more memory-intensive than those other tools, of course, since it does more work. But performance is a goal, and on a mid-range machine it’ll process a few thousand documents without noticeable lag.

Installation

docsim.el relies on the docsim command-line tool to do most of the work, so you’ll need to install that on your $PATH first. It’s a small Go utility with precompiled binaries available for most popular platforms, or you can install it from source.
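If you want Emacs to tell you when the command-line tool is missing, something like this in your init file should do it. This is a small optional sketch, not part of docsim.el; it assumes the binary is named `docsim`, as described above, and uses only the built-in `executable-find` and `display-warning`.

```elisp
;; Warn at startup if the docsim command-line tool can't be found.
;; `executable-find' searches the directories in `exec-path'.
(unless (executable-find "docsim")
  (display-warning 'docsim
                   "The docsim executable was not found on your PATH"))
```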

Once that’s done, just install docsim.el from MELPA like any other package. With use-package:

(use-package docsim
  :ensure t
  :bind (("C-c n s" . docsim-search)
         ("C-c n d" . docsim-search-buffer))

  :custom
  (docsim-search-paths '("~/directory/where/you/keep/your/notes")))

Or, if you’d rather install manually, add docsim.el to your load path and:

(require 'docsim)
(setq docsim-search-paths '("~/directory/where/you/keep/your/notes"))
(global-set-key (kbd "C-c n s") 'docsim-search)
(global-set-key (kbd "C-c n d") 'docsim-search-buffer)

Those keybindings are just made up, of course. Use whatever you’d like!

Usage

Once you’ve got docsim.el installed and configured, you’ll probably just use:

docsim-search
Search your notes for a query.
docsim-search-buffer
Parse the current buffer and list similar notes.

If you link your notes together into a zettelkasten (maybe you use org-roam or denote?), docsim-search-buffer can help you surface similar notes that “should” be linked together. That can help you find connections you’d otherwise miss and makes it easier to add notes into your personal knowledge graph.

In particular, the maintainer of this package 👋 happens to use denote to manage their notes, so docsim.el includes an option to exclude notes that you’ve already linked from your search results. Configure that like so:

(setq docsim-search-paths (list denote-directory))
(setq docsim-omit-denote-links t)

WARNING: docsim doesn’t respect .ignore or .gitignore files yet, so it’ll try to search through .git directories, node_modules, and so on. That should be fixed in the near future.

But I don’t write in English!

Because its author is a monoglot (and because it was easy for them to find a good Go stemming library for English), docsim defaults to using English stoplists and stemming to improve accuracy. If you mostly write in a language other than English, you might need to tweak things a bit to get comparable results.

First, disable those English-specific features:

(setq docsim-assume-english nil)

Next, optionally, provide a custom stoplist of common words in your language. This is a text file of words that don’t carry much semantic meaning (like “as” or “because” in English) separated by newlines. Words like this would probably already be mostly ignored by docsim, because words are weighted by how rarely they show up, but if you add a stoplist they’ll be skipped completely.

(setq docsim-stoplist-file "~/path/to/my/custom/stoplist.txt")
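If you don't have a stoplist file yet, you could generate one from a list of words. Here's a minimal sketch; `my-write-stoplist` is a hypothetical helper (not part of docsim.el), and the German words are just placeholders for whatever is common in your language.

```elisp
(defun my-write-stoplist (file words)
  "Write WORDS to FILE, one word per line.
Creates FILE's parent directory if it doesn't exist yet."
  (make-directory (file-name-directory (expand-file-name file)) t)
  (with-temp-file file
    (dolist (word words)
      (insert word "\n"))))

;; Example call, with placeholder words:
;; (my-write-stoplist "~/path/to/my/custom/stoplist.txt"
;;                    '("der" "die" "das" "und" "oder"))
```

The result is a plain text file with one word per line, which is the newline-separated format described above.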

There’s no universal authority on what constitutes “a good stoplist,” but here’s a repo of stoplists for 29 languages that might be helpful!

Search results don’t show titles for my notes’ format!

By default docsim.el displays the “title” of each file in a results buffer, instead of just a filename. It tries to get a decent title: it looks for the #+TITLE: in Org files, checks for a title: key in YAML frontmatter in everything else, but otherwise just shows a relative path. Parsing titles from every possible file type is an enormous task. It’s not docsim’s job.

If the default titles aren’t working for you, you can write your own and point docsim-get-title-function to it. Your function should take an absolute file path as an argument and return a string that represents the title of that file.

For example, to show the name of each search result file without its directory and in ALL CAPS:

(defun my-title-parser (path)
  (upcase (file-name-nondirectory path)))

(setq docsim-get-title-function #'my-title-parser)
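As a slightly more realistic sketch, here's a title function for Markdown notes that uses the first top-level `# Heading` as the title and falls back to the file name. `my-markdown-title` is a hypothetical name, and the choice to look only at the first 4096 bytes is an assumption made to keep it quick on large files.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-markdown-title (path)
  "Return the first top-level Markdown heading in PATH.
Only the first 4096 bytes are read.  If no heading is found,
return the file name without its directory."
  (with-temp-buffer
    (insert-file-contents path nil 0 4096)
    (goto-char (point-min))
    (if (re-search-forward "^# +\\(.+\\)$" nil t)
        (string-trim (match-string 1))
      (file-name-nondirectory path))))

;; To use it:
;; (setq docsim-get-title-function #'my-markdown-title)
```

Like the example above, it takes an absolute file path and returns a string, which is all docsim-get-title-function asks for.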