Vulpea v0.3

Posted on July 14, 2022
Updated on July 20, 2022
Tagged as #emacs, #org-roam, #vulpea

The time has come to release a new version of Vulpea, a collection of note-taking functions based on Org and Org Roam. Almost a year has passed since the release of v0.2. Over this time, Vulpea got several major features that deserve a dedicated version. In truth, I wanted to release it almost half a year ago, but the war kicked in, and I had more important things to focus on.

The main focus of this release is the performance of query operations. Expect a 4.5x speed boost for generic vulpea-db-query and even more with specialized queries. All that thanks to materialized view table for notes.

vulpea-db-query performance on a set of 9554 notes (in seconds)
test result size regular view table ratio
tags-some 30 notes 4.6693460650999995 1.0112478712 4.6174100
tags-every 3168 notes 4.7333844436999996 1.0059819176 4.7052381
links-some 1657 notes 4.8095771283 1.0462236128999999 4.5970833
links-every 92 notes 4.5517473337999995 1.0204833089 4.4603839
specialized queries performance on a set of 9554 notes (in seconds)
test result size generic specialized ratio
tags-some 30 notes 1.0112478712 0.0066033426 153.14182
tags-every 3168 notes 1.0059819176 0.5709392964999999 1.7619770
links-some 1657 notes 1.0462236128999999 0.4248580532 2.4625251
links-every 92 notes 1.0204833089 0.0545313596 18.713696

Checkout CHANGELOG for full list of changes. Read further to learn details.

Materialized view table

Materialized view table, huh? The concept is actually pretty simple. But first, you need to understand how Org Roam DB is designed.

Under the hood, Org Roam uses SQLite with 7 different tables. Each table stores various information about every single note. There is a table with ‘node’ information like title, level, properties, etc. But tags, aliases, and links are stored in separate tables. And you have to query data from all these tables to get a fully populated note. It becomes messy and slow if you need to get lots of fully populated notes.

I won’t go too much into detail explaining how it worked previously in Vulpea and how it still works in Org Roam. That deserves a separate post that can be based on my conversation with Jörg Volbers. In short, it uses a trick to multiply all needed tables into one big table. Definitely not my proudest contribution. But most importantly, it is error-prone and slow.

After adding links to the vulpea-note, I realised that vulpea-db-query became too slow for my collection of notes. And so I decided to add an extra table, where every row contains a fully materialised/populated note. This simple idea provided a 4.5x performance boost to all functions using vulpea-db-query.

This is a game changer with only one downside - write performance degradation. Of course, we write the same data twice - in Org Roam tables and then in materialised view table. But there is more to writing performance degradation. It is hard to hook into Org Roam parsing and writing routine, so I had to duplicate buffer parsing. And this is the biggest performance offender. In most cases, when you simply modify a single note, this is hardly noticeable. But syncing 9k notes takes x2 time.

synchronisation performance (in seconds)
test regular view table diff ratio
9554 notes 172.79389154999998 337.61603822 164.82215 1.9538656
small note 0.000354079889 0.000416262194 6.2182305e-5 1.1756166
medium note 0.000492199416 0.000539389997 4.7190581e-5 1.0958770
huge note 0.1732851848 0.2240508243 0.050765640 1.2929601

This penalty can be omitted if materialised view table becomes part of Org Roam. Alternatively, the Org Roam routine can benefit from refactoring to enable this kind of hook without any performance loss. Do I plan to work in this direction? Definitely! But I feel a lack of motivation and resources to promise anything.

See vulpea#116 for benchmarks.

Specialized queries

As I already mentioned, all this performance mumbo jumbo started because of the inclusion of links to vulpea-note. I literally was disappointed with the performance. And I decided to take the path of least resistance and provided some specialized queries (because they kind of solve the problem for my most used patterns).

What are the most used patterns? Get the list of notes tagged by all/any given tags. Get the list of notes linking to all/any given notes.

  • vulpea-db-query-by-tags-some - return all notes tagged with one of the provided TAGS.
  • vulpea-db-query-by-tags-every - return all notes tagged by every tag from the list of provided TAGS.
  • vulpea-db-query-by-links-some - return all notes linking at least one of the provided DESTINATIONS.
  • vulpea-db-query-by-links-every - return all notes linking each and every provided DESTINATIONS.

These are benchmark results on a set of 8390 notes before the materialized view was introduced.

query by links performance (in seconds)
test org-roam API vulpea-db-query specialized
simple 0.28974334 2.28282158 0.02278065
popular 1.85930086 2.34636907 1.42250805
big intersection 4.53420141 2.27249325 0.46843158

What exactly makes specialized queries so fast? We start by narrowing down all notes in the fastest possible way. And then, we exchange this list of ids to list of notes in a single transaction.

*-some

We use IN operator to get notes tagged by at least one of the provided tags.

select distinct node_id
from tags
where tag in ('"wine"', '"barberry/public"');

Ignore serialisation of the string data. It is related to emacsql limitations.

Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids.

*-every

We use INTERSECT operator to get notes tagged by all provided tags. Looks more tricky than IN operator, but the idea is simple. We find a list of notes tagged by first tag, a list of notes tagged by second tag, a list of notes tagged by third tag, and so on… And then we calculate intersection of these lists. Voilà!

select distinct node_id
from tags
where tag = '"wine"'

INTERSECT

select distinct node_id
from tags
where tag = '"barberry/public"'

;

Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids.

Other goodies

  • More data is available in vulpea-note: properties and links.
  • Note meta is persisted in org-roam-db in meta table.
  • API to access data from vulpea-note-meta:
    • vulpea-note-meta-get-list - to get all values of given PROP and TYPE;
    • vulpea-note-meta-get - to get the first value of given PROP and TYPE.
  • New function to remove buffer properties - vulpea-buffer-prop-remove.
  • Improve filetags handling:
    • Property format them with : as separator;
    • Remove property when setting them to empty list instead of leaving empty property.
  • Allow to configure candidates source for vulpea-find function via vulpea-find-default-candidates-source variable.
  • New function to select from arbitrary list of notes as opposed to relying on filter - vulpea-select-from.
  • Add shortcuts for checking tags on the note:
    • vulpea-note-tagged-all-p - return non-nil if a NOTE is tagged by all of the TAGS.
    • vulpea-note-tagged-any-p - return non-nil if a NOTE is tagged by any of the TAGS.

Next steps

Vulpea already offers a lot of features. While working on other projects that use Vulpea, I discover new missing features that I would love to implement. And issue tracker is not empty. Unless something changes, my plan is the following.

  1. Add attachment directory path to vulpea-note. This is needed to boost barberry.io construction, so I prioritise this simple feature.
  2. Provide a simple way to add more tables to org-roam-db and improve their initialisation flow.
  3. Provide outline-level metadata. See vulpea#75.

Stay tuned and safe travels! Remember to use Emacs responsibly.