Vulpea v0.3
The time has come to release a new version of Vulpea, a collection of note-taking functions based on Org and Org Roam. Almost a year has passed since the release of v0.2. Over this time, Vulpea got several major features that deserve a dedicated version. In truth, I wanted to release it almost half a year ago, but the war kicked in, and I had more important things to focus on.
The main focus of this release is the performance of query operations. Expect a 4.5x speed boost for generic vulpea-db-query
and even more with specialized queries. All that thanks to materialized view table for notes.
test | result size | regular | view table | ratio |
---|---|---|---|---|
tags-some |
30 notes | 4.6693460650999995 | 1.0112478712 | 4.6174100 |
tags-every |
3168 notes | 4.7333844436999996 | 1.0059819176 | 4.7052381 |
links-some |
1657 notes | 4.8095771283 | 1.0462236128999999 | 4.5970833 |
links-every |
92 notes | 4.5517473337999995 | 1.0204833089 | 4.4603839 |
test | result size | generic | specialized | ratio |
---|---|---|---|---|
tags-some |
30 notes | 1.0112478712 | 0.0066033426 | 153.14182 |
tags-every |
3168 notes | 1.0059819176 | 0.5709392964999999 | 1.7619770 |
links-some |
1657 notes | 1.0462236128999999 | 0.4248580532 | 2.4625251 |
links-every |
92 notes | 1.0204833089 | 0.0545313596 | 18.713696 |
Checkout CHANGELOG for full list of changes. Read further to learn details.
Materialized view table
Materialized view table, huh? The concept is actually pretty simple. But first, you need to understand how Org Roam DB is designed.
Under the hood, Org Roam uses SQLite with 7 different tables. Each table stores various information about every single note. There is a table with ‘node’ information like title, level, properties, etc. But tags, aliases, and links are stored in separate tables. And you have to query data from all these tables to get a fully populated note. It becomes messy and slow if you need to get lots of fully populated notes.
I won’t go too much into detail explaining how it worked previously in Vulpea and how it still works in Org Roam. That deserves a separate post that can be based on my conversation with Jörg Volbers. In short, it uses a trick to multiply all needed tables into one big table. Definitely not my proudest contribution. But most importantly, it is error-prone and slow.
After adding links to the vulpea-note
, I realised that vulpea-db-query
became too slow for my collection of notes. And so I decided to add an extra table, where every row contains a fully materialised/populated note. This simple idea provided a 4.5x performance boost to all functions using vulpea-db-query
.
This is a game changer with only one downside - write performance degradation. Of course, we write the same data twice - in Org Roam tables and then in materialised view table. But there is more to writing performance degradation. It is hard to hook into Org Roam parsing and writing routine, so I had to duplicate buffer parsing. And this is the biggest performance offender. In most cases, when you simply modify a single note, this is hardly noticeable. But syncing 9k notes takes x2 time.
test | regular | view table | diff | ratio |
---|---|---|---|---|
9554 notes | 172.79389154999998 | 337.61603822 | 164.82215 | 1.9538656 |
small note | 0.000354079889 | 0.000416262194 | 6.2182305e-5 | 1.1756166 |
medium note | 0.000492199416 | 0.000539389997 | 4.7190581e-5 | 1.0958770 |
huge note | 0.1732851848 | 0.2240508243 | 0.050765640 | 1.2929601 |
This penalty can be omitted if materialised view table becomes part of Org Roam. Alternatively, the Org Roam routine can benefit from refactoring to enable this kind of hook without any performance loss. Do I plan to work in this direction? Definitely! But I feel a lack of motivation and resources to promise anything.
See vulpea#116 for benchmarks.
Specialized queries
As I already mentioned, all this performance mumbo jumbo started because of the inclusion of links to vulpea-note
. I literally was disappointed with the performance. And I decided to take the path of least resistance and provided some specialized queries (because they kind of solve the problem for my most used patterns).
What are the most used patterns? Get the list of notes tagged by all/any given tags. Get the list of notes linking to all/any given notes.
vulpea-db-query-by-tags-some
- return all notes tagged with one of the providedTAGS
.vulpea-db-query-by-tags-every
- return all notes tagged by every tag from the list of providedTAGS
.vulpea-db-query-by-links-some
- return all notes linking at least one of the providedDESTINATIONS
.vulpea-db-query-by-links-every
- return all notes linking each and every providedDESTINATIONS
.
These are benchmark results on a set of 8390 notes before the materialized view was introduced.
test | org-roam API |
vulpea-db-query |
specialized |
---|---|---|---|
simple | 0.28974334 | 2.28282158 | 0.02278065 |
popular | 1.85930086 | 2.34636907 | 1.42250805 |
big intersection | 4.53420141 | 2.27249325 | 0.46843158 |
What exactly makes specialized queries so fast? We start by narrowing down all notes in the fastest possible way. And then, we exchange this list of ids to list of notes in a single transaction.
*-some
We use IN operator to get notes tagged by at least one of the provided tags.
select distinct node_id
from tags
where tag in ('"wine"', '"barberry/public"');
Ignore serialisation of the string data. It is related to emacsql limitations.
Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids
.
*-every
We use INTERSECT operator to get notes tagged by all provided tags. Looks more tricky than IN
operator, but the idea is simple. We find a list of notes tagged by first tag, a list of notes tagged by second tag, a list of notes tagged by third tag, and so on… And then we calculate intersection of these lists. Voilà!
select distinct node_id
from tags
where tag = '"wine"'
INTERSECT
select distinct node_id
from tags
where tag = '"barberry/public"'
;
Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids
.
Other goodies
- More data is available in
vulpea-note
: properties and links. - Note meta is persisted in
org-roam-db
inmeta
table. - API to access data from
vulpea-note-meta
:vulpea-note-meta-get-list
- to get all values of givenPROP
andTYPE
;vulpea-note-meta-get
- to get the first value of givenPROP
andTYPE
.
- New function to remove buffer properties -
vulpea-buffer-prop-remove
. - Improve
filetags
handling:- Property format them with
:
as separator; - Remove property when setting them to empty list instead of leaving empty property.
- Property format them with
- Allow to configure candidates source for
vulpea-find
function viavulpea-find-default-candidates-source
variable. - New function to select from arbitrary list of notes as opposed to relying on filter -
vulpea-select-from
. - Add shortcuts for checking tags on the note:
vulpea-note-tagged-all-p
- return non-nil if aNOTE
is tagged by all of theTAGS
.vulpea-note-tagged-any-p
- return non-nil if aNOTE
is tagged by any of theTAGS
.
Next steps
Vulpea already offers a lot of features. While working on other projects that use Vulpea, I discover new missing features that I would love to implement. And issue tracker is not empty. Unless something changes, my plan is the following.
- Add attachment directory path to
vulpea-note
. This is needed to boost barberry.io construction, so I prioritise this simple feature. - Provide a simple way to add more tables to org-roam-db and improve their initialisation flow.
- Provide outline-level metadata. See vulpea#75.
Stay tuned and safe travels! Remember to use Emacs responsibly.