🌋 S02 E05: On Booling & Knowledge Management Tooling

Aug 03, 2021

🗺 Personal Updates

This month began with two glorious weeks off of work, with another half-week off towards the end. In that time, I revamped some of my personal project management systems (thanks David Allen), somewhat caught up on my long list of TODOs, and began working on an exciting new side project: a memex, which I'll dive into in more detail later.

My pre-frontal cortex also finally fully developed when I turned 25. Mack planned an amazing weekend of festivities that culminated with a trip to Big Island, Hawaii.

Upcoming Travel Plans:

Wesley Chapel, FL (majority of August) 🤠
Madrid, Spain (Sept. 11 to Sept. 17) assuming international travel is still kosher by then
New York City (Sept. 18 to Oct. 2) — if you'll be around, let me know!

🎨 Artifacts

My website has a fresh coat of paint! As I mentioned last month, I get the urge to revamp my site every 6-8 months with a new look & feel. This time around, I updated some of the styling to suit my mood of the moment and added some new features, including image previews of posts, a progress bar, and some componentry that better adheres to design principles.
I've also added a "Travel Journal" on my site, which will be for writings about my travel experiences. There are a few on there already from when I found myself in Southeast Asia, and I'll be writing about my time in Big Island, Hawaii shortly.

🔮 A Personal Search Engine

A few weeks ago I came across this tweet from Linus, better known as @thesephist:

https://twitter.com/thesephist/status/1412956530220093448

Over a weekend, Linus collected a bunch of his personal data (blog posts, tweets, journal entries, etc.) into a system that would allow him to search through everything on demand. He essentially built a prototype of Vannevar Bush's memex, which Bush defined as "a device in which an individual stores all their books, records, and communications to supplement their memory".

Why is this even useful? The human brain isn't designed to store the vast amounts of information that we create in our modern world. It can approximate this ability to some extent by surfacing things in associative contexts (going from the word "gray" ⬜️ -> gray dog 🐩 -> your childhood dog Sparky 🥺 -> a memory of the time he ate your homework 🧾). However, storing data isn't what the brain is best at; it's best at forming connections from information that it loads into working memory.

This is why I'm so obsessed with knowledge management and can spend hours sorting out my thoughts & learning into the right contexts. The closest I've come to having my own memex is my Roam Research graph, but even that has its limitations. Everything I can find in my Roam is something that I've actively parsed through and taken the time to write down. What about the vast amounts of memories, experiences, insights, and information that don't get written down?

This question is what has inspired me to start this next side-project: my own memex, which will serve as what Linus dubs a "personal search engine". I'm sure there are services out there that do the job, but I think the value of building my own system outweighs the time commitment. I'd much prefer that a system I create myself has access to the very personal data, images, and messages that I hope to feed to it. Some of this data, including my journal entries, is among my most private and cherished possessions, and the last thing I would want is for some lame SaaS company to have access to all of it. I want ownership of my own data.

So, how am I going to do this? To start, I'll need to figure out exactly what data stores I want to include in this system — here's a non-exhaustive list to get started:

My Roam Research graph: I back this up twice a day to a private GitHub repo in various formats (including .json, the format I plan to convert all of my data into).
Google Data: For better or for worse, Google collects data on everything you do in their all-encompassing products (unless you opt out) which you can access and download through Google Takeout. This includes search history, YouTube watch history, Chrome history, location history, among many other things.
Publicly shared content: This mostly means my blog posts and tweets.
Text Messages: As an iPhone user, I can download all of my text & iMessage data into a SQLite database — it's a bit technically involved but that's what you get when poking into Apple's fortresses.
Messenger Data: Most of my Facebook data isn't very useful except for my messaging data, which can easily be accessed and downloaded through the DYI (Download Your Information) tool.
Twitter Likes: This data set likely has one of the highest signal-to-noise ratios of anything I consume. Unfortunately, the only way to access this is through their public API, which isn't meant for this purpose (it can only get you your latest 20 likes).
Pocket Content: Most of the interesting articles I find on the internet are saved to Pocket. It wouldn't make much sense to index every webpage I've come across on the internet, so maybe a curated list of articles that I've actively saved would be more useful.

When building a memex, I see two approaches to solving the problem of sorting & organizing all of your personal data: (1) the "personal search engine" approach (which I am moving forward with) and (2) the "daily dashboard & timeline" approach. The latter is something that Andrew Louis has worked on for a few years now. His system is searchable with a customized query language but isn't a search engine in the purest sense of the term.

With all of this data, I think that context is valuable and that is where I can take a page out of Andrew's system. When did I first write a note, take a picture, or send a message? What was the temperature like that day and what else might have influenced my state of mind? Context is crucial when searching for information on the internet, and even more so when considering information that you've personally created & consumed. By using this context to inform your thinking, you're building stronger self-awareness into your life and future decisions. In the best world, this turns into a virtuous cycle that further improves your own knowledge graph.

So, what is the most important piece of context to make sure I include? TIME. Knowing when I wrote something, liked something, consumed something, or otherwise captured something will be crucial for deriving other data points like items that show up on the same day or other details about what that day was like.
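To make the "items that show up on the same day" idea concrete, here is a minimal sketch in Python. It assumes every record has already been normalized to carry an ISO 8601 `timestamp`; the field names (`source`, `text`, `timestamp`), the sample records, and the helper names are all made up for illustration, and days are bucketed in UTC purely to keep the example simple.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical normalized records; the field names are assumptions for illustration.
records = [
    {"source": "roam", "text": "Notes on memex design", "timestamp": "2021-07-10T09:15:00+00:00"},
    {"source": "twitter", "text": "Liked a thread on search", "timestamp": "2021-07-10T21:40:00+00:00"},
    {"source": "pocket", "text": "Saved an article on pink noise", "timestamp": "2021-07-12T08:05:00+00:00"},
]


def day_of(record):
    """Return the UTC calendar date of a record as a YYYY-MM-DD string."""
    moment = datetime.fromisoformat(record["timestamp"]).astimezone(timezone.utc)
    return moment.date().isoformat()


def group_by_day(items):
    """Bucket records by the calendar day on which they were captured."""
    days = defaultdict(list)
    for item in items:
        days[day_of(item)].append(item)
    return dict(days)


def same_day_context(items, target):
    """Return the other records captured on the same day as `target`."""
    return [r for r in group_by_day(items).get(day_of(target), []) if r is not target]


if __name__ == "__main__":
    for neighbor in same_day_context(records, records[0]):
        print(neighbor["source"], "-", neighbor["text"])
```

Once everything is keyed by day, other day-level details (weather, location, and so on) could hang off the same key.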

What are the steps to building this?

Clean and parse data into uniform .json format for all data sources. This will undoubtedly be the hardest and most time-consuming part, since I'll be gluing together various APIs and datasets into something useful. The data is also liable to change without notice, breaking something in the process. And finally, this will also need periodic maintenance so that I can update the data over time until I automate the collection & import process.
The next step is to index the data so that it's searchable. Indexing simply means creating a uniform data structure around your disparate heaps of data so that you can quickly retrieve and process information.
Once we have all of our data in a format that is readable by the machine, we can work on the actual search algorithm. I don't need to reinvent the wheel here and can simply use the best existing full-text search algorithm. It should be able to sort the results for a search input by some measure of relevance, and return a specified number of items so we can paginate in the UI. Which brings me to...
The UI! Once we have a system that has ingested all relevant data and can sort & search through a term given to it, we will need to build a UI that can let the user do it in an intuitive way. I really like how Linus built Monocle and will likely draw a lot of inspiration from his UI once I get this far.
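The first three steps above can be sketched in a few dozen lines of Python. To be clear about the assumptions: this is not how Monocle works, and it is not "the best existing full-text search algorithm" either. It is a toy inverted index with a simple TF-IDF score, and the record shape, function names, and sample data are all hypothetical. It only shows how normalizing, indexing, ranking, and paginating fit together.

```python
import json
import math
import re
from collections import Counter, defaultdict


def normalize(source, text, timestamp):
    """Shape one item from any source into a uniform, JSON-serializable record."""
    return {"source": source, "text": text, "timestamp": timestamp}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        for token, count in Counter(tokenize(doc["text"])).items():
            index[token][doc_id] = count
    return index


def search(query, docs, index, limit=10, offset=0):
    """Rank documents by a simple TF-IDF score and return one page of results."""
    scores = defaultdict(float)
    for token in tokenize(query):
        postings = index.get(token, {})
        if not postings:
            continue  # token appears in no document
        idf = math.log(1 + len(docs) / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id for stable output.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [docs[d] for d in ranked[offset:offset + limit]]


# Made-up sample data standing in for the real sources.
docs = [
    normalize("blog", "Building a memex as a personal search engine", "2021-08-03T00:00:00+00:00"),
    normalize("twitter", "Search is hard; personal search is harder", "2021-07-08T00:00:00+00:00"),
    normalize("pocket", "Why pink noise helps focus", "2021-07-20T00:00:00+00:00"),
]
index = build_index(docs)

if __name__ == "__main__":
    print(json.dumps(search("personal search", docs, index, limit=2), indent=2))
```

The `limit` and `offset` parameters are what the UI would use to paginate. In practice, an off-the-shelf full-text search library would likely replace `build_index` and `search` entirely.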

There you have it. This is a project that will take some time to build properly, but is something that has a lot of potential to be expanded in many directions once I get the MVP out (a tool that can search through all my data). Here are some long-term use-cases that I can see myself eventually building out.

A day context for any given note (what weather was like, what you did, where you went, etc.)
Day timeline that can give you an overview of what the day was actually like
Automated habit creation (since you know quantitatively what happened on a given day)
Twilio integration for auto-tagging certain things that I want to notate (if I choose to) like people I was with, off-hand activities, etc.
To scale this beyond myself, turn this product into an Electron app with data on a local database or personal cloud so that there are minimal data privacy issues

Let's see how far I can get on this project by the next newsletter!

🍯 Best Finds

📝 The Tyranny of Numbers by Thomas J Bevan: I serendipitously found this article after deciding to take my Apple Watch off for a few days to see how I felt. Shocker — I feel great, minus the FOMO of not recording a particularly intense tennis session or the fact that I couldn't prove I was standing for 18 hours in a day.

"... and once you do this, once you are no longer tyrannised by numbers you will find that intuition, discernment and the appreciation of the intangible, the ephemeral and the beautiful will grow in their place."
👾 Github1s: This one's for my code fiends. Github1s allows you to view entire GitHub repositories on the browser as if you're browsing with VSCode! Brilliant.

Carbon is another code-adjacent tool that allows you to input code snippets and get a well formatted and aesthetic box to share it on whatever medium you prefer.
🌡 The Well-Tempered Traveler by Google: Beyond subsidizing earth-shattering science through advertising dollars, Google uses its unparalleled troves of data to do more mundane things, like making travel planning easier. This is a useful resource that shows you a visual representation of the weather (temperature and rainfall) in various cities so you can plan the optimal time to visit.
🔊 Pink Noise: I was never a fan of white noise, as it was annoying and irksome. Pink noise on the other hand feels much more amenable to my ears and I find myself able to get into a deep state of focus much more easily, especially when my environment is peppered with noise.

👋🏽 Conclusion

This month has been rich with life and inspiration, and I'm genuinely so grateful for all of the opportunities, experiences, and people I have in my life. 24 has been one hell of a year, and I can't even imagine what 25 will be like — I'm glad to have you along for this journey. Stay tuned.

Just a reminder that you can interact by replying to this email or tweeting @nikhilthota.

Ciao,
— Thot

Thot's Thoughts
