Mataroa Series #1: Introducing Search

Some days ago, my friend Vinícius mentioned a project he wanted to build that involved full-text search (I’ll call it FTS for the rest of this post). This reminded me of an earlier discussion about adding search functionality to Mataroa. Well, I like PostgreSQL and I also like to write code for open source projects I have used.[1]

For some context, Mataroa is a “dead simple blogging” platform.[2] It is heavily inspired by Bear Blog, but also introduces some features of its own. I’m really fond of its founder’s approach to building software, and for this reason I declare the Mataroa Series open!

🗒️ Keep in mind that things implemented here might not even enter Mataroa’s upstream repositories! This is being done solely as an exercise to broaden my knowledge and perhaps make real contributions to projects.

OK, let’s get our hands dirty by first cloning the repository locally:

git clone https://git.sr.ht/~sirodoht/mataroa

Fortunately, ~sirodoht takes really good care of documenting everything, from development to deployment. This makes it a lot easier to work with the project! However, as expected from this writer right here, let’s set up our environment with Nix. For the past few weeks I’ve been using devenv to configure my development environments, and I’m going to use it again here. ☺️

For brevity, the code below omits parts of the flake.nix file:

default = devenv.lib.mkShell {
  inherit inputs pkgs;
  modules = [
    ({ pkgs, ... }: {
      languages.python = {
        enable = true;
        venv.enable = true;
      };

      services.postgres = {
        enable = true;
        listen_addresses = "127.0.0.1";
      };

      packages = with pkgs; [
        gcc
        gnumake
        nodePackages.pyright
      ];
    })
  ];
};

This should give us enough to start setting up the project. According to the README, I need to run the following commands to download the Python dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements_dev.txt
pip install -r requirements.txt

Now, we need to copy .envrc.example to .envrc and modify it according to our needs:

export DEBUG=1
export SECRET_KEY=secret
export DATABASE_URL=postgres://localhost:5432/<user>
export EMAIL_HOST_USER=smtp-user
export EMAIL_HOST_PASSWORD=smtp-password
use flake . --impure

The environment variable DATABASE_URL holds the URL of the PostgreSQL database set up by devenv. If you keep the initialDatabases attribute as an empty list, it will set everything up with the current user’s username. Running the database is really painless: just run devenv up in your terminal.
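
To make the pieces of DATABASE_URL explicit, here is a small sketch that splits such a URL with Python’s standard library. The database name develop is an assumption standing in for <user>, and Mataroa itself may well parse the variable in a different way.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a postgres:// URL into the parts a client needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # The path starts with "/"; the rest is the database name.
        "name": parts.path.lstrip("/"),
    }


# "develop" is a placeholder for your own username.
print(describe_database_url("postgres://localhost:5432/develop"))
```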

Now, we need to add the following entries to the /etc/hosts file so we can develop locally through the URL http://mataroalocal.blog. One is the root domain and the other is the domain of our develop user.

127.0.0.1 mataroalocal.blog
127.0.0.1 develop.mataroalocal.blog
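
A quick way to double-check the entries is to scan the hosts file for both names. This is a minimal sketch: the helper missing_hosts is my own invention, not part of Mataroa, and the sample text stands in for the real file.

```python
REQUIRED = ["mataroalocal.blog", "develop.mataroalocal.blog"]


def missing_hosts(hosts_text, required=REQUIRED):
    """Return the required hostnames that have no entry in hosts_text."""
    found = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "address name [aliases...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            found.update(fields[1:])
    return [name for name in required if name not in found]


# Sample content with only one of the two entries present.
sample = "127.0.0.1 localhost\n127.0.0.1 mataroalocal.blog\n"
print(missing_hosts(sample))
```

To check the real file, pass the contents of /etc/hosts instead of sample.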

In order to register the new user, I went to the page http://mataroalocal.blog:8000/accounts/create/invite/ and registered the user develop with the W^$a2o5Hn5jkUNmrcQotQerjL*!xWxGE password.

Now it’s time to actually start writing code. I don’t know much about Django, but while reading more about PostgreSQL’s FTS, I was fortunate enough to stumble upon Simon Willison’s blog post about implementing text search on his website. There’s also Django’s documentation, which is pretty extensive on this matter.

With all this information available, we can start implementing the SearchVectorField on the Post model and its GinIndex:

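# main/models.py
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField

class Post(models.Model):
    # ...
    # Precomputed search document; it has to be kept up to date separately
    search_post = SearchVectorField(null=True, blank=True)

    class Meta:
        # ...
        indexes = [
            GinIndex(fields=["search_post"])
        ]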
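
For some intuition on why the field and its index help, here is a toy inverted index in plain Python. It only illustrates the idea of mapping each word to the posts that contain it; PostgreSQL’s tsvector and GIN index do much more (stemming and stop-word removal, for example), and every name below is made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map every word to the set of post ids whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index


def search(index, query):
    """Return the ids of posts containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


posts = {1: "Introducing search", 2: "Nix and PostgreSQL", 3: "PostgreSQL search"}
index = build_index(posts)
print(search(index, "postgresql search"))
```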

We also need to install the django.contrib.postgres app into our Django application:

# mataroa/settings.py
INSTALLED_APPS = [
    # ...
    "django.contrib.postgres",
    # ...
]

However, Django’s documentation on SearchVectorField comes with a warning:

You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation.

This is not good, as this will introduce the first piece of custom SQL code on Mataroa. I mean, not that this is always bad, but keeping everything on the Django codebase is definitely a plus!

Fortunately, Simon Willison mentions that we can use Django’s Signals to update the index whenever the model changes. You can think of Signals as a way to send messages/notifications between specific senders and receivers. We can leverage this to notify a function when a post is saved, and that function will then update the index for us:

# main/signals.py
from django.contrib.postgres.search import SearchVector
from django.db.models.signals import post_save
from django.dispatch import receiver

from main.models import Post


@receiver(post_save, sender=Post)
def update_search_post(sender, instance, **kwargs):
    # QuerySet.update() does not fire post_save, so this cannot loop.
    Post.objects.filter(id=instance.id).update(
        search_post=SearchVector("title") + SearchVector("body")
    )

# main/apps.py
# ...
class MainConfig(AppConfig):
    name = "main"

    def ready(self):
        # Importing the module registers the receiver above.
        import main.signals  # noqa: F401
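
If Signals are new to you, the mechanism can be sketched in a few lines of plain Python. This is only an illustration of the sender/receiver idea, not how Django implements it; the Signal class and the names below are invented for the example.

```python
class Signal:
    """A tiny stand-in for a signal: it keeps a list of receivers."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver  # returning it lets connect() work as a decorator

    def send(self, sender, **kwargs):
        # Call every receiver and collect what each one returns.
        return [receiver(sender=sender, **kwargs) for receiver in self._receivers]


post_saved = Signal()
indexed = []


@post_saved.connect
def reindex(sender, instance, **kwargs):
    # In the real code, this is where the search vector gets refreshed.
    indexed.append(instance)


post_saved.send(sender="Post", instance="my first post")
print(indexed)
```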

After all these modifications, we need to run python manage.py makemigrations to generate the new migrations containing our new field and index and python manage.py migrate to apply them.

$ python manage.py makemigrations
Migrations for 'main':
  main/migrations/0087_post_search_post_post_main_post_search__b3a77b_gin.py
    - Add field search_post to post
    - Create index main_post_search__b3a77b_gin on field(s) search_post of model post

$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, main, sessions
Running migrations:
  Applying main.0087_post_search_post_post_main_post_search__b3a77b_gin... OK

For the visual part, we need to modify the blog_index.html template. My idea was to put the search form in the blog header, next to the title:

<!-- main/templates/main/blog_index.html -->
<!-- ... -->
{% if blog_user.blog_title %}
  <header>
    <h1 itemprop="name">{{ blog_user.blog_title }}</h1>
    <form action="{{ request.path }}" method="GET">
      <input type="search" class="search-input" name="q" value="{{ q }}">
      <input type="submit" class="search-submit" value="Search">
    </form>
  </header>
{% endif %}
<!-- ... -->
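
Since the form uses GET, the search term travels in the URL’s query string. Here is a small standard-library sketch of that round trip, with a made-up search term:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# What the browser builds when the form is submitted with "full text".
url = "http://develop.mataroalocal.blog:8000/?" + urlencode({"q": "full text"})
print(url)

# What the server side recovers from that URL.
query = parse_qs(urlsplit(url).query)
print(query["q"][0])
```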

A small tweak to the CSS for this new header tag:

/* main/templates/assets/style.css */
/* ... */
header {
    border-bottom: 2px solid var(--light-grey-color);
    display: flex;
    justify-content: space-between;
    align-items: center;
}

header h1 {
    width: 50%;
}

header > form > input {
    margin: 0;
    width: 50%;
}

header > form > input[type="submit"] {
    margin: 0;
    width: 44%;
}
/* ... */

For now, it looks like the image below. However, if the maintainer accepts this new feature, we may discuss a new place for this search bar. Maybe at the bottom of the page, like Hacker News does?


[Image: Can you tell that web development is my passion?]

The last missing piece is the search itself, which has to happen in the view that renders blog_index.html. The form sends a query parameter q, and the view has to capture it to filter the posts accordingly.

# main/views.py
# ...
def index(request):
    search_query = request.GET.get("q", None)
    if hasattr(request, "subdomain"):
        if models.User.objects.filter(username=request.subdomain).exists():
            if request.user.is_authenticated and request.user == request.blog_user:
                posts = models.Post.objects.filter(owner=request.blog_user)

                if search_query:
                    posts = posts.filter(search_post=search_query)

                posts = posts.defer("body")
            else:
                models.AnalyticPage.objects.create(user=request.blog_user, path="index")
                posts = models.Post.objects.filter(
                    owner=request.blog_user,
                    published_at__isnull=False,
                    published_at__lte=timezone.now().date(),
                )

                if search_query:
                    posts = posts.filter(search_post=search_query)

                posts = posts.defer("body")
# ...
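
Both branches repeat the same filter, so it could be pulled into a small helper. This is a minimal sketch: apply_search is a hypothetical name, not something that exists in Mataroa, and FakeQuerySet is only there so the snippet runs without Django.

```python
def apply_search(posts, search_query):
    """Filter posts by the search field, ignoring empty or blank queries."""
    if search_query and search_query.strip():
        return posts.filter(search_post=search_query.strip())
    return posts


class FakeQuerySet:
    """Stand-in for a Django QuerySet that records the filters applied."""

    def __init__(self, filters=()):
        self.filters = tuple(filters)

    def filter(self, **kwargs):
        return FakeQuerySet(self.filters + (kwargs,))


print(apply_search(FakeQuerySet(), "nix").filters)
print(apply_search(FakeQuerySet(), "   ").filters)
```

In the view, each branch would then call posts = apply_search(posts, search_query) before posts.defer("body").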

That’s it: this implements the whole search logic for Mataroa. Currently, it searches the title and body of each article and lists the matches. The only caveat is that one needs to run the following Python code to index the old data and make it searchable too:

from django.contrib.postgres.search import SearchVector

from main.models import Post

Post.objects.update(search_post=SearchVector('title') + SearchVector('body'))

  1. Fun fact: Mataroa was the first blogging platform I used seriously.

  2. https://nutcroft.com/blog/welcome-to-mataroa/

