Mataroa Series #1: Introducing Search

Some days ago, my friend Vinícius mentioned a project he wanted to build that involved full-text search (I’m going to call it just FTS throughout the rest of this post). This reminded me of an earlier discussion about adding search functionality to Mataroa. Well, I like PostgreSQL and I also like writing code for open source projects I have used.[1]

For some context, Mataroa is a “dead simple blogging” platform.[2] It is heavily inspired by Bear Blog, but also introduces some other features. I’m really fond of its founder’s approach to building software, and for this reason I declare the Mataroa Series open!

🗒️ Keep in mind that things implemented here might not even enter Mataroa’s upstream repositories! This is being done solely as an exercise to broaden my knowledge and perhaps make real contributions to projects.

OK, let’s get our hands dirty by first cloning the repository locally:

git clone https://git.sr.ht/~sirodoht/mataroa

Fortunately, ~sirodoht takes really good care of documenting everything, from development to deployment. This makes the project a lot easier to work with! However, as expected from this writer right here, let’s set up our environment with Nix. For the past few weeks I’ve been using devenv to configure the languages for my projects, and I’m going to use it again here. ☺️

For brevity, the code below is omitting parts of the flake.nix file:

default = devenv.lib.mkShell {
  inherit inputs pkgs;
  modules = [
    ({ pkgs, ... }: {
      languages.python = {
        enable = true;
        venv.enable = true;
      };

      services.postgres = {
        enable = true;
        listen_addresses = "127.0.0.1";
      };

      packages = with pkgs; [
        gcc
        gnumake
        nodePackages.pyright
      ];
    })
  ];
};

This should give us enough to start setting up the project. According to the README, I need to run the following commands to download the Python dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements_dev.txt
pip install -r requirements.txt

Now, we need to copy the .envrc.example to .envrc and modify it according to our needs:

export DEBUG=1
export SECRET_KEY=secret
export DATABASE_URL=postgres://localhost:5432/<user>
export EMAIL_HOST_USER=smtp-user
export EMAIL_HOST_PASSWORD=smtp-password
use flake . --impure

The environment variable DATABASE_URL holds the URL of the PostgreSQL database set up by devenv. If you keep the initialDatabases attribute as an empty list, it will set everything up with the current user’s username. Running the database is really painless: just run devenv up in your terminal.
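
As a quick sanity check, the URL can be taken apart with Python’s standard library to see which host, port and database it points at. This is only a sketch: the username develop is a placeholder standing in for <user>, and describe_database_url is a hypothetical helper, not something that exists in Mataroa.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a postgres:// URL into the pieces a database client needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # The path holds the database name, prefixed with a slash.
        "database": parts.path.lstrip("/"),
    }


# "develop" is a placeholder for your own username (assumption).
info = describe_database_url("postgres://localhost:5432/develop")
print(info)
```

If the database name printed here is not your username, the connection will likely fail with the default devenv setup described above.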

Now, we need to add the following entries to the /etc/hosts file so we can develop locally through the URL http://mataroalocal.blog. One is the root domain and the other is the subdomain for our develop user.

127.0.0.1 mataroalocal.blog
127.0.0.1 develop.mataroalocal.blog

In order to register the new user, I went to the page http://mataroalocal.blog:8000/accounts/create/invite/ and registered the user develop with the W^$a2o5Hn5jkUNmrcQotQerjL*!xWxGE password.

This is the time to actually start writing code here. I don’t know much about Django, but while reading more about PostgreSQL’s FTS, I was fortunate enough to stumble upon Simon Willison’s blog post talking about implementing a text search on his website. There’s also Django’s documentation, which is pretty extensive in this matter.

With all this information available, we can start implementing the SearchVectorField on the Post model and its GinIndex:

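# main/models.py
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
# ...

class Post(models.Model):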
    # ...
    search_post = SearchVectorField(null=True, blank=True)

    class Meta:
        # ...
        indexes = [
            GinIndex(fields=["search_post"])
        ]

We also need to install the django.contrib.postgres app into our Django application:

# mataroa/settings.py
INSTALLED_APPS = [
    # ...
    "django.contrib.postgres",
    # ...
]

Nonetheless, there’s a warning on the SearchVectorField class stating:

You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation.

This is not good, as this will introduce the first piece of custom SQL code on Mataroa. I mean, not that this is always bad, but keeping everything on the Django codebase is definitely a plus!

Fortunately, Simon Willison mentions that we can use Django’s Signals to update the index on every model update. You can think of Signals as a way to send messages/notifications between specific senders and receivers. We can leverage this to notify a function whenever a post is saved, and this function will then update the index for us:

# main/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.contrib.postgres.search import SearchVector

from main.models import Post


@receiver(post_save, sender=Post)
def update_search_post(sender, instance, **kwargs):
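    # Recompute the search vector for the saved post. Using update() on a
    # queryset does not call save(), so this signal is not triggered again.
    Post.objects.filter(id=instance.id).update(
        search_post=SearchVector("title") + SearchVector("body")
    )

# main/apps.py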
# ...
class MainConfig(AppConfig):
    name = "main"

    def ready(self):
        # Imported only for its side effect: registering the receiver.
        import main.signals  # noqa: F401
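
To make the sender/receiver idea more concrete, here is a toy dispatcher in plain Python. It is not how Django implements Signals; ToySignal, toy_post_save and the simplified Post class are made up for this illustration, and the "index" is just a lowercased string.

```python
from collections import defaultdict


class ToySignal:
    """A toy stand-in for a signal: it keeps receivers per sender and calls them."""

    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver, sender):
        self._receivers[sender].append(receiver)

    def send(self, sender, **kwargs):
        # Call every receiver registered for this sender, in registration order.
        return [
            receiver(sender=sender, **kwargs)
            for receiver in self._receivers[sender]
        ]


toy_post_save = ToySignal()


class Post:
    def __init__(self, title, body):
        self.title = title
        self.body = body
        self.search_text = None

    def save(self):
        # A real model would write to the database here, then emit the signal.
        toy_post_save.send(sender=Post, instance=self)


def update_search_text(sender, instance, **kwargs):
    # Stand-in for recomputing the search vector from title and body.
    instance.search_text = f"{instance.title} {instance.body}".lower()


toy_post_save.connect(update_search_text, sender=Post)

post = Post("Hello Mataroa", "Searching posts")
post.save()
print(post.search_text)  # hello mataroa searching posts
```

The point is that Post.save() knows nothing about indexing: the receiver is attached from the outside, which is what the @receiver decorator does in the real code.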

After all these modifications, we need to run python manage.py makemigrations to generate the migration containing our new field and index, and then python manage.py migrate to apply it.

$ python manage.py makemigrations
Migrations for 'main':
  main/migrations/0087_post_search_post_post_main_post_search__b3a77b_gin.py
    - Add field search_post to post
    - Create index main_post_search__b3a77b_gin on field(s) search_post of model post

$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, main, sessions
Running migrations:
  Applying main.0087_post_search_post_post_main_post_search__b3a77b_gin... OK

For the visual part, we need to modify the blog_index.html template. My idea was to put the search form in the blog header, next to the title:

<!-- main/templates/main/blog_index.html -->
<!-- ... -->
{% if blog_user.blog_title %}
  <header>
    <h1 itemprop="name">{{ blog_user.blog_title }}</h1>
    <form action="{{ request.path }}" method="GET">
      <input type="search" class="search-input" name="q" value="{{ q }}">
      <input type="submit" class="search-submit" value="Search">
    </form>
  </header>
{% endif %}
<!-- ... -->

A small tweak on the CSS for this new header tag:

/* main/templates/assets/style.css */
/* ... */
header {
    border-bottom: 2px solid var(--light-grey-color);
    display: flex;
    justify-content: space-between;
    align-items: center;
}

header h1 {
    width: 50%;
}

header > form > input {
    margin: 0;
    width: 50%;
}

header > form > input[type="submit"] {
    margin: 0;
    width: 44%;
}
/* ... */

For now, it looks like the image below. If the maintainer accepts this new feature, though, we may discuss a new place for this search bar. Maybe at the bottom of the page, like Hacker News does?


Can you tell that web development is my passion?

The last missing piece is the search itself, which has to happen in the index view that serves the blog index page. The form sends a query parameter q, and we have to capture it in the view to filter the posts accordingly.

# main/views.py
# ...
def index(request):
    search_query = request.GET.get("q", None)
    if hasattr(request, "subdomain"):
        if models.User.objects.filter(username=request.subdomain).exists():
            if request.user.is_authenticated and request.user == request.blog_user:
                posts = models.Post.objects.filter(owner=request.blog_user)

                if search_query:
                    posts = posts.filter(search_post=search_query)

                posts = posts.defer("body")
            else:
                models.AnalyticPage.objects.create(user=request.blog_user, path="index")
                posts = models.Post.objects.filter(
                    owner=request.blog_user,
                    published_at__isnull=False,
                    published_at__lte=timezone.now().date(),
                )

                if search_query:
                    posts = posts.filter(search_post=search_query)

                posts = posts.defer("body")
# ...
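
The same filter appears in both branches of the view, so one possible cleanup is to move it into a small helper. The sketch below is only a suggestion: filter_by_search is a hypothetical name, and stripping whitespace from the query is an addition of mine that the view above does not do. FakeQuerySet exists only so the snippet runs without Django.

```python
def filter_by_search(posts, search_query):
    """Narrow a queryset to search matches, or return it unchanged without a query."""
    query = (search_query or "").strip()
    if not query:
        return posts
    # Same lookup as in the view: match the query against the search_post field.
    return posts.filter(search_post=query)


class FakeQuerySet:
    """Minimal stand-in for a Django queryset, only to exercise the helper."""

    def __init__(self, filters=None):
        self.filters = filters or {}

    def filter(self, **kwargs):
        return FakeQuerySet({**self.filters, **kwargs})


unfiltered = filter_by_search(FakeQuerySet(), "   ")
filtered = filter_by_search(FakeQuerySet(), " nix ")
print(unfiltered.filters)  # {}
print(filtered.filters)    # {'search_post': 'nix'}
```

In the view, each branch would then call the helper with its own posts queryset and search_query before deferring the body.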

That’s it, this implements the whole search logic for Mataroa. Currently, it searches the title and body of each post and lists the matches. The only caveat is that the signal only indexes posts saved from now on, so one needs to run the following Python code to index the old data and make it searchable too:

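# Run inside python manage.py shell so that Django is configured.
from django.contrib.postgres.search import SearchVector

from main.models import Post

# One-off backfill: compute the search vector for every existing post.
Post.objects.update(search_post=SearchVector('title') + SearchVector('body'))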
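
For intuition about what ends up in search_post, here is a toy approximation in plain Python. It is a heavy simplification and not what PostgreSQL actually does: a real tsvector is built with a text search configuration that may also stem words and drop stop words. The names to_vector and matches are made up for this sketch.

```python
import re


def to_vector(*texts):
    """Map each lowercased word to the positions where it occurs."""
    vector = {}
    position = 0
    for text in texts:
        for word in re.findall(r"\w+", text.lower()):
            position += 1
            vector.setdefault(word, []).append(position)
    return vector


def matches(vector, query):
    """Return True when every word of the query appears in the vector."""
    words = re.findall(r"\w+", query.lower())
    return bool(words) and all(word in vector for word in words)


vector = to_vector("Introducing Search", "Full-text search with PostgreSQL")
print(vector["search"])                      # [2, 5]
print(matches(vector, "postgresql search"))  # True
print(matches(vector, "django"))             # False
```

The title and body are combined into a single vector, much like SearchVector('title') + SearchVector('body') above, and a query only matches when all of its words are present.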

  1. Fun fact: Mataroa was the first blogging platform I used seriously.

  2. https://nutcroft.com/blog/welcome-to-mataroa/

