Some days ago, my friend Vinícius mentioned a project he wanted to build that involved full-text search (I'm going to call it just FTS throughout the rest of this post). This reminded me of a previous discussion about adding search functionality to Mataroa. Well, I like PostgreSQL, and I also like writing code for open source projects I have used.[^fn:1]
For some context, Mataroa is a "dead simple blogging" platform[^fn:2]. It is heavily inspired by Bear Blog, but also introduces some features of its own. I'm really fond of its founder's approach to building software, and for this reason I hereby declare the Mataroa Series open!
{{< alert class="info" >}} Keep in mind that the things implemented here might never make it into Mataroa's upstream repositories! This is being done solely as an exercise to broaden my knowledge and, perhaps, to make real contributions to projects. {{< /alert >}}
OK, let's get our hands dirty by first cloning the repository locally:
```shell
git clone https://git.sr.ht/~sirodoht/mataroa
```
Fortunately, ~sirodoht takes really good care of documenting everything, from development to deployment, which makes the project a lot easier to work with! However, as you might expect from this writer right here, let's set up our environment with Nix. For the past few weeks I've been using devenv to configure language toolchains, and I'm going to use it again. ☺️
For brevity, the snippet below omits parts of the `flake.nix` file:
```nix
default = devenv.lib.mkShell {
  inherit inputs pkgs;
  modules = [
    ({ pkgs, ... }: {
      languages.python = {
        enable = true;
        venv.enable = true;
      };
      services.postgres = {
        enable = true;
        listen_addresses = "127.0.0.1";
      };
      packages = with pkgs; [
        gcc
        gnumake
        nodePackages.pyright
      ];
    })
  ];
};
```
This should be enough to start setting up the project. According to the README, I need to run the following commands to install the Python dependencies:
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements_dev.txt
pip install -r requirements.txt
```
Now, we need to copy `.envrc.example` to `.envrc` and modify it according to our needs:
```shell
export DEBUG=1
export SECRET_KEY=secret
export DATABASE_URL=postgres://localhost:5432/<user>
export EMAIL_HOST_USER=smtp-user
export EMAIL_HOST_PASSWORD=smtp-password
use flake . --impure
```
The `DATABASE_URL` environment variable holds the URL of the PostgreSQL database set up by devenv. If you keep the `initialDatabases` attribute as an empty list, devenv sets everything up with the current user's username. Running the database is really painless: just run `devenv up` in your terminal.
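To see how that URL breaks down into connection details, here is a small illustration using only the standard library's `urllib.parse`. It is not how Mataroa itself reads the variable, and `alice` is a hypothetical username standing in for `<user>`.

```python
from urllib.parse import urlparse

# Hypothetical value: "alice" stands in for your own username.
DATABASE_URL = "postgres://localhost:5432/alice"


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into the pieces a client needs to connect."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        # The path is "/<dbname>", so drop the leading slash.
        "database": parsed.path.lstrip("/"),
        # No user in the URL, so this is None; PostgreSQL clients then
        # typically fall back to the OS username.
        "user": parsed.username,
    }


print(describe_database_url(DATABASE_URL))
```

The database name is simply the last path segment, which is why it matches the username devenv used when creating the database.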
Next, we need to add the following entries to the `/etc/hosts` file to develop locally through the URL http://mataroalocal.blog. One is the root domain and the other will be our `develop` user's domain:
```
127.0.0.1 mataroalocal.blog
127.0.0.1 develop.mataroalocal.blog
```
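The second entry matters because each blog lives on its own subdomain, and the view shown later looks the blog owner up through a `request.subdomain` attribute. How Mataroa actually sets that attribute is outside the scope of this post; the `extract_subdomain` function below is a hypothetical sketch of the idea, mapping a `Host` header to a username.

```python
from typing import Optional

ROOT_DOMAIN = "mataroalocal.blog"  # matches the /etc/hosts entries above


def extract_subdomain(host: str, root_domain: str = ROOT_DOMAIN) -> Optional[str]:
    """Hypothetical helper: return the blog subdomain, or None for the root domain."""
    # Drop the port, e.g. "develop.mataroalocal.blog:8000" -> "develop.mataroalocal.blog".
    hostname = host.split(":", 1)[0].lower()
    if hostname == root_domain:
        return None
    suffix = "." + root_domain
    if hostname.endswith(suffix):
        return hostname[: -len(suffix)]
    return None  # a host this sketch doesn't serve


print(extract_subdomain("develop.mataroalocal.blog:8000"))  # develop
print(extract_subdomain("mataroalocal.blog:8000"))  # None
```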
To register the new user, I went to http://mataroalocal.blog:8000/accounts/create/invite/ and registered the user `develop` with the password `W^$a2o5Hn5jkUNmrcQotQerjL*!xWxGE`.
Now it's time to actually start writing code. I don't know much about Django, but while reading more about PostgreSQL's FTS, I was fortunate enough to stumble upon Simon Willison's blog post about implementing search on his website. There's also Django's documentation, which is pretty extensive on this matter.
With all this information available, we can start by adding a `SearchVectorField` to the `Post` model, along with a `GinIndex` on it:
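```python
# main/models.py
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models

# ...


class Post(models.Model):
    # ...
    search_post = SearchVectorField(null=True, blank=True)

    class Meta:
        # ...
        indexes = [
            GinIndex(fields=["search_post"]),
        ]
```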
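Conceptually, the `search_post` column holds a PostgreSQL `tsvector`: a sorted list of normalized lexemes, each with the positions where it occurs. The `toy_tsvector` function below is a rough, hypothetical approximation of `to_tsvector` for illustration only: it lowercases, splits on anything that isn't a letter or digit and skips a tiny made-up stop-word list, but it does no stemming, unlike PostgreSQL's real text search configurations.

```python
import re

# Hypothetical, tiny stop-word list; PostgreSQL's configurations ship their own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def toy_tsvector(text: str) -> dict[str, list[int]]:
    """Map each kept word to the 1-based positions where it appears."""
    vector: dict[str, list[int]] = {}
    words = re.findall(r"[a-z0-9]+", text.lower())
    for position, word in enumerate(words, start=1):
        # Stop words are dropped but still consume a position, as in PostgreSQL.
        if word in STOP_WORDS:
            continue
        vector.setdefault(word, []).append(position)
    # Keep lexemes sorted, like a real tsvector.
    return dict(sorted(vector.items()))


print(toy_tsvector("The quick brown fox jumps over the lazy fox"))
# {'brown': [3], 'fox': [4, 9], 'jumps': [5], 'lazy': [8], 'over': [6], 'quick': [2]}
```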
We also need to add the `django.contrib.postgres` app to our Django application:
```python
# mataroa/settings.py
INSTALLED_APPS = [
    # ...
    "django.contrib.postgres",
    # ...
]
```
However, the documentation of the `SearchVectorField` class carries a warning:

> You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation.
This is not great, as it would introduce the first piece of custom SQL into Mataroa. Not that custom SQL is always bad, but keeping everything in the Django codebase is definitely a plus!
Fortunately, Simon Willison mentions that we can use Django's Signals to keep the search vector up to date on every model update. You can think of Signals as a way to send messages/notifications from specific senders to specific receivers. We can leverage this to notify a function whenever a post is saved; this function then updates the search vector for us:
```python
# main/signals.py
from django.contrib.postgres.search import SearchVector
from django.db.models.signals import post_save
from django.dispatch import receiver

from main.models import Post


@receiver(post_save, sender=Post)
def update_search_post(sender, instance, **kwargs):
    # Recompute the vector from the title and body. QuerySet.update() issues a
    # plain SQL UPDATE without calling save(), so post_save isn't fired again.
    Post.objects.filter(id=instance.id).update(
        search_post=SearchVector('title') + SearchVector('body')
    )
```
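```python
# main/apps.py
from django.apps import AppConfig

# ...


class MainConfig(AppConfig):
    name = "main"

    def ready(self):
        # Importing the module registers the @receiver-decorated handlers.
        import main.signals  # noqa: F401
```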
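To make the sender/receiver idea more concrete, here is a deliberately tiny, hypothetical re-implementation of the pattern in plain Python. `ToySignal` and `ToyPost` are made-up names; this is not Django's `Signal` class (which also deals with weak references, dispatch UIDs and more), just a sketch of why a connected receiver runs after every save.

```python
from typing import Any, Callable


class ToySignal:
    """Minimal observer: receivers register, senders broadcast."""

    def __init__(self) -> None:
        self._receivers: list[tuple[Callable[..., Any], Any]] = []

    def connect(self, receiver: Callable[..., Any], sender: Any = None) -> None:
        self._receivers.append((receiver, sender))

    def send(self, sender: Any, **kwargs: Any) -> None:
        for receiver, wanted_sender in self._receivers:
            # A receiver bound to a specific sender only hears from that sender.
            if wanted_sender is None or wanted_sender is sender:
                receiver(sender=sender, **kwargs)


post_save = ToySignal()


class ToyPost:
    """Hypothetical stand-in for the Post model."""

    def __init__(self, title: str, body: str) -> None:
        self.title = title
        self.body = body
        self.search_post = ""

    def save(self) -> None:
        # (Persisting the row would happen here.) Then announce the save.
        post_save.send(sender=ToyPost, instance=self)


def update_search_post(sender: Any, instance: ToyPost, **kwargs: Any) -> None:
    # Crude stand-in for SearchVector('title') + SearchVector('body').
    instance.search_post = f"{instance.title} {instance.body}".lower()


post_save.connect(update_search_post, sender=ToyPost)

post = ToyPost("Hello Mataroa", "Full-text search")
post.save()
print(post.search_post)  # hello mataroa full-text search
```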
After all these modifications, we need to run `python manage.py makemigrations` to generate a new migration containing our new field and index, and `python manage.py migrate` to apply it:
```shell
$ python manage.py makemigrations
Migrations for 'main':
  main/migrations/0087_post_search_post_post_main_post_search__b3a77b_gin.py
    - Add field search_post to post
    - Create index main_post_search__b3a77b_gin on field(s) search_post of model post
$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, main, sessions
Running migrations:
  Applying main.0087_post_search_post_post_main_post_search__b3a77b_gin... OK
```
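The migration above creates a GIN (Generalized Inverted Index) index. Conceptually, instead of scanning every row's vector, it maps each lexeme to the set of rows that contain it, so a lookup starts from the search terms. The sketch below is only a dictionary-based illustration of that inverted-index idea, with hypothetical post ids and pre-tokenized words; PostgreSQL's on-disk structure is far more involved.

```python
from collections import defaultdict

# Hypothetical posts, already reduced to lexemes (e.g. by to_tsvector).
POSTS = {
    1: {"postgres", "full", "text", "search"},
    2: {"nix", "flake", "devenv"},
    3: {"django", "postgres", "signal"},
}


def build_inverted_index(posts: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each lexeme to the ids of the posts containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for post_id, lexemes in posts.items():
        for lexeme in lexemes:
            index[lexeme].add(post_id)
    return dict(index)


def lookup(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Ids of posts containing every term (intersection of posting lists)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(POSTS)
print(sorted(index["postgres"]))  # [1, 3]
print(lookup(index, "postgres", "django"))  # {3}
```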
For the visual part, we need to modify the `blog_index.html` template. My idea was to put the search bar in the blog header, next to the title:
```html
<!-- main/templates/main/blog_index.html -->
<!-- ... -->
{% if blog_user.blog_title %}
<header>
  <h1 itemprop="name">{{ blog_user.blog_title }}</h1>
  <form action="{{ request.path }}" method="GET">
    <input type="search" class="search-input" name="q" value="{{ q }}">
    <input type="submit" class="search-submit" value="Search">
  </form>
</header>
{% endif %}
<!-- ... -->
```
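Since the form uses `method="GET"`, the browser serializes the `q` field into the URL's query string, and that is what the view reads back. The standard-library snippet below mimics that round trip; `urllib.parse` stands in for both the browser and Django's `QueryDict` here, and the search phrase is made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# What the browser builds when the form is submitted (hypothetical search phrase).
query_string = urlencode({"q": "full text search"})
url = f"http://develop.mataroalocal.blog:8000/?{query_string}"
print(url)  # http://develop.mataroalocal.blog:8000/?q=full+text+search

# Reading it back, roughly what request.GET.get("q") gives the view.
params = parse_qs(urlsplit(url).query)
search_query = params.get("q", [None])[0]
print(search_query)  # full text search
```

One difference worth knowing: `parse_qs` drops blank values by default, while Django's `QueryDict` keeps them, so an empty search box reaches the view as `""`, which an `if search_query:` check treats as no search at all.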
A small CSS tweak for this new `header` tag:
```css
/* main/templates/assets/style.css */
/* ... */
header {
  border-bottom: 2px solid var(--light-grey-color);
  display: flex;
  justify-content: space-between;
  align-items: center;
}

header h1 {
  width: 50%;
}

header > form > input {
  margin: 0;
  width: 50%;
}

header > form > input[type="submit"] {
  margin: 0;
  width: 44%;
}
/* ... */
```
For now, it looks like the image below. However, if the maintainer accepts this feature, we may discuss a better place for the search bar. Maybe at the bottom of the page, like Hacker News does?
{{< image src="/blog/mataroa-series-1-introducing-search/01.png" side="center" >}} Can you tell that web development is my passion? {{< /image >}}
The last missing piece is the search itself, which happens in the `index` view. The form sends a `q` query parameter, which we capture in the view to filter the posts accordingly.
```python
# main/views.py
# ...
def index(request):
    search_query = request.GET.get("q", None)
    if hasattr(request, "subdomain"):
        if models.User.objects.filter(username=request.subdomain).exists():
            if request.user.is_authenticated and request.user == request.blog_user:
                posts = models.Post.objects.filter(owner=request.blog_user)
                if search_query:
                    posts = posts.filter(search_post=search_query)
                posts = posts.defer("body")
            else:
                models.AnalyticPage.objects.create(user=request.blog_user, path="index")
                posts = models.Post.objects.filter(
                    owner=request.blog_user,
                    published_at__isnull=False,
                    published_at__lte=timezone.now().date(),
                )
                if search_query:
                    posts = posts.filter(search_post=search_query)
                posts = posts.defer("body")
            # ...
```
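When a plain string is compared against a `SearchVectorField`, as in `posts.filter(search_post=search_query)`, Django wraps it in a `SearchQuery`, which by default compiles to PostgreSQL's `plainto_tsquery` and is matched with the `@@` operator. `plainto_tsquery` ANDs the normalized words together, so every word of the query must appear in the post. The `plain_match` function below is a simplified, hypothetical model of that matching rule over toy vectors (plain sets of words, no stemming or stop words).

```python
import re


def toy_lexemes(text: str) -> set[str]:
    """Crude normalization: lowercase words only (no stemming, no stop words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def plain_match(vector: set[str], query: str) -> bool:
    """True if every word of the query occurs in the vector (AND semantics)."""
    terms = toy_lexemes(query)
    # An empty query matches nothing in this sketch; the view skips filtering then anyway.
    return bool(terms) and terms <= vector


posts = {
    "Introducing search": toy_lexemes("Introducing search to Mataroa with PostgreSQL"),
    "Nix setup": toy_lexemes("Setting up PostgreSQL with devenv"),
}

for title, vector in posts.items():
    print(title, plain_match(vector, "postgresql search"))
# Introducing search True
# Nix setup False
```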
That's it: this implements the whole search logic for Mataroa. Currently, it searches the title and body of each post and lists the matches. The only caveat is that existing posts must be indexed too, by running the following Python code once (for example, in `python manage.py shell`) to make them searchable:
```python
from django.contrib.postgres.search import SearchVector
from main.models import Post

Post.objects.update(search_post=SearchVector('title') + SearchVector('body'))
```
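The `SearchVector('title') + SearchVector('body')` expression concatenates two vectors with PostgreSQL's `||` operator. According to the PostgreSQL documentation, positions in the right-hand vector are shifted by the largest position in the left-hand one, so body words keep positions after the title's. The `concat_vectors` function below is a toy model of that rule, with vectors as plain dicts of lexeme to positions; it reproduces the documentation's `'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector` example.

```python
Vector = dict[str, list[int]]


def concat_vectors(left: Vector, right: Vector) -> Vector:
    """Toy model of tsvector || tsvector: shift right-hand positions past the left."""
    offset = max((pos for positions in left.values() for pos in positions), default=0)
    result = {lexeme: list(positions) for lexeme, positions in left.items()}
    for lexeme, positions in right.items():
        result.setdefault(lexeme, []).extend(pos + offset for pos in positions)
    # tsvectors are kept sorted by lexeme, with sorted positions.
    return {lexeme: sorted(result[lexeme]) for lexeme in sorted(result)}


title = {"a": [1], "b": [2]}
body = {"c": [1], "d": [2], "b": [3]}
print(concat_vectors(title, body))  # {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
```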
[^fn:1]: Fun fact: Mataroa was the first blogging platform I used seriously.
[^fn:2]: https://nutcroft.com/blog/welcome-to-mataroa/