I’ve been working with SQL databases for over 20 years but this year was my first time using a NoSQL database.
Previous conversations with developers often went along the lines of them trying to tell me NoSQL is “better” and me not quite seeing a benefit. At the time I never really had a reason to try something different - what I had worked and was low risk.
It was during Coronavirus lockdown, I was home schooling so couldn’t take on a commercial project.
I took on a pro bono project to help a friend running a small business by setting up a simple booking system.
Usually in the projects I’ve worked on as part of a team salaries have been the biggest cost and hosting costs are relatively minor unless the site is getting a lot of traffic.
Because I was working for free, for a business that couldn’t operate at capacity - I needed to keep hosting costs down.
Mostly I’ve worked on PHP sites (many using Drupal) with a MySQL database backend.
Just paying for a database service was a barrier for this project.
I’d recently setup a couple of static sites - one with Amazon Web Services, and one with Google Firebase. Firebase was hands down the easiest to use with a good free tier and very simple hosting including a free SSL certificate. By comparison AWS needed a lot more setup for the same result.
I prefer to use Open Source technologies with good support - to avoid vendor lock in. But this was a short term project and seemed a perfect time to try something new.
For the traffic levels I expected I was confident we would be well inside the free tier including :
- Static asset hosting
- Cloud functions (like AWS lambda)
- Firestore NoSQL database
People make a big deal of how websites scale up to deal with large amounts of traffic. One of the beauties of this system is that it also scales down with zero costs (and zero carbon footprint) when it is idle.
In short: Firebase is Free for my use-case and scales from zero to well beyond where I need to go.
scales from zero
The downside was learning something new - on a commercial project with a tight deadline I would have been wary of this. But I had time, there was no money on the line, I was doing this partly in order to learn something new anyway.
The Firestore Paradigm Shift
The Client Accesses the Database Directly
This was the first big shock to me. I’m used to doing everything to prevent this - where all data access goes via PHP code and is handled very carefully to avoid things like SQL injection attacks.
Firestore separates out data, query language, and schema changes - you can’t have an injection attack because you can’t have extra commands hidden in data.
The Firestore user is the Real Logged in user
With SQL, access is granted to the webserver user - which must be able to perform any and all functions the application needs. It is unusual in my experience to structure an application so that different front end users result in different database users.
With Firestore, the user identity of the logged in user on the website is used to validate access.
Firestore Allows Granular Access Control
Firestore also offers more granular access control with rules . A common pattern is to define public read-only data, and private data only readable (or writable) by the user who owns it.
This means firestore is a really good fit for apps that mainly store user data for use only by that user - because yes you could hack that data but you are only hacking your own data.
With SQL you define the structure of data when you create the database. Firestore schema just depends on what data you put into it.
You can define additional validation rules for Firebase - for this stage of the project I didn’t do that as it didn’t seem critical for this short term project.
One of the consequences of this is what happens if you change the data design
Part way through the project I added the ability to cancel bookings and added a cancelled field to booking data from then on.
In SQL this would have meant adding a “cancelled” column to the bookings table with a default value FALSE.
All rows (new and old) would have a value and a query like this would be trivial.
SELECT * from bookings WHERE cancelled == FALSE;
In Firestore you can’t do this - the old data didn’t have a false value - they didn’t have a cancelled field and you can’t query on the non-existence of data.
In the end I had to write a conversion tool that added the field to all old rows. Not a huge problem but a change I hadn’t anticipated,
You can’t do things like count the number of bookings directly.
SELECT count(*) from bookings;
What you can do is either keep a count somewhere and increment it for each booking - or fetch all bookings and count them.
You just can’t do them.
I did at one point refactor some data and I had to write conversion scripts that selected all documents iterated through them and create new data structures via admin functions.
If I had been able to use SQL this would have been a lot easier and more scaleable.
With Firestore I could process in batches of 500 documents.
If you need to perform actions that affect more than just one users data you may need to call an admin function that does some validation.
In my case when a user made a booking I could add the booking to the users records but I also wanted to track the bookings centrally - and ensure I didn’t overbook.
So I used a callable function which calls a cloud function - running in my cloud account under my control (not in a browser which could be being used malicously)
This function has admin level access and performs validation before finalising the booking.
It is a lot slower than writing data direct to Firestore - especially for this low traffic site as the cloud function (much like an AS lambda) is subject to cold starts. However this only impacts users who are logged in and are making a booking at which point they have some commitment - and really it is only slow compared to the rest of the site and is still faster than some PHP booking systems.
A better way to do this may be to have the user write data direct to firestore and trigger the validation function from the data write instead of from the http call. This wouldn’t actually speed it up overall but could provide better progress feedback and could be designed to feel faster.
I haven’t used this yet - but Firestore is designed with offline access in mind. If the website is build using a PWA with a service worker it should work offline.
The callable functions won’t work offline but data access does to some extent.
While offline it can’t access the Firestore database but can access any cached Firestore data and can queue writes to be sent later (and the app can use that data in the meantime)
In my case this would mean a user being able to access a (slightly stale) list of availability and (if I replaced the callable function) they could make a booking request all offline. This would go into a pending state and be synced when online - at that point the validation function would kick in and hopefully confirm the booking.
It’s not going to replace SQL for all cases.
But for many cases it is a great fit.
When it fits it is really fast and lightweight.
I love the ability to scale up and down - I’ve seen projects really struggle with SQL as a bottleneck or pay loads of ra big fast database that just isn’t used most of the time.
I will be using it again.