Libraries for writing raw SQL safely
One of my responsibilities at PropelAuth is writing example apps / guides in various languages / frameworks. It’s truly one of the most fun parts of my job. I get to play around with different stacks, new and old, and figure out the best ways to support our customers.
Because of that, I end up creating a lot of projects from scratch. Whenever I start a new project, there are a few important choices to make - and one decision that I tend to spend a lot of time on is:
What DB library should I use?
The libraries I gravitate towards are the ones with as few layers of abstraction as possible between the code I’m writing and the SQL query itself.
One reason for this is just practicality - since I switch languages often, I don’t have as much time to get fluent in any particular ORM. I’ve also had jobs in the past that had heavy data science components to them, so SQL is something I am very comfortable with.
But I’m also a developer that tends to dislike “magic” - so I avoid libraries where I can’t easily tell what the generated SQL will look like, or libraries where I feel like I’m spending all my time googling “how to do join in X” followed by “how to do join in X with two conditions”.
In this post, I wanted to highlight a few libraries that I frequently reach for, as well as ones that I’m excited to try out - all of which aim to minimize the difference between the code I write and the SQL that is executed.
My personal favorite: SQLx
One of my favorite Rust crates is SQLx.
In their own words:
SQLx supports compile-time checked queries. It does not, however, do this by providing a Rust API or DSL (domain-specific language) for building queries. Instead, it provides macros that take regular SQL as input and ensure that it is valid for your database. The way this works is that SQLx connects to your development DB at compile time to have the database itself verify (and return some info on) your SQL queries.
Put differently, SQLx lets you write a query like this:
let row = sqlx::query!(r#"
    SELECT enail
    FROM user
    WHERE user_id = $1
"#, user_id)
.fetch_one(&pool)
.await?;
which might seem standard, but when you compile your code, you’ll get an error like this:
error returned from database: column "enail" of relation "user" does not exist
This same compile-time checking also applies to complex queries:
SELECT job_id, job_data
FROM job_queue
WHERE job_status = 'Queued' AND run_at >= NOW()
ORDER BY run_at ASC
FOR UPDATE SKIP LOCKE -- oops
LIMIT 1
error returned from database: syntax error at or near "LOCKE"
Since the query is checked against the database, this will also work with any extensions you have installed.
Why is this cool?
The incredible thing about this is that we are literally just writing SQL. But unlike a crate like postgres, which also lets you write raw SQL, SQLx prevents us from making silly mistakes.
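And when the query is correct, the macro pays you back with typed results. Here’s a quick sketch of the happy path (my code, not from the SQLx docs - assuming a Postgres pool and the same user table):
let row = sqlx::query!(r#"
    SELECT email
    FROM user
    WHERE user_id = $1
"#, user_id)
.fetch_one(&pool)
.await?;

// `row` is an anonymous record typed from the schema itself -
// row.email is a String (or Option<String> if the column is nullable)
println!("{}", row.email);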
This does come at a small cost - we now have a compile-time dependency on a database - but SQLx addresses this with an “offline mode.” When your DB is available, you can generate a file with all the validated queries (via sqlx-cli’s cargo sqlx prepare), and then in your build, SQLx will check this file instead of the database.
In my quest to minimize the difference between the code I write and the SQL that’s executed, SQLx gives me no difference at all - AND I didn’t have to sacrifice safety to get it.
Generate TS interfaces for your SQL with PgTyped
As it often goes in the JavaScript/TypeScript ecosystem, there’s a lot of options here.
There are options like Kysely, which generates TS types from your database and then provides both a query builder and a way to write raw SQL. There’s Drizzle, which is a query builder, but its stated goal is reducing the delta between the TS code you write and the generated SQL. There’s even a SQLx port that I have not yet had a chance to try.
But the library that best matches what I am looking for here is PgTyped. With PgTyped, you define your queries in separate files like so:
/* @name FindEmailById */
SELECT email FROM user WHERE user_id = :userId;
You then run npx pgtyped -c config.json, which generates a function with proper types based on your schema:
export interface IFindEmailByIdParams {
  userId?: string | null;
}
export interface IFindEmailByIdResult {
  email: string;
}
export const findEmailById = new PreparedQuery< // ...
You can call that function to get the results from the DB. Importantly, if your query is wrong (let’s say it references a column that doesn’t exist), you get an error like this:
Error in query. Details: {
errorCode: 'errorMissingColumn',
hint: 'Perhaps you meant to reference the column "user.email".',
message: 'column "enail" does not exist',
position: '7'
}
This means you not only get to write raw SQL safely - your application code also gets a nice TS abstraction to call (or mock in tests).
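Calling the generated query looks roughly like this (a sketch - the generated file name and connection details are illustrative):
import { Client } from "pg";
import { findEmailById } from "./users.queries";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// run() takes the named parameters plus a connected pg client,
// and resolves to IFindEmailByIdResult[]
const rows = await findEmailById.run({ userId: "user_123" }, client);
console.log(rows[0]?.email);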
The biggest downside to PgTyped is this GitHub issue - the nullability of types isn’t respected, which can be pretty frustrating, as it means you might reasonably pass in null for a required field. Another downside is that it’s specific to Postgres… more on that later in the “Portability” section.
Prisma recently released TypedSQL, “a new way to write raw SQL queries in a type-safe way.” They mention that part of the inspiration was both SQLx and PgTyped, so I am excited to try it out!
Something for the Python world: PugSQL
A library I enjoy when I switch to Python is PugSQL. Similar to PgTyped, you create separate SQL files for your queries like this:
-- :name find_email :one
select email from users where user_id = :user_id
which will create a function that you can call:
email = queries.find_email(user_id=42)
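For context, the queries object comes from pointing PugSQL at the directory holding your SQL files - roughly like this (the paths and connection string are illustrative):
import pugsql

# compile every .sql file in the directory into callable functions
queries = pugsql.module("resources/sql/")
queries.connect("postgresql://localhost/mydb")
# queries.find_email(...) is now available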
The downside (relative to the previous libraries) is that these queries aren’t automatically checked for issues. That being said, some tests can surface most (all?) of the issues with the query itself - you just need to write them.
If you are feeling fancy, it’s possible to add your own automation to check the queries. There are ways to verify a query against a DB without actually running it - it’s just some additional work. Having each query in its own file makes this easier to automate, since you don’t need to parse the queries out of your application code first.
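Here’s a rough sketch of what that automation could look like (my own code, not part of PugSQL - it asks Postgres to parse and plan each query with EXPLAIN, without ever executing it):
import glob
import re

import psycopg2  # assumes a dev database you can connect to

conn = psycopg2.connect("dbname=dev")

for path in glob.glob("resources/sql/*.sql"):
    with open(path) as f:
        sql = f.read()
    # drop PugSQL's "-- :name ..." header and stub out :params with NULL
    body = "\n".join(
        line for line in sql.splitlines() if not line.strip().startswith("--")
    )
    body = re.sub(r"(?<!:):\w+", "NULL", body)
    with conn.cursor() as cur:
        cur.execute("EXPLAIN " + body)  # catches missing columns/tables and syntax errors
    conn.rollback()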
Mark up your interfaces with JDBI
Whenever I talk about how much I liked Dropwizard, I usually get met with blank stares. It’s a bit of a deeper cut in the Java world relative to Spring (either that or normal people don’t discuss Dropwizard at parties).
One of the reasons I liked Dropwizard so much was simply that it came with JDBI. That library lets you annotate the functions on an interface with SQL queries, and it generates the implementation for you.
public interface UserDAO {
    @SqlQuery("select email from user where user_id = :user_id")
    String fetchEmail(@Bind("user_id") String userId);
}
final UserDAO userDao = database.onDemand(UserDAO.class);
Again though, this would require additional testing to find issues in the queries.
I should also mention that Spring Data JPA has the same concept with its @Query annotation. It’s been a very long time, but back when I was comparing JDBI and Spring Data JPA, I always felt like Spring was trying to get me to use its more magical “function name to SQL query” methods. Upon re-reading the docs recently though, I was wrong - they mention that you can fall back to @Query pretty frequently.
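For comparison, the Spring equivalent of the JDBI example above looks something like this (a sketch, assuming a User entity with userId and email fields):
public interface UserRepository extends Repository<User, String> {
    // JPQL against the User entity, not the raw table
    @Query("select u.email from User u where u.userId = :userId")
    String fetchEmail(@Param("userId") String userId);
}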
Other considerations
“Use it sparingly”
If you followed some of the links in this post, you may have noticed that some of these libraries don’t advocate for this approach as the primary way to query the database.
Prisma describes TypedSQL as an escape hatch for when querying via their ORM isn’t sufficient. Same for Spring Data JPA, which describes the approach as “fine for a small number of queries.”
This isn’t an unfounded claim - if you go down the path of writing raw SQL for every query, it can get pretty verbose. There are absolutely times when I’m making a simple, boring table that’s basically just a key-value store, and writing INSERT INTO boring_table VALUES (...) and SELECT * FROM boring_table WHERE ... is just a typing exercise.
A library that provides the best of both worlds seems great! The devil is really in the details, as it depends on what you consider to be complex enough to warrant writing raw SQL and how frequently those queries come up.
Portability
One issue with the raw SQL approach is that it’s less portable. If you are using an ORM, that ORM will often be compatible with more than just the database you are currently working with.
This can mean small things, like running SQLite locally and a different DB in production - or big things, like making it easier to migrate your database to something else entirely.
Again, your mileage may vary here - it’s really dependent on how much you care about this.
Use a query builder instead
Going back to the Java ecosystem, a popular library is jOOQ. With jOOQ, you aren’t writing raw SQL, but it’s very close.
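Here’s a rough sketch of the earlier email lookup in jOOQ (my illustration, assuming the code-generated USER table and a DSLContext named create):
String email = create.select(USER.EMAIL)
    .from(USER)
    .where(USER.USER_ID.eq(userId))
    .fetchOne(USER.EMAIL);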
To me, this is great! My stated goal was keeping the delta between my code and the generated SQL as small as possible, and query builders like jOOQ or Drizzle do a good job of that.
Not all query builders are created equal here, though - I tend to dislike ones like Knex, where the delta is larger.
Summary
- Raw SQL libraries like SQLx, PgTyped, and JDBI allow writing SQL directly while providing safety and type checking.
- These libraries aim to minimize the gap between code and executed SQL, with some offering benefits like compile-time checking and generated type interfaces.
- Alternatives include query builders like jOOQ and Drizzle, where you aren’t directly writing SQL, but the gap is still small.
- Considerations when choosing a DB library include portability, verbosity, and the need for complex queries versus simple CRUD operations.