• Berta Vincze

Using GraphQL for dynamic query generation

A couple of months ago, we had to deliver a proof-of-concept for a client for migrating their entire backend database system from Postgres to Snowflake. This included extending the code of a custom built tool, which “translates” JSON to SQL. The functionality of this tool is quite unique - they have various filter options coming from the frontend in JSON structure, which are then used to construct proper SQL queries through a mapper module that matches each filter to a SQL operation or function. 


The original functionality had to be kept intact, but some structural refactoring / rearranging was nevertheless required in order to parameterise the mapper functions so that they would be able to handle and use a new collection of operators for another SQL dialect. Ultimately, this proved to be a time-consuming task, as it involved lots of intermittent testing to make sure that the new code additions didn’t end up breaking or changing the original behaviour in any way. 


While the new structure provides a decent framework for introducing new SQL dialects to the tool’s kit, any new additions would require similar amounts of testing before going into production, due to the nuances of each different dialect.

When thinking about the standards of scalability, high maintenance tools like this don’t hold up very well. So how could a tool like this be optimised? First, we should look at its core functionality at a high level, which is something like this: it should dynamically build queries using a collection of filters received as input. 


I had an idea that involves using GraphQL for part the above process. Bear in mind that this would require a complete rewrite, but the end result could be a much more flexible and more easily extendable version of the original tool. In the following paragraphs, I will give a brief introduction of GraphQL, and then demonstrate the above idea through examples.

About GraphQL & its advantages


So what is GraphQL and what are the advantages of using it? GraphQL is not difficult to implement, and thus it is commonly introduced as an alternative to REST for developing APIs. While the end objective of a GraphQL API is the same as REST - getting data from the client to the server - there are many differences in its implementation and working mechanisms. 


With REST, modifying the structure and the pieces of data that is returned usually requires server-side changes to the application code. Using GraphQL, clients are able to modify the structure of the return payload on the spot, on a query-by-query basis. This allows clients to get exactly the data they need and only that - completely eliminating the issues of over or under-fetching data, which can result in a performance boost.


Another nice thing about GraphQL is that its functionality is supposed to be implemented in the business logic layer of the application - so since most database related code will normally be in the data access / ORM layer, the latter can be easily modified or even replaced entirely in case of a database migration and your GraphQL schema can remain mostly unchanged, save for a few minor updates here and there.

How does it work?


I’m going to demonstrate this idea through an example project built with SQLAlchemy, Graphene (which is the most commonly used Python library for GraphQL) along with Graphene-SQLAlchemy-Filter and Flask. I’m also using The Star Wars API for mock data.


Please note, this is not meant to be a guide on how to use GraphQL - there are plenty of articles and documentation available for getting started with this library - so the following paragraphs assume a basic level of familiarity with GraphQL schemas and their integration with SQLAlchemy data models.

Models

In my example project, I have two SQLAlchemy models, People and Planet. 

Schema

Then we have the GraphQL schema using these models:

Filters

So let’s say that we have received some filters from the frontend. They might look something like this:

{"filters": [{"value":["L%"],"op":"ilike","field":"name","table":"people"}]

Obviously you would need to write some code to parse this JSON to get the relevant variables, but I am going to skip over this step for now. For the purpose of this demo, I am also using a web-based UI, GraphiQL, to test the GraphQL queries instead of actually writing the code that would generate them, but in an actual application, the flow would have to be something like this:


JSON filter received → JSON parsed → filter variables are passed to query generator method → GraphQL query generated → GraphQL query executed against schema (This is also the point where the SQL statements are compiled)


The above filter would translate into the following GraphQL query:


And it is really as simple as it looks! I’ve implemented the filters using the Graphene-SQLAlchemy-Filter extension library. Here’s the code:

As seen above, you can define any number of filter classes, connect them to a SQLAlchemy model and define which fields can be filtered in a key-value format. The shortcut I used in line 15 will automatically add all filters that are supported by SQLAlchemy for that particular column’s data type.


Let’s extend the previous JSON filter with a new one:

{"filters": [{"value":["L%"],"op":"ilike","field":"name","table":"people"},{"value":["1,2,3"],"op":"in","field":"id","table":"planet"}]}

The above will result in a query where we join the planet and people tables together:


Filters can also be combined with and and or operators:


Drawbacks


Along with the positives, I also want to mention the drawbacks of this approach to query generation. The most apparent one being that if not monitored, the generated SQL queries can end up being highly inefficient and may not work well in a production environment where performance is important. The good news is that this can be optimised to a point by writing custom resolver methods for underperforming queries.


However, if there are too many of these ‘special case’ queries that need custom resolvers, that does reduce the usefulness of GraphQL by quite a bit, since we are basically no longer relying on it for auto-generation.

Conclusion


In conclusion, using GraphQL for query generation may not be a good fit for every use case out there, depending on the complexity of the schema and the queries. However, since it is relatively easy to implement and customise, it’s worth considering nevertheless.


It is also worth noting that GraphQL is still fairly new, and while there are a few extensions for it already in several programming languages, there are bound to be new ones in development that aim to further enhance the base functionality, and will perhaps even solve some of the issues outlined above.


GitHub repository link: https://github.com/infinitelambda/graphene-sqlalchemy-example