Skip to content

Latest commit

 

History

History
607 lines (446 loc) · 23.6 KB

README.adoc

File metadata and controls

607 lines (446 loc) · 23.6 KB

Fulcro SQL

A utility library for working with SQL databases. It includes the following:

  • A com.stuartsierra component for starting, building, connection pooling, and schema migrations.

  • A with-database test wrapper for easily writing integration tests. It manages the complete lifecycle of a test database within each test.

  • Database seeding tools with abstract ID support. Useful for testing, development, and production.

  • NOTE: MySQL may not be able to allocate IDs properly, and thus seed is not recommended for production there.

  • The ability to run a subset of Datomic’s pull API (Om Next query syntax) against SQL databases: (run-query db schema :root-key om-subquery #{ids of root entity(s)} optional-filtering)

Connection pooling is provided by HikariCP. Migrations are provided by Flyway.

Warning
This library’s graph query support should be considered a proof of concept. That part was added to show that it could be done. The library has some other useful elements (e.g. the database component setup with migrations and connection pooling), but the actual graph query support has better alternatives at this point. You should check out PathOm for general query parsing, and if you really want more of a DSL check out Walkable.

Supported Databases

At present there are tests and drivers for:

  • PostgreSQL

  • MySQL

  • H2

Adding additional support is pretty trivial. See Adding a Driver.

Usage

Add fulcro-sql to your dependencies:

fulcro sql

Then install a database manager as a component in your web server and inject it into any code that needs to run graph queries against your SQL database. See Integrating with a Fulcro Server below.

Configuring the Database Manager

The database manager expects that have a config injected into it. It assumes it will take the same basic form that is generated by Fulcro’s server config code. That is to say, config has the following shape (as a component):

{:value
  { :sqldbm
    { :database-name {:hikaricp-config "test.properties"
                      :driver        :default ; optional, defaults to postgresql
                      :database-name "dbname" ; needed only for mysql
                      :auto-migrate? true
                      :create-drop?  true
                      :migrations    ["classpath:migrations/test"]}}}}

The top two keys (:value and :sqldbm) are in the path for the map because the assumption is you’ll be using a more complex config where other data may appear that you do not want to collide with. If you are using Fulcro server, then this allows you to place this in your config file:

{ :port   3000
  :sqldbm { :database-name {:hikaricp-config "test.properties"
                            :auto-migrate?   true
                            :create-drop?    true
                            :migrations      ["classpath:migrations/test"]}}}

The allowed keys are:

  • hikaricp-config : A relative (if on classpath) or absolute (if on disk) path to the Java properties file (on classpath if relative, or disk if absolute) that contains your desired configuration for HikariCP connection pooling.

  • migrations : A vector of directories that can be prefixed with classpath: or filesystem. The named directories will be searched for migrations to run. See Flyway documentation for the naming conventions of these files.

  • auto-migrate? : When true, will have flyway attempt to bring migrations up-to-date.

  • create-drop? : When true, will drop the complete schema from the database (and all associated data) and recreate that schema from scratch.

  • database-name : The string name of the database. Needed if using mysql.

  • driver : The keyword indicating the kind of database in use (:default, :postgresql, :mysql)

Example files:

in resources/migrations we could place V1__initial.sql:

CREATE TABLE boo (id serial);

on the filesystem disk we could write /usr/local/etc/pool.props:

dataSourceClassName=org.postgresql.ds.PGSimpleDataSource
dataSource.user=test
dataSource.password=
dataSource.databaseName=test
dataSource.portNumber=5432
dataSource.serverName=localhost

Then the configuration for this database would be:

{:value
  { :sqldbm
    { :test {:hikaricp-config "/usr/local/etc/pool.props"
             :auto-migrate? true
             :create-drop? true
             :migrations      ["classpath:migrations"]}}}}

Describing Your Schema

In order for most of this library to work you must describe the parts of your schema that you want to use with it. This can be placed into an EDN map anywhere in your source. It is probably also possible to automate the generation of it with a little bit of elbow grease against your database’s metadata.

(ns schema
  (:require [fulcr-sql.core :as core]))

(def schema {::core/graph->sql {}
             ::core/joins   {:account/members [:account/id :member/account_id]}
             ::core/pks     {:account :id
                             :member  :id}})
  • ::core/graph→sql - A map from the graph query property names to SQL. The SQL naming must follow the convention :table/column, where the table and column portions exactly match a table and column name in your database. This allows you to use whatever UI properties you wish, and map them to their correct location in the database. It is OK for an entry to be missing, in which case it will be assumed the graph property name is already correct.

  • ::core/joins - A map whose key is an graph query property that is obtained by following an SQL join, and whose value is a vector of SQL :table/col keywords that describe (in order) the tables and columns that have to be traversed to resolve the join.

  • ::core/pks - A map whose keys are SQL table names (as keywords) and the columns that represent the PK for that table. These default to :id, so technically you only need them if you used something else.

Property Mapping in Detail

The graph→sql map is just one stage of the property mapping. The complete property transform process is:

  1. Look up the graph property in ::core/graph→sql, if present.

  2. The result of step (1) is processed by the multimethod graphprop→sqlprop*, dispatched by the :driver in your config. The default transform just replaces - with _.

Support for to-one:

When specifying the join sequence, simply wrap it with a call to (core/to-one …​):

(def schema { ...
              ::core/joins {:account/settings (core/to-one [:account/settings_id :settings/id])}})

would describe a to-one join from the following SQL schema:

CREATE TABLE settings (id serial primary key, ...);
CREATE TABLE account (settings_id integer references settings(id), ...);

You can include both directions in the joins:

(def schema { ...
              ::core/joins {:account/settings (core/to-one [:account/settings_id :settings/id])
                            :settings/account (core/to-one [:settings/id :account/settings_id])}})

Support for to-many:

To-many is implied by default, but you can make it explicit with a call to (core/to-many join-seq).

Support for many-to-many:

Many-to-many joins are described with four table/column keywords in the join sequence. For example, say you had the schema:

CREATE TABLE invoice (
  id           SERIAL PRIMARY KEY,
  invoice_date TIMESTAMP NOT NULL DEFAULT now()
);

CREATE TABLE item (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE invoice_items (
  id         SERIAL PRIMARY KEY,
  quantity   SMALLINT NOT NULL,
  invoice_id INTEGER  NOT NULL REFERENCES invoice (id),
  item_id    INTEGER  NOT NULL REFERENCES item (id)
);

You would describe the possible joins of interest as:

(def schema { ...
              ::core/joins { :invoice/items    (core/to-many [:invoice/id :invoice_items/invoice_id :invoice_items/item_id :item/id])
                             :item/invoices    (core/to-many [:item/id :invoice_items/item_id :invoice_items/invoice_id :invoice/id])}}})

This would allow you to issue the graph-query [{:invoice/items [:item/name]}] or [{:item/invoices [:invoice/invoice-date]}] relative to an invoice in the former, and an invoice item in the latter.

Many-to-many With Data On Join Table

Fulcro SQL currently does not support obtaining data from the join table itself. You must write a custom query for that scenario.

Configuring The Connection Pooling

The connection pooling is provided by HikariCP. In order to support testing, development, and adminstrative production tuning we use the properties-based configuration. This allows you to specify a Java properties file on the classpath or the disk.

The tests for this library have a connection pool set up for use with PostgreSQL in test-resources/test.properties.

Writing Migrations

The migration support is provided by Flyway. Here are the basics:

  1. Define some direction (typically in resources) that will hold SQL files.

  2. Tell this library where that is (see configuration above).

  3. Indicate during startup that you want migration to happen, or write a separate command-line utility or something that can run them by starting a database with migrations turned on (you might want to be paranoid about migrations in production).

See Flyway’s documentation for more details. You can also examine the test suite of this library, which has migrations in test-resources/migrations/test.

Seeding Data

The fulcro-sql.core/seed! function is a simple but powerful way to put data in your database for a number of development, testing, and production reasons:

  • Seeding tests

  • Adding data that has to always be in a production database

  • As a way to write convenient mutation functions. (e.g. when the creation of an object requires insertions and relations).

The seed! function is row-based, but it includes support for ID generation, relations, and the return of the IDs of interest. Here is an example:

Say I want to insert two rows: A person and their address. Address has a FK pointer back to the person. After the insertion, I’d like to know the IDs of the resulting person and address (perhaps for return to the caller, or for test assertions):

(let [rows [(core/seed-row :person {:id :id/joe :name "Joe"})
            (core/seed-row :address {:id :id/address :street "111 Nowhere" :person_id :id/joe})]
      {:keys [id/joe id/address]} (core/seed! db schema rows)]
  ... use `joe` and `address`, which are numbers that correspond to the db row PKs ...)

Keywords-as-ids must appear in a PK column before they are used anywhere else. If you fail to do this then seeding will fail with a database error, since it won’t understand the (unresolved) keyword as an ID. However, this restriction may cause you problems, since some combinations of inserts have loops in them.

In order to resolve this there is also a core/seed-update function that can be used in the vector of items to seed. It is needed when you cannot resolve the order of inserts. Say your person table had a last_updated_by column whose FK pointed to person.id. If Joe last updated Sam and Sam last updated Joe, you’d need this:

(let [rows [(core/seed-row :person {:id :id/joe :name "Joe"})
            (core/seed-row :person {:id :id/sam :name "Sam" :last_updated_by :id/joe})
            (core/seed-update :person :id/joe {:last_updated_by :id/sam})]
      {:keys [id/joe id/sam]} (core/seed! db schema rows)]
  ...)

Writing Integration Tests

Create an alternate connection pool for your tests, typically in the test source or resources of the project, that describes where you’d like to run your test database. Typically you will use the same migrations/schema as your production server.

The combination of seeding and database support makes writing a test very easy. If you’re using fulcro-spec, and have placed your migrations and test.properties on the classpath, then a test specification might look like this:

(def test-database {:hikaricp-config "test.properties"
                    :migrations      ["classpath:migrations"]})
(def schema { ... schema as described above ...})

(specification "Doing things to the database"
  (with-database [db test-database]
    (let [{:keys [rowid/a]} (core/seed! db schema [(core/seed-row :table {:id :rowid/a ...})])]
      (jdbc/query db ...)))

The with-database macro creates a let-like binding environment in which your database is started, migrated, and afterwards cleaned up. You can use seed! to populate your database, etc.

The bound variable (db) is a simple map, containing nothing but :datasource. This is a Java JDBC DataSource, and having it in the map makes it compatible with the clojure.java.jdbc library for convenience.

Integrating With a Fulcro Server

Fulcro comes with a config component that lays out configuration in a way that is compatible with the DatabaseManager component(s) in this library. Remember that the database manager can control any number of databases (of that kind).

(easy/make-fulcro-server
  ; inject config into the database manager
  :components {:dbs (component/using (fulcr-sql.core/build-db-manager {}) [:config])
  :parser-injections #{:dbs})

and now your server-side reads and mutations can access dbs in the env. You can obtain a dbspec compatible with clojure.java.jdbc using (get-dbspec dbs :dbname). This is just a map with the key :datasource whose value is a connection-pooled JDBC data source:

(defmutation boo [params]
  (action [{:keys [dbs]}]
    (let [dbspec (fulcro-sql.core/get-dbspec dbs :test)]
      (jdbc/insert! dbspec ...))))

Running a query should be relatively easy if your schema is correct. Your query code will need to derive a "root set". A root set is simply the IDs of the entities that should root the graph traversal. You might figure this out from query params, the user’s session, a special request cookie, or some other criteria.

(defquery-root :accounts
  (value [{:keys [dbs query]} params]
    (let [dbspec (core/get-dbspec dbs :test)
          account-ids-of-interest #{1 2}]
      (core/run-query dbspec schema :account/id account-ids-of-interest))))

Logging

All of the underlying logging of Flyway, HikariCP, and timbre can use SLF4J. If you configure timbre to take control of SLF4J, then you can control logging (level, etc.) from timbre without having to mess with other configuration. To do this, make sure you have the following dependencies on your classpath:

[org.slf4j/log4j-over-slf4j "1.7.25"]
[org.slf4j/jul-to-slf4j "1.7.25"]
[org.slf4j/jcl-over-slf4j "1.7.25"]
[com.fzakaria/slf4j-timbre "0.3.7"]

then you can do things like:

(timbre/set-level! :debug)

and see messages from the underlying connection pool and migration libraries.

Adding a New Driver

Fulcro-SQL is built to be customizable. Each of the core processes is defined by a multimethod that dispatches on the :driver key of the database’s config (if provided).

(graphprop→sqlprop* schema prop) - Called after the initial remapping from ::core/graph→sql. This multimethod can remap prop to an alternate form. The default just converts hypens to underscores. (sqlprop→graphprop* schema prop) - Called after the unmapping (map-invert) ::core/graph→sql explicit renames. This multimethod can remap an sqlprop back to an original form. The default just converts underscores back to hypens. (table-for* schema query) - Must return the database table name as a keyword (e.g. :account) for the given (top-level) of a graph query (e.g. [:db/id :account/name {:account/members …​}]). (column-spec* schema sqlprop) - Returns an SQL column selection that will result in clojure.java.jdbc query returning the correct data map key. For example: (column-spec* schema :account/name) ⇒ "account.name AS \"account/name\"" (next-id* db schema table) - Returns the next auto-generated ID for a new row in table. For example, in PostgreSQL on the account table, this would be the result of running SELECT nextval('account_id_seq') AS \"id\".

It is possible that your driver has the exact same logic as some other driver for some of these. In that case you can dispatch to the alternate simply by passing an altered ::sql/driver in schema:

(defmethod next-id* :my-driver [db schema table]
  (next-id* db (assoc schema ::sql/driver :default) table))

Note that all of the examples above are the :default behavior, so if your driver needs to only modify, say, the next ID behavior, then you can choose to leave the others as they are and only defmethod a dispatch for your override(s).

Performance

The algorithm used by this library runs relatively simple queries, and does the joins in-memory. It tries to be relatively efficient by processing a join via a single query that pull rows by the IDs of all of the parent rows that join to it.

Thus, a query like this:

[:db/id :account/name {:account/members [:db/id :member/name]}]

(with member being the table with the FK) will issue:

SELECT id, name FROM account;

to collect all of the ids at that level of the query, and then issue:

SELECT id, name FROM member WHERE account_id IN (...)

to get the joined rows, and then join the results in memory. This is a recursive algorithm with each level passing the row IDs found at the higher level down through the graph of queries. This means that a join three levels deep will only issue three queries independent of the number of rows returned at each level.

Of course, a graph query can have more of a tree-like shape, and each join will result in one query (overall). Technically this means that a graph query can result in an exponential growth of the actual SQL statements; however, in practice a typical graph query will not actually contain that many branches nor be that deep.

It is important for your SQL database to have indexes on all foreign keys. This algorithm assumes it will be fast to run a query with a WHERE x IN set as the only filter criteria. This algorithm also assumes that there is no (practical) limit on the number of things that can be asked for with SQL IN.

Filtering

This library supports arbitrary filtering of the SQL tables that supply data to satisfy the graph query. There is a simple mechanism for common filtering that can easily and safely be used from the client, and the underlying more general (and powerful) mechanism that is not secure for client use, and should be used with care.

Since the easy/secure one just emits a checked version of the latter, and the latter one is also what is used in the API we’ll start by describing the latter.

Fully General Filtering

This filtering mechanism has you write SQL clauses that can appear in a WHERE clause. To prevent SQL injection, the SQL clause is parameterized if you embed ?. The filters are specified as follows:

{ :table [(filter-where "table.x = ?" [3])]
  :other_table [(filter-where "other_table.deleted = false" [])]}

In other words filters are a map keyed by SQL table names (as keywords), whose values are a vector of filters to apply to that table when it is queried. Each filter is defined using filter-where:

(filter-where expr params)

Or

(filter-where expr params min-depth max-depth)

The params must be a vector (but can be empty), and the min/max depth are optional.

For example, given the following query:

[:member/name {:member/billing [:billing/address {:billing/country [:country/name]}]}]

Let’s say :member/billing is a to-one relationship, but you update that table by adding a new row and marking the old one with a deleted flag for auditing purposes. The SQL join will return too many rows without a filter, and the Fulcro client should not have to even know about this server-side concern.

So, to serve this particular query you’d include a filter like this:

{:billing [(filter-where "billing.deleted = false" [])]}

in your call to run-query.

Filter Depth

Specifying a depth for a filter indicates that the filter should only apply within that range of graph depths. They are 1-based numbers, and are inclusive.

So a query has the following depths:

[:member/name {:member/billing [:billing/address {:billing/country [:country/name]}]}]
      1               1                 2                 2                3

Thus, a filter with a min-depth of 4 would never apply to the above query, and a filter on country with a max-depth of 2 would also never apply.

Depth is calculated as the graph is executed, so recursive queries will have an ever-increasing depth. Thus, a query like this:

[:category/name {:category/subcategory ...}]

would start at depth 1, but might go many levels deep. A filter with min-depth of 3 and max-depth of 3 would only apply to the second subcategory, but not any others.

Easy Filters (client-facing)

There are times when it is useful for the Fulcro client to specify filtering parameters. The general mechanism described above allows for arbitrary SQL, so it is unsafe to use from the client; however, simple expressions can be safely encoded using the following EDN scheme: A map whose key is a prop (table/column), and whose value is a map of operator keyword to value:

{:table/column {:op value}}

For example: {:billing/deleted {:eq false}}

Depth can be configured with :min-depth and :max-depth in the argument map: {:billing/deleted {:eq false :min-depth 1 :max-depth 2}} would mean that the filter on billing.deleted = false only applies for the query portions at depths 1 and 2.

Easy filters must be transformed to the general form using filter-params→filters function:

(filter-params->filters schema {:billing/deleted {:eq false}})

which will emit the correct general-purpose filters described in the prior section:

{:billing [(filter-where "billing.deleted = ?" [false])]}

ensuring that the value is parameterized so SQL injection is impossible.

The supported operations of filter-params→filters are:

Table 1. Legal Operations
op SQL Operator

:eq

=

:ne

<>

:gt

>

:ge

>=

:lt

<

:le

:null false

IS NOT NULL

:null true

IS NULL

The values supplied by the client are not transformed in any way; therefore you must ensure that the values incoming from the client are compatible with the column types in your database.

Extending this set is trivial (see the code of filter-params→filters) but is not yet generalized to allow for driver-specific operators. If you’d like to contribute to this mechanism, please join the #fulcro Clojurians Slack channel and discuss it.

Contributing to Fulcro-SQL Development

Please join the #fulcro Slack channel in http://clojurians.slack.com. Discuss how you’d like to help.

Contributions should include tests, and all tests should be passing.

Running tests for development:

  1. Start a REPL

  2. Run (test-suite)

  3. Browse to http://localhost:8888/fulcro-spec-server-tests.html

The tests are encoded via fulcro-spec, but are just plain clj tests, so you can run them however you would normally run clojure tests; however, the output and UI are much better if you use the web-based rendering.

Support

You may be able to get questions answered on the #fulcro Clojurians Slack channel. Paid commercial support and consulting can be obtained from Fulcrologic.