Query Caching in Persistent
Goal:
- Get query caching in
persistent operations.
- ... Such that the cache can be busted by writes
- ... On specific tables
- ... But not accounting for other threads or other servers doing this
Or, we want to have a similar feature to ActiveRecord caching.
Persistent Background
The StatementCache
persistent is, for the most part, implemented as a ReaderT backend, where backend in our case is SqlBackend.
A SqlBackend includes a connStmtMap - a StatementCache which closes over an IORef (Map Text Statement).
The Text contains the query that we are issuing.
That means we can probably hook into this in order to modify our own cache and check whether or not we want to proceed.
Let's trace this and see how get works with the statement cache.
Implementing get
The function get is defined on PersistStoreRead - the actual definition is reproduced here:
class
( Show (BackendKey backend)
, Read (BackendKey backend)
, Eq (BackendKey backend)
, Ord (BackendKey backend)
, PersistCore backend
, PersistField (BackendKey backend)
, A.ToJSON (BackendKey backend)
, A.FromJSON (BackendKey backend)
) =>
PersistStoreRead backend
where
get
:: forall record m
. (MonadIO m, PersistRecordBackend record backend)
=> Key record -> ReaderT backend m (Maybe record)
Note that - unfortunately - we define this as polymorphic on the backend, not the m, meaning we need to make a newtype CachedSqlBackend = CachedSqlBackend SqlBackend to use override this method directly.
The implementation for SqlBackend is here:
instance PersistStoreRead SqlBackend where
get k = do
mEs <- getMany [k]
return $ Map.lookup k mEs
-- inspired by Database.Persist.Sql.Orphan.PersistQuery.selectSourceRes
getMany [] = return Map.empty
getMany ks@(k : _) = do
conn <- ask
let
t = entityDef . dummyFromKey $ k
let
cols = commaSeparated . Foldable.toList . keyAndEntityColumnNames t
let
wher = whereStmtForKeys conn ks
let
sql =
T.concat
[ "SELECT "
, cols conn
, " FROM "
, connEscapeTableName conn t
, " WHERE "
, wher
]
let
parse vals =
case parseEntityValues t vals of
Left s ->
liftIO $
throwIO $
PersistMarshalError ("getBy: " <> s)
Right row -> return row
withRawQuery sql (Foldable.foldMap keyToValues ks) $ do
es <- CL.mapM parse .| CL.consume
return $ Map.fromList $ fmap (\e -> (entityKey e, entityVal e)) es
So we end up calling withRawQuery, which calls rawQueryRes, which does two things of note:
getStmtConn conn sql
stmtQuery stmt vals
getStmtConn looks up the statement in the SqlBackend's StatementCache.
If the statement is present, it returns it.
Otherwise, it does connPrepare conn sql to create an initial Statement, and then adds some extra logic to prevent the Statement from being used after finalized.
Finally, we call hookGetStatement, a user-customizable hook of type SqlBackend -> Text -> Statement -> IO Statement.
So what is connPrepare doing?
Fort postgresql, we define it here, as prepare':
prepare' :: PG.Connection -> Text -> IO Statement
prepare' conn sql = do
let
query = PG.Query (T.encodeUtf8 sql)
return
Statement
{ stmtFinalize = return ()
, stmtReset = return ()
, stmtExecute = execute' conn query
, stmtQuery = withStmt' conn query
}
So "finalize" and "reset" don't do anything.
This is OK because withStmt' doesn't actually create a prepared statement, it executes the query one-off.
Which, y'know, maybe isn't ideal, but hey, whatever.
Trouble
The Text that is stored includes ? placeholders, so we will need to keep track of the [PersistValue] that each Text is called with.
ie, get (OrganizationKey 1) and get (OrganizationKey 2) are going to have the same query text:
SELECT *
FROM organizations
WHERE id = ?
This means we probably want a type like HashMap Text (HashMap [PersistValue] [[PersistValue]]) - ie, a mapping of Text queries, to a mapping of query arguments, to the results of that query.
Summary of Available Touchpoints
- We can make a
newtype over SqlBackend and define separate instances of PersistStoreRead etc.
- But this doesn't help us much in a
PersistentOperation m => ... world, because the backend is often tied to the record, and we'd have to change PersistentOperationSql to be compatible with SqlBackend and not equal to SqlBackend. Maybe not a huge deal, but doesn't feel promising + lots of work.
- We can modify a
SqlBackend directly to enhance it with this caching ability.
- We have
connPrepare which creates Statements, and hookGetStatement which is able to alter a Statement before returning it.
Statement itself can be overriden at the stmtQuery point.
So,
-- called when we actually execute the statement
stmtQuery :: Statement -> forall m. MonadIO m => [PersistValue] -> Acquire (ConduitM () [PersistValue] m ())
-- called when we prepare the statement on first execution
connPrepare :: SqlBackend -> Text -> IO Statement
-- called when we retrieve the statement prior to execution
hookGetStatement :: SqlBackendHooks -> SqlBackend -> Text -> Statement -> IO Statement
A Strategy for Reading Cache
I think we can have hookGetStatement install a cache lookup in stmtQuery.
We have the Text of the query that will be keyed on in hookGetStatement, and stmtQuery receives the [PersistValue] of the arguments to the query.
If we are closing over some Cache type like Map Text (Map [PersistValue] [[PersistValue]]), then we can do a lookup on that Map with the Text query, and then the stmtQuery can do a lookup on the Map with the [PersistValue] arguments.
If that exists, we return it, if not, we actually run the query.
This level of references and mutability means we would probably do best with an StmContainers.Map Text (StmContainers.Map [PersistValue] [[PersistValue]]).
hookGetStatement will look up the mutable Map, then provide the mutable Map to stmtQuery, which can look things up on actual execution.
Writing Cache
Now, actually populating the cache may be more difficult.
stmtQuery returns a Acquire (ConduitM () [PersistValue] m ()).
An Acquire is a type that can be used to provide somewhat manual free/release - for example, see rawQuery.
In rawQueryRes, we do mkAcquire (log >> getStmtConn conn sql) stmtReset.
So stmtReset :: Statement -> IO () is called, but without any further information.
If we want to use this as a point to write to our cache (feels reasonable), we'll need to smuggle that information in somehow.
Can we modify Statement and stmtQuery to save rows as they come through?
We can fmap over an Acquire, allowing us to write ConduitM () [PersistValue] m () -> ConduitT () [PersistValue] m ().
I think we can have something like,
modifyAcquireToSaveRows
:: ([PersistValue] -> [PersistValue] -> IO ())
-- ^ for @args@, save current @row@
-> ([PersistValue] -> IO ())
-- ^ trigger that cache is complete for these arguments
-> Statement
-> Statement
modifyAcquireToSaveRows saveRow signalArgsComplete statement =
statement
{ stmtQuery = \args -> do
src <- stmtQuery statement args
pure
( src
.| saveRowC args
)
}
where
saveRowC args = do
mrow <- await
case mrow of
Nothing -> do
-- query is over, signal we can save for this
signalArgsComplete args
Just row -> do
-- query is ongoing
saveRow args row
yield row
Then, to write cache,
modifyStatementToSaveCache
:: Chan [PersistValue]
-- ^ The accumulated channel of persist values
-> IO QueryFullyCompleted
-- ^ Action to return whether query completed
-> ([[PersistValue]] -> IO ())
-- ^ Action to write accumulated values to cache
-> Statement
-> Statement
modifyStatementToSaveCache values didQueryComplete writeRowsToCache statement =
statement
{ stmtReset = do
compleat <- didQueryComplete
case compleat of
QueryUnfinished -> do
-- don't want to save here, so flush cache
_ <- getChanContents values
pure ()
QueryFinished -> do
rows <- getChanContents values
writeRowsToCache rows
stmtReset statement
}
This leaves a lot of implementation details unsaid.
cachedQueries implementation
I think, with the above, we can write a function like this:
cachedQueries :: (PersistentReadOperationSql m) => m a -> m a
cachedQueries action = do
stuff <- unsafeLiftPersistentReadOperation mkCachingImplements
localSqlBackend
(\sqlBackend -> addRequisiteHooks stuff sqlBackend)
action
This will modify the local SqlBackend with caching inside of the block.
Cache Busting
We definitely want a way to say "don't use cache in this lexical scope."
cachedQueries do
get $ OrganizationKey 10 -- database fetch
get $ OrganizationKey 10 -- cache hit
noCaching do
get $ OrganizationKey 10 -- database fetch
With the above design, cachedQueries is closing over that local state, and it is not made available to the inner action.
So noCaching has no ability to read or modify this - the side channels have all been completely encapsulated.
cachedQueries could provide a handle of sorts, but this breaks the lexical nature of it.
We want to call arbitrary queries, and those arbitrary queries shouldn't have to know if they have a cache or not.
However... SqlBackend does have a connVault, a Vault which allows us to store arbitrary values.
We could store the caching stuff in the Vault, along with a flag indicating whether or not the cache should be used.
Hooks, etc could consult this entry in the Vault.
Hm.
Should noCaching force just a database lookup, or also an update to cache?
Consider:
cachedQueries do
get $ OrganizationKey 10 -- database fetch
-- somewhere else, someone modifies the organization in db
noCaching do
get $ OrganizationKey 10 -- database fetch
get $ OrganizationKey 10 -- cache hit -- but from where?
I think noCaching should probably always hit the database, but also always update the cache.
Don't cache writes!!
We definitely don't want our mechanism here to cache write queries.
Many of them go through stmtExecute, but some will go through stmtQuery.
So we need to have checking that the query text is not an INSERT or UPDATE or DELETE.
Cache Busting on Writes
The above design has a mapping Query -> Arguments -> Results.
But this is no good. Consider:
cachedQueries do
get $ OrganizationKey 10 -- database fetch
update (OrganizationKey 10) [OrganizationName =. "blah"]
get $ OrganizationKey 10 -- cache hit, known stale value
This will cause bugs.
So we want a way to know which tables have been modified by an update query, and then invalidate just those entries in the cache.
One option is to just invalidate the entire cache on every update/insert.
This works and is very easy to implement.
Another, more difficult, option is to only invalidate cached queries that read from tables that have been modified.
To do this, we'd need to parse the query text for tables, and then scan each cached query text for a mention of the table.
If that table is mentioned, we clear the cache for that query.
This may also be somewhat slow to operate.
Query Caching in Persistent
Goal:
persistentoperations.Or, we want to have a similar feature to ActiveRecord caching.
Persistent Background
The
StatementCachepersistentis, for the most part, implemented as aReaderT backend, wherebackendin our case isSqlBackend.A
SqlBackendincludes aconnStmtMap- aStatementCachewhich closes over anIORef (Map Text Statement).The
Textcontains the query that we are issuing.That means we can probably hook into this in order to modify our own cache and check whether or not we want to proceed.
Let's trace this and see how
getworks with the statement cache.Implementing
getThe function
getis defined onPersistStoreRead- the actual definition is reproduced here:Note that - unfortunately - we define this as polymorphic on the backend, not the
m, meaning we need to make anewtype CachedSqlBackend = CachedSqlBackend SqlBackendto use override this method directly.The implementation for
SqlBackendis here:So we end up calling
withRawQuery, which callsrawQueryRes, which does two things of note:getStmtConn conn sqlstmtQuery stmt valsgetStmtConnlooks up the statement in theSqlBackend'sStatementCache.If the statement is present, it returns it.
Otherwise, it does
connPrepare conn sqlto create an initialStatement, and then adds some extra logic to prevent theStatementfrom being used after finalized.Finally, we call
hookGetStatement, a user-customizable hook of typeSqlBackend -> Text -> Statement -> IO Statement.So what is
connPreparedoing?Fort postgresql, we define it here, as
prepare':So "finalize" and "reset" don't do anything.
This is OK because
withStmt'doesn't actually create a prepared statement, it executes the query one-off.Which, y'know, maybe isn't ideal, but hey, whatever.
Trouble
The
Textthat is stored includes?placeholders, so we will need to keep track of the[PersistValue]that eachTextis called with.ie,
get (OrganizationKey 1)andget (OrganizationKey 2)are going to have the same query text:This means we probably want a type like
HashMap Text (HashMap [PersistValue] [[PersistValue]])- ie, a mapping of Text queries, to a mapping of query arguments, to the results of that query.Summary of Available Touchpoints
newtypeoverSqlBackendand define separate instances ofPersistStoreReadetc.PersistentOperation m => ...world, because the backend is often tied to the record, and we'd have to changePersistentOperationSqlto be compatible withSqlBackendand not equal toSqlBackend. Maybe not a huge deal, but doesn't feel promising + lots of work.SqlBackenddirectly to enhance it with this caching ability.connPreparewhich createsStatements, andhookGetStatementwhich is able to alter aStatementbefore returning it.Statementitself can be overriden at thestmtQuerypoint.So,
A Strategy for Reading Cache
I think we can have
hookGetStatementinstall a cache lookup instmtQuery.We have the
Textof the query that will be keyed on inhookGetStatement, andstmtQueryreceives the[PersistValue]of the arguments to the query.If we are closing over some
Cachetype likeMap Text (Map [PersistValue] [[PersistValue]]), then we can do alookupon thatMapwith theTextquery, and then thestmtQuerycan do alookupon theMapwith the[PersistValue]arguments.If that exists, we return it, if not, we actually run the query.
This level of references and mutability means we would probably do best with an
StmContainers.Map Text (StmContainers.Map [PersistValue] [[PersistValue]]).hookGetStatementwill look up the mutableMap, then provide the mutableMaptostmtQuery, which can look things up on actual execution.Writing Cache
Now, actually populating the cache may be more difficult.
stmtQueryreturns aAcquire (ConduitM () [PersistValue] m ()).An
Acquireis a type that can be used to provide somewhat manual free/release - for example, seerawQuery.In
rawQueryRes, we domkAcquire (log >> getStmtConn conn sql) stmtReset.So
stmtReset :: Statement -> IO ()is called, but without any further information.If we want to use this as a point to write to our cache (feels reasonable), we'll need to smuggle that information in somehow.
Can we modify
StatementandstmtQueryto save rows as they come through?We can
fmapover anAcquire, allowing us to writeConduitM () [PersistValue] m () -> ConduitT () [PersistValue] m ().I think we can have something like,
modifyAcquireToSaveRows :: ([PersistValue] -> [PersistValue] -> IO ()) -- ^ for @args@, save current @row@ -> ([PersistValue] -> IO ()) -- ^ trigger that cache is complete for these arguments -> Statement -> Statement modifyAcquireToSaveRows saveRow signalArgsComplete statement = statement { stmtQuery = \args -> do src <- stmtQuery statement args pure ( src .| saveRowC args ) } where saveRowC args = do mrow <- await case mrow of Nothing -> do -- query is over, signal we can save for this signalArgsComplete args Just row -> do -- query is ongoing saveRow args row yield rowThen, to write cache,
modifyStatementToSaveCache :: Chan [PersistValue] -- ^ The accumulated channel of persist values -> IO QueryFullyCompleted -- ^ Action to return whether query completed -> ([[PersistValue]] -> IO ()) -- ^ Action to write accumulated values to cache -> Statement -> Statement modifyStatementToSaveCache values didQueryComplete writeRowsToCache statement = statement { stmtReset = do compleat <- didQueryComplete case compleat of QueryUnfinished -> do -- don't want to save here, so flush cache _ <- getChanContents values pure () QueryFinished -> do rows <- getChanContents values writeRowsToCache rows stmtReset statement }This leaves a lot of implementation details unsaid.
cachedQueriesimplementationI think, with the above, we can write a function like this:
This will modify the local
SqlBackendwith caching inside of the block.Cache Busting
We definitely want a way to say "don't use cache in this lexical scope."
With the above design,
cachedQueriesis closing over that local state, and it is not made available to the inner action.So
noCachinghas no ability to read or modify this - the side channels have all been completely encapsulated.cachedQueriescould provide a handle of sorts, but this breaks the lexical nature of it.We want to call arbitrary queries, and those arbitrary queries shouldn't have to know if they have a cache or not.
However...
SqlBackenddoes have aconnVault, aVaultwhich allows us to store arbitrary values.We could store the caching stuff in the
Vault, along with a flag indicating whether or not the cache should be used.Hooks, etc could consult this entry in the
Vault.Hm.
Should
noCachingforce just a database lookup, or also an update to cache?Consider:
I think
noCachingshould probably always hit the database, but also always update the cache.Don't cache writes!!
We definitely don't want our mechanism here to cache write queries.
Many of them go through
stmtExecute, but some will go throughstmtQuery.So we need to have checking that the query text is not an INSERT or UPDATE or DELETE.
Cache Busting on Writes
The above design has a mapping
Query -> Arguments -> Results.But this is no good. Consider:
This will cause bugs.
So we want a way to know which tables have been modified by an update query, and then invalidate just those entries in the cache.
One option is to just invalidate the entire cache on every update/insert.
This works and is very easy to implement.
Another, more difficult, option is to only invalidate cached queries that read from tables that have been modified.
To do this, we'd need to parse the query text for tables, and then scan each cached query text for a mention of the table.
If that table is mentioned, we clear the cache for that query.
This may also be somewhat slow to operate.