Blueprint: Synchronization for TYPO3 CMS caches
|Proposal||Implement proper synchronization for cache access|
|Participants/Members||Helmut Hummel, Alexander Opitz|
|Status||Draft, Discussion, Voting Phase, Accepted, Declined, Withdrawn|
|Current Progress||Unknown, Started, Good Progress, Bad Progress, Stalled, Review Needed|
|Topic for Gerrit|
- TYPO3 CMS 7/8 LTS
Goals / Motivation
TYPO3 CMS Core completely lacks any solid means of process synchronization upon accessing shared resources. Especially since the increased usage of the caching frame work for vital parts of the Core, issues are frequently reported that race conditions occur.
Currently such race conditions often lead to a complete breakdown of the Core, since certain caches are in a state that prevent proper execution of the Core.
Consequently, it is of utmost importance to fix access to such shared resources, to be handled synchronized in a deterministic manner. Moreover, we should try to minimize the parallel generation of data, which is intended to be cached afterwards anyways. Only one process shall create those data at a time.
Current cache access in Core follows three procedures.
- Try to fetch a cache entry
- If it was a cache hit => return the data
- If it was a cache miss (non-existing or outdated) => Create the data; Store them in the cache
- Try to fetch a cache entry
- If it was a cache hit => modify the data and write them back
- If it was a cache miss (non-existing or outdated) => create the data and write them back.
This procedure shall happen atomically to ensure data freshness during modification in step 2.
Third procedure is flushing some or all cache entries.
Having two processes doing one of these procedures on the same cache or cache entry, is very likely to lead to race conditions. This may result in:
- Inconsistent cache data
- Creation of data in more than one process at the same time
- Writing to the same cache entry at the same time
- Writing outdated data to a cache entry
The academic problem, faced here, is the so called "readers-writers problem", which is described here.
We have to ensure that write access to a certain cache entry is only permitted, if it happens exclusively. Any other process, accessing the same cache entry, has to wait. Furthermore we have to ensure that neither a reader nor a writer starves. So we're facing the third readers-writers problem.
Looking even closer, I realize that for different use cases we even need different synchronization layers. For some cases we need to lock the complete cache, for instance when flushing it (also flushByTag), for some it suffices to lock a single cache entry. It is important to have such a fine grained locking, to avoid unnecessary blocking of processes.
Note: This concept is not capable of synchronizing access to multiple caches, which is for instance currently needed by the class loader. (Classes cache and Core cache) This domain specific handling requires the implementation of synchronization in the application requiring this.
Assuming we may not change our cache frontend implementations, I propose to implement another abstraction layer between application (e.g. PageGenerator) and cache frontend (e.g. StringFrontend).
An example would be a LockedStringFrontend. It takes care of all the locking and uses the StringFrontend to access the actual cache. A prerequisite for this to work is, that LockedStringFrontend is able distinguish a cache hit and cache miss.
The LockedStringFrontend guarantees that:
- No write access is performed without having locked the cache a/o entry beforehand to access it exclusively.
- No read access is performed while a write access is waiting or is in progress.
- No starvation of read or write requests occurs.
- No data is generated more than once if multiple concurrent read accesses to the same entry are a cache miss.
- No application has a possibility to accidentally misuse the API of this class and cause a race condition.
- A cache entry can be replaced atomically.
Numbers 1,2 and 3 can be achieved by simple means of locking as described in the readers-writers problem.
Number 4 can be achieved by completely preventing direct write access and by using callbacks to generate data on demand. A typical call would look like this: LockedStringFrontend::get('entryId', myDataGeneratingCallback)
The aforementioned get() method would work like this:
- Normal (synchronized) read access to cache
- if cache hit, return the data
- if cache miss:
- lock the cache entry
- read the entry again - it might have been generated meanwhile
- if cache hit, unlock the cache and return data
- if cache miss, call the callback to generate the data by the application
- write the generated data to the cache
- unlock the cache
Implementation of number 4 would also guarantee number 5, because no application can write data asynchronously.
Number 6 is a combined locked read and write access. A typical call would look like this: LockedStringFrontend::replace('entryId', myDataModificationCallback)
The aforementioned replace() method would work like this:
- Acquire lock for cache entry
- read access to cache
- call the callback with the result of the read operation
- write the result of the callback into the cache
- unlock the cache entry
As a consequence, all places in Core, which access caches have to be adopted to the API of the Locked*Frontends.
Problem: Any implementation of the readers-writers problem needs shared counters. In PHP we might use shared memory for this, but that is only available on Unix. Files might be used as well, but that seems rather slow.
General speed reduction due to locking.
Issues and reviews
#47712 - major locking refactoring