Introduction to Caching

An introduction to caching and the main types of cache storage: on-heap, off-heap and remote.

Oct 06, 2020

What is Caching?

In computing, caching is the process of storing a subset of data in a fast, easily accessible layer called a cache. The goal is to serve frequently requested data quickly and to avoid repeating the computation or lookup that produced it the first time. A cache typically holds data only for a short time, and it trades capacity for speed.
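To make the idea concrete, here is a minimal sketch of a cache that keeps computed values for a limited time and recomputes them once they expire. The class name `TtlCache`, the injectable clock and the loader function are all assumptions made for this illustration, not part of any particular library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration: caches loaded values for a fixed time-to-live.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;      // injectable so expiry can be exercised without sleeping
    private final Function<K, V> loader;   // the "expensive" computation or lookup being avoided

    TtlCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    V get(K key) {
        long now = clock.getAsLong();
        // Reuse the stored entry while it is fresh; otherwise load and store a new one.
        return entries.compute(key, (k, old) ->
                (old == null || now >= old.expiresAtMillis)
                        ? new Entry<>(loader.apply(k), now + ttlMillis)
                        : old).value;
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        TtlCache<Integer, Integer> squares = new TtlCache<>(
                1_000, System::currentTimeMillis,
                n -> { loads.incrementAndGet(); return n * n; });

        squares.get(12);   // first call runs the loader
        squares.get(12);   // a repeat call within the TTL is served from the cache
        System.out.println("loader calls: " + loads.get());
    }
}
```

The trade-off described above shows up directly: the map only holds what has been asked for recently, and anything older than the time-to-live is treated as missing.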

The hardware beneath a caching layer is fast-access storage such as RAM, usually paired with a software layer (for example an in-memory engine) that reads and writes the data.

Caching is broadly divided into two types: local cache and remote cache. A local cache lives in the memory of the application itself, which for a Java application means the JVM heap or off-heap memory in the same process. A remote (or cluster) cache uses an in-memory store such as Redis or Memcached.

What is On-Heap Local Caching?

On-heap caching refers to storing data in the Java heap, where it is managed automatically by the garbage collector (GC).
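As a rough illustration, an on-heap cache can be as small as a bounded map of ordinary Java objects. The sketch below builds a least-recently-used (LRU) cache on top of the standard `LinkedHashMap`; the class name `OnHeapLruCache` and the LRU eviction policy are choices made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative on-heap cache: entries are plain objects on the Java heap,
// so the GC reclaims them once they are evicted and no longer referenced.
// Not thread-safe; wrap it or use a concurrent cache for shared access.
class OnHeapLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    OnHeapLruCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}

class OnHeapLruCacheDemo {
    public static void main(String[] args) {
        OnHeapLruCache<String, Integer> cache = new OnHeapLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // touch "a" so that "b" becomes the least recently used entry
        cache.put("c", 3);   // exceeds the limit of 2, so "b" is evicted
        System.out.println(cache.keySet());   // prints [a, c]
    }
}
```

No serialization is involved here: the cache hands back the very object that was stored, which is why on-heap access is fast.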

What is Off-Heap Local Caching?

Off-heap caching refers to storing data outside the Java heap, where it is not managed by the GC. Because the data lives outside the heap, it has to be stored as an array of bytes, so every write and read carries the additional overhead of serializing and deserializing the data.
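One way to see this in practice is with a direct `ByteBuffer`, which the JDK allocates outside the Java heap. The sketch below is a deliberately simplified, append-only store for string values; the class name `OffHeapStringCache` and its layout (an on-heap index pointing at offsets in one off-heap slab) are assumptions for illustration, not how any specific off-heap library works.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative off-heap store: values live as raw bytes in a direct buffer.
// Not thread-safe, no eviction, and overwritten values leave dead bytes behind.
class OffHeapStringCache {
    private final ByteBuffer slab;                               // direct buffer: memory outside the Java heap
    private final Map<String, int[]> index = new HashMap<>();    // key -> {offset, length}, kept on-heap

    OffHeapStringCache(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);   // serialize: object -> bytes
        if (bytes.length > slab.remaining()) {
            return false;                                         // slab is full; this sketch never frees space
        }
        int offset = slab.position();
        slab.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = slab.duplicate();                       // same memory, independent position
        view.position(location[0]);
        view.get(bytes);                                          // copy the bytes back onto the heap
        return new String(bytes, StandardCharsets.UTF_8);         // deserialize: bytes -> object
    }
}

class OffHeapStringCacheDemo {
    public static void main(String[] args) {
        OffHeapStringCache cache = new OffHeapStringCache(1024);
        cache.put("greeting", "hello from off-heap");
        System.out.println(cache.get("greeting"));
    }
}
```

Every `put` and `get` pays for an encode or decode step plus a copy, which is the serialization overhead mentioned above. Because the sketch never reclaims space, it also hints at why manual memory management becomes a concern for real off-heap caches.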

What is Remote Caching?

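Remote caching refers to storing data in a cache that runs outside the application process, typically on a separate server or cluster (often in the cloud) that is reached over the network. It places a fast layer in front of the persistence layer, so that data can be served from the cache instead of the slower backing store. Redis and Memcached are two popular in-memory caching solutions today.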
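A common way to use a remote cache is the cache-aside pattern: check the cache first, fall back to the slower source on a miss, then write the result back to the cache. The sketch below shows only that read path. `RemoteCache` is a hypothetical interface invented for this example, and `InMemoryStandIn` is an in-process placeholder so the code runs on its own; a real deployment would put a Redis or Memcached client behind the interface instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical abstraction over a remote store such as Redis or Memcached.
interface RemoteCache {
    Optional<String> get(String key);
    void set(String key, String value);
}

// In-process stand-in so the sketch is runnable without a server; not a real remote cache.
class InMemoryStandIn implements RemoteCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void set(String key, String value) {
        data.put(key, value);
    }
}

// Cache-aside read path: cache first, slow source on a miss, then populate the cache.
class CacheAsideReader {
    private final RemoteCache cache;
    private final Function<String, String> slowSource;   // e.g. a database query (assumed)

    CacheAsideReader(RemoteCache cache, Function<String, String> slowSource) {
        this.cache = cache;
        this.slowSource = slowSource;
    }

    String read(String key) {
        Optional<String> hit = cache.get(key);
        if (hit.isPresent()) {
            return hit.get();
        }
        String value = slowSource.apply(key);
        cache.set(key, value);
        return value;
    }
}

class CacheAsideDemo {
    public static void main(String[] args) {
        CacheAsideReader reader = new CacheAsideReader(
                new InMemoryStandIn(), key -> "value-for-" + key);
        System.out.println(reader.read("user:1"));   // miss: loaded from the slow source
        System.out.println(reader.read("user:1"));   // hit: served from the cache
    }
}
```

Since the cache sits outside the application, several application instances can share the same entries, at the cost of a network round trip and serialization on every access.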

Benefits of On-heap and Off-heap Caching

On-heap Pros

Objects are allocated and de-allocated automatically by the GC
Faster access to data, since no serialization is needed

On-heap Cons

Increased GC pauses as the cache grows
Since the data is stored in JVM memory, it is lost if the JVM crashes, so long-surviving caches are not possible

Off-heap Pros

Allows caching large data without worrying about GC pauses
Can be backed by a persistent layer (for example a memory-mapped file), so the cache can be recovered after a JVM crash
Several JVMs can share the cached data

Off-heap Cons

The biggest disadvantage of an off-heap cache is the serialization and deserialization of data. The data is stored as raw bytes rather than as objects, so converting it back into the corresponding objects adds cost to every read.
Short-lived data is better suited to on-heap caches, where the GC reclaims it automatically, so there is extra work in deciding which data should stay on-heap and which should go off-heap.
Manual memory management (with issues such as memory fragmentation)

In short, off-heap caches are a better way to store data when a long-term solution for large data sets is needed, and with large disk sub-systems they can achieve high IOPS (input/output operations per second).

Benefits of Remote Caching

Remote cache clusters can be scaled on demand.
They are not restricted to one data structure or one programming language: stores such as Redis offer several data types and have clients for many languages, which makes them easy to use.
Enhanced performance, since data is served from memory instead of from slower disk.

Conclusion

As the volume of data grows over time, the need for faster and better caching solutions becomes more important. In software development, handling data effectively and quickly is one of the most important concerns, and caching provides a way to achieve that at large scale.

If you like the post, share and subscribe to the newsletter to stay up to date with tech/product musings.

(The contents of this blog are my personal opinion and/or drawn from my own reading of a number of articles, and are in no way influenced by my employer.)
