Improving Caching Strategies With SSICLOPS

F-Secure development teams participate in a variety of academic and industrial collaboration projects. Recently, we’ve been actively involved in a project codenamed SSICLOPS. This project has been running for three years, and has been a joint collaboration between ten industry partners and academic entities. Here’s the official description of the project.

The Scalable and Secure Infrastructures for Cloud Operations (SSICLOPS, pronounced “cyclops”) project focuses on techniques for the management of federated cloud infrastructures, in particular cloud networking techniques within software-defined data centres and across wide-area networks. SSICLOPS is funded by the European Commission under the Horizon2020 programme (https://ssiclops.eu/). The project brings together industrial and academic partners from Finland, Germany, Italy, the Netherlands, Poland, Romania, Switzerland, and the UK.

The primary goal of the SSICLOPS project is to empower enterprises to create and operate high-performance private cloud infrastructure that allows flexible scaling through federation with other clouds without compromising on their service level and security requirements. SSICLOPS federation supports the efficient integration of clouds, no matter if they are geographically collocated or spread out, belong to the same or different administrative entities or jurisdictions: in all cases, SSICLOPS delivers maximum performance for inter-cloud communication, enforce legal and security constraints, and minimize the overall resource consumption. In such a federation, individual enterprises will be able to dynamically scale in/out their cloud services: because they dynamically offer own spare resources (when available) and take in resources from others when needed. This allows maximizing own infrastructure utilization while minimizing excess capacity needs for each federation member.

Many of our systems (both backend and on endpoints) rely on the ability to quickly query the reputation and metadata of objects from a centrally maintained repository. Reputation queries of this type are served either directly from the central repository, or through one of many geographically distributed proxy nodes. When a query is made to a proxy node, if the required verdicts don’t exist in that proxy’s cache, the proxy queries the central repository, and then delivers the result. Since reputation queries need to be low-latency, the additional hop from proxy to central repository slows down response times.

In the scope of the SSICLOPS project, we evaluated a number of potential improvements to this content distribution network. Our aim was to reduce the number of queries from proxy nodes to the central repository, by improving caching mechanisms for use cases where the set of the most frequently accessed items is highly dynamic. We also looked into improving the speed of communications between nodes via protocol adjustments. Most of this work was done in cooperation with Deutsche Telecom and Aalto University.

The original implementation of our proxy nodes used a Least Recently Used (LRU) caching mechanism to determine which cached items should be discarded. Since our reputation verdicts have time-to-live values associated with them, these values were also taken into account in our original algorithm.

Hit Rate Results

Initial tests performed in October 2017 indicated that SG-LRU outperformed LRU on our dataset

During the project, we worked with Gerhard Hasslinger’s team at Deutsch Telecom to evaluate whether alternate caching strategies might improve the performance of our reputation lookup service. We found that Score-Gated LRU / Least Frequently Used (LFU) strategies outperformed our original LRU implementation. Based on the conclusions of this research, we have decided to implement a windowed LFU caching strategy, with some limited “predictive” features for determining which items might be queried in the future. The results look promising, and we’re planning on bringing the new mechanism into our production proxy nodes in the near future.

fraction_of_top_k_results_compared_to_cache_hit_rates

SG-LRU exploits the focus on top-k requests by keeping most top-k objects in the cache

The work done in SSICLOPS will serve as a foundation for the future optimization of content distribution strategies in many of F-Secure’s services, and we’d like to thank everyone who worked with us on this successful project!