
Performance Analysis of Cache Policies for Web Servers

Many existing Web servers, e.g., NCSA and Apache, rely on the underlying file system buffer of the operating system to cache recently accessed documents. When a new request arrives, the Web server asks the operating system to open the file containing the requested document and starts reading it into a temporary memory buffer. After the file has been read, the server closes it.
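The request path described above can be sketched in a few lines of Python. This is only an illustration, not code from NCSA or Apache: the function name `serve_document` and the `chunk_size` parameter are hypothetical. The point is that the server keeps no document state of its own between requests, so any reuse of recently read data depends entirely on the operating system's buffer.

```python
def serve_document(path: str, chunk_size: int = 8192) -> bytes:
    """Read a requested document from disk on every request.

    Hypothetical sketch: the server itself caches nothing; repeated
    requests are fast only if the OS file system buffer still holds
    the file's blocks.
    """
    chunks = []
    # Ask the operating system to open the file for this request.
    with open(path, "rb") as f:
        while True:
            # Read into a temporary in-memory buffer, one chunk at a time.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    # Leaving the "with" block closes the file once it has been read.
    return b"".join(chunks)
```

Every call repeats the open, read, and close steps, which is the overhead a dedicated document cache would avoid for popular documents.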

Web Server Caching vs. File System and Database Caching

Traditional file system caches do not perform well for the WWW load [A+95, M96]. Three differences between traditional caching and Web caching account for this:

1. Web data items have a different granularity. File system and database buffers deal with fixed-size blocks of data. Web servers always read and cache entire files. Because these data items are not of a fixed size, memory management in Web server caches is more complicated.

2. Caching is not obligatory in Web servers, i.e., some documents may not be admitted to the cache. A file system/database buffer manager always places requested data blocks in the cache (the cache serves as an interface between the storage subsystem and the application). In contrast, a Web server cache manager may choose not to buffer a document if doing so can increase cache performance (hit rate). This option of not caching some documents, combined with the different granularity of data items, significantly broadens the variety of cache management policies that can be used in a Web server.

3. There are no correlated re-reads or sequential scans in Web workloads. Unlike database systems, Web servers never experience sequential scans of a large number of data items. Nor do they have correlated data re-reads. That is, the same user never reads a document twice in a short time interval. This becomes obvious if we take the Web browser cache into account. All document re-reads are handled there, and the Web server never knows about them. (Communication failures, dynamically generated documents, and browser cache malfunctions may still result in correlated re-reads on the Web server.)

One of the reasons why correlated re-reads are common in traditional caches is the possibility of storing several logical data items, e.g., database records, in the same physical block. As a result, even accesses to different logical records may touch the same buffer page, e.g., during a sequential scan, resulting in artificial page re-reads.

The impact of the absence of correlated re-reads is two-fold. Firstly, a caching algorithm for Web workloads does not need to “factor out locality” [RV90], i.e. eliminate the negative impact of correlated re-reads on cache performance. Repeated (correlated) accesses to a data item from a single application make the cache manager consider the item popular even though this popularity is artificial: all accesses occur in a short time interval and the item is rarely used. In Web servers, on the contrary, multiple accesses in a short time interval do indicate high popularity of a document since the accesses came from different sources.

Secondly, a single access to a document should not be a reason for the cache manager to put the document in the cache, since no correlated accesses will follow. In other words, the traditional LRU (Least Recently Used) policy, which caches every item on its first access, is not suitable for Web document caching.

Additionally, Web servers deal with fewer data items than file systems and databases do. The number of documents stored on a typical Web server rarely exceeds 100,000, whereas file systems and databases have to deal with millions of blocks. As a result, a Web server can afford to keep access statistics for all stored documents, while traditional systems cannot.
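To make these observations concrete, here is a minimal Python sketch of a document cache that holds whole documents of varying size, keeps an access count for every requested document (cached or not), and declines to admit a document on its first access. It is an illustration under stated assumptions, not a policy taken from the text: the class name `DocumentCache`, the `admit_after` threshold of two accesses, and the rule of evicting the least-accessed cached document are all hypothetical choices.

```python
from typing import Callable, Dict


class DocumentCache:
    """Hypothetical whole-document cache with optional admission."""

    def __init__(self, capacity_bytes: int, admit_after: int = 2) -> None:
        self.capacity_bytes = capacity_bytes
        self.admit_after = admit_after      # assumed admission threshold
        self.used_bytes = 0
        self.docs: Dict[str, bytes] = {}    # cached documents only
        self.hits: Dict[str, int] = {}      # statistics for ALL documents

    def get(self, name: str, load: Callable[[str], bytes]) -> bytes:
        # Count every access, even for documents that are not cached.
        self.hits[name] = self.hits.get(name, 0) + 1
        if name in self.docs:
            return self.docs[name]
        body = load(name)
        # Caching is not obligatory: a single access does not admit the
        # document, and a document larger than the cache is never admitted.
        if (self.hits[name] >= self.admit_after
                and len(body) <= self.capacity_bytes):
            self._make_room(len(body))
            self.docs[name] = body
            self.used_bytes += len(body)
        return body

    def _make_room(self, needed: int) -> None:
        # Documents vary in size, so one admission may evict several.
        # Assumed rule: evict the cached document with the fewest accesses.
        while self.used_bytes + needed > self.capacity_bytes:
            victim = min(self.docs, key=lambda n: self.hits[n])
            self.used_bytes -= len(self.docs.pop(victim))
```

With a 10-byte cache, for example, a 4-byte document is loaded from storage on its first two requests, admitted on the second, and served from memory afterwards. The eviction rule here is deliberately naive; choosing a better one is exactly the policy question the text raises.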

Keeping the foregoing observations in mind, one can expect that a dedicated document cache would perform better than a file system cache. Such a document cache should use a cache management policy that is more suitable for the WWW load than that of the operating system.

