CPSC 332 Project
Summary
“Weaving Relations for Cache Performance” is a research paper by Ailamaki, DeWitt, Hill, and Skounakis that addresses the performance of data accesses to the cache hierarchy. According to the paper, recent studies indicate that modern database workloads experience delays associated with the memory subsystem and the processor rather than with I/O performance. The established layouts, the N-ary Storage Model (NSM) used by commercial DBMSs and the Decomposition Storage Model (DSM), perform poorly on modern database workloads because they do not access the cache hierarchy efficiently. Both lead to data cache misses when database systems run on a modern processor. Another problem the paper associates with NSM and DSM is that only a small fraction of the data transferred to the cache ends up being useful to the query. The delays in system performance also result from loading the cache with useless data, which wastes bandwidth, pollutes the cache, and may replace information that will be required in the future.
The authors propose a new layout for data records that would solve these cache problems: Partition Attributes Across (PAX), which combines capabilities of DSM and NSM and thereby eliminates unnecessary accesses to main memory. According to the paper, PAX resembles NSM in that both store all of a record's data within a single page. However, PAX groups all values of a particular attribute together on a minipage within that page. To justify PAX as the best layout for data, the authors evaluate it against NSM and DSM. The evaluation used predicate selection queries on numeric data and a variety of queries on TPC-H datasets, all on top of the Shore storage manager. The experiments varied query parameters such as selectivity, the number of predicates, projectivity, the degree of the relation, and the distance between the projected attributes and the attribute in the predicate. According to the experimental results, PAX incurs 50-70% fewer second-level cache misses than NSM due to data accesses when executing a main-memory workload. PAX was also found to execute range selection queries and updates in 17-25% less elapsed time than NSM. Another finding was that PAX executes TPC-H queries involving I/O 11-42% faster than NSM. The paper also compares PAX to DSM.
According to the comparison, DSM's execution time increases as more attributes are involved in the query because of its high record reconstruction cost, while PAX executes queries faster and its execution time remains stable. The paper lists further advantages of PAX. It can be implemented on a DBMS that uses NSM by changing only the page-level data manipulation code. The storage manager can also offer PAX as an alternative data layout and decide whether to use it for a relation based solely on the number of attributes. According to the paper, PAX is also well suited to compression algorithms, which tend to work better with vertically partitioned relations and on a per-page basis, both of which are features of PAX. Since PAX operates at the page level, it can be used orthogonally to other storage schemes.
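The difference between the NSM and PAX page layouts described above can be illustrated with a minimal sketch. It is not the paper's implementation: the three-attribute schema, the function names, and the use of Python lists to stand in for pages and minipages are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical schema used only for this illustration: (id, name, age).
Record = Tuple[int, str, int]

def nsm_page(records: List[Record]) -> list:
    """NSM: store whole records one after another on the page."""
    page = []
    for rec in records:
        page.extend(rec)  # all attributes of one record are adjacent
    return page

def pax_page(records: List[Record]) -> List[list]:
    """PAX: same records on the same page, but one minipage per attribute."""
    n_attrs = len(records[0]) if records else 0
    return [[rec[i] for rec in records] for i in range(n_attrs)]

def pax_record(minipages: List[list], slot: int) -> tuple:
    """Rebuild a record from the minipages of a single page (no cross-page join)."""
    return tuple(minipage[slot] for minipage in minipages)

records = [(1, "Jane", 30), (2, "John", 45), (3, "Jim", 20)]
print(nsm_page(records))                 # [1, 'Jane', 30, 2, 'John', 45, 3, 'Jim', 20]
print(pax_page(records))                 # [[1, 2, 3], ['Jane', 'John', 'Jim'], [30, 45, 20]]
print(pax_record(pax_page(records), 1))  # (2, 'John', 45)
```

A scan that evaluates a predicate on the third attribute only needs the last minipage under PAX, while under NSM the wanted values are interleaved with the other attributes of every record.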
In conclusion, the paper shows that NSM harms cache performance and proposes PAX as a new data layout for relational DBMSs. The authors justify the introduction of PAX through its features, capabilities, and advantages, which go well beyond those of NSM and DSM.
Evaluation
The significance of the paper's research question can be evaluated in several ways. First, modern platforms are being developed with strong decision support systems and spatial applications that create heavier database workloads. Such systems require a layout for relational DBMSs that offers the flexibility, speed, and features needed to support cache performance. The traditional layouts, DSM and NSM, cannot make good use of the cache while also optimizing I/O performance on a modern system. The research problem is therefore significant, and PAX, a new layout for data records, addresses it.
The second justification for the significance of the research problem is the difference in performance between PAX and the DSM and NSM layouts. An evaluation of PAX's features shows a large difference in operation, capability, flexibility, usability, and advantages compared with NSM and DSM. This gap, which makes PAX a superior layout for data records, indicates that cache performance was a significant problem when the NSM layout was used for relational DBMSs on modern platforms. Issues such as loading the cache with useless data, wasting bandwidth, and possibly forcing the replacement of needed data arose from that structure. With PAX in place, the problems associated with the NSM and DSM layouts can be largely removed, because PAX eliminates unnecessary accesses to main memory.
The paper incorporates both valid and invalid claims. The valid claims include the comparison of the elapsed time of DSM, NSM, and PAX as a function of the number of attributes in the query. The experiment validates the claim that DSM tends to perform well when the reconstruction cost is low and the query involves one or two attributes. The experiment also indicates that the performance of DSM deteriorates rapidly as the number of attributes increases, while NSM and PAX maintain stable performance. The stable performance of NSM and PAX is due to the attributes of each record residing on the same page, which eliminates the need for an expensive join operation to construct the record. Another valid claim is that, compared with NSM, PAX incurs 75% less data cache stall time, while range selection queries and updates on main-memory tables execute in 17-25% less elapsed time. The claim is justified by the experiment that illustrates the impact of NSM and PAX on cache behavior. The results establish that NSM suffers one cache miss per record while PAX takes a miss only every four records, which allows PAX to save about 75% of the data misses that NSM incurs in the L2 cache. The experiment also shows that PAX runs queries faster than NSM because it reduces the cache delays related to data accesses.
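The arithmetic behind the "one miss per record versus one miss every four records" comparison can be reproduced with a small sketch. The cache line size, value size, and record size below are assumptions chosen only so that four attribute values fit in one cache line; they are not taken from the paper, and the model ignores evictions and prefetching.

```python
def lines_touched(addresses, line_size):
    """Count distinct cache lines touched (cold cache, no evictions)."""
    return len({addr // line_size for addr in addresses})

# Assumed parameters, chosen only to reproduce the one-in-four ratio.
LINE_SIZE = 32     # bytes per cache line (assumption)
VALUE_SIZE = 8     # bytes per attribute value (assumption)
RECORD_SIZE = 64   # bytes per NSM record (assumption, at least LINE_SIZE)
N_RECORDS = 1000

# NSM: the predicate attribute sits at a fixed offset inside each record,
# so consecutive values are RECORD_SIZE bytes apart.
nsm_addrs = [r * RECORD_SIZE for r in range(N_RECORDS)]

# PAX: the predicate attribute's values are contiguous in one minipage.
pax_addrs = [r * VALUE_SIZE for r in range(N_RECORDS)]

nsm_misses = lines_touched(nsm_addrs, LINE_SIZE)
pax_misses = lines_touched(pax_addrs, LINE_SIZE)
saving = 1 - pax_misses / nsm_misses

print(nsm_misses, pax_misses, saving)  # 1000 250 0.75
```

Under these assumptions the saving is 1 - 1/4 = 75%, which matches the ratio quoted in the evaluation. With a different line size or value width the ratio would change accordingly.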
The invalid claim in the paper concerns the sensitivity analysis of DSM. The sensitivity analysis experiment involves only PAX and NSM, yet the authors report on DSM's sensitivity performance without including it in the experiment. The paper therefore states, without valid support, that DSM's performance is about a factor of nine slower than that of NSM and PAX.
Synthesis
The crux of the research problem established in the paper is the poor cache performance of NSM and DSM. The poor performance of NSM comes from the fact that it suffers one cache miss per record. When a data access misses in L1, the request goes to the L2 cache, which is organized as a unified cache containing both data and instructions. Because L2 is unified, new requests replace other needed data and instructions. To evaluate the predicate, NSM loads the cache with unreferenced data, which causes extra instruction misses. Much of the data that NSM loads is not required by the query and is thus useless. NSM is therefore associated with wasted bandwidth and longer memory-related delays. The poor performance of DSM is associated with multi-attribute queries, which force DSM to join the participating sub-relations on the surrogate in order to reconstruct a partitioned record. As the number of attributes in the result relation increases, DSM spends more time joining sub-relations, which limits its capacity and causes delays.
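The reconstruction cost attributed to DSM above can be sketched as follows. This is an illustrative model only: the dictionaries standing in for sub-relations, the function names, and the use of a lookup count as a proxy for join work are assumptions, not the paper's method or measurements.

```python
def dsm_decompose(records, n_attrs):
    """DSM: one sub-relation per attribute, each keyed by a surrogate id."""
    return [{sid: rec[i] for sid, rec in enumerate(records)}
            for i in range(n_attrs)]

def dsm_reconstruct(sub_relations, wanted):
    """Join the wanted sub-relations on the surrogate and count the lookups."""
    probes = 0
    result = []
    for sid in sub_relations[wanted[0]]:
        row = []
        for attr in wanted:
            row.append(sub_relations[attr][sid])  # one lookup per attribute
            probes += 1
        result.append(tuple(row))
    return result, probes

records = [(1, "Jane", 30), (2, "John", 45), (3, "Jim", 20)]
subs = dsm_decompose(records, 3)

rows1, probes1 = dsm_reconstruct(subs, [2])        # single-attribute query
rows3, probes3 = dsm_reconstruct(subs, [0, 1, 2])  # full-record query

print(rows1, probes1)  # [(30,), (45,), (20,)] 3
print(rows3, probes3)  # [(1, 'Jane', 30), (2, 'John', 45), (3, 'Jim', 20)] 9
```

In this toy model the lookup count grows linearly with the number of attributes requested, which mirrors the observation that DSM does well on one or two attributes and degrades as more are involved.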
An alternative way of addressing the research problem is to extend NSM so that it gains important properties such as lower elapsed time and less data cache stall time. NSM could also be developed to achieve a speedup when running TPC-H queries. Adding these features would enable NSM to perform better. The results in the paper could be improved by including all three layouts in every experiment, so that the strengths and weaknesses of each are fully established. For instance, DSM tends to be left out of most of the comparison experiments, including those on updates with various selectivities, speedup, elapsed time, elapsed time per record, and impact on cache behavior.
Looking at the presentation of the paper, one point of critique is the lack of an implementation structure. Implementing the PAX approach also tends to be difficult. The paper does not offer any results for the TPC-C (OLTP) workload. For PAX to be considered as a data layout, it should exhibit relatively good performance on both types of workload. The paper's experimental setup also tends to be very convenient for PAX.
Doing better than the authors means including all of the data layouts in the experiments presented in the paper. The paper should also provide a specific solution to each problem it addresses, along with recommendations for improving the other data layouts so that they can perform on modern platforms. The flow of the presentation could also be improved by analyzing each factor associated with the research problem and linking it to the possible solutions for the causes of that problem.

Work Cited
Ailamaki, A., DeWitt, D., Hill, M., & Skounakis, M. Weaving Relations for Cache Performance. VLDB Conference, Roma, Italy. 2001.
