Hybrid Partitioning and Distribution of RDF Data (Record no. 30238)

000 -LEADER
fixed length control field nam a22 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 210204b xxu||||| |||| 00| 0 eng d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 005.74
Item number PAD
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Padiya, Trupti
245 ## - TITLE STATEMENT
Title Hybrid Partitioning and Distribution of RDF Data
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Gandhinagar
Name of publisher, distributor, etc Dhirubhai Ambani Institute of Information and Communication Technology
Date of publication, distribution, etc 2018
300 ## - PHYSICAL DESCRIPTION
Extent xvi, 101 p.
500 ## - GENERAL NOTE
General note Bhise, Minal, Thesis supervisor
Student ID No. 201221002
Thesis (Ph.D.) - Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, 2018
520 ## - SUMMARY, ETC.
Summary, etc RDF is a standard model from the W3C designed for data interchange on the web. It was established and used for the development of the semantic web, but RDF data is now used in diverse domains beyond the semantic web, and its volume has grown tremendously as a result. With this growth, it is vital to manage RDF data efficiently. The thesis aims at efficient storage and faster querying of RDF data using various data partitioning techniques. It studies the problem of basic data partitioning techniques for RDF data storage and proposes hybrid data partitioning in centralized and distributed environments as part of the solution for storing and querying RDF data. The dissertation emphasizes efficient data storage and faster query execution for stationary RDF data. It demonstrates basic data partitioning techniques such as PT (Property Table), BT (Binary Table), HP (Horizontally Partitioned Table), and the use of MV (Materialized Views) over BT. Even though the basic data partitioning techniques outperform TT (Triple Table), they suffer from various performance issues. The thesis gives detailed insight into the advantages and disadvantages of the basic data partitioning techniques. Consequently, it proposes hybrid solutions that exploit the best of the available techniques. It proposes three hybrid data partitioning techniques: DAHP (Data-Aware Hybrid Partitioning), DASIVP (Data-Aware Structure Indexed Vertical Partitioning), and WAHP (Workload-Aware Hybrid Partitioning). DAHP and WAHP combine PT and BT, whereas DASIVP combines structure index partitioning with BT. DAHP and DASIVP take a data-aware approach, and WAHP takes a workload-aware approach.
The data-aware approach stores RDF data based on how the data items are related to each other in the dataset, and the workload-aware approach stores RDF data based on which data is queried together. The thesis presents a detailed evaluation of query performance and data storage for all the data partitioning techniques. Query performance is evaluated in terms of QET (Query Execution Time). It calculates the break-even point for all the data partitioning techniques. The hybrid data partitioning techniques show significant improvement over the basic ones. A set of metrics is devised that can help assess the suitability of a given data partitioning technique for an RDF dataset. RDF data has grown to a point where it is difficult to manage on a single machine. It is necessary to distribute the data across nodes and process it in parallel to achieve efficient query performance. Data distribution and parallel processing of queries may generate many intermediate results, which involve communication among nodes. It therefore becomes necessary to minimize inter-node communication in order to achieve faster query execution. This work presents a solution for managing RDF data in a distributed environment using a proposed hybrid technique. The solution aims at efficient RDF data storage and faster query execution by minimizing inter-node communication. Finally, the dissertation proposes DWAHP (Workload-Aware Hybrid Partitioning and Distribution), which exploits the query workload and distributes data among nodes. DWAHP has two phases: Phase 1 applies the Workload-Aware Hybrid Partitioning technique, which generates workload-aware clusters consisting of PT and BT. Phase 2 applies a distribution scheme that distributes data among nodes using an n-hop Property Reachability Matrix.
DWAHP Phase 1 helps reduce the number of joins, as it keeps data that is queried together in a separate partition. DWAHP Phase 2 helps reduce inter-node communication through the use of an n-hop Property Reachability Matrix. The thesis demonstrates DWAHP and analyzes its query performance in terms of query execution time, query cost, storage space, and inter-node communication. Queries on RDF data mostly involve star and linear query patterns. DWAHP manages joins such that it can answer all linear and star queries without inter-node communication. DWAHP is compared with a state-of-the-art solution. It outperforms that solution with 72% faster query execution and 61% lower query cost, while occupying less than one-third of the storage space. RDF data continues to grow as it is used in diverse domains. The partitioning techniques discussed can be utilized in various RDF stores. Data-aware RDF stores suit applications where the data characteristics are known, and workload-aware RDF stores suit applications where the queries are known in advance.
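Illustrative note (not part of the thesis record): the summary contrasts BT (one two-column table per property) with PT (one row per subject), and states that keeping co-queried data together reduces joins. The Python sketch below shows that contrast on a toy dataset. The sample triples, the function names, and the hand-picked property set are all hypothetical, and the single-valued-property simplification is an assumption made for brevity; none of this reproduces the thesis implementation.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, object); not taken from the thesis.
TRIPLES = [
    ("s1", "name", "Alice"),
    ("s1", "age", "30"),
    ("s1", "knows", "s2"),
    ("s2", "name", "Bob"),
    ("s2", "age", "25"),
]


def build_binary_tables(triples):
    """BT: one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def build_property_table(triples, properties):
    """PT: one row per subject, one column per chosen property.

    Assumption: each chosen property is single-valued per subject,
    so a later value would overwrite an earlier one.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        if p in properties:
            rows[s][p] = o
    return dict(rows)


def star_query_bt(tables, properties):
    """Star query over BT: join the per-property tables on subject."""
    result = None
    for p in properties:
        current = dict(tables.get(p, []))  # single-valued assumption
        if result is None:
            result = {s: {p: o} for s, o in current.items()}
        else:
            # One subject-subject join per additional property.
            result = {
                s: {**vals, p: current[s]}
                for s, vals in result.items()
                if s in current
            }
    return result or {}


def star_query_pt(pt, properties):
    """Same star query over PT: a single scan, no joins."""
    return {
        s: {p: row[p] for p in properties}
        for s, row in pt.items()
        if all(p in row for p in properties)
    }


bt = build_binary_tables(TRIPLES)
pt = build_property_table(TRIPLES, {"name", "age"})
print(star_query_bt(bt, ["name", "age"]))
print(star_query_pt(pt, ["name", "age"]))
```

In this sketch, "name" and "age" are placed in the PT by hand while "knows" remains available only as a BT. A hybrid scheme of the kind the summary describes would derive that choice from the data or from the query workload instead.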
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element DWAHP
Topical term or geographic name as entry element Data aware hybrid partitioning
Topical term or geographic name as entry element RDF data storage
Topical term or geographic name as entry element Data structures and algorithms
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Bhise, Minal
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://drsr.daiict.ac.in/handle/123456789/885
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type Thesis and Dissertations
Holdings
Permanent Location DAIICT
Current Location DAIICT
Date acquired 2018-05-01
Full call number 005.74 PAD
Barcode T00684
Date last seen 2021-02-04
Koha item type Thesis and Dissertations
