Indexing is a technique for improving database efficiency by reducing the number of disk accesses necessary while a query is completed. It is a data structure strategy for fast locating and accessing data in a database. A few database columns are used to generate indexes.
The first column is the Search key, which includes duplicates of the table’s main key or candidate key. These values are saved in the proper array to ensure that the related data may be conveniently retrieved.
The second column is the Data Reference or Pointer, which comprises a collection of pointers containing the location of the disk block containing that specific key value.
Attributes Of Indexing
- Access Types: The type of access, such as value-based search, range access, etc.
- Access Time: The time required to locate a specific data element or collect data items.
- Insertion Time: This is the time it takes to locate the right space and insert new data.
- Deletion Time: The time it takes to locate an item, delete it, and rebuild the index structure.
- Overhead Space: This refers to the extra space needed by the index.
Ordered Index Files:
The indices, in this case, are predicated on a categorized arrangement of the values. These are often faster and more conventional storage mechanisms. Such Ordered or Sequential file structures may store information in a dense or sparse format:
(i) Dense Index:
The data file has an index record for each search key value.
This record provides the search key as well as a link to the initial data record containing the search key value.
(ii) Sparse Index:
Only a few entries in the data file have an index record. As seen, each item corresponds to a block.
To find a record, we seek the index record with the greatest search key value that is less than or equivalent to the search key value we want.
We begin with the record referenced by the index record and work our way through the file (sequentially) until we discover the requested record.
Organization of Hash Files:
Indices are based on values spread equally across a range of buckets. A function known as a hash function determines which buckets a value is allocated to. There are essentially three indexing methods:
- Indexing by Clusters
- Secondary or non-clustered indexing
- Indexing on Multiple Levels