Hashing and Hash Tables

Hashing

Hashing involves generating a unique identifier for a piece of data by applying a hash function to an input (or “key”). The resulting hash code is then used to quickly store or retrieve the data from a hash table. Essentially, hashing creates a distinct label for each piece of data, and the hash code helps point to the location in memory where the data is stored, enabling fast access.

Hash Tables

A hash table is a data structure that stores data in key-value pairs. The key acts as a unique identifier, while the value is the associated data. Hash tables use the hash code to map the key to a specific location in memory, making the process of accessing or updating data efficient. The goal of a hash table is to offer fast access to data by using a key to reference its corresponding value.

Hashing with HashMap

In Java, the HashMap class is the most commonly used implementation of a hash table. It stores data as key-value pairs and leverages hashing to quickly retrieve the associated value for a given key. Here’s how it works:

  • Hashing: When adding a key-value pair to a HashMap, Java calculates the hash code of the key (via the hashCode() method) to determine where the data should be stored in memory.
  • Bucket Assignment: The hash code is used to assign the key-value pair to a “bucket” or “slot” in the hash table.
  • Handling Collisions: If two keys generate the same hash code (a “collision”), Java handles it through chaining (storing multiple values in the same bucket) or open addressing (finding an alternative bucket).

Declaration & Initialization

In Java, you can declare a hash table using the HashMap or Hashtable classes from the java.util package.

You can also initialize with a specific capacity and load factor to control resizing behavior:

Basic Operations

Insertion: Add a new key-value pair.

Deletion: Remove a key-value pair by key.

Traversal: Iterate through all key-value pairs.

Example

In the above example:

  • A HashMap stores fruit names as keys and their quantities as values.
  • Three key-value pairs are inserted using the put() method.
  • The value for the “Apple” key is retrieved using the get() method.
  • The presence of the “Banana” key is confirmed, and the “Orange” entry is removed using the remove() method.

Advantages & Disadvantages

Advantages Disadvantages
Fast Access: Provides O(1) time complexity for search, insertion, and deletion (on average). Collisions: Collisions can reduce performance and need to be handled efficiently.
Efficient Memory Use: Hash tables are space-efficient, especially with open addressing. Order is not preserved: Hash tables do not guarantee any specific order of elements.
Supports a wide range of data types: Keys can be of any data type, as long as they are hashable. Complexity: The underlying implementation can be complex, especially with collision resolution strategies.

Real World Examples

  • Cache Systems: Hash tables are often used in caching systems where fast lookups are necessary. For example, the LRU Cache (Least Recently Used) uses hash tables to store and retrieve the most recently used items.
  • Database Indexing: Many databases use hashing techniques to create indexes for fast data retrieval.
  • Dictionary and Spell Checkers: A dictionary that stores words with their meanings can be implemented using a hash table, allowing quick lookups of words.
  • Routing Tables in Networking: Hash tables are used to quickly find routes in a network based on destination IP addresses.