Home CS61B: Lecture 19
Post
Cancel

CS61B: Lecture 19

Hashing II

  • Two implementations for sets and maps
    • RB Tree based approach:
      • Requires items to be ocmparable and has logarithmic time operations.
    • Hash table based approach: HashSet and HashMap.
      • Constant time operations.
  • Hash Table:
    • Data is converted into a hash code, the has code is reduced to a valid index.
    • Data is stored in a buckey corresponding to that index.
      • Each bucket is a “separate chain” of items.
    • We resize when load factor N/M exceeds some limit.
    • If items are spread out nicely, we get around $\Theta(1)$ average runtime.
  • Values are seemingly placed into random bins in the hashing array.
    • This is due to the fact that the hashing function decides where the indices are.
    • The hashing function is by default based on the memory location of the object.
1
2
Integer x = 5;
x.hashCode();
1
5
  • The goal of a hash function is to try and spread out item evenly.
    • Thus the memory address is effectively random, so items should be evenly distributed.
  • However, sometimes we do not want to use the default hash function.
    • If we consider two objects to be equal based on their value and not by their reference, then it is possible that we cannot find a value in the hashmap as the memory address of our object does not match with the new identical object that we have.
    • We need to override the hashCode to hash a more intrinsic property of the object, something that all equal objects share. Such as the color of a colored number.
  • If two objects are equal they should have the same hashCode so that the hashtable could always find it.
  • If quals is overriden, the hashCode should also be overriden in a consistent manner.
    • If two objects are equals, they must have te same hashCode.
    • If we don’t
      • Contains can’t find objects.
      • Add results in duplicates.

Immutable and Mutable Data Types

  • Immutable data types is a type where an instance cannot change in any way after instantiation.
    • The Integer, String, and Date classes are immutable.
    • the final keyword help the compiler ensure immutability.
    • final is not sufficient nor necessary for a class to be immutable.
  • Advantages of immutability: Less to think about, avoids bugs and makes debugging easier.
  • Disadvantages of immutability: Must create a new object anytime anything changes, inefficient!

Mutable HashSet Keys.

  • What if we change a key in a hashset after it’s been hashed?
  • The moment we change an object’s composition, it would have a different hashcode and might point to a different bin. So we’d ave an issue.
  • Bottom line is to never mutate an object being used as a key, which could give us incorrect results and make the item get lost.
    • We are allowed to change things so long as it does not change the equals() value.
  • In Java, HashSet looks effectively identical to HashMap, except the vlaues of HashMap are ignored in a HashSet.
This post is licensed under CC BY 4.0 by the author.

Hashing

CS61B: Lecture 22