Hashing II
- Two implementations for sets and maps
- RB Tree based approach:
- Requires items to be ocmparable and has logarithmic time operations.
- Hash table based approach: HashSet and HashMap.
- Constant time operations.
- RB Tree based approach:
- Hash Table:
- Data is converted into a hash code, the has code is reduced to a valid index.
- Data is stored in a buckey corresponding to that index.
- Each bucket is a “separate chain” of items.
- We resize when load factor N/M exceeds some limit.
- If items are spread out nicely, we get around $\Theta(1)$ average runtime.
- Values are seemingly placed into random bins in the hashing array.
- This is due to the fact that the hashing function decides where the indices are.
- The hashing function is by default based on the memory location of the object.
1
2
Integer x = 5;
x.hashCode();
1
5
- The goal of a hash function is to try and spread out item evenly.
- Thus the memory address is effectively random, so items should be evenly distributed.
- However, sometimes we do not want to use the default hash function.
- If we consider two objects to be equal based on their value and not by their reference, then it is possible that we cannot find a value in the hashmap as the memory address of our object does not match with the new identical object that we have.
- We need to override the hashCode to hash a more intrinsic property of the object, something that all equal objects share. Such as the color of a colored number.
- If two objects are equal they should have the same hashCode so that the hashtable could always find it.
- If quals is overriden, the hashCode should also be overriden in a consistent manner.
- If two objects are equals, they must have te same hashCode.
- If we don’t
- Contains can’t find objects.
- Add results in duplicates.
Immutable and Mutable Data Types
- Immutable data types is a type where an instance cannot change in any way after instantiation.
- The Integer, String, and Date classes are immutable.
- the
final
keyword help the compiler ensure immutability. - final is not sufficient nor necessary for a class to be immutable.
- Advantages of immutability: Less to think about, avoids bugs and makes debugging easier.
- Disadvantages of immutability: Must create a new object anytime anything changes, inefficient!
Mutable HashSet Keys.
- What if we change a key in a hashset after it’s been hashed?
- The moment we change an object’s composition, it would have a different hashcode and might point to a different bin. So we’d ave an issue.
- Bottom line is to never mutate an object being used as a key, which could give us incorrect results and make the item get lost.
- We are allowed to change things so long as it does not change the equals() value.
- In Java,
HashSet
looks effectively identical toHashMap
, except the vlaues ofHashMap
are ignored in a HashSet.