Java HashSet에서 HashCode를 변경하게 되면...

effective java의 item11인 equals 와 hashCode의 재정의에 대해 보던 중 한가지 의문점이 생겼다.

HashSet안에 넣은 객체의 equals와 hashCode를 수정하게 됬을때 검색이 되나 .. ?

HashSet의 동작방식이 hashCode가 같고 equals가 true로 나오게 되면 가능한 것으로 알고있어서 대충 테스트해보고 되면 넘어가려고 했다.

근데 왠걸... 값과 hashCode를 변경 하면 똑같은 객체로 contains 검색이 안된다.

시나리오는 이랬다.

1. 국어, 수학, 영어 만점을 가진 학생 클래스 (set에 등록)

2. 부정행위로 인해 점수를 반으로 감소시킴

3. 이후 set에 contains로 같은 객체로 존재하는지

hashCode가 변경됬지만 넣어 놓은 객체도 변경이 되었기에 처음에는 검색이 될 줄 알았다.

(물론, 논리적으로 조금 이상하지만 같은 점수면 같은 학생으로 본 것이다.)

Student.class

class Student {

    int kor;
    int math;
    int eng;

    public Student(int kor, int math, int eng) {
        this.kor = kor;
        this.math = math;
        this.eng = eng;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Student student = (Student) o;
        return kor == student.kor && math == student.math && eng == student.eng;
    }

    @Override
    public int hashCode() {
        return Objects.hash(kor, math, eng);
    }

    @Override
    public String toString() {
        return "Student{" +
            "kor=" + kor +
            ", math=" + math +
            ", eng=" + eng +
            '}';
    }

    public void renewScore() {
        kor = kor / 2;
        math = math / 2;
        eng = eng / 2;
    }
}

Main.class

public class Main {
    public static void main(String[] args) {
        Set<Student> students = new HashSet<>();

        Student stu1 = new Student(100, 100, 100);
        students.add(stu1);
        System.out.println(stu1+", "+stu1.hashCode());

        stu1.renewScore();
        System.out.println(stu1+", "+stu1.hashCode());

        System.out.println(students.contains(stu1));
    }
}

실행결과

객체를 찾을 수 없다.

이에 대해 파해치기 위해 HashSet의 구현을 확인해보았다.

먼저, HashSet은 별도로 구현된 것이 아니라 HashMap을 이용해서 구현되었다.

HashMap의 Key에 Set의 값들을 넣고 Value 부분에는 PRESENT라는 더미 Object를 한개 생성하여 넣는식이었다.

contains 또한 안에 존재하는지 여부를 HashMap의 containsKey를 이용했다.

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();


public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

그렇다면 HashMap의 put과 containsKey는 어떻게 구현되있을까 ?

먼저 put에 대한 내용이다. contains도 매우 유사하다. (어차피 있는지 확인하는 것도 넣는 자리를 확인하는 것과 동일해서....)

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

put을 하면 hash(key)를 이용해 hash를 구한 후 putVal 메소드에 넘겨준다.

putVal에서는 버킷인 Table에서 [ 버킷 길이 - 1 & hash 값 ] 을 통해 인덱스를 구하고 해당 위치에 노드가 존재하는지 확인한다.

그러고서 없다면 새로운 노드를 생성하고 있다면 덮어쓸지 말지에 대한 여부인 onlyIfAbsent를 통해 동작을 결정한다.

대충 이러한 방식으로 사용이 된다.

(물론 값이 존재해서 충돌이 일어나면 LinkedList 형식으로 해당 인덱스에 위치한 노드의 next에 노드를 연결하는 형식으로 저장된다.)

이를 보고 "아 그러면 넣고나서 hashCode를 변경하면 인덱스가 이전 위치에 저장되있기에 검색이 안되서 그러나보다" 싶었지만, 추가적인 이유도 있었다.

인덱스 문제는 예를 들어 처음 저장할때 table[5] 버킷에 저장했는데 hashCode 값이 변경되었으니 찾을때는 table[7]을 찾는 것이다. 그러니 해당 위치의 노드가 없어 false로 없다고 하는 것이다.

그래서 변경 전 hash와 동일하게 나오도록 같은 점수를 가진 Student를 만들어 contains를 시도해보았다.

public class Main {

    public static void main(String[] args) {
        Set<Student> students = new HashSet<>();

        Student stu1 = new Student(100, 100, 100);
        students.add(stu1);
        System.out.println(stu1 + ", " + stu1.hashCode());
        stu1.renewScore();

        Student stu2 = new Student(100, 100, 100);
        System.out.println("stu1 : " + stu1 + ", " + stu1.hashCode());
        System.out.println("stu2 : " + stu2 + ", " + stu2.hashCode());

        System.out.println(students.contains(stu2));
    }
}

하지만 결과는 여전히 false

이전 hashCode였던 129091이 stu2에서도 똑같이 나오는데 검색이 안된다.

그럼 해당하는 버킷의 인덱스로 가도 안된다는 것인데 뭐가 문제였을까

if ((tab = table) != null && (n = tab.length) > 0 &&
    (first = tab[(n - 1) & (hash = hash(key))]) != null) {
    if (first.hash == hash && // always check first node
        ((k = first.key) == key || (key != null && key.equals(k))))
        return first;

위 코드는 containsKey 메소드가 사용하는 getNode에 있는 코드이다.

인덱스를 찾고 해당 노드와 일치하는지 확인해서 일치하면 바로 반환하는 부분이다.

같은 hash를 강제적으로 맞춰서 인덱스를 찾게 했고 버킷인 table내에 해당 노드를 잘 찾았어도 hash는 같을지언정 두번째 조건인 == 은 주소값을 기반으로 비교하기에 false가 뜨고 노드에 값들이 바꿔났기에 equals도 안맞는 이상한 노드가 버킷내에 들어있게 되는 것이었다. or 라서 둘 중 하나라도 만족을 해야하지만 둘다 만족하지 못했다.

이러면 의도치 않게 중복된 값들이 Set 자료형에도 들어가게 될 것 같다. Hash 충돌까지 일으키면서...

Hash 자료형에는 넣고 나서는 hashCode를 변경되지 않도록 하거나 아니면 HashCode가 immutable 하도록 설계를 해야할 것 같다.

어렵다...

제대로 본건지 모르겠다.

'Programming > JAVA' 카테고리의 다른 글

Java Equals 정의시 getClass() or instanceof (0)	2023.01.02
Objects requireNonNull 란? (0)	2022.12.30

'Programming > JAVA' 카테고리의 다른 글

티스토리툴바