Computer Science/Data Structures

🎊Binary Heap

metamong 2024. 7. 17.

🎊 intro

🎊 Priority Queue와 연결지어 설명할 수 있는 Binary Heap data structure에 대해 자세히 알아보자. 앞서 학습한 Priority Queue는 일종의 Abstract Data Structure(ADT)로, binary heap을 추상화한 개념이라 볼 수 있다. 즉, 실제 priority queue는 binary heap이라 생각하면 된다. priority queue의 우선순위가 최대 먼저, 또는 최소 먼저라면 binary heap으로 tree structure를 만들어 tree structure에서 우선순위에 맞게 꺼내준다. 최대 먼저라면 max-heap, 최소 먼저라면 min-heap이라 부른다. (여기서 priority queue는 무조건 binary heap을 이용해서 구현이라고 생각하면 안된다. 주로 binary heap이 많이 쓰일 뿐 다른 heap도 쓰인다.)

🎊 앞서 언급한 Binary Heap은 Heap Data Structure의 일종으로, binary heap외 leftis, binomial, fibonacci, pairing, brodal, rank-pairing, strict fibonacci, 2-3 heap과 같은 다양한 heap 존재. 아래와 같이 이번 포스팅에서 배울 Binary Heap은 아래 표와 같이 deletion / insertion 모두 시간 복잡도 O(logN)이다. O(N)보다 작으므로 효율적이다라는 부분에서 대표 자료구조로 많이 사용.

🎊 tree structure 형태의 binary heap이라고 하였는데, tree중 complete binary tree structure에서 따온 것을 binary heap이라 생각하면 된다. Binary Heap을 만족하기 위해서는 Complete Binary Tree Structure라는 shape을 만족해야 하고(shape property 충족), Heap 성질을 만족해야 한다(heap property 충족)

🎊 ① Shape Property → Complete Binary Tree

🎊 Complete Binary Tree(CBT; 완전 이진 트리)란, root 노드부터 시작하여 왼쪽 자식 노드, 오른쪽 자식 노드 순서대로 데이터가 차례대로 삽입되는 tree를 뜻한다. 'all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.' 맨 마지막 level을 제외하고 모든 level이 채워져 있고, 마지막 level의 경우 왼쪽부터 자식 노드가 최대한 채워진 tree라 보면 된다. 아래 왼쪽 그림과 같이 Complete Binary Tree를 채울 수 있다. 그렇다면 CBT 정의에 의해 아래 오른쪽 그림에서 CBT는 1, 2, 5, 6임을 알 수 있다.

🎊 ② Heap Property → Max-Heap & Min-Heap

🎊 이제 ①을 통해 Shape Property 조건에 대해 알아보았고, 두번째 조건 Heap Property에 대해서 알아보자. 두 property를 모두 만족해야 Binary Heap이라 할 수 있다. 'Heap property: the key stored in each node is either greater than or equal to (≥) or less than or equal to (≤) the keys in the node's children, according to some total order.'

🎊 max-heap: parent key가 child keys보다 크거나 같은 heap / min-heap: parent key가 child keys보다 작거나 같은 heap. 즉 max-heap은 root node가 가장 큰 값을 가지며(child node 값 보다 모두 크거나 같아야 한다), 가장 값이 큰 데이터가 우선적으로 제거된다. 즉 deletion 진행 시 root node부터 제거된다. min-heap은 그 반대. deletion 차례대로 진행하면서 extract된 원소들 차례대로 배치만 해도 자동으로 정렬 수행. max-heap의 경우 최댓값으로 extraction 진행하므로 내림차순 정렬 / min-heap의 경우 오름차순 정렬 결과.

🎊 max-heap과 min-heap은 따라서 priority queue implementation 시 element를 insertion / 또는 가장 작거나 큰 element를 min-heap과 max-heap에서 deletion하는 두 operation에서 효율적이게 많이 쓰인다.

🎊 해당 정렬은 heapsort sorting algorithm(in-place algorithm)으로, tree structure 내에서 서로 node끼리 상대적으로 위치를 바꾸는 알고리즘이므로 in-place 알고리즘이다. (sorting 과정은 아래 내용 상세히 참조) / 아래 그림 왼쪽 max-heap / 오른쪽 min-heap

🎊 binary heap implementations

🎊 heap 구현은 linked list와 array 두 구조로 구현이 가능한데, linked list는 많은 메모리 차지,주로 array로 구현된다. array의 index를 이용해 heap binary tree에 쌓는 과정을 알아보자. binary heap을 array로 구현하기에 앞서서 child & parent node index를 먼저 알아보자

① child & parent's node index

🎊 상위 node의 index가 i라면(index 0부터 시작 시) 하위 left node의 index는 2i+1, 하위 right node의 index는 2i+2이다. 반대로 자식 node의 index가 x라면 paret node의 index는 floor((x-1)/2)

🎊 실제 heapsort는 위 1차원 array 내의 원소를 swap하는 과정으로 1차원 array 한정 내 sorting하는 알고리즘으로 진행된다(in-place)

② building a heap

🎊 그렇다면 주어진 array를 binary heap 형태(max heap 가정)로 저장하는 과정을 한 단계씩 자세히 살펴보자

① 주어진 array의 중간 원소부터(arr[(len(arr)-2)//2]: index 0 기준) 맨 앞 원소까지 for문 반복문으로 아래 swap하는 과정(heapify())을 돌린다.

Q. 주어진 array의 전체를 돌리지 않고 중간 원소부터 돌리는 이유?

A. child node가 없는 parent node는 heapify 과정(아래로 뻗어나가며 swap 하는 과정)이 필요 없기 때문. child node가 존재하는 parent node부터 돌려야 하므로 해당 parent node의 개수는, 전체 index를 N이라 하면 index N//2인 node에서부터 시작한다.

(참고: child node가 존재하는 node를 non-leaf node, child node가 존재하지 않는 노드를 leaf node. 따라서 non-leaf node부터 iteration 진행)

Q. 주어진 array의 중간 원소부터 첫번째 원소까지 거꾸로 돌리는 이유?

A. subarray 자체를 heap으로 바꾸기 위해서는 해당 subarray 내의 모든 subarray도 heap을 만족시켜야 한다. 하지만 첫번째 원소부터 돌리는 순간, 첫번째 원소를 포함한 전체 array 내에 모든 subarray가 이미 heap임을 보장할 리 없으므로 거꾸로 돌려서 아래에서 위로 올라가며 heap을 만족시키는 과정이 올바르다.

② swap하는 heapify() 과정

(1) largest index 구하기

: max heap을 만든다고 가정했을 때, parent node를 A, 두 child node가 있다고 하면 왼쪽부터 B, C라고 하자. max heap이므로 A의 value >= max(B의 value, C의 value)를 무조건 만족해야 한다. 이 때 해당 부등식을 만족시키지 않는, 즉 B, C중 A의 value보다 더 큰 node가 존재한다면 largest라는 index를 B 또는 C의 index로 바꾼다. (만약 B와 C 둘 다 A의 value보다 value값이 크다면 C의 value를 largest로 배정)

(2) 구한 largest index과 본래 parent_idx와 다르다면?

: largest != parent_idx라면 parent node와 largest를 가리키는 node와의 value swap 진행. 해당 swap이후 largest를 가리키는 본래 child node를 parent node로 정해 재귀적으로 위 (1)과 현재 (2)의 과정 재귀적으로 실행

(3) 재귀적으로 아래 subtree까지 실행함으로써 subtree를 max-heap으로 완성

🎊 ex) array [33, 35, 42, 10, 7, 8, 14, 19, 48]이 주어졌다고 하자. heapify()의 swap 과정 그림 일부 참조.

→ 아래 왼쪽 그림에서 last non-leaf node (index 3) 10부터 reverse iteration 진행. (즉 index 3 - 2 - 1 - 0순으로 swap heapify() 실행)

→ 오른쪽 그림은 index 1일 때의 heapify(). index 1일때 heapify()를 진행할 건데, 이는 곧 index 2를 root node로 가지는 sub-tree와 index3을 root node로 가지는 sub-tree가 이미 완성된 상태.

→ 그 다음 줄 왼쪽 그림은 index 1일 때의 heapify() 그림. parent_idx와 largest가 다르므로 swap 진행

→ 그 다음 줄 오른쪽 그림은 index 0일 때의 heapify() 그림. parent_idx와 largest가 다른 경우가 2번 있으므로 swap 2번 진행

∴ 최종 결과 heapify()의 결과로 [48, 35, 42, 33, 7, 8, 14, 19, 10] max heap 완성

def heapify(arr, n, parent_idx):
    largest = parent_idx
    left = parent_idx*2 + 1
    right = parent_idx*2 + 2

    if left < n and arr[left] > arr[largest]:
        largest = left
    
    if right < n and arr[right] > arr[largest]:
        largest = right
    
    if largest != parent_idx: #if changed
        arr[largest], arr[parent_idx] = arr[parent_idx], arr[largest] #swap
        heapify(arr, n, largest) #if swapped, continue with the lower part

arr = [33, 35, 42, 10, 7, 8, 14, 19, 48]

for i in range((len(arr)-2)//2,-1,-1): #heapify
    heapify(arr, len(arr), i)

print(arr) #[48, 35, 42, 33, 7, 8, 14, 19, 10]

③ Time Complexity: heapfiy() O(N) (각 node별 heapify()를 여러 번 시행)

(1) 총 n개의 node가 있다고 가정. 바닥부터 h=0, 그 다음 층 h=1과 같은 방식으로 진행된다고 가정하면, h라는 층에 존재하는 최대 node 개수는 n/(2^(h+1))

(2) last non-leaf node부터 root-node까지(leaf node는 child node없으므로 heapify() 진행 필요 x) 각각 heapify()가 진행된다. 이 때, heapify()에서 swap을 위한 comparison 횟수는 최대 시작 node의 height라 할 수 있다. 따라서 height h에서의 시간복잡도는 O(h) x ⌈(n/2^(h+1))⌉

(O(h)는 시작 node의 height로 일정 상수 c*h로 표현 가능)

(3) 각 모든 층의 시간복잡도를 구하면, n개의 node에서 층의 개수는 logn이므로 아래와 같이 표현할 수 있다. logn을 ∞로 바꾸면 harmonic progression에 의해서 sigma 값은 2가 된다.

(4) c*n이므로 heapify()의 시간복잡도는 O(n)

③ deletion in a heap → heapsort

🎊이미 완성된 binary heap에서 한 원소를 delete한 다는 것은, 가장 상위의 root node를 빼 온다는 뜻이고, max-heap일 경우 원소의 최댓값, min-heap일 경우 원소의 최솟값을 가져오게 된다. root- node를 빼오고 남은 자리에는, bottom-most right-most element를 빼온다. (array구조상 index가 가장 큰, 맨 오른쪽의 원소). 이를 root node에 올려놓는다. bottom-most right-most element를 빼오는 이유는 array상 맨 오른쪽 원소 빼는 pop() 연산이 O(1) 효율적이기 때문.

🎊 ex) 아래의 max heap에서 root node이자 전체 원소의 최댓값 42를 빼고, bottom-most right-most element 10을 맨 위에 올려 놓는다. 이 때, 항상 max-heap 구조를 유지해야 하므로, 위 ② 과정에서 제시한 heapify() 과정대로 swap 진행.
(새로 바뀐 root-node와 child-node를 비교. swap이 진행되면, 이후 아래 부분까지 대소 관계가 max-heap 구조에 맞지 않으면 계속해서 swap 진행)

🎊 update된 max-heap을 array 형태로 표현하면 [35, 33, 14, 19, 7, 8, 10]이 된다. 이와 같이 계속해서 root-node deletion을 진행하면 제일 큰 수부터 차례대로 delete된다. 따라서, delete된 원소를 차례대로 모으면, sort된 결과가 나온다. 이 sorting을 heapsort(increasing-order)라고 부른다.

def max_heap(arr, n, parent_idx):
    largest = parent_idx
    left = parent_idx*2 + 1
    right = parent_idx*2 + 2

    if left < n and arr[left] > arr[largest]:
        largest = left
    
    if right < n and arr[right] > arr[largest]:
        largest = right
    
    if largest != parent_idx: #if changed
        arr[largest], arr[parent_idx] = arr[parent_idx], arr[largest] #swap
        max_heap(arr, n, largest) #if swapped, continue with the lower part

def heapsort(arr):
    
    n = len(arr)

    for i in range((n-1)//2,-1,-1): #(1) building a max heap O(N)
        max_heap(arr, n, i)

    for i in range(n-1,0,-1): #heapsort (until last single element remained)
        arr[i], arr[0] = arr[0], arr[i] #(2) swap with the last and the first(max) element O(1)
        max_heap(arr,i,0) #(3) after swapped, heapify (start with the first element) O(logN)

arr = [33, 35, 42, 10, 7, 8, 14, 19, 48]
heapsort(arr)
print(arr) #[7, 8, 10, 14, 19, 33, 35, 42, 48]

① 주어진 array를 max heap으로 만들기: heapify (O(N))

② max heap에서 root node 빼고 root node 자리에 bottom-most right-most element 넣기 (O(1))

③ bottom-most right-most element 자리 제외한 나머지 tree heapify (O(logN))

(*max_heap 두번째 인자 i가 node 최댓값 번호이므로 sub-tree 범위 제한 알 수 있음)

④ 위 ②와 ③과정 반복. 반복한 결과 heapsort로 오름차순 또는 내림차순 정렬 완성. 시간 복잡도는 O(N + N*(O(1) + O(logN))) = O(N + N + Nlog(N)) = O(2N + NlogN) = O(NlogN))

ex) 아래 예시에서 ① 과정으로 완성된 max heap에서 max_value를 bottom-most right-most element 자리와 swap. 그 결과 10을 root node에 넣는다(O(1)) 그 다음, root node 10부터 heapify 진행(O(logn)) (아래 오른쪽 그림). 그 결과 그 다음 max value 42를 19와 swap 진행. 남은 node에서 계속 같은 과정 반복.

④ insertion in a heap

🎊 이미 완성된 heap에서 맨 마지막에 새로운 원소를 insertion할 수 있다. 이 때, max-heap의 성질을 유지하기 위해 부모 노드 원소와 계속 비교하며 최대 root node 상위까지 swap 진행. (하단 그림 참조) / 맨 마지막에 insertion 하는 이유는, sub-tree max-heap property를 계속 만족시키기 위함. (하단 그림 51을 넣었을 경우 상위 node 까지 3번 swap)

🎊 코드로 구현 했을 때(index 0 시작 기준), child node의 index가 i라면, parent node의 index는 i//2이다. index로 parent와 child 비교하면서 상위까지 swap 진행.

def insert_key(arr,key):
    arr.append(key)
    size=len(arr)
    i=size

    while(i>=1 and arr[i/2] < arr[i]):
        arr[i],arr[i/2] = arr[i/2],arr[i] #swap
        i = (i/2)
    print(arr)

🎊 따라서 1 node insertion 과정은 O(logn) (swap)

⑤ extract max(or min) from heap

🎊 지금 max-heap을 다루고 있으므로, heap의 원소 중 최댓값은 가장 상위 노드에 있는 값이다. 따라서 맨 위의 노드를 빼고, 그 자리에 가장 끝에 있는 원소를 갖다 준다. (하단 그림 참조). 다만 max heap 성질을 계속 유지해야 하므로 가장 끝에 있는 원소가 root- node가 된 이후 heapify() 위 과정 동일하게 진행. 시간복잡도는 O(1) + O(logn)으로 O(logn)

def delete_max(arr):
    size=len(arr)
    if size < 1:
        print('Error')
    
    max_root = arr[0]
    arr[0] = arr[size]
    size=size-1

    A=max_heapify(arr[:size],1)
    return max_root

🎊Binary Heap Time Complexity

heap youtube

binary heap wiki

저작자표시 비영리 변경금지

'Computer Science > Data Structures' 카테고리의 다른 글

➡️ Linked List (0)	2024.09.26
🌉 Monotonic Stack (0)	2024.09.23
🥪Array & Linked List Intro (2)	2024.01.08
🌳 Binary Tree 🌳 (1)	2023.12.19
📚 Map (0)	2023.08.03