Algorithms

June 05, 2023

An AVL tree (Adelson-Velsky and Landis tree) is a self-balancing binary search tree in computer science.

The term "self-balancing" means that it automatically maintains its height-balanced property where the difference in heights between
the left and the right subtrees for every node is never more than one.

The advantage of an AVL tree comes into play when you're frequently performing lookups, as this data structure guarantees **O(log n)** lookup times.

The balance factor in an AVL tree is a pivotal element that maintains the tree's height-balanced property.

It is calculated by subtracting the height of the right subtree from the height of the left subtree of any given node.

Therefore, for every node in the tree, the balance factor can only be `-1`

, `0`

, or `1`

.

A balance factor of `0`

signifies a perfectly balanced node, with both left and right subtrees having equal heights.

`-1`

or `1`

indicates that one side is heavier, i.e., it has more nodes than the other side, but it is within the acceptable limit.

If the balance factor of any node goes beyond this range, then re-balancing procedures such as rotations are required to restore the AVL property.

Insertions in an AVL tree follow a similar path as in a simple binary search tree (BST), where the new node is placed at the appropriate location based on its value.

However, after every insertion, the tree needs to check whether the AVL property (balance factor of `-1`

, `0`

, or `1`

for every node) still holds.

If the insertion of a new node causes the tree to become unbalanced, then we must perform rotations to bring the tree back into balance.

The type of rotation used depends on whether the imbalance occurred on the left or the right, and whether it was caused by an insertion into the left or right subtree.

Rotations are the primary tool for maintaining balance in an AVL tree.

There are four types of rotations: left rotation, right rotation, left-right rotation, and right-left rotation.

Left and right rotations are used when the balance factor of a node becomes `2`

or `-2`

, indicating that the tree is heavily skewed to one side.

For instance, if a node has a balance factor of `2`

, it means the left subtree is deeper than the right, and a right rotation is needed. On the other hand,
left-right and right-left rotations are used when the tree is skewed but the subtree causing the imbalance is skewed in the opposite direction,
also known as a double rotation.

Deletions in an AVL tree also follow the pattern of deletions in a binary search tree, where you would remove the node and then replace it with its in-order predecessor
or successor to maintain the binary search property. However, similar to insertions, the tree must verify that the AVL property still holds after the deletion.

If not, the tree must perform rotations to restore balance.

This could potentially involve multiple rotations, as a deletion could cause an imbalance at a node higher up in the tree.

Searching in an AVL tree is carried out just like in a binary search tree.

Starting at the root, you compare the value you're searching for with the value of the current node.

If they are equal, you've found the value. If the value you're searching for is less than the current node's value, you move to the left child,
and if it's greater, you move to the right child. You then repeat this process until you've found the value or have reached a null child.

The advantage of searching in an AVL tree over a normal, unbalanced binary search tree is that you are guaranteed that this process will take at most O(log n) steps, where n is the number of nodes in the tree.

This makes searching very efficient, even for large datasets.

Recursion is a powerful tool when dealing with data structures like AVL trees, among other binary trees. This is primarily because these structures have a recursive nature: a tree is a node that links to other smaller trees. As such, operations on the tree often involve performing the same operation on each of its subtrees.

One of the biggest advantages of using recursion is that it tends to simplify code, making it easier to understand. It naturally follows the structure of the tree in a top-down manner and applies operations to the entire structure in a way that is easy to follow. This is especially true when considering tree traversals, such as in-order, pre-order, or post-order traversals, which can be implemented in a very straightforward way with recursion.

Additionally, when we look at more complex operations, such as node insertion or deletion, the utility of recursion becomes even more apparent. In an AVL tree, whenever we insert or delete nodes, it's necessary to check whether the tree remains balanced. If it isn't, we need to rebalance the tree by performing rotations. Starting from the node we've just handled, we can move up the tree, adjusting balance factors and performing rotations as necessary. This type of up-the-tree propagation aligns perfectly with the nature of recursive function calls.

Also, the calculation of the height of a tree or subtree for determining the balancing factor is a task that suits recursion. We can start at a leaf and move up the tree, incrementing the height count by one at each level.

However, it's worth noting that while recursion offers these advantages, it's not without its potential issues. For instance, if you're dealing with a deeply nested tree, recursion could lead to a stack overflow error due to the high number of recursive calls. Each call adds a layer to the system call stack, and this could exhaust the stack space for very deep trees. In such scenarios, an iterative solution might be more suitable, despite potentially being more complex to write and understand. Despite this caveat, recursion remains a popular and powerful approach to handle AVL trees and other similar data structures.

While recursion is intuitive when dealing with tree-based data structures like AVL trees, iteration can often be more performant and efficient, particularly in certain scenarios.

First, it's important to understand that recursion comes with a cost. Each recursive call adds a new layer to the system's call stack. This includes the overhead of function calls, such as storing the return address, local variables, and parameters, which consumes memory. In scenarios where trees are deeply nested, the high number of recursive calls could potentially exhaust the available stack space, leading to a stack overflow error.

Iterative methods, on the other hand, are generally more space-efficient because they do not suffer from the overhead of repeated function calls. Iteration utilizes a loop construct to repeatedly perform operations, which can result in less memory usage compared to a recursive solution. This can be particularly advantageous in scenarios with large datasets or deeply nested trees.

Furthermore, most modern programming languages are optimized for iterative loops, which can make them faster to execute in many cases.

However, it's crucial to note that the transition from recursion to iteration isn't always straightforward, especially in complex tree operations. The control flow that is implicitly handled by the system's call stack in a recursive approach has to be explicitly managed in an iterative approach. This often involves the use of auxiliary data structures like stacks or queues, which can make the code more complex and harder to understand.

In summary, while recursion can make AVL tree operations more intuitive and straightforward, iteration can often provide superior performance and efficiency, particularly with large or deeply nested trees. The decision to use recursion or iteration should therefore take into consideration not only the nature of the problem and the specific requirements of the task at hand, but also the trade-offs in terms of performance, memory usage, and code complexity.

There are several alternatives to AVL trees, each with its own advantages and trade-offs.

Alternatives include:

**Red-Black Tree**: Red-Black Trees are another form of self-balancing binary search trees. They are simpler to implement compared to AVL trees and provide relatively balanced trees (although not as strictly balanced as AVL trees). Each node has an extra bit for denoting the color of the node, either red or black. These colors are used to ensure the tree remains approximately balanced during insertions and deletions.**B-Trees**: B-Trees are well-suited for systems with large amounts of data and particularly for systems that read and write from disk, such as databases or file systems. A B-Tree of order m can have at most m-1 keys and m children. B-Trees reduce the number of traversal steps in finding or inserting entries because each node can hold more than one key.**Splay Trees**: Splay trees are binary search trees that automatically move frequently accessed elements nearer to the root to speed up subsequent accesses to those elements, a process known as splaying. Splay trees don't need to maintain a strict balance and can be efficient in scenarios where a few items are accessed far more often than others.**2-3 Trees**: 2-3 Trees are another form of balanced search trees where every node can have either one or two keys and one, two, or three children, respectively. 2-3 trees ensure that all leaf nodes are at the same level, thereby maintaining balance.**Tries: Tries**(also known as prefix trees) are a tree-like data structure that prove to be very efficient for operations like word search, prefix matching, and so on. Each node in the trie could have as many children as the number of characters of the alphabet.

Orama uses AVL trees to store and perform search operations on numeric data.

With version 1.0.1, Orama switched from a recursive implementation to an iterative implementation of AVL trees.

This change was made to improve performance and reduce memory usage, which was likely to be a bottleneck for large datasets.

You can find the complete implementation of the AVL tree in the Orama repository.

In conclusion, AVL trees, with their characteristic of self-balancing, offer a reliable and efficient means for implementing binary search trees. They ensure optimal performance during lookups, keeping operations within the logarithmic time complexity, even with a large dataset. We delved into the fundamentals of AVL trees, including their balance factor, node insertion, deletion, and rotation procedures, alongside search functionality.