Trie
A Trie, short for retrieval, is a specialized tree-based data structure primarily used for efficient storing, searching, and retrieval of strings over a given alphabet. It excels in scenarios where a large collection of strings needs to be managed and pattern-matching operations need to be performed with optimal efficiency.
A Trie, often referred to as a prefix tree, is constructed to represent a set of strings where each node in the tree corresponds to a single character of a string. The path from the root node to a particular node represents the characters of a specific string. This structural characteristic allows Tries to effectively share common prefixes among strings, leading to efficient storage and retrieval.
Why You Need a Trie Data Structure?
Tries are commonly employed in applications such as spell checking, autocomplete suggestions, and searching within dictionaries or databases. They excel at these tasks because they minimize the search complexity in proportion to the length of the target string, making them significantly more efficient than other data structures like binary search trees.
Representation of Trie Node
The Trie node has an array or list of children nodes, typically of size 26 to represent the English lowercase alphabets (a-z). It can also be implemented with a Hash Map. Additionally, there's a boolean flag isEndOfWord to indicate whether the current node marks the end of a word in the Trie. This is needed to differentiate between words like app and apple, where if apple is inserted then search(app) should return False.
Problems
https://leetcode.com/problems/implement-trie-prefix-tree/
class TrieNode:
def __init__(self):
self.children = {}
self.isEnd = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
currentRoot = self.root
for letter in word:
if letter not in currentRoot.children:
currentRoot.children[letter] = TrieNode()
currentRoot = currentRoot.children[letter]
currentRoot.isEnd = True
def search(self, word: str) -> bool:
currentRoot = self.root
for letter in word:
if letter not in currentRoot.children:
return False
currentRoot = currentRoot.children[letter]
return currentRoot.isEnd
def startsWith(self, prefix: str) -> bool:
currentRoot = self.root
for letter in prefix:
if letter not in currentRoot.children:
return False
currentRoot = currentRoot.children[letter]
return True
Below is the clean way to think about time + space for your exact implementation (TrieNode has a hash map/dict children).
I’ll use:
- L = length of the input string (word or prefix)
- N = number of words inserted
- S = total number of characters inserted across all words =
sum(len(word_i)) - Σ = alphabet size (e.g., 26 lowercase letters) — relevant mostly if you use arrays instead of dicts
Important clarification about startsWith()
startsWith(prefix) must read the prefix to know whether the trie contains that path. So it cannot be better than O(L) in the normal model (you have to inspect L characters). It’s also not O(n) in the number of stored words; it doesn’t scan the trie/word list.
So:
startsWith()is O(L), not O(N).
Method-by-method complexity (your dict-based trie)
1) insert(word)
What it does: Walks L characters, creates nodes only when missing.
Time
- Best:
O(L)
Example: inserting"app"into an empty trie still touchesa,p,p. - Average:
O(L)(assuming average O(1) dict ops) - Worst:
O(L)algorithmically, but with hashing caveat: if dict operations degrade, it can be worse.
Practical interviews typically treat dict access as O(1) average, so worst is still reported asO(L).
Space
- Best (additional space for this insert):
O(1)
Example: trie already contains"apple"and you insert"app"→ you create no new nodes, just setisEnd=True. - Worst (additional space):
O(L)
Example: insert"zoo"when no path exists → you create L new nodes. - Over all inserts (total space of the trie):
O(S)nodes
Because each new character-edge creates a node once.
Tiny example
Insert: ["car", "cat"]
- Nodes:
c -> a -> randc -> a -> tsharec,a - Total nodes created ≈ number of unique prefix characters → ≤ total characters inserted.
2) search(word)
What it does: Walks L characters; fails early if missing; finally checks isEnd.
Time
- Best:
O(1)
Example: search"z"when root has no'z'child → fails immediately. - Average:
O(L) - Worst:
O(L)
Example: search"apple"where all letters exist (either it’s a word, or it’s a prefix but not end).
Space
O(1)auxiliary space (you only keep a pointercurrentRoot)
Example
Trie has "apple".
search("app")traverses 3 chars then returnsFalseifisEndisn’t set at that node.search("apple")traverses 5 chars then returnsTrue.
3) startsWith(prefix)
What it does: Exactly like search, except it doesn’t require isEnd=True at the end.
Time
- Best:
O(1)
Example: prefix starts with a letter not in root → fails immediately. - Average:
O(L) - Worst:
O(L)
Example: prefix exists fully (like"app"in trie containing"apple")
Space
O(1)auxiliary space
Example
Trie has "apple".
startsWith("app")→ True (path exists)startsWith("apz")→ False (fails atz)
Summary table (your implementation)
| Method | Best Time | Avg Time | Worst Time | Aux Space |
|---|---|---|---|---|
insert(word) |
O(L) | O(L) | O(L) | O(1) best / O(L) worst (per insert) |
search(word) |
O(1) | O(L) | O(L) | O(1) |
startsWith(prefix) |
O(1) | O(L) | O(L) | O(1) |
And total trie space after building: O(S) nodes (plus hash maps stored at each node).
Dict-children vs array-children (why Σ sometimes appears)
If instead of children = {} you used a fixed array of size Σ at each node:
- Time still O(L) (index lookup is O(1))
- But space becomes O(S · Σ) in the naive form (every node stores Σ slots), which is huge if Σ is large.
Your dict approach is usually described as: - Space O(S) nodes, but with hash-map overhead.
Fill-in for your note template
- Time:
O(L)→ Reason: each operation walks character-by-character down the trie; dict lookup per character is O(1) average. - Space:
O(S)→ Reason: across all inserts, you create at most one node per new character position across all stored words (shared prefixes reuse nodes). Per single insert, extra space is betweenO(1)andO(L).
If you want, paste the exact prompt you’re answering (LeetCode style vs notes style) and I’ll tailor the wording to match what interviewers expect (some want “worst-case dict lookup caveat,” most don’t).