Heaps' Law
Law
Given the number of unique words as $V$, and the number of tokens as $N$ in a corpus,

$$V = K N^{\beta}$$

where $K$ and $\beta$ are constants that depend on the corpus.
Explanation
When the corpus is in English, the constant $K$ is typically between 10 and 100, and the exponent $\beta$ is about 0.4 to 0.6. Heaps' law is not derived from a mathematical foundation but obtained empirically.
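Because the law is empirical, it is checked by counting. The following is a minimal sketch of how the two quantities could be measured, assuming lowercase normalization and whitespace tokenization (both simplifications chosen for illustration; a real corpus would need a proper tokenizer). The function name `vocabulary_growth` and the sample sentence are hypothetical.

```python
def vocabulary_growth(tokens):
    """Return (tokens_seen, unique_words_seen) after each token in the stream."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)                 # a set keeps only distinct words
        curve.append((n, len(seen)))    # tokens so far, unique words so far
    return curve


# Toy input; far too small to show the law, but enough to show the counting.
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.lower().split()           # assumption: whitespace tokenization
curve = vocabulary_growth(tokens)
print(curve[-1])                        # (total tokens, total unique words)
```

Plotting the resulting pairs for a large corpus gives the vocabulary growth curve that Heaps' law describes.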
The formula may seem complex at first glance, but taking the logarithm of both sides gives $\log V = \log K + \beta \log N$, which is a linear relationship between $\log V$ and $\log N$. $\log K$ represents the intercept, and $\beta$ represents the slope.
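The linear form suggests a simple way to estimate the two constants: ordinary least squares on the log-transformed counts. The sketch below assumes the notation $V$ (unique words), $N$ (tokens), $K$ and $\beta$ (constants), and uses synthetic data generated from made-up values $K = 30$, $\beta = 0.5$ rather than measurements from a real corpus. The helper name `fit_heaps` is hypothetical.

```python
import math


def fit_heaps(ns, vs):
    """Fit log V = log K + beta * log N by least squares; return (K, beta)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope of the fitted line
    log_k = mean_y - beta * mean_x      # intercept of the fitted line
    return math.exp(log_k), beta


# Synthetic points lying exactly on V = 30 * N^0.5 (illustrative values only).
ns = [1_000, 10_000, 100_000, 1_000_000]
vs = [30 * n ** 0.5 for n in ns]
k, beta = fit_heaps(ns, vs)
print(f"K = {k:.2f}, beta = {beta:.3f}")
```

On noise-free synthetic input the fit recovers the generating constants up to floating-point error; on real counts the points scatter around the line, and the fitted values depend on the corpus and on how tokens are defined.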