Abstract
We investigate the problem of finding the frequent items in a continuous data stream. We present an algorithm called λ-Count for computing frequency counts that exceed a user-specified threshold on a data stream. To emphasize the importance of more recent data items, a fading factor λ is used. Our algorithm can detect ε-approximate frequent items of a data stream using O(log_λ ε) memory space and O(1) time to process each data record. The computation time for answering each query is O(log_λ ε), and a query about whether a given data item is frequent is answered in O(1) time. An experimental study shows that λ-Count outperforms other methods in terms of accuracy, memory requirement, and processing speed.
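The role of a fading factor can be illustrated with a minimal sketch. This is not the λ-Count algorithm itself, whose data structure the abstract does not describe; it is a plain exponentially faded counter, assuming each new record multiplies all earlier weights by λ. The class name `FadedCounter`, its methods, and the pruning rule are hypothetical choices made for illustration only.

```python
from typing import Dict, Hashable, List, Tuple


class FadedCounter:
    """Exponentially faded frequency counts (illustrative, not the paper's lambda-Count)."""

    def __init__(self, fading: float, epsilon: float) -> None:
        if not 0.0 < fading < 1.0:
            raise ValueError("fading factor must lie in (0, 1)")
        self.fading = fading      # lambda: weight multiplier applied per record
        self.epsilon = epsilon    # assumed pruning tolerance, relative to total weight
        self.now = 0              # number of records seen so far
        self.total = 0.0          # faded weight of the whole stream
        # item -> (faded count at last update, time of last update)
        self.entries: Dict[Hashable, Tuple[float, int]] = {}

    def estimate(self, item: Hashable) -> float:
        """Faded count of `item` as of the current time (0.0 if untracked)."""
        entry = self.entries.get(item)
        if entry is None:
            return 0.0
        count, seen = entry
        # Apply lazily the fading accumulated since the last update.
        return count * self.fading ** (self.now - seen)

    def add(self, item: Hashable) -> None:
        """Process one record; only the arriving item's entry is touched."""
        self.now += 1
        self.total = self.total * self.fading + 1.0
        self.entries[item] = (self.estimate(item) + 1.0, self.now)

    def prune(self) -> None:
        """Drop items whose faded count fell below epsilon * total weight."""
        floor = self.epsilon * self.total
        self.entries = {
            item: entry for item, entry in self.entries.items()
            if self.estimate(item) >= floor
        }

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose faded count is at least `support` * total weight."""
        threshold = support * self.total
        return [item for item in self.entries if self.estimate(item) >= threshold]


counter = FadedCounter(fading=0.5, epsilon=0.01)
for record in ["a", "a", "b"]:
    counter.add(record)
# "a" occurred twice but earlier, so its faded count (0.75) is below that of "b" (1.0).
print(counter.frequent(support=0.5))
```

With λ = 0.5 the older occurrences of `a` fade quickly, so only the most recent item passes the 50% support threshold. Values of λ closer to 1 make the counts behave more like ordinary frequency counts.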
| Original language | English |
|---|---|
| Pages (from-to) | 1545-1554 |
| Number of pages | 10 |
| Journal | Journal of Computers (Finland) |
| Volume | 7 |
| Issue number | 7 |
| State | Published - 2012 |
Keywords
- Data mining
- Data stream
- Frequent items