Helsinki University of Technology



Sama Suomeksi


Front Page

Overview

Schedule

Lectures

Guides

Topics

Teachers


T-106.290 Ohjelmoinnin laboratoriotyöt

Generating Zipf distribution

The Zipf distribution is a good model for many naturally occurring phenomena, such as word frequencies, web site popularity, city or company sizes, sizes of earthquakes, distribution of income, etc.

Expressed mathematically, the probability of the nth most frequent event is proportional to 1/na. Here a is the skewness. If it is zero, the distribution degenerates to a uniform distribution. If a is one, we have the classic Zipfian. Larger values result in greater skew.

Here is one implementation for generating Zipf and some other distributions. Gray et al. give a faster approximate method, but it still involves two calls to computing the Rhieman Zeta function.

The above methods are exact or accurate, but since Zipf's rule is an approximation already by nature, this might not always be needed. Instead you may wish to trade correctness for efficiency with the following code:

    #include <stdio.h>
    #include <malloc.h>
    #include <stdlib.h>
    #include <math.h>


    typedef struct {
      /* Value range is all integers numbers from 1 to N, bounds included. */
      int N;
      /* Skewedness of the distribution.  Must be zero for uniform
         distributions, or positive for increasing skewedness. */
      double a;
      /* Values computed once for all random numbers for the given `N' and `a'. */
      double c1, c2;
    } zipf_t;


    /* Allocate a `zipf_t' and precompute values into it. */
    zipf_t *zipf_init(int N, double a)
    {
      zipf_t *z;
    
      z = malloc(sizeof(zipf_t));
      if (z == NULL) {
        fprintf(stderr, "zipf_init: malloc failed.\n");
        exit(1);
      }
      z->N = N;

      if (fabs(a) < 0.0001)
        z->a = 0;
      else if (fabs(a - 1) < 0.0001) {
        z->a = 1;
        z->c1 = log(N + 1);
      } else {
        z->a = a;
        /* Below pow(x,y) has been rewritten to exp(y*log(x)) in order to
           allow non-integer y. */
        z->c1 = exp((1-a) * log(N+1)) - 1;
        z->c2 = 1 / (1-a);
      }

      return z;
    }


    /* Given the struct the `zipf_init' returns, `zipf' produces
       approximately Zipfian-distributed pseudo-random numbers in the
       range from 1 to N, bounds included. */
    int zipf(zipf_t *z)
    {
      int r;
      double x;

      if (z->a == 0)
        return 1 + (random() % z->N);
      do {
        x = random() / (double) RAND_MAX;
        if (z->a == 1)
          x = exp(x * z->c1);
        else
          x = exp(z->c2 * log(x * z->c1 + 1));
        r = (int) x;
      } while (r <= 1 || r > z->N);	/* Hedging. */
      return r;
    }


Course email: cessu@cs.hut.fi
Kurssin newsgroup: opinnot.tik.labratyot
This page has been last updated on 2005-01-11.