Blog is available in backup mode, go to github-pages for newer posts

How to write composable iterables

This article implies that you have basic knowledge of what are generators. If you have little idea of what they are, there are a lot of articles to get basic knowledge. For intro one can start from MDN Guide on Iterators and Generators. Dr. Axel Rauschmayer has very deep explanation of generators. There are a lot of other good articles as well, with varying level of depth, but almost all of them are focused on first step of the topic: direct usage. Also generators can be used for async programming, which can be summarized in co library intro, but that's another story.

So if we're not using cool features of generators, then what's the buzz? Why not just taking well-known arrays, which has a lot of built-in methods and even more ready-to-use features in libraries like lodash? The key feature of both iterators and generators I'm going to focus on is their ability to be lazy, which gives us 2 benefits: - no memory consumption for intermediate results - ability to work with theoretically infinite iterables

While the idea isn't new at all even in JavaScript: there were attempts to implement possibly infinite streams before, they weren't backed by standard language primitives.

I'd also like to state some obvious thing about RxJS. Just in case. Maybe, you've heard that there's RxJS library and they also have unbound streams. So the statement is that these streams are push-based, while iterables are pull-based. With RxJS streams consumer sits and waits when producer will send them next event. With iterable next item can be pulled whenever consumer wants. If you'd like to learn more about it, there's a great article: A General Theory of Reactivity by Kris Kowal. I'll reference it later as "gtor".

So we have neither standard functions nor well-established library for operations. Actually functify looks pretty nice as well as trine, but the latter uses non-standard syntax.

I'd like to go further and discuss how can we compose generators. Because any things in programming are nothing but nice toys if they can't be composed. For example, callbacks are hard to compose — that's why Promise was introduced into ecmascript. Nonetheless as a basic recap (especially if you digressed for reading other sources). Generators

So, basics and composition.

The most basic generator is a range, which exists in e.g. Python, Ruby, Haskell and probably bunch of other languages. Let's write range implementation (or copy it from gtor ;-)

function* range(start = 0, stop = Infinity, step = 1) {
  while (start <= stop) {
    yield start;
    start += step;
  }
}

The problem is you cannot iterate over the same generator twice:

var r = range(1, 3)
undefined
for (let x of r) console.log(x)
1
2
3
undefined
for (let x of r) console.log(x)
undefined

What's worse, if you pass single range instance to 2 different consumers, they will interfere with each other.

You might want to write an iterator, which doesn't have* such problem but that might be much more code with deeper nesting even using shortcuts:

function range(start = 0, stop = Infinity, step = 1) {
  var current;
  return {
    [Symbol.iterator]() {
      current = start - step;
      return this;
    },
    next () {
      current += step;
      return current <= stop ? { value: current } : { done: true };
    }
  };
}

So are generators inherently worse? Actually, no. Generators actually have two types/states: generator function and generator object. Function is a function and object is an iterator with next method. Initially you declare a generator function, but calling it return generator object, which cannot be iterated over again.

Iterators are actually very similar in structure: there is a method under Symbol.iterator and there's an object with next method, returned from that function. What I'm going at is that it's hard to make such a mistake with iterators. In order to misuse, developer needs to write expression: someIterable[Symbol.iterator](), which is impossible to write by accident.

So probably better way to write range using generator would be:

function range(start = 0, stop = Infinity, step = 1) {
  return function* () {
    var current = start;
    while (current <= stop) {
      yield current;
      current += step;
    }
  };
}

but then if we want to use construction in-line, we'll need to write clumsy for (let x of range(1, 3)()). Both ways cause problems, so are we stuck with iterators? Generator objects can't be replayed and it's really to mix up generator function and generator object unless we're using TypeScript or something similar.

We can always pass non-activated generators and init them only upon actual usage as with utility generators like range, we can use bind, which works perfectly well on them.

Nonetheless, it might be interesting to have a function, which can replay non-replayable generator objects and it turns out it is very similar to traditional memoize wrapper, which is useful when there become expensive computation or what are not.

function memoize (gen) {
  var cache = [];
  var it = gen();
  return function* () {
    for (var i = 0; true; i++) {
      if (i < cache.length) {
        yield cache[i];
      } else {
        let { value, done } = it.next();
        if (done) return value;
        cache.push(value);
        yield value;
      }
    }
  }
}

function filter (fn, gen) {
  return function* () {
    for (let item of gen())
      if (fn(item)) yield item;
  }
}

How effective can be memoize? Well, let's use it to implement a generator of prime numbers.

function isPrime(n) {
  for (let p of primes()) {
    if (p*p > n) return true;
    if (n % p === 0) return false;
  }
}
var primes = memoize(function* () {
  yield 2;
  yield* filter(isPrime, range.bind(null, 3, 2));
});

function isPrime(n) {
  for (let c = 2; c*c <= n; c++) {
    if (n % c === 0) return false;
  }
  return true;
}
function* primes () {
  yield 2;
  yield* filter(isPrime, range.bind(null, 3, Infinity, 2));
};

limit	dumb	memo - 0	memo - 1
1000	76± 8	2.6±0.6	0.11±0.03
2000	119±11	5.9±1.3	0.44±0.05
3000	169±23	10.6±2.2	0.71±0.08
4000	184±18	16.3±3.6	0.96±0.09
5000	218±21	20.8±4.3	1.23±0.10
6000	227±10	24.7±4.9	1.45±0.13
7000	250±13	29.4±5.0	1.67±0.15
8000	283±24	35.7±4.6	1.88±0.18
9000	313±25	42.8±2.7	2.07±0.19
10000	303± 7	48.6±5.2	2.28±0.21