Wednesday, August 24, 2011

Identifying producer/consumer scenarios in the wild

I've been working on a data import process the past couple of days, trying to solve some memory issues (OOMEs). Essentially, we have a reader (the producer) and a writer (the consumer). The writer operates much more slowly than the reader. The reader is implemented as an iterator, so originally it only produced as much work as the writer asked for. As this design evolved over time, parallel execution of the writer was added in an effort to speed up the overall writing process. The parallel work is coordinated by an ExecutorService implementation. With this executor service in place, the iteration of the reader can run independently of the writer. Thus, the producer now creates lots of tasks and submits them to the executor service, where they queue up. The executor service's queue is not bounded, so it just keeps accepting tasks. This wouldn't be a problem if the number of tasks were small and their memory footprint low, but that is not our situation. Thus, we keep blowing out our Java VM process with OOMEs. We're in the process of fixing this issue by using a bounded concurrent collection to buffer items between the reader and the executor service, and ultimately the writer.
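
Here is a rough sketch of the bounded-buffer idea, not the actual import code. The names BoundedImport, run, capacity and workers are made up for illustration, and the sketch assumes the reader never yields null (ArrayBlockingQueue rejects nulls) and that the writer does not throw. The point is that put() blocks the reader whenever the buffer is full, so memory use is capped at roughly capacity items plus whatever the workers are holding.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper: names and structure are assumptions, not the real import code.
final class BoundedImport {
    // Sentinel telling a worker that the reader is exhausted.
    private static final Object POISON = new Object();

    /** Pushes every item from the reader through a bounded buffer; returns the number written. */
    static <T> int run(Iterator<T> reader, Consumer<T> writer, int capacity, int workers)
            throws InterruptedException {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(capacity);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger written = new AtomicInteger();

        // Consumers: each worker pulls from the buffer until it sees the sentinel.
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    for (Object item = buffer.take(); item != POISON; item = buffer.take()) {
                        @SuppressWarnings("unchecked")
                        T record = (T) item;
                        writer.accept(record);
                        written.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: put() blocks while the buffer is full, which throttles the reader.
        while (reader.hasNext()) {
            buffer.put(reader.next());
        }
        // One sentinel per worker so that every worker shuts down.
        for (int i = 0; i < workers; i++) {
            buffer.put(POISON);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return written.get();
    }
}
```

A call such as BoundedImport.run(rows.iterator(), row -> save(row), 100, 4), where rows and save are placeholders, would keep at most 100 rows buffered at any moment. A real version would also need to decide what happens when a writer fails, since a dead worker leaves the reader blocked on a full buffer.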



  1. It's interesting to hear how others are solving this problem.

    I'm actually working on something very similar-sounding at work that I'm just digging into (batch loading a bunch of products and rates into a database from a CSV file). We've got a reader that takes a CSV file, and I'm planning on having it store the results in a list in Redis. Redis has a built-in blocking pop operation that will pop the next item off of a list (or wait if nothing is on the list). It's guaranteed to give the item to only one worker, so it makes producer/consumer very easy.

    Longer term, we're looking at implementing RabbitMQ as a messaging infrastructure (and might run something like Celery on top of it) to solve some other problems, and we might migrate this to using that, but for now, I think Redis blocking lists will work for us.

  2. Ted, thanks for the comment. I really dig how you're using Redis to solve a design problem in your space, and it's interesting that both our solutions incorporate similar design structures to keep the consumers well fed. It will be interesting to see how RabbitMQ figures into your solution at work. Perhaps a presentation at some point?