Problems with Mongoid and Sidekiq - Brainstorming

A few weeks back, we started slowly upgrading all of our queues at Gogobot to work with Sidekiq.

Posts on how awesome the experience was and how much better Sidekiq is than Resque are coming soon, but with all the good came some bad.

Summary of the situation

With Sidekiq, we are processing around 25x more jobs than we did with Resque: around 15,000,000 jobs per day, at rates of over 1,000 jobs per second at times (and well past that at peak).

This is how many jobs we processed today…

Sidekiq history graph for today

And this is a snapshot of our realtime graph

Realtime graph snapshot

On the MongoDB side we are working with Mongoid, and we have a sharded environment: 9 shards with 3 replicas in each shard, all running through 2 routers (mongos).

Our production Mongoid config looks like this:

production:
  op_timeout: 3
  connection_timeout: 3
  sessions:
    default:
      hosts:
        - HOST_NAME:27017 # Single router R0
      username: USER_NAME
      password: PASSWORD
      database: DATABASE_NAME
      options:
        consistency: :eventual
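
For context, nothing exotic happens at boot; the file above is loaded the standard Mongoid way (a minimal sketch, and the path is an assumption):

# Boot-time setup (path is an assumption)
require "mongoid"

# Parses the YAML above and sets up the default session lazily;
# the actual connection to the router is only opened on first use.
Mongoid.load!("config/mongoid.yml", :production)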

We are using the latest versions of all the relevant gems (Sidekiq, Mongoid, Moped, Redis).

All seems fine, right? So what's the problem?

The problem is that we have too many connections opening and closing against our Mongo instances (roughly 25-40 new connections per second).

Each time a job is picked up, a connection to Mongo is opened, and when the job is done, that connection is closed (via the Kiqstand middleware).
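
For reference, Kiqstand hooks into Sidekiq as a server middleware and disconnects the Mongoid sessions once each job finishes; the wiring looks roughly like this (a minimal sketch based on Kiqstand's documented usage):

require "sidekiq"
require "kiqstand"

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    # Kiqstand::Middleware disconnects all Mongoid sessions after every job,
    # which is where the open/close churn against the router comes from.
    chain.add Kiqstand::Middleware
  end
end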

This is putting a huge load on our router server, and at times it causes Mongo to run out of file descriptors.
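
If you want to watch the churn, one crude way is to poll serverStatus through the same session and watch the connection counters (a sketch, assuming Mongoid 3/4 on Moped; the exact fields depend on your MongoDB version):

# Hypothetical connection watcher, not production code
require "mongoid"
Mongoid.load!("config/mongoid.yml", :production)

loop do
  status = Mongoid.default_session.command(serverStatus: 1)
  conns  = status["connections"]
  puts "current: #{conns['current']}, available: #{conns['available']}"
  sleep 1
end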

SO?

More than anything, this post is a callout for discussion with anyone who is running a similar setup at a similar scale and can assist; I know I would love to brainstorm on how to solve this problem.