New York, NY

About MongoDB Engineering

MongoDB (from "humongous") is an open-source document database, and the leading NoSQL database. MongoDB grew out of real needs facing developers building the next generation of applications. In addition to helping our users build fantastic apps, we love participating in the broader open source community. We work on community open source projects as part of our day-to-day activities. We all benefit from the interaction with and feedback from the community.

Valeri Karpov on

Solving the 4 Problem Categories of Web Development with the MEAN Stack

If you're building a new web application, the problems that you run into generally fit into one of 4 categories:

1) Prototyping: I want to build something quickly.

2) Adapting: I want to be able to easily iterate on my code base. 

3) Testing: I want to make sure that the app works. 

4) Scaling: I want to utilize server resources efficiently.

In this presentation, Valeri Karpov from MongoDB provides a high-level overview of how the MEAN stack, and in particular Node.js and npm, solves each of these pain points. This talk was recorded at the Node.js meetup at Pivotal Labs.


André Spiegel on

Tracking Twitter Followers with MongoDB by André Spiegel

(Contributor article by André Spiegel, Consulting Engineer at MongoDB. Originally appeared on the MongoDB blog.)

As a recently hired engineer at MongoDB, part of my ramping-up training is to create a number of small projects with our software to get a feel for how it works, how it performs, and how to get the most out of it. I decided to try it on Twitter. It’s the age-old question that plagues every Twitter user: who just unfollowed me? Surprising or not, Twitter won’t tell you that. You can see who’s currently following you, and you get notified when somebody new shows up. But when your follower count drops, it takes some investigation to figure out who you just lost.

I’m aware there are a number of services that will answer that question for you. Still, I wanted to try this myself.

The Idea and Execution

The basic idea is simple: You have to make calls to Twitter’s REST API to retrieve, periodically, the follower lists of the accounts you want to monitor. Find changes in these lists to figure out who started or stopped following the user in question. There are two challenging parts:

  1. When you talk to Twitter, talk slowly, lest you hit the rate limit.

  2. This can get big. Accounts can have millions of followers. If the service is nicely done, millions of users might want to use it.

The second requirement makes this a nice fit for MongoDB.
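
The "find changes in these lists" step can be sketched as a set difference between two snapshots of follower ids. This is a minimal illustration in plain JavaScript, not the code from followt (which is written in Java); the function name `diffFollowers` and the sample ids are assumptions made for this example.

```javascript
// Hypothetical helper: compare two snapshots of an account's follower ids.
// Ids are kept as strings so that large numeric ids do not lose precision
// when handled as JavaScript numbers.
function diffFollowers(previousIds, currentIds) {
  const previous = new Set(previousIds);
  const current = new Set(currentIds);
  return {
    // In the current snapshot but not the previous one: new followers.
    followed: currentIds.filter((id) => !previous.has(id)),
    // In the previous snapshot but not the current one: unfollowers.
    unfollowed: previousIds.filter((id) => !current.has(id)),
  };
}

// Made-up sample data: "101" left, "104" arrived.
const changes = diffFollowers(["101", "102", "103"], ["102", "103", "104"]);
console.log(changes.followed);   // [ '104' ]
console.log(changes.unfollowed); // [ '101' ]
```

Using sets keeps each membership check constant-time, which matters once the snapshots hold millions of ids.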

The program, which I called “followt” and wrote in Java, can be found on github. For this article, let me just summarize the overall structure:

  • The scribe library proved to be a great way to handle Twitter’s OAuth authentication mechanism.

  • Using GET followers/ids, we can retrieve the numeric ids of 5,000 followers of a given account per minute. For large accounts, we need to retrieve the full list in batches, potentially thousands of batches in a row.

  • The numeric ids are fine for determining whether an account started or stopped following another. But if we want to display the actual user names, we need to translate those ids to screen names, using GET users/lookup. We can make 180 of these calls per 15 minute window, and up to 100 numeric ids can be translated in each call. In order to make good use of the 180 calls we’re allowed, we have to make sure not to waste them for individual user ids, but to batch as many requests into each of these as we can. The class net.followt.UserDB in the application implements this mechanism, using a BlockingQueue for user ids.
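
The batching idea in the last bullet can be sketched as follows. This is an illustration in plain JavaScript under the limits quoted above (100 ids per call, 180 calls per 15-minute window); it is not the BlockingQueue-based mechanism in net.followt.UserDB, and the names `toBatches` and `windowsNeeded` are hypothetical.

```javascript
const LOOKUP_BATCH_SIZE = 100;  // ids per users/lookup call (from the text)
const LOOKUPS_PER_WINDOW = 180; // calls per 15-minute window (from the text)

// Split a list of ids into batches so that each call translates as many
// ids as the API allows.
function toBatches(ids, size = LOOKUP_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Number of 15-minute windows needed to translate idCount ids when every
// call is filled to capacity.
function windowsNeeded(idCount) {
  const calls = Math.ceil(idCount / LOOKUP_BATCH_SIZE);
  return Math.ceil(calls / LOOKUPS_PER_WINDOW);
}

// Made-up sample: 250 ids become two full batches and one partial batch.
const sampleIds = Array.from({ length: 250 }, (_, i) => String(i + 1));
console.log(toBatches(sampleIds).map((batch) => batch.length)); // [ 100, 100, 50 ]
console.log(windowsNeeded(18000)); // 1
console.log(windowsNeeded(18001)); // 2
```

With full batches, one window covers up to 180 × 100 = 18,000 ids; spending a call on a single id would cut that to 180.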

Sandeep Parikh on

Schema Design for Time Series Data in MongoDB by Sandeep Parikh

(Contributor article by Sandeep Parikh, Solutions Architect at MongoDB, and Kelly Stirman, Director of Product Marketing at MongoDB. Originally appeared on the MongoDB blog.)

Data as Ticker Tape

New York is famous for a lot of things, including ticker tape parades.

For decades the most popular way to track the price of stocks on Wall Street was through ticker tape, the earliest digital communication medium. Stocks and their values were transmitted via telegraph to a small device called a “ticker” that printed onto a thin roll of paper called “ticker tape.” While out of use for over 50 years, the idea of the ticker lives on in scrolling electronic tickers at brokerage walls and at the bottom of most news networks, sometimes two, three and four levels deep.

Today there are many sources of data that, like ticker tape, represent observations ordered over time. For example:

  • Financial markets generate prices (we still call them “stock ticks”).

  • Sensors measure temperature, barometric pressure, humidity and other environmental variables.

  • Industrial fleets such as ships, aircraft and trucks produce location, velocity, and operational metrics.

  • Status updates on social networks.

  • Calls, SMS messages and other signals from mobile devices.

  • Systems themselves write information to logs.

This data tends to be immutable, large in volume, ordered by time, and is primarily aggregated for access. It represents a history of what happened, and there are a number of use cases that involve analyzing this history to better predict what may happen in the future or to establish operational thresholds for the system.

Time Series Data and MongoDB

Time series data is a great fit for MongoDB. There are many examples of organizations using MongoDB to store and analyze time series data. Here are just a few:

  • Silver Spring Networks, the leading provider of smart grid infrastructure, analyzes utility meter data in MongoDB.

  • EnerNOC analyzes billions of energy data points per month to help utilities and private companies optimize their systems, ensure availability and reduce costs.

  • Square maintains a MongoDB-based open source tool called Cube for collecting timestamped events and deriving metrics.

  • Server Density uses MongoDB to collect server monitoring statistics.

  • Skyline Innovations, a solar energy company, stores and organizes meteorological data from commercial scale solar projects in MongoDB.

  • One of the world’s largest industrial equipment manufacturers stores sensor data from fleet vehicles to optimize fleet performance and minimize downtime.

In this post, we will take a closer look at how to model time series data in MongoDB by exploring the schema of a tool that has become very popular in the community: MongoDB Management Service (MMS). MMS helps users manage their MongoDB systems by providing monitoring, visualization and alerts on over 100 database metrics. Today the system monitors over 25k MongoDB servers across thousands of deployments. Every minute thousands of local MMS agents collect system metrics and ship the data back to MMS. The system processes over 5B events per day, and over 75,000 writes per second, all on fewer than 10 physical servers for the MongoDB tier.

Alex Giamas on

QAing New Code with MMS: Map/Reduce vs. Aggregation Framework by Alex Giamas

(Contributor article by Alex Giamas, Co-Founder and CTO of CareAcross. Originally appeared on the MongoDB blog.)

When releasing software, most teams focus on correctness, and rightly so. But great teams also QA their code for performance. MMS Monitoring can also be used to quantify the effect of code changes on your MongoDB database. Our staging environment is an exact mirror of our production environment, so we can test code in staging to reveal performance issues that are not evident in development. We take code changes to staging, where we pull data from MMS to determine if feature X will impact performance.

As a working example, we can use MMS to calculate views across a day using both Map/Reduce and the aggregation framework, then compare their performance and how they affect overall DB performance.

Our test data consists of 10M entries in a collection named views in the database named CareAcross with entries of the following style:

{ userId: "userIdName", date: ISODate("2013-08-28T00:00:01Z"), url: "urlEntry" }

Using a simple map/reduce operation, we can emit a 1 for each document and sum those values to get the total number of views per userId:
db.views.mapReduce(
  function () { emit(this.userId, 1); }, // map: emit 1 per view
  function (key, values) { return Array.sum(values); }, // reduce: sum per userId
  { out: "result" }
)

The equivalent operation using the aggregation framework looks like this:
db.views.aggregate({$group: {_id:"$userId", total:{$sum:1}}})
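
To see that the two operations above compute the same per-user totals, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB and says nothing about performance; the sample documents and the function names `countWithMapReduce` and `countWithGroup` are assumptions made for illustration.

```javascript
// In-memory stand-in for the views collection (made-up sample data).
const views = [
  { userId: "alice", date: new Date("2013-08-28T00:00:01Z"), url: "/a" },
  { userId: "bob", date: new Date("2013-08-28T00:00:02Z"), url: "/b" },
  { userId: "alice", date: new Date("2013-08-28T00:00:03Z"), url: "/c" },
];

// Map/Reduce style: emit (userId, 1) per document, then sum each key's values.
function countWithMapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    // Map step, mirroring emit(this.userId, 1).
    if (!emitted.has(doc.userId)) emitted.set(doc.userId, []);
    emitted.get(doc.userId).push(1);
  }
  const result = {};
  for (const [key, values] of emitted) {
    // Reduce step, mirroring Array.sum(values).
    result[key] = values.reduce((sum, value) => sum + value, 0);
  }
  return result;
}

// Aggregation style: a running total per userId, as in $group with $sum: 1.
function countWithGroup(docs) {
  const totals = {};
  for (const doc of docs) {
    totals[doc.userId] = (totals[doc.userId] || 0) + 1;
  }
  return totals;
}

console.log(countWithMapReduce(views)); // { alice: 2, bob: 1 }
console.log(countWithGroup(views));     // { alice: 2, bob: 1 }
```

The map/reduce version first materializes a list of emitted values per key and then reduces it, while the grouping version keeps a single running total per key.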

The mapReduce function hits the server at 18:54. The aggregation command hits the server at 19:01.

If we compare these two operations across our data set we will get the following metrics from MMS:

Sam Helman on

How we use Go and MongoDB

In this talk, we'll hear from Sam Helman, Software Engineer at MongoDB (formerly 10gen), on how MongoDB is integrating Go into its new and existing cloud tools. Some of the tools leveraging Go include the backup capabilities in MongoDB Management Service and a continuous integration tool. The team sees Go as an opportunity to experiment with new technologies and create a better product for end users. This talk was recorded at the MongoDB User Group meetup at MongoDB.


MongoDB found using the Go language to be extremely satisfying. Between the lightweight syntax, the first-class concurrency and the well documented, idiomatic libraries such as mgo, Go is a great choice for writing anything from small scripts to large distributed applications. In this talk, Sam will go through how the team has integrated Go and why Go and MongoDB are a great match for cloud services.


Unknown author on

MongoDB New York City 2013 (with a HUGE g33ktalk discount)


MongoNYC 2013 is coming in 2 days!

MongoNYC brings together developers, IT professionals and executive decision makers across the MongoDB community for a one-day conference dedicated to the leading NoSQL database. At MongoNYC, you learn development and operations best practices, discover how other businesses are benefiting from MongoDB and network with MongoDB users and ecosystem partners.

Don't miss out!

Use promo code “G33kTalk50” to save 50%!

Unknown author on

Top Big Data skills? MongoDB and Hadoop

(Contributor article by 10gen, originally appeared on 10gen Blog)

According to new research from the UK’s Sector Skills Council for Business and Information Technology, the organization responsible for managing IT standards and qualifications, Big Data is a big deal in the UK, and MongoDB is one of the top Big Data skills in demand. This meshes with SiliconAngle Wikibon research I highlighted earlier, detailing Hadoop and MongoDB as the top-two Big Data technologies.

It also jibes with JasperSoft data that shows MongoDB as one of its top Big Data connectors:

Source: Jaspersoft 2012

MongoDB is a fantastic operational data store.  As soon as one remembers that Big Data is a question of both storage and processing, it makes sense that the top operational data store would be MongoDB, given its flexibility and scalability.  Foursquare is a great example of a customer using MongoDB in this way.

On the data processing side, a growing number of enterprises use MongoDB both to store and process log data, among other data analytics workloads.  Some use MongoDB with its built-in MapReduce functionality, while others choose to use the Hadoop connector or MongoDB’s Aggregation Framework to avoid MapReduce.

Whatever the method or use case, the great thing about Big Data technologies like MongoDB and Hadoop is that they’re open source, so the barriers to download, learn, and adopt them are negligible.  Given the huge demand for Big Data skills, both in the UK and globally, according to data from Dice and, it’s time to download MongoDB and get started on your next Big Data project.



Unknown author on

The ‘middle class’ of Big Data

(Contributor article by 10gen, originally appeared on 10gen Blog.)

So much is written about Big Data that we tend to overlook a simple fact: most data isn’t big at all. As Bruno Aziza writes in Forbes, “it isn’t so” that “you have to be Big to be in the Big Data game,” echoing a similar sentiment from ReadWrite’s Brian Proffitt.  Large enterprise adoption of Big Data technologies may steal the headlines, but it’s the “middle class” of enterprise data where the vast majority of data, and money, is.

There’s a lot of talk about zettabytes and petabytes of data, but as EMA Research highlights in a new study, “Big Data’s sweet spot starts at 110GB and the most common customer data situation is between 10 to 30TB.”

Small? Not exactly. But Big? No, not really.
