group() vs Aggregation Framework vs MapReduce in MongoDB
The group() command, the Aggregation Framework, and MapReduce are MongoDB's aggregation features.
group():
- Performs simple aggregation operations on the documents in a collection.
- Similar to GROUP BY in SQL (for example, in MySQL).
Output format: Returns the result set inline.
Sharding: Not supported in sharded environments.
Limitations:
- Will not group into a result set with more than 20,000 keys (as of MongoDB 2.2; in earlier versions the limit was 10,000 keys).
- Results must fit within the limitations of a BSON document (currently 16MB).
- Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
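To make the GROUP BY comparison concrete, here is a minimal pure-Python sketch of what group() does conceptually: one accumulator per distinct key, updated by a reduce function, with the whole result returned inline. It does not talk to MongoDB; the names `group`, `add_amount`, `orders`, and `MAX_GROUP_KEYS` are made up for this illustration, and the key limit simply mirrors the 20,000 figure listed above.

```python
import copy

MAX_GROUP_KEYS = 20_000  # illustrative constant: the limit stated above for MongoDB 2.2+


def group(documents, key_fields, reduce, initial):
    """Emulate group(): keep one accumulator per distinct key, return all inline."""
    results = {}
    for doc in documents:
        key = tuple(doc.get(field) for field in key_fields)
        if key not in results:
            if len(results) >= MAX_GROUP_KEYS:
                raise ValueError("group() result set exceeds 20,000 keys")
            accumulator = copy.deepcopy(initial)
            accumulator.update(zip(key_fields, key))  # result carries the key fields
            results[key] = accumulator
        reduce(doc, results[key])  # the reduce function mutates the accumulator
    return list(results.values())


def add_amount(doc, accumulator):
    """Hypothetical reduce function: running total and count per key."""
    accumulator["total"] += doc["amount"]
    accumulator["count"] += 1


orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 7},
]
totals = group(orders, ["customer"], add_amount, {"total": 0, "count": 0})
```

Because every accumulator lives in one in-memory result, the sketch also hints at why the real command is bounded by both a key count and the BSON document size.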
MapReduce():
- Can be used for incremental aggregation over large collections.
- There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.
Output format: MapReduce can return results inline or write them to a collection using the replace, merge, or reduce output actions.
Sharding: Supports both sharded and unsharded collections as input and output. When sharded output is requested and the output collection does not exist, MapReduce creates the collection and shards it on the _id field.
Limitations:
- Inline output is returned as part of the command result rather than stored in a collection, so find(), sort(), and limit() cannot be run on it.
- A single emit can hold at most half of MongoDB’s maximum BSON document size (that is, 8MB of the current 16MB limit).
- The Map/Reduce engine is still considerably slower than the Aggregation Framework, for two main reasons: (1) the JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code; (2) the JavaScript engine still requires that every document being examined be converted from BSON to JSON, and if you’re saving the output in a collection, the result set must then be converted from JSON back to BSON.
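The incremental aggregation and the "reduce" output option mentioned above can be sketched in plain Python. This is only an emulation of the semantics, not MongoDB code: `map_reduce`, `map_page`, `sum_hits`, and the sample data are hypothetical, and the `out` argument stands in for an existing output collection.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn, out=None):
    """Emulate mapReduce; `out` stands in for an existing output collection
    that new results are folded into (the "reduce" output action)."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):  # each yielded pair plays the role of emit()
            emitted[key].append(value)

    results = dict(out or {})
    for key, values in emitted.items():
        # Reduce is only needed when a key has more than one emitted value.
        reduced = values[0] if len(values) == 1 else reduce_fn(key, values)
        if key in results:
            # Re-reduce the stored value together with the new partial result.
            reduced = reduce_fn(key, [results[key], reduced])
        results[key] = reduced
    return results


def map_page(doc):
    """Hypothetical map function: emit (page, hits) for each log document."""
    yield doc["page"], doc["hits"]


def sum_hits(key, values):
    """Hypothetical reduce function: must accept its own output as input."""
    return sum(values)


day1 = [{"page": "/home", "hits": 3}, {"page": "/about", "hits": 1}]
day2 = [{"page": "/home", "hits": 4}, {"page": "/docs", "hits": 2}]

totals = map_reduce(day1, map_page, sum_hits)              # first full run
totals = map_reduce(day2, map_page, sum_hits, out=totals)  # incremental run
```

The second call only processes the new documents. That works because the reduce function can be applied again to its own earlier output, which is the property incremental MapReduce relies on.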
Aggregation Framework:
- New feature in the MongoDB 2.2.0 production release.
- Uses a “pipeline” approach where objects are transformed as they pass through a series of pipeline operators such as $match, $project, $sort, $group, $limit, $skip, $unwind, and $geoNear.
- Designed with the specific goals of improving performance and usability.
- Pipeline operators can be repeated as needed.
- Reported to be around 10 times faster than MapReduce.
Output format: Returns the result set inline.
Sharding: Supports both sharded and unsharded input collections. When operating on a sharded collection, the pipeline is split in two: all operations up to the first $group or $sort are pushed to every shard, and the remaining operations, starting from that first $group or $sort, run as a second pipeline on the combined shard results.
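The way a pipeline is divided on a sharded collection (everything up to the first $group or $sort goes to the shards, and the rest runs on their combined results) can be sketched as a small pure-Python function. The function name `split_for_shards` and the sample pipeline are hypothetical, and the real split happens inside MongoDB. The sketch assumes the first $group or $sort appears in both halves, and that a pipeline without either stage runs entirely on the shards.

```python
def split_for_shards(pipeline):
    """Split a pipeline at its first $group or $sort stage.

    Returns (shard_part, merge_part). The splitting stage appears in both
    parts: shards produce partial results, and the second pipeline combines them.
    """
    for index, stage in enumerate(pipeline):
        if "$group" in stage or "$sort" in stage:
            return pipeline[: index + 1], pipeline[index:]
    # Assumption: with no $group or $sort, nothing is left for the merge step.
    return pipeline, []


pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 5},
]
shard_part, merge_part = split_for_shards(pipeline)
```

Here `shard_part` holds the $match and $group stages, while `merge_part` starts again at $group and continues with $sort and $limit.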
Limitations:
- If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
- Output from the pipeline cannot exceed the BSON document size limit.
- The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, CodeWScope.
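As an illustration of the pipeline approach described in this section, here is a hedged sketch of how a pipeline might be written from Python. The collection fields (`status`, `customer`, `amount`) and the function names are invented for the example. `collection` is assumed to be a PyMongo Collection in a recent PyMongo version, where aggregate() returns an iterable cursor.

```python
def build_pipeline(min_total):
    """Return an aggregation pipeline as a list of stage documents."""
    return [
        # Filter the input documents first so later stages see less data.
        {"$match": {"status": "shipped"}},
        # Collapse to one document per customer with a summed total.
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
        # $match is repeated here, this time filtering the grouped output.
        {"$match": {"total": {"$gte": min_total}}},
        {"$sort": {"total": -1}},
        {"$limit": 10},
    ]


def top_customers(collection, min_total=100):
    """Run the pipeline; `collection` is assumed to be a PyMongo Collection."""
    return list(collection.aggregate(build_pipeline(min_total)))
```

The second $match plays the role that HAVING plays in SQL, which is one practical reason operators are allowed to repeat within a pipeline.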