group() vs Aggregation Framework vs MapReduce in MongoDB
The group() command, the Aggregation Framework, and MapReduce are the three aggregation features of MongoDB.
group():
- Performs simple aggregation operations on the documents in a collection.
- Similar to GROUP BY in SQL (for example, in MySQL).
Output format: Returns the result set inline.
Sharding: Not supported in sharded environments.
Limitations:
- Will not group into a result set with more than 20,000 keys (as of MongoDB 2.2; in earlier versions the limit is 10,000 keys).
- Results must fit within the limitations of a BSON document (currently 16MB).
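To make the group() semantics above concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB: `groupInMemory`, `MAX_KEYS`, and the sample `orders` array are hypothetical names introduced only for illustration. The options mirror the `key`, `initial`, and `reduce` options of the shell's `db.collection.group()`, and the sketch enforces the 20,000-key limit listed above.

```javascript
// In-memory emulation of group() semantics; NOT the MongoDB implementation.
// groupInMemory, MAX_KEYS and the sample data are illustrative assumptions.
const MAX_KEYS = 20000; // key limit described above for MongoDB 2.2

function groupInMemory(docs, { key, initial, reduce }) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the grouping key from the fields named in `key`.
    const keyDoc = {};
    for (const field of Object.keys(key)) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) {
      if (groups.size >= MAX_KEYS) {
        throw new Error("more than " + MAX_KEYS + " unique keys");
      }
      // Each group starts from its own shallow copy of `initial`.
      groups.set(id, Object.assign({}, keyDoc, initial));
    }
    // reduce(curr, result) mutates the running result for the group.
    reduce(doc, groups.get(id));
  }
  // The whole result is returned inline, as group() does.
  return Array.from(groups.values());
}

const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];

const totals = groupInMemory(orders, {
  key: { customer: 1 },
  initial: { total: 0 },
  reduce: function (curr, result) { result.total += curr.amount; },
});
console.log(totals);
```

Against a real collection (assumed here to be called `orders`), the same options object would be passed to `db.orders.group(...)` in the shell.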
MapReduce:
- Can be used for incremental aggregation over large collections.
Output format: MapReduce provides inline, new collection, merge, replace, and reduce output options.
Sharding: Supports both sharded and non-sharded collections as input and output. If the output collection does not exist, MapReduce creates it and shards it on the _id field.
Limitations:
- With inline output the result is not stored in a collection, so find(), sort(), and limit() cannot be run on it.
- A single emit can hold at most half of MongoDB’s maximum BSON document size (that is, half of 16MB).
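The map, emit, and reduce flow, and the incremental aggregation that the reduce output option enables, can be sketched in memory as follows. This is an illustration under stated assumptions, not MongoDB code: `mapReduceInMemory` and the sample batches are hypothetical, and `emit` is passed to the map function as an argument here, whereas in the MongoDB shell it is a global.

```javascript
// In-memory sketch of MapReduce with a "reduce"-style output action.
// mapReduceInMemory and the sample data are illustrative assumptions.
function mapReduceInMemory(docs, map, reduce, existing) {
  const emitted = new Map();
  // Collect every emitted value under its key.
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // map runs once per document, with `this` bound to the document.
  for (const doc of docs) map.call(doc, emit);

  // Start from the previous output so new results are folded into it.
  const out = new Map(existing || []);
  for (const [key, values] of emitted) {
    if (out.has(key)) values.unshift(out.get(key));
    // reduce is only needed when a key has more than one value.
    out.set(key, values.length === 1 ? values[0] : reduce(key, values));
  }
  return out;
}

function mapFn(emit) { emit(this.customer, this.amount); }
function reduceFn(key, values) { return values.reduce((a, b) => a + b, 0); }

const firstBatch = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
];
const secondBatch = [{ customer: "ann", amount: 7 }];

// First run builds the output; second run only processes new documents.
const firstRun = mapReduceInMemory(firstBatch, mapFn, reduceFn);
const secondRun = mapReduceInMemory(secondBatch, mapFn, reduceFn, firstRun);
console.log(Array.from(secondRun));
```

In the shell, the second run would correspond roughly to calling `mapReduce` with `out: { reduce: "order_totals" }` and a `query` option that selects only the new documents; the collection name `order_totals` is an assumption.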
Aggregation Framework:
- New feature in the MongoDB 2.2.0 production release.
- Uses a “pipeline” approach where objects are transformed as they pass through a series of pipeline operators such as $match, $project, $sort, $group, $limit, $skip, $unwind, and $geoNear.
Output format: Returns the result set inline.
Sharding: Supports both sharded and non-sharded input collections. When operating on a sharded collection, all operations up to the first $group or $sort are pushed to every shard; the remaining operations, from the first $group or $sort onward, run as a second pipeline on the combined shard results.
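The way a pipeline is divided for a sharded collection can be sketched as a small function over the array of stages. `splitPipeline` is a hypothetical helper, not a MongoDB API. It also makes one assumption about the description above: the first $group or $sort stage is placed in both parts, so each shard runs it on local data and it runs again when the shard results are combined.

```javascript
// Sketch of splitting a pipeline for a sharded collection.
// splitPipeline is an illustrative assumption, not part of MongoDB.
function splitPipeline(pipeline) {
  const isSplitPoint = (stage) => "$group" in stage || "$sort" in stage;
  const index = pipeline.findIndex(isSplitPoint);
  if (index === -1) {
    // No $group or $sort: every stage can run on the shards.
    return { shardPart: pipeline.slice(), mergePart: [] };
  }
  return {
    // Assumption: the split stage itself is included in the shard part...
    shardPart: pipeline.slice(0, index + 1),
    // ...and also starts the second pipeline over the combined results.
    mergePart: pipeline.slice(index),
  };
}

const pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];

const parts = splitPipeline(pipeline);
console.log(parts.shardPart.map((s) => Object.keys(s)[0]));
console.log(parts.mergePart.map((s) => Object.keys(s)[0]));
```

The point of the split is that filtering with $match happens on each shard before any data is combined, so less data has to be moved for the second pipeline.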
- Designed with specific goals of improving performance and usability.
- Pipeline operators can be repeated as needed.
- The Aggregation Framework can be up to roughly 10 times faster than MapReduce, depending on the workload.
Limitations:
- If any single aggregation operation consumes more than 10 percent of system RAM, the operation produces an error.
- Output from the pipeline cannot exceed the BSON document size limit.
- The aggregation pipeline cannot operate on values of the following types: Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
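The pipeline approach described above, where each operator transforms the documents produced by the previous one, can be illustrated with a tiny in-memory runner. This is a sketch only: `runPipeline` and the simplified stage implementations are hypothetical, and they cover just equality in $match, `$sum` over a field in $group, a single-field $sort, and $limit.

```javascript
// Minimal in-memory pipeline runner; runPipeline and these simplified
// stages are illustrative assumptions, not MongoDB code.
const stages = {
  // Keep documents whose fields equal every value in the spec.
  $match: (docs, spec) =>
    docs.filter((d) => Object.keys(spec).every((k) => d[k] === spec[k])),
  // Group by a "$field" reference; only { $sum: "$field" } is supported.
  $group: (docs, spec) => {
    const out = new Map();
    for (const d of docs) {
      const id = d[spec._id.slice(1)];
      if (!out.has(id)) out.set(id, { _id: id });
      const acc = out.get(id);
      for (const [field, op] of Object.entries(spec)) {
        if (field === "_id") continue;
        acc[field] = (acc[field] || 0) + d[op.$sum.slice(1)];
      }
    }
    return Array.from(out.values());
  },
  // Sort on one numeric field; 1 is ascending, -1 is descending.
  $sort: (docs, spec) => {
    const [[field, dir]] = Object.entries(spec);
    return docs.slice().sort((a, b) => (a[field] - b[field]) * dir);
  },
  $limit: (docs, n) => docs.slice(0, n),
};

function runPipeline(docs, pipeline) {
  // Each stage receives the output of the previous stage.
  return pipeline.reduce((current, stage) => {
    const [[name, spec]] = Object.entries(stage);
    return stages[name](current, spec);
  }, docs);
}

const orders = [
  { customer: "ann", status: "A", amount: 10 },
  { customer: "bob", status: "A", amount: 5 },
  { customer: "ann", status: "A", amount: 7 },
  { customer: "cat", status: "B", amount: 50 },
];

const topCustomer = runPipeline(orders, [
  { $match: { status: "A" } },
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 1 },
]);
console.log(topCustomer);
```

With a real collection (assumed here to be named `orders`), the same array of stages would be handed to `db.orders.aggregate(...)`, and the result would come back inline.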