Apache Hadoop
Apache Hadoop is a open source software framework that supports data-intensive distributed applications licensed under the Apache v2 license.
Disco is a lightweight, open-source framework for distributed computing based on the MapReduce paradigm and written in Python.
Disco is an implementation of mapreduce for distributed computing. Disco supports parallel computations over large data sets, stored on an unreliable cluster of computers, as in the original framework created by Google. This makes it a perfect tool for analyzing and processing large data sets, without having to worry about difficult technicalities related to distribution such as communication protocols, load balancing, locking, job scheduling, and fault tolerance, which are handled by Disco.
cluster distributed python mapreduce
Apache Hadoop is a open source software framework that supports data-intensive distributed applications licensed under the Apache v2 license.
Free Open Source Mac OS X Windows Linux
Apache Spark™ is a fast and general engine for large-scale data processing. Speed Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Free Open Source Mac OS X Windows Linux
Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.
Free Open Source Mac OS X Windows Linux BSD
Amazon Kinesis services make it easy to work with real-time streaming data in the AWS cloud.
Commercial Web
HPCC Systems offers an open source cluster computing platform used to solve Big Data problems. Its unique architecture and simple yet powerful data programming language...
Free Open Source Linux