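Hadoop Distributed File System (HDFS)
Hadoop Distributed File System (HDFS) is a block-structured, distributed file system: each file is split into fixed-size blocks, and the blocks are stored across the machines of a cluster.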
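To make "block-structured" concrete, here is a minimal sketch of how a file's bytes might be divided into blocks. The 128 MiB block size and the replication factor of 3 are assumptions chosen for illustration (both are configurable in a real cluster), and `split_into_blocks` is a hypothetical helper, not part of any Hadoop API.

```python
# Assumed values for illustration; real clusters configure these.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB per block
REPLICATION = 3                  # copies kept of each block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block only holds the bytes that are left over.
        blocks.append(remainder)
    return blocks


file_size = 300 * 1024 * 1024            # a 300 MiB file
blocks = split_into_blocks(file_size)
print(len(blocks))                       # 3 blocks: 128 MiB, 128 MiB, 44 MiB
print(sum(blocks) * REPLICATION)         # raw bytes stored across the cluster
```

Under these assumptions a 300 MiB file occupies three blocks, and each block can be placed on a different machine.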
Hadoop Distributed Cache
Distributed Cache is a Hadoop feature that caches files needed by applications, such as JARs, archives, or read-only data files, by copying them to the nodes where tasks run.
Pig & Hive in Hadoop
Pig is an Apache open-source project and one of the components of the Hadoop ecosystem.
Pig provides a high-level data-flow scripting language, Pig Latin, and runs on Hadoop clusters.
Pig uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.
Hive is a data warehouse system for Hadoop.
Hive facilitates ad hoc queries and aids analysis of data sets stored in Hadoop.
Hive provides an SQL-like language called HiveQL (HQL).
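The difference between a data-flow script and an SQL-like query can be sketched in plain Python. This is only an analogy: the sample records and both function names are hypothetical, and neither function uses Pig or Hive. The first spells out each step in sequence, as a Pig script would; the second states the result in one expression, as a HiveQL query would.

```python
from collections import Counter

# Hypothetical sample records: (user, page, seconds_on_page).
visits = [
    ("ann", "/home", 12),
    ("bob", "/home", 3),
    ("ann", "/docs", 40),
    ("cid", "/docs", 25),
    ("bob", "/docs", 1),
]


def data_flow_style(records, min_seconds):
    """Named intermediate steps, in the spirit of a data-flow script."""
    filtered = [r for r in records if r[2] >= min_seconds]  # filter step
    pages = [page for _, page, _ in filtered]               # projection step
    return dict(Counter(pages))                             # group-and-count step


def query_style(records, min_seconds):
    """One declarative expression, in the spirit of
    SELECT page, COUNT(*) ... WHERE seconds >= min_seconds GROUP BY page.
    """
    return dict(Counter(page for _, page, secs in records if secs >= min_seconds))


print(data_flow_style(visits, 5))
print(query_style(visits, 5))
```

Both functions return the same counts per page; the choice between the two styles is mostly about how the author prefers to express the computation.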
Hadoop Cloudera
Cloudera is a commercial vendor that supports deploying Hadoop in an enterprise.
Cloudera offers Cloudera Manager for system management and Cloudera Navigator for data management.
Hadoop ZooKeeper
ZooKeeper is an open-source, high-performance coordination service for distributed applications.
Hadoop Pivotal HD
Pivotal HD is a commercially supported, enterprise-capable distribution of Hadoop that aims to accelerate data analytics projects.
Hadoop Sqoop
Sqoop is an Apache Hadoop ecosystem project. Its job is to import data from relational databases into Hadoop and to export data from Hadoop back to relational databases.
Apache Oozie
Apache Oozie is a workflow scheduler system used to manage Apache Hadoop jobs, such as MapReduce jobs.
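The core idea of a workflow scheduler, running each job only after the jobs it depends on, can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is not Oozie's API or workflow format; the job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "import_logs": set(),
    "import_users": set(),
    "clean_logs": {"import_logs"},
    "join_data": {"clean_logs", "import_users"},
    "build_report": {"join_data"},
}


def run_order(jobs):
    """Return one valid execution order in which dependencies always come first."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(jobs).static_order())


for job in run_order(workflow):
    print("run", job)
```

Jobs with no dependency between them, such as the two imports here, may appear in either order; a real scheduler could run them in parallel.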
Apache Mahout
Apache Mahout is a library of machine learning algorithms that helps with tasks such as clustering. Clustering allows the system to group entities into separate clusters based on certain characteristics or features.
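As an illustration of clustering, the sketch below implements a small k-means loop in plain Python. It does not use Mahout and is not meant to reflect how Mahout implements clustering; the sample points, the starting centroids, and the `kmeans` function are assumptions made for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Group 2-D points around the given starting centroids (basic k-means)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                n = len(cluster)
                new_centroids.append((sum(x for x, _ in cluster) / n,
                                      sum(y for _, y in cluster) / n))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(clusters[0])  # the points near (1, 1)
print(clusters[1])  # the points near (8, 8)
```

Here the "characteristics or features" are simply the two coordinates of each point; with real data they would be whatever numeric features describe each entity.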
Apache Cassandra
Apache Cassandra is an open-source, freely distributed, high-performance, extremely scalable, and fault-tolerant post-relational database.