Technology, Blog – Page 29 – Azure, RPA, AI, Selenium, Angular, API

December 30, 2016 (updated February 23, 2020)

Hadoop MapReduce

Hadoop MapReduce – Hadoop MapReduce is one of the core components of Hadoop. It is a programming model, and an implementation of that model, for processing and generating large data sets using parallel, distributed algorithms on a cluster. Hadoop MapReduce can handle large-scale data, on the order of petabytes or exabytes.
The MapReduce framework converts each input record into a key/value pair.
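
To make the key/value idea concrete, here is a minimal word-count sketch in plain Python. It is not the Hadoop API: it only imitates the map, shuffle, and reduce steps in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for this illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) key/value pair for every word in every record."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three steps in order on an in-memory list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    lines = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(lines))
```

On a real cluster the map and reduce steps would run as separate tasks on many nodes and the framework would perform the shuffle over the network; the sketch only shows how the data changes shape between steps.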

December 30, 2016 (updated February 23, 2020)

Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) – HDFS is a block-structured, distributed file system.
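
As a rough illustration of what "block-structured" means, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 are assumed values (they match common Hadoop defaults but are configurable), and the round-robin placement is a deliberate simplification, not the placement policy HDFS actually uses. All function and node names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) tuples, one per block of the file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified for illustration; it ignores racks, free space, and load.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print(len(blocks), "blocks")
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Under these assumptions a 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and each block is stored on three different nodes.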

December 30, 2016 (updated February 23, 2020)

Hadoop Distributed Cache

Hadoop Distributed Cache – Distributed Cache is a Hadoop feature that helps cache files needed by applications.

December 30, 2016 (updated February 23, 2020)

Pig & Hive in Hadoop

Pig – an Apache open-source project and one of the components of the Hadoop ecosystem.
Pig – a high-level data flow scripting language that runs on Hadoop clusters.
Pig – uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.

Hive – a data warehouse system for Hadoop.
Hive – facilitates ad hoc queries and aids analysis of data sets stored in Hadoop.
Hive – provides an SQL-like language called HiveQL (HQL).

December 30, 2016 (updated February 23, 2020)

Hadoop Cloudera

Cloudera – a commercial vendor for deploying Hadoop in an enterprise.
Cloudera – offers Cloudera Manager for system management and Cloudera Navigator for data management.

December 30, 2016 (updated February 23, 2020)

Hadoop ZooKeeper

Hadoop ZooKeeper is an open-source, high-performance coordination service for distributed applications.

December 30, 2016 (updated February 23, 2020)

Hadoop Pivotal HD

Hadoop Pivotal HD is a commercially supported, enterprise-capable distribution of Hadoop that aims to accelerate data analytics projects.

December 30, 2016 (updated February 23, 2020)

Hadoop Sqoop

Hadoop Sqoop is an Apache Hadoop ecosystem project. Sqoop is responsible for importing data from relational databases into Hadoop and exporting it from Hadoop back to relational databases.

December 30, 2016 (updated February 23, 2020)

Apache Oozie

Apache Oozie is a workflow scheduler system used to manage Apache Hadoop jobs, such as MapReduce jobs.

December 30, 2016 (updated February 23, 2020)

Apache Mahout

Apache Mahout is a library of machine learning algorithms. Among other things, it helps with clustering, which allows the system to group various entities into separate clusters based on certain characteristics or features.
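
To show what grouping entities by their features looks like in practice, here is a small k-means sketch in plain Python. It does not use Mahout or its API; k-means is simply one common clustering algorithm, picked here as an example. The function name kmeans, the choice of the first k points as starting centroids, and the sample points are all assumptions made for this illustration.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points (tuples of floats) into k groups.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i]. Starting centroids are simply the first k points,
    a naive choice used here to keep the example deterministic.
    """
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


if __name__ == "__main__":
    # Two features per entity; the values are made up for the example.
    data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1)]
    labels, centers = kmeans(data, k=2)
    print(labels)
    print(centers)
```

With this sample data the points near (1, 1) end up in one cluster and the points near (8, 8) in the other. A library such as Mahout runs the same kind of computation across a cluster so that it scales to far larger data sets.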