Sharath Chandran – Page 25 – Technology, Blog

December 30, 2016 / February 23, 2020

JSON (JavaScript Object Notation) vs XML (Extensible Markup Language)

JSON is a compact, open-standard format that uses human-readable text. Because it is less verbose, JSON is usually faster to transmit and parse than XML. It is also easy to use with JavaScript.
XML is a markup language that is both human-readable and machine-readable. It defines a set of rules for encoding documents and is designed for usability across the Internet. XML carries more markup than JSON, so the same data is usually bulkier and slower to process.

JSON has a very simple syntax and is easy to learn. An XML format, by contrast, can be formally defined by an XML DTD or an XML Schema. JSON is a data-exchange format that keeps growing in popularity because it fits naturally into JavaScript applications.
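To make the size difference concrete, here is a minimal sketch that encodes the same small record both ways and compares the UTF-8 byte counts. The record, the class name `JsonVsXmlSize`, and the helper `utf8Length` are made up for illustration. The sketch only measures payload size; it says nothing about parsing speed.

```java
import java.nio.charset.StandardCharsets;

class JsonVsXmlSize {
    // The same hypothetical record, written once as JSON and once as XML.
    static final String JSON = "{\"name\":\"Ada\",\"age\":36}";
    static final String XML = "<person><name>Ada</name><age>36</age></person>";

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("JSON bytes: " + utf8Length(JSON));
        System.out.println("XML bytes:  " + utf8Length(XML));
    }
}
```

For this record the XML form is larger because every field name appears twice, once in the opening tag and once in the closing tag.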

December 30, 2016 / February 23, 2020

Java Advanced definitions

The Java utility package (java.util) provides, among others, the following interface and classes.

Enumeration – The Enumeration interface is used to retrieve successive elements from a data structure.
BitSet – The BitSet class implements a group of bits, or flags, that can be set and cleared individually.
Vector – The Vector class is similar to a Java array, but it can grow to hold new elements when required.
Stack – The Stack class implements a 'last in, first out' stack of elements.
Dictionary – The Dictionary class is an abstract class that defines a data structure for mapping keys to values.
Hashtable – The Hashtable class organizes data based on a user-defined key structure.
Properties – Properties is a subclass of Hashtable. It maintains lists of values in which both the key and the value are Strings.
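The definitions above can be tried out with a short sketch. It uses only standard java.util calls; the class name `LegacyCollectionsDemo`, the method names, and the sample values are chosen here for illustration.

```java
import java.util.BitSet;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Properties;
import java.util.Stack;
import java.util.Vector;

class LegacyCollectionsDemo {
    // Vector grows on demand; an Enumeration walks its elements in order.
    static String joinVector() {
        Vector<String> vector = new Vector<>();
        vector.add("a");
        vector.add("b");
        vector.add("c");
        StringBuilder joined = new StringBuilder();
        Enumeration<String> elements = vector.elements();
        while (elements.hasMoreElements()) {
            joined.append(elements.nextElement());
        }
        return joined.toString();
    }

    // Stack is last in, first out: the last value pushed is the first popped.
    static int popLast() {
        Stack<Integer> stack = new Stack<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return stack.pop();
    }

    // BitSet stores individual flags; cardinality() counts the bits that are set.
    static int countFlags() {
        BitSet flags = new BitSet();
        flags.set(0);
        flags.set(5);
        return flags.cardinality();
    }

    // Hashtable maps keys to values.
    static Integer lookupHashtable() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("one", 1);
        return table.get("one");
    }

    // Properties is a Hashtable whose keys and values are both Strings.
    static String lookupProperty() {
        Properties properties = new Properties();
        properties.setProperty("user", "demo");
        return properties.getProperty("user");
    }

    public static void main(String[] args) {
        System.out.println(joinVector());
        System.out.println(popLast());
        System.out.println(countFlags());
        System.out.println(lookupHashtable());
        System.out.println(lookupProperty());
    }
}
```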

December 30, 2016 / June 15, 2020

Hadoop Big Data quick summary

Hadoop – is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
Hadoop – has a file system (HDFS) whose design is based on the Google File System (GFS).
Hadoop – can scale to thousands of nodes; spreading work across them is the key to its performance.
Hadoop – includes the Hadoop Distributed File System (HDFS), which enables fast data transfer among the nodes.
Hadoop Configuration – Hadoop can be configured in three modes: standalone, pseudo-distributed, and fully distributed.
Hadoop MapReduce – is one of the core components of Hadoop. It is a programming model and implementation for processing and generating large data sets, using parallel and distributed algorithms on a cluster. It can handle large-scale data (petabytes, exabytes).
MapReduce – The MapReduce framework converts each input record into a key/value pair.
Ubuntu Server – Ubuntu is a leading open-source platform. It helps users make use of their infrastructure when they want to deploy a cloud, a web farm, or a Hadoop cluster.
Hadoop Distributed File System (HDFS) – is a block-structured, distributed file system.
Distributed Cache – is a Hadoop feature that caches files needed by applications.

Pig – is an Apache open-source project and one of the components of the Hadoop ecosystem.
Pig – is a high-level data-flow scripting language that runs on Hadoop clusters.
Pig – uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.

Hive – is a data warehouse system for Hadoop.
Hive – facilitates ad hoc queries and aids the analysis of data sets stored in Hadoop.
Hive – provides an SQL-like language called HiveQL (HQL).

Apache HBase – is a distributed, column-oriented database.
Apache HBase – is built on top of HDFS.
Apache HBase – is an open-source, distributed, versioned, non-relational database system.
Apache HBase – has two types of nodes: Master and RegionServer.
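To show what "column-oriented" and "versioned" mean for the data model, here is a small in-memory sketch: each row key maps to columns, and each column keeps its values by timestamp. It does not use the HBase client API; the class `VersionedCellStoreSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class VersionedCellStoreSketch {
    // row key -> column name -> timestamp -> value, all kept in sorted order
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Store a value for (row, column) at the given timestamp; older versions are kept.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        return getAsOf(row, column, Long.MAX_VALUE);
    }

    // Return the newest version written at or before the timestamp, or null if none.
    String getAsOf(String row, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        java.util.Map.Entry<Long, String> entry = versions.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedCellStoreSketch store = new VersionedCellStoreSketch();
        store.put("user1", "info:email", 1L, "old@example.com");
        store.put("user1", "info:email", 2L, "new@example.com");
        System.out.println(store.getLatest("user1", "info:email"));
        System.out.println(store.getAsOf("user1", "info:email", 1L));
    }
}
```

Writing a new value does not overwrite the old one, so a reader can ask either for the latest version or for the value as of an earlier timestamp.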

Cloudera – is a commercial vendor for deploying Hadoop in an enterprise.
Cloudera – offers Cloudera Manager for system management and Cloudera Navigator for data management.

ZooKeeper – is an open-source, high-performance coordination service for distributed applications.

Pivotal HD – is a commercially supported, enterprise-capable distribution of Hadoop that aims to accelerate data analytics projects.

Sqoop – is an Apache Hadoop ecosystem project. Sqoop transfers data between Hadoop and relational databases through import and export operations.

Apache Oozie – is a workflow scheduler system used to manage Apache Hadoop jobs, including MapReduce jobs.

Mahout – is a library of machine learning algorithms that helps with tasks such as clustering. Clustering allows the system to group various entities into separate clusters based on certain characteristics or features.
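As a rough illustration of grouping entities by their features, the sketch below assigns each point to the nearest of a set of fixed cluster centers. This is only the assignment step that k-means-style clustering repeats; it is plain Java and does not use the Mahout API. The class name `NearestCentroidSketch`, the method names, and the sample points are assumptions made for this example.

```java
import java.util.Arrays;

class NearestCentroidSketch {
    // Squared Euclidean distance between two feature vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the point (the first one wins on ties).
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double distance = squaredDistance(point, centroids[i]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    // Cluster index for every point, in input order.
    static int[] assign(double[][] points, double[][] centroids) {
        int[] clusters = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            clusters[i] = nearestCentroid(points[i], centroids);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.5, 2.0}, {8.0, 8.0}, {9.0, 9.5}};
        double[][] centroids = {{1.0, 1.0}, {9.0, 9.0}};
        System.out.println(Arrays.toString(assign(points, centroids)));
    }
}
```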

Apache Cassandra – is an open-source, freely distributed, high-performance, extremely scalable, and fault-tolerant post-relational (NoSQL) database.
Apache Spark – is a powerful open-source processing engine, a general MapReduce-like engine used for large-scale data processing.

Apache Ambari – is a completely open operational tool, or framework, for provisioning, managing, and monitoring Apache Hadoop clusters.
Kerberos – is a third-party authentication mechanism. It keeps a database of users and services and their respective Kerberos passwords.

December 30, 2016 / February 23, 2020

Hadoop MapReduce

Hadoop MapReduce – is one of the core components of Hadoop. It is a programming model and implementation for processing and generating large data sets, using parallel and distributed algorithms on a cluster. Hadoop MapReduce can handle large-scale data (petabytes, exabytes).
MapReduce – The MapReduce framework converts each input record into a key/value pair.
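The key/value idea can be sketched as a word count in plain Java. This is a single-process simulation of the map, shuffle, and reduce phases. It is not the Hadoop MapReduce API, and for simplicity each input record is just a line of text with no input key. The class name `WordCountSketch` and its method names are chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map phase: turn one input record (a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all values that share the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=1}
        System.out.println(wordCount(Arrays.asList("big data", "big cluster")));
    }
}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle between them; here everything runs in one loop so the data flow is easy to follow.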

December 30, 2016 / February 23, 2020

Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) – is a block-structured, distributed file system.
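"Block-structured" means that a file is split into fixed-size blocks that are stored across the nodes. The sketch below works out how many blocks a file would need. It assumes a block size of 128 MB, which is the default `dfs.blocksize` in Hadoop 2.x and later but is configurable per cluster. The class `HdfsBlockSketch` and its methods are hypothetical helpers, not part of any Hadoop API.

```java
class HdfsBlockSketch {
    // Assumed block size: 128 MB (configurable in a real cluster).
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold a file of the given size (rounds up).
    static long blockCount(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Size of the final block, which may be smaller than BLOCK_SIZE.
    static long lastBlockSize(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        long remainder = fileSizeBytes % BLOCK_SIZE;
        return remainder == 0 ? BLOCK_SIZE : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file
        System.out.println("blocks: " + blockCount(fileSize));
        System.out.println("last block MB: " + lastBlockSize(fileSize) / (1024 * 1024));
    }
}
```

Under this assumption a 300 MB file occupies three blocks: two full 128 MB blocks and a final 44 MB block.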

December 30, 2016 / February 23, 2020

Hadoop Distributed Cache

Hadoop Distributed Cache – is a Hadoop feature that caches files needed by applications.

December 30, 2016 / February 23, 2020

Pig & Hive in Hadoop

Pig – is an Apache open-source project and one of the components of the Hadoop ecosystem.
Pig – is a high-level data-flow scripting language that runs on Hadoop clusters.
Pig – uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.

Hive – is a data warehouse system for Hadoop.
Hive – facilitates ad hoc queries and aids the analysis of data sets stored in Hadoop.
Hive – provides an SQL-like language called HiveQL (HQL).

December 30, 2016 / February 23, 2020

Hadoop Big Data quick summary

Hadoop Cloudera

Hadoop ZooKeeper

Hadoop Pivotal HD

Pig – is an Apache open-source project and one of the components of the Hadoop ecosystem.
Pig – is a high-level data-flow scripting language that runs on Hadoop clusters.
Pig – uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.

Apache Cassandra – is an open-source, freely distributed, high-performance, extremely scalable, and fault-tolerant post-relational (NoSQL) database.
Apache Spark – is a powerful open-source processing engine, a general MapReduce-like engine used for large-scale data processing.