


ideepu/Bigdata-installations


Bigdata-installations

This repository has scripts and documents for installing different big data components.

Apache Cassandra

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
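Cassandra's linear scalability comes from spreading partitions over a token ring. The sketch below is a simplified model that assumes a single datacenter, one token per node, and SimpleStrategy-style placement. A partition key is hashed onto the ring, and the first `replication_factor` nodes clockwise from that position hold the replicas. Real Cassandra defaults to the Murmur3 partitioner and uses virtual nodes. The MD5 hash and the `TokenRing` class here are illustrative assumptions, not Cassandra's API.

```python
import bisect
import hashlib
from typing import List


def token_for(key: str) -> int:
    """Map a key onto the ring (MD5 stands in for Cassandra's Murmur3 hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Hypothetical helper: one token per node, SimpleStrategy-style placement."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((token_for(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, partition_key: str) -> List[str]:
        """Return the nodes that store the given partition key."""
        # The first node whose token is >= the key's token owns the key.
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        # Walk clockwise from the owner, wrapping around the end of the ring.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node1", "node2", "node3", "node4"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes
```

When a node joins, only the token range next to its token changes owners. Adding capacity therefore does not reshuffle the whole dataset.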

Apache Cassandra installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Cassandra/CassandraInstallation


To set up Apache Cassandra in cluster mode, see the documentation:

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Cassandra/Cassandra%20Cluster%20setup


See the official Apache Cassandra documentation:

http://cassandra.apache.org/doc/latest/

Apache Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
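Hadoop's classic programming model is MapReduce. The sketch below runs the canonical word-count job in a single process so that the three phases are visible: map emits `(word, 1)` pairs, shuffle groups the pairs by key, and reduce sums each group. The function names are hypothetical. On a real cluster, Hadoop runs the map and reduce tasks in parallel across nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Sequential here; Hadoop would run many mappers and reducers in parallel.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick fox", "The lazy dog"]))
# → {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```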

Apache Hadoop installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Hadoop/HadoopInstallation.sh

Apache Hadoop multinode setup document

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Hadoop/Hadoop_multinode_cluster_installation

Apache NiFi

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. It is an easy-to-use, powerful, and reliable system for processing and distributing data.
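To make the "directed graph" idea concrete, here is a toy model. It is not NiFi's API, which is Java-based and configured through the UI or REST. In this model, processors are nodes, and each processor routes a FlowFile to a named relationship. Connections map `(processor, relationship)` to the next processor. The sketch assumes an acyclic graph. Unlike NiFi, where an unconnected relationship must be auto-terminated, it collects unrouted FlowFiles so you can inspect them. The processors `route_json` and `uppercase` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Toy FlowFile: (attributes, content). Real NiFi FlowFiles are Java objects.
FlowFile = Tuple[Dict[str, str], str]
# A toy processor returns the relationship to route on plus the resulting FlowFile.
Processor = Callable[[FlowFile], Tuple[str, FlowFile]]


def run_flow(processors: Dict[str, Processor],
             connections: Dict[Tuple[str, str], str],
             start: str,
             flowfiles: List[FlowFile]) -> Dict[str, List[FlowFile]]:
    """Push FlowFiles through an acyclic graph of processors.

    `connections` maps (processor, relationship) to the next processor. A
    relationship with no outgoing connection ends the path, and the FlowFile
    is collected under "<processor>.<relationship>".
    """
    results: Dict[str, List[FlowFile]] = {}
    queue = deque((start, ff) for ff in flowfiles)
    while queue:
        name, flowfile = queue.popleft()
        relationship, flowfile = processors[name](flowfile)
        target = connections.get((name, relationship))
        if target is None:
            results.setdefault(f"{name}.{relationship}", []).append(flowfile)
        else:
            queue.append((target, flowfile))
    return results


# Hypothetical processors for the usage example.
def route_json(flowfile: FlowFile) -> Tuple[str, FlowFile]:
    _, content = flowfile
    return ("json" if content.lstrip().startswith("{") else "unmatched"), flowfile


def uppercase(flowfile: FlowFile) -> Tuple[str, FlowFile]:
    attributes, content = flowfile
    return "success", ({**attributes, "transformed": "true"}, content.upper())


out = run_flow(
    processors={"route_json": route_json, "uppercase": uppercase},
    connections={("route_json", "json"): "uppercase"},
    start="route_json",
    flowfiles=[({"filename": "a.json"}, '{"id": 1}'), ({"filename": "b.txt"}, "plain text")],
)
```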

Apache NiFi installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20NiFi/nifi%20installation.sh

Apache NiFi conversion procedures and steps

https://github.com/PradeepTammali/Bigdata-installations/tree/master/Apache%20NiFi

Invoking Apache NiFi processors and groups from curl commands

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20NiFi/nifi_curl_commands

Apache Spark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
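Spark builds its execution graph lazily. Transformations such as `map` and `filter` only record a step, and an action such as `collect` triggers the computation. The toy class below is an assumption-laden stand-in, not PySpark. It models the simplest case, a linear chain of narrow transformations that are pipelined element by element. In PySpark, the equivalent would be `sc.parallelize(range(6)).map(...).filter(...).collect()`.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class LazyDataset:
    """Toy stand-in for a Spark RDD: transformations are recorded, not run."""

    def __init__(self, source: Iterable[Any],
                 steps: Optional[List[Tuple[str, Callable[[Any], Any]]]] = None) -> None:
        self.source = list(source)
        self.steps = steps if steps is not None else []

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self.source, self.steps + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self.source, self.steps + [("filter", pred)])

    def collect(self) -> List[Any]:
        # Action: only now is the recorded chain executed, one element at a time.
        results = []
        for item in self.source:
            keep = True
            for kind, fn in self.steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected the element
                    keep = False
                    break
            if keep:
                results.append(item)
        return results


calls = []


def square(x: int) -> int:
    calls.append(x)  # record each evaluation so laziness is observable
    return x * x


evens = LazyDataset(range(6)).map(square).filter(lambda x: x % 2 == 0)
print(len(calls))       # 0: the map step has only been recorded
print(evens.collect())  # [0, 4, 16]
print(len(calls))       # 6
```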

Apache Spark installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Spark/SparkInstallation.sh

Apache Spark Multinode cluster setup

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Spark/spark_multinode_cluster_installation

Apache Zookeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these services are difficult to implement, applications often skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
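As an example of the distributed synchronization ZooKeeper provides, consider its documented lock recipe. Each client creates an ephemeral sequential znode under a lock path, and the client with the lowest sequence number holds the lock. Every other client watches only the znode just before its own, which avoids a "herd effect" when the lock is released. The in-memory `FakeLockNode` below is a hypothetical simulation of that ordering logic, not a ZooKeeper client. In practice, a library such as Apache Curator (Java) or kazoo (Python) implements the recipe against a real ensemble.

```python
from typing import Dict, List, Optional


class FakeLockNode:
    """In-memory stand-in for a lock parent znode such as /locks/my-lock."""

    def __init__(self) -> None:
        self.counter = 0
        self.children: Dict[str, str] = {}  # znode name -> owning client

    def create_ephemeral_sequential(self, client: str) -> str:
        # ZooKeeper appends a zero-padded, 10-digit, increasing counter.
        name = f"lock-{self.counter:010d}"
        self.counter += 1
        self.children[name] = client
        return name

    def delete(self, name: str) -> None:
        # Also what happens to an ephemeral znode when its session ends.
        self.children.pop(name, None)

    def holder(self) -> Optional[str]:
        """The client whose znode has the lowest sequence number owns the lock."""
        if not self.children:
            return None
        return self.children[min(self.children)]

    def node_to_watch(self, name: str) -> Optional[str]:
        """A waiter watches only its immediate predecessor (None means it holds the lock)."""
        ordered: List[str] = sorted(self.children)
        index = ordered.index(name)
        return ordered[index - 1] if index > 0 else None
```

Usage: three clients queue for the lock, and releasing the first hands it to the next in sequence.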

Apache Zookeeper installation script

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Zookeeper/zookeeper-installation.sh

Apache Zookeeper daemon script

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Zookeeper/zookeeper.sh

Apache Kafka

Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
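Kafka scales horizontally by splitting each topic into partitions. Each partition is an append-only log in which every record receives a sequential offset, and consumers read from an offset of their choosing. The toy model below illustrates that structure. Records with the same key always land in the same partition, which preserves their relative order. Several details are simplifying assumptions: MD5 stands in for the murmur2 hash used by Kafka's default partitioner, and keyless records go round-robin, whereas recent Kafka producers use a sticky partitioner. The `ToyTopic` class is hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

Record = Tuple[Optional[str], str]  # (key, value)


class ToyTopic:
    """In-memory model of a Kafka topic: N append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor for keyless records (simplification)

    def partition_for(self, key: Optional[str]) -> int:
        if key is None:
            partition = self._next % len(self.partitions)
            self._next += 1
            return partition
        # Same key -> same partition, so per-key ordering is preserved.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def produce(self, key: Optional[str], value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int) -> List[Record]:
        """Read every record from `offset` onward; reading does not remove records."""
        return self.partitions[partition][offset:]
```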

Apache Kafka installation script

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Kafka/kafka-installation.sh

Apache Kafka daemon script

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Kafka/kafka.sh

Apache Kafka cluster setup steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Kafka/kafka_cluster.txt
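In a ZooKeeper-based Kafka cluster (as opposed to KRaft mode), each broker needs a unique `broker.id`, and all brokers point `zookeeper.connect` at the same ensemble. The helper below is a hypothetical sketch that renders a minimal `server.properties` per broker. The host names and the log directory are placeholders, and a production cluster would need further settings, such as replication factors, that are not shown here.

```python
from typing import Dict, List


def broker_properties(hosts: List[str], zookeeper_hosts: List[str],
                      log_dir: str = "/var/lib/kafka/logs") -> Dict[str, str]:
    """Render a minimal server.properties for each broker (hypothetical helper).

    Only broker.id and the advertised host differ between brokers; every
    broker shares the same ZooKeeper connection string.
    """
    zookeeper_connect = ",".join(f"{zk}:2181" for zk in zookeeper_hosts)
    configs: Dict[str, str] = {}
    for broker_id, host in enumerate(hosts):
        lines = [
            f"broker.id={broker_id}",
            f"listeners=PLAINTEXT://{host}:9092",
            f"log.dirs={log_dir}",
            f"zookeeper.connect={zookeeper_connect}",
        ]
        configs[host] = "\n".join(lines) + "\n"
    return configs


for host, text in broker_properties(["kafka1", "kafka2"], ["zk1", "zk2", "zk3"]).items():
    print(f"# {host}\n{text}")
```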

Apache Hive

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
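"Structure can be projected onto data already in storage" describes schema-on-read: the files stay as they are, and a table definition is applied only when the data is queried. In Hive, this is typically declared with `CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED ... LOCATION ...` pointing at the existing files. The sketch below is a hypothetical Python illustration of that idea, not Hive code. It assumes Hive's default text delimiter, Ctrl-A (`\x01`). Missing fields and values that fail to convert become `None`, similar to the `NULL`s Hive returns instead of rejecting the row.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hive's default field delimiter for text tables is Ctrl-A ('\x01').
FIELD_DELIMITER = "\x01"


def read_with_schema(lines: List[str],
                     schema: List[Tuple[str, Callable[[str], Any]]],
                     delimiter: str = FIELD_DELIMITER) -> List[Dict[str, Optional[Any]]]:
    """Apply a column schema to raw text at read time (hypothetical helper).

    Extra trailing fields are ignored; missing or unconvertible values become None.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        row: Dict[str, Optional[Any]] = {}
        for i, (column, cast) in enumerate(schema):
            try:
                row[column] = cast(fields[i])
            except (IndexError, ValueError):
                row[column] = None
        rows.append(row)
    return rows


raw = ["1\x01alice\x0130", "2\x01bob\x01unknown", "3\x01carol"]
people = read_with_schema(raw, [("id", int), ("name", str), ("age", int)])
```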

Apache Hive installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/HiveInstallation.sh

MongoDB

MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schemata.
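In MongoDB, queries are themselves JSON-like documents. With PyMongo, you would write `collection.find({"age": {"$gt": 30}})`. To show how such a query document relates to the stored documents, the toy matcher below implements a tiny, hypothetical subset of the semantics: equality, `$eq`, `$gt`, `$lt`, `$in`, and dotted paths into nested documents. It does not implement MongoDB's array matching or its cross-type comparison order. In this sketch, incomparable types simply do not match.

```python
from typing import Any, Callable, Dict, List

# Comparison operators supported by this toy matcher (a small subset of MongoDB's).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value > arg,
    "$lt": lambda value, arg: value < arg,
    "$in": lambda value, arg: value in arg,
}


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'address.city' inside nested documents."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def _compare(op: str, value: Any, arg: Any) -> bool:
    try:
        return OPERATORS[op](value, arg)
    except TypeError:
        return False  # e.g. comparing a string with a number


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict):
            if not all(_compare(op, value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
people = [
    {"name": "Ana", "age": 34, "address": {"city": "Oslo"}},
    {"name": "Ben", "age": 28, "tags": ["admin"]},
    {"name": "Cy", "age": "unknown"},
]
```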

MongoDB installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/MongodbInstallation.sh

Scala

Scala is a general-purpose programming language providing support for functional programming and a strong static type system. It is designed to be concise, and many of its design decisions aimed to address criticisms of Java.

Scala installation steps

https://github.com/PradeepTammali/Bigdata-installations/blob/master/ScalaInstallation.sh

Java

Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.

Java installation script

https://github.com/PradeepTammali/Bigdata-installations/blob/master/JavaInstallation.sh
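All of the components above run on the JVM, and the supported Java version depends on the component release, so check each project's documentation before installing. The hypothetical helper below reports the major version of the `java` on the `PATH`. It relies on two facts: `java -version` prints to stderr, and the version string uses either the legacy `1.x` scheme (`"1.8.0_292"` is Java 8) or the newer scheme (`"11.0.11"`, `"17"`).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output (hypothetical helper)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy numbering: "1.8" means Java 8
    return major


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` (which writes to stderr) and parse the result."""
    if shutil.which("java") is None:
        return None
    completed = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major_version(completed.stderr)


if __name__ == "__main__":
    print(installed_java_major_version())  # None if no java is on the PATH
```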
