This repository contains scripts and documents for installing various big data components.
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Apache Cassandra installation steps
Download the document here
To set up Apache Cassandra in cluster mode, see the documentation
Download the document here
See the official Apache Cassandra documentation
http://cassandra.apache.org/doc/latest/
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Apache Hadoop installation steps
Apache Hadoop multinode setup document
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. An easy to use, powerful, and reliable system to process and distribute data.
Apache NiFi installation steps
Apache NiFi conversion procedures and steps
https://github.com/PradeepTammali/Bigdata-installations/tree/master/Apache%20NiFi
Invoking Apache NiFi processors and groups from curl commands
https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20NiFi/nifi_curl_commands
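To give a feel for what such a call looks like, here is a minimal sketch that builds a request for starting or stopping all components of a process group through the NiFi REST API. It is not taken from `nifi_curl_commands`. The function names `pg_state_body` and `pg_state_cmd`, the `NIFI_URL` variable, and the default base URL (an unsecured local instance on port 8080) are assumptions made for illustration. The sketch only prints the curl command, so it can be inspected before anything is sent.

```shell
# Build the JSON body that asks NiFi to start or stop the components
# of a process group.
# Usage: pg_state_body GROUP_ID RUNNING|STOPPED
pg_state_body() {
    printf '{"id":"%s","state":"%s"}' "$1" "$2"
}

# Print (do not run) the curl command that would apply the state.
# NIFI_URL is an assumed base URL; adjust it for your installation.
pg_state_cmd() {
    nifi_url=${NIFI_URL:-http://localhost:8080/nifi-api}
    printf "curl -X PUT -H 'Content-Type: application/json' -d '%s' %s/flow/process-groups/%s\n" \
        "$(pg_state_body "$1" "$2")" "$nifi_url" "$1"
}
```

For example, `pg_state_cmd <group-id> RUNNING` prints a command that can be reviewed and then pasted into a terminal. A secured NiFi instance would additionally need authentication, which this sketch leaves out.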
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Apache Spark installation steps
Apache Spark Multinode cluster setup
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these services are difficult to implement, applications usually skimp on them at first, which makes the applications brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Apache ZooKeeper installation script
Apache ZooKeeper daemon script
https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Zookeeper/zookeeper.sh
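Before a ZooKeeper server can be started as a daemon, it needs a `zoo.cfg`. The sketch below generates a minimal one for an ensemble. It is an illustration only and is not taken from `zookeeper.sh`. The function name `make_zoo_cfg`, the data directory, and the host names are hypothetical, and the timing values are common starting points rather than recommendations.

```shell
# Print a minimal zoo.cfg for a ZooKeeper ensemble.
# Usage: make_zoo_cfg DATA_DIR HOST1 [HOST2 ...]
# DATA_DIR and the host names are placeholders, not values from this repository.
make_zoo_cfg() {
    data_dir=$1
    shift
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=$data_dir"
    echo "clientPort=2181"
    id=1
    for host in "$@"; do
        # 2888 is the peer port, 3888 the leader-election port
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
}
```

Running `make_zoo_cfg /var/lib/zookeeper zk1 zk2 zk3 > zoo.cfg` would produce a three-server configuration. Each server also needs a `myid` file in its data directory containing its own number from the `server.N` lines.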
Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Apache Kafka installation script
Apache Kafka daemon script
https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Kafka/kafka.sh
Apache Kafka cluster setup steps
https://github.com/PradeepTammali/Bigdata-installations/blob/master/Apache%20Kafka/kafka_cluster.txt
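In a multi-broker cluster, each broker mainly differs in a handful of `server.properties` settings. As a rough sketch, assuming a ZooKeeper-based Kafka deployment, the function below prints those per-broker settings. It is not taken from `kafka_cluster.txt`. The function name `make_broker_props`, the host names, and the log directory are hypothetical.

```shell
# Print the broker-specific part of server.properties for one Kafka broker.
# Usage: make_broker_props BROKER_ID HOST ZK_CONNECT
# Assumes a ZooKeeper-based cluster; the log directory is a placeholder.
make_broker_props() {
    broker_id=$1
    host=$2
    zk_connect=$3
    # broker.id must be unique for every broker in the cluster
    echo "broker.id=$broker_id"
    echo "listeners=PLAINTEXT://$host:9092"
    echo "log.dirs=/tmp/kafka-logs-$broker_id"
    # all brokers point at the same ZooKeeper ensemble
    echo "zookeeper.connect=$zk_connect"
}
```

For instance, `make_broker_props 2 kafka2 zk1:2181,zk2:2181` prints the settings for a second broker. The output would be merged into that broker's own `server.properties` before starting it.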
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
Apache Hive installation steps
https://github.com/PradeepTammali/Bigdata-installations/blob/master/HiveInstallation.sh
MongoDB is a cross-platform, document-oriented database program. Classified as a NoSQL database, MongoDB stores JSON-like documents with optional schemas.
MongoDB installation steps
https://github.com/PradeepTammali/Bigdata-installations/blob/master/MongodbInstallation.sh
Scala is a general-purpose programming language providing support for functional programming and a strong static type system. It is designed to be concise, and many of its design decisions aim to address criticisms of Java.
Scala installation steps
https://github.com/PradeepTammali/Bigdata-installations/blob/master/ScalaInstallation.sh
Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.
Java installation script
https://github.com/PradeepTammali/Bigdata-installations/blob/master/JavaInstallation.sh