These instructions will work using the files at this commit:
git checkout 800b1f59edaa20a9b65f32a815605307e1102baa
First, download the small sample of the Stack Overflow data, which can be found here:
https://drive.google.com/open?id=0B0uip08Km2LPVTFTRFhrdHF2WW8
Put it in a directory at the project root called ./stackoverflow_dataset
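If you want to sanity-check the layout before moving on, something like this from the project root should work (the directory name comes from the step above; the file names inside it depend on what the Google Drive download contains):

```shell
# Create the expected directory at the project root if it doesn't exist yet,
# then list its contents to confirm the downloaded data ended up inside.
mkdir -p stackoverflow_dataset
ls stackoverflow_dataset
```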
Next, the following programs need to be installed on your system (Homebrew was used for easy installation on macOS):
Spark:
brew install apache-spark
Scala:
brew install scala
Maven:
brew install maven
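Before building, it can be worth confirming that the three installs are actually on your PATH. A small sketch (the tool names are assumed from the Homebrew packages above; spark-shell and spark-submit are provided by apache-spark):

```shell
# Report whether each required command-line tool is reachable on PATH.
missing=0
for tool in scala mvn spark-shell spark-submit; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT FOUND"
    missing=1
  fi
done
```

If anything reports NOT FOUND, revisit the corresponding brew install before continuing.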
To build and run the project locally, you need to set the versions in the pom.xml file to match those of the programs installed on your system. The lines marked with <<<< below need to be updated:
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version> <<<<
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId> <<<<
    <version>2.0.2</version> <<<<
  </dependency>
</dependencies>
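As an aside, a common Maven convention (not necessarily what this project's pom.xml uses) is to pull the two versions out into <properties>, so there is a single place to update when your local installs change:

```
<properties>
  <scala.version>2.11.8</scala.version>
  <spark.version>2.0.2</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```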
Running:
spark-shell
should print a banner reporting your versions, similar to this:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.2
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.
Having edited pom.xml, run the following from the root of the project to compile:
mvn clean package
This should run successfully (Maven will probably download and install a number of dependencies the first time you run it).
To run the compiled application:
cd target
spark-submit --class ClusterSOData.Main --master local KMeans-0.0.1.jar
That should run without errors and produce an output folder. Check that something has been generated by running:
cat output/part-00000
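A slightly more defensive version of that check, in case the job produced nothing (the output path and part file name are assumed from the spark-submit step above):

```shell
# Print a line count for the first part file if it exists,
# otherwise flag that nothing was generated at that path.
if [ -f output/part-00000 ]; then
  wc -l output/part-00000
else
  echo "output/part-00000 not found"
fi
```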