Vagrant-PySpark is a Vagrant box that can be provisioned with any Spark version, ready to run Spark jobs (including PySpark) and PySpark unit tests.
It is intended to be used only for development and testing with small data sets.
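As an illustration, a job you could run inside the box might look like the following minimal sketch. It assumes a Spark 2.x setup (for 1.6.x you would use SQLContext instead of SparkSession); the file name and data are illustrative only.

```python
# my_job.py -- illustrative example, assuming a Spark 2.x box
from pyspark.sql import SparkSession

# A local master is enough for the small data sets this box targets.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("vagrant-pyspark-example")
         .getOrCreate())

# Tiny in-memory DataFrame instead of a real data source.
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
df.groupBy("key").sum("value").show()

spark.stop()
```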
To start and provision the Vagrant box, you must create a file (ansible/variables.yml) with the required variables:
- Scala version
- Spark version
- Hadoop version
Versions must match the ones available here:
- For Scala: https://www.scala-lang.org/download/
- For Spark and Hadoop: http://spark.apache.org/downloads.html
The variables file should contain the following variables:
scala:
  version: 2.11.8
spark:
  version: 2.1.0
hadoop:
  version: 2.7
You can find example files for Spark 1.6.3 and 2.1.0 in the vars folder of this repo.
You can create a symbolic link to use them:
ln -s vars/vars_spark_2.1.0.yml ansible/variables.yml
If you use other versions, PRs with your version setup are welcome.
Set up the Vagrant box and clone your projects inside to run your jobs and tests.
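For the tests, a pytest-style suite could look like the following minimal sketch. It assumes pytest is available inside the box and a Spark 2.x setup; the file and function names are illustrative only.

```python
# test_my_job.py -- illustrative test layout, assuming pytest and Spark 2.x
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # One local session shared by the whole test run keeps the suite fast.
    session = (SparkSession.builder
               .master("local[2]")
               .appName("pyspark-unit-tests")
               .getOrCreate())
    yield session
    session.stop()


def test_sum_by_key(spark):
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])
    result = {row["key"]: row["sum(value)"]
              for row in df.groupBy("key").sum("value").collect()}
    assert result == {"a": 3, "b": 3}
```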
You can fork this repo and extend the Vagrantfile to sync your project folder into the Vagrant box. This makes all your changes immediately available to run in the Vagrant box.
# In the Vagrantfile, inside the Vagrant.configure block:
config.vm.synced_folder "/Project/path/in/host/machine", "/Destination/in/vagrant/box"
Alternatively, you can copy this project inside your Spark project to keep everything together.
You can find a good explanation and examples here