tubemogul/puppet-druid

Table of Contents

  1. Overview
  2. Module Description
  3. Setup
  4. Usage
  5. Reference
  6. Limitations
  7. Development

Overview

Puppet module to manage Druid based on the Imply.io stack. This module manages all the Druid daemons and Pivot.

Some modifications will be implemented to support the Druid.io version in the near future.

Module Description

This module deploys the Imply.io tarball (see: http://imply.io/download) and lets you start the different Druid services as well as Pivot.

More information about the Imply.io bundle here: http://imply.io.

Setup

What druid affects

Files managed by this module:

  • Deploy the imply tarball using Archive: puppet-archive
  • Modify configuration in (by default): /opt/imply/conf
  • Manage all Druid and Pivot services: /etc/init.d/druid-*

If requested (see the install_java and install_nodejs parameters), the module will also deploy Java and NodeJS.

Setup Requirements

Usage

Deploy version 1.1.0 of the Imply bundle:

class { 'druid':
  imply_version => '1.1.0'
}

If you also want to install Java:

class { 'druid':
  install_java => true,
}

By default, the package 'openjdk-8-jdk' from the PPA 'ppa:openjdk-r/ppa' will be deployed. You can override this configuration with the java_package and java_ppa parameters.
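
A minimal sketch of overriding the Java defaults, using the java_ppa, java_package and java_home parameters documented in the Reference section. The PPA, package name and Java home shown here are illustrative values that are not taken from this module; substitute whatever your environment provides.

```puppet
# Sketch only: the three java_* values below are assumptions for illustration.
class { 'druid':
  install_java => true,
  # Alternative Launchpad PPA (Debian-like systems only).
  java_ppa     => 'ppa:webupd8team/java',
  # Package to install from that PPA.
  java_package => 'oracle-java8-installer',
  # Must match the directory the chosen package installs into.
  java_home    => '/usr/lib/jvm/java-8-oracle',
}
```

If you change java_package, check that java_home still points at the JVM the package actually installs, since its default is specific to openjdk-8-jdk on amd64.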

Configure a Master node:

class { 'druid': }
class { 'druid::coordinator': }
class { 'druid::overlord': }

Configure a Data node:

class { 'druid': }
class { 'druid::middle_manager': }
class { 'druid::historical': }

Configure a Query node:

class { 'druid': }
class { 'druid::broker': }
class { 'druid::pivot': }

By default, the class druid::pivot will not deploy NodeJS. You can use another Puppet module to deploy it before starting Pivot, or set the install_nodejs parameter to true:

class { 'druid::pivot':
  install_nodejs => true,
}

Here is an example with MySQL as the metadata storage and a StatsD emitter for the performance metrics:

class { 'druid':
  java_classpath_extensions => [
    'io/druid/extensions/mysql-metadata-storage/0.8.2/mysql-metadata-storage-0.8.2.jar',
    'mysql/mysql-connector-java/5.1.34/mysql-connector-java-5.1.34.jar'
  ],
  common_config => {
    'extensions'          => {
      'localRepository' => 'dist/druid/extensions-repobla',
      'coordinates'     => [],
    },
    'metadata' => {
      'storage' => {
        'type'      => 'mysql',
        'connector' =>  {
          'connectURI' => 'jdbc:mysql://db.example.com:3306/druid',
          'user'       => 'foo',
          'password'   => 'bar',
        }
      },
    },
    'emitter' => 'statsd',
    'emitter.statsd.hostname' => 'localhost',
    'emitter.statsd.port'     => 8125,
  }
}

Deploy the coordinator with some specific configuration:

class { 'druid::coordinator':
  config => {
    'coordinator' => {
      'period'                => 'PT30S',
      'period.indexingPeriod' => 'PT900S',
    }
  }
}

Logstash:

This module can optionally add a second log4j2 appender that writes to a json_lines-enabled Logstash TCP socket (using https://github.com/DNSBelgium/log4j-jsonevent-layout):

class { 'druid':
  logstash_server      => 'log-endpoint.server.rocks',
  logstash_port        => 4561,
  logstash_user_fields => "servertype:druid, ip:${::ipaddress}",
}

When using the Logstash output, it is recommended to add -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector to your JVM options to avoid blocking.
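
As a sketch of where that flag could go, the node classes accept a java_opts array (see the Reference section). It is an assumption here that java_opts is passed straight to the daemon's JVM command line; the broker is used only as an example, and each node class would need the same option.

```puppet
# Sketch only: assumes java_opts entries reach the JVM command line unchanged.
class { 'druid::broker':
  java_opts => [
    # Switch log4j2 to asynchronous loggers so a slow Logstash socket
    # does not block the Druid process.
    '-DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector',
  ],
}
```

Log4j 2 asynchronous loggers need the LMAX Disruptor library on the classpath; whether the Imply bundle already ships it is not stated in this README, so verify it before relying on this setting.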

Reference

Public Classes

Class: druid

Fetch and deploy the Imply.io tarball.

This class will also deploy the Druid common configuration. See: Configuring Druid

Parameters within druid:

imply_version

Version of the Imply.io tarball to deploy. See: http://imply.io/download

Default: 1.2.1

install_method

Define the installation method. For now, only tarball is supported.

Default: tarball

install_dir

Where to deploy the tarball

Default: /opt

install_link

Name of the destination link

Default: imply

install_java

If true, the module will try to install Java. This parameter is used with java_ppa and java_package.

For the moment, only Debian-like distributions are supported.

Default: false

java_ppa

Define the name of the Ubuntu PPA which will be used to deploy Java

Requirement: $osfamily == Debian

Default: ppa:openjdk-r/ppa

java_package

Package name of Java

Default: openjdk-8-jdk

java_home

Java Home directory

Default: /usr/lib/jvm/java-8-openjdk-amd64

config_dir

Druid configuration directory

Default: /opt/imply/conf/druid

dist_dir

Druid distribution directory

Default: /opt/imply/dist/druid

user

Druid username

Default: druid

group

Druid group name

Default: druid

enable_service

If true, the module will start the Druid services and restart them when configuration changes are applied

Default: true

java_classpath

Define where Druid will find all the JAR files

Default: /opt/imply/dist/druid/lib/*

java_classpath_extensions

Define the list of Java extensions to load when the Druid services start

Example (Hiera), if you want to use MySQL as your metadata storage:

druid::java_classpath_extensions:
  - 'io/druid/extensions/mysql-metadata-storage/0.8.2/mysql-metadata-storage-0.8.2.jar'
  - 'mysql/mysql-connector-java/5.1.34/mysql-connector-java-5.1.34.jar'

Default: []

log_dir

Log directory

Default: /var/log/druid

common_config

Hash defining the Druid Common configuration

See: http://druid.io/docs/latest/configuration/index.html

Default: {}

logstash_server

Hostname or IP of a logstash server listening for json_lines via TCP. Enables the appender when defined.

Default: undef

logstash_port

Port for the above server.

Default: 4561

logstash_user_fields

String of key:value pairs separated by commas. Allows one to define custom fields in the json being sent to logstash. Ex: "hostname:druidbox01, region:us-east-1"

Default: ''
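
The path-related parameters above all default to locations under /opt/imply. A minimal sketch of relocating the installation follows; it is an assumption that config_dir, dist_dir and java_classpath are not derived automatically from install_dir and install_link, so they are set explicitly. The /srv prefix is an illustrative value.

```puppet
# Sketch only: /srv is a hypothetical location; dependent paths are set by hand
# on the assumption that the module does not recompute them from install_dir.
class { 'druid':
  install_dir    => '/srv',
  install_link   => 'imply',
  config_dir     => '/srv/imply/conf/druid',
  dist_dir       => '/srv/imply/dist/druid',
  java_classpath => '/srv/imply/dist/druid/lib/*',
  log_dir        => '/var/log/druid',
}
```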

Class: druid::coordinator, druid::overlord, druid::historical, druid::middle_manager, druid::broker

Each Druid Node (See http://druid.io/docs/latest/design/design.html) has its own Puppet class.

Each of these classes uses the druid::node defined type to define the configuration and the daemon to start.

Parameters within druid::coordinator, druid::overlord, druid::historical, druid::middle_manager, druid::broker:

service

Name of the Druid Node

Default: name of the class. For druid::coordinator, $service == 'coordinator'

host

Listening host of the Druid Node

Default: localhost

port

Listening port of the Druid Node

Default: 8083

java_opts

Java options for the Java daemon

Default: []

config

Hash defining the configuration of the Druid Node

Default: {}
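
The parameters above can be combined in a single declaration. The following sketch configures a coordinator; the port and heap sizes are illustrative assumptions (8081 is the port Druid conventionally uses for the coordinator), while the config hash reuses the keys from the Usage section.

```puppet
# Sketch only: host, port and heap sizes are example values.
class { 'druid::coordinator':
  # Address the node listens on (default is localhost).
  host      => $::fqdn,
  port      => 8081,
  # Extra JVM options for this daemon.
  java_opts => ['-Xms1g', '-Xmx1g'],
  # Node-specific Druid settings.
  config    => {
    'coordinator' => {
      'period' => 'PT30S',
    },
  },
}
```

Because the documented default port is the same for every node class, set port explicitly when several Druid services run on one host.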

Class: druid::pivot

This class deploys and configures Pivot.

Parameters within druid::pivot:

home_dir

String setting the home directory for the imply-ui distribution

Default: /opt/imply/dist/imply-ui

config_dir

String setting the configuration directory of the imply-ui distribution

Default: /opt/imply/conf/pivot

state_store

Hash defining the configuration of the state storage options for the imply-ui

Default: {}

port

Port of Pivot

Default: 9095

broker_host

Broker host used by Pivot

Default: localhost:8082

enable_stdout_log

Print logs to stdout

Default: true

enable_file_log

Enable file logging

Default: true

log_dir

Location for Pivot log files

Default: /var/log/pivot

pivot_license_source

Location for Pivot license source

Default: undef

max_workers

Max number of worker processes

Default: 0

use_segment_metadata

If true, use a segment metadata query instead of a GET request to /druid/v2/datasources to determine datasource dimensions and metrics.

Default: false

source_list_refresh_interval

Check for new dataSources periodically. Set to 0 to disable background introspection

Default: 0

source_list_refresh_onload

Checks for new dataSources every time Pivot is loaded

Default: false

install_nodejs

If true, the module will install NodeJS

Default: false

nodejs_version

Version of NodeJS to install

Default: latest
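
A short sketch combining several of the druid::pivot parameters above. The broker hostname is a hypothetical value; all parameter names come from this section.

```puppet
# Sketch only: broker.example.com is a placeholder hostname.
class { 'druid::pivot':
  port                       => 9095,
  # host:port of the Druid broker that Pivot queries.
  broker_host                => 'broker.example.com:8082',
  # Let this module install NodeJS instead of managing it elsewhere.
  install_nodejs             => true,
  # Discover dimensions and metrics with a segment metadata query.
  use_segment_metadata       => true,
  # Look for new dataSources each time Pivot is loaded.
  source_list_refresh_onload => true,
}
```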

Limitations

This module has only been tested with Ubuntu 14.04 and Puppet 3.8.x but should work with any other Linux distribution.

Because the module uses a Launchpad PPA unless java_ppa is set to undef, you will have to change the default value if you are not on a Debian-like OS.

Development

See CONTRIBUTING.md