# AISPProperties
The core library uses a number of Java system properties to modify its behavior. Most of them never need to be changed, but in some cases (e.g., an OOM exception) they may need to be. These are referred to as AISP properties because, in addition to being set as system properties using -D options to the JVM, their values can be defined in a caa.properties file located in the directory named by AISP_HOME or in the current directory.
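
The example below shows the two standard ways to set these properties; the class name and the chosen values are illustrative only, and the caa.properties entries in the comments simply mirror the same property names.

```java
// Illustrative only: the class name and property values are examples.
// Equivalent JVM command-line form:
//   java -Dfeature.extraction.batched.threads=4 -Ddata.processor.cache.enabled=true ...
// Equivalent caa.properties entries (in AISP_HOME or the current directory):
//   feature.extraction.batched.threads=4
//   data.processor.cache.enabled=true
public class ConfigureAISP {
    public static void main(String[] args) {
        // Properties generally need to be set before the library code reads them.
        System.setProperty("feature.extraction.batched.threads", "4");
        System.setProperty("data.processor.cache.enabled", "true");
        // ... continue with normal use of the library ...
    }
}
```
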
- `data.processor.cache.enabled` - controls whether or not results produced by `AbstractCachingWindowProcessor` and `AbstractCachingMultiFeatureProcessor` and their subclasses (primarily feature extractors) are cached in a global memory-based cache. This is important, for example, for caching the results of individual feature computations. It should generally be left set to true to improve performance and avoid recomputing features. Default is true.
- `feature.extraction.batched.threads` - controls the number of threads used to compute the feature grams in a batch of data windows (see also `feature.iterable.batch_size`). This would normally be the number of available processors on the machine, but on machines with very large processor counts (e.g., Power systems with 160 processors) the JVM has encountered OOM exceptions. Feature gram extraction on a single window is already parallel, so parallelism within the batch is less important. If you are seeing OOM exceptions, you can either reduce this value (to 1 in the extreme) or use the JVM's -Xss option to reduce the per-thread stack size, as sketched below. Default is 8.
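
A minimal sketch of both mitigations, assuming the property is set before the library reads it; the thread count and stack size shown are example values, not recommendations.

```java
// Example OOM mitigation for batched feature extraction.
// Equivalent JVM command-line flags (values are examples only):
//   java -Dfeature.extraction.batched.threads=1 -Xss512k ...
System.setProperty("feature.extraction.batched.threads", "1"); // serialize the batch
// Note: -Xss (per-thread stack size) is a JVM option and can only be given
// on the command line; it cannot be set with System.setProperty().
```
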
- `feature.extraction.number_of_threads` - deprecated in favor of `feature.extraction.batched.threads`.
- `feature.fft.max_peak_to_noise_floor_ratio` - Specifies the maximum peak-to-noise-floor ratio of FFT power coefficients. Any coefficient small enough that the ratio of the peak value to that coefficient exceeds the maximum is rounded up to satisfy the maximum ratio (see the sketch below). This prevents the classifiers from capturing differences in the noise floor of different training samples, which could have an impact when samples use different bits per sample and different sampling rates. The value set must be the same for training and classification. Default is 1000000.
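
The clamping rule just described can be restated as a short sketch; this is only an illustration of the documented behavior, not the library's actual code.

```java
// Sketch of the documented clamping rule: coefficients whose peak/value ratio
// exceeds maxRatio are raised to peak / maxRatio.
static double[] clampToNoiseFloor(double[] powerCoefficients, double maxRatio) {
    double peak = 0;                              // power coefficients assumed non-negative
    for (double c : powerCoefficients)
        peak = Math.max(peak, c);
    double floor = peak / maxRatio;               // smallest allowed coefficient value
    double[] clamped = powerCoefficients.clone();
    for (int i = 0; i < clamped.length; i++)
        if (clamped[i] < floor)
            clamped[i] = floor;                   // round up to satisfy the maximum ratio
    return clamped;
}
```

With the default of 1000000, any coefficient more than a million times smaller than the peak is raised to one millionth of the peak.
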
- `feature.fft.max_sampling_rate` - Defines the default target sampling rate used in FFT computation when it is not defined by feature extractors using FFT (MFCC, MFFB, FFT, etc.). If not defined by the feature extractor or other caller, the `ExtendedFFT` class will resample, if needed, to this default rate. The value set must be the same for training and classification. Default is 44100.
- `feature.fft.pad_to_power_of_two` - effectively controls whether data on which FFTs are computed is padded to a power-of-2 length. When padding is enabled, the Apache Math library is used to compute the FFT. The value set must be the same for training and classification. Default is true.
- `feature.iterable.batch_size` - Controls the size of the batch of data windows on which feature grams are computed and processed. The whole batch of data windows is brought into memory during processing, so a larger batch size requires more memory (see the sketch below). Default is 8 * the number of cores.
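
A small sketch of what the default works out to and how it might be lowered on a memory-constrained machine; the override value is arbitrary and only for illustration.

```java
// The documented default is 8 * number of cores, e.g., 128 on a 16-core machine.
int defaultBatchSize = 8 * Runtime.getRuntime().availableProcessors();
// A smaller batch keeps fewer data windows in memory at once (example value only).
System.setProperty("feature.iterable.batch_size", "16");
```
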
- `feature.iterable.streaming.enabled` - Controls whether or not all features computed for a training request are hard-referenced, instead of soft-referenced via a cache where they might be evicted during training and require (automatic) recomputation. A value of true generally requires more heap memory during training than caching would, but should generally provide better performance without relying on a very large cache (i.e., heap). Default is false.
- `labeled.feature.gram.caching.enabled` - used to control whether or not memory and disk caching can be used during feature gram extraction (e.g., model training). Some models/classifiers (CNN, DCASE) provide constructors that enable control over caching. With the recent change to hard-referencing features during training this is largely unneeded, but it can still be useful in runtimes with very limited memory; in that case, set this property to true AND request caching in your classifier. Default is false.
- `labeled.feature.iterable.useSpark` - deprecated until Spark support is deemed useful.
- `classifiers.gmm.unknown_threshold_coefficient` - defines the `DEFAULT_UNKNOWN_THRESH_COEFF` used by the `GMMClassifier` when a value is not provided through a constructor. This defines the threshold used during classification as a lower bound on the average log density, below which data is classified as unknown by the model. Default is 0.
- `classifiers.cnn.batch_size` - Defines the default batch size for learning when it is not specified via a constructor. Default is 32.
- `classifiers.cnn.verbose` - if set to true, additional logging of the training progress of the DCASE and CNN models is written to standard output. Default is false.
- `classifiers.nn.max_list_size` - defines the default maximum number of features kept in the model when not specified in a constructor. Default is 1000.
- `classifiers.nn.stddev_factor` - defines the default multiplier on the standard deviation of feature distances to use when not specified in a constructor. This multiplier controls the identification of outliers. Default is 3.
- `classifiers.nn.reduction_ratio` - defines the default compression ratio when not specified in a constructor. It is used to merge features when the number of features exceeds the model's maximum list size. Default is 0.5. A sketch of how the three `classifiers.nn.*` properties relate follows below.
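
This is a hypothetical sketch of one plausible way the three `classifiers.nn.*` values could fit together; the classifier's actual nearest-neighbor logic is not documented here and may differ.

```java
// Hypothetical sketch only: the outlier rule and merge trigger below are assumptions,
// not the library's actual implementation. The constants mirror the documented defaults.
class NnTuningSketch {
    static final int MAX_LIST_SIZE = 1000;      // classifiers.nn.max_list_size
    static final double STDDEV_FACTOR = 3.0;    // classifiers.nn.stddev_factor
    static final double REDUCTION_RATIO = 0.5;  // classifiers.nn.reduction_ratio

    // Assumed outlier rule: a feature distance far above the mean is treated as an outlier.
    static boolean isOutlier(double distance, double meanDistance, double stddev) {
        return distance > meanDistance + STDDEV_FACTOR * stddev;
    }

    // Assumed merge trigger: once more than MAX_LIST_SIZE features are kept,
    // merge them down to roughly MAX_LIST_SIZE * REDUCTION_RATIO entries.
    static int targetSizeAfterMerge(int currentSize) {
        return currentSize > MAX_LIST_SIZE
                ? (int) (MAX_LIST_SIZE * REDUCTION_RATIO)
                : currentSize;
    }
}
```
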
- `org.eng.aisp.util.ClassFinder.classpath` - adds one or more class paths to the paths searched by the `ClassFinder` utility class. This class is used by the `JavaScriptFactories` class to find classes relevant to the JavaScript files defining models. In general it does not need to be set, but it does in cases where the `ClassLoader` automatically appends class paths internally, as is the case with the class loader used by Tomcat (see the example below). Default is null.
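
A hedged example of setting this property; the path values are made up, and joining multiple entries with the platform path separator is an assumption about the expected format.

```java
// Illustrative only: add extra locations for ClassFinder to search.
// The paths are examples; the use of File.pathSeparator to join them is an assumption.
System.setProperty("org.eng.aisp.util.ClassFinder.classpath",
        "/opt/myapp/classes" + java.io.File.pathSeparator + "/opt/myapp/lib/models.jar");
```
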
- `logging.enabled` - controls whether or not the `AISPLogger` outputs messages to standard output/error.
- `multikeycache.enabled` - controls whether or not the `AbstractMultiCache` and its subclasses actually do any caching. Caching might be disabled for inferencing-only systems that do not use more than a single model, but this property is used largely for testing the impact of caching and should in general be left enabled. Default is true.
- `httputil.timeout.seconds` - the network timeout, in seconds, used when the `HttpUtil` class makes HTTP requests. Default is 30.