Skip to content

API Docs - v1.0.15

Streamingml

AMRulesRegressor (Stream Processor)

This extension performs regression tasks using the AMRulesRegressor algorithm.

Syntax

streamingml:AMRulesRegressor(<STRING> model.name, <INT|FLOAT|FLOAT|DOUBLE> model.feature)

QUERY PARAMETERS

Name Description Default Value Possible Data Types Optional Dynamic
model.name The name of the model to be used for prediction. STRING No No
model.feature The feature vector for the regression analysis. INT
FLOAT
FLOAT
DOUBLE
No No
Extra Return Attributes
Name Description Possible Types
prediction The predicted value. DOUBLE
meanSquaredError The MeanSquaredError of the predicting model. DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);

from StreamA#streamingml:AMRulesRegressor('model1',  attribute_0, attribute_1, attribute_2, attribute_3) 
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, meanSquaredError insert into OutputStream;

This query uses an AMRules model named model1 that is used to predict the value for the feature vector represented by attribute_0, attribute_1, attribute_2, and attribute_3. The predicted value along with the MeanSquaredError and the feature vector are output to a stream named OutputStream. The resulting definition of the OutputStream stream is as follows:
(attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, prediction double, meanSquaredError double).

clusTree (Stream Processor)

This extension performs clustering on a streaming data set. Initially a micro cluster model is generated using the ClusTree algorithm, and weighted k-means is periodically applied to micro clusters to generate a macro cluster model with the required number of clusters. Data points can be of any dimensionality, but the dimensionality should be constant throughout the stream. Euclidean distance is taken as the distance metric.

Syntax

streamingml:clusTree(<INT> no.of.clusters, <INT> max.iterations, <INT> no.of.events.to.refresh.macro.model, <INT> max.height.of.tree, <INT> horizon, <DOUBLE|FLOAT|INT|LONG> model.features)

QUERY PARAMETERS

Name Description Default Value Possible Data Types Optional Dynamic
no.of.clusters The assumed number of natural clusters (numberOfClusters) in the data set. INT No No
max.iterations The number of times the process should be iterated. The process iterates until the number specified for this parameter is reached, or until iterating the process does not result in a change in the centroids. 40 INT Yes No
no.of.events.to.refresh.macro.model The number of new events that should arrive in order to recalculate the k-means macro cluster centers. 100 INT Yes No
max.height.of.tree This defines the maximum number of levels that should exist in the ClusTree. The maximum number of levels is calculated as 3^<VALUE_SPECIFIED> (e.g., If 10 is specified, there can be a maximum of 3^10 micro clusters in the micro cluster). It is recommended to set the value within the 5-8 range because a lot of micro-clusters can consume a lot of memory, and as a result, creating the macro cluster model will take longer. 8 INT Yes No
horizon This controls the decay of weights of old micro-clusters to manage the concept drift. If horizon is set as 1000, then a micro cluster that has not been recently updated loses its weight by half after 1000 events. 1000 INT Yes No
model.features This is a variable length argument. Depending on the dimensionality of data points, you receive coordinates as features along each axis. DOUBLE
FLOAT
INT
LONG
No No
Extra Return Attributes
Name Description Possible Types
euclideanDistanceToClosestCentroid This represents the Euclidean distance between the current data point and the closest centroid. DOUBLE
closestCentroidCoordinate This is a variable length attribute. Depending on the dimensionality(d) closestCentroidCoordinate1 is returned to closestCentroidCoordinated that are the d `dimensional coordinates of the closest centroid from the model to the current event. This is the prediction result, and this represents the cluster towhich the current event belongs. DOUBLE

Examples EXAMPLE 1

@App:name('ClusTreeTestSiddhiApp') 
define stream InputStream (x double, y double);
@info(name = 'query1') 
from InputStream#streamingml:clusTree(2, 10, 20, 5, 50, x, y) 
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y 
insert into OutputStream;

This query creates a Siddhi application named ClusTreeTestSiddhiApp, and it accepts 2D inputs of doubles. The query named query1 creates a ClusTree model. It also creates a k-means model after the first 20 events, and refreshes it after every 20 events. Two macro clusters are created, and the process is not iterated more than 10 times. The maximum height of tree is set to 5, and therefore, a maximum of 3^5 micro clusters are generated from the Clus Tree. The horizon is set to 50, and therefore, the weight of each micro cluster that is not updated reduces by half after every 50 events.

EXAMPLE 2

@App:name('ClusTreeTestSiddhiApp') 
define stream InputStream (x double, y double);
@info(name = 'query1') 
from InputStream#streamingml:ClusTree(2, x, y) 
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y 
insert into OutputStream;

This query does not include hyper parameters. Therefore, the default values mentioned above are applied. This mode of querying is recommended if you are not familier with ClusTree/KMeans algorithms.

hoeffdingTreeClassifier (Stream Processor)

This extension performs classification using the Hoeffding Adaptive Tree algorithm for evolving data streams that use ADWIN to replace branches with new ones.

Syntax

streamingml:hoeffdingTreeClassifier(<STRING> model.name)

QUERY PARAMETERS

Name Description Default Value Possible Data Types Optional Dynamic
model.name The name of the model to be used for prediction. STRING No No
Extra Return Attributes
Name Description Possible Types
prediction The predicted class label. STRING
confidenceLevel The probability of the prediction. DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);

from StreamA#streamingml:hoeffdingTreeClassifier('model1',  attribute_0, attribute_1, attribute_2, attribute_3) 
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, predictionConfidence insert into OutputStream;

This query uses a Hoeffding Tree model named model1 to predict the label of the feature vector represented by attribute_0, attribute_1, attribute_2, and attribute_3 attributes. The predicted label (String/Bool) along with the prediction confidence and the feature vector are output to the OutputStream stream. The expected definition of the OutputStream is as follows:(attribute_0 double, attribute_1 double, attribute_2
 double, attribute_3 double, prediction string,
confidenceLevel double).

updateAMRulesRegressor (Stream Processor)

This extension performs the build/update of the AMRules Regressor model for evolving data streams.

Syntax

streamingml:updateAMRulesRegressor(<STRING> model.name, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <INT> grace.period, <INT> change.detector, <INT> anomaly.detector, <DOUBLE|FLOAT|LONG|INT> model.features)

QUERY PARAMETERS

Name Description Default Value Possible Data Types Optional Dynamic
model.name The name of the model to be built/updated. STRING No No
split.confidence This is a Hoeffding Bound parameter. It defines the percentage of error that to be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision. min:0 max:1 1.0E-7D DOUBLE Yes No
tie.break.threshold This is a Hoeffding Bound parameter. It specifies the threshold below which a split must be forced to break ties. min:0 max:1 0.05D DOUBLE Yes No
grace.period This is a Hoeffding Bound parameter. The number of instances a leaf should observe between split attempts. 200 INT Yes No
change.detector The Concept Drift Detection methodology to be used. The possible values are as follows.
 0:NoChangeDetection
1:ADWINChangeDetector
 2:PageHinkleyDM
2:PageHinkleyDM INT Yes No
anomaly.detector The Anomaly Detection methodology to be used. The possible values are as follows:0:NoAnomalyDetection
1:AnomalinessRatioScore
2:OddsRatioScore
2:OddsRatioScore INT Yes No
model.features The features of the model that should be attributes of the stream. DOUBLE
FLOAT
LONG
INT
No No
Extra Return Attributes
Name Description Possible Types
meanSquaredError The current Mean Squared Error of the model DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateAMRulesRegressor('model1',) 
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;

In this query, an AMRulesRegressor model named 'model1' is built/updated using attribute_0, attribute_1, attribute_2, and attribute_3 attributes as features, and attribute_4 as the target_value. The accuracy of the evaluation is output to the OutputStream stream.

EXAMPLE 2

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateAMRulesRegressor('model1', 1.0E-7D, 0.05D, 200, 0, 0) 
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;

In this query, an AMRulesRegressor model named model1 is built/updated with a split confidence of 1.0E-7D, a tie break threshold of 0.05D, and a grace period of 200. The Concept Drift Detection and Anomaly Detection methodologies used are NoChangeDetection and NoAnomalyDetection respectively. attribute_0, attribute_1, attribute_2, and attribute_3 are used as features, and attribute_4 is used as the target value. The meanSquaredError is output to the OutputStream stream.

updateHoeffdingTree (Stream Processor)

This extension performs the build/update of Hoeffding Adaptive Tree for evolving data streams that use ADWIN to replace branches for new ones.

Syntax

streamingml:updateHoeffdingTree(<STRING> model.name, <INT> no.of.classes, <INT> grace.period, <INT> split.criterion, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <BOOL> binary.split, <BOOL> pre.prune, <INT> leaf.prediction.strategy, <DOUBLE|INT> model.features)

QUERY PARAMETERS

Name Description Default Value Possible Data Types Optional Dynamic
model.name The name of the model to be built/updated. STRING No No
no.of.classes The number of class labels in the datastream. INT No No
grace.period The number of instances a leaf should observe between split attempts. A minimum and a maximum value should be specified. e.g., min:0, max:2147483647. 200 INT Yes No
split.criterion The split criterion to be used. Possible values are as follows:
0:InfoGainSplitCriterion
1:GiniSplitCriterion
0:InfoGainSplitCriterion INT Yes No
split.confidence The amount of error that should be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision. 1e-7 DOUBLE Yes No
tie.break.threshold The threshold at which a split must be forced to break ties. A minimum value and a maximum value must be specified. e.g., min:0.0D, max:1.0D 0.05D DOUBLE Yes No
binary.split If this parameter is set to true, onlybinary splits are allowed. false BOOL Yes No
pre.prune If this parameter is set to true, pre-pruning is allowed. false BOOL Yes No
leaf.prediction.strategy This specifies the leaf prediction strategy to be used. Possible values are as follows:
0:Majority class
1:Naive Bayes
2:Naive Bayes Adaptive.
2:Naive Bayes Adaptive INT Yes No
model.features The features of the model that should be attributes of the stream. DOUBLE
INT
No No
Extra Return Attributes
Name Description Possible Types
accuracy The accuracy evaluation of the model(Prequnetial Evaluation) DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateHoeffdingTree('model1', 3) 
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;

This query builds/updates a HoeffdingTree model named model1 under 3 classes using attribute_0, attribute_1, attribute_2, and attribute_3 as features, and attribute_4 as the label. The accuracy evaluation is output to the OutputStream stream

EXAMPLE 2

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateHoeffdingTree('model1', 3, 200, 0, 1e-7, 1.0D, true, true, 2) 
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;

This query builds/updates a Hoeffding Tree model named model1 with a grace period of 200, an information gain split criterion of 0, 1e-7 of allowable error in split decision, 1.0D of breaktie threshold, allowing only binary splits. Pre-pruning is disabled, and Naive Bayes Adaptive is used as the leaf prediction strategy. 'attribute_0', attribute_1, attribute_2, and attribute_3 are used as features, and attribute_4 as the label. The accuracy evaluation is output to the OutputStream stream.