

API Docs - v1.0.14

Streamingml

AMRulesRegressor (Stream Processor)

This extension performs regression tasks using the AMRulesRegressor algorithm.

Syntax

streamingml:AMRulesRegressor(<STRING> model.name, <INT|FLOAT|FLOAT|DOUBLE> model.feature)

QUERY PARAMETERS

Name	Description	Default Value	Possible Data Types	Optional	Dynamic
model.name	The name of the model to be used for prediction.		STRING	No	No
model.feature	The feature vector for the regression analysis.		INT FLOAT FLOAT DOUBLE	No	No

Extra Return Attributes

Name	Description	Possible Types
prediction	The predicted value.	DOUBLE
meanSquaredError	The `MeanSquaredError` of the predicting model.	DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);

from StreamA#streamingml:AMRulesRegressor('model1',  attribute_0, attribute_1, attribute_2, attribute_3) 
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, meanSquaredError insert into OutputStream;

This query uses an AMRules model named model1 that is used to predict the value for the feature vector represented by attribute_0, attribute_1, attribute_2, and attribute_3. The predicted value along with the MeanSquaredError and the feature vector are output to a stream named OutputStream. The resulting definition of the OutputStream stream is as follows:
(attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, prediction double, meanSquaredError double).

clusTree (Stream Processor)

This extension performs clustering on a streaming data set. Initially a micro cluster model is generated using the ClusTree algorithm, and weighted k-means is periodically applied to micro clusters to generate a macro cluster model with the required number of clusters. Data points can be of any dimensionality, but the dimensionality should be constant throughout the stream. Euclidean distance is taken as the distance metric.

Syntax

streamingml:clusTree(<INT> no.of.clusters, <INT> max.iterations, <INT> no.of.events.to.refresh.macro.model, <INT> max.height.of.tree, <INT> horizon, <DOUBLE|FLOAT|INT|LONG> model.features)

QUERY PARAMETERS

Name	Description	Default Value	Possible Data Types	Optional	Dynamic
no.of.clusters	The assumed number of natural clusters (`numberOfClusters`) in the data set.		INT	No	No
max.iterations	The number of times the process should be iterated. The process iterates until the number specified for this parameter is reached, or until iterating the process does not result in a change in the centroids.	40	INT	Yes	No
no.of.events.to.refresh.macro.model	The number of new events that should arrive in order to recalculate the k-means macro cluster centers.	100	INT	Yes	No
max.height.of.tree	This defines the maximum number of levels that should exist in the ClusTree. The maximum number of levels is calculated as `3^<VALUE_SPECIFIED>` (e.g., If 10 is specified, there can be a maximum of 3^10 micro clusters in the micro cluster). It is recommended to set the value within the 5-8 range because a lot of micro-clusters can consume a lot of memory, and as a result, creating the macro cluster model will take longer.	8	INT	Yes	No
horizon	This controls the decay of weights of old micro-clusters to manage the concept drift. If horizon is set as `1000`, then a micro cluster that has not been recently updated loses its weight by half after 1000 events.	1000	INT	Yes	No
model.features	This is a variable length argument. Depending on the dimensionality of data points, you receive coordinates as features along each axis.		DOUBLE FLOAT INT LONG	No	No

Extra Return Attributes

Name	Description	Possible Types
euclideanDistanceToClosestCentroid	This represents the Euclidean distance between the current data point and the closest centroid.	DOUBLE
closestCentroidCoordinate	This is a variable length attribute. Depending on the dimensionality(`d`) `closestCentroidCoordinate1` is returned to `closestCentroidCoordinated that are the` d `dimensional coordinates of the closest centroid from the model to the current event. This is the prediction result, and this represents the cluster towhich the current event belongs.	DOUBLE

Examples EXAMPLE 1

@App:name('ClusTreeTestSiddhiApp') 
define stream InputStream (x double, y double);
@info(name = 'query1') 
from InputStream#streamingml:clusTree(2, 10, 20, 5, 50, x, y) 
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y 
insert into OutputStream;

This query creates a Siddhi application named ClusTreeTestSiddhiApp, and it accepts 2D inputs of doubles. The query named query1 creates a ClusTree model. It also creates a k-means model after the first 20 events, and refreshes it after every 20 events. Two macro clusters are created, and the process is not iterated more than 10 times. The maximum height of tree is set to 5, and therefore, a maximum of 3^5 micro clusters are generated from the Clus Tree. The horizon is set to 50, and therefore, the weight of each micro cluster that is not updated reduces by half after every 50 events.

EXAMPLE 2

@App:name('ClusTreeTestSiddhiApp') 
define stream InputStream (x double, y double);
@info(name = 'query1') 
from InputStream#streamingml:ClusTree(2, x, y) 
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y 
insert into OutputStream;

This query does not include hyper parameters. Therefore, the default values mentioned above are applied. This mode of querying is recommended if you are not familier with ClusTree/KMeans algorithms.

hoeffdingTreeClassifier (Stream Processor)

This extension performs classification using the Hoeffding Adaptive Tree algorithm for evolving data streams that use ADWIN to replace branches with new ones.

Syntax

streamingml:hoeffdingTreeClassifier(<STRING> model.name)

QUERY PARAMETERS

Name	Description	Default Value	Possible Data Types	Optional	Dynamic
model.name	The name of the model to be used for prediction.		STRING	No	No

Extra Return Attributes

Name	Description	Possible Types
prediction	The predicted class label.	STRING
confidenceLevel	The probability of the prediction.	DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);

from StreamA#streamingml:hoeffdingTreeClassifier('model1',  attribute_0, attribute_1, attribute_2, attribute_3) 
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, predictionConfidence insert into OutputStream;

This query uses a Hoeffding Tree model named model1 to predict the label of the feature vector represented by attribute_0, attribute_1, attribute_2, and attribute_3 attributes. The predicted label (String/Bool) along with the prediction confidence and the feature vector are output to the OutputStream stream. The expected definition of the OutputStream is as follows:(attribute_0 double, attribute_1 double, attribute_2
double, attribute_3 double, prediction string,
confidenceLevel double).

updateAMRulesRegressor (Stream Processor)

This extension performs the build/update of the AMRules Regressor model for evolving data streams.

Syntax

streamingml:updateAMRulesRegressor(<STRING> model.name, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <INT> grace.period, <INT> change.detector, <INT> anomaly.detector, <DOUBLE|FLOAT|LONG|INT> model.features)

QUERY PARAMETERS

Name	Description	Default Value	Possible Data Types	Optional	Dynamic
model.name	The name of the model to be built/updated.		STRING	No	No
split.confidence	This is a Hoeffding Bound parameter. It defines the percentage of error that to be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision. min:0 max:1	1.0E-7D	DOUBLE	Yes	No
tie.break.threshold	This is a Hoeffding Bound parameter. It specifies the threshold below which a split must be forced to break ties. min:0 max:1	0.05D	DOUBLE	Yes	No
grace.period	This is a Hoeffding Bound parameter. The number of instances a leaf should observe between split attempts.	200	INT	Yes	No
change.detector	The Concept Drift Detection methodology to be used. The possible values are as follows. `0`:NoChangeDetection `1`:ADWINChangeDetector `2`:PageHinkleyDM	2:PageHinkleyDM	INT	Yes	No
anomaly.detector	The Anomaly Detection methodology to be used. The possible values are as follows:`0`:NoAnomalyDetection `1`:AnomalinessRatioScore `2`:OddsRatioScore	2:OddsRatioScore	INT	Yes	No
model.features	The features of the model that should be attributes of the stream.		DOUBLE FLOAT LONG INT	No	No

Extra Return Attributes

Name	Description	Possible Types
meanSquaredError	The current Mean Squared Error of the model	DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateAMRulesRegressor('model1',) 
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;

In this query, an AMRulesRegressor model named 'model1' is built/updated using attribute_0, attribute_1, attribute_2, and attribute_3 attributes as features, and attribute_4 as the target_value. The accuracy of the evaluation is output to the OutputStream stream.

EXAMPLE 2

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateAMRulesRegressor('model1', 1.0E-7D, 0.05D, 200, 0, 0) 
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;

In this query, an AMRulesRegressor model named model1 is built/updated with a split confidence of 1.0E-7D, a tie break threshold of 0.05D, and a grace period of 200. The Concept Drift Detection and Anomaly Detection methodologies used are NoChangeDetection and NoAnomalyDetection respectively. attribute_0, attribute_1, attribute_2, and attribute_3 are used as features, and attribute_4 is used as the target value. The meanSquaredError is output to the OutputStream stream.

updateHoeffdingTree (Stream Processor)

This extension performs the build/update of Hoeffding Adaptive Tree for evolving data streams that use ADWIN to replace branches for new ones.

Syntax

streamingml:updateHoeffdingTree(<STRING> model.name, <INT> no.of.classes, <INT> grace.period, <INT> split.criterion, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <BOOL> binary.split, <BOOL> pre.prune, <INT> leaf.prediction.strategy, <DOUBLE|INT> model.features)

QUERY PARAMETERS

Name	Description	Default Value	Possible Data Types	Optional	Dynamic
model.name	The name of the model to be built/updated.		STRING	No	No
no.of.classes	The number of class labels in the datastream.		INT	No	No
grace.period	The number of instances a leaf should observe between split attempts. A minimum and a maximum value should be specified. e.g., `min:0, max:2147483647`.	200	INT	Yes	No
split.criterion	The split criterion to be used. Possible values are as follows: `0`:InfoGainSplitCriterion `1`:GiniSplitCriterion	0:InfoGainSplitCriterion	INT	Yes	No
split.confidence	The amount of error that should be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision.	1e-7	DOUBLE	Yes	No
tie.break.threshold	The threshold at which a split must be forced to break ties. A minimum value and a maximum value must be specified. e.g., `min:0.0D, max:1.0D`	0.05D	DOUBLE	Yes	No
binary.split	If this parameter is set to `true`, onlybinary splits are allowed.	false	BOOL	Yes	No
pre.prune	If this parameter is set to `true`, pre-pruning is allowed.	false	BOOL	Yes	No
leaf.prediction.strategy	This specifies the leaf prediction strategy to be used. Possible values are as follows: `0`:Majority class `1`:Naive Bayes `2`:Naive Bayes Adaptive.	2:Naive Bayes Adaptive	INT	Yes	No
model.features	The features of the model that should be attributes of the stream.		DOUBLE INT	No	No

Extra Return Attributes

Name	Description	Possible Types
accuracy	The accuracy evaluation of the model(Prequnetial Evaluation)	DOUBLE

Examples EXAMPLE 1

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateHoeffdingTree('model1', 3) 
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;

This query builds/updates a HoeffdingTree model named model1 under 3 classes using attribute_0, attribute_1, attribute_2, and attribute_3 as features, and attribute_4 as the label. The accuracy evaluation is output to the OutputStream stream

EXAMPLE 2

define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );

from StreamA#streamingml:updateHoeffdingTree('model1', 3, 200, 0, 1e-7, 1.0D, true, true, 2) 
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;

This query builds/updates a Hoeffding Tree model named model1 with a grace period of 200, an information gain split criterion of 0, 1e-7 of allowable error in split decision, 1.0D of breaktie threshold, allowing only binary splits. Pre-pruning is disabled, and Naive Bayes Adaptive is used as the leaf prediction strategy. 'attribute_0', attribute_1, attribute_2, and attribute_3 are used as features, and attribute_4 as the label. The accuracy evaluation is output to the OutputStream stream.