API Docs - v2.0.0
Streamingml
AMRulesRegressor (Stream Processor)
This extension performs regression tasks using the AMRulesRegressor
algorithm.
Syntax
streamingml:AMRulesRegressor(<STRING> model.name, <INT|FLOAT|FLOAT|DOUBLE> model.feature)
QUERY PARAMETERS
Name | Description | Default Value | Possible Data Types | Optional | Dynamic |
---|---|---|---|---|---|
model.name | The name of the model to be used for prediction. | STRING | No | No | |
model.feature | The feature vector for the regression analysis. | INT FLOAT FLOAT DOUBLE |
No | No |
Name | Description | Possible Types |
---|---|---|
prediction | The predicted value. | DOUBLE |
meanSquaredError | The MeanSquaredError of the predicting model. |
DOUBLE |
Examples EXAMPLE 1
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);
from StreamA#streamingml:AMRulesRegressor('model1', attribute_0, attribute_1, attribute_2, attribute_3)
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, meanSquaredError insert into OutputStream;
This query uses an AMRules
model named model1
that is used to predict the value for the feature vector represented by attribute_0
, attribute_1
, attribute_2
, and attribute_3
. The predicted value along with the MeanSquaredError
and the feature vector are output to a stream named OutputStream
. The resulting definition of the OutputStream
stream is as follows:
(attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, prediction double, meanSquaredError double).
clusTree (Stream Processor)
This extension performs clustering on a streaming data set. Initially a micro cluster model is generated using the ClusTree algorithm, and weighted k-means is periodically applied to micro clusters to generate a macro cluster model with the required number of clusters. Data points can be of any dimensionality, but the dimensionality should be constant throughout the stream. Euclidean distance is taken as the distance metric.
Syntax
streamingml:clusTree(<INT> no.of.clusters, <INT> max.iterations, <INT> no.of.events.to.refresh.macro.model, <INT> max.height.of.tree, <INT> horizon, <DOUBLE|FLOAT|INT|LONG> model.features)
QUERY PARAMETERS
Name | Description | Default Value | Possible Data Types | Optional | Dynamic |
---|---|---|---|---|---|
no.of.clusters | The assumed number of natural clusters (numberOfClusters ) in the data set. |
INT | No | No | |
max.iterations | The number of times the process should be iterated. The process iterates until the number specified for this parameter is reached, or until iterating the process does not result in a change in the centroids. | 40 | INT | Yes | No |
no.of.events.to.refresh.macro.model | The number of new events that should arrive in order to recalculate the k-means macro cluster centers. | 100 | INT | Yes | No |
max.height.of.tree | This defines the maximum number of levels that should exist in the ClusTree. The maximum number of levels is calculated as 3^<VALUE_SPECIFIED> (e.g., If 10 is specified, there can be a maximum of 3^10 micro clusters in the micro cluster). It is recommended to set the value within the 5-8 range because a lot of micro-clusters can consume a lot of memory, and as a result, creating the macro cluster model will take longer. |
8 | INT | Yes | No |
horizon | This controls the decay of weights of old micro-clusters to manage the concept drift. If horizon is set as 1000 , then a micro cluster that has not been recently updated loses its weight by half after 1000 events. |
1000 | INT | Yes | No |
model.features | This is a variable length argument. Depending on the dimensionality of data points, you receive coordinates as features along each axis. | DOUBLE FLOAT INT LONG |
No | No |
Name | Description | Possible Types |
---|---|---|
euclideanDistanceToClosestCentroid | This represents the Euclidean distance between the current data point and the closest centroid. | DOUBLE |
closestCentroidCoordinate | This is a variable length attribute. Depending on the dimensionality(d ) closestCentroidCoordinate1 is returned to closestCentroidCoordinated that are the d `dimensional coordinates of the closest centroid from the model to the current event. This is the prediction result, and this represents the cluster towhich the current event belongs. |
DOUBLE |
Examples EXAMPLE 1
@App:name('ClusTreeTestSiddhiApp')
define stream InputStream (x double, y double);
@info(name = 'query1')
from InputStream#streamingml:clusTree(2, 10, 20, 5, 50, x, y)
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y
insert into OutputStream;
This query creates a Siddhi application named ClusTreeTestSiddhiApp
, and it accepts 2D inputs of doubles. The query named query1
creates a ClusTree model. It also creates a k-means model after the first 20 events, and refreshes it after every 20 events. Two macro clusters are created, and the process is not iterated more than 10 times. The maximum height of tree is set to 5, and therefore, a maximum of 3^5 micro clusters are generated from the Clus Tree. The horizon is set to 50, and therefore, the weight of each micro cluster that is not updated reduces by half after every 50 events.
EXAMPLE 2
@App:name('ClusTreeTestSiddhiApp')
define stream InputStream (x double, y double);
@info(name = 'query1')
from InputStream#streamingml:ClusTree(2, x, y)
select closestCentroidCoordinate1, closestCentroidCoordinate2, x, y
insert into OutputStream;
This query does not include hyper parameters. Therefore, the default values mentioned above are applied. This mode of querying is recommended if you are not familier with ClusTree/KMeans algorithms.
hoeffdingTreeClassifier (Stream Processor)
This extension performs classification using the Hoeffding Adaptive Tree algorithm for evolving data streams that use ADWIN
to replace branches with new ones.
Syntax
streamingml:hoeffdingTreeClassifier(<STRING> model.name)
QUERY PARAMETERS
Name | Description | Default Value | Possible Data Types | Optional | Dynamic |
---|---|---|---|---|---|
model.name | The name of the model to be used for prediction. | STRING | No | No |
Name | Description | Possible Types |
---|---|---|
prediction | The predicted class label. | STRING |
confidenceLevel | The probability of the prediction. | DOUBLE |
Examples EXAMPLE 1
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double);
from StreamA#streamingml:hoeffdingTreeClassifier('model1', attribute_0, attribute_1, attribute_2, attribute_3)
select attribute_0, attribute_1, attribute_2, attribute_3, prediction, predictionConfidence insert into OutputStream;
This query uses a Hoeffding Tree model named model1
to predict the label of the feature vector represented by attribute_0
, attribute_1
, attribute_2
, and attribute_3 attributes. The predicted label (String/Bool
) along with the prediction confidence and the feature vector are output to the OutputStream
stream. The expected definition of the OutputStream
is as follows:(attribute_0 double, attribute_1 double, attribute_2
double, attribute_3 double, prediction string,
confidenceLevel double).
updateAMRulesRegressor (Stream Processor)
This extension performs the build/update of the AMRules Regressor model for evolving data streams.
Syntax
streamingml:updateAMRulesRegressor(<STRING> model.name, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <INT> grace.period, <INT> change.detector, <INT> anomaly.detector, <DOUBLE|FLOAT|LONG|INT> model.features)
QUERY PARAMETERS
Name | Description | Default Value | Possible Data Types | Optional | Dynamic |
---|---|---|---|---|---|
model.name | The name of the model to be built/updated. | STRING | No | No | |
split.confidence | This is a Hoeffding Bound parameter. It defines the percentage of error that to be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision. min:0 max:1 | 1.0E-7D | DOUBLE | Yes | No |
tie.break.threshold | This is a Hoeffding Bound parameter. It specifies the threshold below which a split must be forced to break ties. min:0 max:1 | 0.05D | DOUBLE | Yes | No |
grace.period | This is a Hoeffding Bound parameter. The number of instances a leaf should observe between split attempts. | 200 | INT | Yes | No |
change.detector | The Concept Drift Detection methodology to be used. The possible values are as follows.0 :NoChangeDetection1 :ADWINChangeDetector 2 :PageHinkleyDM |
2:PageHinkleyDM | INT | Yes | No |
anomaly.detector | The Anomaly Detection methodology to be used. The possible values are as follows:0 :NoAnomalyDetection1 :AnomalinessRatioScore2 :OddsRatioScore |
2:OddsRatioScore | INT | Yes | No |
model.features | The features of the model that should be attributes of the stream. | DOUBLE FLOAT LONG INT |
No | No |
Name | Description | Possible Types |
---|---|---|
meanSquaredError | The current Mean Squared Error of the model | DOUBLE |
Examples EXAMPLE 1
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );
from StreamA#streamingml:updateAMRulesRegressor('model1',)
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;
In this query, an AMRulesRegressor model named 'model1' is built/updated using attribute_0
, attribute_1
, attribute_2
, and attribute_3
attributes as features, and attribute_4
as the target_value. The accuracy of the evaluation is output to the OutputStream stream.
EXAMPLE 2
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );
from StreamA#streamingml:updateAMRulesRegressor('model1', 1.0E-7D, 0.05D, 200, 0, 0)
select attribute_0, attribute_1, attribute_2, attribute_3, meanSquaredError insert into OutputStream;
In this query, an AMRulesRegressor
model named model1
is built/updated with a split confidence of 1.0E-7D, a tie break threshold of 0.05D, and a grace period of 200. The Concept Drift Detection and Anomaly Detection methodologies used are NoChangeDetection
and NoAnomalyDetection
respectively. attribute_0
, attribute_1
, attribute_2
, and attribute_3
are used as features, and attribute_4
is used as the target value. The meanSquaredError
is output to the OutputStream
stream.
updateHoeffdingTree (Stream Processor)
This extension performs the build/update of Hoeffding Adaptive Tree for evolving data streams that use ADWIN
to replace branches for new ones.
Syntax
streamingml:updateHoeffdingTree(<STRING> model.name, <INT> no.of.classes, <INT> grace.period, <INT> split.criterion, <DOUBLE> split.confidence, <DOUBLE> tie.break.threshold, <BOOL> binary.split, <BOOL> pre.prune, <INT> leaf.prediction.strategy, <DOUBLE|INT> model.features)
QUERY PARAMETERS
Name | Description | Default Value | Possible Data Types | Optional | Dynamic |
---|---|---|---|---|---|
model.name | The name of the model to be built/updated. | STRING | No | No | |
no.of.classes | The number of class labels in the datastream. | INT | No | No | |
grace.period | The number of instances a leaf should observe between split attempts. A minimum and a maximum value should be specified. e.g., min:0, max:2147483647 . |
200 | INT | Yes | No |
split.criterion | The split criterion to be used. Possible values are as follows:0 :InfoGainSplitCriterion1 :GiniSplitCriterion |
0:InfoGainSplitCriterion | INT | Yes | No |
split.confidence | The amount of error that should be allowed in a split decision. When the value specified is closer to 0, it takes longer to output the decision. | 1e-7 | DOUBLE | Yes | No |
tie.break.threshold | The threshold at which a split must be forced to break ties. A minimum value and a maximum value must be specified. e.g., min:0.0D, max:1.0D |
0.05D | DOUBLE | Yes | No |
binary.split | If this parameter is set to true , onlybinary splits are allowed. |
false | BOOL | Yes | No |
pre.prune | If this parameter is set to true , pre-pruning is allowed. |
false | BOOL | Yes | No |
leaf.prediction.strategy | This specifies the leaf prediction strategy to be used. Possible values are as follows:0 :Majority class 1 :Naive Bayes2 :Naive Bayes Adaptive. |
2:Naive Bayes Adaptive | INT | Yes | No |
model.features | The features of the model that should be attributes of the stream. | DOUBLE INT |
No | No |
Name | Description | Possible Types |
---|---|---|
accuracy | The accuracy evaluation of the model(Prequnetial Evaluation) | DOUBLE |
Examples EXAMPLE 1
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );
from StreamA#streamingml:updateHoeffdingTree('model1', 3)
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;
This query builds/updates a HoeffdingTree model named model1
under 3 classes using attribute_0
, attribute_1
, attribute_2
, and attribute_3
as features, and attribute_4
as the label. The accuracy evaluation is output to the OutputStream
stream
EXAMPLE 2
define stream StreamA (attribute_0 double, attribute_1 double, attribute_2 double, attribute_3 double, attribute_4 string );
from StreamA#streamingml:updateHoeffdingTree('model1', 3, 200, 0, 1e-7, 1.0D, true, true, 2)
select attribute_0, attribute_1, attribute_2, attribute_3, accuracy insert into OutputStream;
This query builds/updates a Hoeffding Tree model named model1
with a grace period of 200, an information gain split criterion of 0, 1e-7 of allowable error in split decision, 1.0D of breaktie threshold, allowing only binary splits. Pre-pruning is disabled, and Naive Bayes Adaptive
is used as the leaf prediction strategy. 'attribute_0', attribute_1
, attribute_2
, and attribute_3
are used as features, and attribute_4
as the label. The accuracy evaluation is output to the OutputStream stream.