Classification

The documentation of the classification module. The pyts.classification module includes classification algorithms.
class pyts.classification.KNNClassifier(n_neighbors=1, weights=u'uniform', algorithm=u'auto', leaf_size=30, p=2, metric=u'minkowski', metric_params=None, n_jobs=1, **kwargs)

k-nearest neighbors classifier.
Parameters: - n_neighbors : int, optional (default = 1)
Number of neighbors to use.
- weights : str or callable, optional (default = ‘uniform’)
Weight function used in prediction. Possible values:
- ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
- ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
- algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- leaf_size : int, optional (default = 30)
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
- metric : string or DistanceMetric object (default = ‘minkowski’)
The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics. ‘dtw’ and ‘fast_dtw’ are also available.
- p : integer, optional (default = 2)
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used (see the sketch after this parameter list).
- metric_params : dict, optional (default = None)
Additional keyword arguments for the metric function.
- n_jobs : int, optional (default = 1)
The number of parallel jobs to run for neighbors search. If n_jobs=-1, then the number of jobs is set to the number of CPU cores. Doesn’t affect the fit() method.
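To make the p parameter concrete, here is a minimal standalone sketch of the Minkowski distance l_p it refers to (an illustration with toy arrays, not the pyts or scikit-learn implementation):

    import numpy as np

    def minkowski(a, b, p=2):
        # Minkowski distance l_p between two 1-D arrays.
        return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

    x = np.array([0.0, 1.0, 2.0])
    y = np.array([1.0, 1.0, 0.0])

    print(minkowski(x, y, p=1))  # p=1: Manhattan (l1) distance -> 3.0
    print(minkowski(x, y, p=2))  # p=2: Euclidean (l2) distance -> ~2.236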
Methods

fit(X, y)    Fit the model according to the given training data.
get_params([deep])    Get parameters for this estimator.
predict(X)    Predict the class labels for the provided data.
score(X, y[, sample_weight])    Returns the mean accuracy on the given test data and labels.
set_params(**params)    Set the parameters of this estimator.
fit(X, y)

Fit the model according to the given training data.
Parameters: - X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples]
Class labels for each data sample.
Returns: - self : object
Returns self.
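A minimal usage sketch, assuming synthetic random data (the array shapes follow the fit documentation above; the parameter values are purely illustrative):

    import numpy as np
    from pyts.classification import KNNClassifier

    # Toy data: 20 training series of length 48, two classes.
    rng = np.random.RandomState(42)
    X_train = rng.randn(20, 48)
    y_train = rng.randint(0, 2, size=20)
    X_test = rng.randn(4, 48)

    # 1-NN with the 'dtw' metric mentioned above; the default
    # metric='minkowski' with p=2 is plain Euclidean distance.
    clf = KNNClassifier(n_neighbors=1, metric='dtw')
    clf.fit(X_train, y_train)
    print(clf.predict(X_test))          # predicted class labels
    print(clf.score(X_train, y_train))  # mean training accuracy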
class pyts.classification.SAXVSMClassifier(n_bins=4, quantiles=u'empirical', window_size=4, numerosity_reduction=True, use_idf=True, smooth_idf=True, sublinear_tf=False)

Classifier based on SAX-VSM representation and tf-idf statistics.
Parameters: - n_bins : int (default = 4)
Number of bins (also known as the size of the alphabet).
- quantiles : {‘gaussian’, ‘empirical’} (default = ‘empirical’)
The way to compute quantiles. If ‘gaussian’, quantiles from a gaussian distribution N(0,1) are used. If ‘empirical’, empirical quantiles are used.
- window_size : int (default = 4)
Size of the window (i.e. the size of each word).
- numerosity_reduction : bool (default = True)
If True, delete all but one occurrence in each run of back-to-back identical words.
- use_idf : bool (default = True)
Enable inverse-document-frequency reweighting.
- smooth_idf : bool (default = True)
Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
- sublinear_tf : bool (default = False)
Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf) (a sketch of this weighting follows this parameter list).
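The smooth_idf and sublinear_tf options follow the usual tf-idf conventions. The following hedged sketch shows the weighting the two option descriptions name; it is an illustration of those formulas, not the pyts internals:

    import numpy as np

    def tfidf_weight(tf, df, n_docs, smooth_idf=True, sublinear_tf=False):
        # Illustrative tf-idf weighting matching the option descriptions above.
        if sublinear_tf:
            tf = 1.0 + np.log(tf)  # replace tf with 1 + log(tf)
        if smooth_idf:
            # Add one to document frequencies, as if an extra document
            # contained every term exactly once (prevents zero divisions).
            idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
        else:
            idf = np.log(n_docs / df) + 1.0
        return tf * idf

    print(tfidf_weight(tf=3, df=2, n_docs=10))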
Attributes: - vocabulary_ : dict
A mapping of feature indices to terms.
- tfidf_ : sparse matrix, shape = [n_classes, n_words]
Term-document matrix.
- idf_ : array, shape = [n_features], or None
The learned idf vector (global term weights) when use_idf=True, None otherwise.
- stop_words_ : set
Terms that were ignored because they either:
- occurred in too many documents (max_df)
- occurred in too few documents (min_df)
- were cut off by feature selection (max_features).
This is only available if no vocabulary was given.
Methods

fit(X, y)    Fit the model according to the given training data.
get_params([deep])    Get parameters for this estimator.
predict(X)    Predict the class labels for the provided data.
score(X, y[, sample_weight])    Returns the mean accuracy on the given test data and labels.
set_params(**params)    Set the parameters of this estimator.
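A minimal usage sketch with synthetic random data (shapes, methods, and attribute names follow the documentation above; the parameter values are illustrative):

    import numpy as np
    from pyts.classification import SAXVSMClassifier

    rng = np.random.RandomState(0)
    X_train = rng.randn(30, 60)   # 30 training series of length 60
    y_train = rng.randint(0, 2, size=30)
    X_test = rng.randn(5, 60)

    clf = SAXVSMClassifier(n_bins=4, window_size=4, use_idf=True)
    clf.fit(X_train, y_train)
    print(clf.predict(X_test))   # predicted class labels
    print(clf.tfidf_.shape)      # (n_classes, n_words) term-document matrix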
class pyts.classification.BOSSVSClassifier(n_coefs, window_size, norm_mean=True, norm_std=True, n_bins=4, quantiles=u'empirical', variance_selection=False, variance_threshold=0.0, numerosity_reduction=True, smooth_idf=True, sublinear_tf=True)

Bag-of-SFA Symbols in Vector Space.
Parameters: - n_coefs : None or int (default = None)
The number of Fourier coefficients to keep. If n_coefs=None, all Fourier coefficients are returned. If n_coefs is an integer, the n_coefs most significant Fourier coefficients are returned if anova=True, otherwise the first n_coefs Fourier coefficients are returned. An even number is required (for real and imaginary values) if anova=False.
- window_size : int
Window length used to extract sub time series.
- norm_mean : bool (default = True)
If True, center the data before scaling. If norm_mean=True and anova=False, the first Fourier coefficient will be dropped.
- norm_std : bool (default = True)
If True, scale the data to unit variance.
- n_bins : int (default = 4)
The number of bins. Ignored if quantiles='entropy'.
- quantiles : {‘gaussian’, ‘empirical’} (default = ‘empirical’)
The way to compute quantiles. If ‘gaussian’, quantiles from a gaussian distribution N(0,1) are used. If ‘empirical’, empirical quantiles are used.
- variance_selection : bool (default = False)
If True, the Fourier coefficients with low variance are removed.
- variance_threshold : float (default = 0.)
Fourier coefficients with a training-set variance lower than this threshold will be removed. Ignored if variance_selection=False.
- numerosity_reduction : bool (default = True)
Whether or not numerosity reduction is applied. When the same word occurs several times in a row, only one instance of this word is kept if numerosity_reduction=True, otherwise all instances are kept (see the sketch after this parameter list).
- smooth_idf : bool (default = True)
Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
- sublinear_tf : bool (default = True)
Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).
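As a hedged illustration of numerosity reduction, here is a toy standalone function (not the pyts implementation) that keeps one instance of each run of back-to-back identical words:

    def numerosity_reduction(words):
        # Keep one instance of each run of back-to-back identical words.
        return [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]

    print(numerosity_reduction(['aba', 'aba', 'abc', 'abc', 'abc', 'aba']))
    # ['aba', 'abc', 'aba']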
Attributes: - vocabulary_ : dict
A mapping of feature indices to terms.
Methods

fit(X, y[, overlapping])    Fit the model according to the given training data.
get_params([deep])    Get parameters for this estimator.
predict(X)    Predict the class labels for the provided data.
score(X, y[, sample_weight])    Returns the mean accuracy on the given test data and labels.
set_params(**params)    Set the parameters of this estimator.
fit(X, y, overlapping=True)

Fit the model according to the given training data.
Parameters: - X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples]
Class labels for each data sample.
- overlapping : bool (default = True)
If True, overlapping windows are used for the training phase.
Returns: - self : object
Returns self.
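A minimal usage sketch with synthetic random data (parameter values are purely illustrative; n_coefs is even, as required above, and overlapping follows the fit signature):

    import numpy as np
    from pyts.classification import BOSSVSClassifier

    rng = np.random.RandomState(7)
    X_train = rng.randn(30, 64)   # 30 training series of length 64
    y_train = rng.randint(0, 2, size=30)
    X_test = rng.randn(5, 64)

    clf = BOSSVSClassifier(n_coefs=8, window_size=16)
    clf.fit(X_train, y_train, overlapping=True)
    print(clf.predict(X_test))   # predicted class labels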