Bag of Words

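This example shows how to transform a quantized time series (that is, a time series represented as a sequence of letters) into a bag of words using pyts.bow.BOW. Here window_size is 1, so each word is a single letter. The transformation is run twice, once without and once with numerosity reduction. As the two outputs show, numerosity reduction drops every word that is identical to the word immediately before it.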

Out:

Original time series:
['a' 'd' 'a' 'c' 'a' 'b' 'd' 'b' 'd' 'b' 'c' 'b' 'a' 'a' 'd' 'd' 'c' 'c'
 'a' 'a' 'c' 'b' 'b' 'd' 'a' 'a' 'd' 'a' 'a' 'd']


Bag of words without numerosity reduction:
{a, d, a, c, a, b, d, b, d, b, c, b, a, a, d, d, c, c, a, a, c, b, b, d, a, a, d, a, a, d}


Bag of words with numerosity reduction:
{a, d, a, c, a, b, d, b, d, b, c, b, a, d, c, a, c, b, d, a, d, a, d}

import numpy as np
from pyts.bow import BOW

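# Parameters
n_samples = 100   # number of time series in the toy dataset
n_features = 30   # length of each time series
n_bins = 4        # size of the alphabet
window_size = 1   # with a window of 1, each word is a single letter
# Alphabet of the first n_bins lowercase letters: 'a', 'b', 'c', 'd'
alphabet = np.array([chr(i) for i in range(97, 97 + n_bins)])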

# Toy dataset
rng = np.random.RandomState(41)
X = alphabet[rng.randint(n_bins, size=(n_samples, n_features))]

# Bag-of-words transformation
bow = BOW(window_size, numerosity_reduction=False)
X_bow = bow.fit_transform(X)
bow_num = BOW(window_size, numerosity_reduction=True)
X_bow_num = bow_num.fit_transform(X)

print("Original time series:")
print(X[0])
print("\n")
print("Bag of words without numerosity reduction:")
print(''.join(["{", X_bow[0].replace(" ", ", "), "}"]))
print("\n")
print("Bag of words with numerosity reduction:")
print(''.join(["{", X_bow_num[0].replace(" ", ", "), "}"]))
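
To make the effect of numerosity reduction concrete, here is a minimal sketch in plain Python that does not use pyts. The helper bag_of_words is hypothetical and is not part of the library. It assumes that words are taken from consecutive sliding windows with a step of 1 and that numerosity reduction removes a word when it equals the word just before it. With window_size = 1 this assumption is consistent with the output shown for the first time series, but it is not a description of how pyts implements the transformation.

```python
def bag_of_words(series, window_size=1, numerosity_reduction=True):
    """Hypothetical helper (not part of pyts).

    Joins each sliding window of letters into a word and, optionally,
    drops any word equal to the word immediately before it.
    """
    # Assumption: windows slide one position at a time.
    words = ["".join(series[i:i + window_size])
             for i in range(len(series) - window_size + 1)]
    if numerosity_reduction:
        # Keep a word only if it differs from its predecessor.
        words = [w for i, w in enumerate(words)
                 if i == 0 or w != words[i - 1]]
    return " ".join(words)


# First time series from the output above, written as a list of letters.
series = list("adacabdbdbcbaaddccaacbbdaadaad")

print(bag_of_words(series, numerosity_reduction=False))
print(bag_of_words(series, numerosity_reduction=True))
```

The second call collapses runs such as 'a' 'a' or 'd' 'd' into a single word, which shortens the sequence from 30 words to 23.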

