API Reference

TensorLDA

class tensor_lda.tensor_lda.TensorLDA(n_components=10, alpha0=0.1, max_iter=1000, max_inference_iter=1000, n_restart=10, converge_tol=0.0001, inference_converge_tol=1e-06, inference_step_size=0.001, verbose=0, smooth_param=0.01, random_state=None)

Latent Dirichlet Allocation fitted with a tensor decomposition method.

Parameters:

n_components : int, optional (default=10)

Number of topics.

alpha0 : double, optional (default=0.1)

Sum of topic prior alpha.

max_iter : integer, optional (default=1000)

The maximum number of iterations.

max_inference_iter : integer, optional (default=1000)

The maximum number of inference iterations.

converge_tol : float, optional (default=1e-4)

Convergence tolerance in the training step.

verbose : int, optional (default=0)

Verbosity level.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
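The three accepted forms of random_state can be illustrated with a small helper. This is only a sketch of the convention described above, not the library's own code: the name resolve_random_state is hypothetical, and np.random.mtrand._rand is a private NumPy attribute that is assumed here to be the global RandomState behind np.random.

```python
import numbers

import numpy as np


def resolve_random_state(random_state):
    """Turn an int, a RandomState, or None into a RandomState (illustrative)."""
    if random_state is None:
        # Assumption: this private attribute is the global generator
        # that backs the np.random.* convenience functions.
        return np.random.mtrand._rand
    if isinstance(random_state, numbers.Integral):
        # An integer is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is passed through unchanged.
        return random_state
    raise ValueError("cannot build a RandomState from %r" % (random_state,))


rng = resolve_random_state(42)
sample = rng.rand(3)  # reproducible: the same seed gives the same draws
```

Passing the same integer twice yields two independent generators that produce identical streams, which is what makes runs repeatable.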

References

[1] “Tensor Decompositions for Learning Latent Variable Models”, Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky, 2014
[2] “Scalable Moment-Based Inference for Latent Dirichlet Allocation”, Chi Wang, Xueqing Liu, Yanglei Song, and Jiawei Han, 2014
[3] “Tensor Decompositions and Applications”, Tamara G. Kolda, Brett W. Bader, 2009

Attributes

n_components_	(int) The effective number of components. The number must be less than or equal to the number of features.
components_	(array, [n_components, n_features]) Parameters for topic word distribution.
alpha_	(array, [n_components])

Methods

fit(X, y=None)

Learn a model for the data X with the tensor decomposition method.

Parameters:

X : array-like or sparse matrix, shape=(n_samples, n_features)

Document word matrix.

y : Ignored

Returns:

self

transform(X)

Transform data X according to the fitted model.

Parameters:

X : array-like or sparse matrix, shape=(n_samples, n_features)

Document word matrix.

Returns:

doc_topic_distr : shape=(n_samples, n_topics)

Document topic distribution for X.

LDA Sample Generator

class tensor_lda.utils.sample_generator.LdaSampleGenerator(n_topics, n_words, min_doc_size, mean_doc_size, doc_topic_prior, topic_word_prior, random_state=None)

Generate LDA samples

Parameters:

n_topics : int

Number of topics

n_words : int

Number of words in corpus

min_doc_size : int

Min word count in a document

mean_doc_size : int

Mean word count in a document

doc_topic_prior : double

Uniform Dirichlet prior of a document

topic_word_prior : double

Uniform Dirichlet prior of a topic

Attributes

topic_word_distr_

(array, [n_topics, n_words]) Topic word distribution.

Methods

generate_documents(n_docs)

Generate a random document-word matrix.

Parameters:

n_docs : int

Number of documents
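To make the roles of the constructor parameters concrete, the following NumPy sketch samples a document-word count matrix from the standard LDA generative process. It is not the generator's actual implementation: the function name generate_lda_documents, the seed argument, and in particular the document-size model (min_doc_size plus a Poisson draw) are assumptions chosen for illustration.

```python
import numpy as np


def generate_lda_documents(n_docs, n_topics, n_words, min_doc_size,
                           mean_doc_size, doc_topic_prior, topic_word_prior,
                           seed=0):
    """Sample document-word counts from an LDA model (illustrative sketch)."""
    rng = np.random.RandomState(seed)
    # One word distribution per topic, drawn from a uniform Dirichlet prior.
    topic_word_distr = rng.dirichlet(np.full(n_words, topic_word_prior),
                                     size=n_topics)
    doc_word_counts = np.zeros((n_docs, n_words), dtype=int)
    for d in range(n_docs):
        # Assumed size model: at least min_doc_size words, with an
        # expected size of mean_doc_size.
        doc_size = min_doc_size + rng.poisson(mean_doc_size - min_doc_size)
        # Topic mixture of this document, drawn from a uniform Dirichlet.
        theta = rng.dirichlet(np.full(n_topics, doc_topic_prior))
        # Mixing the topics gives the document's word distribution;
        # renormalize to guard against floating-point drift.
        word_distr = theta @ topic_word_distr
        word_distr /= word_distr.sum()
        doc_word_counts[d] = rng.multinomial(doc_size, word_distr)
    return doc_word_counts, topic_word_distr


X, topic_word = generate_lda_documents(
    n_docs=5, n_topics=3, n_words=20, min_doc_size=10, mean_doc_size=30,
    doc_topic_prior=0.5, topic_word_prior=0.5)
```

Here X has one row per document and one column per word, and topic_word plays the role of the topic_word_distr_ attribute.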

Utilities

Utility functions for tensor operations

tensor_lda.utils.tensor_utils.khatri_rao_prod(a, b)

Khatri-Rao product

Compute the Khatri-Rao product of two 2-D matrices.

Parameters:

a : 2D array, shape (n, k)

first matrix

b : 2D array, shape (m, k)

second matrix

Returns:

matrix : 2D array, shape (n * m, k)

Khatri-Rao product of a and b
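As an illustration of the shapes involved, the Khatri-Rao product can be written as a column-wise Kronecker product. The sketch below is a plain NumPy version, not the library function: the name khatri_rao_sketch is hypothetical, and the row ordering (row i * m + j holds a[i, r] * b[j, r]) is the usual convention, assumed here to match.

```python
import numpy as np


def khatri_rao_sketch(a, b):
    """Column-wise Kronecker product of a (n, k) and b (m, k)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n, k = a.shape
    m, k_b = b.shape
    if k != k_b:
        raise ValueError("a and b must have the same number of columns")
    # Broadcast to shape (n, m, k), then merge the first two axes so that
    # row i * m + j of the result holds a[i, :] * b[j, :].
    return (a[:, None, :] * b[None, :, :]).reshape(n * m, k)


a = np.arange(6.0).reshape(3, 2)
b = np.arange(8.0).reshape(4, 2) + 1.0
product = khatri_rao_sketch(a, b)  # shape (12, 2)
```

Each column of product equals np.kron of the matching columns of a and b.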

tensor_lda.utils.tensor_utils.rank_1_tensor_3d(a, b, c)

Generate a 3-D tensor from 3 1-D vectors

Generate a 3D tensor from 3 rank one vectors a, b, and c. The returned 3-D tensor is in unfolded format.

Parameters:

a : array, shape (n,)

first rank one vector

b : array, shape (n,)

second rank one vector

c : array, shape (n,)

third rank one vector

Returns:

tensor: array, (n, n * n)

3D tensor in unfolded format. Element (i, j, k) maps to (i, (n * k) + j).
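The index mapping of the unfolded format can be checked with a short NumPy sketch. It assumes the tensor entries are the outer product a[i] * b[j] * c[k], and the name rank_1_unfolded is hypothetical; only the mapping (i, j, k) to (i, n * k + j) is taken from the description above.

```python
import numpy as np


def rank_1_unfolded(a, b, c):
    """Unfolded outer product: entry (i, n * k + j) is a[i] * b[j] * c[k]."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    # np.kron(c, b)[n * k + j] == c[k] * b[j], which is exactly the
    # column layout of the unfolded format.
    return np.outer(a, np.kron(c, b))


a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
c = np.array([5.0, 6.0])
unfolded = rank_1_unfolded(a, b, c)  # shape (2, 4)

# Compare against the full 3-D tensor, entry by entry.
full = np.einsum('i,j,k->ijk', a, b, c)
n = len(a)
matches = all(unfolded[i, n * k + j] == full[i, j, k]
              for i in range(n) for j in range(n) for k in range(n))
```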

tensor_lda.utils.tensor_utils.tensor_3d_from_matrix_vector(b, a)

Generate 3-D tensor from 2-D matrix and 1-D vector

This function is similar to tensor_3d_from_vector_matrix. The only difference is that the first argument is a 2-D matrix and the second argument is a 1-D vector.

Parameters:

b : array, shape (m, n)

2-D matrix

a : array, shape (p,)

vector

Returns:

tensor : array, shape (m, n * p)

3D tensor in unfolded format.

tensor_lda.utils.tensor_utils.tensor_3d_from_vector_matrix(a, b)

Generate 3-D tensor from 1-D vector and 2-D matrix

Generate a 3D tensor from a 1-D vector a and 2-D matrix b. The returned 3-D tensor is in unfolded format.

Parameters:

a : array, shape (m,)

1-D vector

b : 2-D array, shape (n, p)

2-D matrix

Returns:

tensor: array, (m, n * p)

3D tensor in unfolded format.
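The documented shapes of tensor_3d_from_vector_matrix and tensor_3d_from_matrix_vector can be reproduced with outer products followed by an unfolding step. The sketch below rests on two assumptions that the reference does not state: that the tensors are outer products (a[i] * b[j, k] and b[i, j] * a[k] respectively), and that the unfolding uses the same column layout as rank_1_tensor_3d. All function names are hypothetical.

```python
import numpy as np


def unfold(t):
    """Unfold a 3-D array so that element (i, j, k) lands at (i, n2 * k + j)."""
    n1, n2, n3 = t.shape
    return t.transpose(0, 2, 1).reshape(n1, n2 * n3)


def vector_matrix_unfolded(a, b):
    """Vector a (m,) and matrix b (n, p) -> unfolded tensor (m, n * p)."""
    # Assumed definition: T[i, j, k] = a[i] * b[j, k].
    return unfold(np.einsum('i,jk->ijk', a, b))


def matrix_vector_unfolded(b, a):
    """Matrix b (m, n) and vector a (p,) -> unfolded tensor (m, n * p)."""
    # Assumed definition: T[i, j, k] = b[i, j] * a[k].
    return unfold(np.einsum('ij,k->ijk', b, a))


vec = np.array([1.0, 2.0])
mat = np.arange(6.0).reshape(2, 3)
vm = vector_matrix_unfolded(vec, mat)  # shape (2, 2 * 3)
mv = matrix_vector_unfolded(mat, vec)  # shape (2, 3 * 2)
```

Both results have six columns here, but the columns are laid out differently because the vector occupies a different mode in each case.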

tensor_lda.utils.tensor_utils.tensor_3d_permute(tensor, tensor_shape, a, b, c)

Permute the modes of a 3-D tensor

This is a slow implementation to generate 3-D tensor permutations.

Parameters:

tensor : 2D array, shape (n, m * k)

3D tensor in unfolded format

tensor_shape : int triple

Shape of the tensor. Because the tensor is in unfolded format, its original shape is needed to compute the permutation.

a : int, {1, 2, 3}

new first index

b : int, {1, 2, 3}

new second index

c : int, {1, 2, 3}

new third index
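One way to picture the permutation is fold, transpose, unfold. The sketch below follows that route in NumPy; it is not the library's (slower) implementation. It assumes the column layout (i, j, k) to (i, n2 * k + j) and reads the arguments as "new mode 1 is old mode a", and so on; both readings are assumptions, and the function names are hypothetical.

```python
import numpy as np


def unfold(t):
    """Unfold a 3-D array so that element (i, j, k) lands at (i, n2 * k + j)."""
    n1, n2, n3 = t.shape
    return t.transpose(0, 2, 1).reshape(n1, n2 * n3)


def fold(matrix, shape):
    """Inverse of unfold: rebuild the 3-D array from its unfolded form."""
    n1, n2, n3 = shape
    return matrix.reshape(n1, n3, n2).transpose(0, 2, 1)


def permute_unfolded(matrix, shape, a, b, c):
    """Permute the modes of an unfolded tensor; a, b, c are 1-based."""
    full = fold(matrix, shape)
    return unfold(full.transpose(a - 1, b - 1, c - 1))


shape = (2, 3, 4)
full = np.arange(24.0).reshape(shape)
unfolded = unfold(full)                                 # shape (2, 12)
swapped = permute_unfolded(unfolded, shape, 2, 1, 3)    # shape (3, 8)
```

Swapping the first two modes turns the (2, 12) unfolding into a (3, 8) one, which is why the original shape has to be supplied.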

tensor_lda.utils.tensor_utils.tensor_3d_prod(tensor, a, b, c)

Calculate the product of a 3D tensor with a matrix on each dimension

TODO: move it to test

Parameters:

tensor : 3D array, shape (n1, n2, n3)

a : array, (n1, m)

b : array, (n2, n)

c : array, (n3, p)

Returns:

t_abc : array, (m, n, p)

tensor(a, b, c)
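The notation tensor(a, b, c) is read here as the multilinear product used in reference [1]: t_abc[x, y, z] is the sum over i, j, k of tensor[i, j, k] * a[i, x] * b[j, y] * c[k, z]. Assuming that definition, a single np.einsum call reproduces the documented shapes. The name tensor_prod_sketch is hypothetical, and this is a sketch, not the library's code.

```python
import numpy as np


def tensor_prod_sketch(tensor, a, b, c):
    """Multiply a (n1, n2, n3) tensor by a, b, c along modes 1, 2, 3."""
    # Contract mode 1 with a (n1, m), mode 2 with b (n2, n) and
    # mode 3 with c (n3, p); the result has shape (m, n, p).
    return np.einsum('ijk,ix,jy,kz->xyz', tensor, a, b, c)


rng = np.random.RandomState(0)
tensor = rng.rand(2, 3, 4)
a = rng.rand(2, 5)
b = rng.rand(3, 2)
c = rng.rand(4, 3)
t_abc = tensor_prod_sketch(tensor, a, b, c)  # shape (5, 2, 3)
```

With identity matrices for a, b, and c the tensor is returned unchanged, which gives a quick sanity check of the index order.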