API Reference
TensorLDA
class tensor_lda.tensor_lda.TensorLDA(n_components=10, alpha0=0.1, max_iter=1000, max_inference_iter=1000, n_restart=10, converge_tol=0.0001, inference_converge_tol=1e-06, inference_step_size=0.001, verbose=0, smooth_param=0.01, random_state=None)
Latent Dirichlet Allocation with tensor decomposition
Parameters: n_components : int, optional (default=10)
Number of topics.
alpha0 : double, optional (default=0.1)
Sum of topic prior alpha.
max_iter : integer, optional (default=1000)
The maximum number of iterations.
max_inference_iter : integer, optional (default=1000)
The maximum number of inference iterations.
converge_tol : float, optional (default=1e-4)
Convergence tolerance in the training step.
verbose : int, optional (default=0)
Verbosity level.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
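The three accepted forms of random_state can be resolved as in the sketch below. It follows the common scikit-learn convention and is not taken from the tensor_lda source; the helper name resolve_random_state is hypothetical.

```python
import numbers

import numpy as np


def resolve_random_state(seed):
    """Turn None, an int, or a RandomState into a RandomState (hypothetical helper)."""
    if seed is None:
        # Fall back to the global RandomState instance used by np.random.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An integer seeds a fresh, reproducible generator.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing generator is used as-is.
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % (seed,))


rng_a = resolve_random_state(0)
rng_b = resolve_random_state(0)
# Two generators built from the same integer seed produce the same draws.
same_draws = np.allclose(rng_a.rand(3), rng_b.rand(3))
```

Passing an integer is the usual choice when a run has to be reproducible.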
References
[1] “Tensor Decompositions for Learning Latent Variable Models”, Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky, 2014
[2] “Scalable Moment-Based Inference for Latent Dirichlet Allocation”, Chi Wang, Xueqing Liu, Yanglei Song, and Jiawei Han, 2014
[3] “Tensor Decompositions and Applications”, Tamara G. Kolda, Brett W. Bader, 2009
Attributes
n_components_ (int) The effective number of components. The number must be smaller than or equal to the number of features.
components_ (array, [n_components, n_features]) Parameters for the topic word distribution.
alpha_ (array, [n_components])
Methods
LDA Sample Generator
class tensor_lda.utils.sample_generator.LdaSampleGenerator(n_topics, n_words, min_doc_size, mean_doc_size, doc_topic_prior, topic_word_prior, random_state=None)
Generate LDA samples
Parameters: n_topics : int
Number of topics
n_words : int
Number of words in corpus
min_doc_size : int
Min word count in a document
mean_doc_size : int
Mean word count in a document
doc_topic_prior : double
Uniform Dirichlet prior of a document
topic_word_prior : double
Uniform Dirichlet prior of a topic
Attributes
topic_word_distr_ (array, [n_topics, n_words]) Topic word distribution.
Methods
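To show how the parameters above fit together, here is a minimal sketch of the LDA generative process in NumPy. It is not the LdaSampleGenerator implementation. The function name generate_lda_docs and the n_docs argument are hypothetical, and the document length rule (min_doc_size plus a Poisson draw) is an assumption.

```python
import numpy as np


def generate_lda_docs(n_docs, n_topics, n_words, min_doc_size, mean_doc_size,
                      doc_topic_prior, topic_word_prior, random_state=None):
    """Sample a document-word count matrix from the LDA generative model."""
    rng = np.random.RandomState(random_state)
    # One word distribution per topic, drawn from a symmetric Dirichlet.
    topic_word_distr = rng.dirichlet(
        np.full(n_words, topic_word_prior), size=n_topics)
    counts = np.zeros((n_docs, n_words), dtype=int)
    for d in range(n_docs):
        # Assumption: length is min_doc_size plus a Poisson draw, so the
        # expected length equals mean_doc_size.
        doc_size = min_doc_size + rng.poisson(mean_doc_size - min_doc_size)
        # Topic mixture of this document, drawn from a symmetric Dirichlet.
        doc_topic = rng.dirichlet(np.full(n_topics, doc_topic_prior))
        # Marginalising over topic assignments gives one word distribution
        # per document; all words are then drawn in a single multinomial.
        word_distr = doc_topic @ topic_word_distr
        word_distr = word_distr / word_distr.sum()
        counts[d] = rng.multinomial(doc_size, word_distr)
    return counts, topic_word_distr


counts, topic_word = generate_lda_docs(
    n_docs=50, n_topics=3, n_words=20, min_doc_size=10, mean_doc_size=30,
    doc_topic_prior=0.5, topic_word_prior=0.1, random_state=0)
```

Here counts has one row per document and one column per word, and each row of topic_word sums to one.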
Utilities
Utility functions for tensor operations
tensor_lda.utils.tensor_utils.khatri_rao_prod(a, b)
Khatri-Rao product
Compute the Khatri-Rao product of two 2-D matrices.
Parameters: a : 2D array, shape (n, k)
first matrix
b : 2D array, shape (m, k)
second matrix
Returns: matrix : 2D array, shape (n * m, k)
Khatri-Rao product of a and b
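The Khatri-Rao product is the column-wise Kronecker product, which explains the (n * m, k) output shape. The sketch below is an independent NumPy illustration, not the library code. The name khatri_rao_sketch is hypothetical, and the row ordering (row i * m + j holds a[i, r] * b[j, r]) is an assumption.

```python
import numpy as np


def khatri_rao_sketch(a, b):
    """Column-wise Kronecker product of a (n, k) and b (m, k)."""
    n, k = a.shape
    m, k_b = b.shape
    if k != k_b:
        raise ValueError("a and b must have the same number of columns")
    # out[i, j, r] = a[i, r] * b[j, r]; flattening (i, j) gives row i * m + j.
    return np.einsum("ir,jr->ijr", a, b).reshape(n * m, k)


a = np.arange(6.0).reshape(3, 2)
b = np.arange(4.0).reshape(2, 2) + 1.0
prod = khatri_rao_sketch(a, b)
# Each column of the result is the Kronecker product of the matching columns.
column_ok = np.allclose(prod[:, 1], np.kron(a[:, 1], b[:, 1]))
```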
tensor_lda.utils.tensor_utils.rank_1_tensor_3d(a, b, c)
Generate a 3-D tensor from 3 1-D vectors
Generate a 3-D tensor from three rank-one vectors a, b, and c. The returned 3-D tensor is in unfolded format.
Parameters: a : array, shape (n,)
first rank one vector
b : array, shape (n,)
second rank one vector
c : array, shape (n,)
third rank one vector
Returns: tensor : array, shape (n, n * n)
3-D tensor in unfolded format. Element (i, j, k) maps to (i, (n * k) + j).
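The index mapping above can be checked with a small NumPy sketch. This is an illustration of the documented layout rather than the library implementation; the name rank_1_unfolded is hypothetical.

```python
import numpy as np


def rank_1_unfolded(a, b, c):
    """Unfolded outer product: out[i, n * k + j] = a[i] * b[j] * c[k]."""
    n = a.shape[0]
    # Build the axes in (i, k, j) order so that a C-order reshape places
    # element (i, j, k) in column n * k + j.
    return np.einsum("i,j,k->ikj", a, b, c).reshape(n, n * n)


a = np.array([1.0, 2.0])
b = np.array([3.0, 5.0])
c = np.array([7.0, 11.0])
unfolded = rank_1_unfolded(a, b, c)
i, j, k = 1, 0, 1
# Element (1, 0, 1) sits in row 1, column 2 * 1 + 0 = 2.
element = unfolded[i, 2 * k + j]  # a[1] * b[0] * c[1] = 66.0
```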
tensor_lda.utils.tensor_utils.tensor_3d_from_matrix_vector(b, a)
Generate 3-D tensor from 2-D matrix and 1-D vector
This function is similar to tensor_3d_from_vector_matrix. The only difference is that the first argument is a 2-D matrix and the second argument is a 1-D vector.
Parameters: b : array, shape (m, n)
2-D matrix
a : array, shape (p,)
vector
Returns: tensor : array, shape (m, n * p)
3D tensor in unfolded format.
tensor_lda.utils.tensor_utils.tensor_3d_from_vector_matrix(a, b)
Generate 3-D tensor from 1-D vector and 2-D matrix
Generate a 3-D tensor from a 1-D vector a and a 2-D matrix b. The returned 3-D tensor is in unfolded format.
Parameters: a : array, shape (m,)
1-D vector
b : 2-D array, shape (n, p)
2-D matrix
Returns: tensor: array, (m, n * p)
3D tensor in unfolded format.
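The following sketch shows one way to read the two constructors: the vector indexes one mode of the tensor and the matrix indexes the other two. It assumes the same unfolding as rank_1_tensor_3d, where element (i, j, k) goes to column (size of mode 2) * k + j. That assumption is not confirmed by this reference, and both function names are hypothetical.

```python
import numpy as np


def vector_matrix_unfolded(a, b):
    """T[i, j, k] = a[i] * b[j, k], returned with shape (m, n * p)."""
    m = a.shape[0]
    n, p = b.shape
    # Axes ordered (i, k, j) so that the column index is n * k + j.
    return np.einsum("i,jk->ikj", a, b).reshape(m, n * p)


def matrix_vector_unfolded(b, a):
    """T[i, j, k] = b[i, j] * a[k], returned with shape (m, n * p)."""
    m, n = b.shape
    p = a.shape[0]
    return np.einsum("ij,k->ikj", b, a).reshape(m, n * p)


vec = np.array([1.0, 2.0])
mat = np.arange(6.0).reshape(2, 3)
t_vm = vector_matrix_unfolded(vec, mat)  # shape (2, 2 * 3)
t_mv = matrix_vector_unfolded(mat, vec)  # shape (2, 3 * 2)
```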
tensor_lda.utils.tensor_utils.tensor_3d_permute(tensor, tensor_shape, a, b, c)
Permute the mode of a 3-D tensor
This is a slow implementation to generate 3-D tensor permutations.
Parameters: tensor : 2D array, shape (n, m * k)
3D tensor in unfolded format
tensor_shape : int triple
Shape of the tensor. Since the tensor is in unfolded format, its actual 3-D shape is needed to calculate the permutation.
a : int, {1, 2, 3}
new first index
b : int, {1, 2, 3}
new second index
c : int, {1, 2, 3}
new third index
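A mode permutation on an unfolded tensor can be sketched as fold, transpose, unfold. The code below is an illustration under two assumptions that this reference does not confirm: the unfolding matches rank_1_tensor_3d (element (i, j, k) in column n2 * k + j), and a, b, c name the old modes that become the new first, second, and third modes. All function names are hypothetical.

```python
import numpy as np


def unfold(t):
    """(n1, n2, n3) array -> (n1, n2 * n3); element (i, j, k) in column n2 * k + j."""
    n1, n2, n3 = t.shape
    return t.transpose(0, 2, 1).reshape(n1, n2 * n3)


def fold(mat, shape):
    """Inverse of unfold for a tensor with the given 3-D shape."""
    n1, n2, n3 = shape
    return mat.reshape(n1, n3, n2).transpose(0, 2, 1)


def permute_unfolded(mat, shape, a, b, c):
    """Permute the modes of an unfolded tensor; a, b, c are 1-based."""
    t = fold(mat, shape)
    # Assumption: new mode 1 is old mode a, new mode 2 is old mode b,
    # and new mode 3 is old mode c.
    return unfold(np.transpose(t, (a - 1, b - 1, c - 1)))


t = np.arange(24.0).reshape(2, 3, 4)
mat = unfold(t)
permuted = permute_unfolded(mat, (2, 3, 4), 3, 1, 2)  # new shape (4, 2, 3)
```

With a, b, c = 1, 2, 3 the input is returned unchanged.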
tensor_lda.utils.tensor_utils.tensor_3d_prod(tensor, a, b, c)
Calculate product of 3D tensor with matrix on each dimension
TODO: move it to test
Parameters: tensor : 3D array, shape (n1, n2, n3)
a : array, (n1, m)
b : array, (n2, n)
c : array, (n3, p)
Returns: t_abc : array, (m, n, p)
tensor(a, b, c)
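The notation tensor(a, b, c) is commonly read as the multilinear product t_abc[x, y, z] = sum over i, j, k of tensor[i, j, k] * a[i, x] * b[j, y] * c[k, z]. The sketch below computes that with a single einsum call. It is an illustration consistent with the shapes listed above, not the library code, and the name multilinear_prod is hypothetical.

```python
import numpy as np


def multilinear_prod(tensor, a, b, c):
    """Contract each mode of a (n1, n2, n3) tensor with a, b, and c."""
    # i, j, k run over the tensor modes; x, y, z index the output.
    return np.einsum("ijk,ix,jy,kz->xyz", tensor, a, b, c)


rng = np.random.RandomState(0)
tensor = rng.rand(2, 3, 4)
a = rng.rand(2, 5)
b = rng.rand(3, 2)
c = rng.rand(4, 3)
t_abc = multilinear_prod(tensor, a, b, c)  # shape (5, 2, 3)

# Contracting with identity matrices leaves the tensor unchanged.
unchanged = multilinear_prod(tensor, np.eye(2), np.eye(3), np.eye(4))
```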