Tensor network implementations for transformers and LLMs

Are there examples for implementing Tensor networks into transformers and LLMs?

This nice recent paper by Jordan Taylor shows how components of neural architectures such as transformers can be understood in a tensor network formalism and notation:
https://arxiv.org/abs/2402.01790
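To make the correspondence concrete, here is a minimal sketch (my own illustration, not code from the paper) of how the attention-score block of a transformer head can be written as a single tensor contraction. The toy sizes and tensor names are made up for the example:

```python
# Viewing single-head attention scores as one tensor contraction.
import numpy as np

d_model, d_head, seq = 8, 4, 5
rng = np.random.default_rng(0)

X   = rng.normal(size=(seq, d_model))    # token embeddings
W_Q = rng.normal(size=(d_model, d_head)) # query projection
W_K = rng.normal(size=(d_model, d_head)) # key projection

# Standard two-step computation: Q = X W_Q, K = X W_K, scores = Q K^T
scores_two_step = (X @ W_Q) @ (X @ W_K).T

# Tensor network view: the same object is a single contraction of four tensors
# over the shared indices (the two d_model legs and the d_head leg).
scores_contraction = np.einsum("id,dh,eh,je->ij", X, W_Q, W_K, X)

assert np.allclose(scores_two_step, scores_contraction)
```

The einsum string is essentially a tensor network diagram written inline: every repeated index is a contracted leg, and the free indices `i`, `j` are the open legs of the resulting score matrix.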

There are also quite a few papers in the machine learning community exploring the compression of neural architectures using the “tensor train” (a.k.a. MPS) concept and related concepts such as the CP decomposition of tensors; a few representative papers are listed further down in this thread.

You can find many more by searching Google Scholar for terms like “tensor train” together with “neural”, or “tensorized” together with “neural”, or by following the references in papers like those above.
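As a rough sketch of the tensor-train idea these compression papers build on (not any particular paper's algorithm), one can reshape a dense weight matrix into a higher-order tensor and compress it with sequential truncated SVDs (TT-SVD). The sizes and the rank cap below are arbitrary toy choices:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose `tensor` into TT (MPS) cores, capping every bond dimension at max_rank."""
    dims = tensor.shape
    cores, r_prev = [], 1
    remainder = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(remainder, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))  # core k: (r_{k-1}, dims[k], r_k)
        remainder = np.diag(S[:r]) @ Vt[:r, :]
        if k < len(dims) - 2:
            remainder = remainder.reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(remainder.reshape(r_prev, dims[-1], 1))
    return cores

# Toy example: a 256 x 256 weight matrix viewed as an order-8 tensor with legs of size 4.
W = np.random.default_rng(1).normal(size=(256, 256))
cores = tt_svd(W.reshape([4] * 8), max_rank=20)

print("dense parameters:", W.size)
print("TT parameters:   ", sum(c.size for c in cores))

# Contract the cores back together to check the approximation error.
approx = cores[0]
for c in cores[1:]:
    approx = np.tensordot(approx, c, axes=1)  # contract the shared bond index
approx = approx.reshape(W.shape)
print("relative error:  ", np.linalg.norm(W - approx) / np.linalg.norm(W))
```

For a random matrix like this the truncation error is large; the compression papers rely on trained weight matrices having structure that small TT ranks can capture, often followed by some retraining of the compressed model.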


Thank you. What are the main challenges in developing explainability metrics for tensor network generative AI models?

Papers:
[01] Ran, S., et al. 2023. https://spj.science.org/doi/full/10.34133/icomputing.0061
[02] Tomut, A., et al. 2024. CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks. https://arxiv.org/abs/2401.14109
[03] Abronin, V., et al. 2024. TQCompressor: improving tensor decomposition methods in neural networks via permutations. https://arxiv.org/abs/2401.16367

What would be some explainability metrics you are thinking of?

To my mind, the field of machine learning is still in the early days of developing the right theory and concepts, i.e. still figuring out what the right questions even are and what framework can yield clear answers.

My belief is that tensor networks can help with this. For example, a well-defined question could be “for a particular supervised learning problem, how many samples from the true distribution are needed to reach a prediction accuracy of X%?”. We took an early stab at this question in Modeling Sequences with Quantum States: A Look Under the Hood (https://arxiv.org/abs/1910.07425), but there is still much to do to generalize that approach and understand which properties of distributions are the relevant ones.
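As a toy illustration of that kind of well-defined question (not the tensor network approach from the paper), here is a minimal numpy sketch with made-up distribution parameters and a deliberately simple nearest-centroid model: sample from a known two-class Gaussian distribution, fit on increasingly many training samples, and compare the test accuracy to the Bayes-optimal accuracy, which is computable exactly because the distribution is known.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
mu0, mu1, sigma, dim = -1.0, 1.0, 2.0, 10  # made-up distribution parameters

def sample(n):
    """Draw n labeled samples from the known two-class isotropic Gaussian mixture."""
    y = rng.permutation(np.repeat([0, 1], n // 2))   # balanced labels
    means = np.where(y[:, None] == 0, mu0, mu1)
    x = rng.normal(loc=means, scale=sigma, size=(n, dim))
    return x, y

# Because the true distribution is known, the best achievable accuracy is computable.
d = sqrt(dim) * abs(mu1 - mu0)
print(f"Bayes-optimal accuracy: {0.5 * (1 + erf(d / (2 * sigma * sqrt(2)))):.3f}")

x_test, y_test = sample(20_000)
for n_train in [10, 100, 1_000, 10_000]:
    x_tr, y_tr = sample(n_train)
    # Nearest-centroid classifier: assign each test point to the closer class mean.
    c0, c1 = x_tr[y_tr == 0].mean(axis=0), x_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(x_test - c1, axis=1) < np.linalg.norm(x_test - c0, axis=1)).astype(int)
    print(f"n_train={n_train:>6}  test accuracy={(pred == y_test).mean():.3f}")
```

The point of the exercise is only that, with a known distribution, “how many samples to reach X% accuracy” has a sharp empirical answer to compare theories against.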

Another opinion I have, separate from tensor networks or architectures, is that the ML field can only make deep progress by studying problems where the true distribution is known or can be estimated. Learning from “found” data like text from the internet yields impressive demos, but I question how much scientific progress is being made by doing that. I would withdraw my skepticism if the data were analyzed so that researchers could characterize the distribution it comes from, but I don’t think that’s generally the case.

So I think data from scientific domains (e.g. samples from a quantum system or a classical stat mech system) are a potential gold mine for ML research and should be used even more in studies. This is already happening, which is good news I think.