Tensor network implementations for transformers and LLMs

Are there examples for implementing Tensor networks into transformers and LLMs?

This nice recent paper by Jordan Taylor shows how components of neural architectures such as transformers can be understood in a tensor network formalism and notation:
https://arxiv.org/abs/2402.01790
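To make the correspondence concrete, here is a minimal sketch (my own illustration, not code from the paper) of how the attention-score block of a transformer head can be written as a single tensor contraction. The toy sizes and tensor names are made up for the example:

```python
# Viewing single-head attention scores as one tensor contraction.
import numpy as np

d_model, d_head, seq = 8, 4, 5
rng = np.random.default_rng(0)

X   = rng.normal(size=(seq, d_model))    # token embeddings
W_Q = rng.normal(size=(d_model, d_head)) # query projection
W_K = rng.normal(size=(d_model, d_head)) # key projection

# Standard two-step computation: Q = X W_Q, K = X W_K, scores = Q K^T
scores_two_step = (X @ W_Q) @ (X @ W_K).T

# Tensor network view: the same object is a single contraction of four tensors
# over the shared indices (the two d_model legs and the d_head leg).
scores_contraction = np.einsum("id,dh,eh,je->ij", X, W_Q, W_K, X)

assert np.allclose(scores_two_step, scores_contraction)
```

The einsum string is essentially a tensor network diagram written inline: every repeated index is a contracted leg, and the free indices `i`, `j` are the open legs of the resulting score matrix.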

There are also quite a few papers in the machine learning community exploring the compression of neural architectures using the “tensor train” (a.k.a. MPS) concept and related concepts such as the CP decomposition of tensors; a few representative papers are listed further down in this thread.

You can find many more by searching Google Scholar for terms like “tensor train” together with “neural”, or “tensorized” together with “neural”, or by following the references in papers like those above.
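As a rough sketch of the tensor-train idea these compression papers build on (not any particular paper's algorithm), one can reshape a dense weight matrix into a higher-order tensor and compress it with sequential truncated SVDs (TT-SVD). The sizes and the rank cap below are arbitrary toy choices:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose `tensor` into TT (MPS) cores, capping every bond dimension at max_rank."""
    dims = tensor.shape
    cores, r_prev = [], 1
    remainder = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(remainder, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))  # core k: (r_{k-1}, dims[k], r_k)
        remainder = np.diag(S[:r]) @ Vt[:r, :]
        if k < len(dims) - 2:
            remainder = remainder.reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(remainder.reshape(r_prev, dims[-1], 1))
    return cores

# Toy example: a 256 x 256 weight matrix viewed as an order-8 tensor with legs of size 4.
W = np.random.default_rng(1).normal(size=(256, 256))
cores = tt_svd(W.reshape([4] * 8), max_rank=20)

print("dense parameters:", W.size)
print("TT parameters:   ", sum(c.size for c in cores))

# Contract the cores back together to check the approximation error.
approx = cores[0]
for c in cores[1:]:
    approx = np.tensordot(approx, c, axes=1)  # contract the shared bond index
approx = approx.reshape(W.shape)
print("relative error:  ", np.linalg.norm(W - approx) / np.linalg.norm(W))
```

For a random matrix like this the truncation error is large; the compression papers rely on trained weight matrices having structure that small TT ranks can capture, often followed by some retraining of the compressed model.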


Thank you. What are the main challenges in developing explainability metrics for tensor network generative AI models?

Papers:
[01] Ran, S., et al. 2023. https://spj.science.org/doi/full/10.34133/icomputing.0061
[02] Tomut, A., et al. 2024. CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks. https://arxiv.org/abs/2401.14109
[03] Abronin, V., et al. 2024. TQCompressor: improving tensor decomposition methods in neural networks via permutations. https://arxiv.org/abs/2401.16367

What would be some explainability metrics you are thinking of?

To my mind, the field of machine learning is still in the early days of developing the right theory and concepts, i.e. still figuring out what the right questions even are and what framework can yield clear answers.

My belief is that tensor networks can help with this. For example, a well-defined question could be “for a particular supervised learning problem, how many samples from the true distribution are needed to reach a prediction accuracy of X%?”. We took an early stab at this question in Modeling Sequences with Quantum States: A Look Under the Hood (https://arxiv.org/abs/1910.07425), but there is still much to do to generalize that approach and understand which properties of distributions are the relevant ones.
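As a toy illustration of that kind of well-defined question (not the tensor network approach from the paper), here is a minimal numpy sketch with made-up distribution parameters and a deliberately simple nearest-centroid model: sample from a known two-class Gaussian distribution, fit on increasingly many training samples, and compare the test accuracy to the Bayes-optimal accuracy, which is computable exactly because the distribution is known.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
mu0, mu1, sigma, dim = -1.0, 1.0, 2.0, 10  # made-up distribution parameters

def sample(n):
    """Draw n labeled samples from the known two-class isotropic Gaussian mixture."""
    y = rng.permutation(np.repeat([0, 1], n // 2))   # balanced labels
    means = np.where(y[:, None] == 0, mu0, mu1)
    x = rng.normal(loc=means, scale=sigma, size=(n, dim))
    return x, y

# Because the true distribution is known, the best achievable accuracy is computable.
d = sqrt(dim) * abs(mu1 - mu0)
print(f"Bayes-optimal accuracy: {0.5 * (1 + erf(d / (2 * sigma * sqrt(2)))):.3f}")

x_test, y_test = sample(20_000)
for n_train in [10, 100, 1_000, 10_000]:
    x_tr, y_tr = sample(n_train)
    # Nearest-centroid classifier: assign each test point to the closer class mean.
    c0, c1 = x_tr[y_tr == 0].mean(axis=0), x_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(x_test - c1, axis=1) < np.linalg.norm(x_test - c0, axis=1)).astype(int)
    print(f"n_train={n_train:>6}  test accuracy={(pred == y_test).mean():.3f}")
```

The point of the exercise is only that, with a known distribution, “how many samples to reach X% accuracy” has a sharp empirical answer to compare theories against.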

Another opinion I have, separate from tensor networks or architectures, is that the ML field can only make deep progress by studying problems where the true distribution is known or can be estimated. Learning from “found” data like text from the internet yields impressive demos, but I question how much scientific progress is being made by doing that. I would withdraw my skepticism if the data were analyzed so that researchers could characterize the distribution it comes from, but I don’t think that’s generally the case.

So I think data from scientific domains (e.g. samples from a quantum system or a classical stat mech system) are a potential gold mine for ML research and should be used even more in studies. This is already happening, which is good news I think.