Retrieve bipartite reduced density matrix

(1) To clarify my question: I currently know of one way in Julia to construct the reduced density matrix for an arbitrary set of sites, but its cost is exponential in the number of sites left untraced (in the example below, a two-site reduced density matrix, this is not too expensive to compute):

In C++: ITensor

Constructing EE in arbitrary sites: How do I evaluate Block Entanglement for sub-blocks that do not extend to the end of the lattice? - ITensor Support Q&A

For the Julia version, see the following discussion: Two-site reduced density matrix

For this method, suppose I want to compute an expectation value such as Tr_A(rho_A Q), where A is the set of sites that are not traced out of the reduced density matrix rho_A (in the two-site example above, the two remaining sites) and Q is an MPO acting only on sites contained in A. Am I right to presume that all I need to do is make sure the site indices of rho_A and Q match in ITensor (Julia)?
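To make the index matching concrete, here is a minimal plain-Python sketch of what I mean by Tr_A(rho_A Q). It is not ITensor code: the three-qubit state, the choice of A as the first two sites, and all helper names are made up for illustration. It traces out the third site, contracts rho_A with an operator Q on A, and cross-checks the result against the expectation value taken on the full state.

```python
from itertools import product

d = 2  # local dimension of each site (qubits), chosen for illustration

# Hypothetical normalized 3-site state, stored as {(s1, s2, s3): amplitude}.
psi = {(0, 0, 0): 0.5, (1, 1, 0): 0.5, (0, 1, 1): 0.5, (1, 0, 1): 0.5}

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def amp(s1, s2, s3):
    """Amplitude of the basis state |s1 s2 s3>, zero if not stored."""
    return psi.get((s1, s2, s3), 0.0)


def reduced_density_matrix():
    """rho_A[(s1,s2),(t1,t2)] = sum_{s3} psi(s1,s2,s3) * conj(psi(t1,t2,s3))."""
    rho = {}
    for s1, s2, t1, t2 in product(range(d), repeat=4):
        rho[(s1, s2), (t1, t2)] = sum(
            amp(s1, s2, s3) * amp(t1, t2, s3).conjugate() for s3 in range(d)
        )
    return rho


def two_site_operator(a, b):
    """Q[(s1,s2),(t1,t2)] = a[s1][t1] * b[s2][t2], a tensor product acting on A."""
    return {
        ((s1, s2), (t1, t2)): a[s1][t1] * b[s2][t2]
        for s1, s2, t1, t2 in product(range(d), repeat=4)
    }


def trace_rho_q(rho, q):
    """Tr_A(rho_A Q): the column index of rho is contracted with the row index of Q."""
    return sum(rho[s, t] * q[t, s] for (s, t) in rho)


def full_expectation(q):
    """<psi| Q (x) identity |psi> on the full state, used only as a cross-check."""
    total = 0.0
    for s1, s2, t1, t2, s3 in product(range(d), repeat=5):
        total += amp(s1, s2, s3).conjugate() * q[(s1, s2), (t1, t2)] * amp(t1, t2, s3)
    return total


rho_A = reduced_density_matrix()
for name, q in [("XX", two_site_operator(X, X)), ("ZZ", two_site_operator(Z, Z))]:
    print(name, trace_rho_q(rho_A, q), full_expectation(q))
```

The only requirement in this toy version is that both index pairs of Q range over the same sites as those of rho_A. My question is whether the same holds in ITensor once the site indices (and their primes) of the two objects agree.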

(2) To compute the bipartite (left/right partition) entanglement entropy, all you need to do is bring the MPS into mixed canonical form (choosing a gauge center) and then perform an SVD of the gauge-center tensor across the bond that defines the partition. The non-negative singular values give the entanglement spectrum, as in the code below:

C++ version: ITensor

If all we are interested in is a bipartition, is there a way to take advantage of this by extracting the relevant part of the MPS from the U matrix (for the left partition) or the V matrix (for the right partition) returned by the SVD in the MPS code:

U, S, V = svd( psi, {indices of U} )

and then construct rho_left out of, say, the eigenstates contained in U together with the nonzero singular values in S (since the operation above is essentially a Schmidt decomposition)? If I construct the reduced density matrix out of U and S, does ITensor (Julia) retain the index information of the original MPS psi, so that we can perform operations like Tr_left(rho_left Q)?
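For concreteness, this is the Schmidt-decomposition picture I have in mind, written out as a sketch. The symbols s_alpha, |u_alpha> and |v_alpha> are my own notation, not ITensor names, and I am assuming the left Schmidt vectors |u_alpha> are what U encodes once it is combined with the MPS tensors to its left:

```latex
% Schmidt decomposition across the chosen bond
|\psi\rangle = \sum_{\alpha} s_{\alpha}\, |u_{\alpha}\rangle_{\mathrm{left}} \otimes |v_{\alpha}\rangle_{\mathrm{right}},
\qquad s_{\alpha} \ge 0, \qquad \sum_{\alpha} s_{\alpha}^{2} = 1

% Reduced density matrix of the left block
\rho_{\mathrm{left}} = \mathrm{Tr}_{\mathrm{right}}\, |\psi\rangle\langle\psi|
  = \sum_{\alpha} s_{\alpha}^{2}\, |u_{\alpha}\rangle\langle u_{\alpha}|

% Expectation value of an operator Q supported on the left block
\mathrm{Tr}_{\mathrm{left}}\!\left(\rho_{\mathrm{left}}\, Q\right)
  = \sum_{\alpha} s_{\alpha}^{2}\, \langle u_{\alpha}|\, Q\, |u_{\alpha}\rangle

% Bipartite entanglement entropy from the same spectrum
S_{\mathrm{vN}} = -\sum_{\alpha} s_{\alpha}^{2} \ln s_{\alpha}^{2}
```

Here the sum over alpha runs only over the bond dimension at the cut, which is why I am hoping the dense matrix rho_left never has to be formed explicitly.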

(3) If the approach outlined in Q(2) works, will it be classically efficient (that is, will it avoid the exponential scaling of the method in Q(1))?