Frequent recompilation and low computational efficiency in ITensors.jl

I would suggest profiling your code to find out what takes the most time, and then reducing your biggest slowdown to a minimal example.
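
As a rough sketch of what that profiling could look like, here is a minimal example using Julia's built-in `@time` macro and the `Profile` standard library. The function `slow_step` is a hypothetical stand-in for one iteration of your calculation, not anything from your code or from ITensors.jl, so swap in your own function.

```julia
using Profile

# Hypothetical stand-in for one iteration of the real calculation.
function slow_step(n)
    A = rand(n, n)
    return sum(A * A')
end

# The first call includes compilation. On recent Julia versions @time
# also reports how much of the elapsed time was spent compiling.
@time slow_step(300)

# The second call measures the run time alone.
t = @elapsed slow_step(300)
println("second call took ", t, " seconds")

# Collect samples over several calls and print a call tree showing
# where the time is actually spent.
Profile.clear()
@profile for _ in 1:20
    slow_step(300)
end
Profile.print(maxdepth = 12)
```

If the first `@time` is much slower than the second call every time you change a parameter, that points to recompilation rather than the calculation itself.
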
Another thing to keep in mind is that for loops (e.g. for alpha in alphas and for theta in thetas) are not parallelized automatically; they run serially unless you parallelize them explicitly.
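
One possible way to parallelize such a loop is `Threads.@threads` from Base. This is only a sketch: `compute` and the parameter grids are hypothetical placeholders for your own calculation, and it assumes each (alpha, theta) pair is independent of the others.

```julia
using Base.Threads

# Hypothetical parameter grids standing in for alphas and thetas.
alphas = range(0, 1; length = 4)
thetas = range(0, 2pi; length = 8)

# Hypothetical stand-in for the work done per (alpha, theta) pair.
compute(alpha, theta) = alpha * cos(theta)

# Preallocate the output; each iteration writes to its own entry,
# so the threads do not interfere with each other.
results = zeros(length(alphas), length(thetas))

@threads for i in eachindex(alphas)
    for j in eachindex(thetas)
        results[i, j] = compute(alphas[i], thetas[j])
    end
end

println("ran on ", nthreads(), " thread(s)")
```

Julia has to be started with more than one thread (e.g. `julia -t 4`) for this to help; otherwise the loop still runs serially.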

One last technical comment: you can also pass keyword arguments to state definitions, for example:

ITensors.state(::StateName"FUp", ::SiteType"Replica"; phi) = [1.0, 1.0, 1.0, 0.0, 0.0, 1.0]  # phi is unused here, but the keyword must be accepted
ITensors.state(::StateName"FDownk", ::SiteType"Replica"; phi) = [1.0, 0.0, 0.0, exp(1im*phi), exp(-1im*phi), 1.0]
# ...
# Example:
bndk = MPS([state(sites[n], (n ≤ NA ? "FDownk" : "FUp"); phi=2pi*k/(NA+1)) for n=1:N])

Also note that productMPS has been replaced with just MPS.