npsm 새물리 New Physics : Sae Mulli

pISSN 0374-4914 eISSN 2289-0041


Research Paper

New Phys.: Sae Mulli 2022; 72: 487-494

Published online July 31, 2022

Copyright © New Physics: Sae Mulli.

Minimal Neural Network to Learn the Metal-insulator Transition in the Dynamical Mean-field Theory

Hyejin Kim*, Dongkyu Kim, Dong-Hee Kim

Department of Physics and Photon Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea

Correspondence to:*E-mail:

Received: April 21, 2022; Revised: June 15, 2022; Accepted: June 19, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

We present a minimal neural network model that learns to classify the metallic and insulating phases from the real-frequency hybridization function computed in the dynamical mean-field theory for the repulsive Hubbard model at half-filling. The resulting neural network discriminates the phases essentially by reading the presence of the quasiparticle peak. The pattern observed in the weight matrix of neural connectivity allows us to write down a simple form of an indicator that can precisely detect the transition point using only the bath parameters that build the Anderson impurity model. The proposed transition indicator is very sensitive to the emergence of a zero-energy orbital in the quantum bath. We demonstrate the accuracy of the indicator in the discrete bath description with a few orbitals for the exact diagonalization solver.

Keywords: Neural network, Machine learning, Metal-insulator transition, Dynamical mean-field theory

Machine learning with a neural network has attracted increasing attention in various fields of science, including condensed matter and statistical physics[1-5]. The neural network usually works as a black-box model where many hidden variables inside are trained with large input data samples to produce the outputs of desired predictions. The applicability of the data-driven approach has been extensively examined in various subjects, such as classifying phases of matter[6-12], accelerating numerical simulations[13-17], and approximating quantum wave functions[18-21]. On the other hand, dealing with the lack of transparency in the black-box model is another important direction of research. Understanding how the machine interprets the data and what particular information it extracts from the data can potentially help to gain physical insight from such data-driven predictions. For instance, previous works studied the physical justification of machine predictions[22-27], the characterization of the phases of matter[28-31], and the extraction of an order parameter for phase transitions[32-38]. In this paper, we attempt to interpret the machine learning of a metal-insulator transition trained with the data of the dynamical mean-field theory for the repulsive Hubbard model at half-filling.

The dynamical mean-field theory (DMFT)[39] maps a lattice model of the Hubbard Hamiltonian onto the Anderson impurity model (AIM) with a quantum bath determined in a self-consistent way. DMFT provides an exact solution in the limit of infinite dimensions and has successfully described the phase diagram of the metal-insulator transition in the single-band repulsive Hubbard model. Several machine learning schemes have been applied to DMFT and AIM[40-43]. In particular, very high accuracy in classifying the metallic and insulating phases was reported in the previous supervised learning of the bath parameters representing the bare hybridization function[41]. The previous work used exact diagonalization to solve AIM. The machine classification was up to 99.6% accurate in the previous work, even though exact diagonalization considers only a few bath orbitals, which represent the quantum bath with limited spectral resolution.

The metal-insulator transition in infinite dimensions is already well established in DMFT with the known indicators, such as the double occupancy and the emergence of the quasiparticle peak in the spectral gap. Machine learning for phase classification in such a thoroughly studied phenomenon is not expected to reveal new physics. However, from the point of view of machine learning, it can still be a good example system to study how the machine understands the phenomenon. Our aim in this study is to open the black box and see how the data-driven predictions mimic the known physics. In particular, we want to see how the neural network extracts a relevant spectral feature to sharply detect a phase transition point, and to propose a simple order-parameter-like quantity based on the observed mechanism of the machine prediction.

Our strategy is to downsize the hidden layer of the feed-forward neural network to find the minimal representation of the machinery[26,27]. We employ the real-frequency hybridization function, computed using the numerical renormalization group solver, as the training dataset. It turns out that the phase classification does not lose its accuracy even when we decrease the size of the hidden layer to the minimum, at which point the network becomes equivalent to logistic regression. The pattern observed in the weight matrix of the neural connectivity indicates that the trained neural network mainly detects the existence of the quasiparticle peak in the spectral data given as an input. By analyzing the network output function, we find that a simple transition indicator can be written in terms of bath parameters, which is directly applicable to the discrete-orbital formulation of AIM for exact diagonalization. The transition points identified by this simple indicator agree well with the phase diagram constructed using the conventional quantity of double occupancy. The indicator inspired by machine learning is very sensitive to the presence of a zero-energy orbital in the quantum bath, explaining its ability to discriminate the phases across the metal-insulator transition.

1. Metal-insulator transition at infinite dimensions

First, let us briefly describe the DMFT phase diagram for the half-filled single-band Hubbard model with repulsive interactions at infinite dimensions [39,44]. The Hamiltonian of the single-band Hubbard model can be written as

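$$
H = -t \sum_{\langle i,j \rangle, \sigma} \left( c^\dagger_{i\sigma} c_{j\sigma} + c^\dagger_{j\sigma} c_{i\sigma} \right) + U \sum_i n_{i\uparrow} n_{i\downarrow} - \mu \sum_{i,\sigma} n_{i\sigma}, \tag{1}
$$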

where $\sigma \in \{\uparrow, \downarrow\}$ denotes the spin index of two-component fermions, the interaction strength U is positive, and t is the hopping strength. The chemical potential µ is fixed at U/2 because we only consider the half-filled system in this study. The system undergoes a first-order phase transition at low temperatures, where it is metallic at weak interactions and insulating at strong interactions. In the DMFT phase diagram, a coexistence region is established at $U_{c1} < U < U_{c2}$, where both metallic and insulating solutions are possible at the same U. From previous DMFT studies in infinite dimensions[39], it is known that $U_{c1} \approx 2.3D$ and $U_{c2} \approx 2.9D$, where D is the half-width of the semicircular density of states. In Fig. 1, we display the spectral features across the transition computed in our calculations with the numerical renormalization group solver. When a solution moves into the coexistence region from one side of the phase diagram, it retains the character of the phase it came from. Thus, when the parameters are changed very gently to move into the coexistence region, the spectral feature and the quasiparticle weight depend on whether the sweep runs from the metal to the insulator or the other way around.

Figure 1. (Color online) (a) Evolution of the hybridization function Im[Δ(ω)] by tuning the on-site interaction U for the Bethe lattice at half-filling. There exists a phase transition from the metallic to the insulating state (increasing U, $U_{c2} \approx 2.9$), as well as from the insulating to the metallic state (decreasing U, $U_{c1} \approx 2.3$). (b) Quasiparticle weight with data from DMFT-NRG. The blue (black) line represents the phase transition from the metallic (insulating) to the insulating (metallic) state, and the green lines show the phase transition points measured by the quasiparticle weight.

Local quantum fluctuations are fully considered in the conventional single-site DMFT by self-consistently mapping the lattice site of the Hubbard model into a local impurity surrounded by a quantum bath. The resulting single-site Anderson impurity model (AIM) is then written as follows:

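$$
H_{\mathrm{AIM}} = \sum_{k,\sigma} \epsilon_k a^\dagger_{k\sigma} a_{k\sigma} + \sum_{k,\sigma} \left( V_k c^\dagger_{\sigma} a_{k\sigma} + V_k^* a^\dagger_{k\sigma} c_{\sigma} \right) + U n_{\uparrow} n_{\downarrow} - \mu \sum_{\sigma} n_{\sigma}, \tag{2}
$$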

where $a^\dagger_{k\sigma}$ and $a_{k\sigma}$ denote fermionic creation and annihilation operators at a bath orbital k associated with energy $\epsilon_k$, respectively, and $V_k$ is the strength of coupling to the impurity site. The central part of DMFT is to solve the Anderson impurity model, which can be done for instance by using quantum Monte Carlo (QMC)[45-47], numerical renormalization group (NRG)[48], or exact diagonalization (ED)[49] methods. The solution of AIM provides a self-energy feedback to the lattice Green's function, updating the quantum bath to build AIM for the next iteration proceeding toward self-consistency. Technical details depend on the type of the impurity solver.

2. Hybridization function as a training data

The hybridization function works as the Weiss mean field to construct the quantum bath of AIM, bridging the lattice model and the corresponding impurity model. In the Bethe lattice, the hybridization function Δ(z) is related to the local Green's function G(z) as $\Delta(z) = (D/2)^2 G(z)$, and therefore the imaginary part of the real-frequency hybridization function $\mathrm{Im}[\Delta(\omega + i0^+)]$ is proportional to the spectral function.
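
As a rough illustration of the Bethe-lattice relation above, the following sketch evaluates it in the non-interacting limit (U=0), where the local Green's function of the semicircular density of states is known in closed form. The non-interacting limit, the function names, and the broadening `eta` are assumptions made here for illustration; the paper's data come from interacting DMFT-NRG solutions.

```python
import cmath
import math


def bethe_green_u0(z: complex, half_width: float = 1.0) -> complex:
    """Local Green's function of the non-interacting Bethe lattice
    (semicircular density of states with half-width D)."""
    root = cmath.sqrt(z * z - half_width ** 2)
    g = 2.0 * (z - root) / half_width ** 2
    # Select the retarded branch: Im G must be negative for Im z > 0.
    if g.imag > 0.0:
        g = 2.0 * (z + root) / half_width ** 2
    return g


def bethe_hybridization(g: complex, half_width: float = 1.0) -> complex:
    """Bethe-lattice relation Delta(z) = (D/2)^2 G(z)."""
    return (half_width / 2.0) ** 2 * g


D = 1.0
eta = 1e-6  # small positive imaginary part standing in for i0^+
g0 = bethe_green_u0(complex(0.0, eta), D)
delta0 = bethe_hybridization(g0, D)

# Spectral function at zero frequency and the matching value of -Im[Delta]/pi.
spectral0 = -g0.imag / math.pi
hyb_spectral0 = -delta0.imag / math.pi
```

Here `hyb_spectral0` equals `(D/2)**2 * spectral0`, which is the proportionality stated in the text, and `spectral0` approaches the semicircle value 2/(πD) at the band center.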

We use Δ(ω) as our training data for a neural network to learn the difference between the metallic and insulating phases. We employ the numerical renormalization group (NRG) method[48] to solve AIM at zero temperature and prepare the input data of the hybridization function Δ(ω) for the neural network. We use the open-source NRG code[50, 51]. NRG describes the low-frequency behavior accurately and allows direct access to a real-frequency spectral function. Note that the previous work[41] also considered the spectral function as input data but used the ED method and an expansion with the Legendre function. Because NRG has much finer low-frequency resolution, we expect that NRG-generated training data can deliver more complete information to the machine, which would lead to a clearer pattern in the trained neural network. This fits better with our purpose of interpreting how the network makes its prediction based on the data.

We prepare the training data set of Δ(ω) at various values of U strictly in the metallic ($U < U_{c1}$) and insulating ($U > U_{c2}$) phases, away from the coexistence region, to put an exclusive label on the data for supervised learning. Specifically, the input data set for training consists of 300 samples of Δ(ω): 150 generated at equally spaced values of U in the range $0.6 \le U \le 2.1$ in the metallic phase and 150 in the range $3.0 \le U \le 4.5$ in the insulating phase. For each U, the input data $\Delta = (\mathrm{Re}[\Delta(\omega_1)], \ldots, \mathrm{Re}[\Delta(\omega_N)], \mathrm{Im}[\Delta(\omega_1)], \ldots, \mathrm{Im}[\Delta(\omega_N)])$ is obtained in the frequency range $10^{-5} \le \omega_n \equiv \omega_0 \lambda^n \le 15.0$ on a logarithmically spaced grid with the ratio λ=1.01 and $\omega_0 = 10^{-5}$ given by the NRG solver.
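
The layout of one input sample can be sketched as follows. The grid parameters are those quoted above; the function names and the single-pole `toy_delta` are hypothetical placeholders for the NRG output, and only the positive-frequency branch of the grid is generated in this sketch.

```python
def log_frequency_grid(omega0=1e-5, lam=1.01, omega_max=15.0):
    """Frequencies omega_n = omega0 * lam**n up to omega_max."""
    grid = []
    n = 0
    while omega0 * lam ** n <= omega_max:
        grid.append(omega0 * lam ** n)
        n += 1
    return grid


def build_input_vector(delta_values):
    """Concatenate real parts followed by imaginary parts, as in the text."""
    return [d.real for d in delta_values] + [d.imag for d in delta_values]


def toy_delta(omega, pole=0.0, weight=0.25, eta=0.05):
    """Hypothetical stand-in for Delta(omega): one broadened pole."""
    return weight / complex(omega - pole, eta)


grid = log_frequency_grid()
sample = build_input_vector([toy_delta(w) for w in grid])
```

With this layout, `sample` has twice as many entries as `grid`: the first half feeds the input nodes associated with Re[Δ(ω_n)] and the second half those associated with Im[Δ(ω_n)].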

While the training data is prepared exclusively in the region of metallic and insulating phases, the test data set contains the coexistence area of the phase diagram. The test data set includes 229 samples representing the metal-to-insulator transition generated at equally spaced U in the range $0.6 \le U \le U_{c2}$ and 226 samples representing the insulator-to-metal transition generated in the range $U_{c1} \le U \le 4.5$. To go inside the coexistence area, our DMFT calculation starts from the deep metallic and insulating phases and gradually adjusts U toward the coexistence area by restarting the DMFT iteration with the solution obtained at the previous U.
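
The restart protocol described above can be sketched as a sweep in which each calculation is seeded with the previous solution. The `toy_solver` below is a hypothetical stand-in that only mimics the bistability between $U_{c1}$ and $U_{c2}$; it is not a DMFT solver, and the step size of 0.1 is chosen for illustration only.

```python
def sweep_with_restart(solve, u_values, initial_guess):
    """Solve at each U in order, seeding every run with the previous solution."""
    results = []
    guess = initial_guess
    for u in u_values:
        guess = solve(u, guess)
        results.append((u, guess))
    return results


def toy_solver(u, guess, uc1=2.3, uc2=2.9):
    """Hypothetical bistable stand-in for a converged DMFT loop.

    Returns 1.0 for a metallic solution and 0.0 for an insulating one.
    A metallic seed survives up to uc2; an insulating seed survives down to uc1.
    """
    if guess > 0.5:
        return 1.0 if u < uc2 else 0.0
    return 0.0 if u > uc1 else 1.0


u_up = [0.6 + 0.1 * i for i in range(40)]   # 0.6, 0.7, ..., 4.5
u_down = list(reversed(u_up))

metal_to_insulator = sweep_with_restart(toy_solver, u_up, initial_guess=1.0)
insulator_to_metal = sweep_with_restart(toy_solver, u_down, initial_guess=0.0)
```

At U near 2.6, inside the coexistence window, the upward sweep still reports the metallic solution while the downward sweep reports the insulating one, which is the hysteresis that the two test sets are meant to capture.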

3. Training artificial neural network to learn the spectral features

We perform supervised learning for a feed-forward neural network model to discriminate the metallic and insulating phases with the prepared input dataset. We consider a simple network structure with a single hidden layer sketched in Fig. 2(a). The neural network receives the hybridization function $\Delta(\omega_n)$ as an input, processes it, and then outputs a value used to judge whether the data belongs to the metallic or the insulating phase. The sigmoid function $\sigma(x) = 1/(1+\exp(-x))$ is employed as an activation function between each layer. The data processing in the neural network can be written as

$$
P = \sigma\!\left( W^{(2)} \, \sigma\!\left( W^{(1)} \Delta + b^{(1)} \right) + b^{(2)} \right), \tag{3}
$$

Figure 2. (Color online) (a) Structure of the fully connected neural network. The input layer receives the hybridization function, and the output layer gives the probability of each phase. The sigmoid function σ(x) is employed as an activation function between each layer.
(b) Test outputs $P_{\mathrm{metal}}$ with data from DMFT-NRG. The blue (black) line represents the phase transition from the metallic (insulating) to the insulating (metallic) state, and the green lines show the phase transition points calculated from the quasiparticle weight. (top) Neural networks with the number of hidden nodes set to $N_h=100$ and $N_h=10$. (bottom) Logistic regression without a hidden layer.


where $W^{(i)}$ and $b^{(i)}$ are the weight matrix and the bias vector of each layer, which are unknowns to be determined by the training procedure. The output vector P can be written as $P^T = (P_{\mathrm{metal}}, P_{\mathrm{insulator}} \equiv 1 - P_{\mathrm{metal}})$, where $P_{\mathrm{metal}} = 0.5$ is the typical threshold for the machine prediction of the phase.

In the training process, the uniform initialization and the Xavier initialization[52] are employed to initialize the unknowns. The objective function to be minimized is chosen to be the binary cross-entropy loss function, and the L2 regularization is used to prevent overfitting. The full-batch gradient descent is used for the optimization. The learning rate and the regularization coefficient are set to $10^{-3}$ and $10^{-5}$, respectively. The model is numerically trained for 5000 epochs with a full batch of the data by using the PyTorch library.
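
To make the network structure and the loss concrete, here is a minimal pure-Python sketch of a single-hidden-layer forward pass with sigmoid activations, Xavier-uniform weights, and the binary cross-entropy loss. It is not the authors' PyTorch code: the layer sizes, zero-initialized biases, the single output node returning $P_{\mathrm{metal}}$, and all function names are assumptions of this sketch.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def xavier_uniform(n_out, n_in, rng):
    """Weights drawn from U(-a, a) with a = sqrt(6 / (n_in + n_out))."""
    bound = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]


def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(x, W1, b1, W2, b2):
    """Input -> sigmoid hidden layer -> sigmoid output, returning P_metal."""
    hidden = [sigmoid(a) for a in affine(W1, b1, x)]
    return sigmoid(affine(W2, b2, hidden)[0])


def binary_cross_entropy(p, y, tiny=1e-12):
    """Loss for one sample with label y = 1 (metal) or y = 0 (insulator)."""
    return -(y * math.log(p + tiny) + (1.0 - y) * math.log(1.0 - p + tiny))


rng = random.Random(0)
n_in, n_hidden = 8, 10                      # toy sizes, not the paper's
W1, b1 = xavier_uniform(n_hidden, n_in, rng), [0.0] * n_hidden
W2, b2 = xavier_uniform(1, n_hidden, rng), [0.0]

x = [0.1 * i for i in range(n_in)]          # placeholder input vector
p_metal = forward(x, W1, b1, W2, b2)
p_insulator = 1.0 - p_metal
```

An untrained network of this kind returns a value of `p_metal` strictly between 0 and 1; training adjusts the weights so that the value crosses 0.5 at the transition.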

1. Finding a minimal structure of the neural network

To find the simplest possible structure of the neural network, we observe how the accuracy of the prediction changes as the number of neurons in the hidden layer decreases. Figure 2(b) presents the test output $P_{\mathrm{metal}}$ for each of the examined neural networks trained with different sizes of the hidden layer. Surprisingly, it turns out that the prediction accuracy is essentially independent of the size of the hidden layer and does not change even when the hidden layer is removed entirely. This indicates that logistic regression without the hidden layer suffices for discriminating the phases with the input of the hybridization function. This observation gives us a chance to analyze the actual data flow from the input to the output in the minimally simple structure of the neural network.
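
The minimal model without a hidden layer can be sketched as logistic regression trained by full-batch gradient descent on the binary cross-entropy loss with an L2 penalty. The synthetic three-feature data, the learning rate of 0.5, and the 2000 epochs are assumptions chosen so that the toy example converges quickly; they are not the paper's data or hyperparameters.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(w, b, x):
    """P_metal for one input vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, l2=1e-5, epochs=2000):
    """Full-batch gradient descent on mean BCE plus l2 * |w|^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi      # derivative of BCE w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + 2.0 * l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: feature 0 mimics a zero-frequency peak (present only in the metal).
rng = random.Random(1)


def toy_sample(peak):
    return [peak + rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]


X = [toy_sample(1.0) for _ in range(20)] + [toy_sample(0.0) for _ in range(20)]
y = [1.0] * 20 + [0.0] * 20

w, b = train_logistic(X, y)
accuracy = sum((predict(w, b, xi) > 0.5) == (yi > 0.5) for xi, yi in zip(X, y)) / len(X)
```

In this toy setting the largest weight ends up on the feature that carries the peak, which loosely mirrors how the trained weights in the paper are read off to identify what the classifier responds to.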

2. Machine-inspired indicator of the transition

Because the weight matrix essentially governs the prediction when the hidden layer is unimportant, we analyze the features of the neural connectivity in the weight matrix to understand the basis of the prediction made by the neural network. Figure 3 shows the pattern observed in the elements of the weight matrix as the size of the hidden layer decreases. In the limit of the complete removal of the hidden layer, the pattern becomes a simple function that may characterize the predictive power. Figure 4 plots the weight matrix as a one-dimensional function of the frequency for the case of no hidden layer, which gives the logistic regression.

Figure 3. (Color online) Heat maps for the weight matrix $W^{(1)}$ of different neural network models, with the input node index given by the logarithmic mesh index. The weight matrix connects to the input layer receiving the real and the imaginary parts of the hybridization function. Neural network with the number of hidden nodes set to (a) $N_h=100$ and (b) $N_h=10$. (c) Logistic regression without a hidden layer.

Figure 4. (Color online) Weight matrix of the logistic regression for (a,c) the real part $u_n$ and (b,d) the imaginary part $v_n$. The weight matrix $W^{(1)}(\omega)$ is shown (a,b) on the real frequency domain and (c,d) on the logarithmic frequency domain. The dotted black line represents a simple polynomial fitting curve.

In the logistic regression model with the weights in Fig. 4, the network output can be written simply by the inner product of two vectors as

$$
P = w \cdot \Delta = \sum_n \left\{ u_n \mathrm{Re}[\Delta(\omega_n)] + v_n \mathrm{Im}[\Delta(\omega_n)] \right\}, \tag{4}
$$

where the simpler notation $w \equiv W^{(1)}$ is a vector since we have only one weight matrix that directly connects $N_\omega$ input nodes and a single output node. The weight vector w can be decomposed into two parts as w=(u,v), where u and v represent the elements with index associated with the real and imaginary parts of the input, respectively. We find that the observed patterns in the neural weights suggest a simple form of the network output when applied to the formulation of the Anderson impurity model with a quantum bath of discrete orbitals. In terms of the parameters $\{\epsilon_k, V_k\}$ in the Anderson impurity model, one can formally write the real-frequency hybridization function as

$$
\Delta(\omega) = \sum_k \frac{|V_k|^2}{\omega + i0^+ - \epsilon_k}. \tag{5}
$$
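
For a bath of a few discrete orbitals, this pole expansion can be evaluated directly once the infinitesimal $0^+$ is replaced by a finite broadening. The sketch below does so and checks the sum rule that the integral of $-\mathrm{Im}[\Delta(\omega)]/\pi$ equals $\sum_k |V_k|^2$. The bath parameters, the broadening `eta`, and the integration window are hypothetical values chosen for illustration.

```python
import math


def hybridization(omega, eps, V, eta=0.05):
    """Delta(omega) = sum_k |V_k|^2 / (omega + i*eta - eps_k)."""
    z = complex(omega, eta)
    return sum(abs(v) ** 2 / (z - e) for e, v in zip(eps, V))


# Hypothetical particle-hole symmetric bath with one zero-energy orbital.
eps = [-1.0, -0.3, 0.0, 0.3, 1.0]
V = [0.3, 0.2, 0.1, 0.2, 0.3]

# Riemann sum of -Im[Delta]/pi on a linear grid much finer than eta.
step = 0.005
n_points = 24001                              # covers omega in [-60, 60]
omegas = [-60.0 + step * i for i in range(n_points)]
integral = sum(-hybridization(w, eps, V).imag for w in omegas) * step / math.pi

total_weight = sum(abs(v) ** 2 for v in V)
```

With these numbers, `integral` agrees with `total_weight` (0.27) up to the small Lorentzian tails cut off by the finite window.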

The input data are sampled on a logarithmic grid in the frequency domain, and thus the density of data points on the linear scale is proportional to $1/|\omega|$. Then, we approximate Eq. (4) in the continuum limit of frequency as

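$$
P_o(\{\epsilon_k, V_k\}) = \sum_k \left[ \int d\omega \, \frac{u(\omega)}{|\omega|} \frac{|V_k|^2}{\omega - \epsilon_k} + v(\epsilon_k) \frac{|V_k|^2}{|\epsilon_k|} \right], \tag{6}
$$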

which simplifies further given the observed features of u(ω) and v(ω).
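
The step from the discrete sum over grid points to the continuum expression can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' derivation: it uses the standard identity for $1/(x + i0^+)$, takes the grid as $\omega_n = \omega_0 \lambda^n$, and assumes that constant prefactors such as $1/\ln\lambda$ and $-\pi$ are absorbed into the definitions of $u(\omega)$ and $v(\omega)$.

```latex
% Pole expansion split into principal value and delta function
\frac{1}{\omega + i0^+ - \epsilon_k}
  = \mathcal{P}\frac{1}{\omega - \epsilon_k} - i\pi\,\delta(\omega - \epsilon_k)
\;\;\Longrightarrow\;\;
\mathrm{Re}[\Delta(\omega)] = \sum_k \mathcal{P}\frac{|V_k|^2}{\omega - \epsilon_k},
\qquad
\mathrm{Im}[\Delta(\omega)] = -\pi \sum_k |V_k|^2\,\delta(\omega - \epsilon_k).

% Logarithmic grid: the number of grid points per unit frequency is 1/(|\omega| \ln\lambda)
\sum_n f(\omega_n) \;\to\; \frac{1}{\ln\lambda}\int d\omega\,\frac{f(\omega)}{|\omega|}.

% Imaginary-part term: the delta function picks out the bath energies
\sum_n v_n\,\mathrm{Im}[\Delta(\omega_n)]
  \;\to\; -\frac{\pi}{\ln\lambda}\sum_k v(\epsilon_k)\,\frac{|V_k|^2}{|\epsilon_k|}
  \;\propto\; \sum_k v(\epsilon_k)\,\frac{|V_k|^2}{|\epsilon_k|}.
```

The real-part term keeps its integral form because the principal-value kernel does not localize at the bath energies, which is why only the second term acquires the explicit $1/|\epsilon_k|$ factor.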

Figure 4 presents u(ω) and v(ω) obtained numerically from the training. It turns out that u(ω) is an odd function while v(ω) is an even function. Considering the real part of Δ(ω) is an even function at half-filling with the particle-hole symmetric distribution of $\epsilon_k$, the first term in Eq. (6) does not contribute, and the second term governs the network output. Then, we may further approximate the network output as

$$
P_o \approx \sum_k v(\epsilon_k) \frac{|V_k|^2}{|\epsilon_k|}, \tag{7}
$$

which we know is drastically different between the metallic and insulating phases. It is well known that the presence of the quasiparticle peak at zero frequency distinguishes the metallic phase from the insulating phase. Thus, the 1/ϵk factor plays a central role in Eq. (7), and we may write a simple indicator of the metal-insulator transition as

$$
Q(\{\epsilon_k, V_k\}) = \left[ \sum_k \frac{|V_k|^2}{|\epsilon_k|} \right]^{-1}, \tag{8}
$$

omitting the neural network contribution v(ω) that is almost constant at low frequencies. The inverse prevents the quantity from being divergent in the metallic phase where an orbital with $\epsilon_k \approx 0$ exists. Because we consider a bath with few discrete orbitals for the use of Eq. (8), we ignore the possibility of having an orbital with $V_k=0$ at $\epsilon_k=0$ in the insulating phase, although it is possible in principle in the continuum bath.

In the metallic phase, Q should be close to zero because of the presence of a very small $\epsilon_k$. On the other hand, in the insulating phase, $\epsilon_k$ is not small because of the spectral gap, so that Q takes a finite value. In short, the neural network in the simplest form built by training with the hybridization function essentially captures the presence of the zero-frequency peak, which is encoded as the $1/\epsilon_k$ factor in the network output function. The details of the neural network are not important for the performance. This inspires us to write an indicator that sharply distinguishes the metallic and insulating phases at the level of the AIM Hamiltonian construction, which is practically relevant to the Hilbert space projection with discrete orbitals in the exact diagonalization solver.
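
The indicator is simple enough to evaluate in a few lines. In the sketch below, the two baths are hypothetical parameter sets, not results from the paper, and the handling of decoupled orbitals ($V_k=0$) and of an exact zero-energy orbital are choices made here to avoid division by zero.

```python
def transition_indicator(eps, V):
    """Q = [ sum_k |V_k|^2 / |eps_k| ]^(-1) for a discrete bath."""
    total = 0.0
    for e, v in zip(eps, V):
        weight = abs(v) ** 2
        if weight == 0.0:
            continue            # assumption of this sketch: skip decoupled orbitals
        if e == 0.0:
            return 0.0          # a coupled zero-energy orbital makes the sum diverge
        total += weight / abs(e)
    return 1.0 / total if total > 0.0 else float("inf")


# Hypothetical metallic-like bath: one orbital sits almost at zero energy.
q_metal = transition_indicator(
    eps=[-1.0, -0.3, 1e-6, 0.3, 1.0],
    V=[0.3, 0.2, 0.1, 0.2, 0.3],
)

# Hypothetical insulating-like bath: all orbitals lie outside a gap.
q_insulator = transition_indicator(
    eps=[-1.5, -1.0, 1.0, 1.5],
    V=[0.3, 0.4, 0.4, 0.3],
)
```

For these toy baths `q_metal` is of order $10^{-4}$ while `q_insulator` is of order one, so a jump in Q separates the two cases as described in the text.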

3. Numerical verification of the transition indicator

Because the proposed indicator Q is most relevant to the case with discrete orbitals of $\{\epsilon_k, V_k\}$, we verify its accuracy and usability with the data of $\{\epsilon_k, V_k\}$ obtained by separately performing the DMFT calculations with the exact diagonalization solver in the Bethe lattice and in three-dimensional lattices. The DMFT iterations are performed in the imaginary frequency domain, following the conventional implementation[49]. The Hilbert space projection to build the bath of discrete orbitals is done on the Matsubara frequency grid $i\omega_n = i(2n+1)\pi/\beta$, where we use β=100 for the grid spacing, and n ranges from -1000 to 1000. We use seven orbitals to build a bath for the Anderson impurity model. Figure 5 presents the behavior of Q computed with the ED solver in the infinite-dimensional Bethe lattice and in the three-dimensional lattices. The indicator Q exhibits jumps at the transition points, which accurately coincide with the transition points indicated by the conventional test of the double occupancy.
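
The Matsubara grid quoted above, together with the discrete-bath hybridization function evaluated on it, can be sketched as follows. The grid parameters are those from the text; the function names and the seven-orbital bath values are hypothetical, and the fitting step that an ED solver uses to determine the bath parameters is not shown.

```python
import math


def matsubara_frequencies(beta=100.0, n_min=-1000, n_max=1000):
    """Fermionic Matsubara frequencies omega_n = (2n + 1) * pi / beta."""
    return [(2 * n + 1) * math.pi / beta for n in range(n_min, n_max + 1)]


def discrete_hybridization(omega_n, eps, V):
    """Delta(i*omega_n) = sum_k |V_k|^2 / (i*omega_n - eps_k)."""
    z = complex(0.0, omega_n)
    return sum(abs(v) ** 2 / (z - e) for e, v in zip(eps, V))


# Hypothetical seven-orbital, particle-hole symmetric bath.
eps = [-1.2, -0.5, -0.1, 0.0, 0.1, 0.5, 1.2]
V = [0.25, 0.2, 0.15, 0.1, 0.15, 0.2, 0.25]

grid = matsubara_frequencies()
delta_iw = [discrete_hybridization(w, eps, V) for w in grid]
```

For a particle-hole symmetric bath such as this one, the real part of Δ(iω_n) cancels pairwise between orbitals at ±ε_k, so the values on the grid are purely imaginary up to rounding.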

Figure 5. (Color online) Phase predictions using $Q(\{\epsilon_k, V_k\}) = [\sum_k |V_k|^2/|\epsilon_k|]^{-1}$ for (a) the Bethe lattice, (b) the simple cubic lattice, and (c) the body-centered cubic lattice. The blue (black) line is for datasets from the metallic (insulating) to the insulating (metallic) state, and the green lines indicate the phase transition points $U_{c1}$ and $U_{c2}$ measured by the double occupancy.

We have investigated the supervised learning of the metal-insulator transition with the DMFT data of the hybridization function computed using the NRG solver. For the interpretation of the data-driven prediction, we have focused on a feed-forward neural network with a single hidden layer and attempted to decrease the size of the hidden layer to find a transparent minimal structure. It turns out that the accuracy of the transition point identification is not affected by the size of the hidden layer. The minimal structure can be constructed without the hidden layer, which becomes equivalent to the logistic regression model where the weight matrix governs the prediction. By analyzing the observed pattern of the weight matrix, we have found that the neural network mainly reads the presence of the quasiparticle peak, and this functionality can be implemented even without the complex structure of the hidden layer. Keeping the essence of the mathematical structure of the neural network output, we have proposed a simple indicator of the metal-insulator transition as a function of discrete bath parameters. The accuracy of the proposed indicator is numerically verified in the DMFT calculations with the ED solver in various lattices.

  1. G. Carleo et al, Rev. Mod. Phys. 91, 045002 (2019).
  2. P. Mehta et al, Phys. Rep. 810, 1 (2019).
  3. V. Dunjko and H. J. Briegel, Rep. Prog. Phys. 81, 074001 (2018).
  4. J. Carrasquilla, Adv. Phys. X 5, 1797528 (2020).
  5. G. Torlai and R. G. Melko, Annu. Rev. Condens. Matter Phys. 11, 325 (2020).
  6. J. Carrasquilla and R. G. Melko, Nat. Phys. 13, 431 (2017).
  7. J. Venderley, V. Khemani and E.-A. Kim, Phys. Rev. Lett. 120, 257204 (2018).
  8. L. Wang, Phys. Rev. B 94, 195105 (2016).
  9. A. Canabarro et al, Phys. Rev. B 100, 045129 (2019).
  10. X.-Y. Dong, F. Pollmann and X.-F. Zhang, Phys. Rev. B 99, 121104 (2019).
  11. N. L. Holanda and M. A. R. Griffith, Phys. Rev. B 102, 054107 (2020).
  12. Q. H. Tran, M. Chen and Y. Hasegawa, Phys. Rev. E 103, 052127 (2021).
  13. J. Liu, Y. Qi, Z. Y. Meng and L. Fu, Phys. Rev. B 95, 041101 (2017).
  14. H. Shen, J. Liu and L. Fu, Phys. Rev. B 97, 205140 (2018).
  15. Y. Wu, L.-M. Duan and D.-L. Deng, Phys. Rev. B 101, 214308 (2020).
  16. Y. Nagai, M. Okumura and A. Tanaka, Phys. Rev. B 101, 115111 (2020).
  17. Y. Nagai, M. Okumura, K. Kobayashi and M. Shiga, Phys. Rev. B 102, 041124 (2020).
  18. G. Carleo and M. Troyer, Science 355, 602 (2017).
  19. Z. Cai and J. Liu, Phys. Rev. B 97, 035116 (2018).
  20. R. G. Melko, G. Carleo, J. Carrasquilla and J. I. Cirac, Nat. Phys. 15, 887 (2019).
  21. L. Yang et al, Phys. Rev. Res. 2, 012039 (2020).
  22. K. Kashiwa, Y. Kikuchi and A. Tomiya, Prog. Theor. Exp. Phys. 2019, 083 (2019).
  23. P. Suchsland and S. Wessel, Phys. Rev. B 97, 174435 (2018).
  24. W. Zhang, L. Wang and Z. Wang, Phys. Rev. B 99, 054208 (2019).
  25. Y. Jintau and C. Junpeng, Phys. Lett. A 412, 127589 (2021).
  26. D. Kim and D.-H. Kim, Phys. Rev. E 98, 022138 (2018).
  27. D. Kim and D.-H. Kim, J. Stat. Mech. 2021, 023202 (2021).
  28. C. Casert, T. Vieijra, J. Nys and J. Ryckebusch, Phys. Rev. E 99, 023304 (2019).
  29. H. Théveniaut and F. Alet, Phys. Rev. B 100, 224202 (2019).
  30. O. Balabanov and M. Granath, Mach. Learn.: Sci. Technol. 2, 025008 (2021).
  31. J. Arnold, F. Schäfer, M. Žonda and A. U. J. Lode, Phys. Rev. Res. 3, 033052 (2021).
  32. K. Liu, J. Greitemann and L. Pollet, Phys. Rev. B 99, 104410 (2019).
  33. D. Bachtis, G. Aarts and B. Lucini, Phys. Rev. E 102, 053306 (2020).
  34. S. Blücher et al, Phys. Rev. D 101, 094507 (2020).
  35. A. Dawid et al, New J. Phys. 22, 115001 (2020).
  36. J. Greitemann et al, Phys. Rev. B 100, 174408 (2019).
  37. K. Liu et al, Phys. Rev. Res. 3, 023016 (2021).
  38. N. Rao, K. Liu and L. Pollet, Phys. Rev. E 104, 015311 (2021).
  39. A. Georges, G. Kotliar, W. Krauth and M. J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996).
  40. L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld and A. J. Millis, Phys. Rev. B 90, 155136 (2014).
  41. L.-F. Arsenault, O. A. von Lilienfeld and A. J. Millis, arXiv:1506.08858 (2015).
  42. T. Song and H. Lee, Phys. Rev. B 100, 045153 (2019).
  43. E. Sheridan et al, Phys. Rev. B 104, 205120 (2021).
  44. D. Vollhardt, K. Byczuk and M. Kollar, Dynamic Mean-Field Theory. (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012), p 203-236.
  45. E. Gull, P. Werner, O. Parcollet and M. Troyer, EPL 82, 57003 (2008).
  46. P. Werner et al, Phys. Rev. Lett. 97, 076405 (2006).
  47. A. N. Rubtsov, V. V. Savkin and A. I. Lichtenstein, Phys. Rev. B 72, 035122 (2005).
  48. R. Bulla, T. A. Costi and T. Pruschke, Rev. Mod. Phys. 80, 395 (2008).
  49. M. Caffarel and W. Krauth, Phys. Rev. Lett. 72, 1545 (1994).
  50. R. Žitko and T. Pruschke, Phys. Rev. B 79, 085106 (2009).
  51. R. Žitko, NRG ljubljana (2021).
  52. X. Glorot and Y. Bengio, in Proceedings of the thirteenth international conference on artificial intelligence and statistics, p 249-256.
  53. S. Wright and J. Nocedal, Numerical optimization. (Springer Science, 1999), p 67-68.
