PART I: DISCRETIZATION

In a series of two articles, we propose a comprehensive mathematical framework for Coupled-Cluster-type methods. These methods aim at accurately solving the many-body Schrödinger equation. In this first part, we rigorously describe the discretization schemes involved in Coupled-Cluster methods using graph-based concepts. This allows us to discuss different methods in a unified and more transparent manner, including multireference methods. Moreover, we derive the single-reference and the Jeziorski–Monkhorst multireference Coupled-Cluster equations in a unified and rigorous manner. Mathematics Subject Classification. 81V55, 81-08, 81-10. Received June 22, 2022. Accepted November 14, 2022.


Introduction
The Coupled-Cluster (CC) method is one of the most popular methods in computational quantum chemistry among Hartree-Fock (HF) and Density-Functional Theory (DFT).In its full generality, the quantum manybody problem is intractable, and it is one of the greatest challenges of quantum mechanics to devise practically useful methods to approximate the solutions of the many-body Schrödinger equation.Although the stationary Schrödinger equation itself is a linear eigenvalue problem, it is extremely high-dimensional even for a few particles and a low-dimensional one-particle space 1 .Here, we focus on those fermionic systems which are described by the so-called molecular Hamilton operator -on which most electronic-structure models are based in quantum chemistry.The Galerkin method applied to the Schrödinger equation (sometimes combined with an initial HF "guess") is branded Configuration Interaction (CI) in computational quantum chemistry; unfortunately, its applicability is limited due to the aforementioned high-dimensionality issue.The HF method is perhaps conceptually the simplest, whereby the ground state is approximated by minimizing the energy of the system over Slater determinants; the resulting Euler-Lagrange equations constitute a nonlinear eigenvalue problem that yields the HF ground state.HF theory has attracted much interest in the mathematical physics community, see e.g.[2,3,7,8,12,19,23,26,36].The spiritual successor to the statistical mechanics-motivated Thomas-Fermi theory-DFT-is the single most used method in quantum chemistry, and some of its mathematical aspects are also highly non-trivial [10,[20][21][22].
CC theory is a vast and highly active subfield of quantum chemistry, consisting of many variants and refinements.However, among the aforementioned methods, the CC approach has arguably received the least attention in the mathematics community.

Previous work
It is beyond the scope of this paper to give a historical review of the CC method and its vast number of variants.The interested reader is pointed to [4,5,14,18,35].The survey article [27] is somewhat more mathematically-oriented and also proposes a rather general framework.
The analysis of the single-reference CC method by R. Schneider and T. Rohwedder [31][32][33] serves as a starting point of our description of the CC discretization.In Part II, we will briefly summarize [33] and its follow-up works, where the context is more appropriate.

Outline
It is our intention to present both known and new results in a self-contained manner and primarily with a mathematical audience in mind.In Section 2, we describe the setting of the quantum-mechanical problems the CC theory is aimed at.Next, Section 2.3 gives a rough sketch of the most basic CI and CC methods.
We begin our discussion in Section 3.1 with the definition of a partial order relation which will be used to encode the relevant transitions of the system, called excitations.This partial order, and the induced lattice operations will be used in Section 3.2 to define the excitation graph, which fully describes the CC discretization scheme.We give a few examples of the generality of our concepts and also extend the definition of the excitation graph to the multireference (MR) case.After this, the corresponding excitation operators (Sect.3.3), cluster operators (Sect.3.4) and cluster amplitude spaces (Sect.3.5) are constructed, which are the essential building blocks for the formulation of any CC-like method.
In Section 4, we give short derivations of the conventional single-reference (SRCC) and Jeziorski-Monkhorst multireference (JM-MRCC) methods.We do so by generalizing the known procedure which is based on perturbation theory.
In Appendix A we calculate various graph-theoretic properties of the excitation graph.In Appendix B we propose a method based on linear programming to select reference determinants for the multi-reference setting in an optimal way.

Background
In this section we collect the concepts and results that are necessary for the forthcoming discussion.For proofs and more about the mathematical foundations of quantum mechanics, see e.g.[13,15,25,[28][29][30]38].
The spectrum of a linear operator  is written (), the elements of its discrete spectrum as ℰ  (), where  = 0, 1, 2, . .., if  is bounded from below.We use the usual notation [, ] =  −  for the commutator.The transpose of  is denoted as  † .For normed spaces  and  , the symbol ℒ(,  ) denotes normed space of bounded linear mappings  →  endowed with the operator norm ‖ • ‖ ℒ(, ) .Furthermore,  * denotes the (continuous) dual space.

Function spaces
In the context of many-body quantum mechanics, the Lebesgue-, and Sobolev spaces 2 (R3 ) and  1 (R 3 ) are viewed as "one-particle spaces".We ignore spin for simplicity and consider  2 (R 3 ) and  1 (R 3 ) as real Hilbert spaces, again for clarity.These choices are justified for our model Hamiltonian (see Sect. 2.2 below), but we remark that all the forthcoming considerations can be trivially extended to the more general setting.The one-particle spaces are then used to define the  -particle fermionic spaces (see e.g.[24]) 2 (R 3 ), and endowed with the inner products Ψ(X)Φ(X) dX and respectively.Here, X = (x 1 , . . ., x  ) ∈ R 3 and z • w denotes the Euclidean inner product.Also, is the distributional gradient operator acting on the th triple of the arguments.The norms corresponding to ⟨•, •⟩ and ⟨•, •⟩ H 1 are denoted as ‖ • ‖ and ‖ • ‖ H 1 , respectively.We define the second order Sobolev space as Let  ≥  or  = ∞ and assume that an  2 -orthonormal (spin-)orbital set Corresponding to ℬ  we can construct the set of Slater determinants The negative exponent Sobolev space H −1 will also be used in the sequel, which is given by the continuous dual space (H 1 ) * .We will exploit that the dense continuous embeddings H 1 ⊂ L 2 ⊂ H −1 hold true (see e.g.[1]), i.e. the spaces in question form a Gelfand triple.
Remark 2.1.An important remark is in order.Recall that  ⊂  ⊂  * are said to form a Gelfand triple if  is a real reflexive Banach space,  is real separable Hilbert space and the embedding  ⊂  is continuous and  is dense in  (see e.g.[39], Sect.23.4).It follows that for any Ψ ∈  there is a ̂︀ Ψ ∈  * such that ⟨ ̂︀ Ψ, Φ⟩  * × = ⟨Ψ, Φ⟩  for all Φ ∈  , and the mapping  →  * given by Ψ ↦ → ̂︀ Ψ is linear, injective and continuous.Hence, the embedding  ⊂  * is also continuous (and dense).Henceforth, we will write Ψ in place of ̂︀ Ψ for brevity.
Convention: We will drop the subscript from ⟨•, •⟩, as it is either obvious from its arguments that the duality pairing ⟨•, •⟩  * × has to be used, or both ⟨•, •⟩  * × and ⟨•, •⟩  are acceptable due to the Gelfand triple setting as discussed above.In particular, we apply this convention to H 1 ⊂ L 2 ⊂ H −1 , to the cluster amplitude spaces discussed in Section 3.5 and also in the general framework of Section 4.

Schrödinger Hamiltonian
In this section, we introduce the model Hamiltonian for concreteness.Let ,  : R 3 → R be Kato class potentials: ,  ∈  3/2 (R 3 ) +  ∞  (R 3 ) 2 with  even and define the quadratic form ℰ on H 1 as for any Ψ ∈ H 1 .For every  > 0, there is a   > 0 so that Kato's inequality (see e.g.[12] for a detailed proof), holds true.This implies that the quadratic form induced by  and  is infinitesimally form bounded with respect to −△.The KLMN theorem (see e.g.[29]) implies that there exists a unique self-adjoint operator ℋ : (ℋ) → L 2 associated to ℰ, having form domain (ℋ) = (ℰ) = H 1 and being lower semibounded.This ℋ is given by for all Ψ ∈ (ℋ) and X ∈ R 3 .Kato's inequality implies that there is a constant  > 0, such that for all Ψ, Φ ∈ H 1 .Thus, ℋ can be extended to a bounded mapping H 1 → H −1 , which we denote with the same symbol.We say that Ψ ∈ H 1 and ℰ ∈ R satisfy the weak Schrödinger equation if ⟨ℋΨ, Φ⟩ = ℰ⟨Ψ, Φ⟩ for all Φ ∈ H 1 .
As far as the finite-dimensional case  < ∞ is concerned, we simply consider the Galerkin projection of the weak Schrödinger equation.More precisely, let H 1  ⊂ H 1 be as defined in Section 2.1.Then Ψ ∈ H 1  and ℰ ∈ R are said to satisfy the projected Schrödinger equation if ⟨ℋΨ, Φ⟩ = ℰ⟨Ψ, Φ⟩ for all Φ ∈ H 1  .The so-called (electronic) molecular Hamiltonian ℋ corresponds to the special case where   ∈ N ( = 1, . . .,  ) and r 1 , . . ., r  ∈ R 3 denote the charges and the positions of the  ∈ N nuclei.

The CI and the CC method
We now give a very rough description of the single-reference CI and CC methods.For the rigorous derivations, we refer to Section 4.
In a preliminary step-typically using the Hartree-Fock method-the reference determinant is constructed and normalized so that ‖Φ 0 ‖ = 1.We restrict our discussion here to the case when relevant the function spaces are real.The occupied orbitals Here,  = ∞ is allowed.The orthonormal set ℬ  generates the Slater determinants B  and the subspace H 1  ⊂ H 1 (see Sect. 2.1).For later convenience, we introduce the space H 1,⊥ as the L 2 -orthogonal complement of Span{Φ 0 } in H 1 .Further, we also set In both the CI and the CC method, the Schrödinger equation ℋΨ = ℰΨ is solved based on the reference wavefunction Φ 0 .For simplicity 3 , we consider the case when Ψ is sought after in the form Ψ = Φ 0 + Ψ, where ⟨Ψ, Φ 0 ⟩ = 0.In other words, Ψ is calculated via a correction Ψ to Φ 0 .Note that ⟨Ψ, Φ 0 ⟩ = 1, which is called the intermediate normalization condition.If the "targeted" wavefunction Ψ happens to be orthogonal to the reference determinant Φ 0 , then the Ansatz Ψ = Φ 0 + Ψ cannot yield a solution. 3Although the CI method is more general.
The Full Configuration Interaction (FCI) method can be summarized as follows: find Ψ ∈ H 1,⊥  such that Here, ℰ CI = ‖Ψ‖ −2 ⟨ℋΨ, Ψ⟩ is called the CI-, or variational energy.The projected CI method is simply the Galerkin projection of the previous problem to some finite dimensional subspace 3) The choice of the Galerkin subspace V  is typically based on so-called truncation rank, for instance V  = V SD , is the span of singly-, and doubly excited determinants in B  .The corresponding (projected) CI method in this case is designated as "CISD".
The CI equations are more commonly expressed using cluster operators.A cluster operator  : L 2 → L 2 is a bounded linear operator that is a linear combination of special products of fermionic creation and annihilation operators  †  and   (see Part II for a definition), so that the action of each such product is to replace some occupied orbitals ℬ occ with the same number of virtual orbitals ℬ ,virt in a Slater determinant (see Rem. 3.18).A cluster operator  can therefore be parametrized with the said linear-combination coefficients, which are denoted by the lower case  and are called cluster amplitudes.The vector space of all cluster amplitudes is denoted by V.There is a one-to-one correspondence between functions in L 2,⊥ (resp.H 1,⊥ ) and functions of the form Φ 0 , where  : L 2 → L 2 (resp. : H 1 → H 1 ) is a cluster operator (see [31]).Therefore, (2.2) can be expressed as follows: find a cluster operator  (or, equivalently cluster amplitudes ), such that ⟨ℋ( + )Φ 0 , Φ 0 ⟩ = ℰ CI ()⟨( + )Φ 0 , Φ 0 ⟩ for all cluster operators . ( Here, ℰ CI () = ‖( + )Φ 0 ‖ −2 ⟨ℋ( + )Φ 0 , ( + )Φ 0 ⟩.Although this might seem an unnecessary complication at first, cluster operators are essential for the formulation of the CC method.
In the CC method, the "exponential Ansatz" is assumed for the intermediately normalized wavefunction Ψ. Substituting Ψ =   Φ 0 into the Schrödinger equation, where  is a cluster operator, we get for some ℰ CC ∈ R. First, to determine ℰ CC we premultiply (2.5) by  − (  is always invertible), and take the inner product with Φ 0 to obtain the CC energy where we used the normalization ‖Φ 0 ‖ = 1.Second, by premultiplying (2.5) by  − again, but now testing against functions in H 1,⊥  , we get the Full CC (FCC) method: find cluster amplitudes  * ∈ V such that The projected CC method is the Galerkin projection of the FCC problem with respect to some subspace V  ⊂ V.More precisely, the task is to find  ,* ∈ V  such that For the moment, we denote the corresponding CC energy by ℰ ,CC .Again, V  is based on some truncation, such as SD, in which case the corresponding method is called "CCSD".We now discuss the relation between CI and CC.It was shown that the FCI (2.2) and the FCC (2.7) equations are equivalent, see Theorem 5.3 of [31].However, the corresponding Galerkin-projected problems are not equivalent.Further, while ℰ CI ≤ ℰ ,CI due to the Rayleigh-Ritz variational principle, the same is not true for the CC method and numerical experience undoubtedly shows that there is no obvious relation in general between ℰ CC = ℰ CI and ℰ ,CC ; this last phenomenon is called the nonvariational property of CC theory.Note that according to Theorem 2.2, FCC is variational.
Despite this, the gains of CC over CI are significant.First, by construction, the CC method is size-consistent, even when truncated (Theorem 4.10 of [33]).This property is crucial for the precise determination of various chemical properties.Second, the evaluation of expressions involving the similarity-transformed Hamilton operator  − ℋ  is greatly eased by the formula see Theorem A.1 of [33] 4 , where the iterated commutators are given by [ℋ,  ] (0 ,  ] for  ≥ 1. Equation (2.9) may be referred to as the terminating Baker-Campbell-Hausdorff series, and it makes the computer implementation of CC methods feasible even for moderately sized systems.
In particular, the Slater-Condon rules imply that the CC energy can be computed as5 Furthermore, (2.9) also implies that the polynomial system (2.7) (and hence its Galerkin projection (2.8)) is quartic in terms of the cluster amplitudes .Despite their apparent simplicity, the CC equations usually involve many complicated terms and even their assembly is a nontrivial task.In summary, the CC method approximates an extremely high-dimensional linear problem (2.2) by a low-dimensional nonlinear problem (2.8).

Coupled-Cluster discretization
Using an appropriate string of creation and annihilation operators, any fermionic state can be changed to any other one (see Part II, or [14,37]).In our context, a set of  occupied orbitals is given; its complement is called the set of virtual orbitals.The action of an excitation operator on a Slater determinant consists of annihilating a few occupied orbitals and creating the same number of virtual orbitals (hence the particle number  is conserved).A de-excitation operator amounts to the reverse action: annihilating some virtual orbitals and creating the same number of occupied ones.Obviously, any  -particle Slater determinant can be obtained by acting with an appropriate excitation operator on the "reference state", which is the  -particle Slater determinant of all the occupied orbitals.However, it might also be possible to arrive at the same Slater determinant from another state through successive excitations.The concrete relationships are nontrivial and this section is devoted to their description.

Excitation order
Let Λ be a countable set called the orbital set and let 2 Λ denote the power set of Λ.In concrete examples, we will often use the numbers Λ = {1, 2, 3, . ..} to label the elements of Λ for the sake of simplicity, and set  = |Λ|.Let  ≥ 1 denote the number of particles, and set  = { ∈ 2 Λ : || =  }, the elements of which are called ( -particle) states.The particle number  is assumed to be fixed throughout.Fix  ≥ 1 reference states Ω = {0 1 , . . ., 0  } ⊂ .
For every  = 1, . . .,  define and on it, the partial order relation  ⪯   ⇐⇒   ⊂   and   ⊂   for any ,  ∈   , where and the complement is to be understood relative to Λ.According to commonly used nomenclature, we call   the occupied part of  w.r.t.0  and   the virtual part of  w.r.t.0  .This partial order relation is a generalization of Definition 4.2 of [31].By definition,   = { ∈  : 0  ⪯  } and for the sake of convenience, we introduce the notations  =  Ω and   =   {0  }.Note that the reference states are defined not to be comparable with respect to ⪯  with each other.The partial order ⪯  generates the join and meet lattice operations for all ,  ∈   .Furthermore, we introduce the orthocomplementation  ⊥ = Λ .
For the so-called single-reference (SR) case,  = 1 and we will make the convention that all the  indices are dropped from the notation.For the next result, we extend ⪯, ∨ and ∧ to the whole 2 Λ .Proposition 3.1.The structure  = (2 Λ , ∨, ∧, 0, 1, ⊥) is a Boolean algebra, that is, a distributive, bounded lattice in which the de Morgan laws hold true.Here, we set 1 := Λ, the identity for ∧.
A similar statement holds true in the multi-reference (MR) case, for the individual structures Even though the algebraic structure on  is nice, the subset  loses this structure.In fact,  is not a sublattice of , since for example  ∨  ,  ∧   ̸ ∈  for distinct  and  with  =  = ∅.The reason why we stated Proposition 3.1, however, is because we will exploit the operational rules for ∨, ∧ and ⊥ on a few occasions; for instance, in the following trivial result.Lemma 3.2.Let ,  ∈ 2 Λ be such that  ⪯  .Then,  ∨   =  if and only if  =  ⊥ ∧  .
Proof.We have where in the last step we used  ⪯  .Further, if  ′ ∨   =  as well, then  ′ ∨   =  ∨  .By joining  ⊥ to both sides, we get  ′ = .
The poset (  , ⪯  ) also admits a rank function which makes it a graded poset.Being a graded poset means that there is a rank function rk  :   → N satisfies rk  () < rk  () whenever  ≺  , and rk  () = rk  () + 1 if there is no element  such that  ≺   ≺  .The choice rk  () = |  | is easily seen to satisfy the requirements.Obviously, the maximum value that rk  () can take is  .For a geometric description of the rank function, see Appendix B.

Excitation graphs
As we remarked in the previous section,   fails to be a sublattice of the Boolean algebra   .Therefore, let us consider pairs (, ) ∈   ×   for which  ∨   ∈   .In other words, pairs (, ) ∈   ×   for which we wish to avoid that possibility on physical grounds.Namely, such an operation would introduce a repeated virtual orbital in the state  ∨   and that is not allowed on the account of the Pauli exclusion principle.We note in passing that  ∨   ∈   is equivalent to  ∧   ∈   .In conclusion, we restrict our attention to the set Hence, if (, ) ∈ ℒ  , then we have  ∨   ∈   .The set ℒ  is symmetric to the diagonal (which it does not contain), i.e. (, ) ∈ ℒ  iff (, ) ∈ ℒ  and (, ) ̸ ∈ ℒ  .Also, (0  , ), (, 0  ) ∈ ℒ  for any  ∈   .Furthermore, the rank function rk  is additive on ℒ  in the sense that for any (, ) ∈ ℒ  .This property may also seen to be a reason why we want to exclude the case Indeed, it could also be taken as the definition of ℒ  .
Proposition 3.3.The set ℒ  can be written as which is what we wanted to show.

Transitivity of certain subgraphs, and of 𝐺 full
itself will come up later, since vaguely speaking this property will imply the algebraic closedness of the set of excitation operators that we attach to the edges (see Sects.3.3 and 3.4).
We label the edges of  full  with their corresponding .Thus, to every directed edge (,  ∨  ) ∈  full  there corresponds a map  , :   →   defined with the instruction  , () =  ∨  .This way, the digraph  full  may be interpreted as a commutative diagram (cf.Sect.3.3).Note that a label  , may appear on multiple edges.
Furthermore, for any subgraph   = (  ,   ), we introduce the set of excitations Ξ(  ) ⊂   of   via Note that the excitations are indexed with the same set   as the states themselves, but in general Ξ(  ) ̸ =   .Nonetheless, for the full excitation graph  full  , we have in fact Ξ( full  ) =   .The reason why explicitly stated that we are considering the "full" excitation graphs is that, in practice, one is forced to ignore the "degree of freedom" (called "cluster amplitudes", see Sect.3.5) corresponding to some edges 6 .This is done by considering certain subsets of the full edge set  full  .
Rank-truncation does not introduce isolated vertices in   ( 1 , . . .,   ) as long as one of the   's is 1.However, in the doubles (D) case, (D) does in fact produce isolated vertices so that vertices of odd rank cannot be reached.Also, note that these truncated subgraphs like   (S) and   (SD) are not transitive in general.
We shall summarize these observations in the next theorem.Recall that a digraph is said to be weakly connected if every pair of vertices has an undirected path between them.
Proof.Obvious from the definition.
Next, we briefly consider two rather "exotic" CC-like methods to demonstrate the generality of the excitation graph concept.
Finally, we define excitation graph in the multireference case, which is a natural extension of the above concepts.
Note that as opposed to the SR graph  full  , the MR graph  full might have parallel edges (called "redundant" excitations), this justifies that  full was introduced as a multigraph.Notice that other references cannot be "reached" from a given one (see Sect. 1).An algorithm for choosing the set of reference states Ω = {0  }  =1 in an optimal way, adhering to some given criteria is described in Appendix B.

Excitation operators
Recall that Ω = {0  }  =1 denotes the set of references, and that   does not contain the other reference states Ω {0  }.The construction described below is to be repeated for every  = 1, . . .,  separately.
First, we fix an ordering of the indices in  ∈ .Then, for every element  = { 1 , . . .,   } ∈  we assign the lexicographically ordered  -tuple Without loss of generality, we can assume that the orbital indices contained in 0  are strictly less than the virtual indicies Λ 0  .As in Section 2.1, fix an orthonormal set ℬ  = {  } ∈Λ ⊂  1 (R 3 ) and the corresponding Slater determinants Recall the notation H 1  ⊂ H 1 for the subspace spanned by B  ; which is allowed to be finite-, or infinitedimensional depending on  = |Λ|.Definition 3.15.Let   = (  ,   ) be a subgraph of  full  .The family of linear operators for each  ∈ Ξ(  ) and  ∈ , and extended boundedly and linearly to the whole space H 1  (see [31]) is called the family of excitation operators on   .Here, (, ) is the sign of the permutation (, ) that puts the  -tuple ((  ) < , (  ) < ) in lexicographical order.
It is important to note that in general excitation operators corresponding to different reference states do not commute: Here, |vac⟩ is the Fock vacuum state, { 1 , . . .,   } = 0  and { 1 , . . .,   } =  with  1 < . . .<   and  1 < . . .<   .In other words,   changes the orbitals 0  to , as expected.Although the excitation operators commute with each other, they do not commute in general with the Hamiltonian.
We now define a family of operators which "reverse" the action of  () .
Definition 3.19.Let   = (  ,   ) be a subgraph of  full  .For all  ∈ Ξ(  ), the linear operators ( for any  ∈ , and extended boundedly and linearly to the whole space H 1  , are called de-excitation operators on   .
It is highly important to stress that in general excitation-, and de-excitation operators do not commute with each other: , in other words, the  ()  's are nonnormal operators.Also, [( ] ̸ = 0 in general.This fact is the source of many technical obstacles in the analysis of the CC method, primarily because it implies that the similarity-transformed Hamilton operator (2.9) is nonnormal.

Cluster operators
From now on, we omit the reference index  from the notations, with the understanding that the considerations hold true for every reference independently.Suppose that we constructed the set of excitation operators {  } ∈Ξ() for a given consistent subgraph  = (, ).The completion of their linear hull is called the space of cluster operators on  endowed with operator norm ‖ • ‖ ℒ(H 1 ,H 1 ) .As mentioned earlier, if  is not the full excitation graph  full , then certain excitation operators will be absent and therefore, they will be missing from v() as well.
Proof.It is enough to prove that an arbitrary product of  +1 excitation operators is zero.In fact, by definition every excitation operator either increases the rank of a Slater determinant by at least 1 or maps it to zero.But the rank cannot increase above  , so the product must be zero.
It is well-known that the vector space v( full ) constructed on the full excitation graph  full forms a commutative algebra (see e.g.[33], Lem.4.2) with the usual multiplication (a subalgebra of the algebra of bounded linear operators ℒ(H 1  , H 1  )).According to Proposition 3.22, it is also nilpotent.More generally, we have Theorem 3.23.v() is a nilpotent, commutative algebra for any transitive excitation graph .
If, however,  is not transitive, then v() is not an algebra in general-for instance in v((SD)) there are no excitation operators of rank 3 and above, but the rank of the products of excitation operators can be arbitrary (≤  ).
Example 3.24.We observed in Example 3.11 that the CAS-subgraph (CAS) corresponding to the TCC method is transitive and consistent, hence v((CAS)) forms a subalgebra of v( full ) (cf. [17]).Similarly, for (int) in Example 3.12, v((int)) also forms a subalgebra.However, in a truncated setting, where only certain low-rank edges of (CAS) (or (int)) are retained, transitivity, hence the subalgebra property is lost.
Let now the excitation graph  = (, ) be arbitrary.A cluster operator  ∈ v() may be decomposed according to the excitation ranks of its constituent excitations as We say that  is of rank  if it contains excitation operators of rank at most .Note that the graded structure of  is compatible with this decomposition in the sense that if  and  are of ranks  and , respectively, then  is of rank at most  + .
Remark 3.25.In the SR case, the cluster operators can be used to express any wavefunction in H 1  if the full excitation graph  full is used for their construction.In fact, in this case,   Φ 0 = Φ  for every  ∈ , so we may express any function in H 1  through a linear combination of the excitation operators and the identity .More precisely, if for some scalars {  } ∈ .Recall that in Section 2.3 we assumed the intermediate normalization condition ⟨Ψ, Φ 0 ⟩ = 1, which implies  0 = 1.There is a one-to-one correspondence between functions Ψ ∈ H 1,⊥  and the cluster operators  Ψ defined as It is not clear, however, that  Ψ ∈ ℒ(H 1  , H 1  ).See Theorem 3.26 below for the precise statement of this nontrivial fact.Also, if the excitation graph does not contain every edge of the form (0, )-which is typically the case if some truncation is used-then it is not possible to assign a cluster operator (3.7) to every Ψ ∈ H 1,⊥  .The following important result makes the aforementioned correspondence between functions and cluster operators precise.Theorem 3.26.(Theorem 4.1 and Lemma 5.1 of [31]) Fix Ψ ∈ H 1,⊥ .Then, the following hold true.
(1) The cluster operator (2)  † Ψ ∈ ℒ(H 1 , H 1 ), and there is a constant  ′ > 0 independent of Ψ such that , and there cannot be a uniform lower bound in terms of ‖Ψ‖ H 1 .
Next, we consider the so-called exponential Ansatz, which is the representation and  ∈ v( full ).Here,   is simply a finite sum due to the nilpotency of  , i.e.
The inverse of the exponential should be the logarithm, as one would expect.Theorem 3.27.Lemma 5.2 of [31] For any cluster operator  ∈ v( full ) there exists a unique cluster operator  ∈ v( full ), such that   =  + .Furthermore, Moreover, the exponential map is a bijection between Furthermore, the result also holds true if It is important to note that if some proper excitation subgraph  = (, ) is considered instead of  full , the previous result does not hold.For instance, if (SD) is considered, then it might not be possible to represent  +  as   , where  ∈ v( full ) and  ∈ v().This in particular implies that wavefunctions of the form   Φ 0 where  ∈ v() is not the totality of intermediately normalized wavefunctions.
Then the JM expansion coefficients  ()  of Ψ  are simply   .

Cluster amplitude spaces
The linear combination coefficients of the excitation operators making up a cluster operator are called cluster amplitudes.Let ℓ 2 () denote Hilbert space of square summable real-, or complex-valued sequences indexed by the edge labels of the excitation graph , i.e.
It is clear that the space of cluster operators v() is canonically isomorphic to V() via As customary in CC theory, we will never explicitly denote this isomorphism, and instead use capital letters , , , ,  , etc. to denote the cluster operators and small letters , , , , , etc. to denote their corresponding cluster amplitudes.Furthermore, to every amplitude space V() there corresponds a functional amplitude space Clearly, an appropriate subset of the Slater determinant basis B  (see (3.4)) forms a basis of the functional amplitude space V().
Hence, the inclusion map  U : U → L 2 , given by  U Φ = Φ for all Φ ∈ U satisfies  † U = Π U .We continue by recalling an important notion due to [33].

Derivation of the Coupled-Cluster equations
In this section, we give derivations of the SRCC-, and a variant of the MRCC equations.The approach presented here is based on [27].We would like to stress that the discussion only applies to the full (that is, untruncated) CC methods.
The essence of the following theorem seems to be well-known in the physics and quantum chemistry literature, and the method itself is generally attributed to C. Bloch [6], who devised it in the context of perturbation theory.Theorem 4.1.Let H and L be (real or complex) Hilbert spaces so that they form a Gelfand triple: H ⊂ L ⊂ H * .Let ℋ : H → H * be a bounded operator.Let M, N ⊂ H be any pair of closed subspaces so that the following complementarity condition holds: Then the following are equivalent.
Using (4.2), this can be further written as The desired result follows by noting that ran Ξ † = N.
In practice, M (called the "exact model space") is unknown and N (called the "model space") is chosen in a way that it provides a "reasonable approximation" to M, i.e. that (4.1) holds.In particular, M ⊂ N ⊥ is not permitted.Then, the unknown "wave operator" Ξ (hence M) can be determined by solving the weak Bloch equation (4.2).Next, the eigenvalue problem for ℋ eff is solved to obtain the energies ℰ 1 , . . ., ℰ  and (some of the) eigenvectors.(i) It is important to note that solving the Bloch equation only provides a weakly ℋ-invariant subspace M and it might not be a direct sum of (weak) eigenspaces in general.In other words, M might be spanned by an incomplete set of eigenvectors.Clearly, in such a situation some of the eigenvectors cannot be recovered through solving the eigenproblem for the effective Hamiltonian ℋ eff .(ii) The Bloch equation (4.2) is more commonly given in the "strong" form "ΞℋΞ = ℋΞ".
The situation is greatly simplified, when one considers one-dimensional subspaces N and M, because a onedimensional invariant subspace is always an eigenspace.
The Jeziorski-Monkhorst method [16] uses the following Ansatz for the wave operator: which corresponds to (3.8).
extension to the multireference case (Def.3.14).Another advantage of our approach is that it avoids the use of second-quantized formalism and hence allowed us to prove the basic results (such as Thms.3.17 and 3.21) in a more transparent manner.Besides these, we also pointed out a number of structural properties of the excitation graph in Section 3.2.It is important to note that some of these graph-theoretic properties are reflected in the algebraic structure of the excitation operators (Thms.3.16 and 3.23).Some relevant combinatorial quantities have been calculated in Appendix A. Furthermore, we proposed an algorithm to determine the reference states in an optimal fashion for the multirefence case in Appendix B.
In Section 4, we provided unified and rigorous derivations of both the single-reference-(Sect.4.1), and a multireference (Sect.4.2) CC method.The derivations used a general theorem (Thm.4.1) motivated by a known method based on perturbation theory.

Appendix A. Properties of the excitation graph
Here, we restrict ourselves to the single-reference case ( = 1) and drop the subscript 's from the notation.Recall that  denotes the cardinality of the orbital set Λ. Given  ∈ , we introduce the set of paths of length  from 0 to  in , P  () = { ∈  × . . .×  : there is a path 0 →  in  having edges }.
The following theorem sheds light on the combinatorial structure of the excitation graph.where we used Vandermonde's identity once more.Next, we prove (v).We need to change 0 into  in  steps (edges).Suppose that the rank-increment of each step is  1 , . . .,   , and are such that  1 + . . .+   = .In the th step we replace letters ( 1 , . . .,    ) with ( 1 , . . .,    ).These choices can be done independently, so there are ! 2 possibilities.However, the order of the 's and 's is irrelevant in each step so we have to divide by ( 1 !• • •   !) 2 .Summing over all  1 , . . .,   gives the stated formula.x  ≥ 1,  = 1, . . ., where c ∈ Q  is a given rational cost vector. 8We warn the reader that the notation − → 0 for the vector representation of 0 is slightly colliding with − → 0 , the actual zero vector for the Hamming space.
Remark B.1.If c  = 0 for some , then we will automatically have x  = 1 in the solution, even if  H ( − →   , 2) does not cover.On the other hand, assigning a larger (resp.infinite) cost c  will likely (resp.surely) end up x  = 0 in the solution.
The above problem is called a "multidimensional knapsack problem" in the optimization community, which seems to be extensively studied.However, we just naively solve the BILP using general ILP methods available in Mathematica.In our experience, the BILP can be built up and solved in a small amount of time for practically relevant parameters  ,  and , even on an older machine.
Theorem 3.16.Let   be a consistent subgraph of  full  .Then,   (  ) ≡   ( full  ) for all  ∈ Ξ(  ).Proof.Fix  ∈ Ξ(  ), then by (3.3) and Definition 3.8, (,  ∨  ) ∈   for all  ∈   .Consequently,   (  )Φ  =   ( full  )Φ  for all  ∈ .Based on this result, if   is consistent, it is safe to drop the "  " from the notation   (  ) and simply denote the excitation operators by   , or by   in the SR case.However, it is important to note that for a given ,   in general for differing reference states  ̸ = ℓ, see Remark 3.13.The excitation operators enjoy nice algebraic properties which we summarize in the next theorem (cf.[31], Lem.2.5).3.17.Let   = (  ,   ) be a consistent subgraph of  full  and let {  } ∈Ξ() denote the set of excitation operators on   .Then the following properties hold true.