A semismooth Newton method for implicitly constituted non-Newtonian fluids and its application to the numerical approximation of Bingham flow

We propose a semismooth Newton method for non-Newtonian models of incompressible flow where the constitutive relation between the shear stress and the symmetric velocity gradient is given implicitly; this class of constitutive relations captures for instance the models of Bingham and Herschel-Bulkley. The proposed method avoids the use of variational inequalities and is based on a particularly simple regularisation for which the (weak) convergence of the approximate stresses is known to hold. The system is analysed at the function space level and results in mesh-independent behaviour of the nonlinear iterations.


Introduction
Let Ω ⊂ R d be a Lipschitz polygonal/polyhedral domain for d ∈ {2, 3} and consider the following system for the velocity u : Ω → R d , shear stress tensor S S S : Ω → R d×d sym and pressure p : Ω → R of an incompressible fluid: αu − div S S S + div(u ⊗ u) + ∇p = f in Ω, div u = 0 in Ω, (1a) where f : Ω → R d is given, and α ≥ 0 is e.g. a parameter that arises from an implicit time discretisation (α = 0 for the steady problem). In this work we will close the system above with an implicit constitutive relation of the form G G G(S S S, D D D(u)) = 0 a.e. in Ω, range of models and allows for a thermodynamically consistent analysis (see e.g. [39]). For example, the usual Newtonian and power-law models can be written in the form (1b): where τ * ≥ 0 is the yield stress, and ν > 0 is the viscosity; the closely related Herschel-Bulkley constitutive relation can be obtained e.g. by letting ν = ν(D D D(u)) := K|D D D(u)| r−2 , with K > 0 and r > 1. Such models can be used to describe waxy crude oil, toothpaste, paint, pastes, drilling muds, and mango jam, among other things [5,7,26,29]; see [4] for a nice survey on various aspects of the modeling and simulation of viscoplastic fluids. Note that, in contrast to the constitutive relations (2), the relation (3) where p + := max{p, 0} is the positive part of p. For a rigorous mathematical analysis of models describing implicitly constituted incompressible fluids, the reader is referred to [10,12]; in particular existence of weak solutions for large data is known to hold both for the steady and unsteady problems, under appropriate assumptions for G G G (see also [8] for a classification of various models that can be described within this framework). Regarding the numerical analysis, in [18] and [44] the authors established (weak) convergence of the finite element approximations to a weak solution of the system for the steady and unsteady problems, respectively. This was later extended to formulations including the shear stress S S S in [21], and an augmented Lagrangian preconditioner was proposed in [20].
In previous works dealing with the numerical analysis of implicitly constituted fluids a fixed point argument is employed to obtain existence of discrete solutions, and then the emphasis is placed on proving that convergence to a weak solution of the problem holds. However, there is barely any discussion on how to solve such discrete nonlinear systems. For smooth constitutive relations, such as (2), this is not an issue since the standard Newton's method can be applied, but it is an important matter when dealing e.g. with the Bingham constitutive relation. In practice, it is common to circumvent the non-differentiability by regularising; a popular regularisation is for example the one proposed in [6]: Regularisations such as (5) are very popular owing to their simplicity. However, a major drawback of this approach is that by and large there are no results that guarantee the convergence of the shear stress S S S ε → S S S, as ε → 0, and previous works suggest that this convergence might not actually hold in general (see e.g. [22,33,42]); one exception is the regularisation of the Bingham model introduced in [16], for which the authors prove convergence of the approximate solutions to a solution of the original problem. The main alternative, whenever an accurate approximation of the yield surfaces (i.e. the region of transition between the two regimes described by (3)) is essential, is given by the augmented Lagrangian method, which is based on a variational inequality formulation of the problem (see [26]). The main drawback of this method is that it requires the use of more sophisticated mathematical tools to be analysed and convergence can be slow; the search for ways to accelerate the algorithm is still a very active area of research (see e.g. [19]).
Here we propose the use of an alternative regularisation, introduced in [13], for which the convergence S S S ε S S S can in fact be established. This regularisation takes the very simple form The approach proposed here is conceptually very simple and avoids the use of formulations based on variational inequalities. Note that the regularised function G G G ε1 ε2 inherits the smoothness of the original constitutive relation, and so in general it will not be differentiable (it brings however other advantages that will be discussed later). For this reason a semismooth version of Newton's method will be required.
In the classical version of Newton's method for solving the equation F (z) = 0, where F : Z → X is a function defined between two Banach spaces Z and X, one applies iteratively the following steps until a convergence criterion is satisfied: (1) Solve the linear problem: With the appropriate hypotheses it is well known that the iterates produced by this method converge (locally) quadratically to the solution of the problem. In the semismooth version of the algorithm one has to replace DF (z k ) in step 1 above by an element of the generalised Jacobian of F . Then, assuming some regularity conditions, it is possible to prove that the iterates converge at least (locally) superlinearly to the solution (the required hypotheses will be described in detail in subsequent sections).
After introducing the relevant weak and finite element formulations in Section 2, we will proceed in Section 3 to write the problem (1a) with the regularised constitutive relation (6) in terms of a function F between appropriate Banach spaces and formulate a semismooth Newton algorithm that, with appropriate assumptions, can be shown to fall within the framework of semismooth Newton methods in function spaces of [49]; in particular, we will see that some expressions describing the Bingham constitutive relation are more advantageous. The analysis is carried out at the function space level in order to avoid mesh-dependent behaviour. Numerical experiments focusing on the Bingham constitutive relation that support this result are carried out in Section 4; these experiments employ the finite element method and a stress-velocity-pressure formulation.
Semismooth Newton methods have been applied before in the context of viscoplastic flow; semismooth Newton methods were analysed in [16,17] for the solution of certain variational inequalities related to the description of Bingham flow, and so the approach is distinct to the one presented here (as mentioned above, one of the advantages the framework presented here is precisely that it avoids the formalism of variational inequalities). A non-smooth Newton method was developed for a different formulation describing Herschel-Bulkley fluids with r ∈ (1, 2) in [41], but, as opposed to the present work, some mesh-dependence was observed in the solver. In addition, the analysis from [41] is closely tied to the particular structure of the Herschel-Bulkley relation, which is a constraint not present here: the framework employed here captures a wider variety of models; for instance, given the implicit nature of the constitutive relation, one could very easily consider stress-dependent viscosities or even swap the roles of the shear stress S S S and the symmetric velocity gradient D D D(u), and obtain models describing inviscid (i.e. Euler) fluids for low values of |D D D(u)|, and viscous otherwise (see [8] for a more detailed description of related models).

Implicitly constituted fluids
Throughout this work we will employ standard notation for Sobolev and Lebesgue spaces (for example (W 1,r (Ω), ||·|| W 1,r (Ω) )). We define the space W k,r 0 (Ω), with k ∈ N, r > 1, to be the closure of the space of compactly supported smooth functions C ∞ 0 (Ω) with respect to the norm ||·|| W k,r (Ω) . In addition, for r ∈ (1, ∞) we define the following subspaces: The operator tr(τ τ τ ) above denotes the usual trace of the matrix function τ τ τ . The Frobenius inner product between matrices will be written as τ τ τ : σ σ σ := tr(τ τ τσ σ σ ). Suppose that f ∈ L r (Ω) d . In order to focus solely on the difficulties brought by the nonlinear constitutive relation, we will neglect the convective term in the subsequent analysis. Therefore, in the weak formulation of the problem we look for (S S S, u, p) ∈ L r sym,tr (Ω) d×d × V r × L r 0 (Ω), where V r := W 1,r 0,div (Ω) d ∩ L 2 (Ω) d and r is the Hölder conjugate of r, such that In the existence analysis of systems such as (7), the assumptions on the constitutive relation are traditionally written in terms not of the function G G G, but of its induced graph on R d×d sym × R d×d sym . In this work we will prefer to work directly with the function G G G, following the approach first proposed in [13], which leads to conditions that are easier to check in practice.
The assumptions (G1)-(G4) imply that G G G induces a maximal monotone r-coercive graph on R d×d sym × R d×d sym (see [13,Def. 3.1]) and so they guarantee the existence of a weak solution to the problem [13,Thm. 2.2]. It can be seen that the Bingham relation (4) and relations of power-law type like (2) satisfy the assumptions above (see e.g. [13,48]). The existence result obtained in [13] relies on an approximation scheme based on solving the problem associated to the approximate constitutive relation (6) and then passing to the limit (ε 1 , ε 2 ) → 0; in particular, one has that the approximate solutions (S S S ε1,ε2 , u ε1,ε2 , p ε1,ε2 ) converge weakly to a weak solution (S S S, u, p) of the system. From a theoretical point of view the regularisation (6) is advantageous because the resulting approximate graph is strictly monotone and 2-coercive, thus leading to an approximate problem with a Hilbert space structure. Figure 1 shows a plot of the regularised constitutive relation (6) for the Bingham and Herschel-Bulkley models for different values of ε = ε 1 = ε 2 .
Remark 2.2. In the theoretical analysis carried out in [13], the parameters ε 1 and ε 2 were taken to be equal: This choice was made because having two distinct parameters did not bring any advantages to the analysis [13,Rmrk. 4.2]. Here we prefer to introduce the regularisation in the form (6) to emphasise the fact that the two parameters have different units: ε 1 has units of viscosity and ε 2 has units of fluidity; otherwise, starting directly from (8) can give the impression that one is commiting a "dimensional crime". Furthermore, when performing numerical computations it is not clear whether different choices for ε 1 and ε 2 could result in better behaviour. However, for the sake of simplicity we will employ the form (8) in the rest of this work.

Finite element approximation
We will now introduce the necessary notation to be able to define the finite element formulation of the system. Since the constitutive relation is implicit, a natural approach is to work with a 3-field formulation that includes the shear stress S S S (such a formulation was analysed in [21]). Let {T n } n∈N be a family of shape-regular triangulations such that the mesh size h n := max K∈Tn h K vanishes as n → ∞. We then define the following conforming families of finite element spaces: where P S (K), P V (K), and P M (K) are spaces of polynomials defined on the element K ∈ T n . We will also introduce the following useful subspaces:

|S S S|
Assumption 2.5 (Fortin Projector Π n Σ ). For each n ∈ N there is a linear projector Π n Σ : L 1 sym (Ω) d×d → Σ n such that: • (Preservation of divergence). For any σ ∈ L 1 sym (Ω) d×d we have that • (L s -stability). For every s ∈ (1, ∞) there is a constant c > 0, independent of n, such that: Assumption 2.6 (Fortin Projector Π n V ). For each n ∈ N there is a linear projector Π n V : W 1,1 0 (Ω) d → V n such that the following properties hold: • (Preservation of divergence). For any v ∈ W 1,1 0 (Ω) d we have that • (W 1,s -stability). For every s ∈ (1, ∞) there is a constant c > 0, independent of n, such that: Assumption 2.7 (Projector Π n M ). For each n ∈ N there is a linear projector Π n M : L 1 (Ω) → M n such that for all s ∈ (1, ∞) there is a constant c > 0, independent of n, such that: An important consequence of Assumptions 2.5 and 2.6 is inf-sup stability; i.e. we have that for any s ∈ (1, ∞) there exist two constants β s , γ s > 0, that do not depend on n, such that There are several families of velocity and pressure pairs V n -M n that satisfy the assumptions above. We can for example mention the Taylor-Hood P k -P k−1 and MINI elements (see e.g. [9,25]). The Scott-Vogelius element P k -P disc k−1 is also known to satisfy the assumptions, for instance, on barycentrically refined meshes [36]; this element has in addition the useful property that discretely divergence-free velocities are in fact also pointwise divergence-free. Regarding the shear stress, if the velocity element consists of globally continuous piecewise polynomials of degree k ∈ N, then an example of a space satisfying Assumption 2.5 is given by (see e.g. [21]): In the finite element formulation of the (regularised) problem we then look for functions (S S S n , u n , p n ) ∈ Σ n sym × V n × M n 0 such that: The Assumptions 2.4-2.7 and (G1)-(G4) guarantee that the discrete problem (12) has a solution, and that, as n → ∞, the sequence of discrete solutions converges (weakly) to a solution of the continuous problem (see [21,Rem. 3.8]). The rest of the paper will focus on how to solve the nonlinear problems described in this section by means of a semismooth Newton method.

Semismooth functions
Semismoothness is a concept originally introduced in [32] for the analysis of finite-dimensional nonsmooth optimisation problems (see also [34,35]); it was helpful in the development of a generalised Newton method for functions that are not Fréchet-differentiable.
Suppose F : R m → R k is a locally Lipschitz function. Since F is (by Rademacher's theorem) differentiable everywhere except maybe on a set U F of zero measure, one can define a generalised notion for the Jacobian of F at any point z ∈ R m , called Clarke's differential, as where co A denotes the convex hull of the set A. It is known that Clarke's differential ∂F (z) is nonempty, compact and convex, and that when F is continuously differentiable then ∂F (z) = {∇F (z)} (for more details the reader is referred to [15]). We then say that F is semismooth at z if it is directionally differentiable and An important consequence of this definition is that if F is a locally Lipschitz and semismooth function in a neighborhood around z * ∈ R m , where F (z * ) = 0, and ∂F (z * ) is nonsingular, then the generalised Newton iteration where M j ∈ ∂F (z j ) is arbitrary, converges superlinearly to z * , provided |z 0 − z * | is sufficiently small [14]. At the infinite dimensional level, where F : Z → X is now a function defined between two Banach spaces Z and X, Rademacher's theorem does not apply and therefore a characterisation of a generalised gradient such as (13) is not available. Thus, one has to work with a generally weaker and more abstract notion of generalised Jacobians. Let U be a neighborhood of z ∈ Z, and suppose ∂F : Similarly to the finite-dimensional case, one has that if F is continuously differentiable then it is {F }semismooth. Furthermore, as stated in the following proposition, the semismooth Newton algorithm also leads to superlinear convergence. Suppose that the function F : Z → X is ∂F -semismooth at z * and that F (z * ) = 0. Suppose that there exists a positive constant c, such that for every z ∈ U and every M ∈ ∂F (z), M is invertible with M −1 L(X;Z) ≤ c. Then the generalised Newton-Kantorovich iteration defined by (15) converges superlinearly to z * for all z 0 sufficiently close to z * .
We will now formulate a semismooth Newton method for the problem (1a). Define for r ∈ (1, ∞) the mapping where q > 1 and G G G denotes the Nemytskii operator associated to G G G. Observe that the second and third entries in the definition of F are given by linear and bounded operators, and so the semismoothness (or lack thereof) of F depends solely on that of G G G. This is advantageous because from this one immediately obtains a candidate for the multivalued function ∂F from Proposition 3.1. Namely, for (S S S, u, p) ∈ Z, the generalised Jacobian ∂F (S S S, u, p) will consist of the linear operators M ∈ L(Z; X) of the form where [d 1 , d 2 ] is a measurable selection of ∂G G G(S S S(·), D D D(u(·))), with ∂G G G being the Clarke generalised gradient of G G G (cf. [50,Def. 3.40]). The main algorithm can be summarised as follows: , and for each k = 0, 1, 2, . . . perform the steps: (1) If F (S S S k , u k , p k ) = 0, then terminate with (S S S k , u k , p k ); (2) Choose an arbitrary element M k ∈ ∂F , where ∂F is defined through (18); The continuity of Nemytskii mappings between Lebesgue spaces has been well studied and it is known (see e.g. [27,Thm. 4]) that if G G G satisfies, for some q > 1, the growth condition where c,c > 0, then the Nemytskii map is continuous. Suppose for a moment that the partial derivatives are continuous, and define If we assume that the following growth conditions hold: where c i ,c i are non-negative constants, and then the operators are continuous, which in turn implies that the corresponding Nemytskii maps are continuously differentiable (see e.g. [27,Thm. 7]), and the derivatives In summary, if one assumes that the partial derivatives of G G G are continuous, and they satisfy the growth assumptions (22), then the Nemytskii map (20) is Fréchet-differentiable (and therefore so is F ), which would make the use of a Newton-Kantorovich algorithm possible. Note that the number q defined (21) is smaller than both r and r ; this is an example of a norm gap, which is a phenomenon that appears in other contexts such as optimal control theory (see e.g. [47]). As mentioned in the introduction, we wish here to work with Bingham-like constitutive relations that do not satisfy the differentiability requirement just described, and we therefore wish to have at our disposal a similar characterisation for semismoothness of superposition operators. The first results in this direction were obtained in [49] assuming Lipschitz continuity, which is too restrictive for our setting. Thankfully, this assumption was relaxed in [43], leading to conditions that guarantee semismoothness that are more widely applicable. The following lemma, consequence of [43, Prop. A.1], summarises the required assumptions in the current setting. ∞) is arbitrary, and let q ≥ 1 be defined by (21). Suppose that G G G : R d×d sym × R d×d sym → R d×d sym is a locally Lipschitz function satisfying the growth assumption (19), with partial derivatives ∂ σ σ σ G G G, ∂ τ τ τ G G G satisfying (22).
where c 1 , c 2 are positive constants, and s, p are defined by (23); assume further that M can be written as the a. Proof. First note that the existence of the partial derivatives ∂ σ σ σ G G G, ∂ τ τ τ G G G is guaranteed by Rademacher's theorem. Define now the local Lipschitz constants of G G G as where σ σ σ,σ σ σ, τ τ τ ,τ τ τ ∈ R d×d sym ; then the growth assumption (22) can be reframed as defined for (x, σ σ σ, τ τ τ ) ∈ Ω × R d×d sym × R d×d sym , induces a continuous Nemytskii operator, which in particular implies the desired condition (27).
In practice, the function M in Lemma 3.3 arises from derivatives of G G G, and so the growth conditions (22) and (26) are redundant. This is the case for instance with the generalised Jacobian defined by (18). After having identified the necessary assumptions to guarantee semismoothness, it only remains to examine the solvability of the linearised system.
is a measurable selection of ∂G G G(S S S(·), D D Du(·)), such that where c is a positive constant.
An immediate consequence of Proposition 3.1 is the following convergence result.
Theorem 3.5. Let G G G : R d×d sym × R d×d sym → R d×d sym be a function satisfying assumptions (G1)-(G4) for some r > 1. Suppose that the growth conditions (19) and (22) hold, and that Assumption 3.4 is satisfied. Let (S S S, u, p) ∈ L r sym (Ω) d×d × V r × L r 0 (Ω) be such that F (S S S, u, p) = 0, where F is defined by (17). Then the iterates produced by Algorithm 3.2 converge superlinearly to (S S S, u, p), for any initial guess (S S S 0 , u 0 , p 0 ) sufficiently close to (S S S, u, p).
In some cases higher regularity results allow for less restrictive growth conditions for G G G, meaning that a larger q could be selected in (17). For instance, the strict monotonicity (9) might result in higher regularity for the problem associated to the regularisation G G G ε defined by (6), namely (S S S ε , u ε ) ∈ W 1,min{2,r } sym (Ω) d×d × W 2,min{2,r} (Ω) d ; in that case one could e.g. guarantee the continuity of the Nemytskii operator G G G ε with the more general growth condition where p * := pd d−p denotes the Sobolev exponent of p (it is defined as infinity when p = d). Without assuming any sort of strict monotonicity of the constitutive relation, it might prove difficult to solve the linearised system (31) from Assumption 3.4. In this case the use of the regularisation (6) also brings an advantage: the monotonicity condition (9) implies by the implicit function theorem that the partial derivatives of G G G ε are uniformly positive definite, and so the matrix function A ε := −(d ε 1 ) −1 d ε 2 is uniformly positive definite, which means that the system (31) can be rewritten as a Stokes-type system: whereH H H := (d ε 1 ) −1 H H H ∈ L q (Ω) d×d . If q ≥ max{r, r }, the system (34) can be solved by standard energy arguments, and the required estimate (32) is for instance a consequence of [11,Thm. 1.4]. In the opposite case q ∈ (1, max{r, r }) the situation becomes slightly more delicate since the term divH H H, with H H H ∈ L q sym (Ω) d×d and q small, does not allow the use of standard energy estimates. The results from [11] still guarantee the existence of a solution, but the estimate takes the form In this case, the addition of a smoothing step to Algorithm 3.2 that takes the solution back to (T T T ε , v ε , m ε ) ∈ L r sym (Ω) d×d × W 1,r 0 (Ω) d × L r 0 (Ω) would be necessary (see e.g. [50] for more details on smoothing operators).
Corollary 3.6. Let G G G : R d×d sym × R d×d sym → R d×d sym be a function satisfying (G1)-(G4) for some r > 1, and let defined through (17) with G G G replaced by G G G ε . Then, if q ≥ max{r, r } (recall (33)), the iterates produced by Algorithm 3.2 (employing F ε instead of F ) converge superlinearly to (S S S ε , u ε , p ε ), for any initial guess (S S S ε 0 , u ε 0 , p ε 0 ) sufficiently close to (S S S ε , u ε , p ε ). If q < max{r, r } the same result holds if a smoothing step is included in Algorithm 3.2.
Remark 3.7. While the previous discussion dealt with the continuous problem (1), the exact same arguments regarding e.g. continuity could be applied to a weakly enforced form of the constitutive relation and so in turn also to the finite element discretisation (12). At the discrete level all norms are equivalent, which would make it straightforward to obtain estimate (32), without needing sophisticated results like the ones from [11]. However, it is likely that this would then result in mesh-dependent behaviour of the iterations; it is for this reason that we endeavoured to carry out the analysis at the function space level.
Remark 3.8. The convective term can be incorporated into the analysis in a similar manner as is done for the Newtonian problem. For example, if we employ a Picard-like linearisation of the form where u ∈ V r is the current solution and φ ∈ C ∞ 0 (Ω) d is a test function, the solvability of the linearised system remains intact because of the skew-symmetry property B[u; v, v] = 0 for all v ∈ V r (but note that one has to assume that r is large enough so that choosing φ = v is allowed), which is a consequence of the divergence-free condition div u = 0. If one wishes to employ instead the form of the convective term that would arise from Newton linearisation: the situation is slightly more delicate, and even in the Newtonian case it is necessary to assume e.g. that the viscosity is large enough in order to guarantee that the linearised system can be solved (see e.g. [25,Lm. 3.2]).

The Bingham constitutive relation
In this section we will look at some explicit examples of constitutive relations that fall into the framework described in the previous section. Namely, we will show that the Bingham constitutive relation can be described by means of a semismooth function G G G that also satisfies appropriate growth conditions. First, we claim that the relation (4b) defining a Bingham fluid is semismooth. Note that, excepting the first term, all the terms appearing in the following function are smooth. Therefore we only need concern ourselves with the function The function h is locally Lipschitz, and its (Clarke) generalised Jacobian is given by where I I I denotes the fourth order identity tensor. Since for τ τ τ = 0 the function h is actually smooth, we only need to prove semismoothness at τ τ τ = 0: The function (36) is therefore semismooth and its generalised Jacobian is given by For the Bingham constitutive relation we have r = 2 and so the solution to the problem belongs to (S S S, u, p) ∈ L 2 sym (Ω) d×d × W 1,2 0 (Ω) d × L 2 0 (Ω). On the other hand, a simple application of Young's inequality yields the growth conditions for G G G and its derivatives: |∂ σ σ σ G G G(σ σ σ, τ τ τ )| ≤ c(|σ σ σ| + |τ τ τ |) +c, (40) |∂ τ τ τ G G G(σ σ σ, τ τ τ )| ≤ c(|σ σ σ| + |τ τ τ |) +c, where c,c > 0. The growth conditions (40) alone do not suffice to be able to apply the results from the previous section, since they only imply that the Nemytskii operator (20) is continuous into L 1 sym (Ω) d×d , which brings great difficulties since it is unlikely that one can solve the linearised system (31) with a uniform bound on the solution, if the linearised system is at all solvable (recall the lack of monotonicity).
Thankfully, at this point we can make use of regularity results that guarantee the boundedness of gradients for Bingham and Herschel-Bulkley models [23,24]. Noting also that the mapping D D D(u ε ) → S S S ε defined by the relation (6) is Lipschitz continuous [30,Lm. 4.5], we obtain also boundedness of the stresses, which implies that the corresponding Nemytskii operator G G G ε will be continuous into L 2 sym (Ω) d×d . Moreover, thanks to the uniform monotonicity condition (9), it is possible to solve the linearised system in a straightforward manner, which implies that Corollary 3.6 can be applied to obtain convergence of Algorithm 3.2 (note that G G G ε is semismooth, because it is a composition of a linear transformation with G G G). Remark 3.9. Strictly speaking, the expression (36) does not exactly represent the constitutive relation, since when |τ τ τ | = 0, the equation G G G(σ σ σ, τ τ τ ) = 0 is satisfied for arbitrary σ σ σ ∈ R d×d sym , and not just |σ σ σ| ≤ τ * as required by the Bingham relation (3). However, the regularised relation G G G ε does not exhibit this unwanted behaviour; e.g. Figure 1 was obtained using this expression. Numerical experiments in Section 4 will also show that the semismooth Newton algorithm will perform well using the regularised version of (36).
The following is yet another expression that can be used to describe the Bingham relation: The expression (41) in fact satisfies a less restrictive growth condition which e.g. guarantees the continuity of the Nemytskii operator G G G into L 2 sym (Ω) d×d without the need for regularity results, but unfortunately it also fails to be semismooth. This shows that different expressions for the same constitutive can relation result in different behaviour, and so it would be worthwhile in future research to look for the most advantageous ways of writing the constitutive relation.
Remark 3.10. The Herschel-Bulkley relation can be analysed in a similar manner. For instance, we could use the following expression: Since r > 1, the function τ τ τ → |τ τ τ | r−1 τ τ τ is actually differentiable, and so by the same arguments employed above, the function (43) is semismooth. As for the growth condition, one can e.g. see that the regularised expression satisfies |G G G ε (σ σ σ, τ τ τ )| ≤ c(|σ σ σ| max{r,2} + |τ τ τ | max{r,2} ) +c c,c > 0. (44) Then, for instance if we look at the shear-thinning case r < 2, the natural higher integrability (S S S, D D D(u)) ∈ L 2 * sym (Ω) d×d × L r * sym (Ω) d×d would suggest we rewrite (44) as Therefore, if we assume e.g. that r ≥ 3d d+2 (which ensures that r * 2 ≥ r ), Corollary 3.6 implies the convergence of Algorithm 3.2 without the need for a smoothing step. We note that the condition r ≥ 3d d+2 is advantageous when analysing the convergence of finite element approximations for the system including the convective term (see e.g. [18]).
Remark 3.11. The method presented in this work has the advantage that it works directly with the basic formulation conventionally derived in continuum mechanics: balance laws in the form of partial differential equations (1a) supplemented with a constitutive relation in the form of a pointwise algebraic constraint (1b). This leads to a lot of flexibility in the choice of constitutive relations; for instance, one could easily capture with the general form (1b) constitutive relations of the type In contrast, the variational inequality formulation makes heavy use of the form of the potential, e.g. for the Bingham model, Ω (ν|D D D(u)| 2 + τ * |D D D(u)|) and it is not immediately clear how one should modify the formulation to capture a relation such as (46). In addition, the variational inequality formulation is itself derived as a consequence of the laws (1), and in some cases it is not completely obvious that a solution of the variational inequality is also a solution of the weak form of (1), e.g. in the three-dimensional unsteady Bingham model including inertia, where testing with the velocity u is not allowed.

Numerical examples
In this section we implement Algorithm 3.2 with the regularisation (6) applied to the Bingham constitutive relation expressed via (4b).
All the examples were implemented using Firedrake [40]. The semismooth Newton iterations are carried out until the Euclidean norm of the residual falls below 1×10 −9 , unless specified otherwise; The linear systems were solved with the sparse direct solver from MUMPS [2]. Firedrake makes use of the Unified Form Language [1], which automatically chooses an element of the generalised derivative of the positive part f + of a function f as:

Flow between two plates
Consider the problem posed on the domain Ω = (0, 4) × (−1, 1) with f = 0. The following function solves the Bingham problem exactly (cf. [20,28]): The boundary data are then chosen so as to match the given expression above. Even though there are many families of finite element methods satisfying Assumptions 2.4-2.7 (recall the discussion in Section 2.2), which in theory lead to a convergent scheme for a big class of implicitly constituted fluids, in the particular case of viscoplastic flow it has been observed in practice that schemes for which the shear stress S S S is approximated by piecewise constant functions are more appropriate (see e.g. [46]). For instance, when using the Taylor-Hood P 2 -P 1 element, oscillations that spoil the overall convergence of the nonlinear iterations might appear; Figure 2 shows e.g. such oscillations in the solution obtained using a mesh with 3.6 × 10 4 degrees of freedom. For this reason, in the rest of this work we will employ instead a stabilised P 1 -P 1 element for the velocity and pressure; i.e. the finite element spaces will be chosen as (recall (11)): and the following stabilisation term is added to the right-hand-side of the discrete equation (12c): In all the numerical experiments, the phrase "degrees of freedom" (or "#dofs") refers to the total number of unknowns associated to the stress, velocity, and pressure. (a) Pressure obtained with the Taylor-Hood P 2 -P 1 element. (b) Pressure obtained with the stabilised P 1 -P 1 element. Figure 2. Pressure for the flow between two plates with ε = 0.001 and 3.6 × 10 4 dofs.
In this example we set τ * = 1, and continuation was employed to obtain better initial guesses for the nonlinear iterations; in particular, the problem was solved for the values ε ∈ {0.5, 0.0166, 0.001, 0.0001}, with each solution being used as the initial guess in the iterations, for the next smaller value of ε. Figures 3 and 4 show the L 2 and H 1 error of the velocity as ε and the mesh size decrease, respectively; it can be observed that for values  (Ω) ε = 0.5 ε = 0.0166 ε = 0.001 ε = 0.0001 (b) Velocity error as the mesh size decreases. less than ε = 0.001, it is the discretisation error that dominates. Figures 5 and 6 show the decrease in the residual norm as the number of semismooth Newton iterations increases. The different values of ε and the mesh size result in very similar behaviour for the residual, and in particular we see that the solver does not seem to exhibit any mesh-dependence. For the sake of comparison, Figure 7 shows the decrease in the residual norm when using the alternative expression (4a) for the constitutive relation; recall that, as seen in Section 3.2, this expression does not define a semismooth function. To ensure convergence of the nonlinear iterations, in this case we had to employ a continuation scheme with respect to ε: given two previously computed solutions z 1 , z 2 , corresponding to the values ε 1 , ε 2 , respectively, the initial guess for the semismooth Newton iteration for ε is chosen as Although convergence is achieved, the behaviour is somewhat more erratic in comparison to that obtained using (4b), and it takes a higher number of iterations and continuation steps for the solver to converge. This supports the results from Section 3.2, which suggest that the expression (4b) should be more advantageous.

Lid-driven cavity
The lid-driven cavity problem has been used as a test for many numerical schemes throughout the years (see e.g. [3,45]). In this example we solve the unsteady problem on the domain defined by the square Ω = (0, 1) 2 ; the following boundary conditions are imposed on the velocity field for all times t ∈ [0, T ]:  (Ω) ε = 0.5 ε = 0.0166 ε = 0.001 ε = 0.0001 (b) Velocity error as the mesh size decreases.  Table 1. Strength and vertical coordinate of the central vortex for the lid-driven cavity problem The time derivative is discretised using implicit Euler's method with a time step ∆t, and the goal is to compute the steady state of the system. In this example we use an absolute tolerance for the nonlinear iterations of 10 −7 , and we deem the solution to have reached a steady state when the difference between two consecutive solutions u k+1 − u k L 2 (Ω) is less than 10 −6 . Figure 8 shows comparative plots of the magnitude of the symmetric velocity gradient D D D(u) and the stress S S S of the steady state, for two different values of the yield stress, obtained using a discretisation with 4.5 × 10 5 degrees of freedom. The scheme yields a solution that exhibits the expected behaviour, i.e. a plug region appears in the upper half of the cavity, and it increases in size as τ * increases. Table 1 shows, for several values of the yield stress τ * , the maximum of the stream function ψ max , and the vertical coordinate of the center of the vortex that appears in the middle of the cavity. The obtained values are in good agreement with [45].

Expansion-contraction channel
In this section we consider the flow of a Bingham fluid on a channel whose width expands and then contracts; the geometry of the problem is depicted in Figure 9. The problem has already been non-dimensionalised, and in the current setting the Bingham number Bn takes the role of the yield stress; for details about the nondimensionalisation procedure see [31], where this problem was solved employing the augmented Lagrangian method and a finite difference discretisation.
The boundary condition for the velocity at the inflow and outflow boundaries ∂Ω 1 ∪ ∂Ω 2 is chosen to be a fully developed Poiseuille flow (cf. Section 4.1). On the rest of the boundary ∂Ω \ (∂Ω 1 ∪ ∂Ω 2 ) we impose a  Table 2. Length of the dead zone for the expansion-contraction problem withL = 3, δ = 1 5 , h = 6 5 , for several values of the Bingham number.
no-slip condition u = 0. Figure 10 shows the magnitude of D D D(u) (using a logarithmic scale for the color map) for the solution obtained using a mesh with 330819 degrees of freedom, and the valuesL = 4, δ = 1 2 , h = 2. Figure 11 then shows the variation of the pressure in the horizontal direction; the pressure exhibits a linear variation before and after the cavity, as expected from a Poiseuille profile.
As the height h gets closer to 1, the plug in the cavity breaks into two dead zones; Figure 12 shows two different solutions that exhibit this feature. Table 2 shows the (non-dimensionalised) length of the dead zone for several values of the Bingham number. Although the interfaces between yielded and unyielded regions is not as sharp as those obtained with the augmented Lagrangian method in [31], we find that the proposed algorithm is still capable of producing a behaviour consistent with [31], including e.g. the small plug regions that appear as the channel expands or contracts (cf. Figure 12