One Ciphertext, Many Messages: SIMD operations in FHE

FHE

Jul 14, 2025

simd-in-fhe-image

Fully Homomorphic Encryption (FHE) promises a revolutionary capability: performing arbitrary computations on encrypted data without ever decrypting it. This breakthrough enables secure cloud computing where sensitive data remains protected even during processing. However, early FHE schemes faced a critical bottleneck. Consider a simple task like adding two encrypted vectors of length 1000. This would require 1000 separate homomorphic additions, each involving expensive operations on large polynomials. For real-world applications processing massive datasets, this element-by-element approach creates prohibitive computational overhead.

Modern FHE schemes like BGV, BFV and CKKS solve this challenge through an elegant mathematical insight: they can pack multiple plaintext values into a single ciphertext and perform operations on all packed values simultaneously.

Instead of encrypting individual numbers, these schemes:

  • Pack hundreds or thousands of values into "slots" within a single ciphertext
  • Perform homomorphic operations that affect all slots in parallel
  • Support sophisticated data movement operations like rotations and permutations

This Single Instruction, Multiple Data (SIMD) approach transforms FHE from a theoretical curiosity into a practical tool. A single homomorphic multiplication can now process an entire vector at once, delivering speedups of several orders of magnitude.

This post explores the mathematical foundations underlying these SIMD operations. We'll see how abstract concepts from algebra and number theory combine to create a powerful framework for parallel homomorphic computation. While the mathematics is sophisticated, understanding these foundations is crucial for implementing efficient FHE applications and pushing the boundaries of what's possible with encrypted computation.

Mathematical Foundations

This post assumes you're comfortable with the basics of groups and rings, things like the distributive laws and fundamental properties of rings with identity. If you've worked with elementary group theory, you should be in good shape.

Ideals

Definition. Let RR be a ring. A subset IRI \subseteq R is called an ideal when:

  1. (I,+)(I, +) forms a subgroup of (R,+)(R, +)
  2. For any rRr \in R and aIa \in I, both r×ar \times a and a×ra \times r belong to II

That second condition is what makes ideals special, they "absorb" multiplication by any element in the ring. It's like having a mathematical black hole that pulls in anything you multiply with it.

Example. In the integers Z\mathbb{Z}, consider the set nZ={nkkZ}n\mathbb{Z} = \{nk \mid k \in \mathbb{Z}\} of all multiples of nn. This forms an ideal because multiplying any multiple of nn by any integer gives you another multiple of nn.

Principal Ideals

A principal ideal is the simplest kind, one generated by a single element. Given aRa \in R, we can create the ideal

(a)={r×arR}.(a) = \{r \times a \mid r \in R\}.

This captures the smallest ideal that contains aa. It's everything you can get by multiplying aa by elements from the ring.

Building New Rings: Quotients

Given a ring RR and an ideal II, we can construct a brand new ring called the quotient ring R/IR/I.

The elements of R/IR/I are cosets, think of them as equivalence classes a+I={a+iiI}a + I = \{a + i \mid i \in I\} for aRa \in R. We define operations on these cosets by:

  • Addition: (a+I)+(b+I)=(a+b)+I(a + I) + (b + I) = (a + b) + I
  • Multiplication: (a+I)×(b+I)=(a×b)+I(a + I) \times (b + I) = (a \times b) + I

Examples:

  1. Z/nZ\mathbb{Z}/n\mathbb{Z}: This gives us the familiar "clock arithmetic" - integers modulo nn with elements {0,1,2,,n1}\{0, 1, 2, \ldots, n-1\}.

  2. Polynomial quotients: Taking R=F[X]R = \mathbb{F}[X] and I=(f(X))I = (f(X)), we get polynomials modulo f(X)f(X) - essentially polynomial arithmetic where we replace high powers using the relation f(X)=0f(X) = 0.

image

Roots of Unity

In any ring RR, an nn-th root of unity is an element ζ\zeta such that ζn=1\zeta^n = 1. But we're particularly interested in primitive nn-th roots of unity ζk1\zeta^k \neq 1 for all positive integers k<nk < n.

Here's a key insight: ζnk\zeta_n^k is a primitive nn-th root of unity if and only if gcd(k,n)=1\gcd(k,n)=1.

Why this works

Let ζn=e2πi/n\zeta_n = e^{2\pi i / n}. The nn-th roots of unity are the nn distinct powers ζn0,ζn1,,ζnn1\zeta_n^0, \zeta_n^1, \ldots, \zeta_n^{n-1}, so ζn\zeta_n itself is primitive.

The key observation is that ζna=1\zeta_n^a = 1 if and only if aa is divisible by nn. Here's why: write n=aq+rn = aq + r with 0r<a0 \leq r < a. Then ζnr=ζnnaq=1\zeta_n^r = \zeta_n^{n-aq} = 1. But since ζn\zeta_n is primitive, this only happens when r=0r = 0.

Now, if gcd(k,n)=d>1\gcd(k,n) = d > 1, then (ζnk)n/d=ζnkn/d=ζnnk/d=1(\zeta_n^k)^{n/d} = \zeta_n^{k \cdot n/d} = \zeta_n^{n \cdot k/d} = 1, so ζnk\zeta_n^k is not primitive.

Conversely, if gcd(k,n)=1\gcd(k,n) = 1 and (ζnk)r=1(\zeta_n^k)^r = 1, then nn divides krkr. Since nn and kk are coprime, we must have nn divides rr. This means the first positive power of ζnk\zeta_n^k that equals 11 is the nn-th power.

Euler's totient function φ(n)\varphi(n) counts exactly these "good" exponents - the positive integers knk \leq n that are coprime to nn. So there are precisely φ(n)\varphi(n) primitive nn-th roots of unity.

Example. In the complex numbers, the primitive nn-th roots of unity are exactly e2πik/ne^{2\pi i k/n} where gcd(k,n)=1\gcd(k,n) = 1.

Euler's totient function φ(n)\varphi(n) counts the positive integers up to nn that are coprime to nn: φ(n)={1kngcd(k,n)=1}\varphi(n) = |\{1 \leq k \leq n \mid \gcd(k,n)=1\}|

A Concrete Example: 8th Roots of Unity in F32\mathbb{F}_{3^2}

Let's work out a specific case. We'll find the 8th roots of unity in the quadratic extension

F32=F3[X]/(X2+1)\mathbb{F}_{3^2} = \mathbb{F}_3[X]/(X^2+1)
How Many Roots Should We Expect?

In a finite field of size qq, the multiplicative group Fq×\mathbb{F}_q^{\times} has order q1q-1.

  • If nn is coprime to q1q-1, then Fq\mathbb{F}_q contains only the trivial nn-th root of unity: just 11.
  • If nn divides q1q-1, the field contains exactly nn distinct nn-th roots of unity, forming the unique subgroup of that order.

Here, q=9q = 9, so q1=8q-1 = 8. Since 88 divides 88, we expect all eight 8th roots of unity to live inside F32×\mathbb{F}_{3^2}^{\times}.

Finding All Eight Roots

We write elements of F32\mathbb{F}_{3^2} in the form a+bxa+bx with a,b{0,1,2}a,b \in \{0,1,2\}, where x2=12(mod3)x^2 = -1 \equiv 2 \pmod{3}.

Using Sage (or by hand if you're feeling ambitious):

R.<t> = PolynomialRing(GF(3))          
F.<x> = GF(3^2, modulus = t^2 + 1) 

roots8 = [z for z in F if z^8 == 1]     
print(roots8)

This gives us:

1,x+1,2x,2x+1,2,2x+2,x,x+21, x + 1, 2x, 2x + 1, 2, 2x + 2, x, x + 2

All eight solutions to z8=1z^8 = 1 in F32\mathbb{F}_{3^2}.

Which Ones Are Primitive?

An element is primitive when its order is exactly 88. Euler's totient tells us φ(8)=4\varphi(8) = 4, so exactly four of our eight roots should be primitive.

g = F.multiplicative_generator() 
primitive = [g^k for k in (1,3,5,7)]   
print(primitive)

This reveals:

{x+1,2x+1,2x+2,x+2}\{x+1, 2x+1, 2x+2, x+2\}

These are our four primitive 8th roots. The remaining roots have orders 1, 2, or 4.

image

Cyclotomic Polynomials: Collecting the Primitives

The nn-th cyclotomic polynomial Φn(X)\Phi_n(X) is designed to capture exactly the primitive nn-th roots of unity. Over the rationals:

Φn(X)=1kngcd(k,n)=1(Xζnk)\Phi_n(X) = \prod_{\substack{1 \leq k \leq n\\\gcd(k,n)=1}} (X - \zeta_n^k)

This polynomial has degree φ(n)\varphi(n) - one factor for each primitive root.

Let's construct Φ8(X)\Phi_8(X)

By definition:

Φ8(X)=1k8gcd(k,8)=1(Xζ8k)=X4+1\Phi_8(X) = \prod_{\substack{1 \leq k \leq 8\\\gcd(k,8)=1}} (X-\zeta_8^k) = X^4+1

This gives us a monic quartic (degree φ(8)=4\varphi(8) = 4) that's irreducible over Q\mathbb{Q}.

Homomorphisms: Structure-Preserving Maps

Group Homomorphisms

A group homomorphism between groups (G,)(G, \cdot) and (H,)(H, *) is a function ϕ:GH\phi: G \to H that respects the group operation:

ϕ(ab)=ϕ(a)ϕ(b)\phi(a \cdot b) = \phi(a) * \phi(b)

Two important subsets come with every homomorphism:

  • The kernel: ker(ϕ)={gGϕ(g)=eH}\ker(\phi) = \{g \in G \mid \phi(g) = e_H\} (elements that map to the identity)
  • The image: im(ϕ)={ϕ(g)gG}\text{im}(\phi) = \{\phi(g) \mid g \in G\} (all possible outputs)

Both form subgroups of their respective groups.

Ring Homomorphisms

Similarly, a ring homomorphism between rings (R,+,×)(R, +, \times) and (S,,)(S, \oplus, \odot) preserves both operations:

  • ψ(a+b)=ψ(a)ψ(b)\psi(a + b) = \psi(a) \oplus \psi(b)
  • ψ(a×b)=ψ(a)ψ(b)\psi(a \times b) = \psi(a) \odot \psi(b)
  • ψ(1R)=1S\psi(1_R) = 1_S (preserves multiplicative identity)

Isomorphisms: When Structures Are "The Same"

When a homomorphism is bijective, we call it an isomorphism. Isomorphic structures are essentially identical from an algebraic perspective - they have the same "shape."

An automorphism is an isomorphism from a structure to itself. The collection of all automorphisms of a group GG forms a group under composition, denoted Aut(G)\text{Aut}(G).

The Chinese Remainder Theorem: A Powerful Decomposition

Let RR be a commutative ring with unity, and let I1,I2,,InI_1, I_2, \ldots, I_n be comaximal ideals (meaning Ii+Ij=RI_i + I_j = R whenever iji \neq j). Define I=I1I2InI = I_1 \cap I_2 \cap \cdots \cap I_n. Then we have an isomorphism:

R/I(R/I1)×(R/I2)××(R/In)R/I \cong (R/I_1) \times (R/I_2) \times \cdots \times (R/I_n)
Why this works

We'll prove this by induction on the number of ideals.

Base Case (n=2n = 2): Let I,JI, J be comaximal ideals with I+J=RI + J = R. We want to show R/(IJ)(R/I)×(R/J)R/(I \cap J) \cong (R/I) \times (R/J).

Consider the natural map:

φ:R(R/I)×(R/J),x(x+I,x+J)\varphi: R \to (R/I) \times (R/J), \quad x \mapsto (x+I, x+J)

By the First Isomorphism Theorem, it suffices to show φ\varphi is surjective with kernel IJI \cap J.

First Isomorphism Theorem for Rings: If φ ⁣:R    S\varphi\colon R \;\longrightarrow\; S is a surjective ring homomorphism with kernel ker(φ)\ker(\varphi), then φ\varphi induces a natural ring isomorphism

φ:R/ker(φ)        S,r+ker(φ)  φ(r).\varphi: R/\ker(\varphi) \;\xrightarrow{\;\sim\;}\; S, \qquad r + \ker(\varphi)\mapsto\;\varphi(r).

Kernel verification: We have

xkerφ(x+I,x+J)=(0+I,0+J)xI and xJxIJx \in \ker\varphi \Leftrightarrow (x+I, x+J) = (0+I, 0+J) \Leftrightarrow x \in I \text{ and } x \in J \Leftrightarrow x \in I \cap J

Surjectivity: Take any (x1+I,x2+J)(R/I)×(R/J)(x_1+I, x_2+J) \in (R/I) \times (R/J). Since I+J=RI + J = R, we can find y1Iy_1 \in I and y2Jy_2 \in J with y1+y2=1y_1 + y_2 = 1.

Define x=x1+y1(x2x1)=x2y2(x2x1)x = x_1 + y_1(x_2 - x_1) = x_2 - y_2(x_2 - x_1).

Then xx1(modI)x \equiv x_1 \pmod{I} and xx2(modJ)x \equiv x_2 \pmod{J}, so φ(x)=(x1+I,x2+J)\varphi(x) = (x_1+I, x_2+J).

Inductive Step: Suppose the theorem holds for nn ideals. For n+1n+1 comaximal ideals I1,,In+1I_1, \ldots, I_{n+1}, observe that I1,,In1,InIn+1I_1, \ldots, I_{n-1}, I_n \cap I_{n+1} are pairwise comaximal.

By the inductive hypothesis:

R/I(R/I1)××(R/In1)×(R/(InIn+1))R/I \cong (R/I_1) \times \cdots \times (R/I_{n-1}) \times (R/(I_n \cap I_{n+1}))

Applying the base case to InI_n and In+1I_{n+1}:

R/(InIn+1)R/In×R/In+1R/(I_n \cap I_{n+1}) \cong R/I_n \times R/I_{n+1}

Combining these gives the desired result.

SIMD Operations in Fully Homomorphic Encryption

Now we get to the exciting part - how all this abstract algebra enables powerful parallel computation in encrypted data.

Where Our Polynomials Live

We start with a quotient ring A=Z[X]/(Φm(X))A = \mathbb{Z}[X]/(\Phi_m(X)), where Φm(X)\Phi_m(X) is the mm-th cyclotomic polynomial. This is our fundamental workspace.

In homomorphic encryption, we don't encrypt integers directly. Instead, we work over a prime field Fp=Z/pZ\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z} that serves as our plaintext space. Reducing all coefficients modulo pp gives us the plaintext ring:

Ap=Fp[X]/(Φm(X))A_p = \mathbb{F}_p[X]/(\overline{\Phi_m}(X))

The overline just reminds us we've done the modular reduction.

The Magic of Cyclotomic Factorization

Here's where number theory delivers something beautiful. As long as pp doesn't divide mm, the cyclotomic polynomial Φm(X)\Phi_m(X) completely factors over Fp\mathbb{F}_p into irreducible pieces of equal degree:

Φm(X)=F1(X)F2(X)Fn(X)modp\overline{\Phi_m}(X) = F_1(X) F_2(X) \cdots F_n(X) \bmod p

Each FiF_i is irreducible with degree d=ordm(p)d = \text{ord}_m(p) - the multiplicative order of pp modulo mm.

Let's pick any irreducible factor, say F1(X)F_1(X), and define:

E=Zp[X]/(F1(X)),η=[XmodF1(X)]EE = \mathbb{Z}_p[X]/(F_1(X)), \quad \eta = [X \bmod F_1(X)] \in E

Since F1(X)F_1(X) is irreducible, EE becomes a field with pdp^d elements. Every element in EE can be written as f(η)f(\eta) for some polynomial f(X)f(X). Notice that η\eta is a root of F1(X)F_1(X), making it a primitive mm-th root of unity.

Why the coset η=[X]\eta=[X] is instantly a root of F1F_1: Forming the quotient ring
E=Zp[X]/(F1(X))E = \mathbb{Z}_p[X]/(F_1(X))
is nothing more than declaring inside the ring that the polynomial F1(X)F_1(X) now equals zero.

  • The projection π:Zp[X]E\pi:\mathbb{Z}_p[X]\to E sends every polynomial to its coset. Write
    η:=π(X)=X+(F1(X)).\eta := \pi(X) = X + (F_1(X)).
    This η\eta is the “image of XX’’.
  • Because F1(X)F_1(X) is in the ideal we factor by, its coset is 00: π(F1(X))=0.\pi\bigl(F_1(X)\bigr)=0.
    But π\pi is a homomorphism, so π(F1(X))=F1(π(X))=F1(η)\pi\bigl(F_1(X)\bigr)=F_1\bigl(\pi(X)\bigr)=F_1(\eta). Hence F1(η)=0F_1(\eta)=0 in EE. By construction, η\eta behaves exactly like a root of F1F_1.

The polynomial Φm(X)\overline{\Phi_m}(X) has φ(m)\varphi(m) distinct roots in EE, specifically ηj\eta^j for each jj in the unit group Zm\mathbb{Z}_m^*. These roots distribute evenly among the irreducible factors, with each factor getting exactly dd roots.

To understand the distribution, consider the subgroup:

H=pˉZm,pˉ:=[pmodm]H = \langle \bar{p} \rangle \subset \mathbb{Z}_m^*, \quad \bar{p} := [p \bmod m]

This subgroup has order dd and contains 1,pˉ,pˉ2,,pˉd11, \bar{p}, \bar{p}^2, \ldots, \bar{p}^{d-1}.

When we form the quotient group Zm/H\mathbb{Z}_m^*/H, we get n=φ(m)/dn = \varphi(m)/d distinct cosets:

kH={kh:hH}ZmkH = \{k \cdot h : h \in H\} \subset \mathbb{Z}_m^*

Choosing representatives k1,k2,,knk_1, k_2, \ldots, k_n from each coset, we can arrange things so each Fi(X)F_i(X) has exactly the dd roots ηk\eta^k where kkiHk \in k_i H.

The Chinese Remainder Breakthrough

Now comes the payoff. We can establish isomorphisms for each factor:

Zp[X]/(Fi(X))E,[f(X)modFi(X)]f(ηki)\mathbb{Z}_p[X]/(F_i(X)) \to E, \quad [f(X) \bmod F_i(X)] \mapsto f(\eta^{k_i})

Combining this with the fact that

ApZp[X]/ ⁣(F1(X))  ×    ×  Zp[X]/ ⁣(Fn(X))f(x)([f(X)modF1(X)],  ,  [f(X)modFn(X)]),\begin{array}{rcl} A_{p} & \longrightarrow & \displaystyle \Bbb Z_{p}[X]\bigl/\!\bigl(F_{1}(X)\bigr) \;\times\; \cdots \;\times\; \Bbb Z_{p}[X]\bigl/\!\bigl(F_{n}(X)\bigr) \\[6pt] f(x) & \longmapsto & \bigl(\, [\,f(X)\bmod F_{1}(X)\,],\; \ldots,\; [\,f(X)\bmod F_{n}(X)\,] \bigr), \end{array}

gives us the crucial isomorphism:

ApEn,f(x)(f(ηk1),,f(ηkn))A_p \to E^n, \quad f(x) \mapsto (f(\eta^{k_1}), \ldots, f(\eta^{k_n}))

This is the key insight: we can perform component-wise operations on vectors in EnE^n by doing ring operations in ApA_p!

If we have:

aAp(α1,,αn)EnbAp(β1,,βn)En\begin{gathered} a \in A_p \leftrightarrow (\alpha_1, \ldots, \alpha_n) \in E^n \\ b \in A_p \leftrightarrow (\beta_1, \ldots, \beta_n) \in E^n \end{gathered}

Then:

a+b(α1+β1,,αn+βn)ab(α1β1,,αnβn)\begin{gathered} a + b \leftrightarrow (\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n) \\ a \cdot b \leftrightarrow (\alpha_1\beta_1, \ldots, \alpha_n\beta_n) \end{gathered}

Converting between the coefficient representation in ApA_p and the slot representation in EnE^n is computationally straightforward using the Number Theoretic Transform (NTT).

image

Cyclotomic Polynomial Example: NTT Implementation

Let's work through a concrete example with specific parameters:

  • m=8m = 8, φ(m)=4\varphi(m) = 4, p=3p = 3, d=2d = 2 (since 321mod83^2 \equiv 1 \bmod 8), n=φ(m)/d=2n = \varphi(m)/d = 2
Cyclotomic Polynomial Factorization

The 8th cyclotomic polynomial is Φ8(X)=X4+1\Phi_8(X) = X^4 + 1. Over F3[X]\mathbb{F}_3[X], this splits as:

R.<X> = PolynomialRing(GF(3))
factor = (X^4 + 1).factor()
print(factor)      
X4+1=(X2+X+2)(X2+2X+2)in F3[X]X^4 + 1 = (X^2 + X + 2)(X^2 + 2X + 2) \quad \text{in } \mathbb{F}_3[X]

The four primitive 8th roots of unity are:

{α+1,2α+1,2α+2,α+2}E\{\alpha+1, 2\alpha+1, 2\alpha+2, \alpha+2\} \subset E

Each irreducible quadratic "owns" exactly two of them:

PolynomialRoots in EE
X2+X+2X^2 + X + 2{2α+1,α+1}\{2\alpha+1, \alpha+1\}
X2+2X+2X^2 + 2X + 2{2α+2,α+2}\{2\alpha+2, \alpha+2\}
Cosets and the Chinese Remainder Theorem Isomorphism

The coset structure is determined by:

  • Subgroup: H=3={1,3}Z8H = \langle 3 \rangle = \{1, 3\} \subset \mathbb{Z}_8^*
  • Quotient: Z8/H={{1,3},{5,7}}\mathbb{Z}_8^*/H = \{\{1,3\}, \{5,7\}\}
  • Representatives: k1=1k_1 = 1, k2=5k_2 = 5

This gives us the isomorphism:

A3=F3[X]/(X4+1)E2A_3 = \mathbb{F}_3[X]/(X^4+1) \xrightarrow{\sim} E^2 f(x)(f(ηk1),f(ηk2))=(f(α+1),f(2α+2))f(x) \mapsto (f(\eta^{k_1}), f(\eta^{k_2})) = (f(\alpha+1), f(2\alpha+2))
Inverse Transform: Slot Values to Polynomial

We want to encode the following slot values:

f(η)=2,f ⁣(η5)=1,f(\eta)=2,\qquad f\!\bigl(\eta^5\bigr)=1,

where η=α+1\eta = \alpha+1.

Polynomial Representation

Any polynomial in A3A_3 has a unique representative:

f(X)=a0+a1X+a2X2+a3X3,aiF3.f(X)=a_0+a_1X+a_2X^{2}+a_3X^{3},\qquad a_i\in\mathbf F_{3}.

This means we need 4 evaluations of our function to interpolate it using NTT. Because the Frobenius automorphism xx3x \mapsto x^{3} fixes each F3\Bbb F_{3} component, we have:

f(η)=2=f(η3),f(η5)=1=f(η7)f(\eta)= 2 = f(\eta^{3}),\qquad f(\eta^{5})= 1 = f(\eta^{7})
Setting Up the Inverse NTT

Indeed
ω:=η2=2x\omega:=\eta^{2}=2x, ω2=(2x)2=4x2=x2=2\omega^{2}=(2x)^{2}=4x^{2}=x^{2}=2,
ω4=22=1\omega^{4}=2^{2}=1,
and ω1\omega\neq1, ω21\omega^{2}\neq1, so ω\omega is a primitive 44-th root of unity.

We need ff evaluated at the odd powers of η\eta:

{η1,η3,η5,η7}={ηω0,ηω1,ηω2,ηω3}.\{\,\eta^{1},\eta^{3},\eta^{5},\eta^{7}\} =\{\,\eta\,\omega^{0},\,\eta\,\omega^{1},\,\eta\,\omega^{2},\,\eta\,\omega^{3}\}.

Factor out wjw^{j} and absorb it into the coefficients:

f ⁣(w2k+1)=j=03aj(w2k+1)j=j=03ajwj(ωk)j  =  j=03bjωjk,\begin{aligned} f\!\bigl(w^{2k+1}\bigr) &=\sum_{j=0}^{3} a_{j}\bigl(w^{2k+1}\bigr)^{j}\\[4pt] &=\sum_{j=0}^{3} a_{j}\,w^{j}\,(\omega^{k})^{j} \;=\; \sum_{j=0}^{3} b_{j}\,\omega^{jk}, \end{aligned}

Thus

  • Twistb=Dinab=D_{\text{in}}\,a where Din=diag(1,w,w2,w3)D_{\text{in}}=\operatorname{diag}(1,w,w^{2},w^{3}).
  • NTTy=Fby=F\,b with Fk,j=ωjkF_{k,j}=\omega^{jk}.
Untwisting after the inverse NTT

Inverse NTT returns b=F1yb=F^{-1}y. Remove the twist:

aj=bjwj=bjw8j\,a_{j}=b_{j}\,w^{-j}=b_{j}\,w^{8-j}\,

All coefficients are now back in F3\mathbb{F}_{3}.

Thus:

f(X)=a0+a1X+a2X2+a3X3=X3+X,f(X)=a_{0}+a_{1}X+a_{2}X^{2}+a_{3}X^{3}=X^{3}+X,
Sanity Check

We can verify that the polynomial obtained is correct by checking the slot values:

  • Slot 0: f(X)mod(X2+X+2)=2f(X) \mod (X^2 + X + 2) = 2
  • Slot 1: f(X)mod(X2+2X+2)=1f(X) \mod (X^2 + 2X + 2)= 1

Everything matches the given slot values, confirming that the reconstruction is correct.

Adding the second polynomial

Let's do the same transformation for the case

f(η)=x+2,f ⁣(η5)=2x,f(\eta)=x+2,\qquad f\!\bigl(\eta^5\bigr)=2x,

First we find the missing evaluations

f(η3)=f(η)3=2+2x,f(η7)=f(η5)3=x f(\eta^{3}) = f(\eta)^3 = 2 + 2x,\qquad f(\eta^{7}) = f(\eta^5)^3 = x

Then going through the same inverse NTT process we get

f(X)=X+1 f(X)=X + 1
Homomorphic addition

We have

X3+X(2,1),X+1(x+2,2x) X^3 + X \mapsto (2, 1), \qquad X + 1 \mapsto (x+2, 2x)

Let’s check that homomorphic addition is well-defined:

X3+2X+1(x+1,2x+1) X^3 + 2X + 1 \mapsto (x+1, 2x+1)

Finding the missing evaluations:

f(η3)=f(η)3=1+2x,f(η7)=f(η5)3=1+x f(\eta^{3}) = f(\eta)^3 = 1 + 2x,\qquad f(\eta^{7}) = f(\eta^5)^3 = 1 + x

and running inverse NTT to reconstruct the polynomial

f(X)=X3+2X+1 f(X)=X^3 + 2X + 1
F3  = GF(3)
F9.<x> = GF(9, modulus = x^2 + 1)   
R.<X> = PolynomialRing(F3)

slot0, slot1 = (2, 1)
           
w   = x + 1                            
omega = w^2                            
omega_inv = omega.inverse()            

y0, y2 = F9(slot0), F9(slot1)
y1, y3 = y0^3, y2^3
y      = [y0, y1, y2, y3]              

b = []
for j in range(4):
    s = F9.zero()
    for k in range(4):
        s += y[k] * (omega_inv)^(j*k)  
    b.append(s)
    
a = [F3(b[j] * w^(8 - j))  for j in range(4)]   


f = (a[0] + a[1]*X + a[2]*X^2 + a[3]*X^3) % (X^4 + 1)

print(f)

Galois Automorphisms: The Key to Data Movement

To understand how to move data between slots, we return to our ring A=Z[X]/(Φm(X))A = \mathbb{Z}[X]/(\Phi_m(X)), with xx representing the image of XX in AA.

For each jZmj \in \mathbb{Z}_m^*, we can define:

θj:AA,θj(f(x))=f(xj)\theta_j : A \to A, \quad \theta_j(f(x)) = f(x^j)

This is well-defined because if gcd(j,m)=1\gcd(j,m) = 1, then whenever ω\omega is a primitive mm-th root of unity, so is ωj\omega^j. This means that if Φm(ω)=0\Phi_m(\omega) = 0, then Φm(ωj)=0\Phi_m(\omega^j) = 0 as well, giving us:

Φm(X)Φm(Xj)in Z[X]\Phi_m(X) \mid \Phi_m(X^j) \quad \text{in } \mathbb{Z}[X]

These maps have a natural group structure: since (xj)k=xjk(x^j)^k = x^{jk} in AA, we have:

θjθk=θjk\theta_j \circ \theta_k = \theta_{jk}

This gives us an injective group homomorphism:

ZmAut(A),jθj\mathbb{Z}_m^* \hookrightarrow \text{Aut}(A), \quad j \mapsto \theta_j

These θj\theta_j maps are precisely the Galois automorphisms of our cyclotomic extension.

The Frobenius Map: A Special Automorphism

The Frobenius automorphism deserves special attention:

σ:EE,f(η)f(ηp)\sigma : E \to E, \quad f(\eta) \mapsto f(\eta^p)

For every αE\alpha \in E, we have σ(α)=αp\sigma(\alpha) = \alpha^p. Importantly:

αZpσ(α)=α\alpha \in \mathbb{Z}_p \Leftrightarrow \sigma(\alpha) = \alpha

Under our correspondence between ApA_p and EnE^n, if we let pˉ=[pmodm]Zm\bar{p} = [p \bmod m] \in \mathbb{Z}_m^* and apply θpˉ\theta_{\bar{p}}:

θpˉ(f(x))(σ(f(ηk1)),,σ(f(ηkn)))En\theta_{\bar{p}}(f(x)) \to(\sigma(f(\eta^{k_1})), \ldots, \sigma(f(\eta^{k_n}))) \in E^n

This means θpˉ\theta_{\bar{p}} acts slot-wise as the Frobenius map on EE - crucial for understanding rotations.

One-Dimensional Rotations: Making Data Move

Consider an element gZmg \in \mathbb{Z}_m^* such that 1,g,g2,,gn11, g, g^2, \ldots, g^{n-1} forms a complete set of coset representatives for HH in Zm\mathbb{Z}_m^*. This means gng^n must lie in HH.

In the ideal case where gn=1g^n = 1, using representatives 1,g,,gn11, g, \ldots, g^{n-1}, our slot isomorphism becomes:

f(x)Ap(f(η1),f(ηg),,f(ηgn1))Enf(x) \in A_p \leftrightarrow (f(\eta^1), f(\eta^g), \ldots, f(\eta^{g^{n-1}})) \in E^n

When we apply θg\theta_g:

θg(f(x))(f(ηg),f(ηg2),,f(ηgn1),f(η1))\theta_g(f(x)) \leftrightarrow (f(\eta^g), f(\eta^{g^2}), \ldots, f(\eta^{g^{n-1}}), f(\eta^1))

Since the last entry wraps around to f(η1)f(\eta^1), the automorphism θg\theta_g rotates the slots one position to the left! More generally, θge\theta_{g^e} rotates left by ee positions, while θge\theta_{g^{-e}} rotates right by ee positions.

When gn1g^n \neq 1, things get trickier. If gn=pˉsg^n = \bar{p}^s for some s{1,,d1}s \in \{1, \ldots, d-1\}:

θg(f(x))(f(ηg),f(ηg2),,f(ηgn1),σs(f(η1)))\theta_g(f(x)) \leftrightarrow (f(\eta^g), f(\eta^{g^2}), \ldots, f(\eta^{g^{n-1}}), \sigma^s(f(\eta^1)))

This gives a rotation, but the last slot gets perturbed by σs\sigma^s. This isn't a clean rotation unless the first slot contains a value in Zp\mathbb{Z}_p, where σs\sigma^s acts trivially.

image

Clean Rotations Through Masking

Even when slots contain values outside Zp\mathbb{Z}_p, we can achieve perfect rotations using a clever masking technique.

For a rotation by e{1,,n1}e \in \{1, \ldots, n-1\} positions, we create masking elements:

MeAp(1,,1ne,0,,0e)EnM_e \in A_p \leftrightarrow (\underbrace{1,\ldots,1}_{n-e}, \underbrace{0,\ldots,0}_{e}) \in E^n 1MeAp(0,,0ne,1,,1e)En1-M_e \in A_p \leftrightarrow (\underbrace{0,\ldots,0}_{n-e}, \underbrace{1,\ldots,1}_{e}) \in E^n

For an element aAp(α0,,αn1)Ena \in A_p \leftrightarrow (\alpha_0, \ldots, \alpha_{n-1}) \in E^n:

  • Mask-then-rotate left by ee slots:
Meθge(a)(αe,,αn1,0,,0e)M_e \cdot \theta_{g^e}(a) \leftrightarrow (\alpha_e, \ldots, \alpha_{n-1}, \underbrace{0,\ldots,0}_{e})
  • Mask-then-rotate right by ee slots:
(1Me)θgen(a)(0,,0ne,α0,,αe1)(1-M_e) \cdot \theta_{g^{e-n}}(a) \leftrightarrow (\underbrace{0,\ldots,0}_{n-e}, \alpha_0, \ldots, \alpha_{e-1})

Combining these gives a perfect left rotation:

Meθge(a)+(1Me)θgen(a)M_e \cdot \theta_{g^e}(a) + (1-M_e) \cdot \theta_{g^{e-n}}(a)

This produces an element whose slots are exactly those of aa rotated ee places to the left, regardless of whether slot values lie outside Zp\mathbb{Z}_p.

Multi-Dimensional Operations: The Hypercube Approach

For more complex data arrangements, we can organize our nn slots as a multi-dimensional hypercube. Let n=n1n2nn = n_1 n_2 \cdots n_\ell be our slot count factored into \ell positive integers.

We choose generators g1,,g    Zmg_1,\dots,g_\ell\;\in\;\mathbb{Z}_m^{*} and arrange the coset representatives as:

g1e1g2e2ge,ei[ni]:={0,,ni1}g_1^{\,e_1}\,g_2^{\,e_2}\cdots g_\ell^{\,e_\ell}, \qquad e_i\in[n_i]:=\{0,\dots,n_i-1\}

This creates an \ell-dimensional hypercube with side-lengths (n1,,n)(n_1,\dots,n_\ell), where every slot has coordinates (e1,,e)(e_1,\dots,e_\ell).

Axis-wise Rotations Made Simple

For each dimension ii, the Galois automorphism θgi\theta_{g_i} shifts every hyper-column (e1,,ei1,,ei+1,,e)(e_1,\dots,e_{i-1},*,e_{i+1},\dots,e_\ell) forward by one step in the ii-th coordinate. Applying θgie\theta_{g_i^{\,e}} shifts by ee positions, while θgie\theta_{g_i^{-e}} shifts backwards.

Multi-Dimensional Masking

For true rotations without Frobenius disturbance, we use masking elements Me(i)M^{(i)}_{e} that place ones in the first nien_i-e slots of each hyper-column and zeros elsewhere.

The formula for rotating left by ee positions along the ii-th axis is:

Me(i)θgie(a)  +  (1Me(i))θgieni(a)M^{(i)}_{e}\,\theta_{g_i^{\,e}}(a)\;+\;(1-M^{(i)}_{e})\,\theta_{g_i^{\,e-n_i}}(a)

image

SIMD Operations in BGV with Lattigo

This example demonstrates slot-wise homomorphic addition using the Lattigo library’s BGV API. With BGV’s powerful SIMD capabilities, we can pack an entire vector of integers into a single ciphertext, perform encrypted computations on each vector entry in parallel, and then recover the results after decryption.

package main

import (
    "fmt"
    "log"
    "math/rand"
    "time"

    "github.com/tuneinsight/lattigo/v6/examples"
    "github.com/tuneinsight/lattigo/v6/schemes/bgv"
)

func main() {
    paramDef := examples.BGVParamsN12QP109
    params, err := bgv.NewParametersFromLiteral(paramDef)
    if err != nil {
        log.Fatalf("params error: %v", err)
    }

    kgen := bgv.NewKeyGenerator(params)
    sk := kgen.GenSecretKeyNew()
    pk := kgen.GenPublicKeyNew(sk)

    encoder := bgv.NewEncoder(params)
    encryptor := bgv.NewEncryptor(params, pk)
    decryptor := bgv.NewDecryptor(params, sk)
    evaluator := bgv.NewEvaluator(params, nil)

    T := params.PlaintextModulus()
    slots := params.MaxSlots()
    rand.Seed(time.Now().UnixNano())

    a := make([]uint64, slots)
    b := make([]uint64, slots)
    expected := make([]uint64, slots)
    for i := 0; i < slots; i++ {
        a[i] = uint64(rand.Int63n(int64(T)))
        b[i] = uint64(rand.Int63n(int64(T)))
        expected[i] = (a[i] + b[i]) % T
    }

    ptA := bgv.NewPlaintext(params, params.MaxLevel())
    ptB := bgv.NewPlaintext(params, params.MaxLevel())
    if err := encoder.Encode(a, ptA); err != nil {
	    log.Fatal(err)
    }
    if err := encoder.Encode(b, ptB); err != nil {
	    log.Fatal(err)
    }
    ctA, err := encryptor.EncryptNew(ptA)
    if err != nil { log.Fatal(err) }
    ctB, err := encryptor.EncryptNew(ptB)
    if err != nil { log.Fatal(err) }

    ctC, err := evaluator.AddNew(ctA, ctB)
    if err != nil { log.Fatal(err) }

    ptRes := decryptor.DecryptNew(ctC)
    res := make([]uint64, slots)
    if err := encoder.Decode(ptRes, res); err != nil {
	    log.Fatal(err)
    }

    fmt.Printf("i\t a\t b\t (a+b mod T)\t result\t match\n")
    for i := 0; i < slots; i++ {
        fmt.Printf("%d:\t%d\t%d\t%d\t\t%d\t%v\n",
            i, a[i], b[i], expected[i], res[i], res[i] == expected[i])
    }
}

Conclusion

The mathematical foundations explored in this post reveal how abstract algebra transforms fully homomorphic encryption from a theoretical curiosity into a practical tool for secure computation. The key insights we've covered include:

Algebraic Structure: The plaintext ring Ap=Fp[X]/(Φm(X))A_p = \mathbb{F}_p[X]/(\overline{\Phi_m}(X)) naturally decomposes into a product of extension fields EnE^n through cyclotomic factorization. This decomposition is what makes slot-based packing possible.

Galois Automorphisms: The maps θj:f(x)f(xj)\theta_j: f(x) \mapsto f(x^j) provide the mechanism for data movement between slots. These automorphisms preserve the ring structure while permuting slot contents according to the multiplicative group Zm\mathbb{Z}_m^*.

Rotation Techniques: Clean rotations require careful handling of the Frobenius map through masking operations. The hypercube organization extends this to multi-dimensional data arrangements, enabling sophisticated data movement patterns.

Practical Impact: These mathematical tools deliver speedups of several orders of magnitude compared to element-wise processing. A single homomorphic multiplication can now process an entire vector, making FHE viable for real-world applications in secure cloud computing, privacy-preserving machine learning, and confidential data analytics.

Acknowledgements

This post builds upon decades of research in both algebraic number theory and cryptography. Particular acknowledgment goes to Nigel Smart and Frederik Vercauteren for their foundational paper "Fully Homomorphic SIMD Operations", which established the theoretical framework for slot-based parallel computation in FHE schemes. Their work demonstrated how cyclotomic polynomial arithmetic could be leveraged to achieve true vectorization in homomorphic encryption.

The practical implementation insights draw heavily from Shai Halevi and Victor Shoup's comprehensive work "Design and implementation of HElib: a homomorphic encryption library", which translated these theoretical advances into efficient, usable software.