<p><strong><a href="https://mortendahl.github.io/2018/03/01/secure-computation-as-dataflow-programs">Secure Computations as Dataflow Programs</a></strong> (Cryptography and Machine Learning, 2018-03-01)</p>
<p><em><strong>TL;DR:</strong> using TensorFlow as a distributed computation framework for dataflow programs we give a full implementation of the SPDZ protocol with networking, in turn enabling optimised machine learning on encrypted data.</em></p>
<p>Unlike <a href="/2017/09/03/the-spdz-protocol-part1/">earlier</a> posts, where we focused on the concepts behind secure computation as well as <a href="/2017/09/19/private-image-analysis-with-mpc/">potential applications</a>, here we build a fully working (passively secure) implementation with players running on different machines and communicating via typical network stacks. As part of this we investigate the benefits of using a <a href="https://en.wikipedia.org/wiki/Dataflow_programming">modern distributed computation</a> platform when experimenting with secure computations, as opposed to building everything from scratch.</p>
<p>This can also be seen as a step in the direction of getting private machine learning into the hands of practitioners, where integration with existing and popular tools such as <a href="https://www.tensorflow.org/">TensorFlow</a> plays an important part. Concretely, while we here only do a relatively shallow integration that doesn’t make use of some of the powerful tools that come with TensorFlow (e.g. <a href="https://www.tensorflow.org/api_docs/python/tf/gradients">automatic differentiation</a>), we do show how basic technical obstacles can be overcome, potentially paving the way for deeper integrations.</p>
<p>Jumping ahead, it is clear in retrospect that TensorFlow is an obvious candidate framework for quickly experimenting with secure computation protocols, at the very least in the context of private machine learning.</p>
<p><a href="https://github.com/mortendahl/privateml/tree/master/tensorflow/spdz/">All code</a> is available to play with, either locally or on the <a href="https://cloud.google.com/compute/">Google Cloud</a>. To keep it simple our running example throughout is private prediction using <a href="https://beckernick.github.io/logistic-regression-from-scratch/">logistic</a> <a href="https://github.com/ageron/handson-ml/blob/master/04_training_linear_models.ipynb">regression</a>, meaning that given a private (i.e. encrypted) input <code class="highlighter-rouge">x</code> we wish to securely compute <code class="highlighter-rouge">sigmoid(dot(w, x) + b)</code> for private but pre-trained weights <code class="highlighter-rouge">w</code> and bias <code class="highlighter-rouge">b</code> (private training of <code class="highlighter-rouge">w</code> and <code class="highlighter-rouge">b</code> is considered in a follow-up post). <a href="#experiments">Experiments</a> show that for a model with 100 features this can be done in TensorFlow with a latency as low as 60ms and at a rate of up to 20,000 predictions per second.</p>
<p><em>A big thank you goes out to <a href="https://twitter.com/iamtrask">Andrew Trask</a>, <a href="https://twitter.com/korymath">Kory Mathewson</a>, <a href="https://twitter.com/janleike">Jan Leike</a>, and the <a href="https://twitter.com/openminedorg">OpenMined community</a> for inspiration and interesting discussions on this topic!</em></p>
<p><em><strong>Disclaimer</strong>: this implementation is meant for experimentation only and may not provide the required security. In particular, TensorFlow does not currently seem to have been designed with this application in mind and, although it does not appear to be the case right now, future versions may for instance perform optimisations behind the scenes that break the intended security properties. <a href="#thoughts">More notes below</a>.</em></p>
<h1 id="motivation">Motivation</h1>
<p>As hinted above, implementing secure computation protocols such as <a href="/2017/09/03/the-spdz-protocol-part1/">SPDZ</a> is a non-trivial task due to their distributed nature, which is only made worse when we start to introduce various optimisations (<a href="https://github.com/rdragos/awesome-mpc">but</a> <a href="https://github.com/bristolcrypto/SPDZ-2">it</a> <a href="https://github.com/aicis/fresco">can</a> <a href="http://oblivc.org/">be</a> <a href="https://github.com/encryptogroup/ABY">done</a>). For instance, one has to consider how to best orchestrate the simultaneous execution of multiple programs, how to minimise the overhead of sending data across the network, and how to efficiently interleave communication with computation so that one server only rarely waits on the other. On top of that, we might also want to support different hardware platforms, including for instance both CPUs and GPUs, and for any serious work it is highly valuable to have tools for visual inspection, debugging, and profiling in order to identify issues and bottlenecks.</p>
<p>It should furthermore also be easy to experiment with various optimisations, such as transforming the computation for improved performance, <a href="/2017/09/19/private-image-analysis-with-mpc/#generalised-triples">reusing intermediate results and masked values</a>, and supplying fresh “raw material” in the form of <a href="/2017/09/03/the-spdz-protocol-part1/#multiplication">triples</a> during the execution instead of only generating a large batch ahead of time in an offline phase. Getting all this right can be overwhelming, which is one reason earlier blog posts here focused on the principles behind secure computation protocols and simply did everything locally.</p>
<p>Luckily though, modern distributed computation frameworks such as <a href="https://www.tensorflow.org/">TensorFlow</a> are receiving a lot of research and engineering attention these days due to their use in advanced machine learning on large data sets. And since our focus is on private machine learning there is a large and natural overlap. In particular, the secure operations we are interested in are tensor addition, subtraction, multiplication, dot products, truncation, and sampling, which all have insecure but highly optimised counterparts in TensorFlow.</p>
<h2 id="prerequisites">Prerequisites</h2>
<p>We assume that the main principles behind both TensorFlow and the SPDZ protocol are already understood – if not then there are <a href="https://www.tensorflow.org/tutorials/">plenty</a> <a href="https://learningtensorflow.com/">of</a> <a href="https://github.com/ageron/handson-ml">good</a> <a href="https://developers.google.com/machine-learning/crash-course/">resources</a> for the former (including <a href="https://www.tensorflow.org/about/bib">whitepapers</a>) and e.g. <a href="/2017/09/03/the-spdz-protocol-part1/">previous</a> <a href="/2017/09/10/the-spdz-protocol-part2/">blog</a> <a href="/2017-09-19-private-image-analysis-with-mpc.md">posts</a> for the latter. As for the different parties involved, we here again assume a <a href="/2017/09/19/private-image-analysis-with-mpc/#setting">setting</a> with two servers, a crypto producer, an input provider, and an output receiver.</p>
<p>One important note though is that TensorFlow works by first constructing a static <a href="https://www.tensorflow.org/programmers_guide/graphs">computation graph</a> that is subsequently executed in a <a href="https://www.tensorflow.org/api_guides/python/client">session</a>. For instance, inspecting the graph we get from <code class="highlighter-rouge">sigmoid(dot(w, x) + b)</code> in <a href="https://www.tensorflow.org/programmers_guide/graph_viz">TensorBoard</a> shows the following.</p>
<p><img src="/assets/tensorspdz/structure.png" alt="" /></p>
<p>This means that our efforts in this post are concerned with building such a graph, as opposed to actual execution as earlier: we are to some extent making a small compiler that translates secure computations expressed in a simple language into TensorFlow programs. As a result we benefit not only from working at a higher level of abstraction but also from the large amount of effort that has already gone into optimising graph execution in TensorFlow.</p>
<p>See the <a href="#experiments">experiments</a> for a full code example.</p>
<h1 id="basics">Basics</h1>
<p>Our needs fit nicely with the operations already provided by TensorFlow as seen next, with one main exception: to match typical precision of floating point numbers when instead working with <a href="/2017/09/03/the-spdz-protocol-part1/#fixedpoint-numbers">fixedpoint numbers</a> in the secure setting, we end up encoding into and operating on integers that are larger than what fits in the typical word sizes of 32 or 64 bits, yet today these are the only sizes for which TensorFlow provides operations (a constraint that may have something to do with current support on GPUs).</p>
<p>Luckily though, for the operations we need there are efficient ways around this that allow us to simulate arithmetic on tensors of ~120 bit integers using a list of tensors with identical shape but of e.g. 32 bit integers. And this decomposition moreover has the nice property that we can often operate on each tensor in the list independently, so in addition to enabling the use of TensorFlow, this also allows most operations to be performed in parallel and can actually <a href="https://en.wikipedia.org/wiki/Residue_number_system">increase efficiency</a> compared to operating on single larger numbers, despite the fact that it may initially sound more expensive.</p>
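<p>To make the decomposition concrete, here is a minimal plain-Python sketch on single integers rather than tensors. It is not the TensorFlow code used in this post: the moduli are small hypothetical values chosen for readability, <code class="highlighter-rouge">crt_decompose</code> and <code class="highlighter-rouge">crt_recombine</code> are illustrative helper names, and <code class="highlighter-rouge">pow(n, -1, m)</code> assumes Python 3.8 or later.</p>

```python
from functools import reduce

# Hypothetical pairwise coprime moduli; a real setup would pick larger ones
# so that their product reaches the desired bit length.
MODULI = [89, 97, 101, 103]
M = reduce(lambda a, b: a * b, MODULI)

def crt_decompose(x):
    """Represent the integer x (mod M) by its residue modulo each m_i."""
    return [x % m for m in MODULI]

def crt_recombine(xs):
    """Recover the unique integer in [0, M) that has the given residues."""
    total = 0
    for xi, mi in zip(xs, MODULI):
        ni = M // mi
        total += xi * ni * pow(ni, -1, mi)  # modular inverse, Python 3.8+
    return total % M

# Each operation works on every component independently of the others.
def crt_add(xs, ys):
    return [(xi + yi) % mi for xi, yi, mi in zip(xs, ys, MODULI)]

def crt_sub(xs, ys):
    return [(xi - yi) % mi for xi, yi, mi in zip(xs, ys, MODULI)]

def crt_mul(xs, ys):
    return [(xi * yi) % mi for xi, yi, mi in zip(xs, ys, MODULI)]

x, y = 123456, 654
z = crt_recombine(crt_mul(crt_decompose(x), crt_decompose(y)))
assert z == (x * y) % M
```

<p>Since no component ever depends on another, the per-modulus work can be carried out in parallel, which is the property we rely on above.</p>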
<p>We discuss the <a href="/2018/01/29/the-chinese-remainder-theorem/">details</a> of this elsewhere and for the rest of this post simply assume operations <code class="highlighter-rouge">crt_add</code>, <code class="highlighter-rouge">crt_sub</code>, <code class="highlighter-rouge">crt_mul</code>, <code class="highlighter-rouge">crt_dot</code>, <code class="highlighter-rouge">crt_mod</code>, and <code class="highlighter-rouge">sample</code> that perform the expected operations on lists of tensors. Note that <code class="highlighter-rouge">crt_mod</code>, <code class="highlighter-rouge">crt_mul</code>, and <code class="highlighter-rouge">crt_sub</code> together allow us to define a right shift operation for <a href="/2017/09/03/the-spdz-protocol-part1/#fixedpoint-numbers">fixedpoint truncation</a>.</p>
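<p>The right shift mentioned here rests on a simple identity: subtracting <code class="highlighter-rouge">x mod K</code> from <code class="highlighter-rouge">x</code> leaves an exact multiple of <code class="highlighter-rouge">K</code>, which can then be divided by multiplying with the inverse of <code class="highlighter-rouge">K</code>. The sketch below illustrates this on plain public integers only, assuming a hypothetical odd modulus <code class="highlighter-rouge">M</code> and non-negative values; applying it to shared values takes more care and is not shown.</p>

```python
M = 89 * 97 * 101 * 103  # hypothetical odd modulus, hence coprime with K
K = 2 ** 8               # shift amount, e.g. the fixed-point scaling factor
K_INV = pow(K, -1, M)    # inverse of K modulo M (Python 3.8+)

def right_shift(x):
    # mod, sub, mul: x - (x mod K) is divisible by K, so multiplying by
    # the inverse of K modulo M gives the exact quotient floor(x / K).
    return ((x - (x % K)) * K_INV) % M

# 1.5 and 2.5 encoded with scaling factor K; their product carries K**2,
# and one right shift brings it back to scaling factor K.
product = (384 * 640) % M
assert right_shift(product) == 960  # 960 / 256 == 3.75
```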
<h2 id="private-tensors">Private tensors</h2>
<p>Each private tensor is determined by two shares, one held by each server. And for the reasons mentioned above, each share is given by a list of tensors, which is represented by a list of nodes in the graph. To hide this complexity we introduce a simple class as follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateTensor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">share0</span> <span class="o">=</span> <span class="n">share0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">share1</span> <span class="o">=</span> <span class="n">share1</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">shape</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">share0</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">unwrapped</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">share0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">share1</span>
</code></pre></div></div>
<p>And thanks to TensorFlow we can know the shape of tensors at graph creation time, meaning we don’t have to keep track of this ourselves.</p>
<h2 id="simple-operations">Simple operations</h2>
<p>Since a secure operation will often be expressed in terms of several TensorFlow operations, we use abstract operations such as <code class="highlighter-rouge">add</code>, <code class="highlighter-rouge">mul</code>, and <code class="highlighter-rouge">dot</code> as a convenient way of constructing the computation graph. The first one is <code class="highlighter-rouge">add</code>, where the resulting graph simply instructs the two servers to locally combine the shares they each have using a subgraph constructed by <code class="highlighter-rouge">crt_add</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">unwrapped</span>
    <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="n">y</span><span class="o">.</span><span class="n">unwrapped</span>
    <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'add'</span><span class="p">):</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
            <span class="n">z0</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">y0</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
            <span class="n">z1</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">y1</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">z</span>
</code></pre></div></div>
<p>Notice how easy it is to use <a href="https://www.tensorflow.org/api_docs/python/tf/device"><code class="highlighter-rouge">tf.device()</code></a> to express which server is doing what! This command ties the computation and its resulting value to the specified host, and instructs TensorFlow to automatically insert appropriate networking operations to make sure that the input values are available when needed!</p>
<p>As an example, in the above, if <code class="highlighter-rouge">x0</code> was previously on, say, the input provider then TensorFlow will insert send and receive instructions that copy it to <code class="highlighter-rouge">SERVER_0</code> as part of computing <code class="highlighter-rouge">add</code>. All of this is abstracted away and the framework will attempt to <a href="https://www.tensorflow.org/about/bib">figure out</a> the best strategy for optimising exactly when to perform sends and receives, including batching to better utilise the network and keeping the compute units busy.</p>
<p>The <a href="https://www.tensorflow.org/api_docs/python/tf/name_scope"><code class="highlighter-rouge">tf.name_scope()</code></a> command on the other hand is simply a logical abstraction that doesn’t influence computations but can be used to make the graphs much easier to visualise in <a href="https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard">TensorBoard</a> by grouping subgraphs as single components as also seen earlier.</p>
<p><img src="/assets/tensorspdz/add.png" alt="" /></p>
<p>Note that by selecting <em>Device</em> coloring in TensorBoard as done above we can also use it to verify where the operations were actually computed, in this case that addition was indeed done locally by the two servers (green and turquoise).</p>
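<p>Stripped of TensorFlow and the CRT representation, the arithmetic behind <code class="highlighter-rouge">add</code> can be sketched in a few lines of plain Python. This is only an illustration on single integers: the modulus <code class="highlighter-rouge">Q</code> is a hypothetical choice, and <code class="highlighter-rouge">share</code> and <code class="highlighter-rouge">reconstruct</code> are assumed to be plain additive sharing.</p>

```python
import secrets

Q = 2 ** 61 - 1  # hypothetical modulus, for illustration only

def share(x):
    """Split x into two additive shares that sum to x modulo Q."""
    x0 = secrets.randbelow(Q)
    x1 = (x - x0) % Q
    return x0, x1

def reconstruct(x0, x1):
    """Combine two additive shares into the value they represent."""
    return (x0 + x1) % Q

x0, x1 = share(5)
y0, y1 = share(7)

# Each server adds the shares it already holds; nothing is exchanged.
z0 = (x0 + y0) % Q  # computed on server 0
z1 = (x1 + y1) % Q  # computed on server 1

assert reconstruct(z0, z1) == 12
```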
<h2 id="dot-products">Dot products</h2>
<p>We next turn to dot products. This is more complex, not least since we now need to involve the crypto producer, but also since the two servers have to communicate with each other as part of the computation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">unwrapped</span>
    <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="n">y</span><span class="o">.</span><span class="n">unwrapped</span>
    <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'dot'</span><span class="p">):</span>
        <span class="c"># triple generation</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">CRYPTO_PRODUCER</span><span class="p">):</span>
            <span class="n">a</span> <span class="o">=</span> <span class="n">sample</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
            <span class="n">b</span> <span class="o">=</span> <span class="n">sample</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
            <span class="n">ab</span> <span class="o">=</span> <span class="n">crt_dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
            <span class="n">a0</span><span class="p">,</span> <span class="n">a1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
            <span class="n">b0</span><span class="p">,</span> <span class="n">b1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
            <span class="n">ab0</span><span class="p">,</span> <span class="n">ab1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">ab</span><span class="p">)</span>
        <span class="c"># masking after distributing the triple</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
            <span class="n">alpha0</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">a0</span><span class="p">)</span>
            <span class="n">beta0</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">y0</span><span class="p">,</span> <span class="n">b0</span><span class="p">)</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
            <span class="n">alpha1</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">a1</span><span class="p">)</span>
            <span class="n">beta1</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">y1</span><span class="p">,</span> <span class="n">b1</span><span class="p">)</span>
        <span class="c"># recombination after exchanging alphas and betas</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
            <span class="n">alpha</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">alpha0</span><span class="p">,</span> <span class="n">alpha1</span><span class="p">)</span>
            <span class="n">beta</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">beta0</span><span class="p">,</span> <span class="n">beta1</span><span class="p">)</span>
            <span class="n">z0</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">ab0</span><span class="p">,</span>
                         <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">a0</span><span class="p">,</span> <span class="n">beta</span><span class="p">),</span>
                                 <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">b0</span><span class="p">),</span>
                                         <span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">))))</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
            <span class="n">alpha</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">alpha0</span><span class="p">,</span> <span class="n">alpha1</span><span class="p">)</span>
            <span class="n">beta</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">beta0</span><span class="p">,</span> <span class="n">beta1</span><span class="p">)</span>
            <span class="n">z1</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">ab1</span><span class="p">,</span>
                         <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">a1</span><span class="p">,</span> <span class="n">beta</span><span class="p">),</span>
                                 <span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">b1</span><span class="p">)))</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">truncate</span><span class="p">(</span><span class="n">z</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">z</span>
</code></pre></div></div>
<p>However, with <code class="highlighter-rouge">tf.device()</code> we see that this is still relatively straightforward, at least if the <a href="/2017/09/19/private-image-analysis-with-mpc/#dense-layers">protocol for secure dot products</a> is already understood. We first construct a graph that makes the crypto producer generate a new dot triple. The output nodes of this graph are <code class="highlighter-rouge">a0, a1, b0, b1, ab0, ab1</code>.</p>
<p>With <code class="highlighter-rouge">crt_sub</code> we then build graphs for the two servers that mask <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> using <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> respectively. TensorFlow will again take care of inserting networking code that sends the value of e.g. <code class="highlighter-rouge">a0</code> to <code class="highlighter-rouge">SERVER_0</code> during execution.</p>
<p>In the third step we reconstruct <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code> on each server, and compute the recombination step to get the dot product. Note that we have to define <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code> twice, once for each server: although they contain the same values, had we instead defined them on only one server but used them on both, we would implicitly have inserted additional networking operations and hence slowed down the computation.</p>
<p><img src="/assets/tensorspdz/dot.png" alt="" /></p>
<p>Returning to TensorBoard we can verify that the nodes are indeed tied to the correct players, with yellow being the crypto producer, and green and turquoise being the two servers. Note the convenience of having <code class="highlighter-rouge">tf.name_scope()</code> here.</p>
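<p>To check the algebra of the three steps without any networking, the following plain-Python sketch simulates all three players in a single process on small integer vectors. It makes several assumptions that are not from the code above: a hypothetical modulus <code class="highlighter-rouge">Q</code>, no CRT decomposition, no fixed-point truncation, and helper names (<code class="highlighter-rouge">secure_dot</code>, <code class="highlighter-rouge">plain_dot</code>) chosen purely for illustration.</p>

```python
import secrets

Q = 2 ** 61 - 1  # hypothetical modulus, for illustration only

def sample(n):
    return [secrets.randbelow(Q) for _ in range(n)]

def share(v):
    v0 = sample(len(v))
    v1 = [(vi - v0i) % Q for vi, v0i in zip(v, v0)]
    return v0, v1

def reconstruct(v0, v1):
    return [(s + t) % Q for s, t in zip(v0, v1)]

def sub(u, v):
    return [(s - t) % Q for s, t in zip(u, v)]

def plain_dot(u, v):
    return sum(s * t for s, t in zip(u, v)) % Q

def secure_dot(x0, x1, y0, y1):
    n = len(x0)
    # triple generation (crypto producer)
    a, b = sample(n), sample(n)
    ab = plain_dot(a, b)
    a0, a1 = share(a)
    b0, b1 = share(b)
    ab0 = secrets.randbelow(Q)
    ab1 = (ab - ab0) % Q
    # masking (each server works on its own shares)
    alpha0, beta0 = sub(x0, a0), sub(y0, b0)
    alpha1, beta1 = sub(x1, a1), sub(y1, b1)
    # recombination after exchanging the alpha and beta shares
    alpha = reconstruct(alpha0, alpha1)
    beta = reconstruct(beta0, beta1)
    z0 = (ab0 + plain_dot(a0, beta) + plain_dot(alpha, b0)
          + plain_dot(alpha, beta)) % Q
    z1 = (ab1 + plain_dot(a1, beta) + plain_dot(alpha, b1)) % Q
    return z0, z1

x0, x1 = share([1, 2, 3])
y0, y1 = share([4, 5, 6])
z0, z1 = secure_dot(x0, x1, y0, y1)
assert (z0 + z1) % Q == 32  # 1*4 + 2*5 + 3*6
```

<p>Note that the <code class="highlighter-rouge">dot(alpha, beta)</code> term is added by only one of the two servers, exactly as in the graph above, since both hold the same public values and adding it twice would give the wrong sum.</p>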
<h2 id="configuration">Configuration</h2>
<p>To fully claim that this has made the distributed aspects of secure computations much easier to express, we also have to see what is actually needed for <code class="highlighter-rouge">tf.device()</code> to work as intended. In the code below we first define an arbitrary job name followed by identifiers for our five players. More interestingly, we then simply specify their network hosts and wrap this in a <a href="https://www.tensorflow.org/deploy/distributed"><code class="highlighter-rouge">ClusterSpec</code></a>. That’s it!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">JOB_NAME</span> <span class="o">=</span> <span class="s">'spdz'</span>
<span class="n">SERVER_0</span> <span class="o">=</span> <span class="s">'/job:{}/task:0'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">JOB_NAME</span><span class="p">)</span>
<span class="n">SERVER_1</span> <span class="o">=</span> <span class="s">'/job:{}/task:1'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">JOB_NAME</span><span class="p">)</span>
<span class="n">CRYPTO_PRODUCER</span> <span class="o">=</span> <span class="s">'/job:{}/task:2'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">JOB_NAME</span><span class="p">)</span>
<span class="n">INPUT_PROVIDER</span> <span class="o">=</span> <span class="s">'/job:{}/task:3'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">JOB_NAME</span><span class="p">)</span>
<span class="n">OUTPUT_RECEIVER</span> <span class="o">=</span> <span class="s">'/job:{}/task:4'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">JOB_NAME</span><span class="p">)</span>
<span class="n">HOSTS</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s">'10.132.0.4:4440'</span><span class="p">,</span>
    <span class="s">'10.132.0.5:4441'</span><span class="p">,</span>
    <span class="s">'10.132.0.6:4442'</span><span class="p">,</span>
    <span class="s">'10.132.0.7:4443'</span><span class="p">,</span>
    <span class="s">'10.132.0.8:4444'</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">CLUSTER</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">ClusterSpec</span><span class="p">({</span>
    <span class="n">JOB_NAME</span><span class="p">:</span> <span class="n">HOSTS</span>
<span class="p">})</span>
</code></pre></div></div>
<p><em>Note that in the screenshots we are actually running the input provider and output receiver on the same host, and hence both show up as <code class="highlighter-rouge">3/device:CPU:0</code>.</em></p>
<p>Finally, the code that each player executes is about as simple as it gets.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">Server</span><span class="p">(</span><span class="n">CLUSTER</span><span class="p">,</span> <span class="n">job_name</span><span class="o">=</span><span class="n">JOB_NAME</span><span class="p">,</span> <span class="n">task_index</span><span class="o">=</span><span class="n">ROLE</span><span class="p">)</span>
<span class="n">server</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
<span class="n">server</span><span class="o">.</span><span class="n">join</span><span class="p">()</span>
</code></pre></div></div>
<p>Here the value of <code class="highlighter-rouge">ROLE</code> is the only thing that differs between the programs the five players run, and it is typically given as a command-line argument.</p>
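<p>One possible way of obtaining <code class="highlighter-rouge">ROLE</code> is sketched below using <code class="highlighter-rouge">argparse</code> from the standard library. The role names and the helper functions are hypothetical; only their order, which has to match the task indices used in the device strings above, comes from the configuration.</p>

```python
import argparse

JOB_NAME = 'spdz'

# Hypothetical role names; the position in this list is the task index.
ROLES = [
    'server0',
    'server1',
    'crypto_producer',
    'input_provider',
    'output_receiver',
]

def parse_role(argv):
    """Map a role name given on the command line to its task index."""
    parser = argparse.ArgumentParser(description='Start one SPDZ player')
    parser.add_argument('role', choices=ROLES)
    args = parser.parse_args(argv)
    return ROLES.index(args.role)

def device_name(task_index):
    """Device string as used with tf.device() for the given task."""
    return '/job:{}/task:{}'.format(JOB_NAME, task_index)

# In a real script this would be parse_role(sys.argv[1:]).
ROLE = parse_role(['crypto_producer'])
assert ROLE == 2
assert device_name(ROLE) == '/job:spdz/task:2'
```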
<h1 id="improvements">Improvements</h1>
<p>With the basics in place we can look at a few optimisations.</p>
<h2 id="tracking-nodes">Tracking nodes</h2>
<p>Our first improvement allows us to reuse computations. For instance, if we need the result of <code class="highlighter-rouge">dot(x, y)</code> twice then we want to avoid computing it a second time and instead reuse the first. Concretely, we want to keep track of nodes in the graph and link back to them whenever possible.</p>
<p>To do this we simply maintain a global dictionary of <code class="highlighter-rouge">PrivateTensor</code> references as we build the graph, and use this for looking up already existing results before adding new nodes. For instance, <code class="highlighter-rouge">dot</code> now becomes as follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="n">node_key</span> <span class="o">=</span> <span class="p">(</span><span class="s">'dot'</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">nodes</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node_key</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">z</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="c"># ... as before ...</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">)</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">truncate</span><span class="p">(</span><span class="n">z</span><span class="p">)</span>
        <span class="n">nodes</span><span class="p">[</span><span class="n">node_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">z</span>
    <span class="k">return</span> <span class="n">z</span>
</code></pre></div></div>
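<p>The lookup pattern can be tried in isolation with the framework-free sketch below. It is only an illustration: <code class="highlighter-rouge">Tensor</code> and <code class="highlighter-rouge">build_dot</code> are hypothetical stand-ins for <code class="highlighter-rouge">PrivateTensor</code> and the actual graph construction, and a counter records how often new nodes would be added.</p>

```python
# Framework-free sketch of node tracking. `Tensor` and `build_dot` are
# hypothetical stand-ins, not part of the actual implementation.
nodes = {}
build_count = 0


class Tensor(object):
    """Placeholder for a graph node; hashed by identity like any plain object."""

    def __init__(self, name):
        self.name = name


def build_dot(x, y):
    # stand-in for the expensive part: adding new operations to the graph
    global build_count
    build_count += 1
    return Tensor('dot(%s, %s)' % (x.name, y.name))


def dot(x, y):
    node_key = ('dot', x, y)
    z = nodes.get(node_key, None)
    if z is None:
        z = build_dot(x, y)
        nodes[node_key] = z
    return z


x, y = Tensor('x'), Tensor('y')
first = dot(x, y)
second = dot(x, y)   # served from `nodes`, nothing new is built
swapped = dot(y, x)  # a different key, so this one is built
```

<p>Since plain objects hash by identity, the key distinguishes nodes rather than values: two tensors that happen to hold equal values are still treated as different operands.</p>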
<p>While already significant for some applications, this change also paves the way for our next improvement.</p>
<h2 id="reusing-masked-tensors">Reusing masked tensors</h2>
<p>We have <a href="/2017/09/10/the-spdz-protocol-part2/">already</a> <a href="/2017/09/19/private-image-analysis-with-mpc/#generalised-triples">mentioned</a> that we’d ideally want to mask every private tensor at most once, primarily to save on networking. For instance, if we are computing both <code class="highlighter-rouge">dot(w, x)</code> and <code class="highlighter-rouge">dot(w, y)</code> then we want to use the same masked version of <code class="highlighter-rouge">w</code> in both. Specifically, if we are doing many operations with the same masked tensor then the cost of masking it can be amortised away.</p>
<p>But with the current setup we mask every time we compute e.g. <code class="highlighter-rouge">dot</code> or <code class="highlighter-rouge">mul</code> since masking is baked into these. To avoid this we simply make masking an explicit operation, which also allows us to use the same masked version across different operations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">mask</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span>
    <span class="n">node_key</span> <span class="o">=</span> <span class="p">(</span><span class="s">'mask'</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
    <span class="n">masked</span> <span class="o">=</span> <span class="n">nodes</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node_key</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">masked</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">unwrapped</span>
        <span class="n">shape</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shape</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'mask'</span><span class="p">):</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">CRYPTO_PRODUCER</span><span class="p">):</span>
                <span class="n">a</span> <span class="o">=</span> <span class="n">sample</span><span class="p">(</span><span class="n">shape</span><span class="p">)</span>
                <span class="n">a0</span><span class="p">,</span> <span class="n">a1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
                <span class="n">alpha0</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">x0</span><span class="p">,</span> <span class="n">a0</span><span class="p">)</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
                <span class="n">alpha1</span> <span class="o">=</span> <span class="n">crt_sub</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">a1</span><span class="p">)</span>
            <span class="c"># exchange of alphas</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
                <span class="n">alpha_on_0</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">alpha0</span><span class="p">,</span> <span class="n">alpha1</span><span class="p">)</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
                <span class="n">alpha_on_1</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">alpha0</span><span class="p">,</span> <span class="n">alpha1</span><span class="p">)</span>
        <span class="n">masked</span> <span class="o">=</span> <span class="n">MaskedPrivateTensor</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a0</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">alpha_on_0</span><span class="p">,</span> <span class="n">alpha_on_1</span><span class="p">)</span>
        <span class="n">nodes</span><span class="p">[</span><span class="n">node_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">masked</span>
    <span class="k">return</span> <span class="n">masked</span>
</code></pre></div></div>
<p>Note that we introduce a <code class="highlighter-rouge">MaskedPrivateTensor</code> class as part of this, which is again simply a convenient way of abstracting over the five lists of tensors we get from <code class="highlighter-rouge">mask(x)</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MaskedPrivateTensor</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">a0</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">alpha_on_0</span><span class="p">,</span> <span class="n">alpha_on_1</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">a</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">a0</span> <span class="o">=</span> <span class="n">a0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">a1</span> <span class="o">=</span> <span class="n">a1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">alpha_on_0</span> <span class="o">=</span> <span class="n">alpha_on_0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">alpha_on_1</span> <span class="o">=</span> <span class="n">alpha_on_1</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">shape</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">a</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">unwrapped</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">a0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">a1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">alpha_on_0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">alpha_on_1</span>
</code></pre></div></div>
<p>With this we may rewrite <code class="highlighter-rouge">dot</code> as below, which is now only responsible for the recombination step.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">MaskedPrivateTensor</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">MaskedPrivateTensor</span><span class="p">)</span>
    <span class="n">node_key</span> <span class="o">=</span> <span class="p">(</span><span class="s">'dot'</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">nodes</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node_key</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">z</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">):</span> <span class="n">x</span> <span class="o">=</span> <span class="n">mask</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">PrivateTensor</span><span class="p">):</span> <span class="n">y</span> <span class="o">=</span> <span class="n">mask</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
        <span class="n">a</span><span class="p">,</span> <span class="n">a0</span><span class="p">,</span> <span class="n">a1</span><span class="p">,</span> <span class="n">alpha_on_0</span><span class="p">,</span> <span class="n">alpha_on_1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">unwrapped</span>
        <span class="n">b</span><span class="p">,</span> <span class="n">b0</span><span class="p">,</span> <span class="n">b1</span><span class="p">,</span> <span class="n">beta_on_0</span><span class="p">,</span> <span class="n">beta_on_1</span> <span class="o">=</span> <span class="n">y</span><span class="o">.</span><span class="n">unwrapped</span>
        <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">name_scope</span><span class="p">(</span><span class="s">'dot'</span><span class="p">):</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">CRYPTO_PRODUCER</span><span class="p">):</span>
                <span class="n">ab</span> <span class="o">=</span> <span class="n">crt_dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
                <span class="n">ab0</span><span class="p">,</span> <span class="n">ab1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">ab</span><span class="p">)</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_0</span><span class="p">):</span>
                <span class="n">alpha</span> <span class="o">=</span> <span class="n">alpha_on_0</span>
                <span class="n">beta</span> <span class="o">=</span> <span class="n">beta_on_0</span>
                <span class="n">z0</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">ab0</span><span class="p">,</span>
                     <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">a0</span><span class="p">,</span> <span class="n">beta</span><span class="p">),</span>
                     <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">b0</span><span class="p">),</span>
                             <span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">))))</span>
            <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">device</span><span class="p">(</span><span class="n">SERVER_1</span><span class="p">):</span>
                <span class="n">alpha</span> <span class="o">=</span> <span class="n">alpha_on_1</span>
                <span class="n">beta</span> <span class="o">=</span> <span class="n">beta_on_1</span>
                <span class="n">z1</span> <span class="o">=</span> <span class="n">crt_add</span><span class="p">(</span><span class="n">ab1</span><span class="p">,</span>
                     <span class="n">crt_add</span><span class="p">(</span><span class="n">crt_dot</span><span class="p">(</span><span class="n">a1</span><span class="p">,</span> <span class="n">beta</span><span class="p">),</span>
                             <span class="n">crt_dot</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">b1</span><span class="p">)))</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">)</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">truncate</span><span class="p">(</span><span class="n">z</span><span class="p">)</span>
        <span class="n">nodes</span><span class="p">[</span><span class="n">node_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">z</span>
    <span class="k">return</span> <span class="n">z</span>
</code></pre></div></div>
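<p>To see why the recombination is correct, and that one masked value can safely feed several products, the arithmetic can be checked on plain integers. The sketch below makes several simplifying assumptions: scalars instead of tensors, a single hypothetical modulus <code class="highlighter-rouge">Q</code> instead of the CRT representation, no fixed-point truncation, and all three roles simulated in one process.</p>

```python
import random

Q = 2**61 - 1  # hypothetical modulus, chosen only for this sketch


def share(secret):
    # additive sharing: secret = s0 + s1 (mod Q)
    s0 = random.randrange(Q)
    s1 = (secret - s0) % Q
    return s0, s1


def reconstruct(s0, s1):
    return (s0 + s1) % Q


def mask(x0, x1):
    # crypto producer: sample the mask a and share it
    a = random.randrange(Q)
    a0, a1 = share(a)
    # servers: compute and exchange shares of alpha = x - a
    alpha = reconstruct((x0 - a0) % Q, (x1 - a1) % Q)
    return a, a0, a1, alpha


def mul(masked_x, masked_y):
    a, a0, a1, alpha = masked_x
    b, b0, b1, beta = masked_y
    ab0, ab1 = share((a * b) % Q)                           # crypto producer
    z0 = (ab0 + a0 * beta + alpha * b0 + alpha * beta) % Q  # server 0
    z1 = (ab1 + a1 * beta + alpha * b1) % Q                 # server 1
    return z0, z1


w, x, y = 7, 11, 13
masked_w = mask(*share(w))                         # masked once ...
wx = reconstruct(*mul(masked_w, mask(*share(x))))  # ... used here
wy = reconstruct(*mul(masked_w, mask(*share(y))))  # ... and reused here
```

<p>Summing the two shares gives <code class="highlighter-rouge">ab + a*beta + alpha*b + alpha*beta = (a + alpha) * (b + beta)</code>, which is the product of the two masked inputs since <code class="highlighter-rouge">alpha = x - a</code> and <code class="highlighter-rouge">beta = y - b</code>.</p>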
<p>As a verification we can see that TensorBoard shows us the expected graph structure, in this case inside the graph for <a href="/2017/04/17/private-deep-learning-with-mpc/#approximating-sigmoid"><code class="highlighter-rouge">sigmoid</code></a>.</p>
<p><img src="/assets/tensorspdz/masking-reuse.png" alt="" /></p>
<p>Here the value of <code class="highlighter-rouge">square(x)</code> is first computed, then masked, and finally reused in four multiplications.</p>
<p>There is an inefficiency though: while the <a href="https://arxiv.org/abs/1603.04467">dataflow nature</a> of TensorFlow will in general take care of only recomputing the parts of the graph that have changed between two executions, this does not apply to operations involving sampling via e.g. <a href="https://www.tensorflow.org/api_docs/python/tf/random_uniform"><code class="highlighter-rouge">tf.random_uniform</code></a>, which is used in our sharing and masking. Consequently, masks are not being reused across executions.</p>
<h2 id="caching-values">Caching values</h2>
<p>To get around the above issue we can introduce caching of values that survive across different executions of the graph, and an easy way of doing this is to store tensors in <a href="https://www.tensorflow.org/api_docs/python/tf/Variable">variables</a>. Normal executions will read from these, while an explicit <code class="highlighter-rouge">cache_populators</code> set of operations allows us to populate them.</p>
<p>For example, wrapping our two tensors <code class="highlighter-rouge">w</code> and <code class="highlighter-rouge">b</code> with such a <code class="highlighter-rouge">cache</code> operation gets us the following graph.</p>
<p><img src="/assets/tensorspdz/cached.png" alt="" /></p>
<p>When executing the cache population operations TensorFlow automatically figures out which subparts of the graph it needs to execute to generate the values to be cached, and which can be ignored.</p>
<p><img src="/assets/tensorspdz/cached-populate.png" alt="" /></p>
<p>And likewise when predicting, in this case skipping sharing and masking.</p>
<p><img src="/assets/tensorspdz/cached-predict.png" alt="" /></p>
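<p>The populate-once, read-many behaviour can be modelled without TensorFlow. The sketch below is only an analogy under that assumption: the hypothetical <code class="highlighter-rouge">Cached</code> class plays the role of a variable, <code class="highlighter-rouge">populate</code> plays the role of a cache populator, and a counter shows that the random sampling happens once rather than on every read.</p>

```python
import random

sample_count = 0


def sample_mask():
    # stand-in for a random operation that is re-evaluated on every execution
    global sample_count
    sample_count += 1
    return random.randrange(2**32)


class Cached(object):
    """Hypothetical stand-in for a tensor stored in a variable."""

    def __init__(self, compute):
        self.compute = compute
        self.value = None

    def populate(self):
        # analogous to running a cache populator
        self.value = self.compute()

    def read(self):
        # analogous to a normal execution reading from the variable
        assert self.value is not None, 'cache must be populated first'
        return self.value


cached_mask = Cached(sample_mask)
cache_populators = [cached_mask.populate]

# explicit population step, run once
for populate in cache_populators:
    populate()

# subsequent executions only read the stored value
reads = [cached_mask.read() for _ in range(3)]
```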
<h2 id="buffering-triples">Buffering triples</h2>
<p>Recall that a main purpose of <a href="/2017/09/03/the-spdz-protocol-part1/#multiplication">triples</a> is to move the computation of the crypto producer to an <em>offline phase</em> and distribute its results to the two servers ahead of time in order to speed up their computation later during the <em>online phase</em>.</p>
<p>So far we haven’t done anything to specify that this should happen though, and from reading the above code it’s not unreasonable to assume that the crypto producer will instead compute in synchronisation with the two servers, injecting idle waiting periods throughout their computation. However, from experiments it seems that TensorFlow is already smart enough to optimise the graph to do the right thing and batch triple distribution, presumably to save on networking. We still have an initial waiting period though, that we could get rid of by introducing a separate compute-and-distribute execution that fills up buffers.</p>
<p><img src="/assets/tensorspdz/tracing.png" alt="" /></p>
<p>We’ll skip this issue for now and instead return to it when looking at private training since it is not unreasonable to expect significant performance improvements there from distributing the training data ahead of time.</p>
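<p>As a rough illustration of what such a buffer could look like, the sketch below separates an offline <code class="highlighter-rouge">fill</code> step from online consumption. The names <code class="highlighter-rouge">TripleBuffer</code>, <code class="highlighter-rouge">fill</code>, and <code class="highlighter-rouge">next</code> are hypothetical, as is the modulus <code class="highlighter-rouge">Q</code>; scalars are used instead of tensors and the distribution of shares to the two servers is not modelled.</p>

```python
import random
from collections import deque

Q = 2**61 - 1  # hypothetical modulus, chosen only for this sketch


def share(secret):
    s0 = random.randrange(Q)
    return s0, (secret - s0) % Q


def generate_triple():
    # crypto producer: sample a and b, then share a, b, and their product
    a, b = random.randrange(Q), random.randrange(Q)
    return share(a), share(b), share((a * b) % Q)


class TripleBuffer(object):
    def __init__(self):
        self.buffer = deque()

    def fill(self, count):
        # offline phase: generate triples ahead of time
        for _ in range(count):
            self.buffer.append(generate_triple())

    def next(self):
        # online phase: no waiting on the producer while the buffer is non-empty
        return self.buffer.popleft()


triples = TripleBuffer()
triples.fill(4)
(a0, a1), (b0, b1), (ab0, ab1) = triples.next()
```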
<h1 id="profiling">Profiling</h1>
<p>As a final reason to be excited about building dataflow programs in TensorFlow we also look at the built-in <a href="https://www.tensorflow.org/programmers_guide/graph_viz#runtime_statistics">runtime statistics</a>. We have already seen the built-in detailed tracing support above, but in TensorBoard we can also easily see how expensive each operation was both in terms of compute and memory. The numbers reported here are from the <a href="#experiments">experiments</a> below.</p>
<p><img src="/assets/tensorspdz/computetime.png" alt="" /></p>
<p>The heatmap above shows, for example, that <code class="highlighter-rouge">sigmoid</code> was the most expensive operation in the run and that the dot product took roughly 30ms to execute. Moreover, in the figure below we have navigated further into the dot block and see that sharing took about 3ms in this particular run.</p>
<p><img src="/assets/tensorspdz/computetime-detailed.png" alt="" /></p>
<p>This way we can potentially identify bottlenecks and compare performance of different approaches. And if needed we can of course switch to tracing for even more details.</p>
<h1 id="experiments">Experiments</h1>
<p>The <a href="https://github.com/mortendahl/privateml/tree/master/tensorflow/spdz/">GitHub repository</a> contains the code needed for experimentation, including examples and instructions for setting up either a <a href="https://github.com/mortendahl/privateml/tree/master/tensorflow/spdz/configs/localhost">local configuration</a> or a <a href="https://github.com/mortendahl/privateml/tree/master/tensorflow/spdz/configs/gcp">GCP configuration</a> of hosts. For the running example of private prediction using a logistic regression model we use the GCP configuration, i.e. the parties are running on different virtual hosts located in the same Google Cloud zone, here on some of the weaker instances, namely dual core and 10GB memory.</p>
<p>A slightly simplified version of our program is as follows, where we first train a model in public, build a graph for the private prediction computation, and then run it in a fresh session. The model was somewhat arbitrarily picked to have 100 features.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">config</span> <span class="kn">import</span> <span class="n">session</span>
<span class="kn">from</span> <span class="nn">tensorspdz</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">define_input</span><span class="p">,</span> <span class="n">define_variable</span><span class="p">,</span>
    <span class="n">add</span><span class="p">,</span> <span class="n">dot</span><span class="p">,</span> <span class="n">sigmoid</span><span class="p">,</span> <span class="n">cache</span><span class="p">,</span> <span class="n">mask</span><span class="p">,</span>
    <span class="n">encode_input</span><span class="p">,</span> <span class="n">decode_output</span>
<span class="p">)</span>
<span class="c"># publicly train `weights` and `bias`</span>
<span class="n">weights</span><span class="p">,</span> <span class="n">bias</span> <span class="o">=</span> <span class="n">train_publicly</span><span class="p">()</span>
<span class="c"># define shape of unknown input</span>
<span class="n">shape_x</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="c"># construct graph for private prediction</span>
<span class="n">input_x</span><span class="p">,</span> <span class="n">x</span> <span class="o">=</span> <span class="n">define_input</span><span class="p">(</span><span class="n">shape_x</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'x'</span><span class="p">)</span>
<span class="n">init_w</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="n">define_variable</span><span class="p">(</span><span class="n">weights</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'w'</span><span class="p">)</span>
<span class="n">init_b</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">define_variable</span><span class="p">(</span><span class="n">bias</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'b'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">use_caching</span><span class="p">:</span>
    <span class="n">w</span> <span class="o">=</span> <span class="n">cache</span><span class="p">(</span><span class="n">mask</span><span class="p">(</span><span class="n">w</span><span class="p">))</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">cache</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">sigmoid</span><span class="p">(</span><span class="n">add</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">w</span><span class="p">),</span> <span class="n">b</span><span class="p">))</span>
<span class="c"># start session between all players</span>
<span class="k">with</span> <span class="n">session</span><span class="p">()</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
    <span class="c"># share and distribute `weights` and `bias` to the two servers</span>
    <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="n">init_w</span><span class="p">,</span> <span class="n">init_b</span><span class="p">])</span>
    <span class="k">if</span> <span class="n">use_caching</span><span class="p">:</span>
        <span class="c"># compute and store cached values</span>
        <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cache_populators</span><span class="p">)</span>
    <span class="c"># prepare to use `X` as private input for prediction</span>
    <span class="n">feed_dict</span> <span class="o">=</span> <span class="n">encode_input</span><span class="p">([</span>
        <span class="p">(</span><span class="n">input_x</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span>
    <span class="p">])</span>
    <span class="c"># run secure computation and reveal output</span>
    <span class="n">y_pred</span> <span class="o">=</span> <span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">reveal</span><span class="p">(</span><span class="n">y</span><span class="p">),</span> <span class="n">feed_dict</span><span class="o">=</span><span class="n">feed_dict</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">decode_output</span><span class="p">(</span><span class="n">y_pred</span><span class="p">))</span>
</code></pre></div></div>
<p>Running this a few times with different sizes of <code class="highlighter-rouge">X</code> gives the timings below, where the entire computation is considered including triple generation and distribution; slightly surprisingly there was no real difference between caching masked values or not.</p>
<center>
<img width="80%" height="80%" src="/assets/tensorspdz/timings-10000.png" />
</center>
<p>Processing batches of size 1, 10, and 100 took roughly the same time, ~60ms on average, which might suggest a lower bound on latency due to networking. At 1000 the time jumps to ~110ms, at 10,000 to ~600ms, and finally at 100,000 to ~5s. As such, if latency is important then we can perform ~1600 predictions per second, while if we are more flexible then at least ~20,000 per second.</p>
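<p>The throughput figures follow directly from dividing batch size by execution time. The short calculation below uses only the approximate timings quoted above, so the results are rough estimates rather than new measurements.</p>

```python
# (batch size, approximate seconds per execution) as quoted in the text
timings = [
    (100, 0.060),
    (1000, 0.110),
    (10000, 0.600),
    (100000, 5.0),
]

# predictions per second for each batch size
throughput = {batch: batch / seconds for batch, seconds in timings}

for batch, _ in timings:
    print('%6d per batch: %8.0f predictions per second' % (batch, throughput[batch]))
```

<p>The smallest of these batches gives the low-latency figure of roughly 1600 per second, while the largest gives the 20,000 per second quoted for the more flexible setting.</p>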
<center>
<img width="80%" height="80%" src="/assets/tensorspdz/timings-100000.png" />
</center>
<p>This however is measuring only timings reported by profiling, with actual execution time taking a bit longer; hopefully some of the production-oriented tools such as <a href="https://www.tensorflow.org/serving/"><code class="highlighter-rouge">tf.serving</code></a> that come with TensorFlow can improve on this.</p>
<h1 id="thoughts">Thoughts</h1>
<p>After private prediction it’ll of course also be interesting to look at private training. Caching of masked training data might be more relevant here since it remains fixed throughout the process.</p>
<p>The serving of models can also be improved: using for instance the production-ready <a href="https://www.tensorflow.org/serving/"><code class="highlighter-rouge">tf.serving</code></a>, one might be able to avoid much of the current initial overhead for orchestration, as well as having endpoints that can be <a href="https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md">safely exposed</a> to the public.</p>
<p>Finally, there are security improvements to be made on e.g. communication between the five parties. In particular, in the current version of TensorFlow all communication is happening over unencrypted and unauthenticated <a href="https://grpc.io/">gRPC</a> connections, which means that someone listening in on the network traffic could in principle learn all private values. Since support for <a href="https://grpc.io/docs/guides/auth.html">TLS</a> is already there in gRPC it might be straightforward to make use of it in TensorFlow without a significant impact on performance. Likewise, TensorFlow does not currently use a strong pseudo-random generator for <a href="https://www.tensorflow.org/api_docs/python/tf/random_uniform"><code class="highlighter-rouge">tf.random_uniform</code></a> and hence sharing and masking are not as secure as they could be; adding an operation for cryptographically strong randomness might be straightforward and should give roughly the same performance.</p>
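<p>Outside of TensorFlow, the difference is easy to illustrate: the sketch below draws shares from the operating system's cryptographically strong generator via Python's standard <code class="highlighter-rouge">secrets</code> module (Python 3.6 and later). It is not a TensorFlow operation, and the modulus <code class="highlighter-rouge">Q</code> and the scalar setting are assumptions made for the example.</p>

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus, chosen only for this sketch


def sample():
    # uniform in [0, Q), drawn from the operating system's CSPRNG
    return secrets.randbelow(Q)


def share(secret):
    s0 = sample()
    s1 = (secret - s0) % Q
    return s0, s1


def reconstruct(s0, s1):
    return (s0 + s1) % Q


x0, x1 = share(42)
```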
<p>Morten Dahl</p>
<p>Private Image Analysis with MPC, 2017-09-19T12:00:00+00:00, https://mortendahl.github.io/2017/09/19/private-image-analysis-with-mpc</p>
<p><em><strong>TL;DR:</strong> we take a typical CNN deep learning model and go through a series of steps that enable both training and prediction to instead be done on encrypted data.</em></p>
<p>Using deep learning to analyse images through <a href="http://cs231n.github.io/">convolutional neural networks</a> (CNNs) has gained enormous popularity over the last few years due to its success in out-performing many other approaches on this and related tasks.</p>
<p>One recent application took the form of <a href="http://www.nature.com/nature/journal/v542/n7639/full/nature21056.html">skin cancer detection</a>, where anyone can quickly take a photo of a skin lesion using a mobile phone app and have it analysed with “performance on par with [..] experts” (see the <a href="https://www.youtube.com/watch?v=toK1OSLep3s">associated video</a> for a demo). Having access to a large set of clinical photos played a key part in training this model – a data set that could be considered sensitive.</p>
<p>Which brings us to privacy and eventually <a href="https://en.wikipedia.org/wiki/Secure_multi-party_computation">secure multi-party computation</a> (MPC): how many applications are limited today due to the lack of access to data? In the above case, could the model be improved by letting anyone with a mobile phone app contribute to the training data set? And if so, how many would volunteer given the risk of exposing personal health related information?</p>
<p>With MPC we can potentially lower the risk of exposure and hence increase the incentive to participate. More concretely, by instead performing the training on encrypted data we can prevent anyone from ever seeing not only individual data, but also the learned model parameters. Further techniques such as <a href="https://en.wikipedia.org/wiki/Differential_privacy">differential privacy</a> could additionally be used to hide any leakage from predictions as well, but we won’t go into that here.</p>
<p>In this blog post we’ll look at a simpler use case for image analysis but go over all required techniques. A few notebooks are presented along the way, with the main one given as part of the <a href="#proof-of-concept-implementation">proof of concept implementation</a>.</p>
<p><a href="https://github.com/mortendahl/talks/raw/master/ParisML17.pdf">Slides</a> from a more recent presentation at the <a href="http://mlparis.org/">Paris Machine Learning meetup</a> are now also available.</p>
<p><em>A big thank you goes out to <a href="https://twitter.com/iamtrask">Andrew Trask</a>, <a href="https://twitter.com/smartcryptology">Nigel Smart</a>, <a href="https://twitter.com/adria">Adrià Gascón</a>, and the <a href="https://twitter.com/openminedorg">OpenMined community</a> for inspiration and interesting discussions on this topic! <a href="https://weakish.github.io/">Jakukyo Friel</a> has also very kindly made a <a href="https://www.jqr.com/article/000113">Chinese translation</a>.</em></p>
<h1 id="setting">Setting</h1>
<p>We will assume that the training data set is jointly held by a set of <em>input providers</em> and that the training is performed by two distinct <em>servers</em> (or <em>parties</em>) that are trusted not to collaborate beyond what our protocol specifies. In practice, these servers could for instance be virtual instances in a shared cloud environment operated by two different organisations.</p>
<p>The input providers are only needed in the very beginning to transmit their (encrypted) training data; after that all computations involve only the two servers, meaning it is indeed plausible for the input providers to use e.g. mobile phones. Once trained, the model will remain jointly held in encrypted form by the two servers where anyone can use it to make further encrypted predictions.</p>
<p>For technical reasons we also assume a distinct <em>crypto producer</em> that generates certain raw material used during the computation for increased efficiency; there are ways to eliminate this additional entity but we won’t go into that here.</p>
<p>Finally, in terms of security we aim for a typical notion used in practice, namely <em>honest-but-curious (or passive) security</em>, where the servers are assumed to follow the protocol but may otherwise try to learn as much as possible from the data they see. While a slightly weaker notion than <em>fully malicious (or active) security</em> with respect to the servers, this still gives strong protection against anyone who may compromise one of the servers <em>after</em> the computations, regardless of what they do. Note that for the purpose of this blog post we will actually allow a small privacy leakage during training as detailed later.</p>
<h1 id="image-analysis-with-cnns">Image Analysis with CNNs</h1>
<p>Our use case is the canonical <a href="https://www.tensorflow.org/get_started/mnist/beginners">MNIST handwritten digit recognition</a>, namely learning to identify the Arabic numeral in a given image, and we will use the following CNN model from a <a href="https://github.com/fchollet/keras/blob/master/examples/mnist_transfer_cnn.py">Keras example</a> as our base.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">feature_layers</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)),</span>
<span class="n">Dropout</span><span class="p">(</span><span class="o">.</span><span class="mi">25</span><span class="p">),</span>
<span class="n">Flatten</span><span class="p">()</span>
<span class="p">]</span>
<span class="n">classification_layers</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">Dropout</span><span class="p">(</span><span class="o">.</span><span class="mi">50</span><span class="p">),</span>
<span class="n">Dense</span><span class="p">(</span><span class="n">NUM_CLASSES</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">(</span><span class="n">feature_layers</span> <span class="o">+</span> <span class="n">classification_layers</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="nb">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
</code></pre></div></div>
<p>We won’t go into the details of this model here since the principles are already <a href="http://cs231n.stanford.edu/">well-covered</a> <a href="https://github.com/ageron/handson-ml">elsewhere</a>, but the basic idea is to first run an image through a set of <em>feature layers</em> that transforms the raw pixels of the input image into abstract properties that are more relevant for our classification task. These properties are then subsequently combined by a set of <em>classification layers</em> to yield a probability distribution over the possible digits. The final outcome is then typically simply the digit with highest assigned probability.</p>
<p>As we shall see, using <a href="https://keras.io/">Keras</a> has the benefit that we can perform quick experiments on unencrypted data to get an idea of the performance of the model itself, as well as providing a simple interface to later mimic in the encrypted setting.</p>
<h1 id="secure-computation-with-spdz">Secure Computation with SPDZ</h1>
<p>With CNNs in place we next turn to MPC. For this we will use the state-of-the-art SPDZ protocol as it allows us to only have two servers and to improve <em>online</em> performance by moving certain computations to an <em>offline</em> phase as described in detail in earlier <a href="/2017/09/03/the-spdz-protocol-part1">blog</a> <a href="/2017/09/10/the-spdz-protocol-part2">posts</a>.</p>
<p>As is typical in secure computation protocols, all computations take place in a field, here identified by a prime <code class="highlighter-rouge">Q</code>. This means we need to <a href="/2017/09/03/the-spdz-protocol-part1#fixed-point-encoding">encode</a> the floating-point numbers used by the CNNs as integers modulo a prime, which puts certain constraints on <code class="highlighter-rouge">Q</code> and in turn has an effect on performance.</p>
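<p>As a rough illustration of what such an encoding can look like, the sketch below maps floats to integers modulo a prime and back. The values of <code class="highlighter-rouge">Q</code>, <code class="highlighter-rouge">PRECISION</code>, and the helper names <code class="highlighter-rouge">encode</code> and <code class="highlighter-rouge">decode</code> are toy assumptions chosen so that everything fits in 64-bit integers; they are not the parameters used in the notebooks.</p>

```python
import numpy as np

# Toy parameters for illustration only (assumptions, not the post's values).
Q = 2**31 - 1        # a Mersenne prime, small enough for int64 arithmetic
PRECISION = 4        # number of fractional decimal digits kept
SCALE = 10**PRECISION

def encode(rationals):
    """Map floats to field elements; negative values wrap around modulo Q."""
    scaled = np.round(np.asarray(rationals, dtype=np.float64) * SCALE)
    return scaled.astype(np.int64) % Q

def decode(elements):
    """Inverse of encode: the upper half of the field represents negatives."""
    elements = np.asarray(elements, dtype=np.int64)
    # Shift, reduce, and shift back so results land in [-Q//2, Q//2].
    signed = (elements + Q // 2) % Q - Q // 2
    return signed / SCALE

x = np.array([1.5, -0.25, 3.1416])
y = np.array([0.5, 0.25, -1.0])

# Adding encodings in the field matches adding the underlying numbers.
z = decode((encode(x) + encode(y)) % Q)
```

<p>Multiplying two encodings would double the scaling factor and hence need a truncation step afterwards, which is one reason the choice of <code class="highlighter-rouge">Q</code> is constrained.</p>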
<p>Moreover, <a href="/2017/09/10/the-spdz-protocol-part2">recall</a> that in interactive computations such as the SPDZ protocol it becomes relevant to also consider communication and round complexity, in addition to the typical time complexity. Here, the former measures the number of bits sent across the network, which is a relatively slow process, and the latter the number of synchronisation points needed between the two servers, which may block one of them with nothing to do until the other catches up. Both hence also have a big impact on overall execution time.</p>
<p>Most important, however, is that the only “native” operations we have in these protocols are addition and multiplication. Division, comparison, etc. can be done, but are more expensive in terms of our three performance measures. Later we shall see how to mitigate some of the issues this raises, but here we first recall the basic SPDZ protocol.</p>
<h2 id="tensor-operations">Tensor operations</h2>
<p>When we introduced the SPDZ protocol <a href="/2017/09/03/the-spdz-protocol-part1">earlier</a> we did so in the form of classes <code class="highlighter-rouge">PublicValue</code> and <code class="highlighter-rouge">PrivateValue</code> representing respectively a (scalar) value known in clear by both servers and an encrypted value known only in secret shared form. In this blog post, we now instead present it more naturally via classes <code class="highlighter-rouge">PublicTensor</code> and <code class="highlighter-rouge">PrivateTensor</code> that reflect the heavy use of <a href="https://www.tensorflow.org/programmers_guide/tensors">tensors</a> in our deep learning setting.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateTensor</span><span class="p">:</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">shares0</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">shares1</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">values</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
            <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">values</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shares0</span> <span class="o">=</span> <span class="n">shares0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shares1</span> <span class="o">=</span> <span class="n">shares1</span>

    <span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">PublicTensor</span><span class="p">(</span><span class="n">reconstruct</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">shares0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">shares1</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicTensor</span><span class="p">:</span>
            <span class="c1"># x is private and y is public: only one share absorbs the public values</span>
            <span class="n">shares0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares0</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">shares1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shares1</span>
            <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateTensor</span><span class="p">:</span>
            <span class="n">shares0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares0</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">shares0</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">shares1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares1</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">shares1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicTensor</span><span class="p">:</span>
            <span class="n">shares0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares0</span> <span class="o">*</span> <span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">shares1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares1</span> <span class="o">*</span> <span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateTensor</span><span class="p">:</span>
            <span class="c1"># the - and + operators below rely on overloads that are omitted here</span>
            <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a_mul_b</span> <span class="o">=</span> <span class="n">generate_mul_triple</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">y</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
            <span class="n">alpha</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">a</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="n">beta</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">alpha</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">alpha</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">a</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">a_mul_b</span>
</code></pre></div></div>
<p>As seen, the adaptation is pretty straightforward using NumPy and the general form of for instance <code class="highlighter-rouge">PrivateTensor</code> is almost exactly the same, only occasionally passing a shape around as well. There are a few technical details however, all of which are available in full in <a href="https://github.com/mortendahl/privateml/blob/master/spdz/Tensor%20SPDZ.ipynb">the associated notebook</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">share</span><span class="p">(</span><span class="n">secrets</span><span class="p">):</span>
    <span class="n">shares0</span> <span class="o">=</span> <span class="n">sample_random_tensor</span><span class="p">(</span><span class="n">secrets</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="n">shares1</span> <span class="o">=</span> <span class="p">(</span><span class="n">secrets</span> <span class="o">-</span> <span class="n">shares0</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span>

<span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span><span class="p">):</span>
    <span class="n">secrets</span> <span class="o">=</span> <span class="p">(</span><span class="n">shares0</span> <span class="o">+</span> <span class="n">shares1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">secrets</span>

<span class="k">def</span> <span class="nf">generate_mul_triple</span><span class="p">(</span><span class="n">x_shape</span><span class="p">,</span> <span class="n">y_shape</span><span class="p">):</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">sample_random_tensor</span><span class="p">(</span><span class="n">x_shape</span><span class="p">)</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">sample_random_tensor</span><span class="p">(</span><span class="n">y_shape</span><span class="p">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">b</span><span class="p">),</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</code></pre></div></div>
<p>As such, perhaps the biggest difference is in the above base utility methods where this shape is used.</p>
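<p>To see the pieces working together, here is a minimal self-contained sketch of sharing, reconstruction, and triple-based element-wise multiplication acting directly on NumPy arrays of shares. It is a simulation in a single process, with a toy prime <code class="highlighter-rouge">Q</code> and a non-cryptographic random generator picked purely for illustration; the function <code class="highlighter-rouge">private_mul</code> is a hypothetical name and not part of the notebook.</p>

```python
import numpy as np

Q = 2**31 - 1                   # toy prime: products of two elements fit in int64
rng = np.random.default_rng(0)  # NOT cryptographically secure, illustration only

def sample_random_tensor(shape):
    return rng.integers(0, Q, size=shape, dtype=np.int64)

def share(secrets):
    shares0 = sample_random_tensor(secrets.shape)
    shares1 = (secrets - shares0) % Q
    return shares0, shares1

def reconstruct(shares0, shares1):
    return (shares0 + shares1) % Q

def private_mul(x0, x1, y0, y1):
    # Crypto producer: a random triple (a, b, a * b), handed out in shared form.
    a = sample_random_tensor(x0.shape)
    b = sample_random_tensor(y0.shape)
    (a0, a1), (b0, b1), (c0, c1) = share(a), share(b), share((a * b) % Q)
    # Servers: open the masked values alpha = x - a and beta = y - b.
    alpha = reconstruct((x0 - a0) % Q, (x1 - a1) % Q)
    beta = reconstruct((y0 - b0) % Q, (y1 - b1) % Q)
    # Local recombination; each product is reduced first to avoid int64 overflow.
    # Only server 0 adds the public term alpha * beta.
    z0 = (alpha * beta % Q + alpha * b0 % Q + a0 * beta % Q + c0) % Q
    z1 = (alpha * b1 % Q + a1 * beta % Q + c1) % Q
    return z0, z1

x = np.array([3, 5, 7], dtype=np.int64)
y = np.array([2, 4, 6], dtype=np.int64)
x0, x1 = share(x)
y0, y1 = share(y)
product = reconstruct(*private_mul(x0, x1, y0, y1))
```

<p>The recombination works because <code class="highlighter-rouge">x * y = (alpha + a) * (beta + b)</code>, and every term on the right-hand side is either public or available in shared form.</p>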
<h1 id="adapting-the-model">Adapting the Model</h1>
<p>While it is in principle possible to compute any function securely with what we already have, and hence also the base model from above, in practice it is relevant to first consider variants of the model that are more MPC friendly, and vice versa. In slightly more picturesque words, it is common to open up our two black boxes and adapt the two technologies to better fit each other.</p>
<p>The root of this comes from some operations being surprisingly expensive in the encrypted setting. We saw above that addition and multiplication are relatively cheap, yet comparison and division with a private denominator are not. For this reason we make a few changes to the model to avoid these.</p>
<p>The various changes presented in this section as well as their simulation performances are available in full in the <a href="https://github.com/mortendahl/privateml/blob/master/image-analysis/Keras.ipynb">associated Python notebook</a>.</p>
<h2 id="optimizer">Optimizer</h2>
<p>The first issue involves the optimizer: while <a href="http://ruder.io/optimizing-gradient-descent/index.html#adam"><em>Adam</em></a> is a preferred choice in many implementations for its efficiency, it also involves taking a square root of a private value and using one as the denominator in a division. While it is theoretically possible to <a href="https://eprint.iacr.org/2012/164">compute these securely</a>, in practice it could be a significant bottleneck for performance and hence relevant to avoid.</p>
<p>A simple remedy is to switch to the <a href="http://ruder.io/optimizing-gradient-descent/index.html#momentum"><em>momentum SGD</em></a> optimizer, which may imply longer training time but only uses simple operations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="o">.</span><span class="nb">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">SGD</span><span class="p">(</span><span class="n">clipnorm</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">clipvalue</span><span class="o">=</span><span class="mi">10000</span><span class="p">),</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
</code></pre></div></div>
<p>An additional caveat is that many optimizers use <a href="http://nmarkou.blogspot.fr/2017/07/deep-learning-why-you-should-use.html">clipping</a> to prevent gradients from growing too small or too large. This requires a <a href="https://www1.cs.fau.de/filepool/publications/octavian_securescm/smcint-scn10.pdf">comparison on private values</a>, which again is a somewhat expensive operation in the encrypted setting, and as a result we aim to avoid using this technique altogether. To get realistic results from our Keras simulation we increase the bounds as seen above.</p>
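<p>To make the claim about simple operations concrete, the following plaintext sketch shows one common formulation of the momentum SGD update. The function name and hyperparameter values are illustrative assumptions; the point is only that, with a public learning rate and momentum, the update consists of additions and multiplications by public constants, with no square roots, private divisions, or comparisons.</p>

```python
import numpy as np

def momentum_sgd_step(weights, velocity, gradient, learning_rate=0.1, momentum=0.9):
    """One momentum SGD update; the hyperparameter values are illustrative."""
    # Multiplications by public constants and additions are all that is needed.
    velocity = momentum * velocity - learning_rate * gradient
    weights = weights + velocity
    return weights, velocity

# Toy problem: minimise 0.5 * ||weights - target||^2, whose gradient is weights - target.
target = np.array([1.0, -2.0, 0.5])
weights = np.zeros(3)
velocity = np.zeros(3)
for _ in range(500):
    gradient = weights - target
    weights, velocity = momentum_sgd_step(weights, velocity, gradient)
```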
<h2 id="layers">Layers</h2>
<p>Speaking of comparisons, the <em>ReLU</em> and max-pooling layers pose similar problems. In <a href="https://www.microsoft.com/en-us/research/publication/cryptonets-applying-neural-networks-to-encrypted-data-with-high-throughput-and-accuracy/">CryptoNets</a> the former is replaced by a squaring function and the latter by average pooling, while <a href="https://eprint.iacr.org/2017/396">SecureML</a> implements a ReLU-like activation function by adding complexity that we wish to avoid to keep things simple. As such, we here instead use higher-degree sigmoid activation functions and average-pooling layers. Note that average-pooling also uses a division, yet this time the denominator is a public value, and hence division is simply a public inversion followed by a multiplication.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">feature_layers</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'sigmoid'</span><span class="p">),</span>
<span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'sigmoid'</span><span class="p">),</span>
<span class="n">AveragePooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)),</span>
<span class="n">Dropout</span><span class="p">(</span><span class="o">.</span><span class="mi">25</span><span class="p">),</span>
<span class="n">Flatten</span><span class="p">()</span>
<span class="p">]</span>
<span class="n">classification_layers</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'sigmoid'</span><span class="p">),</span>
<span class="n">Dropout</span><span class="p">(</span><span class="o">.</span><span class="mi">50</span><span class="p">),</span>
<span class="n">Dense</span><span class="p">(</span><span class="n">NUM_CLASSES</span><span class="p">),</span>
<span class="n">Activation</span><span class="p">(</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">(</span><span class="n">feature_layers</span> <span class="o">+</span> <span class="n">classification_layers</span><span class="p">)</span>
</code></pre></div></div>
<p><a href="https://github.com/mortendahl/privateml/blob/master/image-analysis/Keras.ipynb">Simulations</a> indicate that with this change we now have to increase the number of epochs, which increases training time by the same factor. Other choices of learning rate or momentum may improve this.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
</code></pre></div></div>
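<p>Returning briefly to average pooling, the plaintext sketch below illustrates why its division is harmless: the window size is public, so its inverse can be computed in the clear and applied as a multiplication by a public constant. The helper <code class="highlighter-rouge">average_pool_2x2</code> is a hypothetical name, and in the encrypted setting the constant would additionally have to be encoded as a field element.</p>

```python
import numpy as np

def average_pool_2x2(x):
    """Average pooling with 2x2 windows on a (height, width) array with even sides."""
    height, width = x.shape
    # Entry [i, di, j, dj] of the reshaped array is x[2*i + di, 2*j + dj].
    windows = x.reshape(height // 2, 2, width // 2, 2)
    window_sums = windows.sum(axis=(1, 3))   # additions only
    public_inverse = 1.0 / 4                 # inverse of the public window size
    return window_sums * public_inverse      # multiplication by a public value

x = np.arange(16, dtype=np.float64).reshape(4, 4)
pooled = average_pool_2x2(x)
```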
<p>The remaining layers are easily dealt with. Dropout and flatten do not care about whether we’re in an encrypted or unencrypted setting, and dense and convolution are matrix dot products which only require basic operations.</p>
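<p>As a sketch of how a dense layer's dot product might be computed on shares, the multiplication triple from before can be generalised from element-wise products to matrix products. The code below is a single-process simulation with a small toy prime (an assumption chosen so that dot products stay within 64-bit integers) and a hypothetical helper <code class="highlighter-rouge">private_dot</code>; it is not taken from the notebook.</p>

```python
import numpy as np

Q = 65521                       # small toy prime so dot products stay within int64
rng = np.random.default_rng(1)  # NOT cryptographically secure, illustration only

def share(secrets):
    shares0 = rng.integers(0, Q, size=secrets.shape, dtype=np.int64)
    return shares0, (secrets - shares0) % Q

def reconstruct(shares0, shares1):
    return (shares0 + shares1) % Q

def private_dot(x_shares, y_shares):
    (x0, x1), (y0, y1) = x_shares, y_shares
    # Crypto producer: random a and b with the shapes of x and y, and c = a.dot(b).
    a = rng.integers(0, Q, size=x0.shape, dtype=np.int64)
    b = rng.integers(0, Q, size=y0.shape, dtype=np.int64)
    (a0, a1), (b0, b1), (c0, c1) = share(a), share(b), share(a.dot(b) % Q)
    # Servers: open the masked matrices.
    alpha = reconstruct(x0 - a0, x1 - a1)
    beta = reconstruct(y0 - b0, y1 - b1)
    # x.dot(y) = alpha.dot(beta) + alpha.dot(b) + a.dot(beta) + a.dot(b)
    z0 = (alpha.dot(beta) + alpha.dot(b0) + a0.dot(beta) + c0) % Q
    z1 = (alpha.dot(b1) + a1.dot(beta) + c1) % Q
    return z0, z1

inputs = np.array([[1, 2], [3, 4]], dtype=np.int64)
weights = np.array([[5, 6], [7, 8]], dtype=np.int64)
result = reconstruct(*private_dot(share(inputs), share(weights)))
```

<p>Only one round of communication (opening <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code>) is needed for the whole matrix product, rather than one per scalar multiplication.</p>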
<h2 id="softmax-and-loss-function">Softmax and loss function</h2>
<p>The final <em>softmax</em> layer also causes complications for training in the encrypted setting as we need to compute both an <a href="https://cs.umd.edu/~fenghao/paper/modexp.pdf">exponentiation using a private exponent</a> as well as normalisation in the form of a division with a private denominator.</p>
<p>While both remain possible we here choose a much simpler approach and allow the predicted class likelihoods for each training sample to be revealed to one of the servers, who can then compute the result from the revealed values. This of course results in a privacy leakage that may or may not pose an acceptable risk.</p>
<p>One heuristic improvement is for the servers to first permute the vector of class likelihoods for each training sample before revealing anything, thereby hiding which likelihood corresponds to which class. However, this may be of little effect if e.g. “healthy” often means a narrow distribution over classes while “sick” means a spread distribution.</p>
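<p>The permutation heuristic relies on softmax commuting with permutations: applying softmax to a shuffled vector and then undoing the shuffle gives the same result as applying softmax directly. The plaintext sketch below checks this property only; it does not model how the servers would agree on or hide the permutation, and all names are illustrative.</p>

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(values):
    # Subtracting the maximum improves numerical stability without changing the result.
    exps = np.exp(values - values.max())
    return exps / exps.sum()

scores = np.array([2.0, -1.0, 0.5, 3.0])

# Shuffle before revealing, so positions no longer identify classes.
permutation = rng.permutation(len(scores))
revealed = scores[permutation]

# Whoever sees the revealed values computes softmax on the shuffled vector.
permuted_probabilities = softmax(revealed)

# Undo the shuffle to line the probabilities up with the original classes.
inverse_permutation = np.argsort(permutation)
probabilities = permuted_probabilities[inverse_permutation]
```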
<p>Another is to introduce a dedicated third server who only does this small computation, doesn’t see anything else from the training data, and hence cannot relate the labels with the sample data. Something is still leaked though, and this quantity is hard to reason about.</p>
<p>Finally, we could also replace this <a href="https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest">one-vs-all</a> approach with a <a href="https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-one">one-vs-one</a> approach using e.g. sigmoids. As argued earlier this allows us to fully compute the predictions without decrypting. We still need to compute the loss however, which could be done by also considering a different loss function.</p>
<p>Note that none of the issues mentioned here occur when later performing predictions using the trained network, as there is no loss to be computed and the servers can there simply skip the softmax layer and let the recipient of the prediction compute it themselves on the revealed values: for them it’s simply a question of how the values are interpreted.</p>
<h2 id="transfer-learning">Transfer Learning</h2>
<p>At this point <a href="https://github.com/mortendahl/privateml/blob/master/image-analysis/Keras.ipynb">it seems</a> that we can actually train the model as-is and get decent results. But as often done in CNNs we can get significant speed-ups by employing <a href="http://cs231n.github.io/transfer-learning/">transfer</a> <a href="http://ruder.io/transfer-learning/">learning</a>; in fact, it is somewhat <a href="https://yashk2810.github.io/Transfer-Learning/">well-known</a> that “very few people train their own convolutional net from scratch because they don’t have sufficient data” and that “it is always recommended to use transfer learning in practice”.</p>
<p>A particular application to our setting here is that training may be split into a pre-training phase using non-sensitive public data and a fine-tuning phase using sensitive private data. For instance, in the case of a skin cancer detector, the researchers may choose to pre-train on a public set of photos and then afterwards ask volunteers to improve the model by providing additional photos.</p>
<p>Moreover, besides a difference in cardinality, there is also room for differences in the two data sets in terms of subjects, as CNNs have a tendency to first decompose these into meaningful subcomponents, the recognition of which is what is being transferred. In other words, the technique is strong enough for pre-training to happen on a different type of images than fine-tuning.</p>
<p>Returning to our concrete use-case of character recognition, we will let the “public” images be those of digits <code class="highlighter-rouge">0-4</code> and the “private” images be those of digits <code class="highlighter-rouge">5-9</code>. As an alternative, it doesn’t seem unreasonable to instead have used for instance characters <code class="highlighter-rouge">a-z</code> as the former and digits <code class="highlighter-rouge">0-9</code> as the latter.</p>
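<p>One way to produce such a split is sketched below. The helper <code class="highlighter-rouge">split_public_private</code> is hypothetical, and shifting the private labels down to <code class="highlighter-rouge">0-4</code> is an assumption made so that both phases can use a five-class output; the toy arrays stand in for the real images and labels.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def split_public_private(x, y, public_digits=(0, 1, 2, 3, 4)):
    # Samples whose label is one of the public digits form the public set.
    is_public = np.isin(y, public_digits)
    public = (x[is_public], y[is_public])
    # Assumption: relabel private digits 5-9 as 0-4 for a five-class model.
    private = (x[~is_public], y[~is_public] - 5)
    return public, private

# Toy data standing in for images and their digit labels.
y = np.array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
x = np.arange(10).reshape(10, 1)

(public_x, public_y), (private_x, private_y) = split_public_private(x, y)
</code></pre></div></div>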
<h3 id="pre-train-on-public-dataset">Pre-train on public dataset</h3>
<p>In addition to avoiding the overhead of training on encrypted data for the public dataset, we also benefit from being able to train with more advanced optimizers. Here for instance, we switch back to the <code class="highlighter-rouge">Adam</code> optimizer for the public images and can take advantage of its improved training time. In particular, we see that we can again lower the number of epochs needed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">public_dataset</span>
<span class="n">model</span><span class="o">.</span><span class="nb">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
</code></pre></div></div>
<p>Once happy with this the servers simply share the model parameters and move on to training on the private dataset.</p>
<h3 id="fine-tune-on-private-dataset">Fine-tune on private dataset</h3>
<p>While we now begin encrypted training on model parameters that are already “half-way there” and hence can be expected to require fewer epochs, another benefit of transfer learning, as mentioned above, is that recognition of subcomponents tends to happen in the lower layers of the network and may in some cases be used as-is. As a result, we now freeze the parameters of the feature layers and focus training efforts exclusively on the classification layers.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="n">feature_layers</span><span class="p">:</span>
    <span class="n">layer</span><span class="o">.</span><span class="n">trainable</span> <span class="o">=</span> <span class="bp">False</span>
</code></pre></div></div>
<p>Note however that we still need to run all private training samples forward through these layers; the only difference is that we skip them in the backward step and that there are fewer parameters to train.</p>
<p>Training is then performed as before, although now using a lower learning rate.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">private_dataset</span>
<span class="n">model</span><span class="o">.</span><span class="nb">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">SGD</span><span class="p">(</span><span class="n">clipnorm</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">clipvalue</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">momentum</span><span class="o">=</span><span class="mf">0.0</span><span class="p">),</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
</code></pre></div></div>
<p>In the end we go from 25 epochs to 5 epochs in the simulations.</p>
<h2 id="preprocessing">Preprocessing</h2>
<p>There are a few preprocessing optimisations one could also apply but that we won’t consider further here.</p>
<p>The first is to move the computation of the frozen layers to the input provider so that it’s the output of the flatten layer that is shared with the servers instead of the pixels of the images. In this case the layers are said to perform <em>feature extraction</em> and we could potentially also use more powerful layers. However, if we want to keep the model proprietary then this adds significant complexity as the parameters now have to be distributed to the clients in some form.</p>
<p>Another typical approach to speed up training is to first apply dimensionality reduction techniques such as a <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">principal component analysis</a>. This approach is taken in the encrypted setting in <a href="https://eprint.iacr.org/2017/857">BSS+’17</a>.</p>
<h1 id="adapting-the-protocol">Adapting the Protocol</h1>
<p>Having looked at the model we next turn to the protocol: as we shall see, understanding the <a href="https://github.com/wiseodd/hipsternet">operations</a> we want to perform can help speed things up.</p>
<p>In particular, a lot of the computation can be moved to the crypto provider, whose generated raw material is independent of the private inputs and to some extent even the model. As such, its computation may be done in advance whenever it’s convenient and at large scale.</p>
<p>Recall from earlier that it’s relevant to optimise both round and communication complexity, and the extensions suggested here are often aimed at improving these at the expense of additional local computation. As such, practical experiments are needed to validate their benefits under concrete conditions.</p>
<h2 id="dropout">Dropout</h2>
<p>Starting with the easiest type of layer, we notice that nothing special related to secure computation happens here, and the only thing is to make sure that the two servers agree on which values to drop in each training iteration. This can be done by simply agreeing on a seed value.</p>
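<p>A minimal sketch of this idea, assuming the two servers have agreed on a seed in advance and derive each mask from that seed and the iteration number. The helper <code class="highlighter-rouge">dropout_mask</code>, the modulus and the toy shares are all hypothetical; rescaling the kept values would be an additional multiplication by a public constant and is left out.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

Q = 2**31 - 1  # hypothetical modulus, small enough for int64 arithmetic here

def dropout_mask(seed, iteration, shape, rate=0.5):
    # Both servers seed an identical generator, so they draw the same mask.
    rng = np.random.default_rng([seed, iteration])
    # 1 means keep the value, 0 means drop it.
    return rng.binomial(1, 1.0 - rate, size=shape)

# Toy additive sharing of a private vector x.
x = np.array([3, 1, 4, 1, 5, 9], dtype=np.int64)
share0 = np.random.default_rng(1).integers(0, Q, size=x.shape)
share1 = (x - share0) % Q

# Each server computes the mask on its own and applies it to its own share.
mask0 = dropout_mask(seed=42, iteration=7, shape=x.shape)
mask1 = dropout_mask(seed=42, iteration=7, shape=x.shape)
dropped = (share0 * mask0 + share1 * mask1) % Q
</code></pre></div></div>
<p>No communication is needed during training, since the mask is public to both servers and is applied to each share locally.</p>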
<h2 id="average-pooling">Average pooling</h2>
<p>The forward pass of average pooling only requires a summation followed by a division by a public denominator. Hence, it can be implemented by a multiplication with a public value: since the denominator is public we can easily find its inverse and then simply multiply and truncate. Likewise, the backward pass is simply a scaling, and hence both directions are entirely local operations.</p>
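<p>The forward pass can be sketched as follows on additive shares of fixed-point values. The modulus <code class="highlighter-rouge">Q</code>, the number of fractional bits <code class="highlighter-rouge">F</code> and all helper names are assumptions chosen for illustration, and the local truncation shown here is only correct up to a small rounding error and except with negligible probability.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random

Q = 2**61 - 1           # hypothetical modulus
F = 16                  # hypothetical number of fractional bits
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

def encode(value):
    return round(value * 2**F) % Q

def decode(element):
    # Map to the centred representative before removing the scaling.
    return ((element + Q // 2) % Q - Q // 2) / 2**F

def share(element):
    share0 = rng.randrange(Q)
    return share0, (element - share0) % Q

def reconstruct(shares):
    return (shares[0] + shares[1]) % Q

def truncate(shares):
    # Each server rescales its own share without any communication.
    share0 = shares[0] // 2**F
    share1 = (Q - (Q - shares[1]) // 2**F) % Q
    return share0, share1

def average_pool(window):
    # Local summation, multiplication by the public inverse, truncation.
    inverse = encode(1 / len(window))
    total0 = sum(s[0] for s in window) % Q
    total1 = sum(s[1] for s in window) % Q
    return truncate((total0 * inverse % Q, total1 * inverse % Q))

window = [share(encode(v)) for v in (1.0, 2.0, 3.0, 4.0)]
pooled = decode(reconstruct(average_pool(window)))
</code></pre></div></div>
<p>Here <code class="highlighter-rouge">pooled</code> is the average of the four values up to fixed-point rounding, and neither server has sent anything to the other.</p>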
<h2 id="dense-layers">Dense layers</h2>
<p>The dot product needed for both the forward and backward pass of dense layers can of course be implemented in the typical fashion using multiplication and addition. If we want to compute <code class="highlighter-rouge">dot(x, y)</code> for matrices <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> with shapes respectively <code class="highlighter-rouge">(m, k)</code> and <code class="highlighter-rouge">(k, n)</code> then this requires <code class="highlighter-rouge">m * n * k</code> multiplications, meaning we have to communicate the same number of masked values. While these can all be sent in parallel so we only need one round, if we allow ourselves to use another kind of preprocessed triple then we can reduce the communication cost by an order of magnitude.</p>
<p>For instance, the second dense layer in our model computes a dot product between a <code class="highlighter-rouge">(32, 128)</code> and a <code class="highlighter-rouge">(128, 5)</code> matrix. Using the typical approach requires sending <code class="highlighter-rouge">32 * 5 * 128 == 20480</code> masked values per batch, but by using the preprocessed triples described below we instead only have to send <code class="highlighter-rouge">32 * 128 + 5 * 128 == 4736</code> values, more than a factor 4 improvement. For the first dense layer it is even greater, namely slightly more than a factor 25.</p>
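<p>These counts can be recomputed with a few lines of Python. The function names are made up for this sketch, the counting follows the convention used in the text, and the shape <code class="highlighter-rouge">(32, 6272)</code> by <code class="highlighter-rouge">(6272, 128)</code> for the first dense layer is taken from the classifier definition in the proof-of-concept section.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def masked_values_scalar(m, k, n):
    # One masked value is counted per scalar multiplication.
    return m * n * k

def masked_values_matrix(m, k, n):
    # Every entry of the (m, k) and (k, n) matrices is masked and sent once.
    return m * k + k * n

layers = {"first dense": (32, 6272, 128), "second dense": (32, 128, 5)}
costs = {}
for name, (m, k, n) in layers.items():
    scalar = masked_values_scalar(m, k, n)
    matrix = masked_values_matrix(m, k, n)
    costs[name] = (scalar, matrix, round(scalar / matrix, 1))
    print(name, *costs[name])
</code></pre></div></div>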
<p>As also noted <a href="/2017/09/10/the-spdz-protocol-part2/">previously</a>, the trick is to ensure that each private value in the matrices is only sent masked once. To make this work we need triples <code class="highlighter-rouge">(a, b, c)</code> of random matrices <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> with the appropriate shapes and such that <code class="highlighter-rouge">c == dot(a, b)</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">generate_dot_triple</span><span class="p">(</span><span class="n">x_shape</span><span class="p">,</span> <span class="n">y_shape</span><span class="p">):</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">sample_random_tensor</span><span class="p">(</span><span class="n">x_shape</span><span class="p">)</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">sample_random_tensor</span><span class="p">(</span><span class="n">y_shape</span><span class="p">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">b</span><span class="p">),</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</code></pre></div></div>
<p>Given such a triple we can instead communicate the values of <code class="highlighter-rouge">alpha = x - a</code> and <code class="highlighter-rouge">beta = y - b</code> followed by a local computation to obtain <code class="highlighter-rouge">dot(x, y)</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateTensor</span><span class="p">:</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicTensor</span><span class="p">:</span>
            <span class="n">shares0</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shares0</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">shares1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shares1</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">y</span><span class="o">.</span><span class="n">values</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateTensor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">shares0</span><span class="p">,</span> <span class="n">shares1</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateTensor</span><span class="p">:</span>
            <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a_dot_b</span> <span class="o">=</span> <span class="n">generate_dot_triple</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">y</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
            <span class="n">alpha</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">a</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="n">beta</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">alpha</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                <span class="n">alpha</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> \
                <span class="n">a</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                <span class="n">a_dot_b</span>
</code></pre></div></div>
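<p>To check the recombination numerically, the following self-contained sketch runs the same steps on explicit pairs of shares instead of the <code class="highlighter-rouge">PrivateTensor</code> class. The modulus, the tuple representation of shares and the name <code class="highlighter-rouge">secure_dot</code> are assumptions made for this sketch, and the triple is generated inline rather than by a separate crypto provider.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
import numpy as np

Q = 2**61 - 1           # hypothetical prime modulus
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

def sample_random_tensor(shape):
    # Object dtype keeps Python integers, so products cannot overflow.
    values = [rng.randrange(Q) for _ in range(shape[0] * shape[1])]
    return np.array(values, dtype=object).reshape(shape)

def share(x):
    share0 = sample_random_tensor(x.shape)
    return share0, (x - share0) % Q

def reconstruct(shares):
    return (shares[0] + shares[1]) % Q

def generate_dot_triple(x_shape, y_shape):
    a = sample_random_tensor(x_shape)
    b = sample_random_tensor(y_shape)
    return share(a), share(b), share(np.dot(a, b) % Q)

def secure_dot(x, y):
    a, b, c = generate_dot_triple(x[0].shape, y[0].shape)
    # The only communication: each entry of x and y is masked and opened once.
    alpha = reconstruct(((x[0] - a[0]) % Q, (x[1] - a[1]) % Q))
    beta = reconstruct(((y[0] - b[0]) % Q, (y[1] - b[1]) % Q))
    # Local recombination; the public term is added by the first server only.
    z0 = (np.dot(alpha, beta) + np.dot(alpha, b[0]) + np.dot(a[0], beta) + c[0]) % Q
    z1 = (np.dot(alpha, b[1]) + np.dot(a[1], beta) + c[1]) % Q
    return z0, z1

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)
y = np.array([[1, 0], [0, 1], [2, 2]], dtype=object)
z = reconstruct(secure_dot(share(x), share(y)))
</code></pre></div></div>
<p>The result <code class="highlighter-rouge">z</code> equals <code class="highlighter-rouge">np.dot(x, y)</code> modulo <code class="highlighter-rouge">Q</code>, while only <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code> were ever opened.</p>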
<p>Security of using these triples follows the same argument as for multiplication triples: the communicated masked values perfectly hide <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> while <code class="highlighter-rouge">c</code> being an independent fresh sharing makes sure that the result cannot leak anything about its constituents.</p>
<p>Note that this kind of triple is used in <a href="https://eprint.iacr.org/2017/396">SecureML</a>, which also gives techniques allowing the servers to generate them without the help of the crypto provider.</p>
<h2 id="convolutions">Convolutions</h2>
<p>Like dense layers, convolutions can be treated either as a series of scalar multiplications or as <a href="http://cs231n.github.io/convolutional-networks/#conv">a matrix multiplication</a>, although the latter only after first expanding the tensor of training samples into a matrix with significant duplication. Unsurprisingly this leads to communication costs that in both cases can be improved by introducing another kind of triple.</p>
<p>As an example, the first convolution maps a tensor with shape <code class="highlighter-rouge">(m, 28, 28, 1)</code> to one with shape <code class="highlighter-rouge">(m, 28, 28, 32)</code> using <code class="highlighter-rouge">32</code> filters of shape <code class="highlighter-rouge">(3, 3, 1)</code> (excluding the bias vector). For batch size <code class="highlighter-rouge">m == 32</code> this means <code class="highlighter-rouge">7,225,344</code> communicated elements if we’re using only scalar multiplications, and <code class="highlighter-rouge">226,080</code> if using a matrix multiplication. However, since there are only <code class="highlighter-rouge">(32*28*28) + (32*3*3) == 25,376</code> private values involved in total (again not counting bias since they only require addition), we see that there is roughly a factor <code class="highlighter-rouge">9</code> overhead even for the matrix multiplication. In other words, each private value is being masked and sent several times. With a new kind of triple we can remove this overhead and save on communication cost: for 64-bit elements this means <code class="highlighter-rouge">200KB</code> per batch instead of <code class="highlighter-rouge">1.7MB</code> for the matrix multiplication and <code class="highlighter-rouge">55MB</code> for scalar multiplications.</p>
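<p>The element counts above can be reproduced as follows. The variable names are made up for this sketch, and sizes are converted using binary units (<code class="highlighter-rouge">2**20</code> bytes per MB), which is an assumption that matches the figures quoted.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m, height, width, channels = 32, 28, 28, 1
filters, kh, kw = 32, 3, 3
patch = kh * kw * channels

# One masked value per scalar multiplication.
scalar_count = m * height * width * filters * patch
# Expanded patch matrix plus the filter matrix.
matmul_count = m * height * width * patch + patch * filters
# Every input value and every filter weight exactly once.
triple_count = m * height * width * channels + filters * patch

def megabytes(count, bytes_per_element=8):
    return count * bytes_per_element / 2**20

for count in (scalar_count, matmul_count, triple_count):
    print(count, round(megabytes(count), 2))
</code></pre></div></div>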
<p>The triples <code class="highlighter-rouge">(a, b, c)</code> we need here are similar to those used in dot products, with <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> having shapes matching the two inputs, i.e. <code class="highlighter-rouge">(m, 28, 28, 1)</code> and <code class="highlighter-rouge">(32, 3, 3, 1)</code>, and <code class="highlighter-rouge">c</code> matching output shape <code class="highlighter-rouge">(m, 28, 28, 32)</code>.</p>
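<p>Because convolution is bilinear, the recombination has the same form as for dot products. The sketch below illustrates this on a tiny example; the naive <code class="highlighter-rouge">conv2d</code> (stride 1, zero padding), the modulus and the names <code class="highlighter-rouge">generate_conv_triple</code> and <code class="highlighter-rouge">secure_conv2d</code> are assumptions made for illustration and are not taken from the accompanying code.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
import numpy as np

Q = 2**61 - 1           # hypothetical prime modulus
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

def sample_random_tensor(shape):
    values = [rng.randrange(Q) for _ in range(int(np.prod(shape)))]
    return np.array(values, dtype=object).reshape(shape)

def share(x):
    share0 = sample_random_tensor(x.shape)
    return share0, (x - share0) % Q

def reconstruct(shares):
    return (shares[0] + shares[1]) % Q

def conv2d(x, w):
    # Naive stride-1, zero-padded convolution of x with shape (m, h, w, c)
    # by filters with shape (n, kh, kw, c), giving shape (m, h, w, n).
    m, height, width, c = x.shape
    n, kh, kw, _ = w.shape
    padded = np.zeros((m, height + kh - 1, width + kw - 1, c), dtype=object)
    padded[:, kh // 2:kh // 2 + height, kw // 2:kw // 2 + width, :] = x
    flat_filters = w.reshape(n, -1).T
    out = np.zeros((m, height, width, n), dtype=object)
    for i in range(height):
        for j in range(width):
            patches = padded[:, i:i + kh, j:j + kw, :].reshape(m, -1)
            out[:, i, j, :] = np.dot(patches, flat_filters)
    return out % Q

def generate_conv_triple(x_shape, w_shape):
    a = sample_random_tensor(x_shape)
    b = sample_random_tensor(w_shape)
    return share(a), share(b), share(conv2d(a, b))

def secure_conv2d(x, w):
    a, b, c = generate_conv_triple(x[0].shape, w[0].shape)
    # Each input value and each filter weight is masked and opened once.
    alpha = reconstruct(((x[0] - a[0]) % Q, (x[1] - a[1]) % Q))
    beta = reconstruct(((w[0] - b[0]) % Q, (w[1] - b[1]) % Q))
    z0 = (conv2d(alpha, beta) + conv2d(alpha, b[0]) + conv2d(a[0], beta) + c[0]) % Q
    z1 = (conv2d(alpha, b[1]) + conv2d(a[1], beta) + c[1]) % Q
    return z0, z1

# Tiny example: 2 images of 4x4 pixels, 3 filters of shape (3, 3, 1).
x = np.array(list(range(32)), dtype=object).reshape(2, 4, 4, 1)
w = np.array(list(range(27)), dtype=object).reshape(3, 3, 3, 1)
z = reconstruct(secure_conv2d(share(x), share(w)))
</code></pre></div></div>
<p>The reconstructed <code class="highlighter-rouge">z</code> matches <code class="highlighter-rouge">conv2d(x, w)</code>, at the price of three extra local convolutions per server compared to the unencrypted computation.</p>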
<h2 id="sigmoid-activations">Sigmoid activations</h2>
<p>As done <a href="/2017/04/17/private-deep-learning-with-mpc/#approximating-sigmoid">earlier</a>, we may use a degree-9 polynomial to approximate the sigmoid activation function with a sufficient level of accuracy. Evaluating this polynomial for a private value <code class="highlighter-rouge">x</code> requires computing a series of powers of <code class="highlighter-rouge">x</code>, which of course may be done by sequential multiplication – but this means several rounds and a corresponding amount of communication.</p>
<p>As an alternative we can again use a new kind of preprocessed triple that allows us to compute all required powers in a single round. As shown <a href="/2017/09/10/the-spdz-protocol-part2/">previously</a>, the length of these “triples” is not fixed but equals the highest exponent, such that a triple for e.g. squaring consists of independent sharings of <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">a**2</code>, while one for cubing consists of independent sharings of <code class="highlighter-rouge">a</code>, <code class="highlighter-rouge">a**2</code>, and <code class="highlighter-rouge">a**3</code>.</p>
<p>Once we have these powers of <code class="highlighter-rouge">x</code>, evaluating a polynomial with public coefficients is then just a local weighted sum. The security of this again follows from the fact that all powers in the triple are independently shared.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pol_public</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">coeffs</span><span class="p">,</span> <span class="n">triple</span><span class="p">):</span>
    <span class="n">powers</span> <span class="o">=</span> <span class="n">pows</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">triple</span><span class="p">)</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span> <span class="n">xe</span> <span class="o">*</span> <span class="n">ce</span> <span class="k">for</span> <span class="n">xe</span><span class="p">,</span> <span class="n">ce</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">powers</span><span class="p">,</span> <span class="n">coeffs</span><span class="p">)</span> <span class="p">)</span>
</code></pre></div></div>
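<p>For intuition, the powers can be obtained from a single opened value <code class="highlighter-rouge">epsilon = x - a</code> via the binomial expansion of <code class="highlighter-rouge">(epsilon + a)**n</code>. The sketch below works on plain integers modulo <code class="highlighter-rouge">Q</code> and ignores the fixed-point encoding; the names <code class="highlighter-rouge">pows_sketch</code> and <code class="highlighter-rouge">pol_public_sketch</code>, the modulus and the tuple representation of shares are assumptions made for illustration.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
from math import comb

Q = 2**61 - 1           # hypothetical prime modulus
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

def share(value):
    share0 = rng.randrange(Q)
    return share0, (value - share0) % Q

def reconstruct(shares):
    return (shares[0] + shares[1]) % Q

def generate_pows_triple(degree):
    # Independent sharings of a, a**2, ..., a**degree for a random a.
    a = rng.randrange(Q)
    return [share(pow(a, k, Q)) for k in range(1, degree + 1)]

def pows_sketch(x, triple):
    a = triple[0]
    # Single round of communication: open epsilon = x - a.
    epsilon = reconstruct(((x[0] - a[0]) % Q, (x[1] - a[1]) % Q))
    powers = []
    for n in range(1, len(triple) + 1):
        # x**n = sum over k of comb(n, k) * epsilon**(n - k) * a**k
        share0 = pow(epsilon, n, Q)  # public k = 0 term, added by one server
        share1 = 0
        for k in range(1, n + 1):
            weight = comb(n, k) * pow(epsilon, n - k, Q) % Q
            share0 = (share0 + weight * triple[k - 1][0]) % Q
            share1 = (share1 + weight * triple[k - 1][1]) % Q
        powers.append((share0, share1))
    return powers

def pol_public_sketch(x, coeffs, triple):
    # coeffs[0] is the constant term and coeffs[k] multiplies x**k.
    share0, share1 = coeffs[0] % Q, 0
    for power, coeff in zip(pows_sketch(x, triple), coeffs[1:]):
        share0 = (share0 + coeff * power[0]) % Q
        share1 = (share1 + coeff * power[1]) % Q
    return share0, share1

x = share(5)
triple = generate_pows_triple(3)
opened_powers = [reconstruct(p) for p in pows_sketch(x, triple)]
y = reconstruct(pol_public_sketch(x, [1, 2, 0, 3], triple))  # 1 + 2*x + 3*x**3
</code></pre></div></div>
<p>All the work after opening <code class="highlighter-rouge">epsilon</code> is local, which is what brings the round count down to one regardless of the degree.</p>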
<p>We have the same caveat related to fixed-point precision as <a href="/2017/09/10/the-spdz-protocol-part2/">earlier</a> though, namely that we need more room for the higher precision of the powers: <code class="highlighter-rouge">x**n</code> has <code class="highlighter-rouge">n</code> times the precision of <code class="highlighter-rouge">x</code> and we want to make sure that it does not wrap around modulo <code class="highlighter-rouge">Q</code> since then we cannot decode correctly anymore. As done there, we can solve this by introducing a sufficiently larger field <code class="highlighter-rouge">P</code> to which we temporarily <a href="/2017/09/10/the-spdz-protocol-part2/">switch</a> while computing the powers, at the expense of two extra rounds of communication.</p>
<p>Practical experiments can show whether it is best to stay in <code class="highlighter-rouge">Q</code> and use a few more multiplication rounds, or perform the switch and pay for conversion and arithmetic on larger numbers. Specifically, for low degree polynomials the former is likely better.</p>
<h1 id="proof-of-concept-implementation">Proof of Concept Implementation</h1>
<p>A <a href="https://github.com/mortendahl/privateml/tree/master/image-analysis/">proof-of-concept implementation</a> without networking is available for experimentation and reproducibility. Still a work in progress, the code currently supports training a new classifier from encrypted features, but not feature extraction on encrypted images. In other words, it assumes that the input providers themselves run their images through the feature extraction layers and send the results in encrypted form to the servers; as such, the weights for that part of the model are currently not kept private. A future version will address this and allow training and predictions directly from images by enabling the feature layers to also run on encrypted data.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pond.nn</span> <span class="kn">import</span> <span class="n">Sequential</span><span class="p">,</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Sigmoid</span><span class="p">,</span> <span class="n">Dropout</span><span class="p">,</span> <span class="n">Reveal</span><span class="p">,</span> <span class="n">Softmax</span><span class="p">,</span> <span class="n">CrossEntropy</span>
<span class="kn">from</span> <span class="nn">pond.tensor</span> <span class="kn">import</span> <span class="n">PrivateEncodedTensor</span>
<span class="n">classifier</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">6272</span><span class="p">),</span>
<span class="n">Sigmoid</span><span class="p">(),</span>
<span class="n">Dropout</span><span class="p">(</span><span class="o">.</span><span class="mi">5</span><span class="p">),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">128</span><span class="p">),</span>
<span class="n">Reveal</span><span class="p">(),</span>
<span class="n">Softmax</span><span class="p">()</span>
<span class="p">])</span>
<span class="n">classifier</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span>
<span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">PrivateEncodedTensor</span><span class="p">(</span><span class="n">x_train_features</span><span class="p">),</span>
<span class="n">PrivateEncodedTensor</span><span class="p">(</span><span class="n">y_train</span><span class="p">),</span>
<span class="n">loss</span><span class="o">=</span><span class="n">CrossEntropy</span><span class="p">(),</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">3</span>
<span class="p">)</span>
</code></pre></div></div>
<p>The code is split into several Python notebooks, and comes with a set of precomputed weights that allows for skipping some of the steps:</p>
<ul>
<li>
<p>The first one deals with <a href="https://github.com/mortendahl/privateml/tree/master/image-analysis/Pre-training.ipynb">pre-training on the public data</a> using Keras, and produces the model used for feature extraction. This step can be skipped by using the repository’s precomputed weights instead.</p>
</li>
<li>
<p>The second one applies the above model to do <a href="https://github.com/mortendahl/privateml/tree/master/image-analysis/Feature%20extraction.ipynb">feature extraction on the private data</a>, thereby producing the features used for training the new encrypted classifier. In future versions this will be done by first encrypting the data. This step cannot be skipped as the extracted data is too large.</p>
</li>
<li>
<p>The third takes the extracted features and <a href="https://github.com/mortendahl/privateml/tree/master/image-analysis/Fine-tuning.ipynb">trains a new encrypted classifier</a>. This is by far the most expensive step and may be skipped by using the repository’s precomputed weights instead.</p>
</li>
<li>
<p>Finally, the fourth notebook uses the new classifier to perform <a href="https://github.com/mortendahl/privateml/tree/master/image-analysis/Prediction.ipynb">encrypted predictions</a> from new images. Again feature extraction is currently done unencrypted.</p>
</li>
</ul>
<p>Running the code is a matter of cloning the repository</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git clone https://github.com/mortendahl/privateml.git <span class="o">&&</span> <span class="se">\</span>
<span class="nb">cd </span>privateml/image-analysis/
</code></pre></div></div>
<p>installing the dependencies</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pip3 install jupyter numpy tensorflow keras h5py
</code></pre></div></div>
<p>launching a notebook</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>jupyter notebook
</code></pre></div></div>
<p>and navigating to any of the four notebooks mentioned above.</p>
<h1 id="thoughts">Thoughts</h1>
<p>As always, when previous thoughts and questions have been answered there is already a new batch waiting.</p>
<h2 id="generalised-triples">Generalised triples</h2>
<p>When seeking to reduce communication, one may also wonder how much can be pushed to the preprocessing phase in the form of additional types of triples.</p>
<p>As mentioned several times (and also suggested in e.g. <a href="https://eprint.iacr.org/2017/1234">BCG+’17</a>), we typically seek to ensure that each private value is only sent masked once. So if we are e.g. computing both <code class="highlighter-rouge">dot(x, y)</code> and <code class="highlighter-rouge">dot(x, z)</code> then it might make sense to have a triple <code class="highlighter-rouge">(r, s, t, u, v)</code> where <code class="highlighter-rouge">r</code> is used to mask <code class="highlighter-rouge">x</code>, <code class="highlighter-rouge">s</code> to mask <code class="highlighter-rouge">y</code>, <code class="highlighter-rouge">u</code> to mask <code class="highlighter-rouge">z</code>, and <code class="highlighter-rouge">t</code> and <code class="highlighter-rouge">v</code> are used to compute the results. This pattern happens during training for instance, where values computed during the forward pass are sometimes cached and reused during the backward pass.</p>
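<p>One possible reading of such a generalised triple is sketched below, under the assumption that the two product components are <code class="highlighter-rouge">dot(r, s)</code> and <code class="highlighter-rouge">dot(r, u)</code>. The modulus, the tuple representation of shares and all function names are hypothetical; the point is only that <code class="highlighter-rouge">x</code> is masked and opened once although it takes part in two products.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
import numpy as np

Q = 2**61 - 1           # hypothetical prime modulus
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

def sample_random_tensor(shape):
    values = [rng.randrange(Q) for _ in range(shape[0] * shape[1])]
    return np.array(values, dtype=object).reshape(shape)

def share(x):
    share0 = sample_random_tensor(x.shape)
    return share0, (x - share0) % Q

def reconstruct(shares):
    return (shares[0] + shares[1]) % Q

def sub(x, y):
    return (x[0] - y[0]) % Q, (x[1] - y[1]) % Q

def generate_shared_mask_triple(x_shape, y_shape, z_shape):
    r = sample_random_tensor(x_shape)
    s = sample_random_tensor(y_shape)
    u = sample_random_tensor(z_shape)
    # Assumption: the two product components both reuse the mask r.
    return (share(r), share(s), share(np.dot(r, s) % Q),
            share(u), share(np.dot(r, u) % Q))

def combine(alpha, beta, a, b, c):
    # Same local recombination as for an ordinary dot product triple.
    z0 = (np.dot(alpha, beta) + np.dot(alpha, b[0]) + np.dot(a[0], beta) + c[0]) % Q
    z1 = (np.dot(alpha, b[1]) + np.dot(a[1], beta) + c[1]) % Q
    return z0, z1

def secure_dot_pair(x, y, z):
    r, s, t, u, v = generate_shared_mask_triple(x[0].shape, y[0].shape, z[0].shape)
    alpha = reconstruct(sub(x, r))  # x is masked and sent only once
    beta = reconstruct(sub(y, s))
    gamma = reconstruct(sub(z, u))
    return combine(alpha, beta, r, s, t), combine(alpha, gamma, r, u, v)

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)
y = np.array([[1, 0], [0, 1], [2, 2]], dtype=object)
z = np.array([[1], [1], [1]], dtype=object)
xy, xz = (reconstruct(p) for p in secure_dot_pair(share(x), share(y), share(z)))
</code></pre></div></div>
<p>Compared to two independent dot product triples, this saves one masked copy of <code class="highlighter-rouge">x</code> per pair of products.</p>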
<p>Perhaps more important, though, is the case where we are only making predictions with a model, i.e. computing with fixed private weights. In this case we only want to <a href="/2017/09/10/the-spdz-protocol-part2">mask the weights once and then reuse</a> these for each prediction. Doing so means we only have to mask and communicate proportionally to the input tensor flowing through the model, as opposed to proportionally to both the input tensor and the weights, as also done in e.g. <a href="https://arxiv.org/abs/1801.05507">JVC’18</a>. More generally, we ideally want to communicate proportionally only to the values that change, which can be achieved (in an amortised sense) using tailored triples.</p>
<p>Finally, it is in principle also possible to have <a href="/2017/09/10/the-spdz-protocol-part2">triples for more advanced functions</a> such as evaluating both a dense layer and its activation function with a single round of communication, but the big obstacle here seems to be scalability in terms of triple storage and amount of computation needed for the recombination step, especially when working with tensors.</p>
<h2 id="activation-functions">Activation functions</h2>
<p>A natural question is which of the other typical activation functions are efficient in the encrypted setting. As mentioned above, <a href="https://eprint.iacr.org/2017/396">SecureML</a> makes use of ReLU by temporarily switching to garbled circuits, and <a href="https://arxiv.org/abs/1711.05189">CryptoDL</a> gives low-degree polynomial approximations to Sigmoid, ReLU, and Tanh (using <a href="https://en.wikipedia.org/wiki/Chebyshev_polynomials">Chebyshev polynomials</a> for <a href="http://www.chebfun.org/docs/guide/guide04.html#47-the-runge-phenomenon">better accuracy</a>).</p>
<p>It may also be relevant to consider non-typical but simpler activation functions, such as squaring as in e.g. <a href="https://www.microsoft.com/en-us/research/publication/cryptonets-applying-neural-networks-to-encrypted-data-with-high-throughput-and-accuracy/">CryptoNets</a>, if for nothing else than simplifying both computation and communication.</p>
<h2 id="garbled-circuits">Garbled circuits</h2>
<p>While mentioned above only as a way of securely evaluating more advanced activation functions, <a href="https://oblivc.org/">garbled</a> <a href="https://github.com/encryptogroup/ABY">circuits</a> could in fact also be used for larger parts, including as the main means of secure computation as done in for instance <a href="https://arxiv.org/abs/1705.08963">DeepSecure</a>.</p>
<p>Compared to e.g. SPDZ this technique has the benefit of using only a constant number of communication rounds. The downside is that operations are now often happening on bits instead of on larger field elements, meaning more computation is involved.</p>
<h2 id="precision">Precision</h2>
<p>A lot of the research around <a href="https://research.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a> involves <a href="https://arxiv.org/abs/1610.05492">gradient compression</a> in order to save on communication cost. Closer to our setting we have <a href="https://eprint.iacr.org/2017/1114">BMMP’17</a> which uses quantization to apply homomorphic encryption to deep learning, and even <a href="https://arxiv.org/abs/1610.02132">unencrypted</a> <a href="https://www.tensorflow.org/performance/quantization">production-ready</a> systems often consider this technique as a way of improving performance also in terms of <a href="https://ai.intel.com/lowering-numerical-precision-increase-deep-learning-performance/">learning</a>.</p>
<h2 id="floating-point-arithmetic">Floating point arithmetic</h2>
<p>Above we used a fixed-point encoding of real numbers into field elements, yet unencrypted deep learning is typically using a floating point encoding. As shown in <a href="https://eprint.iacr.org/2012/405">ABZS’12</a> and <a href="https://github.com/bristolcrypto/SPDZ-2/issues/7">the reference implementation of SPDZ</a>, it is also possible to use the latter in the encrypted setting, apparently with performance advantages for certain operations.</p>
<h2 id="gpus">GPUs</h2>
<p>Since deep learning is typically done on GPUs today for performance reasons, it is natural to consider whether similar speedups can be achieved by applying them in MPC computations. Some <a href="https://www.cs.virginia.edu/~shelat/papers/hms13-gpuyao.pdf">work</a> exists on this topic for garbled circuits, yet it seems less popular in the secret sharing setting of e.g. SPDZ.</p>
<p>The biggest problem here might be the maturity and availability of arbitrary precision arithmetic on GPUs (but see e.g. <a href="http://www.comp.hkbu.edu.hk/~chxw/fgc_2010.pdf">this</a> and <a href="https://github.com/skystar0227/CUMP">that</a>), as needed for computations on field elements larger than e.g. 64 bits. Two things might be worth keeping in mind here though: firstly, while the values we compute on are larger than those natively supported, they are still bounded by the modulus; and secondly, we can do our secure computations over a ring instead of a field.</p>
<!--
One potential remedy is to decompose numbers using the [CRT](https://en.wikipedia.org/wiki/Chinese_remainder_theorem) into several components that are computed on in parallel. For this to work we would need to do our computations over a ring instead of a field, since our modulus must now be a composite number as opposed to a prime.
-->
<!--
# Old
https://eprint.iacr.org/2017/262.pdf
Pooling in MPC:
- doing entirely out of fashion: https://arxiv.org/abs/1412.6806 and http://cs231n.github.io/convolutional-networks/ -- use larger stride in CONV layer once in a while
in numpy:
- https://wiseodd.github.io/techblog/2016/07/16/convnet-conv-layer/
- https://github.com/andersbll/nnet
Gradient Compression
- https://arxiv.org/pdf/1610.02132.pdf
https://eprint.iacr.org/2016/1117.pdf
- https://stackoverflow.com/questions/36515202/why-is-the-cross-entropy-method-preferred-over-mean-squared-error-in-what-cases
- https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
-->
Morten Dahl
TL;DR: we take a typical CNN deep learning model and go through a series of steps that enable both training and prediction to instead be done on encrypted data.
The SPDZ Protocol, Part 1
2017-09-03T12:00:00+00:00
https://mortendahl.github.io/2017/09/03/the-spdz-protocol-part1
<p><em><strong>This post is still very much a work in progress.</strong></em></p>
<p><em><strong>TL;DR:</strong> this is the first in a series of posts explaining a state-of-the-art protocol for secure computation.</em></p>
<p>In this series we’ll go through and describe the state-of-the-art SPDZ protocol for secure computation. Unlike the protocol used in <a href="/2017/04/17/private-deep-learning-with-mpc/">a previous blog post</a>, SPDZ allows us to have as few as two parties computing on private values and it allows us to move parts of the computation to an <em>offline</em> phase in order to gain a more performant <em>online</em> phase. Moreover, it has received significant scientific attention over the last few years, resulting in various optimisations and efficient implementations that can be used to speed up our computation.</p>
<p>The code for this section is available in <a href="https://github.com/mortendahl/privateml/blob/master/image-analysis/Basic%20SPDZ.ipynb">this associated notebook</a>.</p>
<h1 id="background">Background</h1>
<p>The protocol was first described in <a href="https://eprint.iacr.org/2011/535">SPDZ’12</a> and <a href="https://eprint.iacr.org/2012/642">DKLPSS’13</a>, but has also been the subject of at least <a href="https://bristolcrypto.blogspot.fr/2016/10/what-is-spdz-part-1-mpc-circuit.html">one series of blog posts</a>. Several implementations exist, including <a href="https://www.cs.bris.ac.uk/Research/CryptographySecurity/SPDZ/">one</a> from the <a href="http://www.cs.bris.ac.uk/Research/CryptographySecurity/">cryptography group</a> at the University of Bristol providing both high performance and full active security.</p>
<p>As usual, all computations take place in a finite ring, often identified by a prime modulus <code class="highlighter-rouge">Q</code>. As we will see, this means we also need a way to encode the fixed-point numbers used by the CNNs as integers modulo a prime, and we have to take care that these never “wrap around” as we then may not be able to recover the correct result.</p>
<p>Moreover, while the computational resources used by a procedure are often only measured in time complexity, i.e. the time it takes the CPU to perform the computation, with interactive computations such as the SPDZ protocol it also becomes relevant to consider communication and round complexity. The former measures the number of bits sent across the network, which is a relatively slow process, and the latter the number of synchronisation points needed between the two parties, which may block one of them with nothing to do until the other catches up. Both hence also have a big impact on overall execution time.</p>
<p>Concretely, we have an interest in keeping <code class="highlighter-rouge">Q</code> as small as possible, not only because we can then do arithmetic using only single word-sized operations (as opposed to arbitrary precision arithmetic, which is significantly slower), but also because we have to transmit fewer bits when sending field elements across the network.</p>
<p>Note that while the protocol in general supports computations between any number of parties we here use and specialise it for the two-party setting only. Moreover, as mentioned earlier, we aim only for passive security and assume a crypto provider that will honestly generate the needed triples.</p>
<h1 id="setting">Setting</h1>
<p>We will assume that the training data set is jointly held by a set of <em>input providers</em> and that the training is performed by two distinct <em>servers</em> (or <em>parties</em>) that are trusted not to collaborate beyond what our protocol specifies. In practice, these servers could for instance be virtual instances in a shared cloud environment operated by two different organisations.</p>
<p>The input providers are only needed in the very beginning to transmit their training data; after that all computations involve only the two servers, meaning it is indeed plausible for the input providers to use e.g. mobile phones. Once trained, the model will remain jointly held in encrypted form by the two servers where anyone can use it to make further encrypted predictions.</p>
<p>For technical reasons we also assume a distinct <em>crypto provider</em> that generates certain raw material used during the computation for increased efficiency; there are ways to eliminate this additional entity but we won’t go into that here.</p>
<p>Finally, in terms of security we aim for a typical notion used in practice, namely <em>honest-but-curious (or passive) security</em>, where the servers are assumed to follow the protocol but may otherwise try to learn as much as possible from the data they see. While a slightly weaker notion than <em>fully malicious (or active) security</em> with respect to the servers, this still gives strong protection against anyone who may compromise one of the servers <em>after</em> the computations, regardless of what they do. Note that for the purpose of this blog post we will actually allow a small privacy leakage during training as detailed later.</p>
<h1 id="secure-computation-with-spdz">Secure Computation with SPDZ</h1>
<h2 id="sharing-and-reconstruction">Sharing and reconstruction</h2>
<p>Sharing a private value between the two servers is done using the simple <a href="/2017/06/04/secret-sharing-part1/#additive-sharing">additive scheme</a>. This may be performed by anyone, including an input provider, and keeps the value <a href="https://en.wikipedia.org/wiki/Information-theoretic_security">perfectly private</a> as long as the servers are not colluding.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">share</span><span class="p">(</span><span class="n">secret</span><span class="p">):</span>
    <span class="n">share0</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
    <span class="n">share1</span> <span class="o">=</span> <span class="p">(</span><span class="n">secret</span> <span class="o">-</span> <span class="n">share0</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">]</span>
</code></pre></div></div>
<p>And when specified by the protocol, the private value can be reconstructed by a server sending its share to the other.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">share0</span> <span class="o">+</span> <span class="n">share1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
</code></pre></div></div>
<p>Of course, if both parties are to learn the private value then they can send their share simultaneously and hence still only use one round of communication.</p>
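<p>To make the privacy claim a little more concrete, the following sketch shares a value and then shows that a single share is consistent with <em>every</em> possible secret: for any candidate there is a matching second share. The modulus <code class="highlighter-rouge">Q</code> and the helper <code class="highlighter-rouge">consistent_share1</code> are assumptions introduced only for this example.</p>

```python
import random

Q = 2**61 - 1  # assumed prime modulus, chosen only for illustration

def share(secret):
    share0 = random.randrange(Q)
    share1 = (secret - share0) % Q
    return [share0, share1]

def reconstruct(share0, share1):
    return (share0 + share1) % Q

def consistent_share1(share0, candidate):
    # hypothetical helper: the second share that would make `candidate` the secret
    return (candidate - share0) % Q

secret = 42
share0, share1 = share(secret)

# both shares together recover the secret
recovered = reconstruct(share0, share1)

# share0 alone rules nothing out: every candidate has a matching share1
candidates = [0, 1, 12345, Q - 1]
explained = [reconstruct(share0, consistent_share1(share0, c)) for c in candidates]
```

<p>Since <code class="highlighter-rouge">share0</code> is drawn uniformly and independently of the secret, and <code class="highlighter-rouge">share1</code> on its own is uniform as well, neither server can tell these candidates apart without the other server's share.</p>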
<p>Note that the use of an additive scheme means the servers are required to be highly robust, unlike e.g. <a href="/2017/06/04/secret-sharing-part1/">Shamir’s scheme</a> which may handle some servers dropping out. If this is a reasonable assumption though, then additive sharing provides significant advantages.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateValue</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">share0</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">share1</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">value</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
            <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">share0</span> <span class="o">=</span> <span class="n">share0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">share1</span> <span class="o">=</span> <span class="n">share1</span>
    <span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">PublicValue</span><span class="p">(</span><span class="n">reconstruct</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">share0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">share1</span><span class="p">))</span>
</code></pre></div></div>
<h2 id="linear-operations">Linear operations</h2>
<p>Having obtained sharings of private values we may next perform certain operations on these. The first set of these is what we call linear operations since they allow us to form linear combinations of private values.</p>
<p>The first are addition and subtraction, which are simple local computations on the shares already held by each server. And if one of the values is public then we may simplify.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateValue</span><span class="p">:</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicValue</span><span class="p">:</span>
            <span class="n">share0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share0</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">share1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">share1</span>
            <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateValue</span><span class="p">:</span>
            <span class="n">share0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share0</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">share0</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">share1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share1</span> <span class="o">+</span> <span class="n">y</span><span class="o">.</span><span class="n">share1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">sub</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicValue</span><span class="p">:</span>
            <span class="n">share0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share0</span> <span class="o">-</span> <span class="n">y</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">share1</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">share1</span>
            <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateValue</span><span class="p">:</span>
            <span class="n">share0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share0</span> <span class="o">-</span> <span class="n">y</span><span class="o">.</span><span class="n">share0</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">share1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share1</span> <span class="o">-</span> <span class="n">y</span><span class="o">.</span><span class="n">share1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">assert</span> <span class="n">z</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span> <span class="o">==</span> <span class="mi">8</span>
</code></pre></div></div>
<p>Next we may also perform multiplication with a public value by again only performing a local operation on the share already held by each server.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateValue</span><span class="p">:</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicValue</span><span class="p">:</span>
            <span class="n">share0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share0</span> <span class="o">*</span> <span class="n">y</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="n">share1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">share1</span> <span class="o">*</span> <span class="n">y</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
            <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">share0</span><span class="p">,</span> <span class="n">share1</span><span class="p">)</span>
</code></pre></div></div>
<p>Note that the security of these operations is straight-forward since no communication is taking place between the two parties and hence nothing new could have been revealed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">PublicValue</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span>
<span class="k">assert</span> <span class="n">z</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span> <span class="o">==</span> <span class="mi">15</span>
</code></pre></div></div>
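<p>The snippets above leave out <code class="highlighter-rouge">PublicValue</code> and the wiring that makes <code class="highlighter-rouge">x + y</code> and <code class="highlighter-rouge">x * y</code> call the corresponding methods. One possible way of filling in those gaps is sketched below; the modulus <code class="highlighter-rouge">Q</code>, the definition of <code class="highlighter-rouge">PublicValue</code>, and the use of Python's <code class="highlighter-rouge">__add__</code>, <code class="highlighter-rouge">__sub__</code> and <code class="highlighter-rouge">__mul__</code> hooks are assumptions for this example and may differ from the associated notebook.</p>

```python
import random

Q = 2**61 - 1  # assumed prime modulus, chosen only for illustration

def share(secret):
    share0 = random.randrange(Q)
    share1 = (secret - share0) % Q
    return [share0, share1]

def reconstruct(share0, share1):
    return (share0 + share1) % Q

class PublicValue:
    def __init__(self, value):
        self.value = value % Q

    def __eq__(self, other):
        # allow comparison with plain integers, as in the assertions of this post
        other_value = other.value if isinstance(other, PublicValue) else other % Q
        return self.value == other_value

class PrivateValue:
    def __init__(self, value, share0=None, share1=None):
        if value is not None:
            share0, share1 = share(value)
        self.share0 = share0
        self.share1 = share1

    def reconstruct(self):
        return PublicValue(reconstruct(self.share0, self.share1))

    def __add__(x, y):
        if isinstance(y, PublicValue):
            # only one server adds the public value
            return PrivateValue(None, (x.share0 + y.value) % Q, x.share1)
        return PrivateValue(None, (x.share0 + y.share0) % Q, (x.share1 + y.share1) % Q)

    def __sub__(x, y):
        if isinstance(y, PublicValue):
            return PrivateValue(None, (x.share0 - y.value) % Q, x.share1)
        return PrivateValue(None, (x.share0 - y.share0) % Q, (x.share1 - y.share1) % Q)

    def __mul__(x, y):
        if isinstance(y, PublicValue):
            # both servers scale their share by the public value
            return PrivateValue(None, (x.share0 * y.value) % Q, (x.share1 * y.value) % Q)
        return NotImplemented  # private-private products need a triple

x = PrivateValue(5)
y = PrivateValue(3)
linear = (x + y) * PublicValue(2) - PublicValue(1)  # 2 * (5 + 3) - 1
```

<p>Every operation here touches each server's share independently, which is why linear combinations require no communication at all.</p>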
<h2 id="multiplication">Multiplication</h2>
<p>Multiplication of two private values is where we really start to deviate from the protocol used <a href="/2017/04/17/private-deep-learning-with-mpc/">previously</a>. The techniques used there inherently need at least three parties and so won’t be of much help in our two-party setting.</p>
<p>Perhaps more interestingly, the new techniques used here allow us to shift parts of the computation to an <em>offline phase</em> where <em>raw material</em> that doesn’t depend on any of the private values can be generated at convenience. As we shall see later, this can be used to significantly speed up the <em>online phase</em> where training and prediction take place.</p>
<p>This raw material is popularly called a <em>multiplication triple</em> (and sometimes <em>Beaver triple</em> due to their introduction in <a href="https://scholar.google.com/scholar?cluster=14306306930077045887">Beaver’91</a>) and consists of independent sharings of three values <code class="highlighter-rouge">a</code>, <code class="highlighter-rouge">b</code>, and <code class="highlighter-rouge">c</code> such that <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> are uniformly random values and <code class="highlighter-rouge">c == a * b % Q</code>. Here we assume that these triples are generated by the crypto provider, and the resulting shares distributed to the two parties ahead of running the online phase. In other words, when performing a multiplication we assume that <code class="highlighter-rouge">Pi</code> already knows <code class="highlighter-rouge">a[i]</code>, <code class="highlighter-rouge">b[i]</code>, and <code class="highlighter-rouge">c[i]</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">generate_mul_triple</span><span class="p">():</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
    <span class="n">c</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="o">*</span> <span class="n">b</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="n">b</span><span class="p">),</span> <span class="n">PrivateValue</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
</code></pre></div></div>
<p>Note that a large portion of efforts in <a href="https://eprint.iacr.org/2016/505">current</a> <a href="https://eprint.iacr.org/2017/1230">research</a> and the <a href="https://www.cs.bris.ac.uk/Research/CryptographySecurity/SPDZ/">full reference implementation</a> is spent on removing the crypto provider and instead letting the parties generate these triples on their own; we won’t go into that here but see the resources pointed to earlier for details.</p>
<p>To use multiplication triples to compute the product of two private values <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> we proceed as follows. The idea is simply to use <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> to respectively mask <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> and then reconstruct the masked values as respectively <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code>. As public values, <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code> may then be combined locally by each server to form a sharing of <code class="highlighter-rouge">z == x * y</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PrivateValue</span><span class="p">:</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PublicValue</span><span class="p">:</span>
            <span class="o">...</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="ow">is</span> <span class="n">PrivateValue</span><span class="p">:</span>
            <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">a_mul_b</span> <span class="o">=</span> <span class="n">generate_mul_triple</span><span class="p">()</span>
            <span class="c"># local masking followed by communication of the reconstructed values</span>
            <span class="n">alpha</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">a</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="n">beta</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span><span class="o">.</span><span class="n">reconstruct</span><span class="p">()</span>
            <span class="c"># local re-combination</span>
            <span class="k">return</span> <span class="n">alpha</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">alpha</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">a</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span> <span class="o">+</span> \
                   <span class="n">a_mul_b</span>
</code></pre></div></div>
<p>If we write out the equations we see that <code class="highlighter-rouge">alpha * beta == xy - xb - ay + ab</code>, <code class="highlighter-rouge">a * beta == ay - ab</code>, and <code class="highlighter-rouge">b * alpha == bx - ab</code>, so that the sum of these with <code class="highlighter-rouge">c</code> (named <code class="highlighter-rouge">a_mul_b</code> in the code above) cancels out everything except <code class="highlighter-rouge">xy</code>. In terms of complexity we see that each server has to send two field elements, the masked shares used to open <code class="highlighter-rouge">alpha</code> and <code class="highlighter-rouge">beta</code>, in a single round.</p>
<p>Finally, since <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> are <a href="https://en.wikipedia.org/wiki/Information-theoretic_security">perfectly hidden</a> by <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code>, neither server learns anything new as long as each triple is only used once. Moreover, the newly formed sharing of <code class="highlighter-rouge">z</code> is “fresh” in the sense that it contains no information about the sharings of <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code> that were used in its construction, since the sharing of <code class="highlighter-rouge">c</code> was independent of the sharings of <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code>.</p>
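<p>The same multiplication can also be written out from the point of view of the individual servers, which makes it explicit what is computed locally and what is exchanged. The sketch below does this on plain lists of two shares; the modulus <code class="highlighter-rouge">Q</code>, the list representation, and the function name <code class="highlighter-rouge">mul_shares</code> are assumptions made for this example, and the triple is generated inline purely to simulate the crypto provider.</p>

```python
import random

Q = 2**61 - 1  # assumed prime modulus, chosen only for illustration

def share(secret):
    share0 = random.randrange(Q)
    share1 = (secret - share0) % Q
    return [share0, share1]

def generate_mul_triple():
    # simulates the crypto provider: server i receives a[i], b[i], c[i]
    a = random.randrange(Q)
    b = random.randrange(Q)
    c = (a * b) % Q
    return share(a), share(b), share(c)

def mul_shares(x, y):
    a, b, c = generate_mul_triple()
    # step 1 (local): server i masks its shares of x and y
    alpha_shares = [(x[i] - a[i]) % Q for i in range(2)]
    beta_shares = [(y[i] - b[i]) % Q for i in range(2)]
    # step 2 (one round): the servers exchange masked shares and open alpha, beta
    alpha = sum(alpha_shares) % Q
    beta = sum(beta_shares) % Q
    # step 3 (local): server i recombines using only public values and its own shares
    z = []
    for i in range(2):
        z_i = (alpha * b[i] + a[i] * beta + c[i]) % Q
        if i == 0:
            z_i = (z_i + alpha * beta) % Q  # the public term is added once
        z.append(z_i)
    return z

z = mul_shares(share(5), share(3))
product = sum(z) % Q
```

<p>Note that the public term <code class="highlighter-rouge">alpha * beta</code> is added by one server only; if both added it, the reconstructed result would be off by that amount.</p>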
<h1 id="encoding-values">Encoding Values</h1>
<h2 id="signed-integers">Signed integers</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">encode_integer</span><span class="p">(</span><span class="n">integer</span><span class="p">):</span>
    <span class="n">element</span> <span class="o">=</span> <span class="n">integer</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">element</span>
<span class="k">def</span> <span class="nf">decode_integer</span><span class="p">(</span><span class="n">element</span><span class="p">):</span>
    <span class="n">integer</span> <span class="o">=</span> <span class="n">element</span> <span class="k">if</span> <span class="n">element</span> <span class="o"><=</span> <span class="n">Q</span><span class="o">//</span><span class="mi">2</span> <span class="k">else</span> <span class="n">element</span> <span class="o">-</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">integer</span>
</code></pre></div></div>
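<p>As a small usage sketch of the two functions above: negative integers land in the upper half of the field and are mapped back on decoding. The concrete value of <code class="highlighter-rouge">Q</code> below is an assumption chosen for illustration.</p>

```python
Q = 2**61 - 1  # illustrative prime modulus (an assumption)

def encode_integer(integer):
    # negative integers wrap around to the upper half of the field
    return integer % Q

def decode_integer(element):
    # elements above Q//2 are interpreted as negative integers
    return element if element <= Q // 2 else element - Q

assert encode_integer(-5) == Q - 5
assert decode_integer(encode_integer(-5)) == -5
# addition in the field agrees with addition of the signed integers
assert decode_integer((encode_integer(-5) + encode_integer(3)) % Q) == -2
```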
<h2 id="fixedpoint-numbers">Fixed-point numbers</h2>
<p>The last step is to provide a mapping between the rational numbers used by the CNNs and the field elements used by the SPDZ protocol. As typically done, we here take a fixed-point approach where rational numbers are scaled by a fixed amount and then rounded off to an integer less than the field size <code class="highlighter-rouge">Q</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">encode</span><span class="p">(</span><span class="n">rational</span><span class="p">,</span> <span class="n">precision</span><span class="o">=</span><span class="mi">6</span><span class="p">):</span>
    <span class="n">upscaled</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">rational</span> <span class="o">*</span> <span class="mi">10</span><span class="o">**</span><span class="n">precision</span><span class="p">)</span>
    <span class="n">field_element</span> <span class="o">=</span> <span class="n">upscaled</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">field_element</span>

<span class="k">def</span> <span class="nf">decode</span><span class="p">(</span><span class="n">field_element</span><span class="p">,</span> <span class="n">precision</span><span class="o">=</span><span class="mi">6</span><span class="p">):</span>
    <span class="n">upscaled</span> <span class="o">=</span> <span class="n">field_element</span> <span class="k">if</span> <span class="n">field_element</span> <span class="o"><=</span> <span class="n">Q</span><span class="o">//</span><span class="mi">2</span> <span class="k">else</span> <span class="n">field_element</span> <span class="o">-</span> <span class="n">Q</span>
    <span class="n">rational</span> <span class="o">=</span> <span class="n">upscaled</span> <span class="o">/</span> <span class="mi">10</span><span class="o">**</span><span class="n">precision</span>
    <span class="k">return</span> <span class="n">rational</span>
</code></pre></div></div>
<p>In doing this we have to be careful not to “wrap around” by letting the magnitude of any upscaled value exceed <code class="highlighter-rouge">Q/2</code>; if this happens our decoding procedure will give wrong results.</p>
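<p>To make the wrap-around concrete, here is a minimal sketch using the same encoding with an illustrative modulus (the value of <code class="highlighter-rouge">Q</code> is an assumption): a small value survives the round trip, while a value whose upscaled form is larger than the field does not.</p>

```python
Q = 2**61 - 1  # illustrative prime modulus (an assumption), roughly 2.3 * 10**18

def encode(rational, precision=6):
    upscaled = int(rational * 10**precision)
    return upscaled % Q

def decode(field_element, precision=6):
    upscaled = field_element if field_element <= Q // 2 else field_element - Q
    return upscaled / 10**precision

# a small negative value round-trips correctly
assert decode(encode(-3.25)) == -3.25

# 3e12 upscaled by 10**6 is 3 * 10**18, which is larger than Q and wraps around
assert decode(encode(3e12)) != 3e12
```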
<p>To get around this we’ll simply make sure to pick <code class="highlighter-rouge">Q</code> large enough relative to the chosen precision and maximum magnitude. One place where we have to be careful is when doing multiplications, as these double the precision. As done earlier we must hence leave enough room for double precision, and additionally include a truncation step after each multiplication where we bring the precision back down. Unlike earlier though, in the two-server setting the truncation step can be performed as a local operation, as pointed out in <a href="https://eprint.iacr.org/2017/396">SecureML</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">truncate</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">6</span><span class="p">):</span>
    <span class="n">y0</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">//</span> <span class="mi">10</span><span class="o">**</span><span class="n">amount</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">Q</span> <span class="o">-</span> <span class="p">((</span><span class="n">Q</span> <span class="o">-</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">//</span> <span class="mi">10</span><span class="o">**</span><span class="n">amount</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">y0</span><span class="p">,</span> <span class="n">y1</span><span class="p">]</span>
</code></pre></div></div>
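<p>The following sketch runs the local truncation on a product of two encoded values. The modulus and the fixed mask used for the first share are assumptions that keep the example deterministic; with uniformly random shares the result is, as far as we understand the SecureML analysis, correct only up to a small rounding error and only with high probability when the shared value is small relative to <code class="highlighter-rouge">Q</code>.</p>

```python
Q = 2**61 - 1  # illustrative prime modulus (an assumption)

def encode(rational, precision=6):
    return int(rational * 10**precision) % Q

def decode(field_element, precision=6):
    upscaled = field_element if field_element <= Q // 2 else field_element - Q
    return upscaled / 10**precision

def truncate(x, amount=6):
    # each server divides locally; the second works on the negated share
    y0 = x[0] // 10**amount
    y1 = Q - ((Q - x[1]) // 10**amount)
    return [y0, y1]

# product of two encodings has double precision (scaled by 10**12)
product = (encode(1.5) * encode(2.0)) % Q

# additive sharing with a fixed mask (normally uniformly random)
mask = 10**18
shares = [mask, (product - mask) % Q]

truncated = truncate(shares)
result = decode(sum(truncated) % Q)
assert abs(result - 3.0) <= 1e-6
```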
<p>With this in place we are now (in theory) set to perform any desired computation on encrypted data.</p>
<h1 id="next-steps">Next Steps</h1>
<p>Morten Dahl</p>
<p>This post is still very much a work in progress.</p>
<p>Secret Sharing, Part 3 (2017-08-13T12:00:00+00:00, https://mortendahl.github.io/2017/08/13/secret-sharing-part3)</p>
<p><em><strong>TL;DR:</strong> due to redundancy in the way shares are generated, we can compensate not only for some of them being lost but also for some being manipulated; here we look at how to do this using decoding methods for Reed-Solomon codes.</em></p>
<p>Returning to our motivation in <a href="/2017/06/04/secret-sharing-part1/">part one</a> for using secret sharing, namely to distribute trust, we recall that the generated shares are given to shareholders that we may not trust individually. As such, if we later ask for the shares back in order to reconstruct the secret then it is natural to consider how reasonable it is to assume that we will receive the original shares back.</p>
<p>Specifically, what if some shares are <em>lost</em>, or what if some shares are <em>manipulated</em> to differ from the initial ones? Both may happen due to simple systems failure, but may also be the result of malicious behaviour on the part of shareholders. Should we in these two cases still expect to be able to recover the secret?</p>
<p>In this blog post we will see how to handle both situations. We will use simpler algorithms, but note towards the end how techniques like those used in <a href="/2017/06/24/secret-sharing-part2/">part two</a> can be used to make the process more efficient.</p>
<p>As usual, all code is available in the <a href="https://github.com/mortendahl/privateml/blob/master/secret-sharing/Reed-Solomon.ipynb">associated Python notebook</a>.</p>
<h1 id="robust-reconstruction">Robust Reconstruction</h1>
<p>In the <a href="/2017/06/04/secret-sharing-part1/#the-missing-pieces">first part</a> we saw how <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange interpolation</a> can be used to answer the first question, in that it allows us to reconstruct the secret as long as only a bounded number of shares are lost. As mentioned in the <a href="/2017/06/24/secret-sharing-part2/#polynomials">second part</a>, this is due to the redundancy that comes with point-value representations of polynomials, namely that the original polynomial is uniquely defined by <em>any</em> large enough subset of the shares. Concretely, if <code class="highlighter-rouge">D</code> is the degree of the original polynomial then we can reconstruct given <code class="highlighter-rouge">R = D + 1</code> shares in case of Shamir’s scheme and <code class="highlighter-rouge">R = D + K</code> shares in the packed variant; if <code class="highlighter-rouge">N</code> is the total number of shares we can hence afford to lose <code class="highlighter-rouge">N - R</code> shares.</p>
<p>But this is assuming that the received shares are unaltered, and the second question concerning recovery in the face of manipulated shares is intuitively harder as we now cannot easily identify when and where something went wrong. <i>(Note that it is also harder in a more formal sense, namely that a solution for manipulated shares can be used as a solution for lost shares, since dummy values, e.g. a constant, may be substituted for the lost shares and then instead treated as having been manipulated. This, however, is not optimal.)</i></p>
<p>To solve this issue we will use techniques from error-correction codes, specifically the well-known <a href="https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction">Reed-Solomon codes</a>. The reason we can do this is that share generation is very similar to (<a href="https://en.wikipedia.org/wiki/Systematic_code">non-systematic</a>) message encoding in these codes, and hence their decoding algorithms can be used to reconstruct even in the face of manipulated shares.</p>
<p>The robust reconstruction method for Shamir’s scheme we end up with is as follows, with a straightforward generalisation to the packed scheme. The input is a complete list of length <code class="highlighter-rouge">N</code> of received shares, where missing shares are represented by <code class="highlighter-rouge">None</code> and manipulated shares by their new value. If reconstruction goes well then the output is not only the secret, but also the indices of the shares that were manipulated.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_robust_reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="c"># filter missing shares</span>
    <span class="n">points_values</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">POINTS</span><span class="p">,</span> <span class="n">shares</span><span class="p">)</span> <span class="k">if</span> <span class="n">v</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="p">]</span>
    <span class="c"># decode remaining faulty</span>
    <span class="n">points</span><span class="p">,</span> <span class="n">values</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">points_values</span><span class="p">)</span>
    <span class="n">polynomial</span><span class="p">,</span> <span class="n">error_locator</span> <span class="o">=</span> <span class="n">gao_decoding</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">R</span><span class="p">,</span> <span class="n">MAX_MANIPULATED</span><span class="p">)</span>
    <span class="c"># check if recovery was possible</span>
    <span class="k">if</span> <span class="n">polynomial</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="c"># there were more errors than assumed by `MAX_MANIPULATED`</span>
        <span class="k">raise</span> <span class="nb">Exception</span><span class="p">(</span><span class="s">"Too many errors, cannot reconstruct"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="c"># recover secret</span>
        <span class="n">secret</span> <span class="o">=</span> <span class="n">poly_eval</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
        <span class="c"># find roots of error locator polynomial</span>
        <span class="n">error_indices</span> <span class="o">=</span> <span class="p">[</span> <span class="n">i</span>
            <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span> <span class="n">poly_eval</span><span class="p">(</span><span class="n">error_locator</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">POINTS</span> <span class="p">)</span>
            <span class="k">if</span> <span class="n">v</span> <span class="o">==</span> <span class="mi">0</span>
        <span class="p">]</span>
        <span class="k">return</span> <span class="n">secret</span><span class="p">,</span> <span class="n">error_indices</span>
</code></pre></div></div>
<p>Having the error indices may be useful for instance as a deterrent: since we can identify malicious shareholders we may also be able to e.g. publicly shame them, and hence incentivise correct behaviour in the first place. Formally this is known as <a href="https://en.wikipedia.org/wiki/Secure_multi-party_computation#Security_definitions">covert security</a>, where shareholders are willing to cheat only if they are not caught.</p>
<p>Finally, note that reconstruction may fail, yet it can be shown that this only happens when there indeed isn’t enough information left to correctly identify the result; in other words, our method will never give a false negative. Parameters <code class="highlighter-rouge">MAX_MISSING</code> and <code class="highlighter-rouge">MAX_MANIPULATED</code> are used to characterise when failure can happen, giving respectively an upper bound on the number of lost and manipulated shares supported. What must hold in general is that the number of “redundancy shares” <code class="highlighter-rouge">N - R</code> satisfies <code class="highlighter-rouge">N - R >= MAX_MISSING + 2 * MAX_MANIPULATED</code>, from which we see that we are paying a double price for manipulated shares compared to missing shares.</p>
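<p>As a quick illustration of this bound, the hypothetical helper below (not part of the original notebook) computes how many manipulated shares can be tolerated for given <code class="highlighter-rouge">N</code>, <code class="highlighter-rouge">R</code>, and number of missing shares.</p>

```python
def max_manipulated_supported(N, R, max_missing):
    """Largest MAX_MANIPULATED with N - R >= max_missing + 2 * MAX_MANIPULATED.

    Returns None when even the missing shares alone exceed the redundancy.
    """
    spare = N - R - max_missing
    if spare < 0:
        return None
    # every manipulated share costs two redundancy shares
    return spare // 2

# Shamir with degree D = 3, so R = D + 1 = 4, and N = 11 shares in total
assert max_manipulated_supported(11, 4, 1) == 3
assert max_manipulated_supported(11, 4, 3) == 2
assert max_manipulated_supported(11, 4, 8) is None
```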
<h2 id="outline-of-decoding-algorithm">Outline of decoding algorithm</h2>
<p>The specific decoding procedure we use here works by first finding an erroneous polynomial in coefficient representation that matches all received shares, including the manipulated ones. Hence we must first find a way to interpolate not only values but also coefficients from a polynomial given in point-value representation; in other words, we must find a way to convert from point-value representation to coefficient representation. We saw in <a href="/2017/06/24/secret-sharing-part2/">part two</a> how the backward FFT can do this in specific cases, but to handle missing shares we here instead adapt <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange interpolation</a> as used in <a href="/2017/06/04/secret-sharing-part1/">part one</a>.</p>
<p>Given the erroneous polynomial we then extract a corrected polynomial from it to get our desired result. Surprisingly, this may simply be done by running the <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Polynomial_extended_Euclidean_algorithm">extended Euclidean algorithm</a> on polynomials as shown below.</p>
<p>Finally, since both of these steps use polynomials as objects of computation, similarly to how one typically uses integers as objects of computation, we must first also give algorithms for polynomial arithmetic such as adding and multiplying.</p>
<h1 id="computing-on-polynomials">Computing on Polynomials</h1>
<p>We assume we already have various functions <code class="highlighter-rouge">base_add</code>, <code class="highlighter-rouge">base_sub</code>, <code class="highlighter-rouge">base_mul</code>, etc. for computing in the base field; concretely this simply amounts to <a href="https://en.wikipedia.org/wiki/Modular_arithmetic">integer arithmetic modulo a fixed prime</a> in our case.</p>
<p>We then represent polynomials over this base field by their list of coefficients: <code class="highlighter-rouge">A(x) = (a0) + (a1 * x) + ... + (aD * x^D)</code> is represented by <code class="highlighter-rouge">A = [a0, a1, ..., aD]</code>. Furthermore, we keep as an invariant that <code class="highlighter-rouge">aD != 0</code> and enforce this below through a <code class="highlighter-rouge">canonical</code> procedure that removes all trailing zeros.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">canonical</span><span class="p">(</span><span class="n">A</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">))):</span>
        <span class="k">if</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">A</span><span class="p">[:</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">return</span> <span class="p">[]</span>
</code></pre></div></div>
<p>However, as an intermediate step we will sometimes first need to expand one of two polynomials to ensure they have the same length. This is done by simply appending zero coefficients to the shorter list.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">expand_to_match</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">):</span>
    <span class="n">diff</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">diff</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">A</span><span class="p">,</span> <span class="n">B</span> <span class="o">+</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">diff</span>
    <span class="k">elif</span> <span class="n">diff</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">diff</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">diff</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">A</span> <span class="o">+</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">diff</span><span class="p">,</span> <span class="n">B</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">A</span><span class="p">,</span> <span class="n">B</span>
</code></pre></div></div>
<p>With this we can perform arithmetic on polynomials by simply using the <a href="https://en.wikipedia.org/wiki/Polynomial_arithmetic">standard definitions</a>. Specifically, to add two polynomials <code class="highlighter-rouge">A</code> and <code class="highlighter-rouge">B</code> given by coefficient lists <code class="highlighter-rouge">[a0, ..., aM]</code> and <code class="highlighter-rouge">[b0, ..., bN]</code> we perform component-wise addition of the coefficients <code class="highlighter-rouge">ai + bi</code>. For example, adding <code class="highlighter-rouge">A(x) = 2x + 3x^2</code> to <code class="highlighter-rouge">B(x) = 1 + 4x^3</code> we get <code class="highlighter-rouge">A(x) + B(x) = (0+1) + (2+0)x + (3+0)x^2 + (0+4)x^3</code>; the first two are represented by <code class="highlighter-rouge">[0,2,3]</code> and <code class="highlighter-rouge">[1,0,0,4]</code> respectively, and their sum by <code class="highlighter-rouge">[1,2,3,4]</code>. Subtraction is similarly done component-wise.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">poly_add</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">):</span>
    <span class="n">F</span><span class="p">,</span> <span class="n">G</span> <span class="o">=</span> <span class="n">expand_to_match</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">([</span> <span class="n">base_add</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="k">for</span> <span class="n">f</span><span class="p">,</span> <span class="n">g</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="p">)</span> <span class="p">])</span>

<span class="k">def</span> <span class="nf">poly_sub</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">):</span>
    <span class="n">F</span><span class="p">,</span> <span class="n">G</span> <span class="o">=</span> <span class="n">expand_to_match</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">([</span> <span class="n">base_sub</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="k">for</span> <span class="n">f</span><span class="p">,</span> <span class="n">g</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">F</span><span class="p">,</span> <span class="n">G</span><span class="p">)</span> <span class="p">])</span>
</code></pre></div></div>
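<p>The worked example from the text can be reproduced with a self-contained sketch. The small prime <code class="highlighter-rouge">P</code> and the one-line base-field helpers are assumptions for illustration, and <code class="highlighter-rouge">expand_to_match</code> is written more compactly than above.</p>

```python
P = 433  # small illustrative prime (an assumption; any prime works)

def base_add(a, b): return (a + b) % P
def base_sub(a, b): return (a - b) % P

def canonical(A):
    # drop trailing zero coefficients so the last entry is non-zero
    for i in reversed(range(len(A))):
        if A[i] != 0:
            return A[:i+1]
    return []

def expand_to_match(A, B):
    # pad the shorter coefficient list with zeros
    n = max(len(A), len(B))
    return A + [0] * (n - len(A)), B + [0] * (n - len(B))

def poly_add(A, B):
    F, G = expand_to_match(A, B)
    return canonical([base_add(f, g) for f, g in zip(F, G)])

def poly_sub(A, B):
    F, G = expand_to_match(A, B)
    return canonical([base_sub(f, g) for f, g in zip(F, G)])

A = [0, 2, 3]      # 2x + 3x^2
B = [1, 0, 0, 4]   # 1 + 4x^3

assert poly_add(A, B) == [1, 2, 3, 4]
# the zero polynomial is represented by the empty list
assert poly_sub(A, A) == []
# subtraction undoes addition, and trailing zeros are removed again
assert poly_sub(poly_add(A, B), B) == A
```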
<p>We also do scalar multiplication component-wise, i.e. by scaling every coefficient of a polynomial by an element from the base field. For instance, with <code class="highlighter-rouge">A(x) = 1 + 2x + 3x^2</code> we have <code class="highlighter-rouge">2 * A(x) = 2 + 4x + 6x^2</code>, which as expected is the same as <code class="highlighter-rouge">A(x) + A(x)</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">poly_scalarmul</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">([</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">A</span> <span class="p">])</span>

<span class="k">def</span> <span class="nf">poly_scalardiv</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">([</span> <span class="n">base_div</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">A</span> <span class="p">])</span>
</code></pre></div></div>
<p>Multiplication of two polynomials is only slightly more complex, with coefficient <code class="highlighter-rouge">cK</code> of the product being defined by <code class="highlighter-rouge">cK = sum( aI * bJ for i,aI in enumerate(A) for j,bJ in enumerate(B) if i + j == K )</code>, and by changing the computation slightly we avoid iterating over <code class="highlighter-rouge">K</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">poly_mul</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">):</span>
    <span class="n">C</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">+</span> <span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)):</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)):</span>
            <span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">base_add</span><span class="p">(</span><span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">],</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B</span><span class="p">[</span><span class="n">j</span><span class="p">]))</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">(</span><span class="n">C</span><span class="p">)</span>
</code></pre></div></div>
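<p>As a small check of the multiplication routine, the sketch below multiplies <code class="highlighter-rouge">1 + x</code> by <code class="highlighter-rouge">1 - x</code>, where the middle coefficient cancels in the field. The prime <code class="highlighter-rouge">P</code> and the base-field helpers are assumptions for illustration.</p>

```python
P = 433  # small illustrative prime (an assumption)

def base_add(a, b): return (a + b) % P
def base_mul(a, b): return (a * b) % P

def canonical(A):
    # drop trailing zero coefficients
    for i in reversed(range(len(A))):
        if A[i] != 0:
            return A[:i+1]
    return []

def poly_mul(A, B):
    # accumulate A[i] * B[j] into the coefficient of x^(i+j)
    C = [0] * (len(A) + len(B) - 1)
    for i in range(len(A)):
        for j in range(len(B)):
            C[i+j] = base_add(C[i+j], base_mul(A[i], B[j]))
    return canonical(C)

A = [1, 1]        # 1 + x
B = [1, P - 1]    # 1 - x, since -1 is P - 1 in the field

# (1 + x)(1 - x) = 1 - x^2
assert poly_mul(A, B) == [1, 0, P - 1]
```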
<p>We also need to be able to divide a polynomial <code class="highlighter-rouge">A</code> by another polynomial <code class="highlighter-rouge">B</code>, effectively finding a <em>quotient polynomial</em> <code class="highlighter-rouge">Q</code> and a <em>remainder polynomial</em> <code class="highlighter-rouge">R</code> such that <code class="highlighter-rouge">A == Q * B + R</code> with <code class="highlighter-rouge">degree(R) < degree(B)</code>. The procedure works like long division for integers and is explained in detail <a href="https://www.khanacademy.org/math/algebra2/arithmetic-with-polynomials#long-division-of-polynomials">elsewhere</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">poly_divmod</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">):</span>
    <span class="c"># invert the leading coefficient of the divisor</span>
    <span class="n">t</span> <span class="o">=</span> <span class="n">base_inverse</span><span class="p">(</span><span class="n">lc</span><span class="p">(</span><span class="n">B</span><span class="p">))</span>
    <span class="n">Q</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
    <span class="c"># work on a copy of A (assumes `from copy import copy`)</span>
    <span class="n">R</span> <span class="o">=</span> <span class="n">copy</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)):</span>
        <span class="n">Q</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">R</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">])</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)):</span>
            <span class="n">R</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">base_sub</span><span class="p">(</span><span class="n">R</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">],</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">Q</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">B</span><span class="p">[</span><span class="n">j</span><span class="p">]))</span>
    <span class="k">return</span> <span class="n">canonical</span><span class="p">(</span><span class="n">Q</span><span class="p">),</span> <span class="n">canonical</span><span class="p">(</span><span class="n">R</span><span class="p">)</span>
</code></pre></div></div>
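<p>To see the division at work, the sketch below divides <code class="highlighter-rouge">x^3 + 2x + 5</code> by <code class="highlighter-rouge">x + 1</code>. The prime <code class="highlighter-rouge">P</code>, the definitions of <code class="highlighter-rouge">lc</code> and <code class="highlighter-rouge">base_inverse</code>, and the use of <code class="highlighter-rouge">list(A)</code> for copying are assumptions made so that the example is self-contained.</p>

```python
P = 433  # small illustrative prime (an assumption)

def base_sub(a, b): return (a - b) % P
def base_mul(a, b): return (a * b) % P

def base_inverse(a):
    # Fermat's little theorem; valid because P is prime and a != 0
    return pow(a, P - 2, P)

def lc(A):
    # leading coefficient of a polynomial in canonical form
    return A[-1]

def canonical(A):
    for i in reversed(range(len(A))):
        if A[i] != 0:
            return A[:i+1]
    return []

def poly_divmod(A, B):
    t = base_inverse(lc(B))
    Q = [0] * len(A)
    R = list(A)  # copy so that A is not modified
    for i in reversed(range(0, len(A) - len(B) + 1)):
        Q[i] = base_mul(t, R[i + len(B) - 1])
        for j in range(len(B)):
            R[i+j] = base_sub(R[i+j], base_mul(Q[i], B[j]))
    return canonical(Q), canonical(R)

A = [5, 2, 0, 1]   # 5 + 2x + x^3
B = [1, 1]         # 1 + x

Q, R = poly_divmod(A, B)
# x^3 + 2x + 5 == (x^2 - x + 3) * (x + 1) + 2
assert Q == [3, P - 1, 1]
assert R == [2]
```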
<p>Note that we have used basic algorithms for these operations here but that more efficient versions exist. Some pointers to these are given at the end.</p>
<h1 id="interpolating-polynomials">Interpolating Polynomials</h1>
<p>We next turn to the task of converting a polynomial given in (implicit) point-value representation to its (explicit) coefficient representation. Several procedures exist for this, including efficient algorithms for specific cases such as the backward FFT seen earlier, and general ones based e.g. on <a href="https://en.wikipedia.org/wiki/Newton_polynomial">Newton’s method</a>, which seems popular in numerical analysis due to its better efficiency and ability to handle new data points. However, for this post we’ll use Lagrange interpolation and see that although it’s perhaps typically seen as a procedure for interpolating the values of polynomials, it works just as well for interpolating their coefficients.</p>
<p>Recall that we are given points <code class="highlighter-rouge">x0, x1, ..., xD</code> and values <code class="highlighter-rouge">y0, y1, ..., yD</code> implicitly defining a polynomial <code class="highlighter-rouge">F</code>. <a href="/2017/06/04/secret-sharing-part1/">Earlier</a> we then used <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange’s method</a> to find value <code class="highlighter-rouge">F(x)</code> at a potentially different point <code class="highlighter-rouge">x</code>. This works due to the constructive nature of Lagrange’s proof, where a polynomial <code class="highlighter-rouge">H</code> is defined as <code class="highlighter-rouge">H(X) = y0 * L0(X) + ... + yD * LD(X)</code> for indeterminate <code class="highlighter-rouge">X</code> and <em>Lagrange basis polynomials</em> <code class="highlighter-rouge">Li</code>, and then shown identical to <code class="highlighter-rouge">F</code>. To find <code class="highlighter-rouge">F(x)</code> we then simply evaluated <code class="highlighter-rouge">H(x)</code>, although we precomputed <code class="highlighter-rouge">Li(x)</code> as the <em>Lagrange constants</em> <code class="highlighter-rouge">ci</code> so that this step simply reduced to a weighted sum <code class="highlighter-rouge">y0 * c0 + ... + yD * cD</code>.</p>
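<p>As a concrete instance of this weighted sum, the sketch below recovers <code class="highlighter-rouge">F(0)</code> and <code class="highlighter-rouge">F(5)</code> for <code class="highlighter-rouge">F(x) = 7 + 3x + 2x^2</code> from its values at three points. The prime <code class="highlighter-rouge">P</code> and the compact helper <code class="highlighter-rouge">lagrange_constants</code> are assumptions for illustration.</p>

```python
P = 433  # small illustrative prime (an assumption)

def lagrange_constants(points, point):
    """Evaluate each Lagrange basis polynomial Li at `point`, modulo P."""
    constants = []
    for i, xi in enumerate(points):
        numerator, denominator = 1, 1
        for j, xj in enumerate(points):
            if i == j:
                continue
            numerator = numerator * (point - xj) % P
            denominator = denominator * (xi - xj) % P
        # division in the field is multiplication by the inverse
        constants.append(numerator * pow(denominator, P - 2, P) % P)
    return constants

def F(x):
    return (7 + 3 * x + 2 * x * x) % P

points = [1, 2, 3]
values = [F(x) for x in points]

def interpolate_at(point):
    # weighted sum y0 * c0 + ... + yD * cD
    constants = lagrange_constants(points, point)
    return sum(y * c for y, c in zip(values, constants)) % P

assert interpolate_at(0) == 7
assert interpolate_at(5) == F(5)
```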
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">lagrange_constants_for_point</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">point</span><span class="p">):</span>
    <span class="n">constants</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">xi</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">points</span><span class="p">):</span>
        <span class="n">numerator</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="n">denominator</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">xj</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">points</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">j</span><span class="p">:</span> <span class="k">continue</span>
            <span class="n">numerator</span> <span class="o">=</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">numerator</span><span class="p">,</span> <span class="n">base_sub</span><span class="p">(</span><span class="n">point</span><span class="p">,</span> <span class="n">xj</span><span class="p">))</span>
            <span class="n">denominator</span> <span class="o">=</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">denominator</span><span class="p">,</span> <span class="n">base_sub</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">xj</span><span class="p">))</span>
        <span class="n">constant</span> <span class="o">=</span> <span class="n">base_div</span><span class="p">(</span><span class="n">numerator</span><span class="p">,</span> <span class="n">denominator</span><span class="p">)</span>
        <span class="n">constants</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">constant</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">constants</span>
</code></pre></div></div>
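<p>As a quick sanity check, the sketch below recovers <code class="highlighter-rouge">F(0)</code> from three values of a degree-two polynomial using the constants. The modulus <code class="highlighter-rouge">PRIME = 433</code>, the definitions of the <code class="highlighter-rouge">base_*</code> helpers, the <code class="highlighter-rouge">evaluate</code> helper, and the example polynomial are all assumptions made for illustration; the helpers used elsewhere in this series may be defined differently.</p>

```python
PRIME = 433  # small prime modulus, chosen only for illustration

def base_sub(a, b): return (a - b) % PRIME
def base_mul(a, b): return (a * b) % PRIME
def base_div(a, b): return (a * pow(b, PRIME - 2, PRIME)) % PRIME  # Fermat inverse

def lagrange_constants_for_point(points, point):
    constants = []
    for i, xi in enumerate(points):
        numerator = 1
        denominator = 1
        for j, xj in enumerate(points):
            if i == j: continue
            numerator = base_mul(numerator, base_sub(point, xj))
            denominator = base_mul(denominator, base_sub(xi, xj))
        constants.append(base_div(numerator, denominator))
    return constants

def evaluate(coefficients, x):
    # Horner's rule; coefficients are stored lowest degree first
    result = 0
    for c in reversed(coefficients):
        result = (result * x + c) % PRIME
    return result

F = [42, 7, 3]                      # F(X) = 42 + 7X + 3X^2
points = [1, 2, 3]
values = [evaluate(F, x) for x in points]

# the constants depend only on the points, so they can be precomputed once
constants = lagrange_constants_for_point(points, 0)

# the weighted sum y0 * c0 + ... + yD * cD gives F(0)
recovered = sum(y * c for y, c in zip(values, constants)) % PRIME
print(constants, recovered)
```

<p>Since the constants are independent of the values, the same list can be reused for every polynomial evaluated at the same set of points.</p>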
<p>Now, when we want the coefficients of <code class="highlighter-rouge">F</code> instead of just its value <code class="highlighter-rouge">F(x)</code> at <code class="highlighter-rouge">x</code>, we see that while <code class="highlighter-rouge">H</code> is identical to <code class="highlighter-rouge">F</code> it only gives us a semi-explicit representation, made worse by the fact that the <code class="highlighter-rouge">Li</code> polynomials are also only given in a semi-explicit representation: <code class="highlighter-rouge">Li(X) = ((X - x0) * ... * (X - xD)) / ((xi - x0) * ... * (xi - xD))</code>, where the factors involving <code class="highlighter-rouge">xi</code> itself are left out of both products. However, since we developed algorithms for using polynomials as objects in computations, we can simply evaluate these expressions with indeterminate <code class="highlighter-rouge">X</code> to find the reduced explicit form! See for instance the examples <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial#Examples">here</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">lagrange_polynomials</span><span class="p">(</span><span class="n">points</span><span class="p">):</span>
    <span class="n">polys</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">xi</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">points</span><span class="p">):</span>
        <span class="n">numerator</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">denominator</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">xj</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">points</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">j</span><span class="p">:</span> <span class="k">continue</span>
            <span class="n">numerator</span> <span class="o">=</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">numerator</span><span class="p">,</span> <span class="p">[</span><span class="n">base_sub</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">xj</span><span class="p">),</span> <span class="mi">1</span><span class="p">])</span>
            <span class="n">denominator</span> <span class="o">=</span> <span class="n">base_mul</span><span class="p">(</span><span class="n">denominator</span><span class="p">,</span> <span class="n">base_sub</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">xj</span><span class="p">))</span>
        <span class="n">poly</span> <span class="o">=</span> <span class="n">poly_scalardiv</span><span class="p">(</span><span class="n">numerator</span><span class="p">,</span> <span class="n">denominator</span><span class="p">)</span>
        <span class="n">polys</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">poly</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">polys</span>
</code></pre></div></div>
<p>Doing this also for <code class="highlighter-rouge">H</code> gives us the interpolated polynomial in explicit coefficient representation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">lagrange_interpolation</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">):</span>
    <span class="n">ls</span> <span class="o">=</span> <span class="n">lagrange_polynomials</span><span class="p">(</span><span class="n">points</span><span class="p">)</span>
    <span class="n">poly</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">values</span><span class="p">):</span>
        <span class="n">term</span> <span class="o">=</span> <span class="n">poly_scalarmul</span><span class="p">(</span><span class="n">ls</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">yi</span><span class="p">)</span>
        <span class="n">poly</span> <span class="o">=</span> <span class="n">poly_add</span><span class="p">(</span><span class="n">poly</span><span class="p">,</span> <span class="n">term</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">poly</span>
</code></pre></div></div>
<p>While this may not be the most efficient way (see notes later), it is hard to beat its simplicity.</p>
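<p>To see the pieces fit together, here is a self-contained sketch that interpolates the coefficients of a small polynomial from three of its values. The modulus <code class="highlighter-rouge">PRIME = 433</code>, the <code class="highlighter-rouge">canonical</code> helper, and the concrete definitions of the <code class="highlighter-rouge">poly_*</code> functions are assumptions for illustration (polynomials are lists of coefficients, lowest degree first, with <code class="highlighter-rouge">[]</code> as the zero polynomial); they are not necessarily the definitions used in the rest of the series.</p>

```python
PRIME = 433  # small prime modulus, chosen only for illustration

def base_sub(a, b): return (a - b) % PRIME
def base_mul(a, b): return (a * b) % PRIME
def base_div(a, b): return (a * pow(b, PRIME - 2, PRIME)) % PRIME

def canonical(A):
    # strip trailing zero coefficients so that [] is the zero polynomial
    A = list(A)
    while A and A[-1] == 0:
        A.pop()
    return A

def poly_add(A, B):
    n = max(len(A), len(B))
    A, B = A + [0] * (n - len(A)), B + [0] * (n - len(B))
    return canonical([(a + b) % PRIME for a, b in zip(A, B)])

def poly_mul(A, B):
    if not A or not B:
        return []
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % PRIME
    return canonical(C)

def poly_scalarmul(A, b): return canonical([base_mul(a, b) for a in A])
def poly_scalardiv(A, b): return canonical([base_div(a, b) for a in A])

def lagrange_polynomials(points):
    polys = []
    for i, xi in enumerate(points):
        numerator = [1]
        denominator = 1
        for j, xj in enumerate(points):
            if i == j: continue
            # multiply by the linear factor (X - xj)
            numerator = poly_mul(numerator, [base_sub(0, xj), 1])
            denominator = base_mul(denominator, base_sub(xi, xj))
        polys.append(poly_scalardiv(numerator, denominator))
    return polys

def lagrange_interpolation(points, values):
    ls = lagrange_polynomials(points)
    poly = []
    for i, yi in enumerate(values):
        poly = poly_add(poly, poly_scalarmul(ls[i], yi))
    return poly

# values of F(X) = 42 + 7X + 3X^2 at the points 1, 2, 3
points = [1, 2, 3]
values = [52, 68, 90]
coefficients = lagrange_interpolation(points, values)
print(coefficients)
```

<p>With three points the interpolated polynomial has degree at most two, so the coefficients of the original polynomial are recovered exactly.</p>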
<h1 id="correcting-errors">Correcting Errors</h1>
<p>In the non-systematic variants of <a href="https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction">Reed-Solomon codes</a>, a message <code class="highlighter-rouge">m</code> represented by a vector <code class="highlighter-rouge">[m0, ..., mD]</code> is encoded by interpreting it as a polynomial <code class="highlighter-rouge">F(X) = (m0) + (m1 * X) + ... + (mD * X^D)</code> and then evaluating <code class="highlighter-rouge">F</code> at a fixed set of points to get the code word. Unlike share generation, no randomness is used in this process since the purpose is only to provide redundancy and not privacy (in fact, in the systematic variants, the message is directly readable from the code word), yet this doesn’t change the fact that we can use decoding procedures to correct errors in shares.</p>
<p>Several such <a href="https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Error_correction_algorithms">decoding procedures</a> exist, some of which are explained <a href="https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders">here</a> and <a href="https://jeremykun.com/2015/09/07/welch-berlekamp/">there</a>, yet the one we’ll use here is conceptually simple and has a certain beauty to it. Also keep in mind that some of the typical optimizations used in implementations of the alternative approaches get their speed-up by relying on properties of the more common setting over binary extension fields, while we here are interested in the setting over prime fields as we would like to simulate (bounded) integer arithmetic in our application of secret sharing to secure computation – which is straightforward in prime fields but less clear in binary extension fields.</p>
<p>The approach we will use was first described in <a href="https://doi.org/10.1016/S0019-9958(75)90090-X">SKHN’75</a>, yet we’ll follow the algorithm given in <a href="http://www.math.clemson.edu/~sgao/papers/RS.pdf">Gao’02</a> (see also Section 17.5 in <a href="http://shoup.net/ntb/ntb-v2.pdf">Shoup’08</a>). It works by first interpolating a potentially faulty polynomial <code class="highlighter-rouge">H</code> from all the available shares and then running the extended Euclidean algorithm to either extract the original polynomial <code class="highlighter-rouge">G</code> or (rightly) declare it impossible. That the algorithm can be used for this is surprising and is strongly related to <a href="https://en.wikipedia.org/wiki/Rational_reconstruction_(mathematics)">rational reconstruction</a>.</p>
<h2 id="extended-euclidean-algorithm-on-polynomials">Extended Euclidean algorithm on polynomials</h2>
<p>Assume that we have two polynomials <code class="highlighter-rouge">H</code> and <code class="highlighter-rouge">F</code> and we would like to find linear combinations of these in the form of triples <code class="highlighter-rouge">(R, T, S)</code> of polynomials such that <code class="highlighter-rouge">R == H * T + F * S</code>. This may of course be done in many different ways, but one particularly interesting approach is to consider the list of triples <code class="highlighter-rouge">(R0, T0, S0), ..., (RM, TM, SM)</code> generated by the <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Polynomial_extended_Euclidean_algorithm">extended Euclidean algorithm</a> (EEA).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">poly_eea</span><span class="p">(</span><span class="n">F</span><span class="p">,</span> <span class="n">H</span><span class="p">):</span>
    <span class="n">R0</span><span class="p">,</span> <span class="n">R1</span> <span class="o">=</span> <span class="n">F</span><span class="p">,</span> <span class="n">H</span>
    <span class="n">S0</span><span class="p">,</span> <span class="n">S1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[]</span>
    <span class="n">T0</span><span class="p">,</span> <span class="n">T1</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="n">triples</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">while</span> <span class="n">R1</span> <span class="o">!=</span> <span class="p">[]:</span>
        <span class="n">Q</span><span class="p">,</span> <span class="n">R2</span> <span class="o">=</span> <span class="n">poly_divmod</span><span class="p">(</span><span class="n">R0</span><span class="p">,</span> <span class="n">R1</span><span class="p">)</span>
        <span class="n">triples</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="p">(</span><span class="n">R0</span><span class="p">,</span> <span class="n">S0</span><span class="p">,</span> <span class="n">T0</span><span class="p">)</span> <span class="p">)</span>
        <span class="n">R0</span><span class="p">,</span> <span class="n">S0</span><span class="p">,</span> <span class="n">T0</span><span class="p">,</span> <span class="n">R1</span><span class="p">,</span> <span class="n">S1</span><span class="p">,</span> <span class="n">T1</span> <span class="o">=</span> \
            <span class="n">R1</span><span class="p">,</span> <span class="n">S1</span><span class="p">,</span> <span class="n">T1</span><span class="p">,</span> \
            <span class="n">R2</span><span class="p">,</span> <span class="n">poly_sub</span><span class="p">(</span><span class="n">S0</span><span class="p">,</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">S1</span><span class="p">,</span> <span class="n">Q</span><span class="p">)),</span> <span class="n">poly_sub</span><span class="p">(</span><span class="n">T0</span><span class="p">,</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">T1</span><span class="p">,</span> <span class="n">Q</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">triples</span>
</code></pre></div></div>
<p>The reason for this is that this list turns out to represent <em>all</em> triples up to a certain size that satisfy the equation, in the sense that every “small” triple <code class="highlighter-rouge">(R, T, S)</code> for which <code class="highlighter-rouge">R == T * H + S * F</code> is actually just a scaled version of a triple <code class="highlighter-rouge">(Ri, Ti, Si)</code> occurring in the list generated by the EEA: for some constant <code class="highlighter-rouge">a</code> we have <code class="highlighter-rouge">R == a * Ri</code>, <code class="highlighter-rouge">T == a * Ti</code>, and <code class="highlighter-rouge">S == a * Si</code>. Moreover, given a concrete interpretation of “small” in the form of a degree bound on <code class="highlighter-rouge">R</code> and <code class="highlighter-rouge">T</code>, we may find the unique <code class="highlighter-rouge">(Ri, Ti, Si)</code> that this holds for.</p>
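<p>The linear relation can be checked mechanically. The sketch below runs the EEA on two small polynomials and verifies that every generated triple satisfies the equation and that the degrees of the remainders strictly decrease. The modulus <code class="highlighter-rouge">PRIME = 433</code>, the input polynomials, and the concrete <code class="highlighter-rouge">poly_*</code> helpers are assumptions for illustration. Note that the code stores each triple in the order <code class="highlighter-rouge">(R, S, T)</code>, with <code class="highlighter-rouge">S</code> multiplying <code class="highlighter-rouge">F</code> and <code class="highlighter-rouge">T</code> multiplying <code class="highlighter-rouge">H</code>, and that the quotient is named <code class="highlighter-rouge">quot</code> here.</p>

```python
PRIME = 433  # small prime modulus, chosen only for illustration

def canonical(A):
    # strip trailing zero coefficients so that [] is the zero polynomial
    A = list(A)
    while A and A[-1] == 0:
        A.pop()
    return A

def poly_add(A, B):
    n = max(len(A), len(B))
    A, B = A + [0] * (n - len(A)), B + [0] * (n - len(B))
    return canonical([(a + b) % PRIME for a, b in zip(A, B)])

def poly_sub(A, B):
    return poly_add(A, [(-b) % PRIME for b in B])

def poly_mul(A, B):
    if not A or not B:
        return []
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % PRIME
    return canonical(C)

def poly_divmod(A, B):
    # schoolbook long division: A == quotient * B + remainder, B non-zero
    remainder, B = canonical(A), canonical(B)
    quotient = [0] * max(len(remainder) - len(B) + 1, 0)
    inverse = pow(B[-1], PRIME - 2, PRIME)  # inverse of the leading coefficient
    while len(remainder) >= len(B):
        shift = len(remainder) - len(B)
        factor = remainder[-1] * inverse % PRIME
        quotient[shift] = factor
        for i, b in enumerate(B):
            remainder[shift + i] = (remainder[shift + i] - factor * b) % PRIME
        remainder = canonical(remainder)
    return canonical(quotient), remainder

def poly_eea(F, H):
    R0, R1 = F, H
    S0, S1 = [1], []
    T0, T1 = [], [1]
    triples = []
    while R1 != []:
        quot, R2 = poly_divmod(R0, R1)
        triples.append((R0, S0, T0))
        R0, S0, T0, R1, S1, T1 = \
            R1, S1, T1, \
            R2, poly_sub(S0, poly_mul(S1, quot)), poly_sub(T0, poly_mul(T1, quot))
    return triples

F = [1, 0, 0, 0, 1]  # X^4 + 1
H = [5, 0, 3, 2]     # 5 + 3X^2 + 2X^3
triples = poly_eea(F, H)

# every triple is a linear combination of F and H
invariant_holds = all(
    R == poly_add(poly_mul(S, F), poly_mul(T, H)) for R, S, T in triples
)
degrees = [len(R) - 1 for R, _, _ in triples]
print(invariant_holds, degrees)
```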
<p>Why this is useful in decoding becomes apparent next.</p>
<h2 id="euclidean-decoding">Euclidean decoding</h2>
<p>Say that <code class="highlighter-rouge">T</code> is the unknown error locator polynomial, i.e. <code class="highlighter-rouge">T(xi) == 0</code> exactly when share <code class="highlighter-rouge">yi</code> has been manipulated. Say also that <code class="highlighter-rouge">R = T * G</code> where <code class="highlighter-rouge">G</code> is the original polynomial that was used to generate the shares. Clearly, if we actually knew <code class="highlighter-rouge">T</code> and <code class="highlighter-rouge">R</code> then we could get what we’re after by a simple division <code class="highlighter-rouge">R / T</code> – but since we don’t we have to do something else.</p>
<p>Because we’re only after the ratio <code class="highlighter-rouge">R / T</code>, we see that knowing <code class="highlighter-rouge">Ri</code> and <code class="highlighter-rouge">Ti</code> such that <code class="highlighter-rouge">R == a * Ri</code> and <code class="highlighter-rouge">T == a * Ti</code> actually gives us the same result: <code class="highlighter-rouge">R / T == (a * Ri) / (a * Ti) == Ri / Ti</code>, and these we could potentially get from the EEA! The only obstacles are that we need to define polynomials <code class="highlighter-rouge">H</code> and <code class="highlighter-rouge">F</code>, and we need to be sure that there is a “small” triple with the <code class="highlighter-rouge">R</code> and <code class="highlighter-rouge">T</code> as defined here that satisfies the linear equation, which in turn means making sure there exists a suitable <code class="highlighter-rouge">S</code>. Once done, the output of <code class="highlighter-rouge">poly_eea(F, H)</code> will give us the needed <code class="highlighter-rouge">Ri</code> and <code class="highlighter-rouge">Ti</code>.</p>
<p>Perhaps unsurprisingly, <code class="highlighter-rouge">H</code> is the polynomial interpolated using all available values, which may potentially be faulty in case some of them have been manipulated. <code class="highlighter-rouge">F = F1 * ... * FN</code> is the product of polynomials <code class="highlighter-rouge">Fi(X) = X - xi</code> where <code class="highlighter-rouge">X</code> is the indeterminate and <code class="highlighter-rouge">x1, ..., xN</code> are the points.</p>
<p>Having defined <code class="highlighter-rouge">H</code> and <code class="highlighter-rouge">F</code> like this, we can then show that our <code class="highlighter-rouge">R</code> and <code class="highlighter-rouge">T</code> as defined above are “small” when the number of errors that have occurred is below the bounds discussed earlier. Likewise it can be shown that there is an <code class="highlighter-rouge">S</code> such that <code class="highlighter-rouge">R == T * H + S * F</code>; this involves showing that <code class="highlighter-rouge">R - T * H == S * F</code>, which follows from <code class="highlighter-rouge">R == H * T mod F</code> and in turn <code class="highlighter-rouge">R == H * T mod Fi</code> for all <code class="highlighter-rouge">Fi</code>. See standard textbooks for further details.</p>
<p>With this in place we have our decoding algorithm!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">gao_decoding</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">,</span> <span class="n">max_degree</span><span class="p">,</span> <span class="n">max_error_count</span><span class="p">):</span>
    <span class="c"># interpolate faulty polynomial</span>
    <span class="n">H</span> <span class="o">=</span> <span class="n">lagrange_interpolation</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">)</span>
    <span class="c"># compute f</span>
    <span class="n">F</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">for</span> <span class="n">xi</span> <span class="ow">in</span> <span class="n">points</span><span class="p">:</span>
        <span class="n">Fi</span> <span class="o">=</span> <span class="p">[</span><span class="n">base_sub</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">xi</span><span class="p">),</span> <span class="mi">1</span><span class="p">]</span>
        <span class="n">F</span> <span class="o">=</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">F</span><span class="p">,</span> <span class="n">Fi</span><span class="p">)</span>
    <span class="c"># run EEA-like algorithm on (F,H) to find EEA triple</span>
    <span class="n">R0</span><span class="p">,</span> <span class="n">R1</span> <span class="o">=</span> <span class="n">F</span><span class="p">,</span> <span class="n">H</span>
    <span class="n">S0</span><span class="p">,</span> <span class="n">S1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[]</span>
    <span class="n">T0</span><span class="p">,</span> <span class="n">T1</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">Q</span><span class="p">,</span> <span class="n">R2</span> <span class="o">=</span> <span class="n">poly_divmod</span><span class="p">(</span><span class="n">R0</span><span class="p">,</span> <span class="n">R1</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">deg</span><span class="p">(</span><span class="n">R0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">max_degree</span> <span class="o">+</span> <span class="n">max_error_count</span><span class="p">:</span>
            <span class="n">G</span><span class="p">,</span> <span class="n">leftover</span> <span class="o">=</span> <span class="n">poly_divmod</span><span class="p">(</span><span class="n">R0</span><span class="p">,</span> <span class="n">T0</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">leftover</span> <span class="o">==</span> <span class="p">[]:</span>
                <span class="n">decoded_polynomial</span> <span class="o">=</span> <span class="n">G</span>
                <span class="n">error_locator</span> <span class="o">=</span> <span class="n">T0</span>
                <span class="k">return</span> <span class="n">decoded_polynomial</span><span class="p">,</span> <span class="n">error_locator</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">return</span> <span class="bp">None</span>
        <span class="n">R0</span><span class="p">,</span> <span class="n">S0</span><span class="p">,</span> <span class="n">T0</span><span class="p">,</span> <span class="n">R1</span><span class="p">,</span> <span class="n">S1</span><span class="p">,</span> <span class="n">T1</span> <span class="o">=</span> \
            <span class="n">R1</span><span class="p">,</span> <span class="n">S1</span><span class="p">,</span> <span class="n">T1</span><span class="p">,</span> \
            <span class="n">R2</span><span class="p">,</span> <span class="n">poly_sub</span><span class="p">(</span><span class="n">S0</span><span class="p">,</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">S1</span><span class="p">,</span> <span class="n">Q</span><span class="p">)),</span> <span class="n">poly_sub</span><span class="p">(</span><span class="n">T0</span><span class="p">,</span> <span class="n">poly_mul</span><span class="p">(</span><span class="n">T1</span><span class="p">,</span> <span class="n">Q</span><span class="p">))</span>
</code></pre></div></div>
<p>Note however that it actually does more than promised above: it breaks down gracefully, by returning <code class="highlighter-rouge">None</code> instead of a wrong result, in case our assumption on the maximum number of errors turns out to be false. The intuition behind this is that if the assumption is true then <code class="highlighter-rouge">T</code> by definition is “small” and hence the properties of the EEA triple kick in to imply that the division is the same as <code class="highlighter-rouge">R / T</code>, which by definition of <code class="highlighter-rouge">R</code> has a zero remainder. And vice versa, if the remainder was zero then the returned polynomial is in fact less than the assumed number of errors away from <code class="highlighter-rouge">H</code> and hence <code class="highlighter-rouge">T</code> by definition is “small”. In other words, <code class="highlighter-rouge">None</code> is returned if and only if our assumption was false, which is pretty neat. See <a href="http://www.math.clemson.edu/~sgao/papers/RS.pdf">Gao’02</a> for further details.</p>
<p>Finally, note that it also gives us the error locations in the form of the roots of <code class="highlighter-rouge">T</code>. As mentioned earlier this is very useful from an application point of view, but could also have been obtained by simply comparing the received shares against a re-sharing based on the decoded polynomial.</p>
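<p>As an end-to-end illustration, the following self-contained sketch shares a degree-two polynomial over seven points, corrupts two of the shares, and then recovers both the polynomial and the error locations. It is a compact variant of the decoder rather than a copy of it: the modulus <code class="highlighter-rouge">PRIME = 433</code>, the helper definitions, and the example values are assumptions; the unused <code class="highlighter-rouge">S</code> component is dropped; the degree check is done before each division; and <code class="highlighter-rouge">max_degree</code> is chosen strictly larger than the degree of the shared polynomial so that <code class="highlighter-rouge">len(points) &gt;= max_degree + 2 * max_error_count</code> holds.</p>

```python
PRIME = 433  # small prime modulus, chosen only for illustration

def canonical(A):
    # strip trailing zero coefficients so that [] is the zero polynomial
    A = list(A)
    while A and A[-1] == 0:
        A.pop()
    return A

def poly_add(A, B):
    n = max(len(A), len(B))
    A, B = A + [0] * (n - len(A)), B + [0] * (n - len(B))
    return canonical([(a + b) % PRIME for a, b in zip(A, B)])

def poly_sub(A, B):
    return poly_add(A, [(-b) % PRIME for b in B])

def poly_mul(A, B):
    if not A or not B:
        return []
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] = (C[i + j] + a * b) % PRIME
    return canonical(C)

def poly_divmod(A, B):
    # schoolbook long division: A == quotient * B + remainder, B non-zero
    remainder, B = canonical(A), canonical(B)
    quotient = [0] * max(len(remainder) - len(B) + 1, 0)
    inverse = pow(B[-1], PRIME - 2, PRIME)
    while len(remainder) >= len(B):
        shift = len(remainder) - len(B)
        factor = remainder[-1] * inverse % PRIME
        quotient[shift] = factor
        for i, b in enumerate(B):
            remainder[shift + i] = (remainder[shift + i] - factor * b) % PRIME
        remainder = canonical(remainder)
    return canonical(quotient), remainder

def evaluate(coefficients, x):
    result = 0
    for c in reversed(coefficients):  # Horner's rule
        result = (result * x + c) % PRIME
    return result

def lagrange_interpolation(points, values):
    poly = []
    for i, (xi, yi) in enumerate(zip(points, values)):
        numerator, denominator = [1], 1
        for j, xj in enumerate(points):
            if i == j: continue
            numerator = poly_mul(numerator, [(-xj) % PRIME, 1])
            denominator = denominator * (xi - xj) % PRIME
        scale = yi * pow(denominator, PRIME - 2, PRIME) % PRIME
        poly = poly_add(poly, poly_mul(numerator, [scale]))
    return poly

def gao_decoding(points, values, max_degree, max_error_count):
    assert len(points) >= max_degree + 2 * max_error_count
    H = lagrange_interpolation(points, values)  # potentially faulty polynomial
    F = [1]
    for xi in points:
        F = poly_mul(F, [(-xi) % PRIME, 1])     # F = (X - x1) * ... * (X - xN)
    R0, R1 = F, H
    T0, T1 = [], [1]
    # run the EEA until the remainder is "small"
    while len(R0) - 1 >= max_degree + max_error_count:
        # a zero R1 is treated as the final remainder (e.g. all-zero shares)
        quot, R2 = poly_divmod(R0, R1) if R1 else ([], [])
        R0, T0, R1, T1 = R1, T1, R2, poly_sub(T0, poly_mul(T1, quot))
    G, leftover = poly_divmod(R0, T0)
    return (G, T0) if leftover == [] else None

G = [42, 7, 3]                           # shared polynomial 42 + 7X + 3X^2
points = [1, 2, 3, 4, 5, 6, 7]
values = [evaluate(G, x) for x in points]
values[1] = (values[1] + 100) % PRIME    # corrupt the share at point 2
values[4] = (values[4] + 200) % PRIME    # corrupt the share at point 5

decoded, locator = gao_decoding(points, values, max_degree=3, max_error_count=2)
error_points = [x for x in points if evaluate(locator, x) == 0]
print(decoded, error_points)
```

<p>The error locations are read off as the points at which the returned locator evaluates to zero, matching the description above.</p>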
<h1 id="efficiency-improvements">Efficiency Improvements</h1>
<p>The algorithms presented above have time complexity <code class="highlighter-rouge">Oh(N^2)</code> but are not the most efficient. Based on the <a href="/2017/06/24/secret-sharing-part2/">second part</a> we may straight away see how interpolation can be sped up by using the <a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">Fast Fourier Transform</a> instead of Lagrange’s method. One downside is that we then need to assume that <code class="highlighter-rouge">x1, ..., xN</code> are Fourier points, i.e. with a special structure, and we need to fill in dummy values for the missing shares and hence pay the double price. <a href="https://en.wikipedia.org/wiki/Newton_polynomial">Newton’s method</a> alternatively avoids this constraint while potentially giving better concrete performance than Lagrange’s.</p>
<p>However, there are also other fast interpolation algorithms without these constraints, as detailed in for instance Modern Computer Algebra or <a href="http://cr.yp.to/f2mult/mateer-thesis.pdf">this thesis</a>, which also reduce the asymptotic complexity to <code class="highlighter-rouge">Oh(N * log N)</code>. The former reference also contains fast <code class="highlighter-rouge">Oh(N * log N)</code> methods for arithmetic and the EEA.</p>
<h1 id="next-steps">Next Steps</h1>
<p>The first three posts have been a lot of theory and it’s now time to turn to applications.</p>Morten DahlTL;DR: due to redundancy in the way shares are generated, we can compensate not only for some of them being lost but also for some being manipulated; here we look at how to do this using decoding methods for Reed-Solomon codes.Recent Talks on Privacy2017-08-12T12:00:00+00:002017-08-12T12:00:00+00:00https://mortendahl.github.io/2017/08/12/recent-talks-on-privacy<p>During winter and spring I was fortunate enough to have a few occasions to talk about some of the work done at <a href="https://snips.ai">Snips</a> on applying <a href="https://en.wikipedia.org/wiki/Privacy-enhancing_technologies">privacy-enhancing technologies</a> to concrete problems encountered as a start-up building privacy-aware machine learning systems for mobile devices.</p>
<p>These were mainly centered around the <a href="https://github.com/snipsco/sda"><em>Secure Distributed Aggregator</em></a> (SDA) for learning from user data distributed on mobile devices in a privacy-preserving manner, i.e. without learning any individual data, only the final aggregation, but there was also room for discussion around privacy from a broader perspective, including how it has played into decisions made by the company.</p>
<h1 id="what-privacy-has-meant-for-snips">What Privacy Has Meant For Snips</h1>
<p>Given at the workshop on <a href="http://wwwf.imperial.ac.uk/~nadams/events/ic-rss2017/ic-rss2017.html"><em>Privacy in Statistical Analysis (PSA’17)</em></a>, this invited <a href="https://github.com/mortendahl/privateml/raw/master/talks/PSA17-slides.pdf">talk</a> aimed at giving an industrial perspective on privacy, including how it has played a role at Snips from its beginning. To this end the talk was divided into four areas where privacy had been involved, three of which are briefly discussed below.</p>
<h3 id="accessing-data">Accessing Data</h3>
<p>Access to personal data was essential for the success of its first mobile app, so to ensure that such access was granted, the company decided to earn users’ trust by focusing on privacy. To this end, it was decided to keep all data locally on users’ devices and do the processing there instead of on company servers.</p>
<p>These on-device privacy solutions have the extra benefit of being easy to explain, and may have accounted for the high percentage of users willing to give the mobile app access to sensitive information such as emails, chats, location tracking, and even screen content.</p>
<h3 id="protecting-the-company">Protecting the Company</h3>
<p>By the principle of <a href="https://www.schneier.com/blog/archives/2016/03/data_is_a_toxic.html"><em>Data is a Toxic Asset</em></a>, not storing any user data means less to worry about if company servers are ever compromised. However, some services hosted by third parties, including the company, may build up a set of metadata that in itself could reveal something about the users and e.g. damage reputation. One such example is <em>point-of-interest</em> services where a user reveals their location in order to obtain e.g. a list of nearby restaurants.</p>
<p>Powerful cryptographic techniques, such as the <a href="https://www.torproject.org/">Tor network</a> and <a href="https://en.wikipedia.org/wiki/Private_information_retrieval">private information retrieval</a>, may make it possible for companies to make private versions of these services, yet also impose a significant overhead. Instead, by assuming that the company is generally honest, a more efficient compromise can be reached by shifting the focus from deliberate malicious behaviour to easier problems such as accidental storing or logging.</p>
<p>One concrete approach taken for this was to strip sensitive information at the server entry point so that it was never exposed to subcomponents.</p>
<h3 id="learning-from-data">Learning from Data</h3>
<p>While it is great for user privacy to only have locally stored data sets, it is also relevant for both users and the company to get insights from these, for instance as a way of making cross-user recommendations or getting model feedback.</p>
<p>The key to resolving this apparent contradiction is that often there is no need to share individual data as long as a global view can be computed. A brief comparison between techniques was made, including:</p>
<ul>
<li>
<p><strong>sensor networks</strong>: high performance but requires a lot of coordination between users</p>
</li>
<li>
<p><strong>differential privacy</strong>: high performance and strong privacy guarantees, but a lot of data is needed for the signal to overcome the noise</p>
</li>
<li>
<p><strong>homomorphic encryption</strong>: flexible and explainable, but still not very efficient and has the issue of who’s holding the decryption keys</p>
</li>
<li>
<p><strong>multi-party computation</strong>: flexible and decent performance, but requires several players to distribute trust to</p>
</li>
</ul>
<p>and concluding with the specialised multi-party computation protocol underlying SDA and further detailed below.</p>
<h1 id="private-data-aggregation-on-a-budget">Private Data Aggregation on a Budget</h1>
<p>Given at the workshop on <a href="http://www.multipartycomputation.com/tpmpc-2017"><em>Theory and Practice of Multi-Party Computation (TPMPC’17)</em></a>, this <a href="https://github.com/mortendahl/privateml/raw/master/talks/TPMPC17-slides.pdf">talk</a> was technical in nature in that it presented the <a href="https://eprint.iacr.org/2017/643">SDA protocol</a>, but also aimed at illustrating the problem that a company may experience when wanting to solve a privacy problem by employing a secure multi-party computation (MPC) protocol: namely, that it may find itself to be the only party that is naturally motivated to invest resources into it.</p>
<p>Moreover, to remain open to as many potential other parties as possible, it is interesting to minimise the requirements on these in terms of computation, communication, and coordination. By doing so, parties running e.g. mobile devices or web browsers may be considered. These concerns, however, are not always considered in typical MPC protocols.</p>
<h3 id="community-based-mpc">Community-based MPC</h3>
<p>To this end SDA presents a simple but concrete proposal in a <em>community-based model</em> where members from a community are used as parties.</p>
<p>These parties only have to make a minimum of investment as most of the computation is out-sourced to the company and very little coordination is required between the selected members. Furthermore, a mechanism for distributing work is also presented that allows for lowering the individual load by involving more members.</p>
<p>The result is a practical protocol for <em>aggregating high-dimensional vectors</em> that is suitable for a single company with a community of sporadic members.</p>
<h3 id="applications">Applications</h3>
<p>Concrete and realistic applications were also considered, including analytics, surveys, and place discovery based on users’ location history.</p>
<p>As illustrated, the load on community members in these applications was low enough to be reasonably run on mobile phones and even web browsers.</p>
<p>This work was also presented at <a href="https://pmpml.github.io/PMPML16/"><em>Private Multi-Party Machine Learning (PMPML’16)</em></a> in the form of a <a href="https://github.com/mortendahl/privateml/raw/master/talks/PMPML16-poster.pdf">poster</a>.</p>Morten DahlDuring winter and spring I was fortunate enough to have a few occasions to talk about some of the work done at Snips on applying privacy-enhancing technologies to concrete problems encountered as a start-up building privacy-aware machine learning systems for mobile devices.Secret Sharing, Part 22017-06-24T12:00:00+00:002017-06-24T12:00:00+00:00https://mortendahl.github.io/2017/06/24/secret-sharing-part2<p><em><strong>TL;DR:</strong> efficient secret sharing requires fast polynomial evaluation and interpolation; here we go through what it takes to use the well-known Fast Fourier Transform for this.</em></p>
<p>In the <a href="/2017/06/04/secret-sharing-part1/">first part</a> we looked at Shamir’s scheme, as well as its packed variant where several secrets are shared together. We saw that polynomials lie at the core of both schemes, and that implementation is basically a question of (partially) converting back and forth between two different representations of these. We also gave typical algorithms for doing this.</p>
<p>For this part we will look at somewhat more complex algorithms in an attempt to speed up the computations needed for generating shares. Specifically, we will implement and apply the Fast Fourier Transform, detailing all the essential steps. Performance measurements with <a href="https://github.com/mortendahl/rust-threshold-secret-sharing">our Rust implementation</a> show that this yields orders-of-magnitude efficiency improvements when either the number of shares or the number of secrets is high.</p>
<p>There is also an <a href="https://github.com/mortendahl/privateml/blob/master/secret-sharing/Fast%20Fourier%20Transform.ipynb">associated Python notebook</a> to better see how the code samples fit together in the bigger picture.</p>
<h1 id="polynomials">Polynomials</h1>
<p>If we <a href="/2017/06/04/secret-sharing-part1/">look back</a> at Shamir’s scheme we see that it’s all about polynomials: a random polynomial embedding the secret is sampled and the shares are taken as its values at a certain set of points.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_share</span><span class="p">(</span><span class="n">secret</span><span class="p">):</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="n">sample_shamir_polynomial</span><span class="p">(</span><span class="n">secret</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="n">evaluate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">SHARE_POINTS</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>The same goes for the packed variant, where several secrets are embedded in the sampled polynomial.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">packed_share</span><span class="p">(</span><span class="n">secrets</span><span class="p">):</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="n">sample_packed_polynomial</span><span class="p">(</span><span class="n">secrets</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="n">interpolate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">SHARE_POINTS</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>Notice, however, that they differ slightly in the second step, where the shares are computed: Shamir’s scheme uses <code class="highlighter-rouge">evaluate_at_point</code> while the packed variant uses <code class="highlighter-rouge">interpolate_at_point</code>. The reason is that the sampled polynomial in the former case is in <em>coefficient representation</em> while in the latter it is in <em>point-value representation</em>.</p>
<p>Specifically, we often represent a polynomial <code class="highlighter-rouge">f</code> of degree <code class="highlighter-rouge">D == L-1</code> by a list of <code class="highlighter-rouge">L</code> coefficients <code class="highlighter-rouge">a0, ..., aD</code> such that <code class="highlighter-rouge">f(x) = (a0) + (a1 * x) + (a2 * x^2) + ... + (aD * x^D)</code>. This representation is convenient for many things, including efficiently evaluating the polynomial at a given point using e.g. <a href="https://en.wikipedia.org/wiki/Horner%27s_method">Horner’s method</a>.</p>
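<p>As a minimal sketch of evaluation in coefficient representation, Horner’s method could look as follows. The name <code class="highlighter-rouge">horner_evaluate</code> matches the helper used later in this post, but the body here is only an assumption about what such a helper might look like, and the prime <code class="highlighter-rouge">Q = 433</code> is borrowed from the walk-through example further down.</p>

```python
Q = 433  # example prime, borrowed from the walk-through later in the post

def horner_evaluate(coeffs, point):
    """Evaluate a0 + a1*x + ... + aD*x^D at `point`, modulo Q."""
    result = 0
    # process the coefficients from highest to lowest degree
    for coeff in reversed(coeffs):
        result = (result * point + coeff) % Q
    return result

# 1 + 2*2 + 3*4 + 4*8 = 49
print(horner_evaluate([1, 2, 3, 4], 2))
```

<p>Each evaluation uses one multiplication and one addition per coefficient, so evaluating at a single point takes a number of operations linear in <code class="highlighter-rouge">L</code>.</p>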
<p>However, every such polynomial may also be represented by a set of <code class="highlighter-rouge">L</code> point-value pairs <code class="highlighter-rouge">(p1, v1), ..., (pL, vL)</code> where <code class="highlighter-rouge">vi == f(pi)</code> and all the <code class="highlighter-rouge">pi</code> are distinct. Evaluating the polynomial at a given point is still possible, yet now requires a more involved <em>interpolation</em> procedure that may be less efficient.</p>
<p>But the point-value representation also has several advantages, most importantly that every element intuitively contributes the same amount of information, unlike the coefficient representation where, in the case of secret sharing, a few elements are the actual secrets; this property gives us the privacy guarantee we are after. Moreover, a degree <code class="highlighter-rouge">L-1</code> polynomial may also be represented by <em>more than</em> <code class="highlighter-rouge">L</code> pairs; in this case there is some redundancy in the representation that we may for instance take advantage of in secret sharing (to reconstruct even if some shares are lost) and in coding theory (to decode correctly even if some errors occur during transmission).</p>
<p>The reason this works is that the result of interpolation on a point-value representation with <code class="highlighter-rouge">L</code> pairs is technically speaking defined with respect to the <em>least degree</em> polynomial <code class="highlighter-rouge">g</code> such that <code class="highlighter-rouge">g(pi) == vi</code> for all pairs in the set, which is <a href="https://en.wikipedia.org/wiki/Polynomial_interpolation#Uniqueness_of_the_interpolating_polynomial">unique</a> and has at most degree <code class="highlighter-rouge">L-1</code>. This means that if two point-value representations are generated using the same polynomial <code class="highlighter-rouge">g</code> then interpolation on these will yield identical results, even when the two sets are of different sizes or use different points, since the least degree polynomial is the same.</p>
<p>It is also why we can use the two representations somewhat interchangeably: if a point-value representation with <code class="highlighter-rouge">L</code> pairs was generated by a degree <code class="highlighter-rouge">L-1</code> polynomial <code class="highlighter-rouge">f</code>, then the unique least degree polynomial agreeing with these must be <code class="highlighter-rouge">f</code>. And since, for a fixed set of points, the set of coefficient lists of length <code class="highlighter-rouge">L</code> and the set of value lists of length <code class="highlighter-rouge">L</code> have the same cardinality (in our case <code class="highlighter-rouge">Q^L</code>) we must have a bijection between them.</p>
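<p>To make this concrete, here is a small sketch using textbook Lagrange interpolation over a prime field. The helper names <code class="highlighter-rouge">evaluate</code> and <code class="highlighter-rouge">lagrange_at_point</code>, the example polynomial, and the point sets are all made up for illustration and are not taken from the implementation discussed in this post. It checks that two point-value representations of different sizes, generated from the same degree 3 polynomial, interpolate to the same value.</p>

```python
Q = 433  # example prime

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at x, modulo Q."""
    return sum(c * pow(x, e, Q) for e, c in enumerate(coeffs)) % Q

def lagrange_at_point(points, values, x):
    """Value at x of the least degree polynomial through the given pairs."""
    result = 0
    for i, (pi, vi) in enumerate(zip(points, values)):
        num, den = 1, 1
        for j, pj in enumerate(points):
            if j != i:
                num = num * (x - pj) % Q
                den = den * (pi - pj) % Q
        # divide by den using Fermat's little theorem (Q is prime)
        result = (result + vi * num * pow(den, Q - 2, Q)) % Q
    return result

f = [7, 0, 5, 11]                 # hypothetical degree 3 polynomial
small = [1, 2, 3, 4]              # exactly L = 4 points
large = [5, 6, 7, 8, 9, 10]       # more than L points, and different ones

x = 100
from_small = lagrange_at_point(small, [evaluate(f, p) for p in small], x)
from_large = lagrange_at_point(large, [evaluate(f, p) for p in large], x)
assert from_small == from_large == evaluate(f, x)
```

<p>Both representations recover the value of <code class="highlighter-rouge">f</code> at the new point because the least degree polynomial agreeing with either set of pairs is <code class="highlighter-rouge">f</code> itself.</p>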
<h1 id="fast-fourier-transform">Fast Fourier Transform</h1>
<p>With the two representations of polynomials in mind we move on to how the <a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">Fast Fourier Transform</a> (FFT) over finite fields – <em>also known as the <a href="https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)#Number-theoretic_transform">Number Theoretic Transform</a> (NTT)</em> – can be used to perform efficient conversion between them. And for me the best way of understanding this is through an example that can later be generalised into an algorithm.</p>
<h2 id="walk-through-example">Walk-through example</h2>
<p>Recall that all our computations happen in a prime field determined by a fixed prime <code class="highlighter-rouge">Q</code>, i.e. using the numbers <code class="highlighter-rouge">0, 1, ..., Q-1</code>. In this example we will use <code class="highlighter-rouge">Q = 433</code>, whose order <code class="highlighter-rouge">Q-1</code> is divisible by <code class="highlighter-rouge">4</code>: <code class="highlighter-rouge">Q-1 == 432 == 4 * k</code> with <code class="highlighter-rouge">k = 108</code>.</p>
<p>Assume then that we have a polynomial <code class="highlighter-rouge">A(x) = 1 + 2x + 3x^2 + 4x^3</code> over this field with <code class="highlighter-rouge">L == 4</code> coefficients and degree <code class="highlighter-rouge">L-1 == 3</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">A_coeffs</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span> <span class="p">]</span>
</code></pre></div></div>
<p>Our goal is to turn this list of coefficients into a list of values <code class="highlighter-rouge">[ A(w0), A(w1), A(w2), A(w3) ]</code> of equal length, for points <code class="highlighter-rouge">w = [w0, w1, w2, w3]</code>.</p>
<p>The standard way of evaluating polynomials is of course one way of doing this, which using Horner’s rule can be done in a total of <code class="highlighter-rouge">Oh(L * L)</code> operations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">A</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">horner_evaluate</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
<span class="k">assert</span><span class="p">([</span> <span class="n">A</span><span class="p">(</span><span class="n">wi</span><span class="p">)</span> <span class="k">for</span> <span class="n">wi</span> <span class="ow">in</span> <span class="n">w</span> <span class="p">]</span>
<span class="o">==</span> <span class="p">[</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">73</span><span class="p">,</span> <span class="mi">431</span><span class="p">,</span> <span class="mi">356</span> <span class="p">])</span>
</code></pre></div></div>
<p>But as we will see, the FFT allows us to do so more efficiently when the length is sufficiently large and the points are chosen with a certain structure; asymptotically we can compute the values in <code class="highlighter-rouge">Oh(L * log L)</code> operations.</p>
<p>The first insight we need is that there is an alternative evaluation strategy that breaks <code class="highlighter-rouge">A</code> into two smaller polynomials. In particular, if we define polynomials <code class="highlighter-rouge">B(y) = 1 + 3y</code> and <code class="highlighter-rouge">C(y) = 2 + 4y</code> by taking every other coefficient from <code class="highlighter-rouge">A</code> then we have <code class="highlighter-rouge">A(x) == B(x * x) + x * C(x * x)</code>, which is straight-forward to verify by simply writing out the right-hand side.</p>
<p>This means that if we know values of <code class="highlighter-rouge">B(y)</code> and <code class="highlighter-rouge">C(y)</code> at the <em>squares</em> <code class="highlighter-rouge">v</code> of the <code class="highlighter-rouge">w</code> points, then we can use these to compute the values of <code class="highlighter-rouge">A(x)</code> at the <code class="highlighter-rouge">w</code> points using table look-ups: <code class="highlighter-rouge">A_values[i] = B_values[i] + w[i] * C_values[i]</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># split A into B and C</span>
<span class="n">B_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">0</span><span class="p">::</span><span class="mi">2</span><span class="p">]</span> <span class="c"># == [ 1, 3 ]</span>
<span class="n">C_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">1</span><span class="p">::</span><span class="mi">2</span><span class="p">]</span> <span class="c"># == [ 2, 4 ]</span>
<span class="c"># square the w points</span>
<span class="n">v</span> <span class="o">=</span> <span class="p">[</span> <span class="n">wi</span> <span class="o">*</span> <span class="n">wi</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">wi</span> <span class="ow">in</span> <span class="n">w</span> <span class="p">]</span>
<span class="c"># somehow compute the values of B and C at the v points</span>
<span class="c"># ...</span>
<span class="k">assert</span><span class="p">(</span> <span class="n">B_values</span> <span class="o">==</span> <span class="p">[</span> <span class="n">B</span><span class="p">(</span><span class="n">vi</span><span class="p">)</span> <span class="k">for</span> <span class="n">vi</span> <span class="ow">in</span> <span class="n">v</span> <span class="p">]</span> <span class="p">)</span>
<span class="k">assert</span><span class="p">(</span> <span class="n">C_values</span> <span class="o">==</span> <span class="p">[</span> <span class="n">C</span><span class="p">(</span><span class="n">vi</span><span class="p">)</span> <span class="k">for</span> <span class="n">vi</span> <span class="ow">in</span> <span class="n">v</span> <span class="p">]</span> <span class="p">)</span>
<span class="c"># combine results into values of A at the w points</span>
<span class="n">A_values</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span> <span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">w</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="p">]</span>
<span class="k">assert</span><span class="p">(</span> <span class="n">A_values</span> <span class="o">==</span> <span class="p">[</span> <span class="n">A</span><span class="p">(</span><span class="n">wi</span><span class="p">)</span> <span class="k">for</span> <span class="n">wi</span> <span class="ow">in</span> <span class="n">w</span> <span class="p">]</span> <span class="p">)</span>
</code></pre></div></div>
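<p>The “somehow compute” step above can be filled in naively to obtain a runnable check of the split-and-combine idea. In this sketch the values of <code class="highlighter-rouge">B</code> and <code class="highlighter-rouge">C</code> are simply computed with Horner’s rule, so nothing is saved yet; the helper <code class="highlighter-rouge">evaluate</code> and the four points are arbitrary choices made for illustration only.</p>

```python
Q = 433
A_coeffs = [1, 2, 3, 4]
w = [5, 17, 100, 321]  # arbitrary distinct points, chosen only for this check

def evaluate(coeffs, x):
    """Horner evaluation modulo Q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % Q
    return result

# split A into B and C by taking every other coefficient
B_coeffs = A_coeffs[0::2]
C_coeffs = A_coeffs[1::2]

# square the w points and evaluate B and C there (naively, for now)
v = [wi * wi % Q for wi in w]
B_values = [evaluate(B_coeffs, vi) for vi in v]
C_values = [evaluate(C_coeffs, vi) for vi in v]

# combine into values of A at the w points
A_values = [(B_values[i] + w[i] * C_values[i]) % Q for i in range(len(w))]
assert A_values == [evaluate(A_coeffs, wi) for wi in w]
```

<p>The check passes for any choice of points, since the identity <code class="highlighter-rouge">A(x) == B(x * x) + x * C(x * x)</code> holds for every <code class="highlighter-rouge">x</code>.</p>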
<p>So far we haven’t saved much, but the second insight fixes that: by picking the points <code class="highlighter-rouge">w</code> to be the elements of a subgroup of order 4, the <code class="highlighter-rouge">v</code> points used for <code class="highlighter-rouge">B</code> and <code class="highlighter-rouge">C</code> will form a subgroup of order 2 due to the squaring; hence, we will have <code class="highlighter-rouge">v[0] == v[2]</code> and <code class="highlighter-rouge">v[1] == v[3]</code> and so only need the first halves of <code class="highlighter-rouge">B_values</code> and <code class="highlighter-rouge">C_values</code> – as such we have cut the subproblems in half!</p>
<p>Such subgroups are typically characterized by a generator, i.e. an element of the field that when raised to powers will take on exactly the values of the subgroup elements. Historically such generators are denoted by the omega symbol so let’s follow that convention here as well.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># generator of subgroup of order 4</span>
<span class="n">omega4</span> <span class="o">=</span> <span class="mi">179</span>
<span class="n">w</span> <span class="o">=</span> <span class="p">[</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega4</span><span class="p">,</span> <span class="n">e</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span> <span class="p">]</span>
<span class="k">assert</span><span class="p">(</span> <span class="n">w</span> <span class="o">==</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">179</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">254</span><span class="p">]</span> <span class="p">)</span>
</code></pre></div></div>
<p>We shall return to how to find such a generator below, but note that once we know one of order 4 then it’s easy to find one of order 2: we simply square.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># generator of subgroup of order 2</span>
<span class="n">omega2</span> <span class="o">=</span> <span class="n">omega4</span> <span class="o">*</span> <span class="n">omega4</span> <span class="o">%</span> <span class="n">Q</span>
<span class="n">v</span> <span class="o">=</span> <span class="p">[</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega2</span><span class="p">,</span> <span class="n">e</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="p">]</span>
<span class="k">assert</span><span class="p">(</span> <span class="n">v</span> <span class="o">==</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">432</span><span class="p">]</span> <span class="p">)</span>
</code></pre></div></div>
<p>As a quick test we may also check that the orders are indeed as claimed. Specifically, if we keep raising <code class="highlighter-rouge">omega4</code> to higher powers then we expect to keep visiting the same four numbers, and likewise we expect to keep visiting the same two numbers for <code class="highlighter-rouge">omega2</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">assert</span><span class="p">(</span> <span class="p">[</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega4</span><span class="p">,</span> <span class="n">e</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="p">]</span> <span class="o">==</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">179</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">254</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">179</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">254</span><span class="p">]</span> <span class="p">)</span>
<span class="k">assert</span><span class="p">(</span> <span class="p">[</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega2</span><span class="p">,</span> <span class="n">e</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span> <span class="p">]</span> <span class="o">==</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">432</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">432</span><span class="p">]</span> <span class="p">)</span>
</code></pre></div></div>
<p>Using generators we also see that there is no need to explicitly calculate the lists <code class="highlighter-rouge">w</code> and <code class="highlighter-rouge">v</code> anymore as they are now implicitly defined by the generator. So, with these changes we come back to our mission of computing the values of <code class="highlighter-rouge">A</code> at the points determined by the powers of <code class="highlighter-rouge">omega4</code>, which may then be done via <code class="highlighter-rouge">A_values[i] = B_values[i % 2] + pow(omega4, i, Q) * C_values[i % 2]</code>.</p>
<p>The third and final insight we need is that we can of course continue this process of dividing the polynomial in half: to compute e.g. <code class="highlighter-rouge">B_values</code> we break <code class="highlighter-rouge">B</code> into two polynomials <code class="highlighter-rouge">D</code> and <code class="highlighter-rouge">E</code> and then follow the same procedure; in this case <code class="highlighter-rouge">D</code> and <code class="highlighter-rouge">E</code> will be simple constants but it works in the general case as well. The only requirement is that the length <code class="highlighter-rouge">L</code> is a power of 2 and that we can find a generator <code class="highlighter-rouge">omegaL</code> of a subgroup of this size.</p>
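<p>As a sketch of this combination step on the running example, the snippet below computes only two values each for <code class="highlighter-rouge">B</code> and <code class="highlighter-rouge">C</code> and reuses them via <code class="highlighter-rouge">i % 2</code>. The helper <code class="highlighter-rouge">evaluate</code> is a stand-in introduced here for illustration.</p>

```python
Q = 433
omega4 = 179
omega2 = omega4 * omega4 % Q

B_coeffs = [1, 3]  # B(y) = 1 + 3y
C_coeffs = [2, 4]  # C(y) = 2 + 4y

def evaluate(coeffs, x):
    """Horner evaluation modulo Q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % Q
    return result

# only two values are needed for each half, at the powers of omega2
B_values = [evaluate(B_coeffs, pow(omega2, e, Q)) for e in range(2)]
C_values = [evaluate(C_coeffs, pow(omega2, e, Q)) for e in range(2)]

# combine, reusing each half-size value twice
A_values = [
    (B_values[i % 2] + pow(omega4, i, Q) * C_values[i % 2]) % Q
    for i in range(4)
]
assert A_values == [10, 73, 431, 356]
```

<p>The result agrees with the values obtained earlier by direct evaluation, while the two subproblems were only evaluated at half as many points.</p>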
<h2 id="algorithm-for-powers-of-2">Algorithm for powers of 2</h2>
<p>Putting the above into an algorithm we get the following, where <code class="highlighter-rouge">omega</code> is assumed to be a generator of order <code class="highlighter-rouge">len(A_coeffs)</code>. Note that some typical optimizations are omitted for clarity (but see e.g. <a href="https://github.com/mortendahl/privateml/blob/master/secret-sharing/Fast%20Fourier%20Transform.ipynb">the Python notebook</a>).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fft2_forward</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">,</span> <span class="n">omega</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">A_coeffs</span>
    <span class="c"># split A into B and C such that A(x) = B(x^2) + x * C(x^2)</span>
    <span class="n">B_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">0</span><span class="p">::</span><span class="mi">2</span><span class="p">]</span>
    <span class="n">C_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">1</span><span class="p">::</span><span class="mi">2</span><span class="p">]</span>
    <span class="c"># apply recursively</span>
    <span class="n">omega_squared</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
    <span class="n">B_values</span> <span class="o">=</span> <span class="n">fft2_forward</span><span class="p">(</span><span class="n">B_coeffs</span><span class="p">,</span> <span class="n">omega_squared</span><span class="p">)</span>
    <span class="n">C_values</span> <span class="o">=</span> <span class="n">fft2_forward</span><span class="p">(</span><span class="n">C_coeffs</span><span class="p">,</span> <span class="n">omega_squared</span><span class="p">)</span>
    <span class="c"># combine subresults</span>
    <span class="n">A_values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span>
    <span class="n">L_half</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span> <span class="o">//</span> <span class="mi">2</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">L_half</span><span class="p">):</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">i</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">A_values</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">L_half</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">A_values</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">A_values</span>
</code></pre></div></div>
<p>With this procedure we may convert a polynomial in coefficient form to its point-value form, i.e. evaluate the polynomial, in <code class="highlighter-rouge">Oh(L * log L)</code> operations.</p>
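<p>As an illustration of the kind of optimisation that can be layered on top, the sketch below avoids the two <code class="highlighter-rouge">pow</code> calls per iteration. It relies on the fact that a generator <code class="highlighter-rouge">omega</code> of even order <code class="highlighter-rouge">L</code> satisfies <code class="highlighter-rouge">omega^(L/2) == -1</code> in the field, so the second half of the outputs only differs by a sign. The function name <code class="highlighter-rouge">fft2_forward_butterfly</code> is made up here, and this is not necessarily one of the optimisations used in the notebook.</p>

```python
Q = 433

def fft2_forward_butterfly(A_coeffs, omega):
    """Radix-2 forward transform; len(A_coeffs) must be a power of 2
    and omega a generator of a subgroup of that order modulo Q."""
    if len(A_coeffs) == 1:
        return list(A_coeffs)
    omega_squared = omega * omega % Q
    B_values = fft2_forward_butterfly(A_coeffs[0::2], omega_squared)
    C_values = fft2_forward_butterfly(A_coeffs[1::2], omega_squared)
    L_half = len(A_coeffs) // 2
    A_values = [0] * len(A_coeffs)
    x = 1  # running power omega^i, updated by one multiplication per step
    for i in range(L_half):
        t = x * C_values[i] % Q
        A_values[i] = (B_values[i] + t) % Q
        # omega^(i + L/2) == -omega^i, hence the minus sign
        A_values[i + L_half] = (B_values[i] - t) % Q
        x = x * omega % Q
    return A_values

# the running example: A(x) = 1 + 2x + 3x^2 + 4x^3 with omega4 = 179
assert fft2_forward_butterfly([1, 2, 3, 4], 179) == [10, 73, 431, 356]
```

<p>The output matches the values computed by direct evaluation in the walk-through, which gives some confidence that the shortcut is consistent with the procedure above.</p>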
<p>The freedom we gave up to achieve this is that the number of coefficients <code class="highlighter-rouge">L</code> must now be a power of 2; but of course, some of them may be zero so we are still free to choose the degree of the polynomial as we wish up to <code class="highlighter-rouge">L-1</code>. Also, we are no longer free to choose any set of evaluation points but have to choose a set with a certain subgroup structure.</p>
<p>Finally, it turns out that we can also use the above procedure to go in the opposite direction, from point-value form to coefficient form, i.e. to interpolate the least degree polynomial. This is done by essentially treating the values as coefficients, running the forward procedure with the inverse of <code class="highlighter-rouge">omega</code>, and scaling the result by the inverse of the length; we won’t go into the details of why this works here.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fft2_backward</span><span class="p">(</span><span class="n">A_values</span><span class="p">,</span> <span class="n">omega</span><span class="p">):</span>
    <span class="n">L_inv</span> <span class="o">=</span> <span class="n">inverse</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">A_values</span><span class="p">))</span>
    <span class="c"># run the forward procedure on the values with the inverse generator, then scale</span>
    <span class="n">A_coeffs</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span><span class="n">a</span> <span class="o">*</span> <span class="n">L_inv</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">fft2_forward</span><span class="p">(</span><span class="n">A_values</span><span class="p">,</span> <span class="n">inverse</span><span class="p">(</span><span class="n">omega</span><span class="p">))</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">A_coeffs</span>
</code></pre></div></div>
<p>Here, however, we may feel a stronger impact of the constraints implied by the FFT: while we can use zero coefficients to “patch up” the coefficient representation of a lower degree polynomial to make its length match our target length <code class="highlighter-rouge">L</code> while keeping its identity, we cannot simply add e.g. zero pairs to a point-value representation as it may change the implicit least degree polynomial; as we will see in the next blog post this has implications for our application to secret sharing if we also want to use the FFT for reconstruction.</p>
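<p>Returning to the backward direction, the following sketch spells out the “treat the values as coefficients, then scale” recipe without any recursion, using a quadratic-time sum instead of the FFT. The names <code class="highlighter-rouge">naive_backward</code> and this particular <code class="highlighter-rouge">inverse</code> (via Fermat’s little theorem, which assumes <code class="highlighter-rouge">Q</code> is prime) are assumptions made for illustration and may differ from the helpers used elsewhere in this series.</p>

```python
Q = 433

def inverse(x):
    # modular inverse via Fermat's little theorem; assumes Q is prime
    return pow(x, Q - 2, Q)

def naive_backward(A_values, omega):
    """Quadratic-time inverse transform, for checking purposes only."""
    L = len(A_values)
    L_inv = inverse(L)
    omega_inv = inverse(omega)
    A_coeffs = []
    for k in range(L):
        # evaluate the "polynomial" with coefficients A_values at omega_inv^k
        total = sum(v * pow(omega_inv, j * k, Q) for j, v in enumerate(A_values)) % Q
        # scale by the inverse of the length
        A_coeffs.append(total * L_inv % Q)
    return A_coeffs

# values of A(x) = 1 + 2x + 3x^2 + 4x^3 at the powers of omega4 = 179
assert naive_backward([10, 73, 431, 356], 179) == [1, 2, 3, 4]
```

<p>Starting from the values found in the walk-through, this recovers the original coefficient list, which is the behaviour expected of <code class="highlighter-rouge">fft2_backward</code> as well.</p>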
<h2 id="algorithm-for-powers-of-3">Algorithm for powers of 3</h2>
<p>Unsurprisingly there is nothing in the principles behind the FFT that means it will only work for powers of 2, and other bases can indeed be used as well. Luckily perhaps, since this plays a big part in our application to secret sharing as we will see below.</p>
<p>To adapt the FFT algorithm to powers of 3 we instead assume that the list of coefficients of <code class="highlighter-rouge">A</code> has such a length, and split it into three polynomials <code class="highlighter-rouge">B</code>, <code class="highlighter-rouge">C</code>, and <code class="highlighter-rouge">D</code> such that <code class="highlighter-rouge">A(x) = B(x^3) + x * C(x^3) + x^2 * D(x^3)</code>, and we use the cube of <code class="highlighter-rouge">omega</code> in the recursive calls instead of the square. Here <code class="highlighter-rouge">omega</code> is again assumed to be a generator of order <code class="highlighter-rouge">len(A_coeffs)</code>, but this time a power of 3.</p>
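<p>Before turning to the algorithm itself, the three-way split can be sanity checked in isolation. The sketch below uses a made-up polynomial with nine coefficients and a stand-in helper <code class="highlighter-rouge">evaluate</code>, and verifies the identity at every element of the field for <code class="highlighter-rouge">Q = 433</code>.</p>

```python
Q = 433
A_coeffs = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical example of length 3^2

def evaluate(coeffs, x):
    """Horner evaluation modulo Q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % Q
    return result

# take every third coefficient, starting at offsets 0, 1, and 2
B_coeffs = A_coeffs[0::3]
C_coeffs = A_coeffs[1::3]
D_coeffs = A_coeffs[2::3]

# check A(x) == B(x^3) + x * C(x^3) + x^2 * D(x^3) for all x in the field
for x in range(Q):
    x3 = pow(x, 3, Q)
    combined = (
        evaluate(B_coeffs, x3)
        + x * evaluate(C_coeffs, x3)
        + x * x * evaluate(D_coeffs, x3)
    ) % Q
    assert combined == evaluate(A_coeffs, x)
```

<p>As in the power of 2 case, the identity holds for any point; the subgroup structure only matters for making the recursive subproblems smaller.</p>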
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fft3_forward</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">,</span> <span class="n">omega</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">A_coeffs</span>
    <span class="c"># split A into B, C, and D such that A(x) = B(x^3) + x * C(x^3) + x^2 * D(x^3)</span>
    <span class="n">B_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">0</span><span class="p">::</span><span class="mi">3</span><span class="p">]</span>
    <span class="n">C_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">1</span><span class="p">::</span><span class="mi">3</span><span class="p">]</span>
    <span class="n">D_coeffs</span> <span class="o">=</span> <span class="n">A_coeffs</span><span class="p">[</span><span class="mi">2</span><span class="p">::</span><span class="mi">3</span><span class="p">]</span>
    <span class="c"># apply recursively</span>
    <span class="n">omega_cubed</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
    <span class="n">B_values</span> <span class="o">=</span> <span class="n">fft3_forward</span><span class="p">(</span><span class="n">B_coeffs</span><span class="p">,</span> <span class="n">omega_cubed</span><span class="p">)</span>
    <span class="n">C_values</span> <span class="o">=</span> <span class="n">fft3_forward</span><span class="p">(</span><span class="n">C_coeffs</span><span class="p">,</span> <span class="n">omega_cubed</span><span class="p">)</span>
    <span class="n">D_values</span> <span class="o">=</span> <span class="n">fft3_forward</span><span class="p">(</span><span class="n">D_coeffs</span><span class="p">,</span> <span class="n">omega_cubed</span><span class="p">)</span>
    <span class="c"># combine subresults</span>
    <span class="n">A_values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span>
    <span class="n">L_third</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">A_coeffs</span><span class="p">)</span> <span class="o">//</span> <span class="mi">3</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">L_third</span><span class="p">):</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">i</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">xx</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">A_values</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">xx</span> <span class="o">*</span> <span class="n">D_values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">L_third</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">xx</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">A_values</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">xx</span> <span class="o">*</span> <span class="n">D_values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">L_third</span> <span class="o">+</span> <span class="n">L_third</span>
        <span class="n">x</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">omega</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">xx</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">Q</span><span class="p">)</span>
        <span class="n">A_values</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">B_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span> <span class="o">*</span> <span class="n">C_values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">xx</span> <span class="o">*</span> <span class="n">D_values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">A_values</span>
</code></pre></div></div>
<p>And again we may go in the opposite direction and perform interpolation by simply treating the values as coefficients and performing a scaling.</p>
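<p>As a minimal sketch of that backward direction, the block below restates the forward FFT for powers of 3 in compact form and derives <code class="highlighter-rouge">fft3_backward</code> from it. The body of <code class="highlighter-rouge">fft3_backward</code>, the brute-force search for <code class="highlighter-rouge">OMEGA9</code>, and the choice <code class="highlighter-rouge">Q = 433</code> with a list of length 9 are assumptions made for illustration only.</p>

```python
Q = 433  # prime from the walk-through example; Q - 1 = 432 is divisible by 9

# assumption: find an element of order exactly 9 by brute force
OMEGA9 = next(x for x in range(2, Q) if pow(x, 9, Q) == 1 and pow(x, 3, Q) != 1)

def fft3_forward(A_coeffs, omega):
    if len(A_coeffs) == 1:
        return A_coeffs
    # recurse on the three interleaved sublists using the cube of omega
    omega_cubed = pow(omega, 3, Q)
    B_values = fft3_forward(A_coeffs[0::3], omega_cubed)
    C_values = fft3_forward(A_coeffs[1::3], omega_cubed)
    D_values = fft3_forward(A_coeffs[2::3], omega_cubed)
    # combine: A(x) = B(x^3) + x * C(x^3) + x^2 * D(x^3) at x = omega^j
    L_third = len(A_coeffs) // 3
    A_values = [0] * len(A_coeffs)
    for j in range(len(A_coeffs)):
        i = j % L_third
        x = pow(omega, j, Q)
        A_values[j] = (B_values[i] + x * C_values[i] + x * x * D_values[i]) % Q
    return A_values

def fft3_backward(A_values, omega):
    # treat the values as coefficients, use the inverse generator,
    # and scale the result by the inverse of the length
    L_inv = pow(len(A_values), Q - 2, Q)
    omega_inv = pow(omega, Q - 2, Q)
    A_coeffs = fft3_forward(A_values, omega_inv)
    return [(a * L_inv) % Q for a in A_coeffs]

coeffs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
values = fft3_forward(coeffs, OMEGA9)
recovered = fft3_backward(values, OMEGA9)
```

<p>Here <code class="highlighter-rouge">recovered</code> should equal <code class="highlighter-rouge">coeffs</code>, since the inverses are computed via Fermat's little theorem in the prime field.</p>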
<h2 id="optimizations">Optimizations</h2>
<p>For ease of presentation we have omitted some typical optimizations here, perhaps most notably the fact that for powers of 2 we have the property that <code class="highlighter-rouge">pow(omega, i, Q) == -pow(omega, i + L/2, Q)</code> modulo <code class="highlighter-rouge">Q</code>, meaning we can cut the number of exponentiations in <code class="highlighter-rouge">fft2</code> in half compared to what we did above.</p>
<p>More interestingly, the FFTs can also be run in-place, reusing the list in which the input is provided. This saves memory allocations and has a significant impact on performance. Likewise, we may gain improvements by switching to another number representation such as <a href="https://en.wikipedia.org/wiki/Montgomery_modular_multiplication">Montgomery form</a>. Both of these approaches are described in further detail <a href="https://medium.com/snips-ai/optimizing-threshold-secret-sharing-c877901231e5">elsewhere</a>.</p>
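<p>A hedged sketch of how this first optimization might look: each power of <code class="highlighter-rouge">omega</code> is computed once and reused, with a sign flip, for the second half of the output. The function body, <code class="highlighter-rouge">Q = 433</code>, and the brute-force search for <code class="highlighter-rouge">OMEGA8</code> are illustrative assumptions rather than the code used for the measurements.</p>

```python
Q = 433  # prime from the walk-through example; Q - 1 = 432 is divisible by 8

# assumption: find an element of order exactly 8 by brute force
OMEGA8 = next(x for x in range(2, Q) if pow(x, 8, Q) == 1 and pow(x, 4, Q) != 1)

def fft2_forward(A_coeffs, omega):
    if len(A_coeffs) == 1:
        return A_coeffs
    # split A into B and C such that A(x) = B(x^2) + x * C(x^2)
    omega_squared = pow(omega, 2, Q)
    B_values = fft2_forward(A_coeffs[0::2], omega_squared)
    C_values = fft2_forward(A_coeffs[1::2], omega_squared)
    L_half = len(A_coeffs) // 2
    A_values = [0] * len(A_coeffs)
    for i in range(L_half):
        # one exponentiation serves both halves,
        # since pow(omega, i + L_half, Q) is the negation of pow(omega, i, Q)
        x_times_C = pow(omega, i, Q) * C_values[i]
        A_values[i] = (B_values[i] + x_times_C) % Q
        A_values[i + L_half] = (B_values[i] - x_times_C) % Q
    return A_values

coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
values = fft2_forward(coeffs, OMEGA8)
```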
<h1 id="application-to-secret-sharing">Application to Secret Sharing</h1>
<p>We can now return to applying the FFT to the secret sharing schemes. As mentioned earlier, using this instead of the more traditional approaches makes most sense when the vectors we are dealing with are above a certain size, such as if we are generating many shares or sharing many secrets together.</p>
<h2 id="shamirs-scheme">Shamir’s scheme</h2>
<p>In this scheme we can easily sample our polynomial directly in coefficient representation, and hence the FFT is only relevant in the second step where we generate the shares. Concretely, we can directly sample the polynomial with the desired number of coefficients to match our privacy threshold, and add extra zeros to get a number of coefficients matching the number of shares we want; below the former list is denoted as <code class="highlighter-rouge">small</code> and the latter as <code class="highlighter-rouge">large</code>. We then apply the forward FFT to turn this into a list of values that we take as the shares.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_share</span><span class="p">(</span><span class="n">secret</span><span class="p">):</span>
    <span class="n">small_coeffs</span> <span class="o">=</span> <span class="p">[</span><span class="n">secret</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">T</span><span class="p">)]</span>
    <span class="n">large_coeffs</span> <span class="o">=</span> <span class="n">small_coeffs</span> <span class="o">+</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">ORDER_LARGE</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">small_coeffs</span><span class="p">))</span>
    <span class="n">large_values</span> <span class="o">=</span> <span class="n">fft3_forward</span><span class="p">(</span><span class="n">large_coeffs</span><span class="p">,</span> <span class="n">OMEGA_LARGE</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="n">large_values</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>Besides the privacy threshold <code class="highlighter-rouge">T</code> and the number of shares <code class="highlighter-rouge">N</code>, the parameters needed for the scheme are hence a prime <code class="highlighter-rouge">Q</code> and a generator <code class="highlighter-rouge">OMEGA_LARGE</code> of order <code class="highlighter-rouge">ORDER_LARGE == N + 1</code>.</p>
<p>Note that we’ve used the FFT for powers of 3 here to be consistent with the next scheme; the FFT for powers of 2 would of course also have worked.</p>
<h2 id="packed-scheme">Packed scheme</h2>
<p>Recall that for this scheme it is less obvious how we can sample our polynomial directly in coefficient representation, and hence we instead do so in point-value representation. Specifically, we first use the backward FFT for powers of 2 to turn such a polynomial into coefficient representation, and then as above use the forward FFT for powers of 3 on this to generate the shares.</p>
<p>We are hence dealing with two sets of points: those used during sampling, and those used during share generation – and these cannot overlap! If they did the privacy guarantee would no longer be satisfied and some of the shares might literally equal some of the secrets.</p>
<p>Preventing this from happening is the reason we use the two different bases 2 and 3: by picking co-prime bases, i.e. <code class="highlighter-rouge">gcd(2, 3) == 1</code>, the subgroups will only have the point 1 in common (as the two generators raised to the zeroth power). As such we are safe if we simply make sure to exclude the value at point 1 from being used. Recalling our walk-through example, this is the reason we used prime <code class="highlighter-rouge">Q == 433</code>, since the order <code class="highlighter-rouge">Q-1 == 432 == 4 * 9 * k</code> of its multiplicative group is divisible by both a power of 2 and a power of 3.</p>
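<p>This claim can be checked directly for the walk-through prime. The sketch below enumerates a subgroup of order 8 and one of order 9 in the field with <code class="highlighter-rouge">Q = 433</code> and intersects them; the helper <code class="highlighter-rouge">subgroup</code> and the brute-force generator search are hypothetical and only meant for small examples.</p>

```python
Q = 433  # Q - 1 = 432 = 16 * 27

def subgroup(order, prime):
    # assumption: `order` is a power of `prime`; an element x has exactly this
    # order if x^order == 1 while x^(order / prime) != 1
    omega = next(x for x in range(2, Q)
                 if pow(x, order, Q) == 1 and pow(x, order // prime, Q) != 1)
    return {pow(omega, i, Q) for i in range(order)}

points_small = subgroup(8, 2)   # points used when sampling
points_large = subgroup(9, 3)   # points used when generating shares
common = points_small.intersection(points_large)
```

<p>With co-prime orders the only shared point is <code class="highlighter-rouge">1</code>, which is why it is the single value that has to be excluded.</p>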
<p>So to do sharing we first sample the values of the polynomial, fixing the value at point 1 to be a constant (in this case zero). Using the backward FFT we then turn this into a <code class="highlighter-rouge">small</code> list of coefficients, which we then as in Shamir’s scheme extend with zero coefficients to get a <code class="highlighter-rouge">large</code> list of coefficients suitable for running through the forward FFT. Finally, since the first value obtained from this corresponds to point 1, and hence is the same as the constant used before, we remove it before returning the values as shares.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">packed_share</span><span class="p">(</span><span class="n">secrets</span><span class="p">):</span>
    <span class="n">small_values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">secrets</span> <span class="o">+</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">T</span><span class="p">)]</span>
    <span class="n">small_coeffs</span> <span class="o">=</span> <span class="n">fft2_backward</span><span class="p">(</span><span class="n">small_values</span><span class="p">,</span> <span class="n">OMEGA_SMALL</span><span class="p">)</span>
    <span class="n">large_coeffs</span> <span class="o">=</span> <span class="n">small_coeffs</span> <span class="o">+</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">ORDER_LARGE</span> <span class="o">-</span> <span class="n">ORDER_SMALL</span><span class="p">)</span>
    <span class="n">large_values</span> <span class="o">=</span> <span class="n">fft3_forward</span><span class="p">(</span><span class="n">large_coeffs</span><span class="p">,</span> <span class="n">OMEGA_LARGE</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="n">large_values</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>For this scheme, besides <code class="highlighter-rouge">T</code>, <code class="highlighter-rouge">N</code>, and the number <code class="highlighter-rouge">K</code> of secrets packed together, the parameters are hence the prime <code class="highlighter-rouge">Q</code> and the two generators <code class="highlighter-rouge">OMEGA_SMALL</code> and <code class="highlighter-rouge">OMEGA_LARGE</code> of order respectively <code class="highlighter-rouge">ORDER_SMALL == T + K + 1</code> and <code class="highlighter-rouge">ORDER_LARGE == N + 1</code>.</p>
<p>We will talk more about how to do efficient reconstruction in the next blog post, but note that if all the shares are known then the above sharing procedure can efficiently be run backwards by simply running the two FFTs in their opposite direction.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">packed_reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="n">large_values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">shares</span>
    <span class="n">large_coeffs</span> <span class="o">=</span> <span class="n">fft3_backward</span><span class="p">(</span><span class="n">large_values</span><span class="p">,</span> <span class="n">OMEGA_LARGE</span><span class="p">)</span>
    <span class="n">small_coeffs</span> <span class="o">=</span> <span class="n">large_coeffs</span><span class="p">[:</span><span class="n">ORDER_SMALL</span><span class="p">]</span>
    <span class="n">small_values</span> <span class="o">=</span> <span class="n">fft2_forward</span><span class="p">(</span><span class="n">small_coeffs</span><span class="p">,</span> <span class="n">OMEGA_SMALL</span><span class="p">)</span>
    <span class="n">secrets</span> <span class="o">=</span> <span class="n">small_values</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">K</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">secrets</span>
</code></pre></div></div>
<p>However, this only works if all shares are known and correct: any loss or tampering will get in the way of using the FFT for reconstruction, unless we add an additional ingredient. Fixing this is the topic of the next blog post.</p>
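<p>To see sharing and reconstruction fit together, here is a small round trip under stated assumptions: the sizes <code class="highlighter-rouge">K = 3</code>, <code class="highlighter-rouge">T = 4</code>, <code class="highlighter-rouge">N = 8</code> are made up, the generators are found by brute force, and naive quadratic-time transforms (<code class="highlighter-rouge">dft_forward</code> and <code class="highlighter-rouge">dft_backward</code>, both hypothetical names) stand in for the FFTs so that the block stays short. They compute the same values as the FFTs, only more slowly.</p>

```python
import random

Q = 433
K, T, N = 3, 4, 8                      # assumed sizes: K + T + 1 == 8, N + 1 == 9
ORDER_SMALL, ORDER_LARGE = K + T + 1, N + 1

def element_of_order(order, prime):
    # brute-force search, valid when `order` is a power of `prime`
    return next(x for x in range(2, Q)
                if pow(x, order, Q) == 1 and pow(x, order // prime, Q) != 1)

OMEGA_SMALL = element_of_order(ORDER_SMALL, 2)
OMEGA_LARGE = element_of_order(ORDER_LARGE, 3)

def dft_forward(coeffs, omega):
    # naive evaluation at the powers of omega; stand-in for the forward FFTs
    return [sum(c * pow(omega, j * k, Q) for k, c in enumerate(coeffs)) % Q
            for j in range(len(coeffs))]

def dft_backward(values, omega):
    # naive interpolation: forward transform with the inverse generator, then scale
    L_inv = pow(len(values), Q - 2, Q)
    omega_inv = pow(omega, Q - 2, Q)
    return [(v * L_inv) % Q for v in dft_forward(values, omega_inv)]

def packed_share(secrets):
    small_values = [0] + secrets + [random.randrange(Q) for _ in range(T)]
    small_coeffs = dft_backward(small_values, OMEGA_SMALL)
    large_coeffs = small_coeffs + [0] * (ORDER_LARGE - ORDER_SMALL)
    large_values = dft_forward(large_coeffs, OMEGA_LARGE)
    return large_values[1:]            # drop the value at point 1

def packed_reconstruct(shares):
    large_values = [0] + shares        # restore the known value at point 1
    large_coeffs = dft_backward(large_values, OMEGA_LARGE)
    small_coeffs = large_coeffs[:ORDER_SMALL]
    small_values = dft_forward(small_coeffs, OMEGA_SMALL)
    return small_values[1:K + 1]

secrets = [10, 20, 30]
shares = packed_share(secrets)
recovered = packed_reconstruct(shares)
```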
<h2 id="performance-evaluation">Performance evaluation</h2>
<p>To test the performance impact of using the FFT for share generation in Shamir’s scheme, we let the number of shares <code class="highlighter-rouge">N</code> take on values <code class="highlighter-rouge">2</code>, <code class="highlighter-rouge">8</code>, <code class="highlighter-rouge">26</code>, <code class="highlighter-rouge">80</code> and <code class="highlighter-rouge">242</code>, and for each of them compare against the typical approach of using Horner’s rule. For the former we have an asymptotic complexity of <code class="highlighter-rouge">Oh(N * log N)</code> while for the latter we have <code class="highlighter-rouge">Oh(N * T)</code>, and as such it is also interesting to vary <code class="highlighter-rouge">T</code>; we do so with <code class="highlighter-rouge">T = N/2</code> and <code class="highlighter-rouge">T = N/4</code>, representing respectively a medium and low privacy threshold.</p>
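<p>For reference, the Horner baseline can be sketched as follows. This is an illustration in Python rather than the benchmarked Rust code, and the evaluation points <code class="highlighter-rouge">1, ..., N</code> as well as the sizes are assumptions. Each share costs one pass over the <code class="highlighter-rouge">T + 1</code> coefficients, which is where the <code class="highlighter-rouge">N * T</code> behaviour comes from.</p>

```python
import random

Q = 433
T, N = 4, 8   # assumed privacy threshold and number of shares

def horner_evaluate(coeffs, point):
    # evaluates coeffs[0] + coeffs[1] * point + coeffs[2] * point^2 + ...
    # with one multiplication and one addition per coefficient
    result = 0
    for c in reversed(coeffs):
        result = (result * point + c) % Q
    return result

def shamir_share_horner(secret):
    coeffs = [secret] + [random.randrange(Q) for _ in range(T)]
    # N evaluations of a polynomial with T + 1 coefficients
    return [horner_evaluate(coeffs, point) for point in range(1, N + 1)]

shares = shamir_share_horner(42)
```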
<p>All measurements are in nanoseconds (1/1,000,000 milliseconds) and performed with <a href="https://github.com/mortendahl/rust-threshold-secret-sharing">our Rust implementation</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">10</span><span class="p">))</span>
<span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">80</span> <span class="p">]</span> <span class="c">#, 242 ]</span>
<span class="n">n2_fft</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">214</span><span class="p">,</span> <span class="mi">402</span><span class="p">,</span> <span class="mi">1012</span><span class="p">,</span> <span class="mi">2944</span> <span class="p">]</span> <span class="c">#, 10525 ]</span>
<span class="n">n2_horner</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">51</span><span class="p">,</span> <span class="mi">289</span><span class="p">,</span> <span class="mi">2365</span><span class="p">,</span> <span class="mi">22278</span> <span class="p">]</span> <span class="c">#, 203630 ]</span>
<span class="n">n4_fft</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">227</span><span class="p">,</span> <span class="mi">409</span><span class="p">,</span> <span class="mi">1038</span><span class="p">,</span> <span class="mi">3105</span> <span class="p">]</span> <span class="c">#, 10470 ]</span>
<span class="n">n4_horner</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">54</span><span class="p">,</span> <span class="mi">180</span><span class="p">,</span> <span class="mi">1380</span><span class="p">,</span> <span class="mi">11631</span> <span class="p">]</span> <span class="c">#, 104388 ]</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">n2_fft</span><span class="p">,</span> <span class="s">'ro--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'b'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'T = N/2: FFT'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">n2_horner</span><span class="p">,</span> <span class="s">'rs--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'r'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'T = N/2: Horner'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">n4_fft</span><span class="p">,</span> <span class="s">'ro--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'c'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'T = N/4: FFT'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">n4_horner</span><span class="p">,</span> <span class="s">'rs--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'y'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'T = N/4: Horner'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p>Note that the numbers for <code class="highlighter-rouge">N = 242</code> are omitted in the graph to avoid hiding the results for the smaller values.</p>
<center><img src="https://mortendahl.github.io/assets/secret-sharing/share-performance-shamir.png" /></center>
<p>For the packed scheme we keep <code class="highlighter-rouge">T = N/4</code> and <code class="highlighter-rouge">K = N/2</code> fixed (meaning <code class="highlighter-rouge">R = 3N/4</code>) and let <code class="highlighter-rouge">N</code> vary as above. We then compare three different approaches for generating shares, all starting out with sampling a polynomial in point-value representation:</p>
<ol>
<li><code class="highlighter-rouge">FFT + FFT</code>: Backward FFT to convert into coefficient representation, followed by forward FFT for evaluation</li>
<li><code class="highlighter-rouge">FFT + Horner</code>: Backward FFT to convert into coefficient representation, followed by Horner’s rule for evaluation</li>
<li><code class="highlighter-rouge">Lagrange</code>: Use precomputed Lagrange constants for share points to directly obtain shares</li>
</ol>
<p>where the third option requires additional storage for the precomputed constants (computing them on the fly increases the running time significantly but can of course be amortized away if processing a large number of batches).</p>
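<p>The third option can be sketched as below, assuming small illustrative point sets rather than the subgroup points used in the benchmarks. The names <code class="highlighter-rouge">lagrange_constants</code> and <code class="highlighter-rouge">lagrange_share</code> are hypothetical. The constants depend only on the points, so they can be computed once and reused across batches; each share is then a single inner product with the sampled values.</p>

```python
Q = 433

def inverse(x):
    # modular inverse via Fermat's little theorem (Q is prime)
    return pow(x, Q - 2, Q)

def lagrange_constants(sample_points, share_points):
    # constants[i][j] is the j-th Lagrange basis polynomial for sample_points,
    # evaluated at share_points[i]
    constants = []
    for x in share_points:
        row = []
        for j, xj in enumerate(sample_points):
            num, den = 1, 1
            for m, xm in enumerate(sample_points):
                if m != j:
                    num = (num * (x - xm)) % Q
                    den = (den * (xj - xm)) % Q
            row.append((num * inverse(den)) % Q)
        constants.append(row)
    return constants

def lagrange_share(sample_values, constants):
    # one inner product per share, with no conversion to coefficients
    return [sum(c * v for c, v in zip(row, sample_values)) % Q for row in constants]

# check against a known polynomial f(x) = 3 + x + 4x^2 + x^3
def f(x):
    return (3 + x + 4 * x * x + x * x * x) % Q

sample_points = [1, 2, 3, 4]
share_points = [5, 6, 7]
constants = lagrange_constants(sample_points, share_points)
shares = lagrange_share([f(p) for p in sample_points], constants)
```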
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">10</span><span class="p">))</span>
<span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">242</span> <span class="p">]</span>
<span class="n">fft_fft</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">840</span><span class="p">,</span> <span class="mi">1998</span><span class="p">,</span> <span class="mi">5288</span><span class="p">,</span> <span class="mi">15102</span> <span class="p">]</span>
<span class="n">fft_horner</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">898</span><span class="p">,</span> <span class="mi">3612</span><span class="p">,</span> <span class="mi">37641</span><span class="p">,</span> <span class="mi">207087</span> <span class="p">]</span>
<span class="n">lagrange_pre</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">246</span><span class="p">,</span> <span class="mi">1367</span><span class="p">,</span> <span class="mi">16510</span><span class="p">,</span> <span class="mi">102317</span> <span class="p">]</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">fft_fft</span><span class="p">,</span> <span class="s">'ro--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'b'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'FFT + FFT'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">fft_horner</span><span class="p">,</span> <span class="s">'ro--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'r'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'FFT + Horner'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">shares</span><span class="p">,</span> <span class="n">lagrange_pre</span><span class="p">,</span> <span class="s">'rs--'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'y'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Lagrange (precomp.)'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p>We note that the Lagrange approach remains superior up to the setting with 26 shares, after which the two-step FFT becomes the faster option.</p>
<center><img src="https://mortendahl.github.io/assets/secret-sharing/share-performance-packed.png" /></center>
<p>From this small amount of empirical data the FFT seems like the obvious choice as soon as the number of shares is sufficiently high. The question, of course, is in which applications this is the case. We will explore this further in a future blog post (or see e.g. <a href="https://eprint.iacr.org/2017/643">our paper</a>).</p>
<h1 id="parameter-generation">Parameter Generation</h1>
<p>Since there are no security implications in re-using the same fixed set of parameters (i.e. <code class="highlighter-rouge">Q</code>, <code class="highlighter-rouge">OMEGA_SMALL</code>, and <code class="highlighter-rouge">OMEGA_LARGE</code>) across applications, parameter generation is perhaps less important compared to for instance key generation in encryption schemes. Nonetheless, one of the benefits of secret sharing schemes is their ability to avoid big expansion factors by using parameters tailored to the use case; concretely, to pick a field of just the right size. As such we shall now fill in this final piece of the puzzle and see how a set of parameters fitting with the FFTs used in the packed scheme can be generated.</p>
<p>Our main abstraction is the <code class="highlighter-rouge">generate_parameters</code> function which takes a desired minimum field size in bits, as well as the number of secrets <code class="highlighter-rouge">k</code> we wish to pack together, the privacy threshold <code class="highlighter-rouge">t</code> we want, and the number <code class="highlighter-rouge">n</code> of shares to generate. Accounting for the value at point 1 that we are throwing away (see earlier), to be suitable for the two FFTs, we must then have that <code class="highlighter-rouge">k + t + 1</code> is a power of 2 and that <code class="highlighter-rouge">n + 1</code> is a power of 3.</p>
<p>To next make sure that our field has two subgroups with those numbers of elements, we simply need to find a field whose order is divisible by both numbers. Specifically, since we’re considering prime fields, we need to find a prime <code class="highlighter-rouge">q</code> such that its order <code class="highlighter-rouge">q-1</code> is divisible by both sizes. Finally, we also need a generator <code class="highlighter-rouge">g</code> of the field, which can be turned into generators <code class="highlighter-rouge">omega_small</code> and <code class="highlighter-rouge">omega_large</code> of the subgroups.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">generate_parameters</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
    <span class="n">order_small</span> <span class="o">=</span> <span class="n">k</span> <span class="o">+</span> <span class="n">t</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="n">order_large</span> <span class="o">=</span> <span class="n">n</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="n">order_divisor</span> <span class="o">=</span> <span class="n">order_small</span> <span class="o">*</span> <span class="n">order_large</span>
    <span class="n">q</span><span class="p">,</span> <span class="n">g</span> <span class="o">=</span> <span class="n">find_prime_field</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">,</span> <span class="n">order_divisor</span><span class="p">)</span>
    <span class="n">order</span> <span class="o">=</span> <span class="n">q</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="n">omega_small</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">order</span> <span class="o">//</span> <span class="n">order_small</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span>
    <span class="n">omega_large</span> <span class="o">=</span> <span class="nb">pow</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">order</span> <span class="o">//</span> <span class="n">order_large</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">q</span><span class="p">,</span> <span class="n">omega_small</span><span class="p">,</span> <span class="n">omega_large</span>
</code></pre></div></div>
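<p>To make the last two exponentiations concrete, the sketch below carries them out for the walk-through prime <code class="highlighter-rouge">433</code> with assumed subgroup orders 8 and 27. The brute-force helper <code class="highlighter-rouge">multiplicative_order</code> is hypothetical and only practical for tiny fields; here it replaces the search for a field generator and checks the result.</p>

```python
Q = 433
ORDER = Q - 1                          # 432 == 2**4 * 3**3
ORDER_SMALL, ORDER_LARGE = 8, 27       # assumed: k + t + 1 == 8 and n + 1 == 27

def multiplicative_order(x):
    # smallest positive e with x^e == 1 modulo Q, found by repeated multiplication
    order, y = 1, x % Q
    while y != 1:
        y = (y * x) % Q
        order += 1
    return order

# a generator of the whole multiplicative group has order Q - 1
g = next(c for c in range(2, Q) if multiplicative_order(c) == ORDER)

# raising g to ORDER // d yields an element of order exactly d
omega_small = pow(g, ORDER // ORDER_SMALL, Q)
omega_large = pow(g, ORDER // ORDER_LARGE, Q)
```

<p>Since <code class="highlighter-rouge">g</code> has order <code class="highlighter-rouge">ORDER</code>, the element <code class="highlighter-rouge">pow(g, ORDER // d, Q)</code> has order exactly <code class="highlighter-rouge">d</code> for any divisor <code class="highlighter-rouge">d</code> of <code class="highlighter-rouge">ORDER</code>.</p>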
<p>Finding our <code class="highlighter-rouge">q</code> and <code class="highlighter-rouge">g</code> is done by <code class="highlighter-rouge">find_prime_field</code>, which works by first finding a prime of the right size and with the right order. To then also find the generator we need a piece of auxiliary information, namely the prime factors of the order.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_prime_field</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">,</span> <span class="n">order_divisor</span><span class="p">):</span>
    <span class="n">q</span><span class="p">,</span> <span class="n">order_prime_factors</span> <span class="o">=</span> <span class="n">find_prime</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">,</span> <span class="n">order_divisor</span><span class="p">)</span>
    <span class="n">g</span> <span class="o">=</span> <span class="n">find_generator</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">order_prime_factors</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">q</span><span class="p">,</span> <span class="n">g</span>
</code></pre></div></div>
<p>The reason for this is that we can use the prime factors of the order to efficiently test whether an arbitrary candidate element in the field is in fact a generator with that order. This follows from <a href="https://en.wikipedia.org/wiki/Lagrange%27s_theorem_(group_theory)">Lagrange’s theorem</a> as detailed in standard textbooks on the matter.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_generator</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">order_prime_factors</span><span class="p">):</span>
    <span class="n">order</span> <span class="o">=</span> <span class="n">q</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">candidate</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">q</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">factor</span> <span class="ow">in</span> <span class="n">order_prime_factors</span><span class="p">:</span>
            <span class="n">exponent</span> <span class="o">=</span> <span class="n">order</span> <span class="o">//</span> <span class="n">factor</span>
            <span class="k">if</span> <span class="nb">pow</span><span class="p">(</span><span class="n">candidate</span><span class="p">,</span> <span class="n">exponent</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
                <span class="k">break</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">candidate</span>
</code></pre></div></div>
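<p>To make the generator test concrete, here is a small self-contained sketch on a toy field. The prime <code class="highlighter-rouge">433</code> and the helper names <code class="highlighter-rouge">prime_factors</code>, <code class="highlighter-rouge">is_generator</code>, and <code class="highlighter-rouge">element_order</code> are assumptions made for illustration and are not part of the code above; the brute-force order computation is only there to cross-check the fast test.</p>

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division (fine for toy sizes)."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def is_generator(candidate, q, order_prime_factors):
    """For a prime q, candidate generates the multiplicative group iff
    candidate^((q-1)/f) != 1 (mod q) for every prime factor f of q-1."""
    order = q - 1
    return all(pow(candidate, order // f, q) != 1 for f in order_prime_factors)

def element_order(x, q):
    """Brute-force multiplicative order of x modulo q (cross-check only)."""
    k, acc = 1, x % q
    while acc != 1:
        acc = (acc * x) % q
        k += 1
    return k

q = 433                         # toy prime with q - 1 = 432 = 2^4 * 3^3
factors = prime_factors(q - 1)  # distinct prime factors of the order
g = next(c for c in range(2, q) if is_generator(c, q, factors))

# an element of order 8, obtained the same way as omega_small and omega_large
omega = pow(g, (q - 1) // 8, q)
```

<p>The fast test needs only one modular exponentiation per prime factor of the order, whereas the brute-force check may need up to <code class="highlighter-rouge">q - 1</code> multiplications, which is hopeless for fields of cryptographic size.</p>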
<p>This leaves us with only a few remaining questions regarding finding prime numbers, as explained next.</p>
<h2 id="finding-primes">Finding primes</h2>
<p>To find a prime <code class="highlighter-rouge">q</code> with the desired structure (i.e. of a certain minimum size and whose order <code class="highlighter-rouge">q-1</code> has a given divisor) we may either do rejection sampling of primes until we hit one that satisfies our need, or we may construct it from smaller parts so that it by design fits with what we need. The latter appears more efficient so that is what we will do here.</p>
<p>Specifically, given <code class="highlighter-rouge">min_bitsize</code> and <code class="highlighter-rouge">order_divisor</code> we will do rejection sampling over two values <code class="highlighter-rouge">k1</code> and <code class="highlighter-rouge">k2</code> until <code class="highlighter-rouge">q = k1 * k2 * order_divisor + 1</code> is a <a href="https://en.wikipedia.org/wiki/Probable_prime">probable prime</a>. The <code class="highlighter-rouge">k1</code> is used to ensure that the minimum size is met, and <code class="highlighter-rouge">k2</code> is used to give us a bit of wiggle room: it can in principle be omitted, but empirical tests show that it doesn’t have to be very large to give an efficiency boost, at the expense of potentially overshooting the desired field size by a few bits. Finally, since we also need to know the prime factorization of <code class="highlighter-rouge">q - 1</code>, and since this in general is believed to be an <a href="https://en.wikipedia.org/wiki/Integer_factorization">inherently slow process</a>, we by construction ensure that <code class="highlighter-rouge">k1</code> is a prime so that we only have to factor <code class="highlighter-rouge">k2</code> and <code class="highlighter-rouge">order_divisor</code>, which we assume to be somewhat small and hence doable.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_prime</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">,</span> <span class="n">order_divisor</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">k1</span> <span class="o">=</span> <span class="n">sample_prime</span><span class="p">(</span><span class="n">min_bitsize</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">k2</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">128</span><span class="p">):</span>
            <span class="n">q</span> <span class="o">=</span> <span class="n">k1</span> <span class="o">*</span> <span class="n">k2</span> <span class="o">*</span> <span class="n">order_divisor</span> <span class="o">+</span> <span class="mi">1</span>
            <span class="k">if</span> <span class="n">is_prime</span><span class="p">(</span><span class="n">q</span><span class="p">):</span>
                <span class="n">order_prime_factors</span> <span class="o">=</span> <span class="p">[</span><span class="n">k1</span><span class="p">]</span>
                <span class="n">order_prime_factors</span> <span class="o">+=</span> <span class="n">prime_factor</span><span class="p">(</span><span class="n">k2</span><span class="p">)</span>
                <span class="n">order_prime_factors</span> <span class="o">+=</span> <span class="n">prime_factor</span><span class="p">(</span><span class="n">order_divisor</span><span class="p">)</span>
                <span class="k">return</span> <span class="n">q</span><span class="p">,</span> <span class="n">order_prime_factors</span>
</code></pre></div></div>
<p>Sampling primes is done by drawing random candidates of the right size until one passes a standard <a href="https://en.wikipedia.org/wiki/Primality_test">randomized primality test</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sample_prime</span><span class="p">(</span><span class="n">bitsize</span><span class="p">):</span>
    <span class="n">lower</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">bitsize</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">upper</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">bitsize</span><span class="p">)</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">candidate</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">lower</span><span class="p">,</span> <span class="n">upper</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">is_prime</span><span class="p">(</span><span class="n">candidate</span><span class="p">):</span>
            <span class="k">return</span> <span class="n">candidate</span>
</code></pre></div></div>
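<p>The <code class="highlighter-rouge">is_prime</code> function used above is not shown in this post. One common choice, assumed here rather than taken from the original code, is the Miller-Rabin test; the sketch below, including the <code class="highlighter-rouge">rounds</code> parameter and the short list of trial divisors, is illustrative only.</p>

```python
import random

def is_prime(n, rounds=40):
    """Miller-Rabin probable-prime test (an assumed stand-in for is_prime).

    A composite n passes all rounds with probability at most 4**(-rounds).
    """
    if n < 2:
        return False
    # quick trial division by a few small primes
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```

<p>Since the test is probabilistic, a number that passes is only a probable prime, which matches how <code class="highlighter-rouge">q</code> is described above.</p>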
<p>And factoring a number is done by simply trying a fixed set of all small primes in sequence; this will of course not work if the input is too large, but that is not likely to happen in real-world applications.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">prime_factor</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">factors</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">prime</span> <span class="ow">in</span> <span class="n">SMALL_PRIMES</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">prime</span> <span class="o">></span> <span class="n">x</span><span class="p">:</span> <span class="k">break</span>
        <span class="k">if</span> <span class="n">x</span> <span class="o">%</span> <span class="n">prime</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">factors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">prime</span><span class="p">)</span>
            <span class="n">x</span> <span class="o">=</span> <span class="n">remove_factor</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">prime</span><span class="p">)</span>
    <span class="k">assert</span><span class="p">(</span><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">factors</span>
</code></pre></div></div>
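<p>The snippet above relies on <code class="highlighter-rouge">SMALL_PRIMES</code> and <code class="highlighter-rouge">remove_factor</code>, neither of which is shown. A minimal way to fill them in might look as follows; the sieve, its bound of <code class="highlighter-rouge">2^16</code>, and the body of <code class="highlighter-rouge">remove_factor</code> are assumptions for this sketch, and <code class="highlighter-rouge">prime_factor</code> is repeated only to keep the example self-contained.</p>

```python
def sieve(limit):
    """All primes below limit via the sieve of Eratosthenes."""
    flags = [True] * limit
    flags[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, limit, i):
                flags[j] = False
    return [i for i, is_p in enumerate(flags) if is_p]

# the bound is an arbitrary choice for this sketch
SMALL_PRIMES = sieve(1 << 16)

def remove_factor(x, factor):
    """Divide out every power of factor from x."""
    while x % factor == 0:
        x //= factor
    return x

def prime_factor(x):
    """Distinct prime factors of x, assuming all of them are in SMALL_PRIMES."""
    factors = []
    for prime in SMALL_PRIMES:
        if prime > x:
            break
        if x % prime == 0:
            factors.append(prime)
            x = remove_factor(x, prime)
    # fails if x has a prime factor beyond the sieve bound
    assert x == 1
    return factors
```

<p>Note that only the distinct prime factors are returned, which is all the generator test needs.</p>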
<p>Putting these pieces together we end up with an efficient procedure for generating parameters for use with FFTs: finding large fields of size e.g. 128 bits is a matter of milliseconds.</p>
<h1 id="next-steps">Next Steps</h1>
<p>While we have seen that the Fast Fourier Transform can be used to greatly speed up the sharing process, it has a serious limitation when it comes to speeding up the reconstruction process: in its current form it requires all shares to be present and untampered with. As such, for some applications we may be forced to resort to the more traditional and slower approaches of <a href="https://en.wikipedia.org/wiki/Newton_polynomial">Newton</a> or <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange</a> interpolation.</p>
<p>In <a href="/2017/08/13/secret-sharing-part3/">the next blog post</a> we will look at a technique for also using the Fast Fourier Transform for reconstruction, using techniques from error correction codes to account for missing or faulty shares, yet get similar speedup benefits to what we achieved here.</p>Morten DahlTL;DR: efficient secret sharing requires fast polynomial evaluation and interpolation; here we go through what it takes to use the well-known Fast Fourier Transform for this.Secret Sharing, Part 12017-06-04T12:00:00+00:002017-06-04T12:00:00+00:00https://mortendahl.github.io/2017/06/04/secret-sharing-part1<p><em><strong>TL;DR:</strong> first part in a series where we look at secret sharing schemes, including the lesser known packed variant of Shamir’s scheme, and give full and efficient implementations; here we start with the textbook approaches, with follow-up posts focusing on improvements from more advanced techniques for <a href="/2017/06/24/secret-sharing-part2">sharing</a> and <a href="/2017/08/13/secret-sharing-part3">reconstruction</a>.</em></p>
<p><a href="https://en.wikipedia.org/wiki/Secret_sharing">Secret sharing</a> is an old well-known cryptographic primitive, with existing real-world applications in e.g. <a href="https://bitcoinmagazine.com/articles/threshold-signatures-new-standard-wallet-security-1425937098">Bitcoin signatures</a> and <a href="https://www.vaultproject.io/docs/internals/security.html">password management</a>. But perhaps more interestingly, secret sharing also has strong links to <a href="https://en.wikipedia.org/wiki/Secure_multi-party_computation">secure computation</a> and may for instance be used for <a href="/2017/04/17/private-deep-learning-with-mpc/">private machine learning</a>.</p>
<p>The essence of the primitive is that a <em>dealer</em> wants to split a <em>secret</em> into several <em>shares</em> given to <em>shareholders</em>, in such a way that each individual shareholder learns nothing about the secret, yet if sufficiently many re-combine their shares then the secret can be reconstructed. Intuitively, the question of <em>trust</em> changes from being about the integrity of a single individual to the non-collaboration of several parties: it becomes distributed.</p>
<p>Secret sharing schemes are also interesting from a performance point of view, as they typically rely on a bare minimum of cryptographic assumptions. In particular, by not having to make any assumptions about the hardness of certain problems such as <a href="https://en.wikipedia.org/wiki/RSA_problem">factoring integers</a>, <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">computing discrete logarithms</a>, or <a href="https://en.wikipedia.org/wiki/Ring_Learning_with_Errors">finding short vectors</a>, secret sharing schemes can provide a computational advantage compared to other cryptographic tools such as <a href="https://en.wikipedia.org/wiki/Homomorphic_encryption">homomorphic encryption</a>.</p>
<p>In this post we’ll look at a few concrete secret sharing schemes, as well as hints on how to implement them efficiently (with a later post going into more detail). We won’t focus too much on applications but simply use private aggregation of large vectors as a running example – see e.g. our <a href="TODO">paper</a> for more use cases.</p>
<p>There is a Python notebook containing <a href="https://github.com/mortendahl/privateml/blob/master/secret-sharing/Schemes.ipynb">the code samples</a>, yet for better performance our <a href="https://crates.io/crates/threshold-secret-sharing">open source Rust library</a> is recommended.</p>
<p><em>
Parts of this blog post are derived from work done at <a href="https://snips.ai/">Snips</a> and <a href="https://medium.com/snips-ai/high-volume-secret-sharing-2e7dc5b41e9a">originally appearing in another blog post</a>. That work also included parts of the Rust implementation.
</em></p>
<h1 id="additive-sharing">Additive Sharing</h1>
<p>Let’s first assume that we have fixed a <a href="https://en.wikipedia.org/wiki/Finite_field">finite field</a> to which all secrets and shares belong, and in which all computations take place; this could for instance be <a href="https://en.wikipedia.org/wiki/Modular_arithmetic">the integers modulo a prime number</a>, i.e. <code class="highlighter-rouge">{ 0, 1, ..., Q-1 }</code> for a prime <code class="highlighter-rouge">Q</code>.</p>
<p>An easy way to split a secret <code class="highlighter-rouge">x</code> from this field into say three shares <code class="highlighter-rouge">x1</code>, <code class="highlighter-rouge">x2</code>, <code class="highlighter-rouge">x3</code>, is to simply pick <code class="highlighter-rouge">x1</code> and <code class="highlighter-rouge">x2</code> at random and let <code class="highlighter-rouge">x3 = x - x1 - x2</code>. As argued below, this hides the secret as long as no one knows more than two shares, yet if all three shares are known then <code class="highlighter-rouge">x</code> can be reconstructed by simply computing <code class="highlighter-rouge">x1 + x2 + x3</code>. More generally, this scheme is known as <em>additive sharing</em> and works for any number <code class="highlighter-rouge">N</code> of shares by picking <code class="highlighter-rouge">T = N - 1</code> random values.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">additive_share</span><span class="p">(</span><span class="n">secret</span><span class="p">):</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">]</span>
    <span class="n">shares</span> <span class="o">+=</span> <span class="p">[</span> <span class="p">(</span><span class="n">secret</span> <span class="o">-</span> <span class="nb">sum</span><span class="p">(</span><span class="n">shares</span><span class="p">))</span> <span class="o">%</span> <span class="n">Q</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">shares</span>

<span class="k">def</span> <span class="nf">additive_reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">shares</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
</code></pre></div></div>
<p>That the secret remains hidden as long as at most <code class="highlighter-rouge">T = N - 1</code> shareholders collaborate follows from the marginal distribution of the view of up to <code class="highlighter-rouge">T</code> shareholders being independent of the secret. More intuitively, given at most <code class="highlighter-rouge">T</code> shares, <em>any</em> guess one may make at what the secret could be, can be explained by the remaining unseen share, and is hence an equally valid guess.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">explain</span><span class="p">(</span><span class="n">seen_shares</span><span class="p">,</span> <span class="n">guess</span><span class="p">):</span>
    <span class="c"># compute the unseen share that justifies the seen shares and the guess</span>
    <span class="n">simulated_unseen_share</span> <span class="o">=</span> <span class="p">(</span><span class="n">guess</span> <span class="o">-</span> <span class="nb">sum</span><span class="p">(</span><span class="n">seen_shares</span><span class="p">))</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="c"># and the would-be sharing by combining seen and unseen shares</span>
    <span class="n">simulated_shares</span> <span class="o">=</span> <span class="n">seen_shares</span> <span class="o">+</span> <span class="p">[</span><span class="n">simulated_unseen_share</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">additive_reconstruct</span><span class="p">(</span><span class="n">simulated_shares</span><span class="p">)</span> <span class="o">==</span> <span class="n">guess</span><span class="p">:</span>
        <span class="c"># found an explanation</span>
        <span class="k">return</span> <span class="n">simulated_unseen_share</span>

<span class="n">seen_shares</span> <span class="o">=</span> <span class="n">shares</span><span class="p">[:</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">for</span> <span class="n">guess</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">Q</span><span class="p">):</span>
    <span class="n">explanation</span> <span class="o">=</span> <span class="n">explain</span><span class="p">(</span><span class="n">seen_shares</span><span class="p">,</span> <span class="n">guess</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">explanation</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"guess </span><span class="si">%</span><span class="s">d can be explained by </span><span class="si">%</span><span class="s">d"</span> <span class="o">%</span> <span class="p">(</span><span class="n">guess</span><span class="p">,</span> <span class="n">explanation</span><span class="p">))</span>
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>guess 0 can be explained by 28
guess 1 can be explained by 29
guess 2 can be explained by 30
guess 3 can be explained by 31
guess 4 can be explained by 32
guess 5 can be explained by 33
...
</code></pre></div></div>
<p>And since all we need for this argument to go through is the ability to sample random field elements, with no additional constraints on the size of the field due to e.g. hardness assumptions, this scheme is highly efficient both in terms of time and space.</p>
<h2 id="homomorphic-addition">Homomorphic addition</h2>
<p>While it is also about as simple as it gets, notice that the scheme already has a homomorphic property that allows for certain degrees of secure computation: we can add secrets together, so if e.g. <code class="highlighter-rouge">x1</code>, <code class="highlighter-rouge">x2</code>, <code class="highlighter-rouge">x3</code> is a sharing of <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y1</code>, <code class="highlighter-rouge">y2</code>, <code class="highlighter-rouge">y3</code> is a sharing of <code class="highlighter-rouge">y</code>, then <code class="highlighter-rouge">x1+y1</code>, <code class="highlighter-rouge">x2+y2</code>, <code class="highlighter-rouge">x3+y3</code> is a sharing of <code class="highlighter-rouge">x + y</code>, which can be computed individually by the three shareholders simply by adding the shares they already have (respectively <code class="highlighter-rouge">x1</code> and <code class="highlighter-rouge">y1</code>, <code class="highlighter-rouge">x2</code> and <code class="highlighter-rouge">y2</code>, and <code class="highlighter-rouge">x3</code> and <code class="highlighter-rouge">y3</code>). Then, once added, these new shares can be used to reconstruct the result of the addition but not the addends.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">additive_add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span> <span class="p">(</span><span class="n">xi</span> <span class="o">+</span> <span class="n">yi</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div></div>
<p>More generally, we can ask the shareholders to compute linear functions of secret inputs without them seeing anything but the shares, and without learning anything besides the final output of the function.</p>
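<p>As a small illustration of computing a linear function on shares, the sketch below evaluates <code class="highlighter-rouge">3*x + 2*y</code> without any shareholder seeing <code class="highlighter-rouge">x</code> or <code class="highlighter-rouge">y</code>. The concrete values of <code class="highlighter-rouge">Q</code> and <code class="highlighter-rouge">N</code> and the helper <code class="highlighter-rouge">additive_scale</code> are assumptions introduced for this example; the remaining functions repeat the definitions above so the snippet runs on its own.</p>

```python
import random

Q = 433  # toy prime modulus, chosen only for illustration
N = 3    # number of shares per secret

def additive_share(secret):
    shares = [random.randrange(Q) for _ in range(N - 1)]
    shares += [(secret - sum(shares)) % Q]
    return shares

def additive_reconstruct(shares):
    return sum(shares) % Q

def additive_add(x, y):
    return [(xi + yi) % Q for xi, yi in zip(x, y)]

def additive_scale(x, c):
    """Multiply a shared value by a public constant c (hypothetical helper)."""
    return [(xi * c) % Q for xi in x]

x = additive_share(5)
y = additive_share(7)

# each shareholder works locally on its own pair of shares
z = additive_add(additive_scale(x, 3), additive_scale(y, 2))

result = additive_reconstruct(z)  # 3*5 + 2*7 = 29
```

<p>Scaling by a public constant works for the same reason addition does: reconstruction is a sum, and multiplication distributes over it.</p>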
<h1 id="comparing-schemes">Comparing schemes</h1>
<p>While the above scheme is particularly simple, below are two examples of slightly more advanced schemes. One way to compare these is through the following four parameters:</p>
<ul>
<li>
<p><code class="highlighter-rouge">N</code>: the number of shares that each secret is split into</p>
</li>
<li>
<p><code class="highlighter-rouge">R</code>: the minimum number of shares needed to reconstruct the secret</p>
</li>
<li>
<p><code class="highlighter-rouge">T</code>: the maximum number of shares that may be seen without learning anything about the secret, also known as the <em>privacy threshold</em></p>
</li>
<li>
<p><code class="highlighter-rouge">K</code>: the number of secrets shared together</p>
</li>
</ul>
<p>where, logically, we must have <code class="highlighter-rouge">R <= N</code> since otherwise reconstruction is never possible, and we must have <code class="highlighter-rouge">T < R</code> since otherwise privacy makes little sense.</p>
<p>For the additive scheme we have <code class="highlighter-rouge">R = N</code>, <code class="highlighter-rouge">K = 1</code>, and <code class="highlighter-rouge">T = R - K</code>, but below we will get rid of the first two of these constraints so that in the end we are free to choose the parameters any way we like as long as <code class="highlighter-rouge">T + K = R <= N</code>.</p>
<h1 id="shamirs-scheme">Shamir’s Scheme</h1>
<p>The additive scheme lacks some robustness due to the constraint that <code class="highlighter-rouge">R = N</code>, meaning that if one of the shareholders for some reason becomes unavailable or loses their share then reconstruction is no longer possible. By moving to a different scheme we can remove this constraint and let <code class="highlighter-rouge">R</code> (and hence also <code class="highlighter-rouge">T</code>) be chosen freely for any particular application.</p>
<p>In <a href="https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing">Shamir’s scheme</a>, instead of picking random field elements that sum up to the secret <code class="highlighter-rouge">x</code> as we did above, to share <code class="highlighter-rouge">x</code> we sample a random polynomial <code class="highlighter-rouge">f</code> with the condition that <code class="highlighter-rouge">f(0) = x</code> and evaluate this polynomial at <code class="highlighter-rouge">N</code> non-zero points to obtain the shares as <code class="highlighter-rouge">f(1)</code>, <code class="highlighter-rouge">f(2)</code>, …, <code class="highlighter-rouge">f(N)</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_share</span><span class="p">(</span><span class="n">secret</span><span class="p">):</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="n">sample_shamir_polynomial</span><span class="p">(</span><span class="n">secret</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="n">evaluate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">SHARE_POINTS</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>And by varying the degree of <code class="highlighter-rouge">f</code> we can choose how many shares are needed before reconstruction is possible, thereby removing the <code class="highlighter-rouge">R = N</code> constraint. More specifically, if the degree of <code class="highlighter-rouge">f</code> is <code class="highlighter-rouge">T</code> then we know from <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">interpolation</a> that it is uniquely identified by either its <code class="highlighter-rouge">T+1</code> coefficients or by its value at <code class="highlighter-rouge">T+1</code> points, so that <code class="highlighter-rouge">R = T+1</code> shares allow us to reliably reconstruct.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">SHARE_POINTS</span><span class="p">,</span> <span class="n">shares</span><span class="p">)</span> <span class="k">if</span> <span class="n">v</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="p">]</span>
    <span class="n">secret</span> <span class="o">=</span> <span class="n">interpolate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">secret</span>
</code></pre></div></div>
<p>And at the same time, given at most <code class="highlighter-rouge">T</code> shares, the secret is again guaranteed to be hidden since we also here can find an explanation for any guess: a guess is the value of <code class="highlighter-rouge">f</code> at point zero, so together with the <code class="highlighter-rouge">T</code> known shares, interpolation allows us to find a polynomial with the right degree that matches all values.</p>
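<p>The same kind of experiment as for additive sharing can be sketched for Shamir’s scheme. The parameters <code class="highlighter-rouge">Q</code>, <code class="highlighter-rouge">N</code>, <code class="highlighter-rouge">T</code>, the secret <code class="highlighter-rouge">42</code>, and this textbook Lagrange version of <code class="highlighter-rouge">interpolate_at_point</code> are assumptions made for illustration. Given <code class="highlighter-rouge">T</code> seen shares, every guess at the secret is matched by a distinct value for the next unseen share.</p>

```python
import random

Q = 433  # toy prime modulus, chosen only for illustration
N = 5    # number of shares
T = 2    # privacy threshold, also the polynomial degree
SHARE_POINTS = list(range(1, N + 1))

def evaluate_at_point(coefs, point):
    result = 0
    for coef in reversed(coefs):
        result = (coef + point * result) % Q
    return result

def interpolate_at_point(points_values, point):
    """Lagrange interpolation over the field, evaluated at a single point."""
    result = 0
    for xi, yi in points_values:
        num, den = 1, 1
        for xj, _ in points_values:
            if xj != xi:
                num = (num * (point - xj)) % Q
                den = (den * (xi - xj)) % Q
        # divide by den via Fermat's little theorem, valid since Q is prime
        result = (result + yi * num * pow(den, Q - 2, Q)) % Q
    return result

# share the secret 42 with a random polynomial of degree at most T
coefs = [42] + [random.randrange(Q) for _ in range(T)]
shares = [evaluate_at_point(coefs, p) for p in SHARE_POINTS]

# the adversary sees only the first T shares
seen = list(zip(SHARE_POINTS, shares))[:T]

def explain(seen, guess):
    """Share the next shareholder would hold if the secret were guess."""
    known = [(0, guess)] + seen  # T + 1 points fix the polynomial
    return interpolate_at_point(known, SHARE_POINTS[T])

explanations = [explain(seen, guess) for guess in range(Q)]
```

<p>Because each guess maps to a different unseen share, and that share is uniformly distributed from the adversary’s point of view, all guesses remain equally plausible.</p>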
<p>Before discussing how these operations can be done efficiently, let’s first see the properties this scheme has in terms of secure computation.</p>
<h2 id="homomorphic-addition-and-multiplication">Homomorphic addition and multiplication</h2>
<p>Since it holds for polynomials in general that <code class="highlighter-rouge">f(i) + g(i) = (f + g)(i)</code>, we also here have an additive homomorphic property that allows us to compute linear functions of secrets by simply adding the individual shares.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span> <span class="p">(</span><span class="n">xi</span> <span class="o">+</span> <span class="n">yi</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div></div>
<p>And because it also holds that <code class="highlighter-rouge">f(i) * g(i) = (f * g)(i)</code>, we in fact now have an additional multiplicative property that allows us to compute products in the same fashion.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shamir_mul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span> <span class="p">(</span><span class="n">xi</span> <span class="o">*</span> <span class="n">yi</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div></div>
<p>But while this is in principle enough to perform <em>any</em> computation without seeing the inputs (since addition and multiplication can be used to express any <a href="https://en.wikipedia.org/wiki/Boolean_circuit">boolean circuit</a>), it also comes with a caveat: unlike addition, every multiplication doubles the degree of the polynomial, so we need <code class="highlighter-rouge">2T+1</code> shares to reconstruct a product instead of <code class="highlighter-rouge">T+1</code>.</p>
<p>As a result, when used in secure computation, additional steps must be taken to reduce the degree after even a small number of multiplications, which typically involve some level of interaction between the shareholders. In this light, when compared to homomorphic encryption, secret sharing in some respect replaces heavy computation with interaction.</p>
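<p>The degree doubling can be observed directly in a toy setting. In the sketch below, <code class="highlighter-rouge">Q</code>, <code class="highlighter-rouge">N</code>, <code class="highlighter-rouge">T</code>, the inputs <code class="highlighter-rouge">6</code> and <code class="highlighter-rouge">7</code>, and the textbook Lagrange interpolation are assumptions chosen so that <code class="highlighter-rouge">N = 2T+1</code>; the product is recovered from all five shares, whereas interpolating from only <code class="highlighter-rouge">T+1</code> of them typically yields an unrelated value.</p>

```python
import random

Q = 433  # toy prime modulus, chosen only for illustration
N = 5    # number of shares, here exactly 2T + 1
T = 2    # privacy threshold, also the degree of fresh sharings
SHARE_POINTS = list(range(1, N + 1))

def evaluate_at_point(coefs, point):
    result = 0
    for coef in reversed(coefs):
        result = (coef + point * result) % Q
    return result

def interpolate_at_point(points_values, point):
    """Lagrange interpolation over the field, evaluated at a single point."""
    result = 0
    for xi, yi in points_values:
        num, den = 1, 1
        for xj, _ in points_values:
            if xj != xi:
                num = (num * (point - xj)) % Q
                den = (den * (xi - xj)) % Q
        result = (result + yi * num * pow(den, Q - 2, Q)) % Q
    return result

def shamir_share(secret):
    coefs = [secret] + [random.randrange(Q) for _ in range(T)]
    return [evaluate_at_point(coefs, p) for p in SHARE_POINTS]

def shamir_reconstruct(shares):
    points_values = [(p, v) for p, v in zip(SHARE_POINTS, shares) if v is not None]
    return interpolate_at_point(points_values, 0)

def shamir_mul(x, y):
    return [(xi * yi) % Q for xi, yi in zip(x, y)]

x = shamir_share(6)
y = shamir_share(7)
z = shamir_mul(x, y)  # shares of a polynomial of degree up to 2T

# all 2T + 1 = 5 shares determine the product polynomial
full = shamir_reconstruct(z)

# only T + 1 = 3 shares: enough for x or y, but not in general for z
missing = [None] * (N - T - 1)
partial = shamir_reconstruct(z[:T + 1] + missing)
x_again = shamir_reconstruct(x[:T + 1] + missing)
```

<p>Here <code class="highlighter-rouge">full</code> equals <code class="highlighter-rouge">6 * 7 = 42</code>, while <code class="highlighter-rouge">partial</code> depends on the random coefficients and only matches by coincidence.</p>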
<h2 id="the-missing-pieces">The missing pieces</h2>
<p>Above we ignored the questions of how to efficiently sample, evaluate, and interpolate polynomials. The first one is easy. We want a random polynomial of degree at most <code class="highlighter-rouge">T</code> with the constraint that <code class="highlighter-rouge">f(0) = x</code>, and we may obtain that by simply letting the zero-degree coefficient be <code class="highlighter-rouge">x</code> and picking the remaining <code class="highlighter-rouge">T</code> coefficients at random: <code class="highlighter-rouge">f(X) = (x) + (r1 * X^1) + (r2 * X^2) + ... + (rT * X^T)</code> where <code class="highlighter-rouge">x</code> is the secret, <code class="highlighter-rouge">X</code> the indeterminate, and <code class="highlighter-rouge">r1</code>, …, <code class="highlighter-rouge">rT</code> the random coefficients.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sample_shamir_polynomial</span><span class="p">(</span><span class="n">zero_value</span><span class="p">):</span>
    <span class="n">coefs</span> <span class="o">=</span> <span class="p">[</span><span class="n">zero_value</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">coefs</span>
</code></pre></div></div>
<p>This gives us the polynomial in coefficient representation, which means we can perform the second task of evaluating the polynomial at <code class="highlighter-rouge">N</code> points somewhat efficiently using e.g. <a href="https://en.wikipedia.org/wiki/Horner%27s_method">Horner’s rule</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">evaluate_at_point</span><span class="p">(</span><span class="n">coefs</span><span class="p">,</span> <span class="n">point</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">coef</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="n">coefs</span><span class="p">):</span>
        <span class="n">result</span> <span class="o">=</span> <span class="p">(</span><span class="n">coef</span> <span class="o">+</span> <span class="n">point</span> <span class="o">*</span> <span class="n">result</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">result</span>
</code></pre></div></div>
<p>The interpolation step needed in reconstruction is slightly trickier. Here the polynomial is instead given in a point-value representation consisting of <code class="highlighter-rouge">T+1</code> pairs <code class="highlighter-rouge">(pi, vi)</code> that is less obviously suitable for computing <code class="highlighter-rouge">f(0)</code>.</p>
<p>However, using <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial">Lagrange interpolation</a> we can express the value of a polynomial at any point by a weighted sum of a set of constants and its value at <code class="highlighter-rouge">T+1</code> other points.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">interpolate_at_point</span><span class="p">(</span><span class="n">points_values</span><span class="p">,</span> <span class="n">point</span><span class="p">):</span>
    <span class="n">points</span><span class="p">,</span> <span class="n">values</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">points_values</span><span class="p">)</span>
    <span class="n">constants</span> <span class="o">=</span> <span class="n">lagrange_constants_for_point</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">point</span><span class="p">)</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span> <span class="n">ci</span> <span class="o">*</span> <span class="n">vi</span> <span class="k">for</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">constants</span><span class="p">,</span> <span class="n">values</span><span class="p">)</span> <span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
</code></pre></div></div>
<p>Moreover, since these <em>Lagrange constants</em> depend only on the points and not on the values, their computation can be amortized away in case we have to perform several interpolations, as in our running example with large vectors of secrets.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">lagrange_constants_for_point</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">point</span><span class="p">):</span>
    <span class="n">constants</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)):</span>
        <span class="n">xi</span> <span class="o">=</span> <span class="n">points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">num</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="n">denum</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">points</span><span class="p">)):</span>
            <span class="k">if</span> <span class="n">j</span> <span class="o">!=</span> <span class="n">i</span><span class="p">:</span>
                <span class="n">xj</span> <span class="o">=</span> <span class="n">points</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
                <span class="n">num</span> <span class="o">=</span> <span class="p">(</span><span class="n">num</span> <span class="o">*</span> <span class="p">(</span><span class="n">xj</span> <span class="o">-</span> <span class="n">point</span><span class="p">))</span> <span class="o">%</span> <span class="n">Q</span>
                <span class="n">denum</span> <span class="o">=</span> <span class="p">(</span><span class="n">denum</span> <span class="o">*</span> <span class="p">(</span><span class="n">xj</span> <span class="o">-</span> <span class="n">xi</span><span class="p">))</span> <span class="o">%</span> <span class="n">Q</span>
        <span class="n">constants</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">num</span> <span class="o">*</span> <span class="n">inverse</span><span class="p">(</span><span class="n">denum</span><span class="p">))</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">constants</span>
</code></pre></div></div>
<p>Looking back at the sharing and reconstruction operations, we then see that the former takes <code class="highlighter-rouge">Oh(N * T)</code> steps (for each secret) and the latter <code class="highlighter-rouge">Oh(T)</code> steps (for each secret) if precomputation is allowed.</p>
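<p>As a rough illustration of the precomputation, the following self-contained sketch computes the Lagrange constants once and reuses them for every reconstruction. The small prime, the parameter values, the Fermat-based <code class="highlighter-rouge">inverse</code>, and the names <code class="highlighter-rouge">shamir_share</code> and <code class="highlighter-rouge">shamir_reconstruct</code> are assumptions made for this example and may differ from the definitions used elsewhere in the post.</p>

```python
import random

Q = 433   # small prime, for illustration only
N = 10    # number of shares
T = 4     # privacy threshold
SHARE_POINTS = list(range(1, N + 1))

def inverse(a):
    # modular inverse via Fermat's little theorem (Q is prime)
    return pow(a, Q - 2, Q)

def evaluate_at_point(coefs, point):
    result = 0
    for coef in reversed(coefs):
        result = (coef + point * result) % Q
    return result

def lagrange_constants_for_point(points, point):
    constants = [0] * len(points)
    for i in range(len(points)):
        xi = points[i]
        num = 1
        denum = 1
        for j in range(len(points)):
            if j != i:
                xj = points[j]
                num = (num * (xj - point)) % Q
                denum = (denum * (xj - xi)) % Q
        constants[i] = (num * inverse(denum)) % Q
    return constants

def shamir_share(secret):
    # Oh(N * T): one Horner evaluation of T steps per share
    coefs = [secret] + [random.randrange(Q) for _ in range(T)]
    return [evaluate_at_point(coefs, p) for p in SHARE_POINTS]

# computed once, independently of any secret
RECON_POINTS = SHARE_POINTS[:T + 1]
RECON_CONSTANTS = lagrange_constants_for_point(RECON_POINTS, 0)

def shamir_reconstruct(shares):
    # Oh(T): a single weighted sum over the first T+1 shares
    values = shares[:T + 1]
    return sum(c * v for c, v in zip(RECON_CONSTANTS, values)) % Q

secrets = [1, 2, 3, 400]
recovered = [shamir_reconstruct(shamir_share(s)) for s in secrets]
```

<p>This sketch always reconstructs from the same <code class="highlighter-rouge">T+1</code> shares; if a different subset is available the constants must be recomputed for that subset.</p>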
<h1 id="packed-variant">Packed Variant</h1>
<p>While Shamir’s scheme gets rid of the <code class="highlighter-rouge">R = N</code> constraint and gives us flexibility in choosing <code class="highlighter-rouge">T</code> or <code class="highlighter-rouge">R</code>, it still has the limitation that <code class="highlighter-rouge">K = 1</code>. This means that each shareholder receives one share per secret, so a large number of secrets means a large number of shares for each shareholder. By using a generalised variant of Shamir’s scheme known as packed or ramp sharing, we can remove this limitation and reduce the load on each individual shareholder.</p>
<p>To share a vector of <code class="highlighter-rouge">K</code> secrets <code class="highlighter-rouge">x = [x1, x2, ..., xK]</code>, the shares are still computed as <code class="highlighter-rouge">f(1)</code>, <code class="highlighter-rouge">f(2)</code>, …, <code class="highlighter-rouge">f(N)</code> but the random polynomial is now sampled such that it satisfies <code class="highlighter-rouge">f(-1) = x1</code>, <code class="highlighter-rouge">f(-2) = x2</code>, …, <code class="highlighter-rouge">f(-K) = xK</code>.</p>
<p>Since it’s less obvious how to sample such a polynomial in coefficient representation as we did before, to achieve the desired privacy threshold we instead add <code class="highlighter-rouge">T</code> additional constraints <code class="highlighter-rouge">f(-K-1) = r1</code>, …, <code class="highlighter-rouge">f(-K-T) = rT</code> and simply use a point-value representation of the degree <code class="highlighter-rouge">T+K-1</code> polynomial.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sample_packed_polynomial</span><span class="p">(</span><span class="n">secrets</span><span class="p">):</span>
    <span class="n">points</span> <span class="o">=</span> <span class="n">SECRET_POINTS</span> <span class="o">+</span> <span class="n">RANDOMNESS_POINTS</span>
    <span class="n">values</span> <span class="o">=</span> <span class="n">secrets</span> <span class="o">+</span> <span class="p">[</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="p">]</span>
    <span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">))</span>
</code></pre></div></div>
<p>This however means that we now have to perform interpolation instead of evaluation during sharing, which has an impact on efficiency, even when using precomputation as it now means storing <code class="highlighter-rouge">N</code> different sets of constants.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">packed_share</span><span class="p">(</span><span class="n">secrets</span><span class="p">):</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="n">sample_packed_polynomial</span><span class="p">(</span><span class="n">secrets</span><span class="p">)</span>
    <span class="n">shares</span> <span class="o">=</span> <span class="p">[</span> <span class="n">interpolate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">SHARE_POINTS</span> <span class="p">]</span>
    <span class="k">return</span> <span class="n">shares</span>
</code></pre></div></div>
<p>As we will see in the next blog post it is in fact also possible to sample a packed polynomial in the coefficient representation and regain efficient sharing, but it requires slightly more advanced techniques.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">packed_reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="n">points</span> <span class="o">=</span> <span class="n">SHARE_POINTS</span>
    <span class="n">values</span> <span class="o">=</span> <span class="n">shares</span>
    <span class="n">polynomial</span> <span class="o">=</span> <span class="p">[</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span><span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">points</span><span class="p">,</span> <span class="n">values</span><span class="p">)</span> <span class="k">if</span> <span class="n">v</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="p">]</span>
    <span class="k">return</span> <span class="p">[</span> <span class="n">interpolate_at_point</span><span class="p">(</span><span class="n">polynomial</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">SECRET_POINTS</span> <span class="p">]</span>
</code></pre></div></div>
<p>Leaving computational efficiency aside, with this scheme we have reduced the number of shares each shareholder gets by a factor of <code class="highlighter-rouge">K</code>, which is useful in our running example of aggregating large vectors.</p>
<p>However, there’s another caveat: since the degree of the polynomial increased, from <code class="highlighter-rouge">T</code> to <code class="highlighter-rouge">T + K - 1</code>, we also have to adjust either the privacy threshold or the number of shares needed to reconstruct.</p>
<p>For example, say we use Shamir’s scheme to share a secret between <code class="highlighter-rouge">N = 10</code> shareholders and want a privacy guarantee against up to half of them collaborating, i.e. <code class="highlighter-rouge">T = 5</code>. Plugging this into our equation we get <code class="highlighter-rouge">5 + 1 = 6 <= 10</code> for Shamir’s scheme, meaning we can tolerate that up to <code class="highlighter-rouge">N - R = 4</code>, or 40%, of them go missing. However, if we use the packed scheme to share <code class="highlighter-rouge">K = 3</code> secrets together then we get <code class="highlighter-rouge">5 + 3 = 8 <= 10</code> and the tolerance drops to 20%.</p>
<p>One remedy is to simply multiply all parameters by <code class="highlighter-rouge">K</code>; in the example we get <code class="highlighter-rouge">15 + 3 = 18 <= 30</code> and we are back to the original privacy threshold of half the shareholders and tolerance of 40%. The cost is that we now also need <code class="highlighter-rouge">K</code> times as many shareholders, so we have effectively kept the same number of shares but distributed them across a larger population.</p>
<p>(Note that a similar distribution may be achieved by partitioning the secrets and shareholders into <code class="highlighter-rouge">K</code> groups; this however has a negative effect on overall tolerance as we need <code class="highlighter-rouge">R</code> shares from each group.)</p>
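<p>The arithmetic in this example can be checked with a few lines of code. This is only a sketch: it assumes the relation <code class="highlighter-rouge">R = T + K</code> used in the numbers above, and the helper name <code class="highlighter-rouge">tolerance</code> is made up for the illustration.</p>

```python
from fractions import Fraction

def tolerance(N, T, K=1):
    # number of shares needed to reconstruct, as used in the example above
    R = T + K
    assert R <= N, "not enough shareholders for these parameters"
    missing = N - R                  # shareholders that may go missing
    return missing, Fraction(missing, N)

shamir = tolerance(N=10, T=5)            # plain Shamir, K = 1
packed = tolerance(N=10, T=5, K=3)       # packing three secrets
scaled = tolerance(N=30, T=15, K=3)      # all parameters multiplied by K

for name, (missing, fraction) in [("shamir", shamir), ("packed", packed), ("scaled", scaled)]:
    print(name, missing, "{:.0%}".format(float(fraction)))
```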
<h2 id="homomorphic-addition-and-multiplication-1">Homomorphic addition and multiplication</h2>
<p>The scheme has the same homomorphic properties as Shamir’s, yet now operates in a <a href="https://en.wikipedia.org/wiki/SIMD">SIMD</a> fashion where each addition or multiplication is simultaneously performed on every secret shared together. This in itself can have benefits if it fits naturally with the application.</p>
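<p>A small self-contained sketch of this SIMD behaviour follows. The prime <code class="highlighter-rouge">Q = 433</code>, the parameters, and the concrete choice of points are assumptions for the example; the sharing and reconstruction functions mirror the ones above but inline the Lagrange step so the block runs on its own.</p>

```python
import random

Q = 433              # small prime, for illustration only
K, T, N = 3, 2, 9    # N = 2*(T+K-1) + 1 so that a product can be reconstructed
SECRET_POINTS = [-i for i in range(1, K + 1)]
RANDOMNESS_POINTS = [-i for i in range(K + 1, K + T + 1)]
SHARE_POINTS = list(range(1, N + 1))

def interpolate_at_point(points_values, point):
    points, values = zip(*points_values)
    total = 0
    for i, xi in enumerate(points):
        num, denum = 1, 1
        for j, xj in enumerate(points):
            if j != i:
                num = (num * (xj - point)) % Q
                denum = (denum * (xj - xi)) % Q
        total += values[i] * num * pow(denum, Q - 2, Q)  # Fermat inverse
    return total % Q

def packed_share(secrets):
    points = SECRET_POINTS + RANDOMNESS_POINTS
    values = secrets + [random.randrange(Q) for _ in range(T)]
    polynomial = list(zip(points, values))
    return [interpolate_at_point(polynomial, p) for p in SHARE_POINTS]

def packed_reconstruct(shares):
    polynomial = [(p, v) for p, v in zip(SHARE_POINTS, shares) if v is not None]
    return [interpolate_at_point(polynomial, p) for p in SECRET_POINTS]

x = packed_share([1, 2, 3])
y = packed_share([10, 20, 30])

# one local addition per shareholder adds all K secrets at once
z_add = [(xi + yi) % Q for xi, yi in zip(x, y)]
z_add[0] = None   # the sum still has degree T+K-1, so some shares may go missing
z_add[-1] = None
added = packed_reconstruct(z_add)

# one local multiplication per shareholder multiplies all K secrets at once,
# but the degree doubles, so here every one of the N shares is needed
z_mul = [(xi * yi) % Q for xi, yi in zip(x, y)]
multiplied = packed_reconstruct(z_mul)
```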
<h1 id="next-steps">Next Steps</h1>
<p>Although an old and simple primitive, secret sharing has several properties that make it interesting as a way of delegating trust and computation to e.g. a community of users, even if the devices of these users are somewhat inefficient and unreliable.</p>
<p>In this post we have seen a few classical schemes as well as typical textbook algorithms to implement them. <a href="/2017/06/24/secret-sharing-part2">The next blog post</a> will improve on these algorithms and obtain significantly better performance.</p>
Morten Dahl
TL;DR: first part in a series where we look at secret sharing schemes, including the lesser known packed variant of Shamir’s scheme, and give full and efficient implementations; here we start with the textbook approaches, with follow-up posts focusing on improvements from more advanced techniques for sharing and reconstruction.
Private Deep Learning with MPC
2017-04-17T12:00:00+00:00
https://mortendahl.github.io/2017/04/17/private-deep-learning-with-mpc
<p>Inspired by a recent blog post about mixing deep learning and homomorphic encryption (see <a href="http://iamtrask.github.io/2017/03/17/safe-ai/"><em>Building Safe A.I.</em></a>) I thought it’d be interesting to do the same using <em>secure multi-party computation</em> instead of homomorphic encryption.</p>
<p>In this blog post we’ll build a simple secure computation protocol from scratch, and experiment with it for training simple neural networks for basic boolean functions. There is also a Python notebook with the <a href="https://github.com/mortendahl/privateml/tree/master/simple-boolean-functions">associated source code</a>.</p>
<p>We will assume that we have three non-colluding parties <code class="highlighter-rouge">P0</code>, <code class="highlighter-rouge">P1</code>, and <code class="highlighter-rouge">P2</code> that are willing to perform computations together, namely training neural networks and using them for predictions afterwards; however, for unspecified reasons they do not wish to reveal the learned models. We will also assume that some users are willing to provide training data if it is kept private, and likewise that some are interested in using the learned models if their inputs are kept private.</p>
<p>To be able to do this we will need to compute securely on rational numbers with a certain precision; in particular, to add and multiply these. We will also need to compute the <a href="http://mathworld.wolfram.com/SigmoidFunction.html">Sigmoid function</a> <code class="highlighter-rouge">1/(1+np.exp(-x))</code>, which in its traditional form results in surprisingly heavy operations in the secure setting. As a result we’ll follow the approach of <em>Building Safe A.I.</em> and approximate it using polynomials, yet look at a few optimizations.</p>
<p><em>This post also exists <a href="https://www.jqr.com/article/000109">in Chinese</a> thanks to <a href="https://weakish.github.io/">Jakukyo Friel</a>.</em></p>
<h1 id="secure-multi-party-computation">Secure Multi-Party Computation</h1>
<p><a href="https://en.wikipedia.org/wiki/Homomorphic_encryption">Homomorphic encryption</a> (HE) and <a href="https://en.wikipedia.org/wiki/Secure_multi-party_computation">secure multi-party computation</a> (MPC) are closely related fields in modern cryptography, with one often using techniques from the other in order to solve roughly the same problem: computing a function of private input data without revealing anything, except (optionally) the final output. For instance, in our setting of private machine learning, both technologies could be used to train our model and perform predictions (although there are a few technicalities to deal with in the case of HE if the data comes from several users with different encryption keys).</p>
<p>As such, at a high level, HE is often replaceable by MPC, and vice versa. Where they differ however, at least today, can roughly be characterized by HE requiring little interaction but expensive computation, whereas MPC uses cheap computation but a significant amount of interaction. Or in other words, MPC replaces expensive computation with interaction between two or more parties.</p>
<p>This currently offers better practical performance, to the point where one can argue that MPC is a significantly more mature technology – as a testimony to that claim, <a href="https://sepior.com/">several</a> <a href="https://www.dyadicsec.com/">companies</a> <a href="https://sharemind.cyber.ee/">already</a> <a href="https://z.cash/technology/paramgen.html">exist</a> offering services based on MPC.</p>
<h2 id="fixed-point-arithmetic">Fixed-point arithmetic</h2>
<p>The computation is going to take place over a <a href="https://en.wikipedia.org/wiki/Finite_field">finite field</a> and hence we first need to decide on how to represent rational numbers <code class="highlighter-rouge">r</code> as field elements, i.e. as integers <code class="highlighter-rouge">x</code> from <code class="highlighter-rouge">0, 1, ..., Q-1</code> for some prime <code class="highlighter-rouge">Q</code>. Taking a typical approach, we’re going to scale every rational number by a constant corresponding to a fixed precision, say <code class="highlighter-rouge">10**6</code> in the case of <code class="highlighter-rouge">6</code> digit precision, and let the integer part of the result be our fixed-point representation. For instance, with <code class="highlighter-rouge">Q = 10000019</code> we get <code class="highlighter-rouge">encode(0.5) == 500000</code> and <code class="highlighter-rouge">encode(-0.5) == 10000019 - 500000 == 9500019</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">encode</span><span class="p">(</span><span class="n">rational</span><span class="p">):</span>
    <span class="n">upscaled</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">rational</span> <span class="o">*</span> <span class="mi">10</span><span class="o">**</span><span class="mi">6</span><span class="p">)</span>
    <span class="n">field_element</span> <span class="o">=</span> <span class="n">upscaled</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="n">field_element</span>

<span class="k">def</span> <span class="nf">decode</span><span class="p">(</span><span class="n">field_element</span><span class="p">):</span>
    <span class="n">upscaled</span> <span class="o">=</span> <span class="n">field_element</span> <span class="k">if</span> <span class="n">field_element</span> <span class="o"><=</span> <span class="n">Q</span><span class="o">/</span><span class="mi">2</span> <span class="k">else</span> <span class="n">field_element</span> <span class="o">-</span> <span class="n">Q</span>
    <span class="n">rational</span> <span class="o">=</span> <span class="n">upscaled</span> <span class="o">/</span> <span class="mi">10</span><span class="o">**</span><span class="mi">6</span>
    <span class="k">return</span> <span class="n">rational</span>
</code></pre></div></div>
<p>Note that addition in this representation is straight-forward, <code class="highlighter-rouge">(r * 10**6) + (s * 10**6) == (r + s) * 10**6</code>, while multiplication adds an extra scaling factor that we will have to get rid of to keep the precision and avoid exploding the numbers: <code class="highlighter-rouge">(r * 10**6) * (s * 10**6) == (r * s) * 10**6 * 10**6</code>.</p>
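<p>The effect of the extra scaling factor can be seen in a short sketch. Note the assumption: a larger prime, <code class="highlighter-rouge">Q = 2**61 - 1</code>, is used here instead of <code class="highlighter-rouge">10000019</code> so that the doubly-scaled product does not wrap around the field, and the extra factor is removed in the clear after decoding purely to show the arithmetic.</p>

```python
Q = 2**61 - 1    # assumption: a prime large enough to hold doubly-scaled products
SCALE = 10**6    # six digits of precision, as in the text

def encode(rational):
    upscaled = int(rational * SCALE)
    return upscaled % Q

def decode(field_element):
    upscaled = field_element if field_element <= Q // 2 else field_element - Q
    return upscaled / SCALE

r, s = 0.5, -0.25

# addition keeps a single scaling factor
total = (encode(r) + encode(s)) % Q
decoded_sum = decode(total)

# multiplication picks up a second scaling factor ...
product = (encode(r) * encode(s)) % Q
decoded_raw = decode(product)

# ... which has to be divided away to get back to the right magnitude
decoded_product = decoded_raw / SCALE
```

<p>With the small <code class="highlighter-rouge">Q</code> from the example above the same product would exceed the field size and decode to garbage, which is one reason the modulus has to be chosen with the precision in mind.</p>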
<h2 id="sharing-and-reconstructing-data">Sharing and reconstructing data</h2>
<p>Having encoded an input, each user next needs a way of sharing it with the parties so that it may be used in the computation, yet remain private.</p>
<p>The ingredient we need for this is <a href=""><em>secret sharing</em></a>, which splits a value into three shares in such a way that if anyone sees fewer than all three shares, then nothing at all is revealed about the value; yet, by seeing all three shares, the value can easily be reconstructed.</p>
<p>To keep it simple we’ll use <em>replicated secret sharing</em> here, where each party receives more than one share. Concretely, private value <code class="highlighter-rouge">x</code> is split into three shares <code class="highlighter-rouge">x0</code>, <code class="highlighter-rouge">x1</code>, <code class="highlighter-rouge">x2</code> such that <code class="highlighter-rouge">x == x0 + x1 + x2</code>. Party <code class="highlighter-rouge">P0</code> then receives (<code class="highlighter-rouge">x0</code>, <code class="highlighter-rouge">x1</code>), <code class="highlighter-rouge">P1</code> receives (<code class="highlighter-rouge">x1</code>, <code class="highlighter-rouge">x2</code>), and <code class="highlighter-rouge">P2</code> receives (<code class="highlighter-rouge">x2</code>, <code class="highlighter-rouge">x0</code>). For this tutorial we’ll keep this implicit though, and simply store a sharing of <code class="highlighter-rouge">x</code> as a vector of the three shares <code class="highlighter-rouge">[x0, x1, x2]</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">share</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="n">x0</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span>
    <span class="n">x2</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">x0</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">]</span>
</code></pre></div></div>
<p>And when two or more parties agree to reveal a value to someone, they simply send their shares so that reconstruction may be performed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="n">shares</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">shares</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span>
</code></pre></div></div>
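<p>To make the replicated layout explicit, the following sketch hands out the pairs described above and reconstructs from what two parties hold together. The dictionary of per-party views is an illustration only; as noted, the tutorial itself keeps the distribution implicit.</p>

```python
import random

Q = 10000019

def share(x):
    x0 = random.randrange(Q)
    x1 = random.randrange(Q)
    x2 = (x - x0 - x1) % Q
    return [x0, x1, x2]

def reconstruct(shares):
    return sum(shares) % Q

x = 123456
x0, x1, x2 = share(x)

# each party holds two of the three shares (hypothetical bookkeeping)
views = {
    "P0": (x0, x1),
    "P1": (x1, x2),
    "P2": (x2, x0),
}

# P0 and P1 together hold x0, x1, and x2, so they can reveal the value
p0, p1 = views["P0"], views["P1"]
recovered = reconstruct([p0[0], p0[1], p1[1]])
```

<p>A single party is always missing exactly one share, which is uniformly random from its point of view, so its pair alone says nothing about <code class="highlighter-rouge">x</code>.</p>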
<p>However, if the shares are the result of one or more of the secure computations given in the subsections below, then for privacy reasons we perform a resharing before reconstructing.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">reshare</span><span class="p">(</span><span class="n">xs</span><span class="p">):</span>
    <span class="n">Y</span> <span class="o">=</span> <span class="p">[</span> <span class="n">share</span><span class="p">(</span><span class="n">xs</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">share</span><span class="p">(</span><span class="n">xs</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">share</span><span class="p">(</span><span class="n">xs</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="p">]</span>
    <span class="k">return</span> <span class="p">[</span> <span class="nb">sum</span><span class="p">(</span><span class="n">row</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">Y</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div></div>
<p>This is strictly speaking not necessary, but doing so makes it easier below to see why the protocols are secure; intuitively, it makes sure the shares look fresh, containing no information about the data that were used to compute them.</p>
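<p>As a quick sanity check, resharing changes the individual shares but not the value they encode. The snippet below repeats the definitions so that it runs on its own; the value <code class="highlighter-rouge">42</code> is arbitrary.</p>

```python
import random

Q = 10000019

def share(x):
    x0 = random.randrange(Q)
    x1 = random.randrange(Q)
    x2 = (x - x0 - x1) % Q
    return [x0, x1, x2]

def reconstruct(shares):
    return sum(shares) % Q

def reshare(xs):
    # every share is itself shared, and the columns are summed into new shares
    Y = [share(xs[0]), share(xs[1]), share(xs[2])]
    return [sum(row) % Q for row in zip(*Y)]

old_shares = share(42)
new_shares = reshare(old_shares)

value_before = reconstruct(old_shares)
value_after = reconstruct(new_shares)
```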
<h2 id="addition-and-subtraction">Addition and subtraction</h2>
<p>With this we already have a way to do secure addition and subtraction: each party simply adds or subtracts its two shares. This works since e.g. <code class="highlighter-rouge">(x0 + x1 + x2) + (y0 + y1 + y2) == (x0 + y0) + (x1 + y1) + (x2 + y2)</code>, which gives the three new shares of <code class="highlighter-rouge">x + y</code> (technically speaking this should be <code class="highlighter-rouge">reconstruct(x) + reconstruct(y)</code>, but it’s easier to read when implicit).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span> <span class="p">(</span><span class="n">xi</span> <span class="o">+</span> <span class="n">yi</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">]</span>

<span class="k">def</span> <span class="nf">sub</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span> <span class="p">(</span><span class="n">xi</span> <span class="o">-</span> <span class="n">yi</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">xi</span><span class="p">,</span> <span class="n">yi</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">]</span>
</code></pre></div></div>
<p>Note that no communication is needed since these are local computations.</p>
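<p>A brief usage sketch, with the helper definitions repeated so it is self-contained: the inputs <code class="highlighter-rouge">5</code> and <code class="highlighter-rouge">7</code> are arbitrary, and a negative result shows up as a large field element that <code class="highlighter-rouge">decode</code> would map back to a negative number.</p>

```python
import random

Q = 10000019

def share(x):
    x0 = random.randrange(Q)
    x1 = random.randrange(Q)
    x2 = (x - x0 - x1) % Q
    return [x0, x1, x2]

def reconstruct(shares):
    return sum(shares) % Q

def add(x, y):
    return [(xi + yi) % Q for xi, yi in zip(x, y)]

def sub(x, y):
    return [(xi - yi) % Q for xi, yi in zip(x, y)]

x = share(5)
y = share(7)

# purely local: each position is combined without looking at the others
sum_value = reconstruct(add(x, y))
diff_value = reconstruct(sub(x, y))   # 5 - 7 wraps around to Q - 2
```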
<h2 id="multiplication">Multiplication</h2>
<p>Since each party has two shares, multiplication can be done in a similar way to addition and subtraction above, i.e. by each party computing a new share based on the two it already has. Specifically, for <code class="highlighter-rouge">z0</code>, <code class="highlighter-rouge">z1</code>, and <code class="highlighter-rouge">z2</code> as defined in the code below we have <code class="highlighter-rouge">x * y == z0 + z1 + z2</code> (technically speaking …).</p>
<p>However, our invariant of each party having two shares is not satisfied, and it wouldn’t be secure for e.g. <code class="highlighter-rouge">P1</code> simply to send <code class="highlighter-rouge">z1</code> to <code class="highlighter-rouge">P0</code>. One easy fix is to simply share each <code class="highlighter-rouge">zi</code> as if it was a private input, and then have each party add its three shares together; this gives a correct and secure sharing <code class="highlighter-rouge">w</code> of the product.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="c"># local computation</span>
    <span class="n">z0</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="n">z1</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="n">z2</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="o">%</span> <span class="n">Q</span>
    <span class="c"># reshare and distribute; this requires communication</span>
    <span class="n">Z</span> <span class="o">=</span> <span class="p">[</span> <span class="n">share</span><span class="p">(</span><span class="n">z0</span><span class="p">),</span> <span class="n">share</span><span class="p">(</span><span class="n">z1</span><span class="p">),</span> <span class="n">share</span><span class="p">(</span><span class="n">z2</span><span class="p">)</span> <span class="p">]</span>
    <span class="n">w</span> <span class="o">=</span> <span class="p">[</span> <span class="nb">sum</span><span class="p">(</span><span class="n">row</span><span class="p">)</span> <span class="o">%</span> <span class="n">Q</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">Z</span><span class="p">)</span> <span class="p">]</span>
    <span class="c"># bring precision back down from double to single</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">truncate</span><span class="p">(</span><span class="n">w</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">v</span>
</code></pre></div></div>
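<p>As a sanity check, the local step and the resharing can be simulated in a single process. The sketch below is self-contained and makes a few assumptions that are not taken from the text above: the modulus <code class="highlighter-rouge">Q</code> is an arbitrary prime picked for illustration, <code class="highlighter-rouge">share</code> and <code class="highlighter-rouge">reconstruct</code> are minimal stand-ins for the helpers used in this post, and party <code class="highlighter-rouge">Pi</code> is assumed to hold shares <code class="highlighter-rouge">i</code> and <code class="highlighter-rouge">i+1</code>. Truncation is left out.</p>

```python
import random

Q = 2**61 - 1  # assumption: an arbitrary prime modulus, for illustration only

def share(secret):
    # stand-in: split a value into three additive shares modulo Q
    s0 = random.randrange(Q)
    s1 = random.randrange(Q)
    s2 = (secret - s0 - s1) % Q
    return [s0, s1, s2]

def reconstruct(shares):
    # stand-in: add all shares together modulo Q
    return sum(shares) % Q

def local_products(x, y):
    # each zi only uses the two shares party Pi is assumed to hold;
    # together the three expressions cover all nine cross terms x[i]*y[j]
    z0 = (x[0]*y[0] + x[0]*y[1] + x[1]*y[0]) % Q
    z1 = (x[1]*y[1] + x[1]*y[2] + x[2]*y[1]) % Q
    z2 = (x[2]*y[2] + x[2]*y[0] + x[0]*y[2]) % Q
    return z0, z1, z2

x = share(12345)
y = share(678)
z0, z1, z2 = local_products(x, y)

# reshare every zi as if it were a private input, then add shares position-wise
Z = [share(z0), share(z1), share(z2)]
w = [sum(row) % Q for row in zip(*Z)]
```

<p>Here <code class="highlighter-rouge">(z0 + z1 + z2) % Q</code> and <code class="highlighter-rouge">reconstruct(w)</code> should both equal <code class="highlighter-rouge">(12345 * 678) % Q</code>, whichever random shares were drawn.</p>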
<p>One problem remains, however: as mentioned earlier, <code class="highlighter-rouge">reconstruct(w)</code> has double precision, i.e. it is an encoding with scaling factor <code class="highlighter-rouge">10**6 * 10**6</code> instead of <code class="highlighter-rouge">10**6</code>. In the unsecured setting over rationals we would fix this by a standard division by <code class="highlighter-rouge">10**6</code>, but since we’re operating on secret shared elements in a finite field this becomes less straightforward.</p>
<p>Division by a public constant, in this case <code class="highlighter-rouge">10**6</code>, is easy enough: we simply multiply the shares by its field inverse <code class="highlighter-rouge">10**(-6)</code>. If we write <code class="highlighter-rouge">reconstruct(w) == v * 10**6 + u</code> for some <code class="highlighter-rouge">v</code> and <code class="highlighter-rouge">u &lt; 10**6</code>, then this multiplication gives us shares of <code class="highlighter-rouge">v + u * 10**(-6)</code>, where <code class="highlighter-rouge">v</code> is the value we’re after. But unlike the unsecured setting, where the leftover value <code class="highlighter-rouge">u * 10**(-6)</code> is small and removed by rounding, in the secure setting with finite field elements <code class="highlighter-rouge">u * 10**(-6)</code> is in general no longer a small value, so this meaning is lost and we need to get rid of it some other way.</p>
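<p>The effect of the leftover can be seen with plain integers, without any secret sharing. The following is a small sketch under the assumption that <code class="highlighter-rouge">Q</code> is a prime (the concrete value is picked for illustration only), so that the inverse can be computed via Fermat’s little theorem; the names <code class="highlighter-rouge">SCALE</code>, <code class="highlighter-rouge">exact</code> and <code class="highlighter-rouge">inexact</code> are introduced here.</p>

```python
Q = 2**61 - 1                    # assumption: illustrative prime modulus
SCALE = 10**6
INVERSE = pow(SCALE, Q - 2, Q)   # field inverse of 10**6, since Q is prime

v = 125000

# no leftover (u == 0): multiplying by the inverse recovers v exactly
w = v * SCALE
exact = (w * INVERSE) % Q

# with a leftover (u == 1): the result is v + u * INVERSE in the field,
# which is not close to v in any useful sense
u = 1
w = v * SCALE + u
inexact = (w * INVERSE) % Q
```

<p>In the first case <code class="highlighter-rouge">exact == v</code>, while in the second case <code class="highlighter-rouge">inexact</code> equals <code class="highlighter-rouge">(v + u * INVERSE) % Q</code> and differs from <code class="highlighter-rouge">v</code>.</p>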
<p>One way is to ensure that <code class="highlighter-rouge">u == 0</code>. Specifically, if we knew <code class="highlighter-rouge">u</code> in advance, then by doing the division on <code class="highlighter-rouge">w' == (w - share(u))</code> instead of on <code class="highlighter-rouge">w</code> we would get <code class="highlighter-rouge">v' == v</code> and <code class="highlighter-rouge">u' == 0</code> as desired, i.e. without any leftover value.</p>
<p>The question, of course, is how to securely get <code class="highlighter-rouge">u</code> so that we may compute <code class="highlighter-rouge">w'</code>. The details are in <a href="https://www1.cs.fau.de/filepool/publications/octavian_securescm/secfp-fc10.pdf">CS’10</a>, but the basic idea is to first add a large mask to <code class="highlighter-rouge">w</code> and reveal this masked value to one of the parties, who may then compute a masked <code class="highlighter-rouge">u</code>. Finally, this masked value is shared and unmasked, and then used to compute <code class="highlighter-rouge">w'</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">truncate</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
    <span class="c"># map to the positive range</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">share</span><span class="p">(</span><span class="mi">10</span><span class="o">**</span><span class="p">(</span><span class="mi">6</span><span class="o">+</span><span class="mi">6</span><span class="o">-</span><span class="mi">1</span><span class="p">)))</span>
    <span class="c"># apply mask known only by P0, and reconstruct masked b to P1 or P2</span>
    <span class="n">mask</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randrange</span><span class="p">(</span><span class="n">Q</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10</span><span class="o">**</span><span class="p">(</span><span class="mi">6</span><span class="o">+</span><span class="mi">6</span><span class="o">+</span><span class="n">KAPPA</span><span class="p">)</span>
    <span class="n">mask_low</span> <span class="o">=</span> <span class="n">mask</span> <span class="o">%</span> <span class="mi">10</span><span class="o">**</span><span class="mi">6</span>
    <span class="n">b_masked</span> <span class="o">=</span> <span class="n">reconstruct</span><span class="p">(</span><span class="n">add</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="n">share</span><span class="p">(</span><span class="n">mask</span><span class="p">)))</span>
    <span class="c"># extract lower digits</span>
    <span class="n">b_masked_low</span> <span class="o">=</span> <span class="n">b_masked</span> <span class="o">%</span> <span class="mi">10</span><span class="o">**</span><span class="mi">6</span>
    <span class="n">b_low</span> <span class="o">=</span> <span class="n">sub</span><span class="p">(</span><span class="n">share</span><span class="p">(</span><span class="n">b_masked_low</span><span class="p">),</span> <span class="n">share</span><span class="p">(</span><span class="n">mask_low</span><span class="p">))</span>
    <span class="c"># remove lower digits</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">sub</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b_low</span><span class="p">)</span>
    <span class="c"># division</span>
    <span class="n">d</span> <span class="o">=</span> <span class="n">imul</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">INVERSE</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">d</span>
</code></pre></div></div>
<p>Note that <code class="highlighter-rouge">imul</code> in the above is the local operation that multiplies each share with a public constant, in this case the field inverse of <code class="highlighter-rouge">10**6</code>.</p>
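<p>To see why the masking trick removes the lower digits, it can help to replay the same steps on plain Python integers, with no shares and no field involved. The function <code class="highlighter-rouge">truncate_in_the_clear</code> below is a hypothetical helper written for this purpose only; the value of <code class="highlighter-rouge">KAPPA</code> and the bound on <code class="highlighter-rouge">a</code> are assumptions. In this sketch, a carry between the mask and the lower digits can make the result one larger than the floor of the division.</p>

```python
import random

SCALE = 10**6
KAPPA = 6  # assumption: statistical security parameter, in decimal digits

def truncate_in_the_clear(a):
    # a plays the role of reconstruct(a); assumed to satisfy a > -10**11
    # map to the positive range; the shift is a multiple of SCALE,
    # so the lower digits are unchanged
    b = a + 10**(6+6-1)
    # the mask hides b; only b_masked would be revealed in the protocol
    mask = random.randrange(10**(6+6+KAPPA))
    mask_low = mask % SCALE
    b_masked = b + mask
    # extract lower digits; equals b % SCALE, or b % SCALE - SCALE on a carry
    b_masked_low = b_masked % SCALE
    b_low = b_masked_low - mask_low
    # remove lower digits, leaving an exact multiple of SCALE
    c = a - b_low
    assert c % SCALE == 0
    # division is now exact
    return c // SCALE

positive = truncate_in_the_clear(31250999999)
negative = truncate_in_the_clear(-31250000001)
```

<p>Both results are within one unit of the floored quotient, which is the price paid for never looking at the unmasked lower digits.</p>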
<h2 id="secure-data-type">Secure data type</h2>
<p>As a final step we wrap the above procedures in a custom abstract data type, allowing us to use NumPy later when we express the neural network.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SecureRational</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">secret</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shares</span> <span class="o">=</span> <span class="n">share</span><span class="p">(</span><span class="n">encode</span><span class="p">(</span><span class="n">secret</span><span class="p">))</span> <span class="k">if</span> <span class="n">secret</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="k">else</span> <span class="p">[]</span>
    <span class="k">def</span> <span class="nf">reveal</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">decode</span><span class="p">(</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">reshare</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">shares</span><span class="p">)))</span>
    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s">"SecureRational(</span><span class="si">%</span><span class="s">f)"</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">reveal</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">__add__</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">()</span>
        <span class="n">z</span><span class="o">.</span><span class="n">shares</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares</span><span class="p">,</span> <span class="n">y</span><span class="o">.</span><span class="n">shares</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">z</span>
    <span class="k">def</span> <span class="nf">__sub__</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">()</span>
        <span class="n">z</span><span class="o">.</span><span class="n">shares</span> <span class="o">=</span> <span class="n">sub</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares</span><span class="p">,</span> <span class="n">y</span><span class="o">.</span><span class="n">shares</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">z</span>
    <span class="k">def</span> <span class="nf">__mul__</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">()</span>
        <span class="n">z</span><span class="o">.</span><span class="n">shares</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shares</span><span class="p">,</span> <span class="n">y</span><span class="o">.</span><span class="n">shares</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">z</span>
    <span class="k">def</span> <span class="nf">__pow__</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">e</span><span class="p">):</span>
        <span class="n">z</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">e</span><span class="p">):</span>
            <span class="n">z</span> <span class="o">=</span> <span class="n">z</span> <span class="o">*</span> <span class="n">x</span>
        <span class="k">return</span> <span class="n">z</span>
</code></pre></div></div>
<p>With this type we can operate securely on values as we would any other type:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="o">.</span><span class="mi">5</span><span class="p">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="o">-.</span><span class="mi">25</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span>
<span class="k">assert</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">reveal</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="o">.</span><span class="mi">5</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="o">-.</span><span class="mi">25</span><span class="p">))</span>
</code></pre></div></div>
<p>Moreover, for debugging purposes we could switch to an unsecured type without changing the rest of the (neural network) code, or we could isolate the use of counters to, for instance, see how many multiplications are performed, in turn allowing us to simulate how much communication is needed.</p>
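<p>As a rough illustration of the counting idea, here is one way such a drop-in type could look. The class <code class="highlighter-rouge">CountingRational</code> is hypothetical and not part of the code in this post: it works on plain Python numbers, mirrors only the operators used above, and increments a class-level counter on every multiplication.</p>

```python
class CountingRational:
    """Hypothetical unsecured stand-in that counts multiplications."""

    multiplications = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def reveal(self):
        return self.value

    def __add__(x, y):
        return CountingRational(x.value + y.value)

    def __sub__(x, y):
        return CountingRational(x.value - y.value)

    def __mul__(x, y):
        # in the secure type this is the step that needs communication
        CountingRational.multiplications += 1
        return CountingRational(x.value * y.value)

    def __pow__(x, e):
        # same loop as in the secure type: one multiplication per step
        z = CountingRational(1)
        for _ in range(e):
            z = z * x
        return z

x = CountingRational(0.5)
y = x**3 * CountingRational(2)
```

<p>After running this, <code class="highlighter-rouge">CountingRational.multiplications</code> holds the number of multiplications that the expression triggered, three from the power and one from the final product.</p>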
<h1 id="deep-learning">Deep Learning</h1>
<p>The term “deep learning” is a massive exaggeration of what we’ll be doing here, as we’ll simply play with the two and three layer neural networks from <em>Building Safe A.I.</em> (which in turn is from <a href="http://iamtrask.github.io/2015/07/12/basic-python-network/">here</a> and <a href="http://iamtrask.github.io/2015/07/27/python-network-part2/">here</a>) to learn basic boolean functions.</p>
<h2 id="a-simple-function">A simple function</h2>
<p>The first experiment is about training a network to recognize the first bit in a sequence. The four rows in <code class="highlighter-rouge">X</code> below are used as the input training data, with the corresponding row in <code class="highlighter-rouge">y</code> as the desired output.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span>
<span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span>
<span class="mi">0</span><span class="p">,</span>
<span class="mi">0</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span>
<span class="mi">1</span>
<span class="p">]])</span><span class="o">.</span><span class="n">T</span>
</code></pre></div></div>
<p>We’ll use the same simple two layer network, but parameterize it by a Sigmoid approximation to be defined below. Note the use of the <code class="highlighter-rouge">secure</code> function, which is a simple helper converting all values to our secure data type.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">TwoLayerNetwork</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sigmoid</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span> <span class="o">=</span> <span class="n">sigmoid</span>
    <span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">iterations</span><span class="o">=</span><span class="mi">1000</span><span class="p">):</span>
        <span class="c"># initial weights</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span> <span class="o">=</span> <span class="n">secure</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">((</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
        <span class="c"># training</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">iterations</span><span class="p">):</span>
            <span class="c"># forward propagation</span>
            <span class="n">layer0</span> <span class="o">=</span> <span class="n">X</span>
            <span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span><span class="p">))</span>
            <span class="c"># back propagation</span>
            <span class="n">layer1_error</span> <span class="o">=</span> <span class="n">y</span> <span class="o">-</span> <span class="n">layer1</span>
            <span class="n">layer1_delta</span> <span class="o">=</span> <span class="n">layer1_error</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">derive</span><span class="p">(</span><span class="n">layer1</span><span class="p">)</span>
            <span class="c"># update</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span> <span class="o">+=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">layer1_delta</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
        <span class="n">layer0</span> <span class="o">=</span> <span class="n">X</span>
        <span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">layer1</span>
</code></pre></div></div>
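<p>The <code class="highlighter-rouge">secure</code> helper itself is not shown here. One plausible way to write such an element-wise conversion is with <code class="highlighter-rouge">np.vectorize</code>, sketched below under the assumption that this matches the intent; the class <code class="highlighter-rouge">Wrapped</code> and the <code class="highlighter-rouge">reveal</code> helper are hypothetical stand-ins so that the example runs on its own. It also shows that <code class="highlighter-rouge">np.dot</code> works on arrays of objects as long as the objects support addition and multiplication.</p>

```python
import numpy as np

class Wrapped:
    """Hypothetical stand-in for SecureRational, so the sketch runs on its own."""
    def __init__(self, value):
        self.value = value
    def __add__(x, y):
        return Wrapped(x.value + y.value)
    def __mul__(x, y):
        return Wrapped(x.value * y.value)

# element-wise conversion of plain numbers into wrapped objects
secure = np.vectorize(lambda v: Wrapped(v), otypes=[object])
# element-wise conversion back to plain floats
reveal = np.vectorize(lambda w: w.value, otypes=[float])

X = secure(np.array([[0, 0, 1], [1, 1, 1]]))
W = secure(np.array([[0.5], [0.25], [-1.0]]))

# np.dot on object arrays falls back to the objects' own * and +
out = reveal(np.dot(X, W))
```

<p>With the real type in place of <code class="highlighter-rouge">Wrapped</code>, every product inside <code class="highlighter-rouge">np.dot</code> would turn into a secure multiplication.</p>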
<p>We’ll also follow the suggested Sigmoid approximation, namely the <a href="http://mathworld.wolfram.com/SigmoidFunction.html">standard Maclaurin/Taylor polynomial</a> truncated at degree five. I’ve used a simple polynomial evaluation here for readability, leaving room for improvement by, for instance, lowering the number of multiplications using <a href="https://en.wikipedia.org/wiki/Horner%27s_method">Horner’s method</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SigmoidMaclaurin5</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">ONE</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">W0</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span>
        <span class="n">W1</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">4</span><span class="p">)</span>
        <span class="n">W3</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="o">/</span><span class="mi">48</span><span class="p">)</span>
        <span class="n">W5</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">480</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">W0</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="n">W1</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">3</span> <span class="o">*</span> <span class="n">W3</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">5</span> <span class="o">*</span> <span class="n">W5</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid_deriv</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">ONE</span> <span class="o">-</span> <span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">derive</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid_deriv</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
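<p>For reference, the Horner-style rewrite of the same polynomial could look as follows. This is a sketch on plain floats rather than on the secure type, and the function names are introduced here. By my count the direct form needs eleven secure multiplications per evaluation when combined with the loop-based <code class="highlighter-rouge">__pow__</code> above, whereas the nested form below uses four.</p>

```python
def sigmoid_direct(x):
    # same polynomial as above, written term by term
    return 0.5 + x / 4.0 - x**3 / 48.0 + x**5 / 480.0

def sigmoid_horner(x):
    # 1/2 + x * (1/4 + x^2 * (-1/48 + x^2 * 1/480))
    x2 = x * x                                    # multiplication 1
    inner = -1.0 / 48.0 + x2 * (1.0 / 480.0)      # multiplication 2
    middle = 1.0 / 4.0 + x2 * inner               # multiplication 3
    return 0.5 + x * middle                       # multiplication 4

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
differences = [abs(sigmoid_direct(x) - sigmoid_horner(x)) for x in samples]
```

<p>Both functions agree up to floating point rounding, so swapping one for the other should not change what the network learns.</p>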
<p>With this in place we can train and evaluate the network (see <a href="https://github.com/mortendahl/privateml/tree/master/simple-boolean-functions">the notebook</a> for the details), in this case using 10,000 iterations.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># reseed to get reproducible results</span>
<span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="c"># pick approximation</span>
<span class="n">sigmoid</span> <span class="o">=</span> <span class="n">SigmoidMaclaurin5</span><span class="p">()</span>
<span class="c"># train</span>
<span class="n">network</span> <span class="o">=</span> <span class="n">TwoLayerNetwork</span><span class="p">(</span><span class="n">sigmoid</span><span class="p">)</span>
<span class="n">network</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">secure</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="n">secure</span><span class="p">(</span><span class="n">y</span><span class="p">),</span> <span class="mi">10000</span><span class="p">)</span>
<span class="c"># evaluate predictions</span>
<span class="n">evaluate</span><span class="p">(</span><span class="n">network</span><span class="p">)</span>
</code></pre></div></div>
<p>Note that the training data is secured (i.e. secret shared) before inputting it to the network, and the learned weights are never revealed. The same applies to predictions, where the user of the network is the only one knowing the input and output.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.00539115
Error: 0.0025606125
Error: 0.00167358
Error: 0.001241815
Error: 0.00098674
Error: 0.000818415
Error: 0.0006990725
Error: 0.0006100825
Error: 0.00054113
Error: 0.0004861775
Layer 0 weights:
[[SecureRational(4.974135)]
[SecureRational(-0.000854)]
[SecureRational(-2.486387)]]
Prediction on [0 0 0]: 0 (0.50000000)
Prediction on [0 0 1]: 0 (0.00066431)
Prediction on [0 1 0]: 0 (0.49978657)
Prediction on [0 1 1]: 0 (0.00044076)
Prediction on [1 0 0]: 1 (5.52331855)
Prediction on [1 0 1]: 1 (0.99969213)
Prediction on [1 1 0]: 1 (5.51898314)
Prediction on [1 1 1]: 1 (0.99946841)
</code></pre></div></div>
<p>And based on the evaluation above, the network does indeed seem to have learned the desired function, giving correct predictions also on unseen inputs.</p>
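<p>The scores above 1 on some unseen inputs, e.g. 5.52 for <code class="highlighter-rouge">[1 0 0]</code>, come from the polynomial rather than from the secure arithmetic: the approximation only tracks the true sigmoid near zero. The sketch below replays two predictions on plain floats using the weights printed above; the function names are introduced here for illustration.</p>

```python
import math

def sigmoid_maclaurin5(x):
    # same degree-five polynomial as used by the network
    return 0.5 + x / 4.0 - x**3 / 48.0 + x**5 / 480.0

def sigmoid_exact(x):
    return 1.0 / (1.0 + math.exp(-x))

# layer 0 weights copied from the printed output above
weights = [4.974135, -0.000854, -2.486387]

def pre_activation(bits):
    # dot product between an input row and the weights
    return sum(b * w for b, w in zip(bits, weights))

seen = pre_activation([1, 0, 1])     # part of the training data
unseen = pre_activation([1, 0, 0])   # not part of the training data

approx_seen = sigmoid_maclaurin5(seen)
approx_unseen = sigmoid_maclaurin5(unseen)
exact_unseen = sigmoid_exact(unseen)
```

<p>On the training input the polynomial stays close to the printed prediction just below 1, while on the unseen input the larger pre-activation pushes it to roughly 5.52, where the true sigmoid would still be below 1. The thresholded prediction is the same either way.</p>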
<h2 id="slightly-more-advanced-function">Slightly more advanced function</h2>
<p>Turning to the (negated) parity experiment next, the network cannot simply mirror one of the three components as before, but intuitively has to compute the xor between the first and second bit (the third being for the bias).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span>
<span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span>
<span class="mi">0</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span>
<span class="mi">0</span>
<span class="p">]])</span><span class="o">.</span><span class="n">T</span>
</code></pre></div></div>
<p>As explained in <a href="http://iamtrask.github.io/2015/07/12/basic-python-network/"><em>A Neural Network in 11 lines of Python</em></a>, using the two layer network here gives rather useless results, essentially saying “let’s just flip a coin”.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.500000005
Error: 0.5
Error: 0.5000000025
Error: 0.5000000025
Error: 0.5
Error: 0.5
Error: 0.5
Error: 0.5
Error: 0.5
Error: 0.5
Layer 0 weights:
[[SecureRational(0.000000)]
[SecureRational(0.000000)]
[SecureRational(0.000000)]]
Prediction on [0 0 0]: 0 (0.50000000)
Prediction on [0 0 1]: 0 (0.50000000)
Prediction on [0 1 0]: 0 (0.50000000)
Prediction on [0 1 1]: 0 (0.50000000)
Prediction on [1 0 0]: 0 (0.50000000)
Prediction on [1 0 1]: 0 (0.50000000)
Prediction on [1 1 0]: 0 (0.50000000)
Prediction on [1 1 1]: 0 (0.50000000)
</code></pre></div></div>
<p>The suggested remedy is to introduce another layer in the network as follows.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ThreeLayerNetwork</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sigmoid</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span> <span class="o">=</span> <span class="n">sigmoid</span>
    <span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">iterations</span><span class="o">=</span><span class="mi">1000</span><span class="p">):</span>
        <span class="c"># initial weights</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span> <span class="o">=</span> <span class="n">secure</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">((</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">synapse1</span> <span class="o">=</span> <span class="n">secure</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">((</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
        <span class="c"># training</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">iterations</span><span class="p">):</span>
            <span class="c"># forward propagation</span>
            <span class="n">layer0</span> <span class="o">=</span> <span class="n">X</span>
            <span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span><span class="p">))</span>
            <span class="n">layer2</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse1</span><span class="p">))</span>
            <span class="c"># back propagation</span>
            <span class="n">layer2_error</span> <span class="o">=</span> <span class="n">y</span> <span class="o">-</span> <span class="n">layer2</span>
            <span class="n">layer2_delta</span> <span class="o">=</span> <span class="n">layer2_error</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">derive</span><span class="p">(</span><span class="n">layer2</span><span class="p">)</span>
            <span class="n">layer1_error</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer2_delta</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse1</span><span class="o">.</span><span class="n">T</span><span class="p">)</span>
            <span class="n">layer1_delta</span> <span class="o">=</span> <span class="n">layer1_error</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">derive</span><span class="p">(</span><span class="n">layer1</span><span class="p">)</span>
            <span class="c"># update</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">synapse1</span> <span class="o">+=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer1</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">layer2_delta</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span> <span class="o">+=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">layer1_delta</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
        <span class="n">layer0</span> <span class="o">=</span> <span class="n">X</span>
        <span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer0</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse0</span><span class="p">))</span>
        <span class="n">layer2</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">layer1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">synapse1</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">layer2</span>
</code></pre></div></div>
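<p>For reference, the same architecture can be trained on plain floats with the exact Sigmoid, which gives a baseline to compare the secure runs against. This is only a sketch: the training data <code class="highlighter-rouge">X</code> and <code class="highlighter-rouge">y</code> below are an assumption (a small XOR-style toy set), as are the helper names, and nothing here goes through the secure protocol.</p>

```python
import numpy as np

def sigmoid(x):
    # exact Sigmoid on plain floats (no polynomial approximation)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(s):
    # derivative expressed through the Sigmoid output s, as in the classes above
    return (1.0 - s) * s

# ASSUMPTION: toy training set, the label is the XOR of the first two inputs
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

np.random.seed(1)
synapse0 = 2 * np.random.random((3, 4)) - 1
synapse1 = 2 * np.random.random((4, 1)) - 1

for _ in range(10000):
    # forward propagation
    layer1 = sigmoid(X @ synapse0)
    layer2 = sigmoid(layer1 @ synapse1)
    # back propagation
    layer2_delta = (y - layer2) * sigmoid_deriv(layer2)
    layer1_delta = (layer2_delta @ synapse1.T) * sigmoid_deriv(layer1)
    # update
    synapse1 += layer1.T @ layer2_delta
    synapse0 += X.T @ layer1_delta

# mean absolute error on the training set after training
final_error = float(np.mean(np.abs(y - sigmoid(sigmoid(X @ synapse0) @ synapse1))))
print("Error:", final_error)
```

<p>Since the exact Sigmoid always stays between 0 and 1, this plain version has no way of producing the huge values seen in the secure runs.</p>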
<p>However, if we train this network the same way as we did before, even if only for 100 iterations, we run into a strange phenomenon: all of a sudden the errors, weights, and prediction scores explode, giving garbled results.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.496326875
Error: 0.4963253375
Error: 0.50109445
Error: 4.50917445533e+22
Error: 4.20017387687e+22
Error: 4.38235385094e+22
Error: 4.65389939428e+22
Error: 4.25720845129e+22
Error: 4.50520005372e+22
Error: 4.31568874384e+22
Layer 0 weights:
[[SecureRational(970463188850515564822528.000000)
SecureRational(1032362386093871682551808.000000)
SecureRational(1009706886834648285970432.000000)
SecureRational(852352894255113084862464.000000)]
[SecureRational(999182403614802557534208.000000)
SecureRational(747418473813466924711936.000000)
SecureRational(984098986255565992230912.000000)
SecureRational(865284701475152213311488.000000)]
[SecureRational(848400149667429499273216.000000)
SecureRational(871252067688430631387136.000000)
SecureRational(788722871059090631557120.000000)
SecureRational(868480811373827731750912.000000)]]
Layer 1 weights:
[[SecureRational(818092877308528183738368.000000)]
[SecureRational(940782003999550335877120.000000)]
[SecureRational(909882533376693496709120.000000)]
[SecureRational(955267264038446787723264.000000)]]
Prediction on [0 0 0]: 1 (41452089757570437218304.00000000)
Prediction on [0 0 1]: 1 (46442301971509056372736.00000000)
Prediction on [0 1 0]: 1 (37164015478651618328576.00000000)
Prediction on [0 1 1]: 1 (43504970843252146044928.00000000)
Prediction on [1 0 0]: 1 (35282926617309558603776.00000000)
Prediction on [1 0 1]: 1 (47658769913438164484096.00000000)
Prediction on [1 1 0]: 1 (35957624290517111013376.00000000)
Prediction on [1 1 1]: 1 (47193714919561920249856.00000000)
</code></pre></div></div>
<p>The reason for this is simple, but perhaps not obvious at first (it wasn’t for me). Namely, while the (five term) Maclaurin/Taylor approximation of the Sigmoid function is good around the origin, it completely collapses as we move further away, yielding results that are not only inaccurate but also of large magnitude. As a result we quickly overflow any finite number representation we may use, even in the unsecured setting, and start wrapping around.</p>
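<p>To get a feel for how quickly this happens, here is a small plain-float sketch comparing the exact Sigmoid with its Maclaurin polynomial up to the 5th degree. The function names are made up for this illustration, and the coefficients are the standard series coefficients, assumed to match the approximation used earlier.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maclaurin5(x):
    # Maclaurin expansion of the Sigmoid, terms up to the 5th degree
    return 0.5 + x / 4 - x**3 / 48 + x**5 / 480

for x in [0.5, 2.0, 5.0, 10.0]:
    print("x = %4.1f   exact = %.6f   approx = %.6f" % (x, sigmoid(x), maclaurin5(x)))
```

<p>Close to the origin the two agree to several decimals, but at <code class="highlighter-rouge">x = 10</code> the polynomial already evaluates to roughly 190, far outside the range of the real Sigmoid.</p>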
<p>Technically speaking it’s the dot products on which the Sigmoid function is evaluated that become too large, which as far as I understand can be interpreted as the network growing more confident. In this light, the problem is that our approximation doesn’t allow it to get confident enough, leaving us with poor accuracy.</p>
<p>How this is avoided in <em>Building Safe A.I.</em> is not clear to me, but my best guess is that a combination of lower initial weights and an <em>alpha</em> update parameter makes it possible to avoid the issue for a low number of iterations (fewer than 300, it seems). Any comments on this are more than welcome.</p>
<h1 id="approximating-sigmoid">Approximating Sigmoid</h1>
<p>So, the fact that we have to approximate the Sigmoid function seems to get in the way of learning more advanced functions. But since the Maclaurin/Taylor polynomial is accurate in the limit, one natural next thing to try is to use more of its terms.</p>
<p>As shown below, adding terms up to the 9th degree instead of only up to the 5th actually gets us a bit further, but far from enough. Moreover, when the collapse happens, it happens even faster.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.49546145
Error: 0.4943132225
Error: 0.49390536
Error: 0.50914575
Error: 7.29251498137e+22
Error: 7.97702462371e+22
Error: 7.01752029207e+22
Error: 7.41001528681e+22
Error: 7.33032620012e+22
Error: 7.3022511184e+22
...
</code></pre></div></div>
<p>Alternatively, one may remove terms in an attempt to better contain the collapse, e.g. by using only terms up to the 3rd degree. This actually helps a bit and allows us to train for 500 iterations instead of 100 before collapsing.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.4821573275
Error: 0.46344183
Error: 0.4428059575
Error: 0.4168092675
Error: 0.388153325
Error: 0.3619875475
Error: 0.3025045425
Error: 0.2366579675
Error: 0.19651228
Error: 0.1748352775
Layer 0 weights:
[[SecureRational(1.455894) SecureRational(1.376838)
SecureRational(-1.445690) SecureRational(-2.383619)]
[SecureRational(-0.794408) SecureRational(-2.069235)
SecureRational(-1.870023) SecureRational(-1.734243)]
[SecureRational(0.712099) SecureRational(-0.688947)
SecureRational(0.740605) SecureRational(2.890812)]]
Layer 1 weights:
[[SecureRational(-2.893681)]
[SecureRational(6.238205)]
[SecureRational(-7.945379)]
[SecureRational(4.674321)]]
Prediction on [0 0 0]: 1 (0.50918230)
Prediction on [0 0 1]: 0 (0.16883382)
Prediction on [0 1 0]: 0 (0.40589161)
Prediction on [0 1 1]: 1 (0.82447640)
Prediction on [1 0 0]: 1 (0.83164009)
Prediction on [1 0 1]: 1 (0.83317334)
Prediction on [1 1 0]: 1 (0.74354671)
Prediction on [1 1 1]: 0 (0.18736629)
</code></pre></div></div>
<p>However, the errors and predictions are poor, and there is little room left for increasing the number of iterations (it collapses around 550 iterations).</p>
<h2 id="interpolation">Interpolation</h2>
<p>An alternative approach is to drop the standard approximation polynomial and instead try interpolation over an interval. The main parameter here is the max degree of the polynomial, which we want to keep somewhat low for efficiency, but the precision of the coefficients is also relevant.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># function we wish to approximate</span>
<span class="n">f_real</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="mi">1</span><span class="o">/</span><span class="p">(</span><span class="mi">1</span><span class="o">+</span><span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">))</span>
<span class="c"># interval over which we wish to optimize</span>
<span class="n">interval</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="c"># interpolate polynomial of given max degree</span>
<span class="n">degree</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">coefs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">polyfit</span><span class="p">(</span><span class="n">interval</span><span class="p">,</span> <span class="n">f_real</span><span class="p">(</span><span class="n">interval</span><span class="p">),</span> <span class="n">degree</span><span class="p">)</span>
<span class="c"># reduce precision of interpolated coefficients</span>
<span class="n">precision</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">coefs</span> <span class="o">=</span> <span class="p">[</span> <span class="nb">int</span><span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="mi">10</span><span class="o">**</span><span class="n">precision</span><span class="p">)</span> <span class="o">/</span> <span class="mi">10</span><span class="o">**</span><span class="n">precision</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">coefs</span> <span class="p">]</span>
<span class="c"># approximation function</span>
<span class="n">f_interpolated</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">poly1d</span><span class="p">(</span><span class="n">coefs</span><span class="p">)</span>
</code></pre></div></div>
<p>By plotting this polynomial (red line) together with the standard approximations we see hope for improvement: we cannot avoid collapsing at some point, but it now happens at significantly larger values.</p>
<center><img src="https://mortendahl.github.io/assets/private-deep-learning-with-mpc/taylor-approximations.png" /></center>
<p>Of course, we could also experiment with other degrees, precisions, and intervals as shown below, yet for our immediate application the above set of parameters seems sufficient.</p>
<center><img src="https://mortendahl.github.io/assets/private-deep-learning-with-mpc/interpolations.png" /></center>
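<p>The same comparison can also be made numerically. The sketch below measures the worst-case error over the interval for the Maclaurin polynomials and for the degree 10 fit; the helper <code class="highlighter-rouge">max_error</code> and the dictionary layout are only there for illustration. Note that with 100 sample points <code class="highlighter-rouge">np.polyfit</code> computes a least-squares fit rather than an exact interpolation through every point.</p>

```python
import numpy as np

f_real = lambda x: 1 / (1 + np.exp(-x))
interval = np.linspace(-10, 10, 100)

def max_error(approximation, points):
    # worst-case absolute deviation from the real Sigmoid over the given points
    return float(np.max(np.abs(approximation(points) - f_real(points))))

# Maclaurin polynomials, coefficients listed from highest degree to lowest
candidates = {
    "maclaurin-3": np.poly1d([-1/48, 0, 1/4, 1/2]),
    "maclaurin-5": np.poly1d([1/480, 0, -1/48, 0, 1/4, 1/2]),
    "maclaurin-9": np.poly1d([31/1451520, 0, -17/80640, 0, 1/480, 0, -1/48, 0, 1/4, 1/2]),
    # least-squares fit of degree 10 over the whole interval
    "interpolated-10": np.poly1d(np.polyfit(interval, f_real(interval), 10)),
}

errors = {name: max_error(p, interval) for name, p in candidates.items()}
for name, error in errors.items():
    print("%-16s max error on [-10, 10]: %g" % (name, error))
```

<p>The Maclaurin polynomials are optimised for the origin only, so their worst-case error over the full interval is dominated by the end points, whereas the fitted polynomial spreads its error across the interval.</p>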
<p>So, returning to our three layer network, we define a new Sigmoid approximation:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SigmoidInterpolated10</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">ONE</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">W0</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mf">0.5</span><span class="p">)</span>
        <span class="n">W1</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mf">0.2159198015</span><span class="p">)</span>
        <span class="n">W3</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="o">-</span><span class="mf">0.0082176259</span><span class="p">)</span>
        <span class="n">W5</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mf">0.0001825597</span><span class="p">)</span>
        <span class="n">W7</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="o">-</span><span class="mf">0.0000018848</span><span class="p">)</span>
        <span class="n">W9</span> <span class="o">=</span> <span class="n">SecureRational</span><span class="p">(</span><span class="mf">0.0000000072</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> \
            <span class="n">W0</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span> <span class="o">*</span> <span class="n">W1</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">3</span> <span class="o">*</span> <span class="n">W3</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">5</span> <span class="o">*</span> <span class="n">W5</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">7</span> <span class="o">*</span> <span class="n">W7</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">9</span> <span class="o">*</span> <span class="n">W9</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid_deriv</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:(</span><span class="n">ONE</span> <span class="o">-</span> <span class="n">x</span><span class="p">)</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">derive</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">sigmoid_deriv</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
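<p>Since every secure multiplication costs communication, it may be worth evaluating the polynomial with as few multiplications as possible. One option, sketched here on plain floats only, is to write it as <code class="highlighter-rouge">W0 + x * q(x*x)</code> and evaluate <code class="highlighter-rouge">q</code> with Horner's rule. The function names are hypothetical, and whether this actually helps in the secure setting depends on how <code class="highlighter-rouge">**</code> and truncation are implemented for <code class="highlighter-rouge">SecureRational</code>, which is not checked here.</p>

```python
import numpy as np

# coefficients copied from SigmoidInterpolated10, here as plain floats
W0, W1, W3 = 0.5, 0.2159198015, -0.0082176259
W5, W7, W9 = 0.0001825597, -0.0000018848, 0.0000000072

def sigmoid_direct(x):
    # direct form, mirroring the lambda in the class above
    return W0 + (x * W1) + (x**3 * W3) + (x**5 * W5) + (x**7 * W7) + (x**9 * W9)

def sigmoid_horner(x):
    # same polynomial written as W0 + x * q(x^2), with q evaluated by Horner's rule;
    # this uses six multiplications in total: one for x^2, four in the loop, one by x
    x2 = x * x
    q = W9
    for w in (W7, W5, W3, W1):
        q = q * x2 + w
    return W0 + x * q

xs = np.linspace(-8, 8, 161)
exact = 1 / (1 + np.exp(-xs))
max_gap = float(np.max(np.abs(sigmoid_direct(xs) - sigmoid_horner(xs))))
max_err = float(np.max(np.abs(sigmoid_horner(xs) - exact)))
print("difference between the two forms:", max_gap)
print("max error against the real Sigmoid on [-8, 8]:", max_err)
```

<p>On plain floats the two forms agree up to rounding, so the choice between them only matters for cost and fixed-point precision.</p>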
<p>… and rerun the training:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># reseed to get reproducible results</span>
<span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="c"># pick approximation</span>
<span class="n">sigmoid</span> <span class="o">=</span> <span class="n">SigmoidInterpolated10</span><span class="p">()</span>
<span class="c"># train</span>
<span class="n">network</span> <span class="o">=</span> <span class="n">ThreeLayerNetwork</span><span class="p">(</span><span class="n">sigmoid</span><span class="p">)</span>
<span class="n">network</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">secure</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="n">secure</span><span class="p">(</span><span class="n">y</span><span class="p">),</span> <span class="mi">10000</span><span class="p">)</span>
<span class="c"># evaluate predictions</span>
<span class="n">evaluate</span><span class="p">(</span><span class="n">network</span><span class="p">)</span>
</code></pre></div></div>
<p>And now, despite running for 10,000 iterations, no collapse occurs and the predictions improve, with only one wrong prediction on <code class="highlighter-rouge">[0 1 0]</code>.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: 0.0384136825
Error: 0.01946007
Error: 0.0141456075
Error: 0.0115575225
Error: 0.010008035
Error: 0.0089747225
Error: 0.0082400825
Error: 0.00769687
Error: 0.007286195
Error: 0.00697363
Layer 0 weights:
[[SecureRational(3.208028) SecureRational(3.359444)
SecureRational(-3.632461) SecureRational(-4.094379)]
[SecureRational(-1.552827) SecureRational(-4.403901)
SecureRational(-3.997194) SecureRational(-3.271171)]
[SecureRational(0.695226) SecureRational(-1.560569)
SecureRational(1.758733) SecureRational(5.425429)]]
Layer 1 weights:
[[SecureRational(-4.674311)]
[SecureRational(5.910466)]
[SecureRational(-9.854162)]
[SecureRational(6.508941)]]
Prediction on [0 0 0]: 0 (0.28170669)
Prediction on [0 0 1]: 0 (0.00638341)
Prediction on [0 1 0]: 0 (0.33542098)
Prediction on [0 1 1]: 1 (0.99287968)
Prediction on [1 0 0]: 1 (0.74297185)
Prediction on [1 0 1]: 1 (0.99361066)
Prediction on [1 1 0]: 0 (0.03599433)
Prediction on [1 1 1]: 0 (0.00800036)
</code></pre></div></div>
<p>Note that the score for the wrong case is not entirely off, and is somewhat distinct from the correctly predicted zeroes. Running for another 5,000 iterations didn’t seem to improve this, at which point we get close to the collapse.</p>
<h1 id="conclusion">Conclusion</h1>
<p>The focus of this tutorial has been on a simple secure multi-party computation protocol, and while we haven’t explicitly addressed the initial claim that it is computationally more efficient than homomorphic encryption, we have still seen that it is indeed possible to achieve private machine learning using very basic operations.</p>
<p>Perhaps more critically, we haven’t measured the amount of communication required to run the protocols, which most significantly boils down to a few messages for each multiplication. To run any extensive computation using the simple protocols above it is clearly preferable to have the three parties connected by a high-speed local network, yet more advanced protocols not only reduce the amount of data sent back and forth, but also improve other properties such as the number of rounds (down to a small constant in the case of <a href="https://en.wikipedia.org/wiki/Garbled_circuit">garbled circuits</a>).</p>
<p>Finally, we have mostly treated the protocols and the machine learning processes orthogonally, letting the latter use the former only in a black box fashion except for computing the Sigmoid. Further adapting one to the other requires expertise in both domains but may yield significant improvements in the overall performance.</p>Morten DahlInspired by a recent blog post about mixing deep learning and homomorphic encryption (see Building Safe A.I.) I thought it’d be interesting to do the same using secure multi-party computation instead of homomorphic encryption.