CSC110/07-cryptography/02-one-time-pad.html

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>7.2 The One-Time Pad and Perfect Secrecy</title>
  <style>
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    span.underline{text-decoration: underline;}
    div.column{display: inline-block; vertical-align: top; width: 50%;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
  </style>
  <link rel="stylesheet" href="../tufte.css" />
  <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" type="text/javascript"></script>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<div style="display:none">
\(
\newcommand{\NOT}{\neg}
\newcommand{\AND}{\wedge}
\newcommand{\OR}{\vee}
\newcommand{\XOR}{\oplus}
\newcommand{\IMP}{\Rightarrow}
\newcommand{\IFF}{\Leftrightarrow}
\newcommand{\TRUE}{\text{True}\xspace}
\newcommand{\FALSE}{\text{False}\xspace}
\newcommand{\IN}{\,{\in}\,}
\newcommand{\NOTIN}{\,{\notin}\,}
\newcommand{\TO}{\rightarrow}
\newcommand{\DIV}{\mid}
\newcommand{\NDIV}{\nmid}
\newcommand{\MOD}[1]{\pmod{#1}}
\newcommand{\MODS}[1]{\ (\text{mod}\ #1)}
\newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z}
\newcommand{\Q}{\mathbb Q}
\newcommand{\R}{\mathbb R}
\newcommand{\C}{\mathbb C}
\newcommand{\cA}{\mathcal A}
\newcommand{\cB}{\mathcal B}
\newcommand{\cC}{\mathcal C}
\newcommand{\cD}{\mathcal D}
\newcommand{\cE}{\mathcal E}
\newcommand{\cF}{\mathcal F}
\newcommand{\cG}{\mathcal G}
\newcommand{\cH}{\mathcal H}
\newcommand{\cI}{\mathcal I}
\newcommand{\cJ}{\mathcal J}
\newcommand{\cL}{\mathcal L}
\newcommand{\cK}{\mathcal K}
\newcommand{\cN}{\mathcal N}
\newcommand{\cO}{\mathcal O}
\newcommand{\cP}{\mathcal P}
\newcommand{\cQ}{\mathcal Q}
\newcommand{\cS}{\mathcal S}
\newcommand{\cT}{\mathcal T}
\newcommand{\cV}{\mathcal V}
\newcommand{\cW}{\mathcal W}
\newcommand{\cZ}{\mathcal Z}
\newcommand{\emp}{\emptyset}
\newcommand{\bs}{\backslash}
\newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor}
\newcommand{\ceil}[1]{\left \lceil #1 \right \rceil}
\newcommand{\abs}[1]{\left | #1 \right |}
\newcommand{\xspace}{}
\newcommand{\proofheader}[1]{\underline{\textbf{#1}}}
\)
</div>
<header id="title-block-header">
<h1 class="title">7.2 The One-Time Pad and Perfect Secrecy</h1>
</header>
<section>
<p>The Caesar cipher we studied in the previous section is simple enough as a starting point, but should never be used in practice! It suffers from the fatal flaw that each character of the plaintext is encrypted individually, using the same secret key each time. So for example, every occurrence of the character <code>'D'</code> in the plaintext is transformed into the same character in the ciphertext. Why is this a problem?</p>
<p>Consider the ciphertext <code>'OLaTO+T^+NZZW'</code> generated by the ASCII-based Caesar cipher. Even though it may look indecipherable at first, there is information that we can learn about the original plaintext just by looking at the distribution of letters in the ciphertext.<label for="sn-0" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-0" class="margin-toggle"/><span class="sidenote"> Given these observations and the hint that the plaintext is a common phrase used in CSC110, can you determine the plaintext?</span></p>
<ul>
<li>The first and fifth letters in the plaintext must be the same, since they both map to <code>'O'</code> in the ciphertext.</li>
<li>Similarly, the sixth and ninth characters must be the same, and the eleventh and twelfth characters must be the same.</li>
<li>Because the Caesar cipher is <em>additive</em>, it preserves the relative <code>ord</code> of each character. Since <code>ord('O') = 79</code> and <code>ord('N') = 78</code>, we know that the first and tenth characters of the plaintext must be consecutive ASCII characters.</li>
</ul>
<p>In addition to what we can infer from the distribution of letters in the ciphertext, the ASCII-based Caesar cipher is vulnerable to a <em>brute-force exhaustive key search attack</em>. There are only 128 possible secret keys the cipher could use (corresponding to the possible remainders of modulo 128). So, given a ciphertext, it is possible to try out every secret key and see which key yields a meaningful plaintext message.<label for="sn-1" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-1" class="margin-toggle"/><span class="sidenote"> For most ciphertexts generated from English plaintexts, only one possible secret key causes the decrypted message to be a meaningful English message.</span> That’s not very secure.</p>
<p>Even if we enlarge the set of possible keys (e.g., by using a more general text encoding like UTF8), Caesar ciphers are still vulnerable to observations like the ones we made earlier. From these observations, we can identify “likely” keys that a brute force search could try first. So the main weakness of the Caesar cipher is not just the number of possible keys.</p>
<h2 id="the-one-time-pad">The one-time pad</h2>
<p>We will now introduce a new symmetric-key cryptosystem known as the <strong>one-time pad</strong> that is structurally similar to the Caesar cipher, but avoids the issues we raised earlier. Encryption in the one-time pad works by shifting each character in the plaintext message, much like the Caesar cipher. But where the one-time pad differs is that the shift is <em>not</em> the same for each character. The one-time pad accomplishes this by not using a single number for the secret key, but rather a string of length greater than or equal to the length of the plaintext message you wish to encrypt.<label for="sn-2" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-2" class="margin-toggle"/><span class="sidenote"> This secret key is colloquially referred to as a “one-time pad” (of characters), from which this cryptosystem gets its name.</span></p>
<p>To <em>encrypt</em> a plaintext ASCII message <span class="math inline">\(m\)</span> with secret key <span class="math inline">\(k\)</span>, for each index <span class="math inline">\(i\)</span> between 0 and <span class="math inline">\(|m| - 1\)</span>, we compute:</p>
<ul>
<li><span class="math inline">\((m[i] + k[i]) ~\%~ 128\)</span>, where <span class="math inline">\(m[i]\)</span> and <span class="math inline">\(k[i]\)</span> are converted to their numeric representations to do the arithmetic.<label for="sn-3" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-3" class="margin-toggle"/><span class="sidenote"> In contrast, the Caesar cipher calculates <span class="math inline">\((m[i] + k) ~\%~ 128\)</span>, where <span class="math inline">\(k\)</span> is the secret key.</span></li>
</ul>
<p>Here is an example. Suppose we wanted to encrypt the plaintext <code>'HELLO'</code> with the secret key <code>'david'</code>. The ciphertext will have five characters, where the first is <code>'H' + 'd'</code> which results in <code>','</code>, the second is <code>'E' + 'A'</code> which results in <code>'&amp;'</code>, etc. The following diagram shows the full conversion:</p>
<p><img src="./images/one_time_pad.png" alt="One-Time Pad Example Diagram" /><br />
</p>
<p>Similarly, for decryption we take the ciphertext <code>c</code> and recover the plaintext by subtracting each letter of the secret key: <span class="math inline">\((c[i] - k[i]) ~\%~ 128\)</span>.</p>
<h2 id="perfect-secrecy-and-its-costs">Perfect secrecy and its costs</h2>
<p>The one-time pad cryptosystem is famous in cryptography for having a property known as <strong>perfect secrecy</strong>,<label for="sn-4" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-4" class="margin-toggle"/><span class="sidenote"> This is a term termed by the mathematician and cryptographer Claude Shannon in 1949.</span> which informally means that a ciphertext reveals no information about its corresponding plaintext other than its length. To see why, take our previous example, with ciphertext <code>',&amp;B53'</code>. This ciphertext could have been generated by <em>any</em> five-letter plaintext message, because for any such message there exists a secret key that could encrypt that message to obtain <code>',&amp;B53'</code>. The sender could have been sending plaintext message <code>'HELLO'</code> with secret key <code>'david'</code>, but it is equally likely they could have been sending the message <code>'FUNNY'</code> with secret key <code>'fQtgZ'</code>. Because of perfect secrecy, an eavesdropper cannot gain any information about the original plaintext message, even if they know the whole ciphertext.</p>
<p>This perfect secrecy comes at a cost, however. The main drawback of the one-time pad cryptosystem, and why it is not actually used in practice, is that the secret key must have at least the same length as the message being sent, and cannot be reused from one message to another.<label for="sn-5" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-5" class="margin-toggle"/><span class="sidenote"> The notion of perfect secrecy relies on every possible secret key to be chosen purely at random. This isn’t the case if I reuse the same one-time pad for all my messages. This requirement is also why the term “one-time” is used for one-time pads.</span></p>
<h2 id="stream-ciphers">Stream ciphers</h2>
<p>The attraction of perfect secrecy has led cryptographers to develop <em>stream ciphers</em>, which are a type of symmetric-key cryptosystem that emulate a one-time pad but share a much smaller secret key. The details of stream ciphers are beyond the scope of this course, but the basic is idea is the following: the shared secret key is quite small (less than 1KB), and both parties use an algorithm to generate an arbitrary number of new random characters, based on both the secret key and any previously-generated characters.<label for="sn-6" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-6" class="margin-toggle"/><span class="sidenote"> We say that this is a “stream” of characters, from which this type of cryptosystem gets its name.</span> These characters are then used in the same way as a one-time pad to encrypt messages.</p>
<p>Now, stream ciphers do not have perfect secrecy, since the characters used in encryption aren’t truly random. But if the generating algorithm is clever enough, each new character appears “random”, and the encrypted messages are computationally impossible to decrypt without knowing the starting secret key. In other words, stream ciphers give up on perfect secrecy in exchange for “good enough” secrecy and a much, much smaller shared secret key. Of course, the “good enough” is highly dependent on the algorithm used to generate the characters. A poorly-designed algorithm may unintentionally inject patterns in the generated characters, or even allow an eavesdropper to gain some information about the secret key itself!</p>
</section>
<footer>
<a href="https://www.teach.cs.toronto.edu/~csc110y/fall/notes/">CSC110 Course Notes Home</a>
</footer>
</body>
</html>