<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>4.2 Defining Our Own Data Types, Part 1</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<link rel="stylesheet" href="../tufte.css" />
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<div style="display:none">
\(
\newcommand{\NOT}{\neg}
\newcommand{\AND}{\wedge}
\newcommand{\OR}{\vee}
\newcommand{\XOR}{\oplus}
\newcommand{\IMP}{\Rightarrow}
\newcommand{\IFF}{\Leftrightarrow}
\newcommand{\TRUE}{\text{True}\xspace}
\newcommand{\FALSE}{\text{False}\xspace}
\newcommand{\IN}{\,{\in}\,}
\newcommand{\NOTIN}{\,{\notin}\,}
\newcommand{\TO}{\rightarrow}
\newcommand{\DIV}{\mid}
\newcommand{\NDIV}{\nmid}
\newcommand{\MOD}[1]{\pmod{#1}}
\newcommand{\MODS}[1]{\ (\text{mod}\ #1)}
\newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z}
\newcommand{\Q}{\mathbb Q}
\newcommand{\R}{\mathbb R}
\newcommand{\C}{\mathbb C}
\newcommand{\cA}{\mathcal A}
\newcommand{\cB}{\mathcal B}
\newcommand{\cC}{\mathcal C}
\newcommand{\cD}{\mathcal D}
\newcommand{\cE}{\mathcal E}
\newcommand{\cF}{\mathcal F}
\newcommand{\cG}{\mathcal G}
\newcommand{\cH}{\mathcal H}
\newcommand{\cI}{\mathcal I}
\newcommand{\cJ}{\mathcal J}
\newcommand{\cL}{\mathcal L}
\newcommand{\cK}{\mathcal K}
\newcommand{\cN}{\mathcal N}
\newcommand{\cO}{\mathcal O}
\newcommand{\cP}{\mathcal P}
\newcommand{\cQ}{\mathcal Q}
\newcommand{\cS}{\mathcal S}
\newcommand{\cT}{\mathcal T}
\newcommand{\cV}{\mathcal V}
\newcommand{\cW}{\mathcal W}
\newcommand{\cZ}{\mathcal Z}
\newcommand{\emp}{\emptyset}
\newcommand{\bs}{\backslash}
\newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor}
\newcommand{\ceil}[1]{\left \lceil #1 \right \rceil}
\newcommand{\abs}[1]{\left | #1 \right |}
\newcommand{\xspace}{}
\newcommand{\proofheader}[1]{\underline{\textbf{#1}}}
\)
</div>
<header id="title-block-header">
<h1 class="title">4.2 Defining Our Own Data Types, Part 1</h1>
</header>
<section>
<p>Up to this point, all the data we’ve worked with in Python have been stored in objects that are instances of the built-in types that come with Python, like <code>int</code>s and <code>list</code>s. Python’s built-in data types are powerful, but are not always the most intuitive way to store data. For example, we saw in <a href="01-tabular-data.html">4.1 Tabular Data</a> that we could use a list of lists to represent tabular data. One of the downsides of this approach is that when working with this data, the onus is on us to remember which list element corresponds to which component of the data.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="op">&gt;&gt;&gt;</span> <span class="im">import</span> datetime</span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="op">&gt;&gt;&gt;</span> row <span class="op">=</span> [<span class="dv">1657</span>, <span class="st">&#39;ET&#39;</span>, <span class="dv">80</span>, datetime.date(<span class="dv">2011</span>, <span class="dv">1</span>, <span class="dv">1</span>)]</span>
<span id="cb1-3"><a href="#cb1-3"></a><span class="op">&gt;&gt;&gt;</span> row[<span class="dv">0</span>] <span class="co"># The id</span></span>
<span id="cb1-4"><a href="#cb1-4"></a><span class="dv">1657</span></span>
<span id="cb1-5"><a href="#cb1-5"></a><span class="op">&gt;&gt;&gt;</span> row[<span class="dv">1</span>] <span class="co"># The name of the civic centre</span></span>
<span id="cb1-6"><a href="#cb1-6"></a><span class="co">&#39;ET&#39;</span></span>
<span id="cb1-7"><a href="#cb1-7"></a><span class="op">&gt;&gt;&gt;</span> row[<span class="dv">2</span>] <span class="co"># The number of marriage licenses issued</span></span>
<span id="cb1-8"><a href="#cb1-8"></a><span class="dv">80</span></span>
<span id="cb1-9"><a href="#cb1-9"></a><span class="op">&gt;&gt;&gt;</span> row[<span class="dv">3</span>] <span class="co"># The time period</span></span>
<span id="cb1-10"><a href="#cb1-10"></a>datetime.date(<span class="dv">2011</span>, <span class="dv">1</span>, <span class="dv">1</span>)</span></code></pre></div>
<p>You can imagine how error-prone this might be. A simple “off by one” error for an index might retrieve a value of a completely different data type. It also makes our code difficult to read; the reader must know what each index of the list represents. And, as more experienced programmers will tell you, readable code is crucial.<label for="sn-0" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-0" class="margin-toggle"/><span class="sidenote"> “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” (Martin Fowler)</span></p>
<p>So a row in our marriage license data set is made up of four data elements. It would be nice if, instead of indices, we could use a name that was reflective of each element. Certainly, we could use a dictionary (instead of a list) where the keys are strings. But there is a more robust option we’ll learn about in this section: creating our <em>own</em> data types.</p>
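<p>As a rough sketch of the dictionary alternative mentioned above, the same row could be stored with string keys. The key names used here (<code>'id'</code>, <code>'civic_centre'</code>, <code>'num_licenses'</code>, and <code>'time_period'</code>) are our own choices for illustration, not names taken from the data set.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import datetime

# Hypothetical key names chosen for this illustration.
row = {
    'id': 1657,
    'civic_centre': 'ET',
    'num_licenses': 80,
    'time_period': datetime.date(2011, 1, 1),
}

# Each value is now looked up by name rather than by position.
licenses = row['num_licenses']</code></pre></div>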
<h2 id="defining-a-data-class">Defining a data class</h2>
<p>You might remember from Chapter 1 that in Python, another term for data type is a <strong>class</strong>.<label for="sn-1" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-1" class="margin-toggle"/><span class="sidenote"> This is why <code>type(3)</code> evaluates to <code>&lt;class 'int'&gt;</code> in Python.</span> The built-in data types we’ve studied so far illustrate how rich and complex data types can be. So for creating our own data types, we will first learn about the simplest kind of data type: a <strong>data class</strong>, which is a kind of class whose purpose is to bundle individual pieces of data into a single Python object.</p>
<p>For example, suppose we want to represent a “person” consisting of a given name, family name, age, and home address. We already know how to represent each individual piece of data: the given name, family name, and address could be strings, and the age could be a natural number. To bundle these values together, we could use a list or other built-in collection data type, but that approach would run into the issues we discussed above.</p>
<p>So instead, we define our own data class to create a new data type consisting of these four values. Here is the way to create a data class in Python:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> dataclass</span>
<span id="cb2-2"><a href="#cb2-2"></a></span>
<span id="cb2-3"><a href="#cb2-3"></a></span>
<span id="cb2-4"><a href="#cb2-4"></a><span class="at">@dataclass</span></span>
<span id="cb2-5"><a href="#cb2-5"></a><span class="kw">class</span> Person:</span>
<span id="cb2-6"><a href="#cb2-6"></a> <span class="co">&quot;&quot;&quot;A custom data type that represents data for a person.&quot;&quot;&quot;</span></span>
<span id="cb2-7"><a href="#cb2-7"></a> given_name: <span class="bu">str</span></span>
<span id="cb2-8"><a href="#cb2-8"></a> family_name: <span class="bu">str</span></span>
<span id="cb2-9"><a href="#cb2-9"></a> age: <span class="bu">int</span></span>
<span id="cb2-10"><a href="#cb2-10"></a> address: <span class="bu">str</span></span></code></pre></div>
<p>Let’s unpack this definition.</p>
<ol type="1">
<li><p><code>from dataclasses import dataclass</code> is a Python import statement that lets us use <code>dataclass</code> below.</p></li>
<li><p><code>@dataclass</code> is a Python <em>decorator</em>. We’ve seen decorators before for function definitions; a decorator for a class definition works in the same way, acting as a modifier for our definition. In this case, <code>@dataclass</code> tells Python that the data type we’re defining is a data class, whose benefits we’ll explore below.</p></li>
<li><p><code>class Person:</code> signals the start of a <em>class definition</em>.<label for="sn-2" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-2" class="margin-toggle"/><span class="sidenote"> This is similar to function definitions, except we use the <code>class</code> keyword instead of <code>def</code>. </span> The name of the class is <code>Person</code>.</p>
<p>The rest of the code is indented to put it inside of the class body.</p></li>
<li><p>The next line is a docstring that describes the purpose of the class.</p></li>
<li><p>Each remaining line (starting with <code>given_name: str</code>) defines a piece of data associated with the class; each piece of data is called an <strong>instance attribute</strong> of the class.</p>
<p>For each instance attribute, we write a name and a type annotation.<label for="sn-3" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-3" class="margin-toggle"/><span class="sidenote"> This is similar to defining parameter names and types for functions, though of course the purposes are different. </span></p></li>
</ol>
<h3 id="general-data-class-definition-syntax">General data class definition syntax</h3>
<p>In general, a data class definition in Python has the following syntax:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a><span class="at">@dataclass</span></span>
<span id="cb3-2"><a href="#cb3-2"></a><span class="kw">class</span> <span class="op">&lt;</span>ClassName<span class="op">&gt;</span>:</span>
<span id="cb3-3"><a href="#cb3-3"></a> <span class="co">&quot;&quot;&quot;Description of data class.</span></span>
<span id="cb3-4"><a href="#cb3-4"></a><span class="co"> &quot;&quot;&quot;</span></span>
<span id="cb3-5"><a href="#cb3-5"></a> <span class="op">&lt;</span>attribute1<span class="op">&gt;</span>: <span class="op">&lt;</span>type1<span class="op">&gt;</span></span>
<span id="cb3-6"><a href="#cb3-6"></a> <span class="op">&lt;</span>attribute2<span class="op">&gt;</span>: <span class="op">&lt;</span>type2<span class="op">&gt;</span></span>
<span id="cb3-7"><a href="#cb3-7"></a> ...</span></code></pre></div>
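<p>To tie this syntax back to the marriage license data from the start of the section, here is one way that row could be written as a data class. This is only a sketch: the class name <code>MarriageData</code> and its attribute names are our own assumptions, not definitions taken from the data set.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import datetime
from dataclasses import dataclass


@dataclass
class MarriageData:
    """A hypothetical data class for one row of marriage license data.

    The class name and attribute names are assumptions made for this sketch.
    """
    row_id: int
    civic_centre: str
    num_licenses: int
    time_period: datetime.date</code></pre></div>
<p>Each of the four positions in the original list now has its own name and type annotation.</p>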
<h2 id="using-data-classes">Using data classes</h2>
<p>Now that we’ve seen how to define a data class, we are ready to put it to use. To create an instance of our <code>Person</code> data class, we write a Python expression that calls the data class, passing in as arguments the values for each instance attribute:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="op">&gt;&gt;&gt;</span> david <span class="op">=</span> Person(<span class="st">&#39;David&#39;</span>, <span class="st">&#39;Liu&#39;</span>, <span class="dv">100</span>, <span class="st">&#39;40 St. George Street&#39;</span>)</span></code></pre></div>
<p>Pretty cool! That line of code creates a new <code>Person</code> object whose given name is <code>'David'</code>, family name is <code>'Liu'</code>, age is <code>100</code>, and address is <code>'40 St. George Street'</code>, and stores the object in the variable <code>david</code>. The <em>type</em> of this new value is, as we’d expect, <code>Person</code>:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1"></a><span class="op">&gt;&gt;&gt;</span> <span class="bu">type</span>(david)</span>
<span id="cb5-2"><a href="#cb5-2"></a><span class="op">&lt;</span><span class="kw">class</span> <span class="st">&#39;__main__.Person&#39;</span><span class="op">&gt;</span></span></code></pre></div>
<p>If we ask Python to evaluate the <code>Person</code> object, we see the different pieces of data that have been bundled together:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1"></a><span class="op">&gt;&gt;&gt;</span> david</span>
<span id="cb6-2"><a href="#cb6-2"></a>Person(given_name<span class="op">=</span><span class="st">&#39;David&#39;</span>, family_name<span class="op">=</span><span class="st">&#39;Liu&#39;</span>, age<span class="op">=</span><span class="dv">100</span>, address<span class="op">=</span><span class="st">&#39;40 St. George Street&#39;</span>)</span></code></pre></div>
<p>But from a <code>Person</code> object, how do we extract the individual values we bundled together? If we were using lists, we’d simply use list indexing: <code>david[0]</code>, <code>david[1]</code>, etc. The syntax for Python classes improves on this because we can use the names of the instance attributes together with <strong>dot notation</strong> to access these values:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1"></a><span class="op">&gt;&gt;&gt;</span> david.given_name</span>
<span id="cb7-2"><a href="#cb7-2"></a><span class="co">&#39;David&#39;</span></span>
<span id="cb7-3"><a href="#cb7-3"></a><span class="op">&gt;&gt;&gt;</span> david.family_name</span>
<span id="cb7-4"><a href="#cb7-4"></a><span class="co">&#39;Liu&#39;</span></span>
<span id="cb7-5"><a href="#cb7-5"></a><span class="op">&gt;&gt;&gt;</span> david.age</span>
<span id="cb7-6"><a href="#cb7-6"></a><span class="dv">100</span></span>
<span id="cb7-7"><a href="#cb7-7"></a><span class="op">&gt;&gt;&gt;</span> david.address</span>
<span id="cb7-8"><a href="#cb7-8"></a><span class="co">&#39;40 St. George Street&#39;</span></span></code></pre></div>
<p>This is much more readable than list indexing, and this is one of the major advantages of using data classes over lists to represent custom data in Python.</p>
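<p>Because attribute access is an ordinary expression, it can be used anywhere a value is expected, including inside functions. As a small illustration, the function below (<code>full_name</code>, a name we made up for this example) builds a string from two attributes of a <code>Person</code>.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from dataclasses import dataclass


@dataclass
class Person:
    """A custom data type that represents data for a person."""
    given_name: str
    family_name: str
    age: int
    address: str


def full_name(person: Person) -> str:
    """Return the given and family names of person, separated by a space."""
    return person.given_name + ' ' + person.family_name


david = Person('David', 'Liu', 100, '40 St. George Street')
name = full_name(david)  # 'David Liu'</code></pre></div>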
<h2 id="tip-naming-attributes-when-creating-data-class-instances">Tip: naming attributes when creating data class instances</h2>
<p>One challenge when creating instances of our data classes is keeping track of which arguments correspond to which instance attributes. In the expression <code>Person('David', 'Liu', 100, '40 St. George Street')</code>, the order of the arguments must match the order the instance attributes are listed in the definition of the data class, and it’s our responsibility to remember this order. Think about how easy it would be for us to write <code>Person('Liu', 'David', 100, '40 St. George Street')</code>, only to discover much later in our program that we accidentally switched this poor fellow’s given and family names!</p>
<p>To solve this issue, Python enables us to create data class instances using <em>keyword arguments</em> to explicitly name which argument corresponds to which instance attribute, using the exact same format as the <code>Person</code> representation we saw above:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1"></a><span class="op">&gt;&gt;&gt;</span> david <span class="op">=</span> Person(given_name<span class="op">=</span><span class="st">&#39;David&#39;</span>, family_name<span class="op">=</span><span class="st">&#39;Liu&#39;</span>, age<span class="op">=</span><span class="dv">100</span>, address<span class="op">=</span><span class="st">&#39;40 St. George Street&#39;</span>)</span></code></pre></div>
<p>Not only is this more explicit, but using keyword arguments allows us to pass the values in any order we want:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1"></a><span class="op">&gt;&gt;&gt;</span> david <span class="op">=</span> Person(family_name<span class="op">=</span><span class="st">&#39;Liu&#39;</span>, given_name<span class="op">=</span><span class="st">&#39;David&#39;</span>, address<span class="op">=</span><span class="st">&#39;40 St. George Street&#39;</span>, age<span class="op">=</span><span class="dv">100</span>)</span></code></pre></div>
<p>This is a great improvement for the readability of our code when we use data classes, especially as they grow larger. One potential downside that comes with this (and in general when being more explicit) is that this requires a bit more typing, and makes our code a little longer. You can get around the first issue by using auto-completion features (e.g., in PyCharm), and for the second issue you can put the different arguments on separate lines:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1"></a><span class="op">&gt;&gt;&gt;</span> david <span class="op">=</span> Person(</span>
<span id="cb10-2"><a href="#cb10-2"></a>... family_name<span class="op">=</span><span class="st">&#39;Liu&#39;</span>,</span>
<span id="cb10-3"><a href="#cb10-3"></a>... given_name<span class="op">=</span><span class="st">&#39;David&#39;</span>,</span>
<span id="cb10-4"><a href="#cb10-4"></a>... address<span class="op">=</span><span class="st">&#39;40 St. George Street&#39;</span>,</span>
<span id="cb10-5"><a href="#cb10-5"></a>... age<span class="op">=</span><span class="dv">100</span></span>
<span id="cb10-6"><a href="#cb10-6"></a>... )</span></code></pre></div>
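<p>As a quick check of the points above, the sketch below creates the same person in three ways. It relies on the fact that data class instances can be compared with <code>==</code>, which compares them attribute by attribute; this is standard <code>dataclass</code> behaviour, though it has not been introduced in this section. The variable names are our own.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from dataclasses import dataclass


@dataclass
class Person:
    """A custom data type that represents data for a person."""
    given_name: str
    family_name: str
    age: int
    address: str


positional = Person('David', 'Liu', 100, '40 St. George Street')
keyword = Person(
    family_name='Liu',
    given_name='David',
    address='40 St. George Street',
    age=100
)

# Keyword arguments give the same result regardless of their order.
same = positional == keyword  # True

# Swapping two positional arguments raises no error; the names are
# silently stored in the wrong attributes.
swapped = Person('Liu', 'David', 100, '40 St. George Street')
wrong_given_name = swapped.given_name  # 'Liu'</code></pre></div>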
<h3 id="representing-data-classes-in-the-memory-model">Representing data classes in the memory model</h3>
<p>Now that we have the ability to define our own data types, we need to decide how these data types will fit into our memory model. We’ll do this by using the representation that Python displays, formatted to show each instance attribute on a new line. For example, we would represent the <code>david</code> variable in a memory model as follows:</p>
<div class="memory-model-values" style="width:65%">
<table style="width:69%;">
<colgroup>
<col style="width: 15%" />
<col style="width: 54%" />
</colgroup>
<thead>
<tr class="header">
<th>Variable</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>david</code></td>
<td><div class="sourceCode" id="cb11"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1"></a>Person(</span>
<span id="cb11-2"><a href="#cb11-2"></a> family_name<span class="op">=</span><span class="st">&#39;Liu&#39;</span>,</span>
<span id="cb11-3"><a href="#cb11-3"></a> given_name<span class="op">=</span><span class="st">&#39;David&#39;</span>,</span>
<span id="cb11-4"><a href="#cb11-4"></a> address<span class="op">=</span><span class="st">&#39;40 St. George Street&#39;</span>,</span>
<span id="cb11-5"><a href="#cb11-5"></a> age<span class="op">=</span><span class="dv">100</span></span>
<span id="cb11-6"><a href="#cb11-6"></a>)</span></code></pre></div>
 </td>
</tr>
</tbody>
</table>
</div>
</section>
<!--
In Python, a **class** defines a data type.
The data types we've been working with so far (e.g., `int`, `dict`) are classes.
When we create, for example, a list, we have created an **object**.
Careful, an object and a class are two different things.
While a class defines the data type and you can think of it like a template or form.
An object is the template with all the blank spaces filled out.
We say that an **object** is an **instance** of a **class**.
-->
<footer>
<a href="https://www.teach.cs.toronto.edu/~csc110y/fall/notes/">CSC110 Course Notes Home</a>
</footer>
</body>
</html>