<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>1.1 The Different Types of Data</title>
  <style>
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    span.underline{text-decoration: underline;}
    div.column{display: inline-block; vertical-align: top; width: 50%;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
  </style>
  <link rel="stylesheet" href="../tufte.css" />
  <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" type="text/javascript"></script>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<div style="display:none">
\(
\newcommand{\NOT}{\neg}
\newcommand{\AND}{\wedge}
\newcommand{\OR}{\vee}
\newcommand{\XOR}{\oplus}
\newcommand{\IMP}{\Rightarrow}
\newcommand{\IFF}{\Leftrightarrow}
\newcommand{\TRUE}{\text{True}\xspace}
\newcommand{\FALSE}{\text{False}\xspace}
\newcommand{\IN}{\,{\in}\,}
\newcommand{\NOTIN}{\,{\notin}\,}
\newcommand{\TO}{\rightarrow}
\newcommand{\DIV}{\mid}
\newcommand{\NDIV}{\nmid}
\newcommand{\MOD}[1]{\pmod{#1}}
\newcommand{\MODS}[1]{\ (\text{mod}\ #1)}
\newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z}
\newcommand{\Q}{\mathbb Q}
\newcommand{\R}{\mathbb R}
\newcommand{\C}{\mathbb C}
\newcommand{\cA}{\mathcal A}
\newcommand{\cB}{\mathcal B}
\newcommand{\cC}{\mathcal C}
\newcommand{\cD}{\mathcal D}
\newcommand{\cE}{\mathcal E}
\newcommand{\cF}{\mathcal F}
\newcommand{\cG}{\mathcal G}
\newcommand{\cH}{\mathcal H}
\newcommand{\cI}{\mathcal I}
\newcommand{\cJ}{\mathcal J}
\newcommand{\cL}{\mathcal L}
\newcommand{\cK}{\mathcal K}
\newcommand{\cN}{\mathcal N}
\newcommand{\cO}{\mathcal O}
\newcommand{\cP}{\mathcal P}
\newcommand{\cQ}{\mathcal Q}
\newcommand{\cS}{\mathcal S}
\newcommand{\cT}{\mathcal T}
\newcommand{\cV}{\mathcal V}
\newcommand{\cW}{\mathcal W}
\newcommand{\cZ}{\mathcal Z}
\newcommand{\emp}{\emptyset}
\newcommand{\bs}{\backslash}
\newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor}
\newcommand{\ceil}[1]{\left \lceil #1 \right \rceil}
\newcommand{\abs}[1]{\left | #1 \right |}
\newcommand{\xspace}{}
\newcommand{\proofheader}[1]{\underline{\textbf{#1}}}
\)
</div>
<header id="title-block-header">
<h1 class="title">1.1 The Different Types of Data</h1>
</header>
<section>
<p>Data is all around us and the amount of data stored increases every single day. In today’s world, decisions must be data-driven and so it is imperative that we be able to process, analyze, and understand the data we collect. Other important factors include the security and privacy of data. Businesses and governments need to answer important questions such as “Where should this data be stored?”; “How should this data be stored?”; and even, “Should this data be stored at all?”. The answers to these questions for Health Canada and personal health data are very different from the answers Nintendo might come up with for the next Animal Crossing game.</p>
<p>We begin our study of computer science by developing definitions for different categories of data. A <strong>data type</strong> is a way of categorizing data. A description of a data type conveys two important pieces of information:</p>
<ol type="1">
<li>The allowed <em>values</em> for a piece of data.</li>
<li>The allowed <em>operations</em> we can perform on a piece of data.</li>
</ol>
<p>For example, we could say that a person’s age is a natural number, which would tell us that values like 25 and 100 would be expected, while an age of -2 or “David” would be nonsensical. Knowing that a person’s age is a natural number also tells us what operations we could perform (e.g., “add 1 to the age”), and rules out other operations (e.g., “sort these ages alphabetically”).</p>
<p>In this section, we’ll review the common data types that we’ll make great use of in this course: numeric data, boolean data, textual data, and various forms of collections of data. Many terms and definitions may be review from your past studies, but be careful—they may differ slightly from what you’ve learned before, and it will be important to get these definitions exactly right.</p>
<h2 id="numeric-data">Numeric data</h2>
<p>Here are some types of numeric data, represented as familiar sets of numbers.</p>
<ul>
<li>A <strong>natural number</strong> is a value from the set <span class="math inline">\(\{0, 1, 2, \dots \}\)</span>. We use the symbol <span class="math inline">\(\N\)</span> to denote the set of natural numbers.<label for="sn-0" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-0" class="margin-toggle"/><span class="sidenote"> Note that our convention in computer science is to consider 0 a natural number!</span></li>
<li>An <strong>integer</strong> is a value from the set <span class="math inline">\(\{\dots, -2, -1, 0, 1, 2, \dots \}.\)</span> We use the symbol <span class="math inline">\(\Z\)</span> to denote the set of integers.</li>
<li>A <strong>rational number</strong> is a value from the set <span class="math inline">\(\{\frac{p}{q} \mid p, q \in \Z \text{ and } q \neq 0\}\)</span>—that is, the set of possible fractions. We use the symbol <span class="math inline">\(\Q\)</span> to denote the set of rational numbers.</li>
<li>An <strong>irrational number</strong> is a number with an infinite and non-repeating decimal expansion. Examples are <span class="math inline">\(\pi\)</span>, <span class="math inline">\(e\)</span>, and <span class="math inline">\(\sqrt 2\)</span>. We use the symbol <span class="math inline">\(\overline{\Q}\)</span> to denote the set of irrational numbers.</li>
<li>A <strong>real number</strong> is either a rational or irrational number. We use the symbol <span class="math inline">\(\R\)</span> to denote the set of real numbers.</li>
</ul>
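<p>As a rough illustration of how these sets of numbers might show up in a program, here is a minimal Python sketch. The variable names are hypothetical, and the pairing of each set with a Python type (<code>int</code>, <code>fractions.Fraction</code>, <code>float</code>) is an assumption of this sketch rather than a definition from this section.</p>

```python
from fractions import Fraction
import math

# Natural numbers and integers can both be modelled by Python's int type.
age = 25
temperature_change = -2

# A rational number p/q can be stored exactly with fractions.Fraction.
one_third = Fraction(1, 3)
one_half = one_third + Fraction(1, 6)

# Irrational numbers can only be approximated: a float stores finitely
# many digits, so math.pi is close to, but not equal to, the number pi.
approx_pi = math.pi

print(type(age).__name__)  # int
print(one_half)            # 1/2
```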
<h3 id="operations-on-numeric-data">Operations on numeric data</h3>
<p>All numeric data types support the standard arithmetic operations (addition, subtraction, multiplication, division, and exponentiation), as well as the standard comparisons for equality (using <span class="math inline">\(=\)</span>) and inequality (<span class="math inline">\(&lt;\)</span>, <span class="math inline">\(\leq\)</span>, <span class="math inline">\(&gt;\)</span>, <span class="math inline">\(\geq\)</span>). And of course, you are familiar with many more numeric functions, like log and sin; these will come up throughout the course.</p>
<p>One additional arithmetic operation that may be less familiar to you is the <em>modulo operator</em>, which produces the remainder when one integer is divided by another. We’ll use the percent symbol <span class="math inline">\(\%\)</span> to denote the modulo operator, writing <span class="math inline">\(a \% b\)</span> to mean “the remainder when <span class="math inline">\(a\)</span> is divided by <span class="math inline">\(b\)</span>”. For example, <span class="math inline">\(10 \% 4 = 2\)</span> and <span class="math inline">\(30 \% 3 = 0\)</span>.</p>
<p>Some arithmetic operations are undefined for particular numbers; for example, we can’t divide by zero, and we can’t take the square root of a negative number.</p>
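<p>The operations above can be tried out in Python, which also uses <code>%</code> for modulo. This is only a sketch with hypothetical variable names; it shows that an undefined operation raises an error instead of producing a value.</p>

```python
import math

# The modulo operator gives the remainder of an integer division.
remainder_a = 10 % 4   # 2
remainder_b = 30 % 3   # 0

# Standard arithmetic and comparisons.
total = 7 + 5          # 12
power = 2 ** 10        # 1024
is_larger = 5 > 3      # True

# Dividing by zero is undefined, so Python raises an error.
division_failed = False
try:
    1 / 0
except ZeroDivisionError:
    division_failed = True

# math.sqrt works with real numbers only, so a negative input is an error.
sqrt_failed = False
try:
    math.sqrt(-1)
except ValueError:
    sqrt_failed = True
```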
<h2 id="boolean-data">Boolean data</h2>
<p>A <strong>boolean</strong> is a value from the set <span class="math inline">\(\{\text{True}, \text{False}\}\)</span>. Think of a boolean value as an answer to a Yes/No question, e.g. “Is this person old enough to vote?”, “Is this country land-locked?”, and “Is this service free?”.</p>
<h3 id="operations-on-boolean-data">Operations on boolean data</h3>
<p>Booleans can be combined using <em>logical operators</em>. The three most common ones are:</p>
<ul>
<li><strong>not</strong>: reverses the value of a boolean. “not True” is False, and “not False” is True.</li>
<li><strong>and</strong>: takes two boolean values and produces True when both of the values are True, and False otherwise. For example, “True and False” is False, while “True and True” is True.</li>
<li><strong>or</strong>: takes two boolean values and produces True when at least one of the values is True, and False otherwise. For example, “True or False” is True, while “False or False” is False.</li>
</ul>
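<p>Python happens to use the same three words for these operators, so the examples above can be checked directly. The voting scenario and variable names below are hypothetical, including the assumed voting age of 18.</p>

```python
# Answers to two Yes/No questions about a hypothetical person.
age = 20
old_enough_to_vote = age >= 18   # True
is_citizen = False

# not reverses a boolean value.
too_young = not old_enough_to_vote               # False

# and is True only when both values are True.
can_vote = old_enough_to_vote and is_citizen     # False

# or is True when at least one value is True.
meets_one_condition = old_enough_to_vote or is_citizen   # True
```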
<p>Next week, we’ll discuss these logical operators in more detail and introduce a few others.</p>
<h2 id="textual-data">Textual data</h2>
<p>A <strong>string</strong> is an ordered sequence of characters, and is used to represent text. A character can be more than just an English letter (<span class="math inline">\(a\)</span>, <span class="math inline">\(b\)</span>, <span class="math inline">\(c\)</span>, etc.): number digits, punctuation marks, spaces, glyphs from non-English alphabets, and even emojis are all considered characters, and can be part of strings. Examples include a person’s name, your chat log, and the script of Shakespeare’s <em>Romeo and Juliet</em>.</p>
<h3 id="writing-textual-data">Writing textual data</h3>
<p>We typically will surround strings with single-quotes to differentiate them from any surrounding text, e.g., ‘David’.<label for="sn-1" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-1" class="margin-toggle"/><span class="sidenote"> We can also use double-quotes (“David”) to surround a string, but in this course we will generally prefer single-quotes for a reason we’ll discuss in Section 1.3.</span></p>
<p>A string can have zero characters; this string is called the <em>empty string</em>, and is denoted by ‘’ or the symbol <span class="math inline">\(\epsilon\)</span>.</p>
<h3 id="operations-on-textual-data">Operations on textual data</h3>
<p>Here are some common operations on strings.<label for="sn-2" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-2" class="margin-toggle"/><span class="sidenote"> <span class="math inline">\(s\)</span>, <span class="math inline">\(s_1\)</span>, and <span class="math inline">\(s_2\)</span> are all variables representing strings.</span></p>
<ul>
<li><p><span class="math inline">\(|s|\)</span>: <strong>string length/size</strong>. Returns the number of characters in <span class="math inline">\(s\)</span>.</p></li>
<li><p><span class="math inline">\(s_1 = s_2\)</span>: <strong>string equality</strong>. Returns whether <span class="math inline">\(s_1\)</span> and <span class="math inline">\(s_2\)</span> have the same characters, in the same order.</p></li>
<li><p><span class="math inline">\(s_1 + s_2\)</span>: <strong>string concatenation</strong>. Returns a new string consisting of the characters of <span class="math inline">\(s_1\)</span> followed by the characters of <span class="math inline">\(s_2\)</span>. For example, if <span class="math inline">\(s_1\)</span> represents the string ‘Hello’ and <span class="math inline">\(s_2\)</span> represents the string ‘Goodbye’, then <span class="math inline">\(s_1 + s_2\)</span> is the string ‘HelloGoodbye’.</p></li>
<li><p><span class="math inline">\(s[i]\)</span>: <strong>string indexing</strong>. Returns the <span class="math inline">\(i\)</span>-th character of <span class="math inline">\(s\)</span>, where indexing starts at 0. (So <span class="math inline">\(s[0]\)</span> returns the first character of <span class="math inline">\(s\)</span>, <span class="math inline">\(s[1]\)</span> returns the second, etc.) For example, if <span class="math inline">\(s\)</span> represents the string ‘Hello’, then <span class="math inline">\(s[0]\)</span> is ‘H’ and <span class="math inline">\(s[4]\)</span> is ‘o’.</p></li>
</ul>
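<p>Each of the string operations above has a close counterpart in Python. The sketch below reuses the ‘Hello’ and ‘Goodbye’ examples; the variable names are hypothetical, and <code>len</code> plays the role of <span class="math inline">\(|s|\)</span>.</p>

```python
s1 = 'Hello'
s2 = 'Goodbye'
empty = ''                  # the empty string

length = len(s1)            # 5
empty_length = len(empty)   # 0

# Equality compares the characters in order.
same = s1 == 'Hello'        # True
different = s1 == 'hello'   # False: 'H' and 'h' are different characters

# Concatenation builds a new string.
joined = s1 + s2            # 'HelloGoodbye'

# Indexing starts at 0.
first = s1[0]               # 'H'
last = s1[4]                # 'o'
```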
<h2 id="set-data-unordered-distinct-values">Set data (unordered distinct values)</h2>
<p>A <strong>set</strong> is an unordered collection of zero or more distinct values, called its <strong>elements</strong>. Examples include: the set of all people in Toronto; the set of words of the English language; and the set of all countries on Earth.</p>
<h3 id="writing-sets">Writing sets</h3>
<p>We write sets using curly braces in two different ways:</p>
<ol type="1">
<li>Writing each element of the set within the braces, separated by commas. For example, <span class="math inline">\(\{1, 2, 3\}\)</span> or <span class="math inline">\(\{\text{‘hi’}, \text{‘bye’}\}\)</span>.</li>
<li>Using <em>set builder notation</em>, in which we define the form of elements of a set using variables. We saw an example of this earlier when defining the set of rational numbers, <span class="math inline">\(\{\frac{p}{q} \mid p, q \in \Z \text{ and } q \neq 0\}\)</span>.</li>
</ol>
<p>A set can have zero elements; this set is called the <em>empty set</em>, and is denoted by <span class="math inline">\(\{\}\)</span> or the symbol <span class="math inline">\(\emptyset\)</span>.</p>
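<p>Both ways of writing sets have rough Python analogues, sketched below with hypothetical names. A Python set must be finite, so the comprehension builds only a small slice of the rational numbers. Note one difference in notation: in Python, <code>{}</code> denotes an empty mapping, so the empty set is written <code>set()</code>.</p>

```python
from fractions import Fraction

# Listing the elements explicitly.
numbers = {1, 2, 3}
greetings = {'hi', 'bye'}

# A set comprehension plays the role of set builder notation. Here p
# ranges over -2..2 and q over 1..2, so q is never 0.
small_rationals = {Fraction(p, q) for p in range(-2, 3) for q in range(1, 3)}

# Equal fractions such as 1/1 and 2/2 are stored once, leaving 7 values.
print(len(small_rationals))  # 7

# The empty set has no elements.
empty_set = set()
print(len(empty_set))        # 0
```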
<h3 id="operations-on-set-data">Operations on set data</h3>
<p>Here are some common set operations.<label for="sn-3" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-3" class="margin-toggle"/><span class="sidenote"><span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> represent sets.</span></p>
<ul>
<li><p><span class="math inline">\(|A|\)</span>: returns the <strong>size</strong> of set <span class="math inline">\(A\)</span>, i.e., the number of elements in <span class="math inline">\(A\)</span>.</p></li>
<li><p><span class="math inline">\(x \in A\)</span>: returns True when <span class="math inline">\(x\)</span> is an element of <span class="math inline">\(A\)</span>; <span class="math inline">\(y \notin A\)</span> returns True when <span class="math inline">\(y\)</span> is <em>not</em> an element of <span class="math inline">\(A\)</span>.</p></li>
<li><p><span class="math inline">\(A \subseteq B\)</span>: returns True when every element of <span class="math inline">\(A\)</span> is also in <span class="math inline">\(B\)</span>. We say in this case that <span class="math inline">\(A\)</span> is a <strong>subset</strong> of <span class="math inline">\(B\)</span>.</p>
<p>A set <span class="math inline">\(A\)</span> is a subset of itself, and the empty set is a subset of every set: <span class="math inline">\(A \subseteq A\)</span> and <span class="math inline">\(\emptyset \subseteq A\)</span> are always True.</p></li>
<li><p><span class="math inline">\(A = B\)</span>: returns True when <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> contain the exact same elements.</p></li>
</ul>
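<p>The four operations above can be sketched with Python’s built-in <code>set</code> type. The names are hypothetical; <code>len</code>, <code>in</code>, and <code>issubset</code> stand in for <span class="math inline">\(|A|\)</span>, <span class="math inline">\(\in\)</span>, and <span class="math inline">\(\subseteq\)</span>.</p>

```python
a = {1, 2, 3}
b = {3, 2, 1, 1}   # order and repeated elements do not matter

size = len(a)                            # 3
has_two = 2 in a                         # True
lacks_five = 5 not in a                  # True

# Subset checks, including the two cases that are always True.
is_subset = {1, 2}.issubset(a)           # True
subset_of_itself = a.issubset(a)         # True
empty_is_subset = set().issubset(a)      # True

# Equality ignores order and duplicates.
equal = a == b                           # True
```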
<p>The following operations return sets:</p>
<ul>
<li><p><span class="math inline">\(A \cup B\)</span>, the <strong>union</strong> of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>. Returns the set consisting of all elements that occur in <span class="math inline">\(A\)</span>, in <span class="math inline">\(B\)</span>, or in both.</p>
<p>Using set builder notation: <span class="math inline">\(A \cup B = \{x \mid x \in A \text{ or } x \in B\}\)</span>.</p></li>
<li><p><span class="math inline">\(A \cap B\)</span>, the <strong>intersection</strong> of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>. Returns the set consisting of all elements that occur in both <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>.</p>
<p>Using set builder notation: <span class="math inline">\(A \cap B = \{x \mid x \in A \text{ and } x \in B\}\)</span>.</p></li>
<li><p><span class="math inline">\(A \setminus B\)</span>, the <strong>difference</strong> of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>. Returns the set consisting of all elements that are in <span class="math inline">\(A\)</span> but that are not in <span class="math inline">\(B\)</span>.</p>
<p>Using set builder notation: <span class="math inline">\(A \setminus B = \{x \mid x \in A \text{ and } x \notin B\}.\)</span></p></li>
<li><p><span class="math inline">\(A \times B\)</span>, the <strong>(Cartesian) product</strong> of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>. Returns the set consisting of all <em>pairs</em> <span class="math inline">\((a, b)\)</span> where <span class="math inline">\(a\)</span> is an element of <span class="math inline">\(A\)</span> and <span class="math inline">\(b\)</span> is an element of <span class="math inline">\(B\)</span>.</p>
<p>Using set builder notation: <span class="math inline">\(A \times B = \{(x, y) \mid x \in A \text{ and } y \in B\}.\)</span></p></li>
<li><p><span class="math inline">\(\cP(A)\)</span>, the <strong>power set</strong> of <span class="math inline">\(A\)</span>, returns the set consisting of all subsets of <span class="math inline">\(A\)</span>.<label for="sn-4" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-4" class="margin-toggle"/><span class="sidenote">Food for thought: what is the relationship between <span class="math inline">\(|A|\)</span> and <span class="math inline">\(|\cP(A)|\)</span>?</span> For example, if <span class="math inline">\(A = \{1,2,3\}\)</span>, then <span class="math display">\[\cP(A) = \big\{ \emptyset, \{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\big\}.\]</span></p>
<p>Using set builder notation: <span class="math inline">\(\cP(A) = \{S \mid S \subseteq A\}\)</span>.</p></li>
</ul>
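<p>Here is one way the set-returning operations might be sketched in Python. The helper <code>power_set</code> is a hypothetical function written for this example, not a built-in; it returns each subset as a <code>frozenset</code> because Python sets cannot contain ordinary (mutable) sets.</p>

```python
from itertools import chain, combinations, product

a = {1, 2, 3}
b = {2, 4, 6}

union = a.union(b)                  # {1, 2, 3, 4, 6}
intersection = a.intersection(b)    # {2}
difference = a.difference(b)        # {1, 3}

# The Cartesian product is the set of all pairs (x, y), as tuples.
pairs = set(product(a, b))          # 9 pairs, e.g. (1, 2) and (3, 6)


def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    elements = list(s)
    # Collect the subsets of size 0, 1, ..., len(elements).
    subsets = chain.from_iterable(
        combinations(elements, size) for size in range(len(elements) + 1)
    )
    return {frozenset(subset) for subset in subsets}


all_subsets = power_set(a)
print(len(all_subsets))  # 8, matching the example above
```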
<h2 id="list-data-ordered-values">List data (ordered values)</h2>
<p>A <strong>list</strong> is an ordered collection of zero or more (possibly duplicated) values, called its elements. List data is used instead of a set when the elements of the collection should be in a specified order, or if it may contain duplicates. Examples include: the list of all people in Toronto, ordered by age; the list of words of the English language, ordered alphabetically; and the list of names of students at U of T (two students may have the same name!), ordered alphabetically.</p>
<h3 id="writing-lists">Writing lists</h3>
<p>Lists are written with square brackets enclosing zero or more values separated by commas. For example, <span class="math inline">\([1, 2, 3]\)</span>.</p>
<p>A list can have zero elements; this list is called the <em>empty list</em>, and is denoted by <span class="math inline">\([]\)</span>.</p>
<h3 id="operations-on-list-data">Operations on list data</h3>
<p>Here are some common list operations.<label for="sn-5" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-5" class="margin-toggle"/><span class="sidenote"><span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> represent lists.</span></p>
<ul>
<li><p><span class="math inline">\(|A|\)</span>: returns the <strong>size</strong> of <span class="math inline">\(A\)</span>, i.e., the number of elements in <span class="math inline">\(A\)</span> (counting all duplicates).</p></li>
<li><p><span class="math inline">\(x \in A\)</span>: same meaning as for sets.</p></li>
<li><p><span class="math inline">\(A = B\)</span>: returns True when <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> have the same elements in the same order.</p></li>
<li><p><span class="math inline">\(A[i]\)</span>: <strong>list indexing</strong>. Returns the <span class="math inline">\(i\)</span>-th element of <span class="math inline">\(A\)</span>, where the indexing starts at 0. So <span class="math inline">\(A[0]\)</span> returns the first element of <span class="math inline">\(A\)</span>, <span class="math inline">\(A[1]\)</span> returns the second, etc.</p></li>
<li><p><span class="math inline">\(A + B\)</span>: <strong>list concatenation</strong>. Returns a new list consisting of the elements of <span class="math inline">\(A\)</span> followed by the elements of <span class="math inline">\(B\)</span>. This is similar to set union, but duplicates are kept, and order is preserved.</p>
<p>For example, <span class="math inline">\([1, 2, 3] + [2, 4, 6] = [1, 2, 3, 2, 4, 6]\)</span>.</p></li>
</ul>
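<p>The list operations above map closely onto Python’s <code>list</code> type. The sketch below uses hypothetical names and reuses the concatenation example; it also shows how lists differ from sets in that order and duplicates matter.</p>

```python
a = [1, 2, 3]
b = [2, 4, 6]
empty = []                          # the empty list

size = len([1, 1, 2])               # 3: duplicates are counted
contains_two = 2 in a               # True

# Equality requires the same elements in the same order.
same_order = a == [1, 2, 3]         # True
other_order = a == [3, 2, 1]        # False

# Indexing starts at 0.
first = a[0]                        # 1
second = a[1]                       # 2

# Concatenation keeps duplicates and preserves order.
combined = a + b                    # [1, 2, 3, 2, 4, 6]
```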
<h2 id="mapping-data">Mapping data</h2>
<p>Finally, a <strong>mapping</strong> is an unordered collection of pairs of values. Each pair consists of a <em>key</em> and an associated <em>value</em>; the keys must be unique in the mapping, but the values can be duplicated. A key cannot exist in the mapping without a corresponding value.</p>
<p>Mappings are used to represent associations between two collections of data. For example: a mapping from the name of a country to its GDP; a mapping from student number to name; and a mapping from food item to price.</p>
<h3 id="writing-mappings">Writing mappings</h3>
<p>We use curly braces to represent a mapping.<label for="sn-6" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-6" class="margin-toggle"/><span class="sidenote"> This notation resembles set notation because mappings are quite similar to sets. Both data types are unordered, and both have a uniqueness constraint (a set’s elements are unique; a mapping’s keys are unique).</span> Each key-value pair in a mapping is written using a colon, with the key on the left side of the colon and its associated value on the right. For example, here is how we could write a mapping representing the menu items of a restaurant: <span class="math display">\[\{\text{‘fries’}: 5.99, \text{‘steak’}: 25.99, \text{‘soup’}: 8.99\}.\]</span></p>
<h3 id="operations-on-mappings">Operations on mappings</h3>
<p>Here are some common mapping operations.<label for="sn-7" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-7" class="margin-toggle"/><span class="sidenote"><span class="math inline">\(M\)</span> and <span class="math inline">\(N\)</span> represent mappings.</span></p>
<ul>
<li><span class="math inline">\(|M|\)</span>: returns the <strong>size</strong> of the mapping <span class="math inline">\(M\)</span>, i.e., the number of key-value pairs in <span class="math inline">\(M\)</span>.</li>
<li><span class="math inline">\(M = N\)</span>: returns whether two mappings are equal, i.e., when they contain exactly the same key-value pairs.</li>
<li><span class="math inline">\(k \in M\)</span>: returns whether <span class="math inline">\(k\)</span> is a <em>key</em> contained in the mapping <span class="math inline">\(M\)</span>.</li>
<li><span class="math inline">\(M[k]\)</span>: when <span class="math inline">\(k\)</span> is a key in <span class="math inline">\(M\)</span>, this operation returns the value that corresponds to <span class="math inline">\(k\)</span> in the mapping <span class="math inline">\(M\)</span>.</li>
</ul>
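<p>As a sketch, the restaurant menu can be written as a Python <code>dict</code>, which supports each of the operations above. The variable names are hypothetical.</p>

```python
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99}

size = len(menu)                    # 3 key-value pairs

# Membership checks the keys, not the values.
has_soup = 'soup' in menu           # True
has_price = 5.99 in menu            # False: 5.99 is a value, not a key

# Look up the value associated with a key.
steak_price = menu['steak']         # 25.99

# Equality compares key-value pairs; the order they are written in
# does not matter.
same_menu = menu == {'soup': 8.99, 'steak': 25.99, 'fries': 5.99}   # True
```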
<h2 id="and-more">…and more!</h2>
<p>The data types we’ve studied so far are not the only kinds of data that we encounter in the real world, but they do form a basis for representing all kinds of more complex data. We’ll study how to represent more complex forms of data later in this course, but here’s one teaser: representing image data.</p>
<p><img src="images/chelsea_channels.png" alt="Chelsea Cat Split by Colour Channel" /><br />
</p>
<p>Images can be represented as a list of integers. Each element in the list corresponds to a very tiny dot on your screen, called a <em>pixel</em>. For each pixel, three integer values are used to represent three colour channels: red, green, and blue. We can then add these channels together to get a very wide range of colours (this is called the RGB colour model). Somehow, our computers are able to take these sequences of integers and translate them into a sequence of visible lights, and if these lights are arranged in a particular way, well, a cat appears!</p>
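<p>To make this teaser slightly more concrete, here is a minimal Python sketch of a tiny image. Everything in it is an assumption made for illustration: the 2-by-2 size, grouping each pixel’s three integers into its own small list, the 0 to 255 range for each channel, and the hypothetical helper <code>red_channel</code>. Real image formats and libraries differ in their details.</p>

```python
# A hypothetical 2-by-2 image, stored row by row as a list of pixels.
# Each pixel is a list of three integers: [red, green, blue].
pixels = [
    [255, 0, 0],      # a red pixel
    [0, 255, 0],      # a green pixel
    [0, 0, 255],      # a blue pixel
    [255, 255, 255],  # a white pixel: all three channels at full strength
]


def red_channel(image):
    """Return a copy of image that keeps only each pixel's red value."""
    return [[pixel[0], 0, 0] for pixel in image]


red_only = red_channel(pixels)
print(red_only)  # [[255, 0, 0], [0, 0, 0], [0, 0, 0], [255, 0, 0]]
```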
<h2 id="references">References</h2>
<ol type="1">
<li>Check out <a href="https://en.wikipedia.org/wiki/Our_World_in_Data">Our World in Data</a> to see how data-driven research is being used to tackle global problems.</li>
<li>If you’d like to read more about the RGB colour model, the Wikipedia entry is a good start: <a href="https://en.wikipedia.org/wiki/RGB_color_model" class="uri">https://en.wikipedia.org/wiki/RGB_color_model</a>.</li>
</ol>
</section>
<footer>
<a href="https://www.teach.cs.toronto.edu/~csc110y/fall/notes/">CSC110 Course Notes Home</a>
</footer>
</body>
</html>