What is Computational Thinking? | Introduction to Data Science

As “Mathematical Thinking” draws from fundamental ideas in Mathematics (as a discipline), and “Statistical Thinking” relates to the core of Statistics (again, as a discipline), so “Computational Thinking” involves basic notions of Computer Science. Computational Thinking teaches the use of abstraction and decomposition when solving complex problems; it presents a framework for understanding algorithms; and it describes essential concepts in dealing with data and code and in expressing the limits of modern computing machinery. That said, Computational Thinking is a relatively recent proposition. We use the term to refer to learning related to computer science that transcends the purely functional or vocational (as is the case with even the more mature disciplinary “thinking” movements) and that provides students with important critical thinking skills. This approach allows us to present Computer Science as a living discipline, built around a set of core concepts, with technological byproducts that touch both our personal and professional lives (see Wing, 2006).

Computer Science, and perhaps more broadly information technologies, have reshaped nearly every disciplinary practice, and we therefore believe that Computational Thinking, as a pedagogical device, has a role to play in every STEM program. Students in math and science, for example, need more than simple programming exercises. They should be able to apply basic strategies in problem solving, understand the character of a solution or algorithm, and have a sense of the ways in which computerization and digitization have changed how research is conducted.

Therefore, we use “Computational Thinking” as a major framing device to present core ideas in Computer Science. In this way, we move beyond the traditional functional approach and introduce broader themes. With Participatory Sensing we create a context for student-directed learning. Participatory Sensing is a bridge for developing hands-on, inquiry-based projects with impacts in Math and Science, as well as Computer Science.

Given this background, we can summarize the core intent of this project: to develop methods for educating and engaging students in Computational Thinking. We will do so by developing and applying our proposed Participatory Sensing methods, tools, and systems infrastructure. Such a platform for embodied interaction and visualization is, at best, minimally available in current instructional tools. In this context, we will develop and support standards-based units both for Computer Science and for the traditional STEM disciplines of Science and Mathematics.

CT Construct #1: Data Format and Representation

Students will learn about the different forms that data take as they are stored, processed, and shared, including, but not limited to, the differences between binary, plain-text, human-readable, comma-delimited, and self-describing formats. Different forms of data lend themselves to different forms of analysis, often with a tradeoff between legibility and ease of processing. JSON and XML are two formats used to publish data on the web, and students can learn the tradeoffs each makes between compact representation and readability. Spatial data, for example, often comes in more elaborate formats (such as KML, or the formats used by ArcGIS), some of which may even be non-standard and specific to a particular software application. Data used to be mostly about processing, but now it is as much about sharing and publishing. Sharing happens both through the simple posting of files, which are downloaded and viewed with a program that interprets the file format, and through feeds that push out updates from a data source in real time. Data representation is a key construct for applying computational thinking to problems at every scale.
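As a small illustration of the tradeoff between compactness and legibility, the sketch below serializes the same records as comma-delimited text and as self-describing JSON with Python's standard `csv` and `json` modules. The readings, field names, and values are hypothetical, invented only for this example. CSV states each field name once in a header row, while JSON repeats the names in every record; in exchange, JSON keeps numbers as numbers, whereas a CSV reader hands back strings unless the program converts them.

```python
import csv
import io
import json

# Hypothetical readings; the field names and values are illustrative only.
readings = [
    {"time": "08:00", "lat": 34.07, "lon": -118.44, "mode": "walk"},
    {"time": "08:05", "lat": 34.08, "lon": -118.45, "mode": "bus"},
    {"time": "08:10", "lat": 34.09, "lon": -118.47, "mode": "bus"},
]


def to_csv(rows):
    """Comma-delimited text: field names appear once, in the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Self-describing JSON: every record repeats its field names."""
    return json.dumps(rows, indent=2)


def from_csv(text):
    """Parse CSV back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(readings)
json_text = to_json(readings)
print(len(csv_text), "characters as CSV")
print(len(json_text), "characters as JSON")
# JSON preserves the numeric type; CSV returns text unless the reader converts it.
print(json.loads(json_text)[0]["lat"], repr(from_csv(csv_text)[0]["lat"]))
```

The same comparison could be extended to XML or to a binary format; the point for students is that no single format is best on every axis.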

CT Construct #2: Modes of Data Collection

Students will learn about both purposeful, hypothesis-driven data collection and data exploration. Purposeful data collection, which includes surveys and designed experiments, is the more traditional mode. Increasingly, however, knowledge is also being created by exploring data that has already been reported or made available, in order to extract patterns, correlations, and even causations. Understanding exploratory data analysis is a key construct in applying computational thinking to the real world. It allows computer scientists to extract new understanding from existing and diverse data, and as a pedagogical tool it can show the relevance of computer science to many aspects of students' lives as learners and citizens. It allows teachers to work with topics and contexts that are clearly visible and relevant to students, instead of working with data in a vacuum, with toy problems (e.g., M&Ms), or with more elaborate but canned and less relevant data.
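A minimal sketch of what an exploratory pass can look like, assuming a small hypothetical class survey (the variable names and values below are invented): rather than testing one hypothesis fixed in advance, the program computes the Pearson correlation for every pair of variables and ranks the pairs by strength, so that the most striking patterns surface for further investigation.

```python
from itertools import combinations
from math import sqrt

# Hypothetical class survey; variables and values are invented for illustration.
data = {
    "minutes_to_school": [10, 25, 15, 40, 5, 30],
    "siblings": [1, 2, 0, 3, 1, 2],
    "hours_sleep": [8.0, 6.5, 7.5, 6.0, 8.5, 6.5],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def scan_correlations(table):
    """Exploratory pass: correlate every pair of variables, strongest first."""
    results = []
    for a, b in combinations(table, 2):
        results.append((a, b, pearson(table[a], table[b])))
    # Sort by absolute strength so the most striking patterns come first.
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


for a, b, r in scan_correlations(data):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A strong correlation found this way is a lead worth examining, for instance with a follow-up hypothesis-driven study, rather than a finished conclusion on its own.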

CT Construct #3: Algorithmic Analysis

Students will learn that by constructing algorithms that process data iteratively, where each step's logic depends on the results of previous steps, we can build the tools for exploratory data analysis, such as decision trees and multi-dimensional scaling. These algorithms have informal descriptions, and understanding them gives students both key analytical tools and insight into how those tools work. Students will learn that data analysis is not just about applying one-shot (if complex) mathematical functions to data, but rather about algorithmically defined programs that branch and loop over the data to extract knowledge.
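To show what "branch and loop over the data" can mean in practice, here is a minimal sketch of learning a decision tree on a single numeric feature in plain Python. It loops over candidate thresholds, scores each split by weighted Gini impurity (one common criterion; the text does not name one), and then recurses on each side, so every step depends on the outcome of the previous one. The speeds and activity labels are hypothetical, and `best_split`, `build_tree`, and `classify` are illustrative names, not a library API.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: the chance that two random draws from `labels` disagree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Loop over candidate thresholds on one feature and keep the one whose
    two sides have the lowest weighted Gini impurity (None if nothing helps)."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, gini(labels)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


def build_tree(values, labels, depth=0, max_depth=2):
    """Branch on the best threshold, then recurse on each side."""
    threshold, _ = best_split(values, labels)
    if threshold is None or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: the majority label
    left = [(v, lab) for v, lab in zip(values, labels) if v <= threshold]
    right = [(v, lab) for v, lab in zip(values, labels) if v > threshold]
    return {
        "threshold": threshold,
        "le": build_tree([v for v, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "gt": build_tree([v for v, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }


def classify(tree, value):
    """Answer one question per level until reaching a leaf label."""
    while isinstance(tree, dict):
        tree = tree["le"] if value <= tree["threshold"] else tree["gt"]
    return tree


# Hypothetical speeds (mph) with hand-assigned activity labels.
speeds = [0.0, 0.2, 2.8, 3.1, 3.5, 28.0, 30.0, 45.0]
labels = ["still", "still", "walking", "walking", "walking", "driving", "driving", "driving"]
tree = build_tree(speeds, labels)
print(tree)
print(classify(tree, 4.0))
```

With these hypothetical data, the learned tree first separates driving from the slower activities and then separates still from walking.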

Each of the Participatory Sensing data collection campaigns provides learning opportunities around the issues of data formats/representation, modes of data collection, and algorithmic analysis:

Data Format and Representation

For example, the recycling, safety, asthma, stress/chill, and daily-habits campaigns will center on tagged images, while the transportation and exercise campaigns center on activity traces. This gives us the opportunity to discuss the different ways we represent images, text, and location time series, and the different ways we manage and process each. Because the campaigns involve multiple types of data, we can use those differences to show the important role that format and representation play in computing systems.
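A minimal sketch of how the two kinds of campaign data might sit side by side in one JSON document. The field names, tags, and coordinates are hypothetical, and the six "image" bytes are a stand-in (they resemble the start of a JPEG file) rather than a real picture. Tags are text and go into JSON directly, but image pixels are binary, so they must be encoded (here with base64 from Python's standard library) before a text format can carry them; the location trace is simply a list of (seconds, latitude, longitude) samples.

```python
import base64
import json

# Stand-in bytes resembling the start of a JPEG file; not a real image.
fake_image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

# Hypothetical tagged-image record: binary pixels plus human-readable text tags.
tagged_image = {
    "campaign": "recycling",
    "tags": ["plastic bottle", "trash can"],
    # JSON holds only text, so the binary image is base64-encoded.
    "image_base64": base64.b64encode(fake_image_bytes).decode("ascii"),
}

# Hypothetical activity trace: (seconds, latitude, longitude) samples over time.
activity_trace = {
    "campaign": "transportation",
    "samples": [[0, 34.0689, -118.4452], [60, 34.0701, -118.4430]],
}

payload = json.dumps([tagged_image, activity_trace])

# Reading the payload back: the text decodes to the original binary bytes.
records = json.loads(payload)
restored = base64.b64decode(records[0]["image_base64"])
print(restored == fake_image_bytes)
print(len(fake_image_bytes), "raw bytes became", len(records[0]["image_base64"]), "characters")
```

Base64 turns every 3 bytes into 4 characters, so embedding images this way costs about a third more space, which is one reason a system might instead store the image file separately and keep only a reference to it in the record.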

Modes of Data Collection

Many of the studies offer opportunities for both hypothesis-driven and exploratory modes of data collection, which lets us concretely compare and contrast the two modes within a single context. Take transportation as an example. Students can do an exploratory exercise looking at how students in their class get to and from school, and then investigate the data for correlations, trends, and clusters relating patterns of transport to available demographic information (e.g., how patterns vary with students' subjects of interest, number of siblings, whether they have a driver's license, and so forth). Alternatively, one can take a hypothesis-driven approach, exploring the extent to which the trend toward independent modes of transportation differs between genders. Similar contrasts can be drawn across the environmental and personal campaigns. For example, a student might explore the relationship between sleep patterns during the week and exercise on the weekend, with the hypothesis that lack of sleep during the week contributes to reduced physical activity on the weekend. Alternatively, a student might keep a broader self-monitoring diary and then use exploratory tools to look for more complex or subtle relationships between parameters over longer timescales.
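The hypothesis-driven version of the transportation question can be sketched as a comparison between two groups. The example below uses a permutation test, a simulation-based method chosen here for illustration (the text does not prescribe one): it shuffles the group labels many times to estimate how often chance alone produces a difference in the proportion of independent travel at least as large as the observed one. The two groups and their 0/1 responses are hypothetical, not collected data, and the `trials` and `seed` defaults are arbitrary choices.

```python
import random

# Hypothetical responses: 1 = travels to school independently, 0 = does not.
group_a = [1, 0, 1, 1, 0, 1, 1, 0]
group_b = [0, 0, 1, 0, 1, 0, 0, 0]


def proportion(xs):
    """Fraction of 1s in a list of 0/1 responses."""
    return sum(xs) / len(xs)


def permutation_test(a, b, trials=10_000, seed=0):
    """Two-sided permutation test for a difference in proportions.

    Returns the observed difference and the estimated p-value: the share of
    label shufflings whose difference is at least as extreme as observed."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = proportion(a) - proportion(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = proportion(pooled[:len(a)]) - proportion(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / trials


observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference: {observed:.3f}, estimated p-value: {p_value:.3f}")
```

A small estimated p-value suggests the difference is unlikely to arise from chance alone; with only eight hypothetical students per group, students should expect such an estimate to be imprecise.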

Algorithmic Analysis

Within each of those campaigns there will be opportunities to do direct analysis first and then follow it with algorithmic analysis that exposes more complex and telling relationships. For example, students might be taught how to construct a decision tree that processes GPS time series and classifies them into activities, such as still, driving, and walking. The power of introducing decision trees is that they have a graphical as well as a predictive structure: students can examine the structure and details of a tree, relate their intuition to the algorithm and mathematics behind it, and build an understanding of how it works. Like the game of 20 questions, a decision tree is a series of splits: the first split might be whether the person’s speed is greater than 25 miles per hour; a second might be whether they are near a freeway; a third might be based on what the person was doing in the previous time slot; and so forth. Such algorithmic analysis is at the heart of data practices, modeling, and many other aspects of modern computer science. In this context, students will see how it relates to important environmental and personal issues, and how it applies to very personal instances of that data.
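The 20-questions tree described above can be sketched in a few lines, assuming each GPS sample is a (seconds, latitude, longitude) tuple and that a separate "near a freeway" flag is available for each interval (in practice it would come from map data, which is beyond this example). Speeds come from the haversine great-circle distance between consecutive samples. Only the 25 mph split comes from the text; the walking and still thresholds, the exact ordering of the questions, and the function names are hypothetical choices, and the tree is written by hand rather than learned from data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
STILL_MAX_MPH = 0.5  # hypothetical: slower than this counts as still
WALK_MAX_MPH = 5.0   # hypothetical: faster than this is unlikely to be walking


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def speeds_mph(trace):
    """Average speed over each interval of a [(seconds, lat, lon), ...] trace."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(trace, trace[1:]):
        hours = (t1 - t0) / 3600
        speeds.append(haversine_miles(lat0, lon0, lat1, lon1) / hours)
    return speeds


def classify_interval(speed, near_freeway, previous):
    """Hand-built decision tree: one question per branch, as in 20 questions."""
    if speed > 25:  # question 1 (from the text): faster than 25 mph?
        return "driving"
    if near_freeway and speed > WALK_MAX_MPH:  # question 2: near a freeway?
        return "driving"  # slow freeway traffic is still driving
    if speed > WALK_MAX_MPH:  # question 3: what happened in the previous slot?
        return "driving" if previous == "driving" else "walking"
    return "walking" if speed > STILL_MAX_MPH else "still"


def label_trace(trace, near_freeway_flags):
    """Classify every interval, feeding each label into the next decision."""
    labels, previous = [], "still"
    for speed, near in zip(speeds_mph(trace), near_freeway_flags):
        previous = classify_interval(speed, near, previous)
        labels.append(previous)
    return labels


# Hypothetical one-minute samples moving due north (longitude fixed).
trace = [
    (0, 34.0000, -118.0),
    (60, 34.0000, -118.0),
    (120, 34.0008, -118.0),
    (180, 34.0108, -118.0),
    (240, 34.0138, -118.0),
    (300, 34.0168, -118.0),
]
near_freeway = [False, False, False, True, False]
print(label_trace(trace, near_freeway))
```

In a learned tree, these thresholds would instead be chosen from labeled examples rather than fixed by hand.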

Introduction to Data Science – Course Overview
Unit	Unit Title	Unit Description
Unit 1	Data and Visualizations	Introduces students to fundamental notions of data analysis, such as distributions and multivariate associations, and emphasizes creating and interpreting visualizations of real-world processes as captured by data
Unit 2	Distributions, Probability, and Simulations	Students use numerical summaries to describe distributions and are introduced to probability through the lens of computer simulations for informal inference
Unit 3	Data Collection Methods: Traditional and Modern	Students learn about the various ways of collecting data, including Participatory Sensing, and the effect that data collection has on their interpretation of the patterns they discover
Unit 4	Predictions and Models	Students learn how to build and use mathematical and statistical models to predict future observations, and how data scientists measure the success of these predictions