# Software Carpentry at Nikhef, day 2

:::info
:information_source: On this page you will find notes for the second day of the Software Carpentry workshop organized on 27 September 2023.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :watch: Schedule

September 27

| | **Programming with Python**|
|------|------|
| 09:30 | Programming with Python |
| 10:30 | *Morning break* |
| 10:45 | Programming with Python (Continued) |
| 12:30 | *Lunch break* |
| 13:15 | Programming with Python (Continued) |
| 15:30 | *Afternoon break* |
| 15:45 | Programming with Python (Continued) |
| 17:30 | *END* |

## Programming with Python

### :link: Links

* Setup page: https://kb.nikhef.nl/computing-course/swc-setup/
* Lesson material: https://kb.nikhef.nl/computing-course/python-novice/
* Reference page: https://kb.nikhef.nl/computing-course/python-novice/reference.html

### 1. Python Fundamentals

```python!
1 + 4 + 2
weight_kg = 60
weight_kg
weight_kg = 60.234
weight_kg
patient_id = "hello world!"
patient_id = 'hello world!'
weight_kg * 2.2
weight_lb = weight_kg * 2.2
weight_lb
print(weight_lb)
patient_id = "patient_01"
print(patient_id, "weighs", weight_kg)
print(type(patient_id))
patient_id = 12
print(type(patient_id))
print("weight in pounds", weight_kg * 2.2, "lb")
print("weight:", weight_kg)
weight_kg = 55
print("weight:", weight_kg)
print(weight_lb)
```

:::success
:pencil: **Check Your Understanding**

What values do the variables `mass` and `age` have after each of the following statements? Test your answer by executing the lines.

```python!
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
```

<details>
<summary>Solution</summary>

```
`mass` holds a value of 47.5, `age` does not exist
`mass` still holds a value of 47.5, `age` holds a value of 122
`mass` now has a value of 95.0, `age`'s value is still 122
`mass` still has a value of 95.0, `age` now holds 102
```
</details>
:::

:::success
:pencil: **Seeing Data Types**

What are the data types of the following variables?

```python!
planet = 'Earth'
apples = 5
distance = 10.5
```

<details>
<summary>Solution</summary>

```python!
print(type(planet))
print(type(apples))
print(type(distance))
```

```
<class 'str'>
<class 'int'>
<class 'float'>
```
</details>
:::

### 2. Analyzing Patient Data

```python!
import os
os.getcwd()
os.chdir("")  # put the path to your data directory between the quotes
import numpy
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(type(data))
data.dtype
print(data.shape)
data[0, 0]
print("value for first patient on first day:", data[0, 0])
print("some random value from the array:", data[4, 8])
data[0, :]
data[0, 4:10]
data[0, 0:3]
data[:, 0:3]
data[0:2, 4:10]
data[:, :3]
data[:, 35:]
data[:, -1]
data[:, -2]
data[:, :-2]
print(numpy.mean(data))
numpy.std(data)
print("sadsdf", 12, 456, 345)
numpy.mean(data, 12)  # error: axis 12 is out of bounds for a 2D array
import time
print(time.ctime())
print(time.ctime)  # without parentheses this prints the function object itself
numpy.mean()  # TypeError: the array argument is missing
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)
maxval = numpy.amax(data)
minval = numpy.amin(data)
stdval = numpy.std(data)
print("max inflammation:", maxval)
print("min inflammation:", minval)
print("stdev inflammation:", stdval)
help(numpy.abs)
patient_0 = data[0, :]
patient_0
patient_max, patient_min = numpy.amax(patient_0), numpy.amin(patient_0)
print("max inflammation patient 0:", patient_max)
print("min inflammation patient 0:", patient_min)
print(numpy.mean(data, axis=0))
print(numpy.mean(data, axis=0).shape)
print(numpy.mean(data, axis=1))
print(numpy.mean(data, axis=1).shape)
hello = "Hello world!"
hello[0]
hello[-1]
hello[0:5]
hello[:5]
hello[6:]
hello[6:-1]
```

:::success
:pencil: **Change In Inflammation**

The patient data is _longitudinal_ in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let's find out how to calculate changes in the data contained in an array with NumPy.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let's use it to examine the changes each day across the first week of patient 3 from our inflammation dataset.

```python!
patient3_week1 = data[3, :7]
print(patient3_week1)
```

```
[0. 0. 2. 0. 4. 2. 2.]
```

Calling `numpy.diff(patient3_week1)` would do the following calculations

```python!
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```

and return the 6 difference values in a new array.

```python!
numpy.diff(patient3_week1)
```

```
array([ 0., 2., -2., 4., -2., 0.])
```

Note that the array of differences is shorter by one element (length 6).

When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `numpy.diff` to our 2D inflammation array `data`, which axis would we specify?

<details>
<summary>Solution</summary>

Since the row axis (0) is patients, it does not make sense to get the difference between two arbitrary patients. The column axis (1) is in days, so the difference is the change in inflammation -- a meaningful concept.

```python!
numpy.diff(data, axis=1)
```
</details>

If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

<details>
<summary>Solution</summary>

> The shape will be `(60, 39)` because there is one fewer difference between
> columns than there are columns in the data.
</details>

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

<details>
<summary>Solution</summary>

By using the `numpy.max()` function after you apply the `numpy.diff()` function, you will get the largest difference between days.

```python!
numpy.max(numpy.diff(data, axis=1), axis=1)
```

```python!
array([ 7., 12., 11., 10., 11., 13., 10.,  8., 10., 10.,
        7.,  7., 13.,  7., 10., 10.,  8., 10.,  9., 10.,
       13.,  7., 12.,  9., 12., 11., 10., 10.,  7., 10.,
       11., 10.,  8., 11., 12., 10.,  9., 10., 13., 10.,
        7.,  7., 10., 13., 12.,  8.,  8., 10., 10.,  9.,
        8., 13., 10.,  7., 10.,  8., 12., 10.,  7., 12.])
```

If inflammation values *decrease* along an axis, then the difference from one element to the next will be negative. If you are interested in the **magnitude** of the change and not the direction, the `numpy.absolute()` function will provide that.

Notice the difference if you get the largest _absolute_ difference between readings.

```python!
numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
```

```python!
array([12., 14., 11., 13., 11., 13., 10., 12., 10., 10.,
       10., 12., 13., 10., 11., 10., 12., 13.,  9., 10.,
       13.,  9., 12.,  9., 12., 11., 10., 13.,  9., 13.,
       11., 11.,  8., 11., 12., 13.,  9., 10., 13., 11.,
       11., 13., 11., 13., 13., 10.,  9., 10., 10.,  9.,
        9., 13., 10.,  9., 10., 11., 13., 10., 10., 12.])
```
</details>
:::

### 3. Visualizing Tabular Data

```python!
import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
ave_inflammation
max_plot = matplotlib.pyplot.plot(numpy.amax(data, axis=0))
matplotlib.pyplot.show()
min_plot = matplotlib.pyplot.plot(numpy.amin(data, axis=0))
matplotlib.pyplot.show()
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.plot(numpy.mean(data, axis=0))
axes2.plot(numpy.amax(data, axis=0))
axes3.plot(numpy.amin(data, axis=0))
axes1.set_ylabel('average')
axes2.set_ylabel('max')
axes3.set_ylabel('min')
# min_data must exist before it is used to set the y limits
min_data = numpy.amin(data, axis=0)
axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
axes1.set_xlabel('time')
axes2.set_xlabel('time')
axes3.set_xlabel('time')
fig.tight_layout()
matplotlib.pyplot.savefig('inflammation.png')
matplotlib.pyplot.show()
import matplotlib.pyplot as plt
import numpy as np
```

:::success
:pencil: **Plot Scaling**

Why do all of our plots stop just short of the upper end of our graph?

<details>
<summary>Solution</summary>

Because matplotlib normally sets x and y axes limits to the min and max of our data (depending on data range).
</details>

If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes', for example:

```python!
axes3.set_ylim(0,6)
```

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

<details>
<summary>Solution</summary>

```python!
# One method
axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))
axes3.set_ylim(0,6)
```
</details>

<details>
<summary>Solution</summary>

```python!
# A more automated approach
min_data = numpy.min(data, axis=0)
axes3.set_ylabel('min')
axes3.plot(min_data)
axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
```
</details>
:::

:::success
:pencil: **Drawing Straight Lines**

In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

<details>
<summary>Solution</summary>

Because matplotlib interpolates (draws a straight line) between the points. One way to avoid this is to use the Matplotlib `drawstyle` option:

```python!
import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')

axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')

axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')

fig.tight_layout()

matplotlib.pyplot.show()
```

![Three line graphs, with step lines connecting the points, showing the daily average, maximum and minimum inflammation over a 40-day period.](https://sharemd.nikhef.nl/uploads/dc9df12d-6d2d-44b3-9acc-30716ded253a.svg)

</details>
:::

### 4. Storing Multiple Values in Lists

```python!
odds = [1, 3, 5, 7]
print("odds are:", odds)
print("first:", odds[0])
print("last:", odds[3])
print("last:", odds[-1])
names = ['Curie', 'Darwing', 'Turing']  # typo in Darwin's name
print('names is originally:', names)
names[1] = 'Darwin'  # correct the name
print('final value of names:', names)
name = 'Darwin'
name[0] = 'd'  # TypeError: strings cannot be changed in place
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = mild_salsa  # no copy: both names refer to the same list
hot_salsa[0] = 'hot peppers'
print('Ingredients in mild salsa:', mild_salsa)
print('Ingredients in hot salsa:', hot_salsa)
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = list(mild_salsa)  # list() makes an independent copy
hot_salsa[0] = 'hot peppers'
print('Ingredients in mild salsa:', mild_salsa)
print('Ingredients in hot salsa:', hot_salsa)
hot_salsa = mild_salsa.copy()
nparray_a = np.zeros(10)
nparray_b = nparray_a  # no copy: both names refer to the same array
nparray_b[0] = 1
print("Array a:", nparray_a)
print("Array b:", nparray_b)
veg = [['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'cilantro', 'peppers', 'zucchini']]
veg
print(veg[2])
print(veg[2][0])
sample_ages = [10, 12.5, 'unknown']
sample_ages
veg = [['lettuce', 'lettuce', 'peppers', 'zucchini'],
       'lettuce',
       ['lettuce', 'cilantro']]
print(veg)
veg = [['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'cilantro', 'peppers', 'zucchini']]
odds.append(11)
print('odds after appending a value', odds)
removed_element = odds.pop()
print('odds after removing:', odds)
print('removed item:', removed_element)
odds.reverse()
print("Odds after reversing:", odds)
primes = odds  # no copy: appending to primes also changes odds
primes.append(2)
print('primes:', primes)
print('odds:', odds)
binomial_name = 'Drosophila melanogaster'
group = binomial_name[0:10]
print("group:", group)
species = binomial_name[11:23]
print('species:', species)
chromosomes = ['X', 'Y', '2', '3', '4']
autosomes = chromosomes[2:5]  # a slice is a new list
print('autosomes:', autosomes)
autosomes[0] = '1'
print('autosomes:', autosomes)
print('chromosomes:',
      chromosomes)
```

:::success
:pencil: **Slicing From the End**

Use slicing to access only the last four characters of a string or entries of a list.

```python!
string_for_slicing = 'Observation date: 02-Feb-2013'
list_for_slicing = [['fluorine', 'F'],
                    ['chlorine', 'Cl'],
                    ['bromine', 'Br'],
                    ['iodine', 'I'],
                    ['astatine', 'At']]
```

```
'2013'
[['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
```

Would your solution work regardless of whether you knew beforehand the length of the string or list (e.g. if you wanted to apply the solution to a set of lists of different lengths)? If not, try to change your approach to make it more robust.

Hint: Remember that indices can be negative as well as positive.

<details>
<summary>Solution</summary>

Use negative indices to count elements from the end of a container (such as list or string):

```python!
string_for_slicing[-4:]
list_for_slicing[-4:]
```
</details>
:::

:::success
:pencil: **Overloading**

`+` usually means addition, but when used on strings or lists, it means "concatenate". Given that, what do you think the multiplication operator `*` does on lists? In particular, what will be the output of the following code?

```python!
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats)
```

1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`

The technical term for this is *operator overloading*: a single operator, like `+` or `*`, can do different things depending on what it's applied to.

<details>
<summary>Solution</summary>

The multiplication operator `*` used on a list replicates elements of the list and concatenates them together:

```
[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
```

It's equivalent to:

```python!
counts + counts
```
</details>
:::

### 5. Repeating Actions with Loops

```python!
odds = [1, 3, 5, 7]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[4])  # IndexError: valid indices run from 0 to 3
for num in odds:
    print(num)
for banana in odds:
    print(banana)
for num in odds:
    print(num)
    print(num+1)
print('outside of the for loop')
length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
    length = length + 1
print("There are", length, "items in the list")
name = 'Rosalind'
for name in names:
    print(name)
print("After the loop, name is:", name)
print("The length of the list is:", len(names))
```

:::success
:pencil: **From 1 to N**

Python has a built-in function called `range` that generates a sequence of numbers. `range` can accept 1, 2, or 3 parameters.

* If one parameter is given, `range` generates a sequence of that length, starting at zero and incrementing by 1. For example, `range(3)` produces the numbers `0, 1, 2`.
* If two parameters are given, `range` starts at the first and ends just before the second, incrementing by one. For example, `range(2, 5)` produces `2, 3, 4`.
* If `range` is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.

Using `range`, write a loop that prints the first 3 natural numbers:

```python!
1
2
3
```

<details>
<summary>Solution</summary>

```python!
for number in range(1, 4):
    print(number)
```
</details>
:::

:::success
:pencil: ***Optional:* Computing the Value of a Polynomial**

The built-in function `enumerate` takes a sequence (e.g. a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,...) and the value from the original sequence:

```python!
for idx, val in enumerate(a_list):
    # Do something using idx and val
```

The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.
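
To see the pairs that `enumerate` produces, here is a minimal sketch. The list `planets` and the helper list `pairs` are hypothetical names chosen for illustration; they are not part of the workshop material.

```python
# Hypothetical example list, not part of the workshop data
planets = ['Mercury', 'Venus', 'Earth']

pairs = []
for idx, val in enumerate(planets):
    # idx counts up from 0; val is the element at that position
    print(idx, val)
    pairs.append((idx, val))
```

Each pass through the loop receives one (index, value) pair, so the index can be used directly in a calculation without maintaining a separate counter.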
Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.

```python!
x = 5
coefs = [2, 4, 3]
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)
```

```
97
```

Write a loop using `enumerate(coefs)` which computes the value `y` of any polynomial, given `x` and `coefs`.

<details>
<summary>Solution</summary>

```python!
y = 0
for idx, coef in enumerate(coefs):
    y = y + coef * x**idx
```
</details>
:::

### 6. Analyzing Data from Multiple Files

```python!
import glob
import numpy as np
import matplotlib.pyplot as plt

print(glob.glob("inflammation*.csv"))
filenames = sorted(glob.glob("inflammation*.csv"))
for filename in filenames[:3]:
    print("filename:", filename)
    data = np.loadtxt(filename, delimiter=',')
    fig = plt.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('avg')
    axes1.plot(np.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(np.amax(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(np.amin(data, axis=0))
    fig.tight_layout()
    plt.show()
```

:::success
:pencil: **Plotting Differences**

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively), i.e., the difference between the leftmost plots of the first two figures.

<details>
<summary>Solution</summary>

```python!
import glob
import numpy
import matplotlib.pyplot

filenames = sorted(glob.glob('inflammation*.csv'))

data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

matplotlib.pyplot.ylabel('Difference in average')
matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))

fig.tight_layout()
matplotlib.pyplot.show()
```
</details>
:::

### 7. Making Choices

```python!
num = 24
if num > 100:
    print("it's greater than 100")
else:
    print("it's smaller than or equal to 100")
print("bye")

num = 1
if num == 1:
    print("equal to 1")
elif num > 0:
    print("greater than 0")
elif num == 0:
    print("equal to 0")
else:
    print("smaller than 0")

print(num == 1)
print(num != 1)

num = 0
if (num > 0) or (num < 0):
    print("it's not zero")
if num != 0:
    print("it's not zero")
print((num > 0) or (num < 0))
```

```python!
import numpy as np

data = np.loadtxt("inflammation-01.csv", delimiter=',')
maxima = np.amax(data, axis=0)
max_inflammation_0 = maxima[0]
max_inflammation_20 = maxima[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print("something is off")
elif np.sum(np.amin(data, axis=0)) == 0:
    print("hmm minima add up to zero")
else:
    print("data should be ok")
```

:::success
:pencil: **How Many Paths?**

Consider this code:

```python!
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```

Which of the following would be printed if you were to run this code? Why did you pick this answer?

1. A
2. B
3. C
4. B and C

<details>
<summary>Solution</summary>

C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true, but `4 < 5` is true.
</details>
:::

:::success
:pencil: **What Is Truth?**

`True` and `False` booleans are not the only values in Python that are true and false. In fact, *any* value can be used in an `if` or `elif`.
After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

```python!
if '':
    print('empty string is true')
if 'word':
    print('word is true')
if []:
    print('empty list is true')
if [1, 2, 3]:
    print('non-empty list is true')
if 0:
    print('zero is true')
if 1:
    print('one is true')
```
:::

:::success
:pencil: **In-Place Operators**

Python (and most other languages in the C family) provides in-place operators that work like this:

```python!
x = 1   # original value
x += 1  # add one to x, assigning result back to x
x *= 3  # multiply x by 3
print(x)
```

```
6
```

Write some code that sums the positive and negative numbers in a list separately, using in-place operators. Do you think the result is more or less readable than writing the same without in-place operators?

<details>
<summary>Solution</summary>

```python!
positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
for num in test_list:
    if num > 0:
        positive_sum += num
    elif num == 0:
        pass
    else:
        negative_sum += num
print(positive_sum, negative_sum)
```

Here `pass` means "don't do anything".
</details>
:::

### 8. Creating Functions

```python!
fahrenheit_val = 99
celsius_val = ((fahrenheit_val - 32) * (5/9))

def fahrenheit_to_celsius(temp_f):
    celsius_val = ((temp_f - 32) * (5/9))
    return celsius_val

print(fahrenheit_to_celsius(99))
print(fahrenheit_to_celsius(32))
print(fahrenheit_to_celsius(-32))

def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

def fahrenheit_to_kelvin(temp_f):
    temp_c = fahrenheit_to_celsius(temp_f)
    temp_k = celsius_to_kelvin(temp_c)
    return temp_k

print(celsius_to_kelvin(0))
print(fahrenheit_to_kelvin(32))

some_value = 12

def fahrenheit_to_kelvin(temp_f):
    temp_c = fahrenheit_to_celsius(temp_f)
    temp_k = celsius_to_kelvin(temp_c)
    print(some_value)  # a function can read variables defined outside it
    return temp_k
```

```python!
def visualize(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')

    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))

    axes2.set_ylabel('max')
    axes2.plot(numpy.amax(data, axis=0))

    axes3.set_ylabel('min')
    axes3.plot(numpy.amin(data, axis=0))

    fig.tight_layout()
    matplotlib.pyplot.show()
```

```python!
def detect_problems(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')

    if numpy.amax(data, axis=0)[0] == 0 and numpy.amax(data, axis=0)[20] == 20:
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.amin(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')
```

```python!
filenames = sorted(glob.glob('inflammation*.csv'))
for filename in filenames[:3]:
    print(filename)
    visualize(filename)
    detect_problems(filename)
```

```python!
visualize(filename)
detect_problems(filename)
```

```python!
help(detect_problems)

def detect_problems(filename):
    """Given a filename, load the data and check for abnormalities.

    Loads a data file from the file system, parses it as CSV, and checks
    for abnormalities on the first and 20th day.

    filename should be a string.
    """
    data = numpy.loadtxt(fname=filename, delimiter=',')

    if numpy.amax(data, axis=0)[0] == 0 and numpy.amax(data, axis=0)[20] == 20:
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.amin(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')

help(detect_problems)
```

```python!
detect_problems()  # TypeError: the required argument filename is missing

def offset_mean(data, target_mean_value=0.0):
    return (data - np.mean(data) + target_mean_value)

print(data)
print(offset_mean(data))
print(offset_mean(data, target_mean_value=12))
```

:::success
:pencil: **Combining Strings**

"Adding" two strings produces their concatenation: `'a' + 'b'` is `'ab'`.
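
As a quick illustration of concatenation with more than two pieces, here is a minimal sketch. The variables `first`, `last` and `full_name` are hypothetical names used only for this example.

```python
# Hypothetical strings chosen for illustration
first = 'Marie'
last = 'Curie'

# + joins the strings in order; the space has to be added explicitly
full_name = first + ' ' + last
print(full_name)
```

Note that `+` never inserts separators on its own, so any spaces or other characters between the parts must be concatenated as strings too.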
Write a function called `fence` that takes two parameters called `original` and `wrapper` and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:

```python!
print(fence('name', '*'))
```

```
*name*
```

<details>
<summary>Solution</summary>

```python!
def fence(original, wrapper):
    return wrapper + original + wrapper
```
</details>
:::

:::success
:pencil: **Return versus print**

Note that `return` and `print` are not interchangeable. `print` is a Python function that *prints* data to the screen. It enables us, the *users*, to see the data. The `return` statement, on the other hand, makes data visible to the program. Let's have a look at the following function:

```python!
def add(a, b):
    print(a + b)
```

**Question**: What will we see if we execute the following commands?

```python!
A = add(7, 3)
print(A)
```

<details>
<summary>Solution</summary>

Python will first execute the function `add` with `a = 7` and `b = 3`, and, therefore, print `10`. However, because function `add` does not have a line that starts with `return` (no `return` "statement"), it will, by default, return nothing which, in the Python world, is called `None`. Therefore, `A` will be assigned `None` and the last line (`print(A)`) will print `None`. As a result, we will see:

```
10
None
```
</details>
:::

### 9. Errors and Exceptions

```python!
def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])  # IndexError: the last valid index is 2

favorite_ice_cream()

def some_function()  # SyntaxError: the colon is missing
    msg = 'hello, world!'
    print(msg)
    return msg

def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg

print(a)
hello()
number = 12
print(Number)  # NameError: variable names are case-sensitive
handle = open("some_file_doesnt_exist.txt", delimiter=",")  # TypeError: open() has no delimiter argument
handle = open("inflammation_new.csv", "w")
handle.read()  # io.UnsupportedOperation: the file was opened for writing only
```

:::success
:pencil: **Identifying Syntax Errors**

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message.
   Is it a `SyntaxError` or an `IndentationError`?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.

```python!
def another_function
  print('Syntax errors are annoying.')
   print('But at least Python tells us about them!')
  print('So they are usually not too hard to fix.')
```

<details>
<summary>Solution</summary>

`SyntaxError` for missing `():` at end of first line, `IndentationError` for mismatch between second and third lines. A fixed version is:

```python!
def another_function():
    print('Syntax errors are annoying.')
    print('But at least Python tells us about them!')
    print('So they are usually not too hard to fix.')
```
</details>
:::

:::success
:pencil: **Identifying Variable Name Errors**

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. What type of `NameError` do you think this is? In other words, is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.

```python!
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + 'b'
print(message)
```

<details>
<summary>Solution</summary>

3 `NameError`s for `number` being misspelled, for `message` not defined, and for `a` not being in quotes.

Fixed version:

```python!
message = ''
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + 'a'
    else:
        message = message + 'b'
print(message)
```
</details>
:::

:::success
:pencil: **Identifying Index Errors**

1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. What type of error is it?
3. Fix the error.

```python!
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
```

<details>
<summary>Solution</summary>

`IndexError`; the last entry is `seasons[3]`, so `seasons[4]` doesn't make sense. A fixed version is:

```python!
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[-1])
```
</details>
:::

### 10. Defensive Programming

```python!
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for num in numbers:
    assert num > 0.0, 'Data should only contain positive values'
    total += num
print('total is:', total)
```

```python!
def normalize_rectangle(rect):
    """Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1)."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = dy / dx
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = dx / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

print(normalize_rectangle( (0.0, 1.0, 2.0) ))  # AssertionError: only 3 coordinates
print(normalize_rectangle( (4.0, 1.0, 2.0, 5.0) ))  # AssertionError: x0 is not smaller than x1
print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))
print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
```

```python!
def range_overlap(ranges):
    pass  # stub: the assertions below fail until the function is implemented

assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)
assert range_overlap([(2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0)
assert range_overlap([(0.0, 1.0), (0.0, 2.0), (-2.0, 1.0)]) == (0.0, 1.0)
assert range_overlap([(0.0, 1.0), (5.0, 6.0)]) == None
assert range_overlap([(0.0, 1.0), (1.0, 2.0)]) == None

def range_overlap(ranges):
    """Return common overlap among a set of (left, right) ranges."""
    max_left = -np.inf
    min_right = np.inf
    for left, right in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)

def test_range_overlap():
    assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
    assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
    assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
    assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
    assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
    assert range_overlap([]) == None

# still fails: this version does not yet return None for non-overlapping ranges
test_range_overlap()
```

:::success
:pencil: **Pre- and Post-Conditions**

Suppose you are writing a function called `average` that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor's: can you think of a function that will pass your tests but not his/hers or vice versa?

<details>
<summary>Solution</summary>

```python!
# a possible pre-condition:
assert len(input_list) > 0, 'List length must be non-zero'
# a possible post-condition:
assert numpy.min(input_list) <= average <= numpy.max(input_list), 'Average should be between min and max of input values (inclusive)'
```
</details>
:::

### 11. Debugging

:::success
:pencil: **Not Supposed to be the Same**

You are assisting a researcher with Python code that computes the Body Mass Index (BMI) of patients. The researcher is concerned because all patients seemingly have unusual and identical BMIs, despite having different physiques.
BMI is calculated as **weight in kilograms** divided by the square of **height in metres**.

Use the debugging principles in this exercise and locate problems with the code. What suggestions would you give the researcher for ensuring any later changes they make work correctly?

```python!
patients = [[70, 1.8], [80, 1.9], [150, 1.7]]

def calculate_bmi(weight, height):
    return weight / (height ** 2)

for patient in patients:
    weight, height = patients[0]
    bmi = calculate_bmi(height, weight)
    print("Patient's BMI is:", bmi)
```

```
Patient's BMI is: 0.000367
Patient's BMI is: 0.000367
Patient's BMI is: 0.000367
```

<details>
<summary>Solution</summary>

* The loop is not being utilised correctly. `height` and `weight` are always set to the first patient's data during each iteration of the loop.
* The height/weight variables are reversed in the function call to `calculate_bmi(...)`. The correct BMIs are 21.604938, 22.160665 and 51.903114.
</details>
:::

# How to continue?

* Please complete the [post-workshop survey](https://carpentries.typeform.com/to/UgVdRQ?slug=2023-09-26-Nikhef)
* Practice
    * Materials at software-carpentry.org and Nikhef Knowledge Base ([:shell:](https://kb.nikhef.nl/computing-course/shell-novice/) [:octocat:](https://kb.nikhef.nl/computing-course/git-novice/) [:snake:](https://kb.nikhef.nl/computing-course/python-novice/) )
* Learn more
    * [NWO-I Digital Competence Center](https://nwo-i.nl/dcc)
    * [Netherlands eScience Center](https://www.esciencecenter.nl) (subscribe to their newsletter)
    * Browse the [Carpentries](https://carpentries.org) website
        * In particular, [The Incubator](https://carpentries-incubator.org) (mind the different levels of maturity)