From ae64b99041c30063f791c41a959ba895ab713597 Mon Sep 17 00:00:00 2001 From: wk9874 <117366764+wk9874@users.noreply.github.com> Date: Tue, 22 Apr 2025 11:11:46 +0100 Subject: [PATCH 1/2] Update section_3_software_dev_process.md --- slides/section_3_software_dev_process.md | 530 ++++++----------------- 1 file changed, 135 insertions(+), 395 deletions(-) diff --git a/slides/section_3_software_dev_process.md b/slides/section_3_software_dev_process.md index c75677886..6f74ead68 100644 --- a/slides/section_3_software_dev_process.md +++ b/slides/section_3_software_dev_process.md @@ -552,7 +552,7 @@ Regardless of doing Object Oriented Programming or Functional Programming ## ☕ 10 Minute Break ☕ - + ## Refactoring @@ -567,7 +567,11 @@ Regardless of doing Object Oriented Programming or Functional Programming ## Refactoring -Refactoring is vital for improving code quality. +Refactoring is vital for improving code quality. It might include things such as: +* Code decoupling and abstractions +* Renaming variables +* Reorganising functions to avoid code duplication +* Simplifying conditional statements to improve readability @@ -578,12 +582,11 @@ Often working on existing software - refactoring is how we improve it ## Refactoring Loop -When making a change to a piece of software, do the following: +When refactoring a piece of software, a good process to follow is: -* Automated tests verify current behaviour -* Refactor code (so new change slots in cleanly) -* Re-run tests to ensure nothing is broken -* Make the desired change, which now fits in easily. +* Make sure you have tests that verify the current behaviour +* Refactor the code +* Re-run tests to verify thr behavour of the code is unchanged @@ -591,101 +594,74 @@ When making a change to a piece of software, do the following: ## Refactoring -Rest of section we will learn how to refactor an existing piece of code - - -```python +In the rest of section we will learn how to refactor an existing piece of code. 
We need to: -``` - - -In the process of refactoring, we will try to target some of the "good practices" we just talked about, like making good abstractions and reducing cognitive load. +* Add more tests so we can be more confident that future changes will not break the existing code. +* Further split analyse_data() function into a number of smaller and more decoupled functions - - -## Refactoring Exercise - -Look at `inflammation/compute_data.py` - -Bring up the code +When refactoring, first we need to make sure there are tests in place that can verify the code behaviour as it is now (or write them if they are missing), then refactor the code and, finally, check that the original tests still pass. -Explain the feature: -In it, if the user adds --full-data-analysis then the program will scan the directory of one of the provided files, compare standard deviations across the data by day and plot a graph. - -The main body of it exists in inflammation/compute_data.py in a function called analyse_data. +In the process of refactoring, we will try to target some of the "good practices" we just talked about, like making good abstractions and reducing cognitive load. -## Key Points - -> "Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do." 
+## Writing Regression Tests +Look at the `analyse_data` function within `inflammation/compute_data.py`: +```python +def analyse_data(data_dir): + data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) + if len(data_file_paths) == 0: + raise ValueError(f"No inflammation data CSV files found in path {data_dir}") + data = map(models.load_csv, data_file_paths) - -## ☕ 5 Minute Break ☕ - - - -## Refactoring Functions to do Just One Thing - - -## Introduction + means_by_day = map(models.daily_mean, data) + means_by_day_matrix = np.stack(list(means_by_day)) -Functions that just do one thing are: + daily_standard_deviation = np.std(means_by_day_matrix, axis=0) -* Easier to test -* Easier to read -* Easier to re-use + graph_data = { + 'standard deviation by day': daily_standard_deviation, + } + views.visualize(graph_data) +``` - -We identified last episode that the code has a function that does many more than one thing - -Hard to understand - high cognitive load - -Hard to test as mixed lots of different things together - -Hard to reuse as was very fixed in its behaviour. - +Bring up the code - -## Test Before Refactoring +Explain the feature: +When using inflammation-analysis.py if the user adds --full-data-analysis then the program will scan the directory of one of the provided files, compare standard deviations across the data by day and plot a graph. -* Write tests *before* refactoring to ensure we do not change behaviour. +The main body of it exists in inflammation/compute_data.py in a function called analyse_data. +We want to add extra regression tests to this function. Firstly, modify the function to return the data instead of visualise it so that it is easier to automatically test. Next, we will add assert statements that verify that the current outcome always remains the same, rather than checking if it is *correct* or not. These are called regression tests. -## Writing Tests for Code that is Hard to Test -What can we do? 
+## Writing Regression Tests -* Test at a higher level, with coarser accuracy -* Write "hacky" temporary tests +Add a new test file called `test_compute_data.py` in the `tests` folder and add a regression test to verify the current output of `analyse_data()`. +Remember that this is a *regression test* to check that we don't break our code during refactoring, and so ensure that this result remains unchanged. It does *not* necessarily check that the result is correct. - -Think of hacky tests like scaffolding - we will use them to ensure we can do the work safely, -but we will remove them in the end. - - - -## Exercise: Write a Regression Test for Analyse Data Before Refactoring - -Add a new test file called `test_compute_data.py` in the tests folder. There is more information on the relevant web page. -Complete the regression test to verify the current output of analyse_data is unchanged by the refactorings we are going to do. - -Time: 10min - +```python +def test_analyse_data(): + from inflammation.compute_data import analyse_data + path = Path.cwd() / "../data" + data_source = CSVDataSource(path) + result = analyse_data(data_source) + # TODO: add assert statement(s) to test the result value is as expected +``` Hint: You might find it helpful to assert the results equal some made up array, observe the test failing and copy and paste the correct result into the test. @@ -696,6 +672,17 @@ When talking about the solution: * Brittle - changing the files will break the tests + +## Refactoring Functions to only do One Thing +Functions which just do one thing are: + +* Easier to test +* Easier to read +* Easier to re-use + +Ideally we want to create 'pure functions', which work like a mathematical function - they take some input, and produce an output. They do not rely on any information other than the inputs provided, and do not cause any side effects. 
+ + ## Pure Functions @@ -704,18 +691,12 @@ A **pure function** takes in some inputs as parameters, and it produces a consis That is, just like a mathematical function. -The output does not depend on externalities. +The output does not depend on externalities, such as global variables. -There will be no side effects from running the function +There will be no side effects from running the function, eg it wont edit any files or modify global variables/ - -Externalities like what is in a database or the time of day - -Side effects like modifying a global variable or writing a file - - ## Pure Functions @@ -765,14 +746,28 @@ Time: 10min +```python +@pytest.mark.parametrize('data,expected_output', [ + ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), + ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), + ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) +], +ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) +def test_compute_standard_deviation_by_day(data, expected_output): + from inflammation.compute_data import compute_standard_deviation_by_data + + result = compute_standard_deviation_by_data(data) + npt.assert_array_almost_equal(result, expected_output) +``` + ## Functional Programming -Pure functions are a concept from an approach to programming called **functional programming**. +Pure functions are a concept from an approach to programming called **functional programming**, where programs are constructed by chaining together these pure functions. -Python, and other languages, provide features that make it easier to write "functional" code: +Writing code in this way is particularly useful for data processing and analysis, or translating data from one format to another. - * `map` / `filter` / `reduce` can be used to chain pure functions together into pipelines +We have so far mostly focussed on Procedural Programming, where a series of sequential steps are performed in a specific order. 
Different programming paradigms have different strengths and weaknesses, and are useful to solve different types of problems. @@ -817,380 +812,125 @@ total = sum(map(squared, filter(is_even, numbers))) ## Architecting Code to Separate Responsibilities - -## Using Classes to Decouple Code - - -### Decoupled Code - -When thinking about code, we tend to think of it in distinct parts or **units**. - -Two units are **decoupled** if changes in one can be made independently of the other - - - - -E.g we have the part that loads a file and the part that draws a graph - -Or the part that the user interacts with and the part that does the calculations - - - -### Decoupled Code - -Abstractions allow decoupling code - - +Recall that we are using a Model-View-Controller architecture in our project, which are located in: - -When we have a suitable abstraction, we do not need to worry about the inner workings of the other part. +* **Model**: `inflammation/models.py` +* **View**: `inflammation/views.py` +* **Controller**: `inflammation-analysis.py` -For example break of a car, the details of how to slow down are abstracted, so when we change how -breaking works, we do not need to retrain the driver. +But the code we were previously analysing was added in a separate script `inflammation/compute_data.py` and contains a mix of all three. -### Exercise: Decouple the File Loading from the Computation - -Currently the function is hard coded to load all the files in a directory. - -Decouple this into a separate function that returns all the files to load - -Time: 10min - - +### Exercise: Identify Model, View and Controller - -### Decoupled... but not completely +Looking at the code inside compute_data.py, what parts could be considered Model, View and Controller code? -Although we have separated out the data loading, there is still an assumption and therefore coupling in terms of the format of that data (in this case CSV). +Time: 5min -Is there a way we could make this more flexible? 
-- The format of the data stored is a practical detail which we don't want to limit the use of our `analyse_data()` function -- We could add an argument to our function to specify the format, but then we might have quite a long conditional list of all the different possible formats, and the user would need to request changes to `analyse_data()` any time they want to add a new format -- Is there a way we can let the user more flexibly specify the way in which their data gets read? +Computing the standard deviation belongs to Model. +Reading the data from CSV files also belongs to Model. +Displaying of the output as a graph is View. +The logic that processes the supplied files is Controller. - -One way is with **classes**! - -### Python Classes - -A **class** is a Python feature that allows grouping methods (i.e. functions) with some data. - - - - -Do some live coding, ending with: - -```python -import math +### Exercise: Split Out Model, View and Controller -class Circle: - def __init__(self, radius): - self.radius = radius - - def get_area(self): - return math.pi * self.radius * self.radius - -my_circle = Circle(10) -print(my_circle.get_area()) -``` - - - - -### Exercise: Use a Class to Configure Loading - -Put the `load_inflammation_data` function we wrote in the last exercise as a member method of a new class called `CSVDataSource`. - -Put the configuration of where to load the files in the class' initialiser. - -Once this is done, you can construct this class outside the the statistical analysis and pass the instance in to analyse_data. +Refactor analyse_data() function so that the Model, View and Controller code we identified in the previous exercise is moved to appropriate modules. Time: 10min -### Interfaces +### Merge the Feature In -**Interfaces** describe how different parts of the code interact with each other. 
+Hopefully you have now refactored the feature to conform to our MVC structure, and ran our regression tests to check that the outputs rermain the same. - +We can commit this to our branch, and then switch to the `develop` branch and merge it in. - -For example, the interface of the breaking system in a car, is the break pedal. -The user can push the pedal harder or softer to get more or less breaking. -The interface of our circle class is the user can call get_area to get the 2D area of the circle -as a number. - - - -### Interfaces - -Question: what is the interface for CSVDataSource - -```python -class CSVDataSource: - """ - Loads all the inflammation csvs within a specified folder. - """ - def __init__(self, dir_path): - self.dir_path = dir_path - - def load_inflammation_data(self): - data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv')) - if len(data_file_paths) == 0: - raise ValueError(f"No inflammation csv's found in path {self.dir_path}") - data = map(models.load_csv, data_file_paths) - return list(data) +```bash +$ git switch develop +$ git merge full-data-analysis ``` - -Suggest discuss in groups for 1min. - -Answer: the interface is the signature of the `load_inflammation_data()` method, i.e. what arguments it takes and what it returns. 
- - -### Common Interfaces - -If we have two classes that share the same interface, we can use the interface without knowing which class we have - - - - -Easiest shown with an example, lets do more live coding: - +## Controller Structure +The structure of our controller is as follows: ```python -class Rectangle(Shape): - def __init__(self, width, height): - self.width = width - self.height = height - def get_area(self): - return self.width * self.height - -my_circle = Circle(radius=10) -my_rectangle = Rectangle(width=5, height=3) -my_shapes = [my_circle, my_rectangle] -total_area = sum(shape.get_area() for shape in my_shapes) -``` - - - - -### Polymorphism - -Using an interface to call different methods is a technique known as **polymorphism**. - -A form of abstraction - we have abstracted what kind of shape we have. - - - - -### Exercise: Introduce an alternative implementation of DataSource +# import modules -Polymorphism is very useful - suppose we want to read a JSON (JavaScript Object Notation) file. +def main(args): + # perform some actions -Write a class that has the same interface as `CSVDataSource` that -loads from JSON. - -There is a function in `models.py` that loads from JSON. - -Time: 15min - - - - -Remind learners to check the course webpage for further details and some important hints. +if __name__ == "__main__": + # perform some actions before main() + main(args) +``` +Actions performed by the script are contained within the `main` function. This is called if the `__name__` variable (a special veriable set by the Python interpreter) is `__main__`. So if our file is run by the Python interpreter on the command line, this condition will be satisfied. -### Mocks - -Another use of polymorphism is **mocking** in tests. - - - - - -Lets live code a mock shape: +## Passing Command-Line Options to Controller +To read command line arguments passed into a script, we use `argparse`. 
To use this, we import it in our controller script, initialise a parser class, and then add arguments which we want to look out for: ```python -from unittest.mock import Mock +import argparse -def test_sum_shapes(): +parser = argparse.ArgumentParser( + description='A basic patient inflammation data management system') - mock_shape1 = Mock() - mock_shape1.get_area().return_value = 10 +parser.add_argument( + 'infiles', + nargs='+', + help='Input CSV(s) containing inflammation series for each patient') - mock_shape2 = Mock() - mock_shape2.get_area().return_value = 13 - my_shapes = [mock_shape1, mock_shape2] - total_area = sum(shape.get_area() for shape in my_shapes) - - assert total_area = 23 +args = parser.parse_args() ``` - -Easier to read this test as do not need to understand how -get_area might work for a real shape. - -Focus on testing behaviour rather than implementation. - - - - -## Exercise: Test Using a Mock Implementation - -Complete the exercise to write a mock data source for `analyse_data`. - -Time: 15min - - - - - -## Object Oriented Programming - -These are techniques from **object oriented programming**. - -There is a lot more that we will not go into: - -* Inheritance -* Information hiding - - - - - -## A note on Data Classes - -Regardless of doing Object Oriented Programming or Functional Programming - -**Grouping data into logical classes is vital for writing maintainable code.** - - - - -## ☕ 10 Minute Break ☕ - - - -## Model-View-Controller - -Reminder - this program is using the MVC Architecture: - -* Model - Internal data of the program, and operations that can be performed on it -* View - How the data is presented to the user -* Controller - Responsible for how the user interacts with the system - - - - -### Breakout: Read and do the exercise - -Read the section **Separating Out Responsibilities**. - -Complete the exercise. - -Time: 10min - -Suggest discussing answer to the exercise as a table. 
-Once time is up, ask one table to share their answer and any questions -Then do the other exercise - +Take people through each of these parts: - -### Breakout Exercise: Split out the model code from the view code +Import the library -Refactor `analyse_data` such the view code we identified in the last exercise is removed from the function, so the function contains only model code, and the view code is moved elsewhere. +Initialise the parser class -Time: 10min +Define an argument called 'infiles' which will hold a list of input CSV file(s) to read inflammation data from. The user can specify 1 or more of these files, so we define the number of args as '+'. It also contains a help string for the user, which will be displayed if they use `--help` on the command line. +You then parse the arguments, which returns an object we called `args` which contains all of the arguments requested. These can be accessed by their name, eg `args.infiles`. +## Positional and Optional Arguments +Positional arguments are required arguments which must be provided all together and in the proper order when calling the script. Optional arguments are indicated by a `-` or `--` prefix, and these do not have to be provided to run the script. For example we can see the help string: -## Programming Patterns - -* MVC is a programming pattern -* Others exist - like the visitor pattern -* Useful for discussion and ideas - not a complete solution - - - - - -Next slide if it feels like we have got loads of time. - - - - -### Breakout Exercise: Read about a random pattern on the website and share it with the group - -Go to the website linked and pick a random pattern, see if you can understand what it is doing -and why you'd want to use it. 
- -Time: 15min - - - - - -## Architecting larger changes - -* Use diagrams of boxes and lines to sketch out how code will be structured -* Useful for larger changes, new code, or even understanding complex projects - - - - - -## Exercise: Design a high-level architecture - -Sketch out a design for something you have come up with or the current project. - - -Time: 10min - - - - - -At end of time, share diagrams, discussion. - - - - - -## Breakout: Read to end of page - -Read til the end, including the exercise on real world examples - -Time: 15min +```bash +$ python3 inflammation-analysis.py --help +``` - +```bash +usage: inflammation-analysis.py [-h] infiles [infiles ...] - +A basic patient inflammation data management system -At end of time, reconvene to discuss real world examples as a group. +positional arguments: + infiles Input CSV(s) containing inflammation series for each patient +optional arguments: + -h, --help show this help message and exit +``` - ## Conclusion Good software architecture and design is a **huge** topic. From e36bd57ae594c84c9e00c31fa78c6f08720b03fc Mon Sep 17 00:00:00 2001 From: bielsnohr <6177028+bielsnohr@users.noreply.github.com> Date: Tue, 27 May 2025 15:04:44 +0100 Subject: [PATCH 2/2] Make further updates to section 3 slides around refactoring and functional programming --- slides/section_3_software_dev_process.md | 154 +++++++++++++---------- 1 file changed, 85 insertions(+), 69 deletions(-) diff --git a/slides/section_3_software_dev_process.md b/slides/section_3_software_dev_process.md index 6f74ead68..af987952b 100644 --- a/slides/section_3_software_dev_process.md +++ b/slides/section_3_software_dev_process.md @@ -11,7 +11,7 @@ jupyter: theme: solarized --- - + # Section 3: Software Development as a Process
@@ -24,7 +24,7 @@ jupyter: - We are going to step up a level and look at the overall process of developing software - + ## Writing Code versus Engineering Software - Software is _not_ just a tool for answering a research question @@ -36,7 +36,7 @@ jupyter: - Software can be reused 🔁 - + - Software is _not_ just a tool for answering a research question - Software is shared frequently between researchers and _reused_ after publication - Therefore, we need to be concerned with more than just the implementation, i.e. "writing code" @@ -47,7 +47,7 @@ jupyter: - Software can be reused: like with stakeholders, it is hard to predict how the software will be used in the future, and we want to make it easy for reuse to happen - + ## Software Development Lifecycle
@@ -548,24 +548,24 @@ Regardless of doing Object Oriented Programming or Functional Programming

## ☕ 10 Minute Break ☕


## Refactoring

**Refactoring** is modifying code, such that:

- * external behaviour unchanged,
+ * external behaviour is unchanged,
  * code itself is easier to read / test / extend.


-## Refactoring
+### Refactoring

Refactoring is vital for improving code quality. It might include things such as:
* Code decoupling and abstractions
* Renaming variables
* Reorganising functions to avoid code duplication
* Simplifying conditional statements to improve readability


Often working on existing software - refactoring is how we improve it


-## Refactoring Loop
+### Refactoring Loop

When refactoring a piece of software, a good process to follow is:

* Make sure you have tests that verify the current behaviour
* Refactor the code
* Re-run tests to verify the behaviour of the code is unchanged


-## Refactoring
+### Refactoring

In the rest of this section we will learn how to refactor an existing piece of code. We need to:
@@ -601,18 +599,18 @@ In the rest of this section we will learn how to refactor an existing piece of code.

When refactoring, first we need to make sure there are tests in place that can verify the code behaviour as it is now (or write them if they are missing), then refactor the code and, finally, check that the original tests still pass.

In the process of refactoring, we will try to target some of the "good practices" we just talked about, like making good abstractions and reducing cognitive load.
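This loop can be sketched in miniature (the function names here are hypothetical, not from the project): a test pins down the current behaviour, the code is refactored, and the test is re-run to confirm nothing changed:

```python
def total_dose(doses):
    # Original implementation: manual accumulation loop.
    total = 0
    for d in doses:
        total = total + d
    return total

def total_dose_refactored(doses):
    # Refactored implementation: clearer, using the built-in sum().
    return sum(doses)

def test_behaviour_unchanged():
    # The same inputs must give the same outputs before and after refactoring.
    doses = [1.5, 2.0, 0.5]
    assert total_dose(doses) == total_dose_refactored(doses) == 4.0

test_behaviour_unchanged()
print("refactoring preserved behaviour")
```

In a real project the refactored version simply replaces the original, and the existing test suite is re-run unchanged.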
- + -## Writing Regression Tests +### Writing Regression Tests Before Refactoring Look at the `analyse_data` function within `inflammation/compute_data.py`: - + ```python def analyse_data(data_dir): data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) @@ -631,38 +629,39 @@ def analyse_data(data_dir): } views.visualize(graph_data) ``` - - - + + Bring up the code Explain the feature: -When using inflammation-analysis.py if the user adds --full-data-analysis then the program will scan the directory of one of the provided files, compare standard deviations across the data by day and plot a graph. +When using inflammation-analysis.py if the user adds `--full-data-analysis` then the program will scan the directory of one of the provided files, compare standard deviations across the data by day and plot a graph. The main body of it exists in inflammation/compute_data.py in a function called analyse_data. We want to add extra regression tests to this function. Firstly, modify the function to return the data instead of visualise it so that it is easier to automatically test. Next, we will add assert statements that verify that the current outcome always remains the same, rather than checking if it is *correct* or not. These are called regression tests. - + -## Writing Regression Tests +### Exercise: Writing Regression Tests Add a new test file called `test_compute_data.py` in the `tests` folder and add a regression test to verify the current output of `analyse_data()`. Remember that this is a *regression test* to check that we don't break our code during refactoring, and so ensure that this result remains unchanged. It does *not* necessarily check that the result is correct. 
```python
+from inflammation.compute_data import analyse_data
+
 def test_analyse_data():
-    from inflammation.compute_data import analyse_data
     path = Path.cwd() / "../data"
     data_source = CSVDataSource(path)
     result = analyse_data(data_source)
     # TODO: add assert statement(s) to test the result value is as expected
```


Hint: You might find it helpful to assert the results equal some made up array, observe the test failing and copy and paste the correct result into the test.

When talking about the solution:

@@ -672,33 +671,36 @@ When talking about the solution:
 * Brittle - changing the files will break the tests


### Refactoring Functions to only do One Thing

Functions which just do one thing are:

* Easier to test
* Easier to read
* Easier to re-use

We can take this further by making our single-purpose functions **pure**.


### Pure Functions

A **pure function** is effectively what we think of as a mathematical function:

- they take some input, and produce an output
- they do not rely on any information other than the inputs provided
- they do not cause any side effects.

As a result, the output of a **pure function** does not depend on externalities or program state, such as global variables.

Moreover, there will be no side effects from running the function, e.g.
it won't edit any files or modify global variables, so that behaviour in other parts of our code is unaffected.


### Pure Functions

Pure functions have a number of advantages for maintainable code:

@@ -707,8 +709,8 @@ Pure functions have a number of advantages for maintainable code:


### Exercise: Refactor Code into a Pure Function

Refactor the analyse_data function into a pure function with the logic, and an impure function that handles the input and output. The pure function should take in the data, and return the analysis results:

@@ -722,8 +724,8 @@ Time: 10min


### Testing Pure Functions

Pure functions are also easier to test

@@ -733,12 +735,12 @@ Pure functions are also easier to test


Can focus on making sure we get all edge cases without real world considerations


### Exercise: Write Test Cases for the Pure Function

Now we have refactored out a pure function, we can more easily write comprehensive tests.

Add tests that check for when there is only one file with multiple rows, multiple files with one row and any other cases you can think of that should be tested.
@@ -746,7 +748,10 @@ Time: 10min + ```python +from inflammation.compute_data import compute_standard_deviation_by_data + @pytest.mark.parametrize('data,expected_output', [ ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), @@ -754,14 +759,15 @@ Time: 10min ], ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) def test_compute_standard_deviation_by_day(data, expected_output): - from inflammation.compute_data import compute_standard_deviation_by_data + result = compute_standard_deviation_by_data(data) npt.assert_array_almost_equal(result, expected_output) ``` + - -## Functional Programming + +### Functional Programming Pure functions are a concept from an approach to programming called **functional programming**, where programs are constructed by chaining together these pure functions. @@ -771,7 +777,7 @@ We have so far mostly focussed on Procedural Programming, where a series of sequ - + If there is time - do some live coding to show imperative code, then transform into a pipeline: * Sequence of numbers @@ -808,11 +814,11 @@ total = sum(map(squared, filter(is_even, numbers))) ## ☕ 10 Minute Break ☕ - + ## Architecting Code to Separate Responsibilities - + Recall that we are using a Model-View-Controller architecture in our project, which are located in: * **Model**: `inflammation/models.py` @@ -822,16 +828,16 @@ Recall that we are using a Model-View-Controller architecture in our project, wh But the code we were previously analysing was added in a separate script `inflammation/compute_data.py` and contains a mix of all three. - + ### Exercise: Identify Model, View and Controller -Looking at the code inside compute_data.py, what parts could be considered Model, View and Controller code? +Looking at the code inside `compute_data.py`, what parts could be considered Model, View and Controller code? 
Time: 5min


Computing the standard deviation belongs to Model.
Reading the data from CSV files also belongs to Model.
Displaying the output as a graph is View.
The logic that processes the supplied files is Controller.


### Exercise: Split Out Model, View and Controller

Refactor the analyse_data() function so that the Model, View and Controller code we identified in the previous exercise is moved to appropriate modules.

Time: 10min


### Merge the Feature In

Hopefully you have now refactored the feature to conform to our MVC structure, and run our regression tests to check that the outputs remain the same.

We can commit this to our branch, and then switch to the `develop` branch and merge it in.

```bash
$ git switch develop
$ git merge full-data-analysis
```


### Controller Structure

The structure of our controller is as follows:

```python
# import modules

def main(args):
    # perform some actions

if __name__ == "__main__":
    # perform some actions before main()
    main(args)
```

This is a common pattern for entry points to Python packages. Actions performed by the script are contained within the `main` function. The `main` function is run automatically if the `__name__` variable (a special variable set by the Python interpreter) is `"__main__"`. So if our file is run by the Python interpreter on the command line, this condition will be satisfied, and our script gets run as expected.

However, if our Python module is imported from another, `__name__ = "inflammation_analysis"` will be defined instead, and the `main()` function will not automatically be run.
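A minimal, runnable version of this dual behaviour (with a hypothetical `main`, not the project's actual controller) looks like this:

```python
import sys

def main(args):
    # Core behaviour lives in an ordinary function, so it can be imported
    # and tested without any command-line machinery.
    return f"analysing {len(args)} file(s)"

if __name__ == "__main__":
    # Runs only when the script is executed directly, not when imported.
    print(main(sys.argv[1:]))
```

Running `python script.py a.csv b.csv` would print `analysing 2 file(s)`, while importing the module from elsewhere defines `main` without running it.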
- -## Passing Command-Line Options to Controller + +It is useful to have this dual behaviour for our entry point scripts so that functions defined within them can be used by other modules without the main function being run on import, while still making it clear how the core functionality is run. Moreover, this pattern makes it possible to test the functions within our script because everything is put inside more easily callable functions. + + + +### Passing Command-Line Options to Controller + To read command line arguments passed into a script, we use `argparse`. To use this, we import it in our controller script, initialise a parser class, and then add arguments which we want to look out for: ```python @@ -897,7 +913,7 @@ args = parser.parse_args() ``` - + Take people through each of these parts: Import the library @@ -909,8 +925,8 @@ Define an argument called 'infiles' which will hold a list of input CSV file(s) You then parse the arguments, which returns an object we called `args` which contains all of the arguments requested. These can be accessed by their name, eg `args.infiles`. - -## Positional and Optional Arguments + +### Positional and Optional Arguments Positional arguments are required arguments which must be provided all together and in the proper order when calling the script. Optional arguments are indicated by a `-` or `--` prefix, and these do not have to be provided to run the script. For example we can see the help string: ```bash @@ -930,7 +946,7 @@ optional arguments: ``` - + ## Conclusion Good software architecture and design is a **huge** topic. @@ -945,6 +961,6 @@ Practise makes perfect: - + ## 🕓 End of Section 3 🕓