class: center, middle, inverse, title-slide

# Part 3: Building Software Systems using Object-Oriented Programming

### Michael Kane
School of Public Health, Biostatistics Department
Yale University
### kaneplusplus
---

# <br> Highlights from Part 2

--

<br>

## Another look at environments and packages

--

<br>

## NumPy for R's vector/matrix/array operations

--

<br>

## How to use Python Objects

--

<br>

## Some Visualization

---

# <br> Feedback Questions

--

<br>

## Is Python _that_ different from R?

--

<br>

## What do you like most about Python so far?

--

<br>

## Which Python constructs would you like to see in R?

---

# <br> Topics for this part

--

<br>

## Functions with State: Mutable Closures and Generators

--

<br>

## R and Python's Dispatch

--

<br>

## Python Classes and Objects

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## In R, functions with state are called _mutable closures_

--

```r
# R
increment_generator <- function(start = 1) {
  val <- start - 1
  function() {
    # `<<-` updates `val` in the enclosing environment
    val <<- val + 1
    val
  }
}
inc <- increment_generator()
inc()
```

```
## [1] 1
```

```r
inc()
```

```
## [1] 2
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## How does this work?

```r
# R
ls(environment(inc))
```

```
## [1] "start" "val"
```

```r
environment(inc)$val
```

```
## [1] 2
```

```r
b <- inc()
environment(inc)$val
```

```
## [1] 3
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## <br> So what?

<br>

--

## This is very useful for iterating over things

--

- in loops, we don't need all of the values we iterate over up front

--

- we can iterate over data that are larger than memory

--

- we can easily change the size of what we iterate over

--

- it is useful in parallel computing, where the number of processors varies from machine to machine
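---

# <br> Functions with State: Mutable Closures and Generators

<br>

## A Python mutable closure

For comparison, here is a minimal sketch of `increment_generator` translated directly into Python, assuming a line-by-line port of the R code above. Python's `nonlocal` declaration plays the role of R's `<<-`, and the function's `__closure__` cells hold the captured state much like `environment(inc)` does in R.

```python
# Python
def increment_generator(start=1):
    val = start - 1

    def inc():
        # nonlocal rebinds `val` in the enclosing scope, like R's `<<-`
        nonlocal val
        val += 1
        return val

    return inc


inc = increment_generator()
print(inc())  # 1
print(inc())  # 2

# The captured state lives in a closure cell, similar to environment(inc)$val
print(inc.__closure__[0].cell_contents)  # 2
```

Without `nonlocal`, `val += 1` would raise `UnboundLocalError`, because Python treats any name assigned inside a function as local to that function.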
---

# <br> Functions with State: Mutable Closures and Generators

## This is the basis for iterators in R

--

```r
# R
library(iterators)
icount
```

```
## function (count)
## {
##     if (missing(count))
##         count <- NULL
##     else if (!is.numeric(count) || length(count) != 1)
##         stop("count must be a numeric value")
##     i <- 0L
##     nextEl <- function() {
##         if (is.null(count) || i < count)
##             (i <<- i + 1L)
##         else stop("StopIteration", call. = FALSE)
##     }
##     it <- list(nextElem = nextEl)
##     class(it) <- c("abstractiter", "iter")
##     it
## }
## <bytecode: 0x7fcec4bcddd0>
## <environment: namespace:iterators>
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## This is the basis for iterators in R

```r
# R
it <- icount(2)
nextElem(it)
```

```
## [1] 1
```

```r
nextElem(it)
```

```
## [1] 2
```

```r
nextElem(it)
```

```
## Error: StopIteration
```

```r
nextElem(it)
```

```
## Error: StopIteration
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## Which can be used with the `foreach` package

```r
# R
library(foreach)
registerDoSEQ()  # register the sequential backend

system.time({
  foreach(it = icount(10), .combine = c) %dopar% {
    Sys.sleep(1)
    it + 1
  }
})
```

```
##    user  system elapsed
##   0.026   0.002  10.055
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## ... and is an excellent package for parallelization

```r
# R
library(doParallel)
```

```
## Loading required package: parallel
```

```r
registerDoParallel()

# Same code. Runs in parallel.
system.time({
  foreach(it = icount(10), .combine = c) %dopar% {
    Sys.sleep(1)
    it + 1
  }
})
```

```
##    user  system elapsed
##   0.032   0.075   2.051
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## R's iterators `\(\simeq\)` Python's generators

<br>

```python
# Python
def icount(n = None):
    # n = None means "count forever". Because this is a generator,
    # the check runs on the first next(), not when icount() is called.
    if n is not None and not isinstance(n, int):
        raise ValueError("n must be an integer or None.")
    i = 0
    while n is None or i < n:
        yield i
        i += 1
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## R's iterators `\(\simeq\)` Python's generators

```python
# Python
it = icount(2)
next(it)
```

```
## 0
```

```python
next(it)
```

```
## 1
```

```python
next(it)
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): StopIteration:
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## Generators and Loops

```python
# Python
[x + 1 for x in icount(5)]
```

```
## [1, 2, 3, 4, 5]
```

```python
for x in icount(5):
    print(x + 1)
```

```
## 1
## 2
## 3
## 4
## 5
```

---

# <br> Functions with State: Mutable Closures and Generators

<br>

## Parallelizing Loops

```python
# Python
from multiprocessing import Pool

def add_one(x):
    return x + 1

# The `with` block shuts down the worker processes on exit.
# Note: map() turns the generator into a list before sending out work.
with Pool(processes = 4) as p:
    result = p.map(add_one, icount(10))

print(result)
```

```
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

---

# <br> R and Python's Dispatch

<br>

## What is Dispatch?

<br>

### The choice of which version of a method to call.

- If this choice is based on a single object, it is _single dispatch_.
- If it is based on multiple objects, it is _multiple dispatch_.

### A plain function does not dispatch: it always runs the same code.

### In R:

- Single dispatch is supported by S3 and R5 (Reference Classes, RC), and is also implemented in the R6 and R.oo packages.
- Multiple dispatch is supported by S4.

### In Python:

- Single dispatch is handled with objects.
- Multiple dispatch is available in a few packages.
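---

# <br> R and Python's Dispatch

<br>

## Object-based dispatch with built-in types

To make "single dispatch is handled with objects" concrete, here is a small illustration that uses only built-in types: the same method name resolves to different code depending on the type of the object it is called on.

```python
# Python
# `count` means "count substrings" for str and "count elements" for list;
# Python finds the method by looking it up on type(obj).
print("banana".count("a"))      # 3
print([1, 1, 2].count(1))       # 2
print(str.count is list.count)  # False: two different functions

# dir() lists the attributes a type provides; drop the "_" names
print([m for m in dir(list) if not m.startswith("_")])
```

This is the reverse of S3's bookkeeping: R's `methods(print)` lists the methods of one generic (one per class), while `dir(list)` lists the methods of one class, which is closer to R's `methods(class = ...)`.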
---

# <br> R and Python's Dispatch

## A quick review of R's S3 dispatch

```r
# R
add_one <- function(x) {
  UseMethod("add_one", x)
}

add_one.default <- function(x) {
  stop(paste("Don't know how to add_one to object of type", class(x)))
}

add_one.numeric <- function(x) {
  cat("Dispatched to `add_one.numeric`.\n")
  x + 1
}

cat('Calling add_one on "a".\n')
add_one("a")
cat("Calling add_one on the number 1.\n")
add_one(1)
```

---

# <br> R and Python's Dispatch

<br>

## A quick review of R's S3 dispatch (cont'd)

<br>

```
## Calling add_one on "a".
```

```
## Error in add_one.default("a"): Don't know how to add_one to object of type character
```

```
## Calling add_one on the number 1.
```

```
## Dispatched to `add_one.numeric`.
```

```
## [1] 2
```

---

# <br> R and Python's Dispatch

<br>

## S3 in Practice

```r
# R
print_methods <- methods(print)
print(head(print_methods, 20))
```

```
##  [1] "print,ANY-method"             "print,diagonalMatrix-method"
##  [3] "print,sparseMatrix-method"    "print.acf"
##  [5] "print.AES"                    "print.anova"
##  [7] "print.aov"                    "print.aovlist"
##  [9] "print.ar"                     "print.Arima"
## [11] "print.arima0"                 "print.AsIs"
## [13] "print.aspell"                 "print.aspell_inspect_context"
## [15] "print.bibentry"               "print.Bibtex"
## [17] "print.browseVignettes"        "print.by"
## [19] "print.bytes"                  "print.changedFiles"
```

---

# <br> R and Python's Dispatch

<br>

## A Python Equivalent: Objects

<br>

### We already know how to find and call list and NumPy array methods.

<br>

### Let's find out how to build one of these objects, which has its own dispatch.

<br>

### The interesting part is not adding one. It is building objects that perform an operation with a common name for different types.

<br>

### We are going to start by building a _class_, which describes a type of object.
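---

# <br> R and Python's Dispatch

<br>

## Aside: function-based dispatch with `functools.singledispatch`

These slides build dispatch out of classes, but Python's standard library also has `functools.singledispatch`, which picks an implementation based on the type of the first argument and so resembles the S3 generic `add_one` above. A minimal sketch, reusing the `add_one` name purely for illustration:

```python
# Python
from functools import singledispatch


@singledispatch
def add_one(x):
    # Fallback implementation, playing the role of add_one.default
    raise TypeError(f"Don't know how to add_one to object of type {type(x).__name__}")


@add_one.register(int)
@add_one.register(float)
def _(x):
    # Numeric implementation, playing the role of add_one.numeric
    return x + 1


print(add_one(1))    # 2
print(add_one(2.5))  # 3.5
try:
    add_one("a")
except TypeError as e:
    print(e)  # Don't know how to add_one to object of type str
```

Unlike S3, the implementations are registered on the generic function object itself rather than found through a `generic.class` naming convention.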
---

# <br> R and Python's Dispatch

<br>

## The `AddOneToNumericList` Class

```python
# Python
class AddOneToNumericList:
    def __init__(self, lst):
        # Reject any element that is not an int or a float
        if any(not isinstance(x, (float, int)) for x in lst):
            raise TypeError("All list elements must be int or float")
        self.lst = lst

    def add_one(self):
        self.lst = [x + 1 for x in self.lst]

    def get_lst(self):
        return self.lst
```

---

# <br> R and Python's Dispatch

<br>

## Creating an Instance

```python
# Python
my_new_object = AddOneToNumericList(list(range(1, 11)))
print(my_new_object.get_lst())
```

```
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
print(my_new_object.lst)
```

```
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
my_new_object.add_one()
print(my_new_object.lst)
```

```
## [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

---

# <br> R and Python's Dispatch

<br>

## Limiting access to attributes

```python
# Python
class AddOneToNumericList:
    def __init__(self, lst):
        if any(not isinstance(x, (float, int)) for x in lst):
            raise TypeError("All list elements must be int or float")
        # A leading double underscore triggers name mangling
        self.__lst = lst

    def add_one(self):
        self.__lst = [x + 1 for x in self.__lst]

    def get_lst(self):
        # Return a copy so callers can't modify the internal list
        return self.__lst.copy()
```

---

# <br> R and Python's Dispatch

<br>

## Limiting access to attributes (cont'd)

```python
# Python
my_new_object = AddOneToNumericList(list(range(1, 11)))
print(my_new_object.get_lst())
```

```
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
print(my_new_object.__lst)
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'AddOneToNumericList' object has no attribute '__lst'
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

### Python stores the attribute as `_AddOneToNumericList__lst`, so it is hidden, not truly private.

---

# <br> R and Python's Dispatch

<br>

## Let's abstract this a little bit

<br>

### The class works for numeric (int and float) types.

<br>

### We've already implemented adding one to other types (strings).
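---

# <br> R and Python's Dispatch

<br>

## An informal alternative: `NotImplementedError`

Before reaching for `abc`, it may help to see a common lightweight pattern: a base class whose methods simply raise `NotImplementedError`. In this sketch the class names are hypothetical. Nothing stops you from creating the base object, and the mistake only surfaces when the missing method is called.

```python
# Python
class AddOneBaseInformal:
    """Hypothetical base class: subclasses are expected to override add_one."""

    def __init__(self, lst):
        self._lst = lst

    def add_one(self):
        raise NotImplementedError("subclasses must implement add_one()")


class AddOneToFloatListInformal(AddOneBaseInformal):
    def add_one(self):
        self._lst = [x + 1.0 for x in self._lst]


base = AddOneBaseInformal([1, 2, 3])  # no error at construction time
try:
    base.add_one()
except NotImplementedError as e:
    print("Error:", e)  # Error: subclasses must implement add_one()

floats = AddOneToFloatListInformal([0.5, 1.5])
floats.add_one()
print(floats._lst)  # [1.5, 2.5]
```

An abstract base class moves that failure earlier, to the moment an incomplete class is instantiated.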
---

# <br> R and Python's Dispatch

<br>

## Let's Create an _Abstract Class_ `AddOneToList`

```python
# Python
from abc import ABC, abstractmethod

class AddOneToList(ABC):
    @abstractmethod
    def __init__(self, lst):
        self._lst = lst

    @abstractmethod
    def add_one(self):
        pass

    @abstractmethod
    def get_lst(self):
        pass

ao = AddOneToList(list(range(1, 11)))
```

---

# <br> R and Python's Dispatch

<br>

## Let's Create an _Abstract Class_ `AddOneToList` (cont'd)

<br>

```
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: Can't instantiate abstract class AddOneToList with abstract methods __init__, add_one, get_lst
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

# <br> R and Python's Dispatch

## `@abstractmethod` is a _Decorator_

### `@decorator` above `def f` is shorthand for `f = decorator(f)`.

```python
# Python
def check_second_arg_not_zero(func):
    # Wrap func so that a zero second argument is caught before the call
    def inner(a1, a2):
        if a2 == 0:
            print("Can't divide by zero!")
            return None
        return func(a1, a2)
    return inner

@check_second_arg_not_zero
def divide(num, denom):
    return num / denom

divide(22, 7)
```

```
## 3.142857142857143
```

```python
divide(22, 0)
```

```
## Can't divide by zero!
```

---

# <br> R and Python's Dispatch

<br>

## Now Let's Create `AddOneToNumericList`

```python
# Python
class AddOneToNumericList(AddOneToList):
    def add_one(self):
        self._lst = [x + 1 for x in self._lst]

    def get_lst(self):
        return self._lst.copy()

ao = AddOneToNumericList(list(range(1, 11)))
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: Can't instantiate abstract class AddOneToNumericList with abstract methods __init__
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

# <br> R and Python's Dispatch

<br>

## Now Create a Concrete Class

```python
# Python
class AddOneToIntList(AddOneToNumericList):
    def __init__(self, lst):
        if any(not isinstance(x, int) for x in lst):
            raise TypeError("All list elements must be int!")
        # Run the (abstract) parent __init__, which stores the list
        super().__init__(lst)

aoi = AddOneToIntList(list(range(1, 11)))
aoi.add_one()
print(aoi.get_lst())
```

```
## [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

---

# <br> R and Python's Dispatch

<br>

## Now Create a Concrete Class (cont'd)

```python
# Python
AddOneToIntList([float(x) for x in range(1, 11)])
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: All list elements must be int!
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
##   File "<string>", line 5, in __init__
```

---

# <br> R and Python's Dispatch

<br>

## What if I want to support a bunch of different types and let the constructor figure out which one to make?

<br>

--

### You want a _factory_.

<br>

--

### You provide a list; the factory figures out whether it can create an instance of a class derived from `AddOneToList` and gives it to you.

<br>

--

### You don't even need to specify the concrete type beforehand.

<br>

--

### We'll need to revise `AddOneToList`.
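---

# <br> R and Python's Dispatch

<br>

## Sketch: a factory with a registry

One way to avoid naming every concrete class inside the factory is a registry that concrete classes add themselves to. This is a minimal sketch under our own assumptions: `AddOneBase`, `register_type`, `AddOneInts`, and `AddOneFloats` are hypothetical names, not the classes built in these slides. It reuses the decorator and abstract-class ideas from the previous slides.

```python
# Python
from abc import ABC, abstractmethod


class AddOneBase(ABC):
    # Hypothetical registry: element type -> concrete class
    _registry = {}

    @classmethod
    def register_type(cls, elem_type):
        """Class decorator that registers a concrete class for elem_type."""
        def decorator(subclass):
            cls._registry[elem_type] = subclass
            return subclass
        return decorator

    @staticmethod
    def factory(lst):
        if len(lst) == 0:
            raise ValueError("List length is zero.")
        elem_type = type(lst[0])
        if any(type(x) is not elem_type for x in lst):
            raise TypeError("All list elements must have the same type.")
        if elem_type not in AddOneBase._registry:
            raise TypeError(f"Unsupported list type: {elem_type.__name__}")
        return AddOneBase._registry[elem_type](lst)

    def __init__(self, lst):
        self._lst = list(lst)

    @abstractmethod
    def add_one(self):
        pass

    def get_lst(self):
        return self._lst.copy()


@AddOneBase.register_type(int)
class AddOneInts(AddOneBase):
    def add_one(self):
        self._lst = [x + 1 for x in self._lst]


# Registered separately, without editing AddOneBase or its factory
@AddOneBase.register_type(float)
class AddOneFloats(AddOneBase):
    def add_one(self):
        self._lst = [x + 1.0 for x in self._lst]


obj = AddOneBase.factory([1, 2, 3])
obj.add_one()
print(type(obj).__name__, obj.get_lst())  # AddOneInts [2, 3, 4]
```

With this design, supporting a new element type means writing one decorated subclass; the factory itself never changes.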
---

# <br> R and Python's Dispatch

<br>

## `AddOneToList` Redux

```python
# Python
def get_lst_type(lst):
    if len(lst) == 0:
        raise ValueError("List length is zero.")
    # Every element must have exactly the same type as the first one
    lst_types = [type(lst[0]) == type(x) for x in lst]
    if not all(lst_types):
        raise TypeError("All list elements must have the same type.")
    return type(lst[0])
```

---

# <br> R and Python's Dispatch

<br>

## `AddOneToList` Redux (cont'd)

```python
# Python
class AddOneToList(ABC):
    @staticmethod
    def factory(lst):
        # Pick the concrete class based on the element type
        if get_lst_type(lst) is int:
            return AddOneToIntList(lst)
        else:
            raise TypeError("Unsupported list type.")

    @abstractmethod
    def __init__(self, lst):
        self._lst = lst

    @abstractmethod
    def add_one(self):
        pass

    @abstractmethod
    def get_lst(self):
        pass

# Re-run the AddOneToNumericList and AddOneToIntList definitions after this
# so that they derive from the new AddOneToList.
```

---

# <br> R and Python's Dispatch

<br>

## `AddOneToList` Redux (cont'd)

```python
# Python
lsts = [AddOneToList.factory(list(range(x))) for x in range(1, 3)]
print(lsts)
```

```
## [<__main__.AddOneToIntList object at 0x7fcf18231940>, <__main__.AddOneToIntList object at 0x7fcf18231978>]
```

```python
print(lsts[1].get_lst())
```

```
## [0, 1]
```

```python
[x.add_one() for x in lsts]
```

```
## [None, None]
```

```python
print(lsts[1].get_lst())
```

```
## [1, 2]
```

---

# <br> R and Python's Dispatch

<br>

## What's the point again?

<br>

### Factories create the "right" object from a little bit of information.

- they can be static, as in our example
- or they can be dynamic, allowing users to register new concrete classes

### Methods have the same interface and do different things based on class.

<br>

### Examples

- data import: a class might know how to get data from various sources
- graphics: a class might know how to create a visualization
- model fitting: a class might know how to fit a model to a data set

---

# <br> `*args` and `**kwargs`

--

<br>

## Both are used to take a variable number of arguments.

--

<br>

## `*args` is for unnamed (positional) arguments.

--

<br>

## `**kwargs` is for named (keyword) arguments.

---

# <br> `*args` and `**kwargs`

<br>

## `*args`

```r
# R
star_arg <- function(...)
{
  dots <- list(...)
  unlist(dots)
}
star_arg("Almost", "at", "the", "end", "of", "this", "part")
```

```
## [1] "Almost" "at" "the" "end" "of" "this" "part"
```

--

```python
# Python
def star_arg(*args):
    # args is a tuple holding every positional argument
    print(list(args))

star_arg("Almost", "at", "the", "end", "of", "this", "part")
```

```
## ['Almost', 'at', 'the', 'end', 'of', 'this', 'part']
```

---

# <br> `*args` and `**kwargs`

## `**kwargs`

```r
# R
star_star_kwarg <- function(...) {
  dots <- list(...)
  for (i in seq_along(dots)) {
    print(paste(names(dots)[i], "==", dots[[i]]))
  }
}
star_star_kwarg(first = "R", second = "Python")
```

```
## [1] "first == R"
## [1] "second == Python"
```

--

```python
# Python
def star_star_kwarg(**kwargs):
    # kwargs is a dict mapping argument names to values
    for key, value in kwargs.items():
        print(f"{key} == {value}")

star_star_kwarg(first = "R", second = "Python")
```

```
## first == R
## second == Python
```

---

<style type="text/css">
.huge {
  font-size: 200%;
}
</style>

<br>
<br>
<br>
<br>

.center[
.huge[
You made it to the end of part 3.
]
]