class: center, middle, inverse, title-slide # Part 1: Background and Basics ### Michael Kane
School of Public Health, Biostatistics Department
Yale University
###
kaneplusplus
kaneplusplus
--- # <br> Welcome to Bootstrap Learning Python with R <br> ## What are the goals of this class? <br> -- Learn how to translate your R programming skills to other langauges, Python in particular. -- Separate programming language _syntax_ from programming _constructs_. -- Appreciate the linguistic differences between R and Python and think about the implications for expressing computations and execution efficiency. -- Be able to evaluate objective strengths and weaknesses of a programming language. --- # <br> Welcome to Bootstrap Learning Python with R <br> ## How will this class be taught? <br> -- I'm going to start with a core-programming constructs in R. I'll describe it, show how it works, and talk about some of it's distinguishing characteristics -- I'll show the equivalent/analogous construct in Python. I'll describe it, show how it works, and we'll discuss some of the differences -- You will follow along. -- You execute the code, modify it, maybe (probably) break it, and ask questions. --- # <br> Welcome to Bootstrap Learning Python with R <br> ## <br> Who am I and why should you listen to me? <br> My name is Michael Kane, I'm an Assistant Professor of Biostatistics at Yale University and I work on methods in optimization-based machine learning, numerical and statistical computing. -- I have a degree in Computer Engineering, master's degrees in Electrical Engineering and Statistics, and a Ph.D. in Statistics. -- I've been a software architect and developer for a couple of different start-ups and other businesses. I built software in ~15 different programming languages. --- # <br> What are the parts of this class? <br> ## Background and Basics <br> ## Digging Deeper into Python Programming Constructs <br> ## Building Software Systems using Object-Oriented Programming <br> ## Working with Data --- # <br> Becoming a Polyglot Programmer <br> ## Why learn Python <br> -- "Use the best tool for the job" is great advice if you know how to use all of the tools. -- Python is a great general-purpose programming language. - It is [popular](https://www.tiobe.com/tiobe-index/) - Simple syntax - A lot of facilities in non-data-science applications -- You learn to separate syntax from construct. -- It will make you a better programmer. --- # <br> Becoming a Polyglot Programmer <br> ## Is Python Better than R? <br> -- If you are looking for a [head-to-head comparison by a computer scientist, Norm Matloff](https://github.com/matloff/R-vs.-Python-for-Data-Science) does a good job. -- <br> I like avoiding the questions about language + community in favor of characterizing the language. --- # <br> Becoming a Polyglot Programmer <br> ## Is Python Better than R? <br> -- Python - Object-oriented focused (manages complexity in large software systems - Keras + Tensorflow) - Readable -- R - A pragmatic version of Lisp - More dynamic (better meta-programming capabilities - DSD's) -- Both of them are [Turing complete](https://en.wikipedia.org/wiki/Turing_completeness#Non-mathematical_usage), most comparisons are about areas of emphasis of the community. --- # <br> Materials and Setup <br> ## Python Virtual Environments <br> -- "Python, like most other modern programming languages, has its own unique way of downloading, storing, and resolving packages (or modules). While this has its advantages, there were some _interesting_ decisions made about package storage, resolution [and compatibility], which has lead to some problems—particularly with how and where packages are stored." -[realpython](https://realpython.com/python-virtual-environments-a-primer/) (emphasis added) --- # <br> Materials and Setup <br> ## What are my options for running Python? <br> 1. The command line -- 2. RStudio -- 3. RMarkdown -- 4. IPython notebooks --- # <br> Materials and Setup <br> ## How do we install Python and manage packages? -- <br> ## We use R <br> 1. Install the `reticulate`. -- 2. Use `reticulate::install_miniconda()` to create a conda installation. -- 3. Create a conda environment with `conda_create()` -- 4. Start the conda environment. -- See [python-and-conda-install.r](python-and-conda-install.r) for details. --- # <br> Start the conda environment, run from R Markdown <br> ```r library(reticulate) use_condaenv("python-for-r-developers", required = TRUE) ``` -- Now check that it works: ```` ```{python} # Python print("Hello world from Python!") ``` ```` ```python print("Hello world from Python!") ``` ``` ## Hello world from Python! ``` --- # <br> The Basics <br> ## Style (according to [PEP-8](https://www.python.org/dev/peps/pep-0008)) -- <br> Python doesn't use { } to enclose code blocks it uses indentation. _[Four spaces](https://www.python.org/dev/peps/pep-0008/#indentation)_ is recommended. -- [Line length should be limited to 72 characters](https://www.python.org/dev/peps/pep-0008/#maximum-line-length). -- [Functions and variables should use lowercase with words separated by underscores as necessary to improve readability](https://www.python.org/dev/peps/pep-0008/#method-names-and-instance-variables). -- Variables, types, and functions cannot contain "." This is an infix operator in Python and it's used to call methods (functions) and member attributes. --- # <br> The Basics <br> ## Getting help <br> R's help function is the prefix operator `?`. ```r # R # Get help on the ? function ?`?` ``` -- <br> Python's help function is `help()`. ```python # Python # Get help on the help function help(help) ``` --- # <br> The Basics <br> ## Variables and Assignment -- ```r # R a <- 3 print(a + 2) ``` ``` ## [1] 5 ``` -- In Python assignments are performed using `=`. There is no `<-` -- ```python # Python a = 3 print(a + 2) ``` ``` ## 5 ``` --- # <br> The Basics <br> ## Basic types in R -- ```r # R # Note: factor and raw not included logical <- TRUE numeric <- 2.72 integer <- 42L complex <- 3 + 2i character <- "string" cat(logical, numeric, integer, complex, character) ``` ``` ## TRUE 2.72 42 3+2i string ``` --- # <br> The Basics <br> ## Basic types in Python -- ```python # Python logical = True numeric = 2.72 integer = 42 complex = 3 + 2j # Note: j... like an engineer. character = "string" print(logical, numeric, integer, complex, character) ``` ``` ## True 2.72 42 (3+2j) string ``` --- # <br> The Basics <br> ## What about NA, NaN, Null, -Inf, and Inf? -- Python has `None` - `NA` and others need to be imported from packages. - It is roughly equivalent to `Null`. - We'll get to these later. -- "Use `is` when you want to check against an object's identity (e.g. checking to see if var is None). Use `==` when you want to check equality (e.g. Is var equal to 3?)." - [Stack Overflow User](https://stackoverflow.com/questions/14247373/python-none-comparison-should-i-use-is-or#14247383) -- ```python #Python a = None print(a is None) ``` ``` ## True ``` --- # <br> The Basics <br> ## If Statements in R -- ```r # R confused <- FALSE if (!confused) { cat("I'm getting it.") } else { cat("Huh?") } ``` ``` ## I'm getting it. ``` --- # <br> The Basics <br> ## if Statements in Python -- ```python # Python confused = False if not confused: print("I'm getting it.") else: print("Huh?") ``` ``` ## I'm getting it. ``` --- # <br> The Basics <br> ## `if-elif-else` -- <br> ```python # Python confusion_level = 0 # assume confusion is 0 to 10 (very confused) if 0 <= confusion_level < 3: print("I'm getting it.") elif 3 <= confusion_level < 6: print("I'm kind of getting it.") elif 6 < confusion_level <= 10: print("I'm not really getting it.") else: print("Where am I?") ``` ``` ## I'm getting it. ``` --- # <br> The Basics <br> ## `for` Loops ```r # R for (i in c(1, 2, 3)) { print(i) } ``` ``` ## [1] 1 ## [1] 2 ## [1] 3 ``` ```python # Python for i in (1, 2, 3): print(i) ``` ``` ## 1 ## 2 ## 3 ``` --- # <br> The Basics <br> ## Another `for` Loop ```r # R for (i in seq_len(3)) { print(i) } ``` ``` ## [1] 1 ## [1] 2 ## [1] 3 ``` ```python # Python for i in range(3): print(i+1) ``` ``` ## 1 ## 2 ## 3 ``` --- # <br> The Basics <br> ## `while` Loops in R ```r # R i <- 1 while (i < 4) { print(i) i <- i + 1 } ``` ``` ## [1] 1 ## [1] 2 ## [1] 3 ``` --- # <br> The Basics <br> ## `while` Loops in Python ```python # Python i = 1 while i < 4: print(i) i += 1 ``` ``` ## 1 ## 2 ## 3 ``` --- # <br> The Basics <br> ## Functions ```r # R add_one <- function(v) { v + 1 } cat(add_one(3)) ``` ``` ## 4 ``` ```python # Python def add_one(v): return v + 1 print(add_one(3)) ``` ``` ## 4 ``` --- # <br> The Basics <br> ## Don't forget to return values you want ```python # Python def add_one(v): ret = v + 1 print(ret) ret print(add_one(3)) ``` ``` ## 4 ## None ``` --- # <br> The Basics <br> ## Documenting your function ```python # Python def add_one(v): '''Add one to a vector of numeric values.''' ret = v + 1 print(ret) ret help(add_one) ``` ``` ## Help on function add_one in module __main__: ## ## add_one(v) ## Add one to a vector of numeric values. ``` --- # <br> The Basics <br> ## R Lists ```r # R v <- list(1, 2) print(v) ``` ``` ## [[1]] ## [1] 1 ## ## [[2]] ## [1] 2 ``` --- # <br> The Basics <br> ## R Lists ```r # R v <- c(v, list(3)) print(v) ``` ``` ## [[1]] ## [1] 1 ## ## [[2]] ## [1] 2 ## ## [[3]] ## [1] 3 ``` --- # <br> The Basics <br> ## R Lists ```r # R print(v[-1]) ``` ``` ## [[1]] ## [1] 2 ## ## [[2]] ## [1] 3 ``` --- # <br> The Basics <br> ## Python Lists ```python # Python v = [1, 2] print(v) ``` ``` ## [1, 2] ``` ```python # Python v = [1, 2] print(v + [3]) ``` ``` ## [1, 2, 3] ``` --- # <br> The Basics <br> ## Python Lists Continued ```python # Python v = [1, 2, 3] print(v[-1]) ``` ``` ## 3 ``` --- # <br> The Basics <br> ## List Indexing <br> R indexes start at 1. Python starts at 0. ```r # R v <- c('a', 'b', 'c')[3] cat(v) ``` ``` ## c ``` ```python # Python v = ['a', 'b', 'c'] print(v[2]) ``` ``` ## c ``` --- # <br> The Basics <br> R indexes from the first argument _through_ the second. Python indexes from the first _up to_ the last (non-inclusive). ```r # R v <- c('a', 'b', 'c')[1:2] cat(v) ``` ``` ## a b ``` ```python # Python v = ['a', 'b', 'c'] print(v[0:2]) ``` ``` ## ['a', 'b'] ``` ```python print(v[:2]) ``` ``` ## ['a', 'b'] ``` --- # <br> The Basics <br> ## Lists containing values with different types ```python # Python list_vals = [1, '2', None] def add_one(list_vals): for i in range(len(list_vals)): if list_vals[i] is None: print("Value at index " + str(i) + " is None. Not adding 1.") elif isinstance(list_vals[i], str): list_vals[i] = str(int(list_vals[i]) + 1) elif isinstance(list_vals[i], int): list_vals[i] += 1 else: print("Unsupported type at ", i, ". Not doing anything") return list_vals ``` --- # <br> The Basics <br> ## Lists containing values with different types (cont'd) ```python # Python print(list_vals) ``` ``` ## [1, '2', None] ``` ```python print(add_one(list_vals)) ``` ``` ## Value at index 2 is None. Not adding 1. ## [2, '3', None] ``` ```python print(list_vals) ``` ``` ## [2, '3', None] ``` --- # <br> The Basics <br> ## Values are passed _by reference_ ```python # Python list_vals = [1, '2', None] def add_one(list_vals): ret_vals = list_vals.copy() for i in range(len(ret_vals)): if ret_vals[i] is None: print("Value at index " + str(i) + " is None. Not adding 1.") elif isinstance(ret_vals[i], str): ret_vals[i] = str(int(ret_vals[i]) + 1) elif isinstance(ret_vals[i], int): ret_vals[i] += 1 else: print("Unsupported type at ", i, ". Not doing anything") return ret_vals ``` --- # <br> The Basics <br> ## Values are passed _by reference_ (cont'd) ```python # Python lv_one_added = add_one(list_vals) ``` ``` ## Value at index 2 is None. Not adding 1. ``` ```python print(lv_one_added) ``` ``` ## [2, '3', None] ``` ```python print(list_vals) ``` ``` ## [1, '2', None] ``` --- # <br> The Basics <br> ## `=` Doesn't copy lists ```python # Python a = [3] b = a b[0] = 4 print(a) ``` ``` ## [4] ``` --- # <br> The Basics <br> ## It does copy singletons ```python # Python a = 3 b = a b = 4 print(a) ``` ``` ## 3 ``` --- # <br> The Basics ## Lists are not like R vectors ```r # R a <- 1:3 print(a + 1) ``` ``` ## [1] 2 3 4 ``` ```python # Python a = list(range(1, 4)) print(a) ``` ``` ## [1, 2, 3] ``` ```python a + 1 ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: can only concatenate list (not "int") to list ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` --- # <br> The Basics <br> ## List Comprehensions ```r # R l <- as.list(1:3) print(lapply(l, function(x) x + 1)) ``` ``` ## [[1]] ## [1] 2 ## ## [[2]] ## [1] 3 ## ## [[3]] ## [1] 4 ``` --- # <br> The Basics <br> ## List Comprehensions ```python # Python l = list(range(1, 4)) print( [x + 1 for x in l] ) ``` ``` ## [2, 3, 4] ``` --- # <br> The Basics <br> ## A Note on Tuples In Python a tuple is an _immutable_ list. ```python t1 = 1, 2, 3 # Create a tuple like this... t2 = (1, 2, 3) # or like this t1[1] = 2 # Throws an error ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: 'tuple' object does not support item assignment ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ```python first_equal, second_equal, third_equal = [ t1[i] == t2[i] for i in range(len(t1)) ] print(first_equal, second_equal, third_equal) ``` ``` ## True True True ``` --- # <br> The Basics <br> ## Dictionaries in Python are named lists in R) ```r # R l <- list(a = c(1:3), b = c("c", "d", "e", "f")) l$b[2:4] ``` ``` ## [1] "d" "e" "f" ``` ```python # Python l = {'a' : list(range(1, 4)), 'b' : ["c", "d", "e", "f"] } print(l['b'][1:4]) ``` ``` ## ['d', 'e', 'f'] ``` --- # <br> The Basics <br> ## Dictionaries are composed of keys and values ```python # Python l = {'a' : list(range(1, 4)), 'b' : ["c", "d", "e", "f"] } print(l.keys()) ``` ``` ## dict_keys(['a', 'b']) ``` ```python print(l.values()) ``` ``` ## dict_values([[1, 2, 3], ['c', 'd', 'e', 'f']]) ``` --- <style type="text/css"> .huge { font-size: 200%; } </style> <br> <br> <br> <br> .center[ .huge[ Great job. You made it to the end of part 1. ] ]