# This is a comment in Python (like # in R)
print("Hello from Python!")
Hello from Python!
This tutorial introduces Python programming with a focus on comparisons to R. If you’re familiar with R, you’ll find many concepts translate directly, though the syntax differs. Python is a general-purpose programming language that’s become increasingly popular in data science, offering powerful libraries for data manipulation, visualization, and machine learning.
Python can be run in various environments:
For this tutorial, we’ll use Python code chunks in Quarto, which can execute Python code just like they execute R code.
One of the biggest differences between Python and R is how they handle packages and dependencies. In R, when you install a package with install.packages(), it typically goes into a central library that all your R projects share. Python takes a different approach with virtual environments.
A Python environment is an isolated directory that contains:
Imagine you have Project A that needs version 1.0 of a package, and Project B needs version 2.0 of the same package. In R, you’d typically have one version installed. In Python, you create separate environments for each project.
We will use conda for all environment management in this tutorial. Other options include:
- venv (built-in Python tool)
- virtualenv (third-party tool)
conda is more powerful and manages both Python versions and packages. It’s popular in data science.
Or do it the reticulate way in Quarto:
If you’re working through this tutorial:
Using Jupyter/Quarto: These typically run in their own environment. Your code chunks will execute in whatever environment is active.
Installing packages: When we eventually use packages like NumPy, selenium, or Pandas, you’ll want to install them in an environment:
If you have not created your environment using reticulate yet, I recommend doing it manually in the console.
Think of it this way: In R, you might use RStudio Projects to organize your work. In Python, you use Projects plus environments to isolate not just files, but also package versions.
If you’re using conda, you can also check in your terminal:
Python has several fundamental data types. Unlike R, where everything is a vector by default, Python distinguishes between individual values (scalars) and collections. You will also use an assignment operator = instead of <-.
Python has two main numeric types: integers (int) and floating-point numbers (float).
<class 'int'>
42
<class 'float'>
3.14159
In R, you’d use class() instead of type(). R’s numeric literals are doubles by default; you only get an integer by asking for one explicitly, e.g. with 42L.
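The outputs above look like the result of a type check on an integer and a float. A minimal sketch, assuming the variable names `answer` and `pi_approx` (they are not from the original):

```python
# Hypothetical variable names; the values match the printed output above.
answer = 42
pi_approx = 3.14159

print(type(answer))     # the type of a whole number is int
print(answer)
print(type(pi_approx))  # the type of a decimal number is float
print(pi_approx)
```

Unlike R, Python gives you an int whenever you write a whole number without a decimal point.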
Strings represent text data. Python treats single quotes ' and double quotes " identically (as does R, where double quotes are the convention).
# Strings in Python
greeting = "Hello"
name = 'World'
# String concatenation uses +
message = greeting + " " + name + "!"
print(message)
Hello World!
# Or use f-strings (formatted string literals) -- Python 3.6+
message_formatted = f"{greeting} {name}!"
print(message_formatted)
Hello World!
In R, you’d use paste() or paste0() for concatenation, or str_c() from the tidyverse. Python’s f-strings are similar to glue::glue() in R.
Boolean values are True and False (note the capitalization - this matters in Python!).
True
<class 'bool'>
R uses TRUE and FALSE (all caps), while Python uses True and False (capitalized). Python is case-sensitive, so writing true raises a NameError.
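The two output lines above are consistent with printing a boolean and its type. A short sketch, assuming the hypothetical variable name `is_ready`, that also shows what happens with the lowercase spelling:

```python
is_ready = True  # hypothetical name, not from the original

print(is_ready)
print(type(is_ready))

# Lowercase `true` is just an undefined name, so Python raises a NameError.
try:
    true
except NameError as err:
    print(f"NameError: {err}")
```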
Python’s equivalent of R’s NULL or NA is None.
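As a rough illustration of how None behaves (the names `result` and `log_message` are made up for this sketch):

```python
result = None  # hypothetical name

print(result is None)  # `is None` is the idiomatic check
print(type(result))

def log_message(text):
    """Print text; there is no return statement, so the call evaluates to None."""
    print(text)

returned = log_message("hello")
print(returned is None)
```

In this sense None behaves more like R’s NULL than like NA: it stands for "no value at all" rather than a missing entry inside a vector.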
Python has several built-in collection types. These are roughly analogous to R’s vectors and lists, but with important differences.
Python lists are ordered, mutable (changeable) collections. They’re similar to R’s lists, not R’s atomic vectors.
# Creating a list
numbers = [1, 2, 3, 4, 5]
mixed_list = [1, "two", 3.0, True] # Can contain different types
print(numbers)
[1, 2, 3, 4, 5]
print(mixed_list)
[1, 'two', 3.0, True]
Indexing: Python uses 0-based indexing (the first element is at position 0), unlike R’s 1-based indexing.
# Accessing elements
fruits = ["apple", "banana", "cherry", "date"]
print(fruits[0])  # First element (R would use fruits[1])
apple
cherry
['banana', 'cherry']
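The last two outputs above (cherry and ['banana', 'cherry']) are consistent with a single index and a slice. The statements that produced them are not shown, so the following is only one possible reconstruction:

```python
fruits = ["apple", "banana", "cherry", "date"]

third = fruits[2]     # 0-based: index 2 is the third element
middle = fruits[1:3]  # slice: start index included, stop index excluded
last = fruits[-1]     # negative indices count from the end

print(third)
print(middle)
print(last)
```

Note that a negative index means something different in R, where fruits[-1] drops the first element instead of selecting the last one.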
R’s c(1, 2, 3, 4, 5) creates an atomic vector. Python’s [1, 2, 3, 4, 5] is more like R’s list(1, 2, 3, 4, 5), though it can be used for numeric operations when converted to a NumPy array.
Some common list operations:
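A small sketch of operations that are commonly used on lists; the selection here is an assumption, not necessarily the set the original listed:

```python
numbers = [1, 2, 3, 4, 5]

numbers.append(6)       # add one element at the end (modifies the list in place)
numbers.extend([7, 8])  # add several elements at once
numbers.insert(0, 0)    # insert the value 0 at position 0
numbers.remove(3)       # remove the first occurrence of the value 3
popped = numbers.pop()  # remove and return the last element

print(numbers)
print(popped)
print(len(numbers))     # number of elements, like length() in R
print(4 in numbers)     # membership test, like %in% in R
```

These methods change the list in place and do not return a new list, which differs from R’s copy-on-modify behaviour.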
Tuples are like lists but immutable (can’t be changed after creation). They use parentheses () instead of square brackets.
(10, 20)
<class 'tuple'>
10
R doesn’t have a direct equivalent to tuples, though you could think of them as lists that are “frozen.”
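The outputs above match a two-element tuple, its type, and its first element. A minimal sketch, assuming the hypothetical name `point`:

```python
point = (10, 20)  # hypothetical name; the values match the output above

print(point)
print(type(point))
print(point[0])

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 99
except TypeError as err:
    print(f"TypeError: {err}")

x, y = point  # unpacking assigns each element to its own name
print(x, y)
```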
Dictionaries store key-value pairs. They’re similar to R’s named lists or named vectors.
{'name': 'Alice', 'age': 30, 'city': 'Leipzig'}
Alice
30
{'name': 'Alice', 'age': 30, 'city': 'Leipzig', 'occupation': 'Data Scientist'}
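The four outputs above are consistent with creating a dictionary, reading two values, and adding a key. A sketch, assuming the hypothetical name `person`:

```python
person = {"name": "Alice", "age": 30, "city": "Leipzig"}  # hypothetical name

print(person)
print(person["name"])  # look up a value by key (R: person$name or person[["name"]])
print(person["age"])

person["occupation"] = "Data Scientist"  # assigning to a new key adds it
print(person)

# .get() returns a default instead of raising KeyError for a missing key
print(person.get("email", "not provided"))
```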
Python supports standard arithmetic operators, similar to R.
14
8
33
3.6666666666666665
3
2
1331
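The seven results above are consistent with the operands 11 and 3; the original operands are not shown, so treat `a` and `b` as assumptions:

```python
a = 11  # assumed operand
b = 3   # assumed operand

print(a + b)   # addition
print(a - b)   # subtraction
print(a * b)   # multiplication
print(a / b)   # true division, always returns a float
print(a // b)  # floor (integer) division
print(a % b)   # remainder (modulo)
print(a ** b)  # exponentiation
```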
Most operators are the same as in R.
Key differences:
- // for integer division (R: %/%)
- ** for exponentiation (R traditionally uses ^, though ** also works)
One similarity worth noting: / always returns a float in Python 3, just as R’s / returns a double even when both operands are integers.
False
True
False
True
The comparison operators are the same as in R, including != for “not equal”.
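The four outputs above (False, True, False, True) would result from comparisons like the following; the values 5 and 10 are assumptions chosen to reproduce that pattern:

```python
x = 5   # assumed value
y = 10  # assumed value

print(x == y)  # equal to
print(x != y)  # not equal to
print(x > y)   # greater than
print(x < y)   # less than

# Python also allows chained comparisons, which R does not
print(0 < x < y)
```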
Python uses words instead of symbols for logical operations.
False
True
False
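The three outputs above (False, True, False) match the following sketch, assuming one true and one false operand:

```python
a = True   # assumed value
b = False  # assumed value

print(a and b)  # True only if both are True
print(a or b)   # True if at least one is True
print(not a)    # negation
```

Like R’s && and ||, Python’s and and or short-circuit: the right-hand side is only evaluated when it can still change the result.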
Python: and, or, not
R: &, |, ! (element-wise); &&, || (scalar)
if, elif, else
Sometimes you want your code to run only in specific cases. Python uses if statements for conditional execution.
x is smaller than or equal to 5
Add an else block for an alternative action:
x is greater than 5
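The two messages above would be produced by code along these lines; the values of `x` and the helper variable `message` are assumptions made for this sketch:

```python
# Plain if: the indented block runs only when the condition is True
x = 3  # assumed value
if x <= 5:
    print("x is smaller than or equal to 5")

# if/else: exactly one of the two blocks runs
x = 10  # assumed value
if x <= 5:
    message = "x is smaller than or equal to 5"
else:
    message = "x is greater than 5"
print(message)
```

Note the colon at the end of the if and else lines and the absence of parentheses and braces, which R would require.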
When you need to check multiple conditions sequentially, use elif (short for “else if”). Python will check each condition in order and execute the first block where the condition is True.
score = 75
if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
elif score >= 70:
    print("Grade: C")
elif score >= 60:
    print("Grade: D")
else:
    print("Grade: F")
Grade: C
You can have as many elif statements as you need. Python checks them from top to bottom and stops at the first True condition. Note that the final else is optional but highly recommended as a catch-all.
temperature = 25
if temperature < 0:
    print("Freezing!")
elif temperature < 10:
    print("Cold")
elif temperature < 20:
    print("Cool")
elif temperature < 30:
    print("Warm")
else:
    print("Hot!")
Warm
Important: The condition must evaluate to a single truth value (True or False). Unlike R’s vectorized ifelse(), Python’s if does not work element-wise.
Functions allow you to package code into reusable blocks. Just like in R, functions are essential for writing clean, maintainable code that follows the DRY (Don’t Repeat Yourself) principle.
The basic syntax for defining a function in Python:
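A minimal sketch of a function definition; the function name `greet` and its parameter are invented for illustration:

```python
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

print(greet("Alice"))
```

The def keyword, the colon, and the indented body take the place of R’s `greet <- function(name) { ... }`.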
You can provide default values for parameters, just like in R.
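For instance, a sketch with a hypothetical function `power` whose second parameter has a default:

```python
def power(base, exponent=2):
    """Raise base to exponent; exponent defaults to 2."""
    return base ** exponent

print(power(4))                    # uses the default exponent
print(power(4, 3))                 # positional argument overrides the default
print(power(base=2, exponent=10))  # keyword arguments, as in R
```

Parameters with defaults must come after parameters without defaults in the definition.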
return
Per the official style guide (PEP 8), return statements should be used consistently:
Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None, and an explicit return statement should be present at the end of the function (if reachable)
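One way to follow this recommendation, sketched with a hypothetical function `safe_sqrt`:

```python
import math

def safe_sqrt(x):
    """Return the square root of x, or None for negative input."""
    if x >= 0:
        return math.sqrt(x)
    return None  # stated explicitly, so every path returns an expression

print(safe_sqrt(9))
print(safe_sqrt(-1))
```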
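Comparison to R: In R, the value of the last evaluated expression is returned automatically, and return(object) is usually reserved for early returns. Python has no implicit return value: a function without a return statement returns None.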
Python can return multiple values as a tuple (similar to R’s lists).
def calculate_stats(numbers):
    """Calculate mean and standard deviation."""
    n = len(numbers)
    mean = sum(numbers) / n
    # Calculate the population standard deviation (divides by n; R's sd() divides by n - 1)
    squared_diff = [(x - mean) ** 2 for x in numbers]  # this is a list comprehension, an abbreviated for loop
    variance = sum(squared_diff) / n
    std_dev = variance ** 0.5
    return mean, std_dev  # Returns a tuple
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value, std_value = calculate_stats(data)  # make sure to define both objects, or unpack the tuple later
mean_value
5.0
std_value
2.0
# this is the same as:
results = calculate_stats(data)
mean_value = results[0]
std_value = results[1]
mean_value
5.0
std_value
2.0
Python uses triple-quoted strings ("""...""") right after the function definition to document what the function does. These docstrings are similar to roxygen2 comments in R.
Python has lambda functions for short, one-line functions. These are similar to R’s anonymous functions function(x) x + 1, the base R shorthand \(x) x + 1 (R 4.1 and later), or purrr’s formula shorthand ~ .x + 1.
# Regular function
def add_ten(x):
    return x + 10
# Lambda function (anonymous)
add_ten_lambda = lambda x: x + 10
add_ten(5)
15
add_ten_lambda(5)
15
Lambda functions are especially useful when you need a simple function as an argument:
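A brief sketch of that pattern using the built-ins sorted() and map(); the example data are made up:

```python
words = ["banana", "fig", "cherry", "kiwi"]

# Sort by word length instead of alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)

# Apply a function to every element (similar to purrr::map)
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
print(doubled)
```

sorted() is stable, so words of equal length keep their original relative order.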
Python follows similar scoping rules to R: variables defined inside functions are local to that function.
x = 10  # Global variable
def modify_variable():
    x = 5  # Local variable - doesn't affect global x
    return x
modify_variable()  # Returns 5
5
x  # Global x is unchanged
10
To modify a global variable inside a function, you need the global keyword (though this is generally discouraged):
1
2
2
Comparison to R: R uses <<- for assigning to parent environments, which is also discouraged.
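The outputs 1, 2, 2 above are consistent with a counter that is incremented twice and then inspected. A sketch, assuming the hypothetical names `counter` and `increment`:

```python
counter = 0  # global variable

def increment():
    global counter  # without this line, `counter += 1` raises UnboundLocalError
    counter += 1
    return counter

print(increment())
print(increment())
print(counter)
```

A less error-prone design passes the value in as an argument and returns the new value, so the function does not depend on hidden state.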
Loops allow you to repeat operations. Python has for loops and while loops, similar to R.
for loops in Python iterate over sequences (lists, tuples, strings, ranges, etc.).
# Basic for loop
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(f"I like {fruit}")
I like apple
I like banana
I like cherry
range() generates a sequence of numbers and is commonly used with for loops. It’s similar to R’s seq() or : operator.
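A short sketch of the three ways to call range(); the variable names are chosen for illustration. The stop value is excluded, unlike in R’s 1:5:

```python
first_five = list(range(5))        # start defaults to 0, stop is excluded
one_to_five = list(range(1, 6))    # like 1:5 in R
evens = list(range(0, 10, 2))      # third argument is the step, like seq(0, 8, by = 2)

print(first_five)
print(one_to_five)
print(evens)

for i in range(3):
    print(f"Iteration {i}")
```

range() is lazy, which is why the sketch wraps it in list() to display the values.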
When you need both the index and the value, use enumerate().
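A minimal sketch, reusing a fruit list like the one from the earlier loop; the list `labelled` exists only to collect the results:

```python
fruits = ["apple", "banana", "cherry"]

labelled = []
for index, fruit in enumerate(fruits):
    labelled.append(f"{index}: {fruit}")
    print(f"{index}: {fruit}")

# start=1 gives R-style 1-based positions
for position, fruit in enumerate(fruits, start=1):
    print(f"{position}. {fruit}")
```

This roughly corresponds to looping over seq_along(fruits) in R and indexing inside the loop.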
You can nest loops just like in R.
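For example, a small multiplication table; the ranges and the list `products` are assumptions made for this sketch:

```python
products = []
for i in range(1, 4):      # outer loop: 1, 2, 3
    for j in range(1, 3):  # inner loop: 1, 2 (runs fully for each i)
        products.append((i, j, i * j))
        print(f"{i} x {j} = {i * j}")
```

Each level of nesting adds one more level of indentation.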
One of Python’s most powerful features is list comprehensions – a concise way to create lists. This is similar to R’s sapply() or purrr::map().
Now that you understand for loops, list comprehensions will make more sense as they’re essentially condensed for loops. List comprehensions will yield a list.
# Traditional loop approach
squares = []  # define list
for i in range(1, 6):
    squares.append(i ** 2)  # build list as you go -- more efficient than vector growing in R
print(squares)
[1, 4, 9, 16, 25]
# List comprehension approach
squares = [i ** 2 for i in range(1, 6)]
print(squares)
[1, 4, 9, 16, 25]
# With condition (like dplyr::filter)
# [expression for item in iterable if condition]
even_squares = [i ** 2 for i in range(1, 11) if i % 2 == 0]
print(even_squares)
[4, 16, 36, 64, 100]
The general syntax for list comprehensions is:
[expression for item in iterable if condition]
This reads almost like English: “Create a list of expression for each item in iterable if condition is true.”
while loops continue executing as long as a condition is True. They’re identical in concept to R’s while loops.
Count is: 0
Count is: 1
Count is: 2
Count is: 3
Count is: 4
Loop finished!
Very similar syntax, though Python uses count += 1 instead of count <- count + 1.
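The output above (Count is: 0 through 4, then Loop finished!) matches a loop like the following sketch:

```python
count = 0
while count < 5:
    print(f"Count is: {count}")
    count += 1  # shorthand for count = count + 1

print("Loop finished!")
```

Forgetting the increment would make the condition stay True forever, so the loop would never end.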
# while loop with simulated user input
# Keep drawing random numbers until we hit the target
import random
target = 7
attempts = 0
guess = 0
while guess != target:
    guess = random.randint(1, 10)
    attempts += 1
    print(f"Attempt {attempts}: guessed {guess}")
print(f"Found {target} in {attempts} attempts!")
Attempt 1: guessed 10
Attempt 2: guessed 8
Attempt 3: guessed 3
Attempt 4: guessed 3
Attempt 5: guessed 3
Attempt 6: guessed 5
Attempt 7: guessed 3
Attempt 8: guessed 6
Attempt 9: guessed 1
Attempt 10: guessed 1
Attempt 11: guessed 10
Attempt 12: guessed 5
Attempt 13: guessed 8
Attempt 14: guessed 9
Attempt 15: guessed 5
Attempt 16: guessed 9
Attempt 17: guessed 7
Found 7 in 17 attempts!
break and continue
Control loop execution with break and continue.
0
1
2
3
4
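The numbers 0 to 4 above are what a loop over range(10) prints when break stops it at 5. A sketch of that case; the stopping value is inferred from the output:

```python
# break - exit the loop entirely
for i in range(10):
    if i == 5:  # stop as soon as i reaches 5
        break
    print(i)
```

After the loop, i still holds the value at which the loop was interrupted.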
# continue - skip rest of current iteration
for i in range(10):
    if i % 2 == 0:  # Skip even numbers
        continue
    print(i)
1
3
5
7
9
Comparison to R: R’s break and next work the same way (R’s next is Python’s continue).
Example 1: Filter and Transform Data
Let’s say we have a list of student grades and we want to keep only the students who passed (grade >= 70) and calculate the class average.
# Sample data
students = [
{"name": "Alice", "grade": 85},
{"name": "Bob", "grade": 72},
{"name": "Charlie", "grade": 90},
{"name": "Diana", "grade": 68},
{"name": "Eve", "grade": 95}
]
# Keep only students who passed (grade >= 70)
passing_students = [s for s in students if s["grade"] >= 70]
print("Passing students:")
for s in passing_students:
    print(f"{s['name']}: {s['grade']}")
Passing students:
Alice: 85
Bob: 72
Charlie: 90
Eve: 95
# Calculate average grade
grades = [s["grade"] for s in students]
average = sum(grades) / len(grades)
print(f"\nClass average: {average:.2f}")
Class average: 82.00
This is similar to using dplyr::filter() and summarize().
Example 2: FizzBuzz
A classic programming exercise - counting with special rules.
# FizzBuzz: Print numbers 1-20, but:
# - "Fizz" for multiples of 3
# - "Buzz" for multiples of 5
# - "FizzBuzz" for multiples of both
for i in range(1, 21):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Python uses indentation to define code blocks (unlike R which uses {}). This is not optional – incorrect indentation will cause errors!
This is indented
This too
This is not in the if block
Important: Use 4 spaces (not tabs) for indentation. Most Python editors handle this automatically.
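The three output lines above would come from code shaped like this sketch; the condition and the value of `x` are assumptions:

```python
x = 10  # assumed value; any condition that is True gives the output above

if x > 5:
    print("This is indented")  # inside the if block
    print("This too")          # same indentation, same block
print("This is not in the if block")  # back at the outer level, always runs
```

Where R closes a block with }, Python ends it simply by returning to the previous indentation level.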
Python code conventionally follows the PEP 8 style guide:
Python has a philosophy! Type import this to see it:
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Exercise 1: Temperature Converter
Write a program that converts temperatures between Celsius and Fahrenheit. Formula: \(F = (C \times 9/5) + 32\)
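One possible solution sketch; the function names are my own, and the reverse conversion simply solves the formula above for C:

```python
def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit using F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

def fahrenheit_to_celsius(fahrenheit):
    """Convert Fahrenheit to Celsius using C = (F - 32) * 5/9."""
    return (fahrenheit - 32) * 5 / 9

print(celsius_to_fahrenheit(100))
print(fahrenheit_to_celsius(32))
```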
Exercise 2: Word Counter
Given a sentence, count how many times each word appears.
Hint: Use sentence.split() to get a list of words, and a dictionary to store counts. The in operator yields True if an element is contained in a collection; for a dictionary it checks the keys, e.g., if word in word_counts
python_sentence = "the quick brown fox jumps over the lazy dog the fox"
# Split sentence into words
words = python_sentence.split()
# Count occurrences using a dictionary
word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
word_counts
{'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}
Exercise 3: Prime Numbers
Write a function that checks if a number is prime. Then use it to find all prime numbers between 1 and 50.
def is_prime(n):
    """Check if a number is prime"""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, n):
        if n % i == 0:
            return False
    return True
is_prime(50)
False
# Find all prime numbers between 1 and 50
primes = []
for num in range(1, 51):
    if is_prime(num):
        primes.append(num)
primes
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
Exercise 4: List Manipulation
# a. Create a list of numbers from 1 to 20
numbers = list(range(1, 21))
# b. Create a new list with only even numbers
even_numbers = [num for num in numbers if num % 2 == 0]
even_numbers
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
# c. Create a new list with squares of odd numbers
odd_squares = [num ** 2 for num in numbers if num % 2 != 0]
odd_squares
[1, 9, 25, 49, 81, 121, 169, 225, 289, 361]
# d. Calculate the sum of numbers divisible by 3
divisible_by_3 = [num for num in numbers if num % 3 == 0]
sum(divisible_by_3)
63
The syntax might be different, but the concepts you know from R will translate well!