Chapter 3: reticulate

Using Python in R with reticulate

The reticulate package provides a comprehensive set of tools for interoperability between Python and R. It allows you to call Python from R in a variety of ways, including importing Python modules, sourcing Python scripts, and using Python interactively from R.

needs(reticulate, tidyverse)

Setting up Python

Before you can use Python in R, you need to ensure Python is installed on your system. The reticulate package will try to find Python automatically, but you can also specify which Python to use.

For a consistent experience across Windows and Mac, I recommend using reticulate’s built-in miniconda installation helper:

install_miniconda()

This installs Miniconda, a minimal distribution of conda. conda is a package and environment management system that works well for both R and Python.

I recommend managing Python packages in a conda environment, as this helps avoid conflicts between projects. When creating the environment, you can specify the Python version and the packages you need. Pinning the version matters more than it does in R, because a given release of a Python package often supports only certain Python versions. For example, to create an environment named “toolbox_env” with Python 3.9 and the numpy package:

conda_create("toolbox_env", packages = c("python=3.9", "numpy"))

If you work in Positron, you might need to go to settings and select the Python interpreter from this environment. In RStudio, this can be done by navigating to Tools -> Global Options -> Python and selecting the appropriate interpreter. An overview of all conda environments and their paths can be found with:

conda_list()
          name
1         base
2 selenium_env
3  toolbox_env
                                                                      python
1                   /Users/felixlennert/Library/r-miniconda-arm64/bin/python
2 /Users/felixlennert/Library/r-miniconda-arm64/envs/selenium_env/bin/python
3  /Users/felixlennert/Library/r-miniconda-arm64/envs/toolbox_env/bin/python

Side note: you can remove an environment you no longer need by passing its name to conda_remove(), for example one called “r-reticulate”:

conda_remove("r-reticulate")

Basic Python usage in R

Before you can get started, you need to tell reticulate to use this environment:

use_condaenv("toolbox_env", required = TRUE)
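Once the environment is active, it can be worth confirming from the Python side that the expected interpreter is actually in use. The sketch below relies only on the standard library; the helper name describe_interpreter is made up for this illustration. Run in a Python chunk or through py_run_string(), the reported paths should point into the toolbox_env folder, assuming the environment was created as shown above.

```python
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        "version": platform.python_version(),  # e.g. "3.9.x" for the environment above
        "executable": sys.executable,          # path to the Python binary in use
        "prefix": sys.prefix,                  # root folder of the active environment
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

On the R side, reticulate's py_config() reports similar information.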

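After creating the environment, you can also install additional Python packages into it directly from R. Naming the environment explicitly avoids accidentally installing into a different default environment:

py_install(c("pandas", "scikit-learn"), envname = "toolbox_env")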

In Quarto documents, like this one, you can use Python chunks directly by using the {python} engine (i.e., by typing ```{python} instead of ```{r} when creating a code block):

# This is a Python chunk
import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85.5, 92.0, 78.5, 88.0]
}

df = pd.DataFrame(data)
print(df)
      name  age  score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
3    David   40   88.0

Objects created at the top level of a Python chunk are accessible in R through the py object. For example, the df DataFrame created above can be accessed in R as follows:

df_r <- py$df
print(df_r)
     name age score
1   Alice  25  85.5
2     Bob  30  92.0
3 Charlie  35  78.5
4   David  40  88.0

However, note that, depending on your IDE, this may only work when rendering the document. If you run the chunks interactively, the objects may not be available in R.
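What it means for an object to be globally accessible can be sketched in plain Python. The example below is only a rough analogy and not reticulate's implementation: it runs a hypothetical chunk in a fresh dictionary that stands in for Python's main module, then lists which names are left for a caller to look up. Names assigned inside a function, such as total, do not survive.

```python
# Code as it might appear in a Python chunk (hypothetical example).
chunk_source = """
def mean_score(scores):
    total = sum(scores)          # 'total' is local to the function
    return total / len(scores)

scores = [85.5, 92.0, 78.5, 88.0]   # top-level name
average = mean_score(scores)        # top-level name
"""

# Run the chunk in a fresh namespace that stands in for Python's main module.
main_namespace = {}
exec(chunk_source, main_namespace)

# Names a caller can look up afterwards (dunder entries such as __builtins__ removed).
visible = sorted(name for name in main_namespace if not name.startswith("__"))
print(visible)
print(main_namespace["average"])
```

Only the top-level names (average, mean_score, scores) remain, which corresponds to what you would expect to reach from R via py$.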

To make Python objects available in R while you work interactively, run the Python code directly from R with the py_run_string() function:

py_run_string("
import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85.5, 92.0, 78.5, 88.0]
}

df = pd.DataFrame(data)
print(df)
")
      name  age  score
0    Alice   25   85.5
1      Bob   30   92.0
2  Charlie   35   78.5
3    David   40   88.0
# df now lives in Python's main module, so R can reach it via py$
print(py$df)
     name age score
1   Alice  25  85.5
2     Bob  30  92.0
3 Charlie  35  78.5
4   David  40  88.0

Passing code as a string gets cumbersome for larger blocks, so you can also use py_run_file() to source a Python script. Note that objects may change type when accessed from R. For example, the data object from the script, a Python dictionary like the one defined above, arrives in R as a named list rather than a data frame.

py_run_file("data/create_name_tbl.py")

data_list <- py$data
typeof(data_list)
[1] "list"

This is, however, easy to fix:

data_list |> bind_cols()
# A tibble: 4 × 3
  name      age score
  <chr>   <int> <dbl>
1 Alice      25  85.5
2 Bob        30  92  
3 Charlie    35  78.5
4 David      40  88  
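Why bind_cols() works here can be seen on the Python side: the dictionary is column-oriented, with one list of equal length per key, so each element of the named list in R is already a complete column. A minimal sketch in built-in Python (the variable names columns and rows are illustrative):

```python
data = {
    "name": ["Alice", "Bob", "Charlie", "David"],
    "age": [25, 30, 35, 40],
    "score": [85.5, 92.0, 78.5, 88.0],
}

# Every column must have the same length to form a rectangular table.
lengths = {key: len(values) for key, values in data.items()}
assert len(set(lengths.values())) == 1, "columns differ in length"

# zip(*columns) turns the column-wise layout into one tuple per row.
columns = list(data.keys())
rows = list(zip(*data.values()))

print(columns)
for row in rows:
    print(row)
```

If the lists differed in length, the object would no longer describe a rectangular table, and binding the columns together in R would be expected to fail as well.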

Another way of sharing objects between R and Python is, of course, to write a file to disk. You can save data frames as CSV files in R and read them into Python, or vice versa. When saving from pandas, set index=False; otherwise the row index is written as an extra, unnamed first column.

# Python: write the data frame without and with the index column
df.to_csv("data/people.csv", index=False)
df.to_csv("data/people_w_index.csv", index=True)

# R: read both files back in
read_csv("data/people.csv")
Rows: 4 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (2): age, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 4 × 3
  name      age score
  <chr>   <dbl> <dbl>
1 Alice      25  85.5
2 Bob        30  92  
3 Charlie    35  78.5
4 David      40  88  
read_csv("data/people_w_index.csv")
New names:
• `` -> `...1`
Rows: 4 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (3): ...1, age, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 4 × 4
   ...1 name      age score
  <dbl> <chr>   <dbl> <dbl>
1     0 Alice      25  85.5
2     1 Bob        30  92  
3     2 Charlie    35  78.5
4     3 David      40  88
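The `...1` column appears because pandas writes the index under an empty header field, which readr then has to rename. The sketch below reproduces this with Python's standard csv module on an in-memory string that mimics people_w_index.csv (the text is typed in by hand for illustration, not read from the actual file) and then drops the unnamed column again.

```python
import csv
import io

# Hand-written stand-in for a CSV saved with index=True: note the empty first header field.
csv_text = """,name,age,score
0,Alice,25,85.5
1,Bob,30,92.0
2,Charlie,35,78.5
3,David,40,88.0
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
records = list(reader)

# An empty first header name is the trace of a saved index.
has_index_column = header[0] == ""

if has_index_column:
    header = header[1:]
    records = [record[1:] for record in records]

print(header)
print(records[0])
```

Note that the csv module returns every field as a string, so numeric columns would still need converting. Writing the file with index=False in the first place remains the simpler route.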