needs(reticulate, tidyverse)
Chapter 3: reticulate
Using Python in R with reticulate
The reticulate package provides a comprehensive set of tools for interoperability between Python and R. It allows you to call Python from R in a variety of ways, including importing Python modules, sourcing Python scripts, and using Python interactively from R.
Setting up Python
Before you can use Python in R, you need to ensure Python is installed on your system. The reticulate package will try to find Python automatically, but you can also specify which Python to use.
For a consistent experience across Windows and Mac, I recommend using reticulate’s built-in miniconda installation helper:
install_miniconda()
This will install a minimal version of conda, which is a package and environment management system that works well for both R and Python.
I recommend managing Python packages in a conda environment, as it helps avoid conflicts. When creating the environment, you can specify the Python version and the packages you need. Pinning the version matters because, unlike in R, not every Python version is compatible with every package version. For example, to create an environment named “toolbox_env” with the numpy package:
conda_create("toolbox_env", packages = c("python=3.9", "numpy"))
If you work in Positron, you might need to go to the settings and select the Python interpreter from this environment. In RStudio, this can be done by navigating to Tools -> Global Options -> Python and selecting the appropriate interpreter. An overview of all conda environments and their paths can be found with:
conda_list()
  name         python
1 base         /Users/felixlennert/Library/r-miniconda-arm64/bin/python
2 selenium_env /Users/felixlennert/Library/r-miniconda-arm64/envs/selenium_env/bin/python
3 toolbox_env  /Users/felixlennert/Library/r-miniconda-arm64/envs/toolbox_env/bin/python
Side note: you can remove an environment with
conda_remove("r-reticulate")
Basic Python usage in R
Before you can get started, you need to tell reticulate to use this environment:
use_condaenv("toolbox_env", required = TRUE)
Once the environment exists, you can also install further Python packages into it directly from R:
py_install(c("pandas", "scikit-learn"))
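To confirm from the Python side that the expected interpreter and packages are in place, a check along these lines can be run in Python. This is a minimal sketch that relies only on the standard library; the helper name `missing_packages` is hypothetical. Note that the import name can differ from the install name: scikit-learn is imported as `sklearn`.

```python
import importlib.util
import sys


def missing_packages(import_names):
    """Return the import names that the active interpreter cannot find."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Path and version of the interpreter that is actually running
print(sys.executable)
print(f"{sys.version_info.major}.{sys.version_info.minor}")

# Import names, not install names: scikit-learn is imported as "sklearn"
print(missing_packages(["numpy", "pandas", "sklearn"]))
```

An empty list in the last line means all three packages are importable. If the printed path does not point into the toolbox_env environment, the interpreter selection is worth revisiting.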
In Quarto documents, like this one, you can use Python chunks directly by using the {python} engine (i.e., by typing ```{python} instead of ```{r} when creating a code block):
# This is a Python chunk
import pandas as pd
import numpy as np
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'score': [85.5, 92.0, 78.5, 88.0]
}
df = pd.DataFrame(data)
print(df)
name age score
0 Alice 25 85.5
1 Bob 30 92.0
2 Charlie 35 78.5
3 David 40 88.0
Objects created in Python chunks are accessible in R via the py object, as long as they are defined globally. For example, the df DataFrame created above can be accessed in R as follows:
df_r <- py$df
print(df_r)
name age score
1 Alice 25 85.5
2 Bob 30 92.0
3 Charlie 35 78.5
4 David 40 88.0
However, note that this is only the case when knitting the document. If you run the chunks interactively, the objects won’t be available in R.
To make the objects available while working interactively, you can run Python code directly from R using the py_run_string() function:
py_run_string("
import pandas as pd
import numpy as np
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David'],
'age': [25, 30, 35, 40],
'score': [85.5, 92.0, 78.5, 88.0]
}
df = pd.DataFrame(data)
print(df)
")
name age score
0 Alice 25 85.5
1 Bob 30 92.0
2 Charlie 35 78.5
3 David 40 88.0
# This should work
print(py$df)
name age score
1 Alice 25 85.5
2 Bob 30 92.0
3 Charlie 35 78.5
4 David 40 88.0
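Conceptually, running a code string and then reading its variables resembles executing code in a namespace and looking names up afterwards. The following plain-Python analogy is only a way to picture this, not reticulate's implementation. The `namespace` dictionary is a hypothetical stand-in for the Python session's global variables, and the data is a shortened version of the example above.

```python
# A dictionary that plays the role of the Python session's global namespace
namespace = {}

code = """
data = {
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
}
n_rows = len(data['name'])
"""

# Execute the string; every top-level assignment lands in `namespace`
exec(code, namespace)

# Names defined by the string can now be looked up from outside
print(namespace["n_rows"])
print(sorted(k for k in namespace if not k.startswith("__")))
```

Only top-level assignments end up in the namespace. Variables created inside a function body would stay local and could not be looked up this way.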
That’s a bit cumbersome for larger code blocks, so you can also use py_run_file() to source a Python script. Note that objects might change type when accessed from R. For example, the data object created by this script (presumably the dictionary behind the pandas data frame, as in the code above) arrives in R as a named list rather than a data frame.
py_run_file("data/create_name_tbl.py")
data_list <- py$data
typeof(data_list)
[1] "list"
This is however easily fixable:
data_list |> bind_cols()
# A tibble: 4 × 3
name age score
<chr> <int> <dbl>
1 Alice 25 85.5
2 Bob 30 92
3 Charlie 35 78.5
4 David 40 88
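The named list corresponds to a column-oriented structure: one entry per column, each holding that column's values. Assuming the script builds a dictionary like the one used earlier in this chapter (the file's contents are not shown here), the same reshaping from columns to rows can be sketched in plain Python:

```python
# Column-oriented data: one key per column, as in the earlier example
data = {
    "name": ["Alice", "Bob", "Charlie", "David"],
    "age": [25, 30, 35, 40],
    "score": [85.5, 92.0, 78.5, 88.0],
}

# zip(*columns) walks the columns in parallel and yields one tuple per row
columns = list(data)
rows = [dict(zip(columns, values)) for values in zip(*data.values())]

for row in rows:
    print(row)
```

This only works cleanly when all columns have the same length. `zip` silently stops at the shortest column, so checking the lengths first is a reasonable precaution.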
Another way of sharing objects between R and Python is, of course, writing a file to disk. You can save data frames as CSV files in R and read them into Python, or vice versa. Make sure to set index=False when saving from Python to avoid an extra index column.
df.to_csv("data/people.csv", index=False)
df.to_csv("data/people_w_index.csv", index=True)
read_csv("data/people.csv")
Rows: 4 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (2): age, score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 4 × 3
name age score
<chr> <dbl> <dbl>
1 Alice 25 85.5
2 Bob 30 92
3 Charlie 35 78.5
4 David 40 88
read_csv("data/people_w_index.csv")
New names:
• `` -> `...1`
Rows: 4 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (3): ...1, age, score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 4 × 4
...1 name age score
<dbl> <chr> <dbl> <dbl>
1 0 Alice 25 85.5
2 1 Bob 30 92
3 2 Charlie 35 78.5
4 3 David 40 88
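The extra column appears because the index is written with an empty header, which read_csv() then renames to ...1. A small standard-library sketch shows the same effect on the Python side. The CSV text below is typed by hand to mimic a file saved with index=True; it is an assumption for illustration and is not read from data/people_w_index.csv.

```python
import csv
import io

# Hand-written stand-in for a CSV saved with an index: the first header is empty
csv_text = """,name,age,score
0,Alice,25,85.5
1,Bob,30,92.0
"""

reader = csv.DictReader(io.StringIO(csv_text))
print(reader.fieldnames)  # the first field name is an empty string

# Drop the unnamed index column while reading
rows = [{k: v for k, v in row.items() if k != ""} for row in reader]
print(rows[0])
```

The csv module returns every value as a string, so numeric columns would still need converting. Writing the file with index=False in the first place avoids the cleanup altogether.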