Welcome

This is a Quarto website for the “Forschungsseminar CSS” course at Leipzig University. It covers different techniques for the aspiring computational social scientist, hence I have dubbed it “Toolbox CSS.” You can reach me anytime at felix.lennert@uni-leipzig.de. If you’re interested in my academic work, you can visit my website.

In the field, harvesting some fresh raw material before transforming it to insights (© Alexandra Gibbons)

Here’s the official description:

The Forschungsseminar in Computational Social Science (CSS) equips you with the tools to analyze human behavior, predict social trends, and tackle complex societal issues using cutting-edge data science techniques. From web scraping to AI-powered text analysis, you’ll learn to use your computer in new ways to gain insights into social phenomena.

The curriculum covers a range of topics including data management, web scraping, speech-to-text, and computational text analysis. Students will hone their R skills and develop skills in Python, applying both languages to real-world social science problems. The course progresses from fundamental concepts to advanced techniques, including the use of state-of-the-art AI models for text analysis.

The course structure consists of one lecture and one lab session per week, providing a balance of theoretical knowledge and practical application. Throughout the semester, students will benefit from hands-on coding exercises, one-on-one mentoring, and collaborative projects. The course culminates in a research paper, allowing students to apply their new skills to a topic of their choice.

Course Structure

The course consists of lectures introducing each week’s content and a course script that provides hands-on coding examples for that content. The script is mostly written in R, with some Python mixed in for good measure where no great R alternatives exist (e.g., for web scraping with Selenium or text classification with transformer models).

At the beginning of the course, students are encouraged to form groups based on research interests and general vibes. I require each student group to check in with me at the beginning of each week to report on their progress (even if there’s nothing to report – no progress, no problem). These check-ins do not count towards any grade; they give me feedback on the learning experience (this is a new course!) and will hopefully help me provide more appropriate guidance.

Here’s an overview of the topics covered:

| Week | Title | Content | Informal title |
|---|---|---|---|
| 1 | Kick Off | Housekeeping; Setting up workstation; R recap | Whatever you want to know about CSS |
| 2 | Brief Intro to Python & Regexes | Python basics (reticulate, data types, loops, functions, pandas); Regular expressions with stringr | REGEXES – tame your data |
| 3 | Data Acquisition I | How the web is written and ethics; rvest web scraping | stealing data from websites without them noticing it |
| 4 | Data Acquisition II | Dynamic pages and forms with selenium; APIs with httr2 | stealing MORE data from websites |
| 5 | Data Acquisition III | Intro to OCR and transcription | making images and audio readable |
| 6 | Data Acquisition IV | Optical Character Recognition (tesseract); Speech transcription (OpenAI Whisper); Project discussion | making the computer your transcription servant |
| 7 | Student Project Week | Work on projects in class | time to get your hands dirty |
| 8 | Text as Data I | Bag of words; Sentiment analysis, TF-IDF, NER/POS | basic text analysis |
| 9 | Text as Data II | Supervised machine learning in theory and practice | advanced text analysis with training data |
| 10 | Text as Data III | Unsupervised ML (topic modeling); Remote counseling pre-Christmas | finding patterns without labels |
| 11 | Text as Data IV | Measuring similarity and distributional hypothesis; Word embeddings | cutting-edge text analysis with vector spaces |
| 12 | Text as Data V | Supervised learning on steroids (BERT); Active learning with BERT | holy shit…transformer models |
| 13 | Text as Data VI | LLMs for information extraction; Local LLMs primer | unleashing the power of large language models |
| 14 | Presentation Preparation Week | Work on presentations (deadline Jan 30, 6PM) | polish your masterpiece |
| 15 | Presentations & Wrap Up | Peer-reviewed presentations; Course wrap-up | show off your work |

Syllabus

Please click here to download the latest version of the syllabus.

Alternatively, read it here (probably not the best if you’re visiting this page on a mobile device):