Overview – week by week
Week 1: Kick Off
This week answers the following questions:
- What does the term CSS mean?
- Which principles underlie a new CSS?
- Which methods does CSS encompass?
- How do we set up our computational workstation?
Week 2: Brief Intro to Python & Regexes
This week answers the following questions:
- How can we run Python in RStudio using reticulate?
- What are the basics of Python programming?
- What are regular expressions and why are they powerful?
- How can we use stringr to work with text patterns?
Week 3: Data Acquisition I
This week answers the following questions:
- How is the web written and structured?
- What legal and ethical considerations apply to web scraping?
- How do we acquire digital trace data via web scraping with rvest?
- What are CSS (Cascading Style Sheets) selectors and how do they work?
Week 4: Data Acquisition II
This week answers the following questions:
- How do we scrape dynamic web pages that use JavaScript?
- How can we interact with forms and buttons using selenium?
- What are APIs and how do we use them?
- How do we make API calls with httr2?
Week 5: Data Acquisition III
This week answers the following questions:
- What is Optical Character Recognition (OCR)?
- How can we extract text from images and PDFs?
- What is speech-to-text transcription?
- How can we leverage these techniques for social science research?
Week 6: Data Acquisition IV
This week answers the following questions:
- How do we digitize text using Tesseract?
- How do we digitize speech using OpenAI Whisper?
- What are best practices for data acquisition projects?
- How do we plan our research projects?
Week 7: Student Project Week
This week is dedicated to working on your research projects in class.
- Discuss your project ideas with peers and the instructor
- Begin data acquisition for your research
- Troubleshoot technical challenges
- Form study groups with peers who share your research interests
Week 8: Text as Data I
This week answers the following questions:
- What do we mean by “text as data”?
- What do we mean by “bag of words”?
- How do we perform sentiment analysis?
- What are TF-IDF, Named Entity Recognition (NER), and Part-of-Speech (POS) tagging?
Week 9: Text as Data II
This week answers the following questions:
- What is supervised machine learning?
- How can we train classifiers to categorize text?
- What are best practices for supervised ML in social science?
- How do we evaluate model performance?
Week 10: Text as Data III
This week answers the following questions:
- What is unsupervised machine learning?
- How does topic modeling work?
- What can probabilistic topic models tell us about text corpora?
- When should we use unsupervised vs. supervised approaches?
Week 11: Text as Data IV
This week answers the following questions:
- How can we go beyond the “bag of words”?
- What is the distributional hypothesis?
- How can we measure semantic similarity between texts?
- What are word embeddings and what do they capture?
Week 12: Text as Data V
This week answers the following questions:
- What are transformer models like BERT?
- How do these models revolutionize text classification?
- What is transfer learning and why is it so powerful?
- What is active learning and how can it reduce annotation burden?
Week 13: Text as Data VI
This week answers the following questions:
- What’s the latest in Natural Language Processing with Large Language Models (LLMs)?
- How can we use LLMs for information extraction?
- What are local LLMs and when should we use them?
- How do we move from codebooks to promptbooks?
Week 14: Presentation Preparation Week
This week is dedicated to preparing your final presentations.
- Finalize your analyses and preliminary results
- Prepare presentation slides (deadline: January 30, 6 PM)
- Review the peer review guidelines
- Practice your 10-minute presentation
Week 15: Presentations & Wrap Up
This week features peer-reviewed presentations of your research projects.
- Present your research (10 minutes)
- Receive feedback from assigned peer reviewers (5 minutes)
- Discuss next steps for your final paper
- Reflect on what we’ve learned throughout the semester