Week 1
Introduction to the Course & Setting Up

SOCI 269

Sakeef M. Karim
Amherst College

AN INTRODUCTION TO QUANTITATIVE SOCIOLOGY—CULTURE AND POWER

Getting Started—
January 28th

First Order of Business

Quick Questions
What’s your name? What are your majors? Why did you sign up for this class?

First Order of Business

Now, it’s my turn.

An Odd Introduction

My Journey

The Broad View

My Journey

Cities Defined by International Migration

My Research

Personal Culture of Immigrant-Origin People

Figure 8 from Karim (2024a)

My Research

Personal Culture of Immigrant-Origin People

Adaptation of results from Karim (2024b)

My Research

Exclusionary Politics

Figure 1 from Soehl and Karim (2021)

My Research

Exclusionary Politics

Karim and Lukk’s The Radicalization of Mainstream Parties in the 21st Century

This Class

The Syllabus

You can access the syllabus here.

Office Hours


Fridays, 2:30-5:00 PM in Morgan Hall (Room 203 A) or during a Zoom Open Slot.

Appointment Policy

All meetings, even during office hours, must be scheduled in advance via Google Calendar.

Readings

Course Readings

All course readings can be accessed via our eReserves page on Moodle.

Course Structure

Module I

Introduction to R and ggplot2.

Module II

Three weeks unpacking quantitative sociological scholarship.

Module III

Basic introduction to Python and seaborn.

Module IV

Final Presentations!

Evaluations

Task: Participation
Description: Students are expected to actively participate in class discussions by raising their hands to share ideas, asking clarifying questions, assisting peers when needed, and providing constructive feedback during final presentations.
Weight: 10%
Deadline or Evaluative Time Horizon: All Semester

Task: Coding Assignment
Description: Students are required to submit a short coding assignment in early March. For this assignment, they will clean a dataset in R, report basic descriptive statistics, and create a simple data visualization. Students must also include their script file (i.e., a .R document) as part of their submission. Additional assignment instructions will be made available online (that is, embedded within this syllabus) by mid-February.
Weight: 10%
Deadline or Evaluative Time Horizon: Monday, March 3rd at 8:00 PM.

Task: Midterm Assignment
Description: For their midterm assignment, students must (either individually or in groups of 2–3) submit a relatively complex data visualization; an annotated script file or Quarto/RMarkdown document featuring their underlying code; and a 5–10-page reflection memo (double-spaced) where they interpret their results and establish connections between their visualization and recent social scientific scholarship. Additional assignment instructions will be made available online (that is, embedded within this syllabus) by early March. Datasets will be provided.
Weight: 30%
Deadline or Evaluative Time Horizon: Friday, March 28th at 8:00 PM.

Task: Final Presentations
Description: In Module IV of the class, students will deliver a 10–15-minute presentation based on, or informed by, their term paper. A rubric detailing my basic expectations will be included in this syllabus by early April.
Weight: 15%
Deadline or Evaluative Time Horizon: During Module IV

Task: Final Project
Description: Drawing on the applied examples featured in Module II, students must submit a term paper on a topic related to (i) race, ethnicity and nation; (ii) gender and sexuality; or (iii) culture. To earn an A, students must also submit a companion data visualization using a truncated version of the General Social Survey, which will be made available on Moodle. Students are free to create this visualization in either R or Python. Additional assignment instructions will be posted online by early April.
Weight: 35%
Deadline or Evaluative Time Horizon: Friday, May 9th at 8:00 PM.

Evaluations


Guidelines for Major Assignments

Guidelines for the three key deliverables—i.e., the midterm assignment; the final presentation; and the term paper—will be gradually rolled out (or uploaded online) as deadlines come into focus.

Norms, Rules, Regulations & More

Accessibility and Accommodations

If you require accommodations, please contact Student Accessibility Services as soon as possible and submit an application through the new AIM Portal. More generally, if you have any suggestions about how this class can be more accessible and inclusive, please let me know via e-mail or during office hours.

Norms, Rules, Regulations & More

Class Policies

Please review the Amherst College Honor Code, which can be accessed in its entirety here.

Violations of the Honor Code will be promptly reported to the Dean of Students. As Section 1.1 of the Honor Code indicates, plagiarism is a serious offense. In most cases, students who plagiarize the work of others will fail this class and may face additional disciplinary penalties. Moreover, as detailed in Sections 1.2 to 1.4 of the Honor Code, students must respect others in the classroom, including those whose views deviate from their own. Failure to do so will prompt disciplinary action.

There is no reason to pretend that generative artificial intelligence (GAI) does not exist in the world out there. These systems have arrived, and they may revolutionize how higher education “works.” With this in mind, you are free to use ChatGPT and its analogues for class assignments, but you must cite the GAI you are using. Failure to do so amounts to plagiarism.

To reiterate:

Generative AI Policy

If you use a GAI tool (like ChatGPT) and do not cite it, it is a form of plagiarism.

You are expected to attend each and every class. If you do not, you will lose points for participation. That said, I am aware that you are all human beings whose lives are often fraught with uncertainty. If something comes up, please let me know and I will do my best to be as accommodating as possible. Extended absences may, however, require additional documentation (e.g., a note from a physician).

Provisionally, I have decided to allow students to use laptops and tablets in class. This is, however, highly conditional. If I observe students using their electronic devices for non-academic pursuits (e.g., shopping, consuming social media and so on), I will institute a sweeping ban on electronics. Do not be the one to contravene our social contract.

On weekdays and non-holidays, I will respond to e-mails within 48 hours. If I fail to meet this standard, please send me a follow-up message with a gentle reminder. On weekends and breaks, I will not respond to e-mails unless you have an emergency. If you do, please include EMERGENCY in the subject line.

Assignments must be submitted on time. A late submission will result in a penalty of 5% for each day beyond the deadline. However, as noted, we are well aware that life can present unexpected challenges. If you anticipate missing a deadline or have an emergency, please inform us as soon as you can. Extensions may be granted on a case-by-case basis.

Four Basic Expectations


  1. Read carefully but efficiently.

  2. Practice coding as often as you can.

  3. Participate in class conversations. We’ll all learn more that way.

  4. Have fun!

“Homework”

Download R

Download RStudio

Download RStudio by clicking here.

Introduction Pt. II–
January 30th

What We’ll Do Today

  1. Briefly review some of the optional readings for Week 1.

  2. Launch R and pull in some data.

Transcending a
“General Linear Reality”

Putting You on the Spot

y = \beta_0 + \beta_1 x + \epsilon

A Quick Question
How would you explain what a linear regression model is?
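One way to build intuition for the equation above is to simulate data from it and check whether a fitted model recovers the parameters. The sketch below is a minimal illustration in base R. The seed, the sample size, and the "true" values of beta_0 and beta_1 are arbitrary assumptions chosen for this example, not quantities taken from the course materials.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon.
# All numeric values below are illustrative assumptions.
set.seed(269)
n      <- 1000
beta_0 <- 2
beta_1 <- 0.5

x       <- rnorm(n, mean = 10, sd = 2)
epsilon <- rnorm(n, mean = 0, sd = 1)
y       <- beta_0 + beta_1 * x + epsilon

# lm() estimates the intercept and slope by ordinary least squares
sim_model <- lm(y ~ x)
estimates <- coef(sim_model)

# With one predictor, the same estimates follow from simple formulas
slope_by_hand     <- cov(x, y) / var(x)
intercept_by_hand <- mean(y) - slope_by_hand * mean(x)

estimates
c(intercept = intercept_by_hand, slope = slope_by_hand)
```

The two sets of numbers should agree with each other and sit close to, though not exactly on, the assumed values, because epsilon adds noise to every observation.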

A Silly Example

library(palmerpenguins)

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

A Silly Example

basic_model <- lm(bill_length_mm ~ flipper_length_mm, data = penguins)

basic_model |> summary()

Call:
lm(formula = bill_length_mm ~ flipper_length_mm, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5792 -2.6715 -0.5721  2.0148 19.1518 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -7.26487    3.20016   -2.27   0.0238 *  
flipper_length_mm  0.25477    0.01589   16.03   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.126 on 340 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.4306,    Adjusted R-squared:  0.4289 
F-statistic: 257.1 on 1 and 340 DF,  p-value: < 2.2e-16
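To read the coefficient table, it can help to plug numbers into the fitted equation. The sketch below copies the two estimates from the output above and computes predicted bill lengths. The flipper lengths of 190 mm and 200 mm are hypothetical values picked for illustration.

```r
# Coefficients copied from the summary() output above
intercept <- -7.26487
slope     <- 0.25477

# Hypothetical flipper lengths (mm), chosen only for illustration
flipper_mm <- c(190, 200)

# Predicted bill length = intercept + slope * flipper length
predicted_bill_mm <- intercept + slope * flipper_mm
predicted_bill_mm

# A 10 mm difference in flipper length implies a 10 * slope
# difference in predicted bill length
diff(predicted_bill_mm)
```

Note that the intercept is the predicted bill length for a penguin with a flipper length of 0 mm, which is why it is negative and not substantively meaningful on its own.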

A Silly Example

library(tidyverse)       # provides %>%, as_tibble() and ggplot2
library(marginaleffects)

# Average predictions over a counterfactual grid of flipper lengths
avg_predictions(basic_model,
                variables = "flipper_length_mm") %>% 
as_tibble() %>% 
# Plot the predicted bill length with its confidence band
ggplot(., aes(x = flipper_length_mm, y = estimate)) +
geom_line(colour = "#b7a5d3", linewidth = 1.1) +
geom_ribbon(mapping = aes(ymin = conf.low,
                          ymax = conf.high),
            alpha = 0.1) +
theme_bw() +
labs(x = "Flipper Length (mm)",
     y = "Predicted Bill Length (mm)") +
theme(panel.grid.minor = element_blank())  

A Silly Example

Adding Another Predictor

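# Refit the basic model with bill depth as an additional predictor
new_model <- update(basic_model, . ~ . + bill_depth_mm)

# Human-readable labels for the coefficient plot
model_labels <-  c("flipper_length_mm" = "Flipper Length (mm)",
                   "bill_depth_mm" = "Bill Depth (mm)")

library(modelsummary)
library(ggplot2)         # for geom_vline()

# Plot the estimates (intercept omitted) with a reference line at zero
modelplot(new_model, 
          coef_omit = "Int",
          coef_map = model_labels) +
geom_vline(xintercept = 0, 
           linetype = "dotted")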

So, What’s the Issue?

To use such a model to actually represent social reality, one must map the processes of social life onto the algebra of linear transformations. This connection makes assumptions about social life—not the statistical assumptions required to estimate the equations, but philosophical assumptions about how the social world works.

(Abbott 1988:170, EMPHASIS ADDED)

So, What’s the Issue?

Such representational use assumes that the social world consists of fixed entities (the units of analysis) that have attributes (the variables). These attributes interact, in causal or actual time, to create outcomes, themselves measurable as attributes of the fixed entities. The variable attributes have only one causal meaning (one pattern of effects) in a given study, although different studies may assign similar attributes different meanings. An attribute’s causal meaning cannot depend on the entity’s location in the attribute space (its context), since the linear transformation is the same throughout that space. For similar reasons, the past path of an entity through the attribute space (its history) has no influence on its future path, nor can the causal importance of an attribute change from one entity to the next. All must obey the same transformation.

(Abbott 1988:170, EMPHASIS ADDED)
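A small simulation can make one part of this critique concrete: the claim that an attribute's effect cannot depend on context. The sketch below is a hedged illustration rather than a rendering of Abbott's argument. It assumes a made-up world with two contexts, "A" and "B", in which the same attribute has opposite effects. An interaction term is, of course, still a linear model, so this only shows what a single shared slope can hide.

```r
# Hypothetical data: the effect of x on y is +1 in context A
# and -1 in context B (assumed values, for illustration only)
set.seed(1988)
n       <- 2000
context <- rep(c("A", "B"), each = n / 2)
x       <- rnorm(n)
true_slope <- ifelse(context == "A", 1, -1)
y       <- true_slope * x + rnorm(n, sd = 0.5)
sim_data <- data.frame(y, x, context)

# One transformation for everyone: a single pooled slope
pooled_model <- lm(y ~ x, data = sim_data)

# Letting the slope vary by context
context_model <- lm(y ~ x * context, data = sim_data)

pooled_slope <- coef(pooled_model)[["x"]]
slope_a      <- coef(context_model)[["x"]]
slope_b      <- slope_a + coef(context_model)[["x:contextB"]]

c(pooled = pooled_slope, context_A = slope_a, context_B = slope_b)
```

In this simulated setting, the pooled slope lands near zero even though the attribute matters a great deal in both contexts.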

Transcending a General Linear Reality

Figure 2 from Syrda (2023)

Transcending a General Linear Reality

Figure 5 from Zhao (2023)

Transcending a General Linear Reality

Using Sequence Analyses (cf. Abbott 1995)

Karim and Drago’s Democratic Strain and Populist Fervor in India, America and Beyond

From Categories to Categorical Typicality

The Dangers of Nominalism


[T]he categories social scientists use in our research belie the inherent fuzziness and vagueness of our most commonly used and important concepts. In many, if not most, cases, no particular feature is a common element in defining a concept. Rather, there are only family resemblances—criss-crossing patterns of similarities between different members.

(Monk 2022:9, EMPHASIS ADDED)

The Dangers of Nominalism

A Random (Yet Relevant) Question
How would you explain what a bird is?

The Dangers of Nominalism

[C]onsider the immense heterogeneity of features possessed by members of the category “bird” … A prototypical bird, for instance, may be of a certain size (relatively small), have feathers and a beak, be able to fly and lay eggs, and so on. These features, in turn, are weighted in terms of importance to the prototype. Clusters of key categorical cues and the relations between these cues are known as prototypes—abstract summary representations of “best examples” of a concept … Flight, for instance, may be weighted more heavily than feathers. Even the relations between features may be weighted. Having feathers and a beak may be more important than a potential bird’s size and its ability to lay eggs. These features, the sets of properties we associate with a term, are called intensions and form the foundation of human thought.

(Monk 2022:9, EMPHASIS ADDED)
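The idea of weighted features can be sketched as a simple weighted score. Everything in the example below is hypothetical: the feature weights, the three animals, and their 0/1 feature profiles are assumptions made up for illustration, not values from Monk (2022). The weights only echo the passage's suggestion that flight may count for more than feathers.

```r
# Hypothetical feature weights that sum to 1 (assumed, not from Monk 2022)
weights <- c(flies = 0.40, feathers = 0.30, beak = 0.20,
             small = 0.05, lays_eggs = 0.05)

# Hypothetical feature profiles: 1 = has the feature, 0 = does not
animals <- rbind(
  robin   = c(flies = 1, feathers = 1, beak = 1, small = 1, lays_eggs = 1),
  penguin = c(flies = 0, feathers = 1, beak = 1, small = 0, lays_eggs = 1),
  bat     = c(flies = 1, feathers = 0, beak = 0, small = 1, lays_eggs = 0)
)

# Typicality as the weighted share of prototype features present
typicality <- drop(animals %*% weights)
sort(typicality, decreasing = TRUE)
```

Under these assumed weights, a penguin is a bird but only a moderately typical one, while a bat is not a bird yet still scores well above zero. A graded score of this kind carries information that a yes/no category label discards.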

The Dangers of Nominalism

In a paper-in-progress, Martin Lukk and I posit that party politics can be understood through this lens, too.

Consequently, we focus on a party’s far right typicality in lieu of its nominal membership in the far right party family.

The Dangers of Nominalism

Some Preliminary Results—Two Data Points


country | year | party | leader | far right typicality
United States of America | 2012 | | |
United States of America | 2016 | | |

Linking Sociological Theory and Method

Do Not Condition on a Collider

library(ggdag)

collider_triangle(x = "Education", y = "Error Term",
                  m = "Income (Truncated)") %>% 
ggdag_dseparated(text = FALSE,
                 use_labels = "label",
                 controlling_for = "m") +
theme_dag()

Adaptation of Figure 5 from Elwert and Winship (2014)
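The logic of the diagram can be checked with a short simulation. The sketch below is a minimal illustration under assumed conditions: education and the error term are drawn independently, income is their simple sum, and the sample is truncated at an arbitrary income threshold of 0. None of these choices come from Elwert and Winship (2014).

```r
# Hypothetical data-generating process (all values assumed)
set.seed(2014)
n          <- 100000
education  <- rnorm(n)
error_term <- rnorm(n)
income     <- education + error_term   # income is a collider

# In the full sample, education and the error term are unrelated
cor_full <- cor(education, error_term)

# Truncating on income conditions on the collider
kept          <- income > 0
cor_truncated <- cor(education[kept], error_term[kept])

c(full_sample = cor_full, truncated_sample = cor_truncated)
```

In the truncated sample, the two variables become negatively correlated: among people observed with high incomes, those with less education must have had larger error terms to clear the threshold.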

Define Your Estimand

Figure 1 from Lundberg, Johnson and Stewart (2021)

Some Light

Your Tasks

For the rest of today’s session, work with data from the wonderful {palmerpenguins} package.

Here’s one way to access the data:

install.packages("palmerpenguins")

library(palmerpenguins)

penguins

I want you to explore other ways to import the same data frame. To this end, download different penguins file types by—

Your Tasks

Here’s how you can access the .csv version of the dataset:

library(tidyverse)

penguins <- read_csv("penguins.csv")

Working Directories

Make sure the files are available in a working directory that’s easy to find. Next week, we’ll discuss how to streamline this process using Projects in RStudio.
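If an import fails, the file is often simply not where R is looking. The sketch below shows one way to check, using base R only. The toy data frame and the file name toy_penguins.csv are hypothetical stand-ins for a downloaded file, and read.csv() is used here as the base R counterpart to read_csv().

```r
# Where is R currently looking for files?
getwd()

# Hypothetical toy data standing in for a downloaded file
toy_data <- data.frame(species     = c("Adelie", "Gentoo", "Chinstrap"),
                       body_mass_g = c(3750, 5000, 3800))

# Write the toy data to a temporary folder, then build the full path
csv_path <- file.path(tempdir(), "toy_penguins.csv")
write.csv(toy_data, csv_path, row.names = FALSE)

# Confirm that the file exists before trying to import it
file.exists(csv_path)

# Import the file by pointing to its full path
toy_imported <- read.csv(csv_path)
toy_imported
```

Passing a full path works from any working directory. A bare file name, as in read_csv("penguins.csv"), only works when the file sits in the folder that getwd() reports.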

Your Tasks

In small groups, try to (1) filter() observations/rows in the penguins data frame; and (2) select() variables/columns of substantive interest.

Note

If this feels a bit too complex, fear not. We will be going through all the relevant steps in detail next week.

Manipulate the penguins data frame and report basic descriptive statistics (e.g., group means, proportions and so on). Then, generate interesting data visualizations using ggplot2 to highlight cool patterns in the data. You may also fit and visualize models.
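As a starting point, the sketch below shows one possible way to combine these steps. It assumes the {dplyr} and {palmerpenguins} packages are installed. The choice of species, columns, and summary statistic is arbitrary and meant only as an example of the pattern.

```r
library(dplyr)
library(palmerpenguins)

# (1) filter() keeps rows; (2) select() keeps columns
gentoo_bills <- penguins |>
  filter(species == "Gentoo") |>
  select(species, bill_length_mm, bill_depth_mm)

gentoo_bills

# Group means: average body mass by species, ignoring missing values
species_means <- penguins |>
  group_by(species) |>
  summarise(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE),
            n = n())

species_means
```

The na.rm = TRUE argument matters here because a few penguins have missing measurements. Without it, the mean for any group that contains a missing value is NA.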

Some Optional “Homework”

Get Git

Sign Up for GitHub

Sign up for GitHub by clicking here.

Use usethis

Enjoy the Weekend

References

Abbott, Andrew. 1988. “Transcending General Linear Reality.” Sociological Theory 6(2):169–86. doi: 10.2307/202114.
Abbott, Andrew. 1995. “Sequence Analysis: New Methods for Old Ideas.” Annual Review of Sociology 21:93–113.
Elwert, Felix, and Christopher Winship. 2014. “Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable.” Annual Review of Sociology 40:31–53. doi: 10.1146/annurev-soc-071913-043455.
Healy, Kieran Joseph. 2019. Data Visualization: A Practical Introduction. Princeton, NJ: Princeton University Press.
Karim, Sakeef M. 2024a. “Islam and the Transmission of Cultural Identity in Four European Countries.” Social Forces 103(2):756–79. doi: 10.1093/sf/soae076.
Karim, Sakeef M. 2024b. “The Organization of Ethnocultural Attachments Among Second-Generation Germans.” Social Science Research 118:102959. doi: 10.1016/j.ssresearch.2023.102959.
Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. 2021. “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review 86(3):532–65. doi: 10.1177/00031224211004187.
McKinney, Wes. 2022. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and Jupyter. 3rd Edition. Sebastopol, CA: O’Reilly.
Monk, Ellis P. 2022. “Inequality Without Groups: Contemporary Theories of Categories, Intersectional Typicality, and the Disaggregation of Difference.” Sociological Theory 40(1):3–27. doi: 10.1177/07352751221076863.
Soehl, Thomas, and Sakeef M. Karim. 2021. “How Legacies of Geopolitical Trauma Shape Popular Nationalism Today.” American Sociological Review 86(3):406–29. doi: 10.1177/00031224211011981.
Syrda, Joanna. 2023. “Gendered Housework: Spousal Relative Income, Parenthood and Traditional Gender Identity Norms.” Work, Employment and Society 37(3):794–813. doi: 10.1177/09500170211069780.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd edition. Sebastopol, CA: O’Reilly.
Wickham, Hadley, Danielle Navarro, and Thomas Lin Pedersen. 2025. ggplot2: Elegant Graphics for Data Analysis. 3rd Edition. New York: Springer.
Zhao, Linda. 2023. “From Superdiversity to Consolidation: Implications of Structural Intersectionality for Interethnic Friendships.” American Journal of Sociology 128(4):1114–57. doi: 10.1086/723435.