Week 1
Introduction to the Course & Setting Up
SOCI 269
Now, it’s my turn.
Figure 8
from Karim (Karim 2024a)
Adaptation of results from Karim (2024b)
Figure 1
from Soehl and Karim (2021)
Karim and Lukk’s The Radicalization of Mainstream Parties in the 21st Century
You can access the syllabus here.
Fridays, 2:30-5:00 PM in Morgan Hall (Room 203 A) or during a Zoom Open Slot.
Appointment Policy
All meetings, even during office hours, must be scheduled in advance via Google Calendar.
Data Visualization: A Practical Introduction
(Healy 2019)
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and Jupyter
(McKinney 2022)
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
(Wickham, Çetinkaya-Rundel, and Grolemund 2023)
ggplot2: Elegant Graphics for Data Analysis
(Wickham, Navarro, and Pedersen 2025)
Course Readings
All course readings can be accessed via our eReserves page on Moodle.
Module I
Introduction to R and ggplot2.
Module II
Three weeks unpacking quantitative sociological scholarship.
Module III
Basic introduction to Python and seaborn.
Module IV
Final Presentations!
Task | Description | Weight | Deadline or Evaluative Time Horizon |
---|---|---|---|
Participation | Students are expected to actively participate in class discussions by raising their hands to share ideas, asking clarifying questions, assisting peers when needed, and providing constructive feedback during final presentations. | 10% | All Semester |
Coding Assignment | Students are required to submit a short coding assignment in early March. For this assignment, they will clean a dataset in | 10% | Monday, March 3rd at 8:00 PM. |
Midterm Assignment | For their midterm assignment, students must, either individually or in groups of 2-3, submit a relatively complex data visualization; an annotated script file or Quarto/RMarkdown document featuring their underlying code; and a 5–10-page reflection memo (double-spaced) where they interpret their results and establish connections between their visualization and recent social scientific scholarship. Additional assignment instructions will be made available online (that is, embedded within this syllabus) by early March. Datasets will be provided. | 30% | Friday, March 28th at 8:00 PM. |
Final Presentations | In Module IV of the class, students will deliver a 10–15-minute presentation based on, or informed by, their term paper. A rubric detailing my basic expectations will be included in this syllabus by early April. | 15% | During Module IV |
Final Project | Drawing on the applied examples featured in Module II, students must submit a term paper on a topic related to (i) race, ethnicity and nation; (ii) gender and sexuality; or (iii) culture. To earn an A, students must also submit a companion data visualization using a truncated version of the General Social Survey, which will be made available on Moodle. Students are free to create this visualization in either | 35% | Friday, May 9th at 8:00 PM. |
Guidelines for Major Assignments
Guidelines for the three key deliverables—i.e., the midterm assignment; the final presentation; and the term paper—will be gradually rolled out (or uploaded online) as deadlines come into focus.
If you require accommodations, please contact Student Accessibility Services as soon as possible and submit an application through the new AIM Portal. More generally, if you have any suggestions about how this class can be more accessible and inclusive, please let me know via e-mail or during office hours.
Please review the Amherst College Honor Code, which can be accessed in its entirety here.
Violations of the Honor Code will be promptly reported to the Dean of Students. As Section 1.1 of the Honor Code indicates, plagiarism is a serious offense. In most cases, students who plagiarize the work of others will fail this class and may face additional disciplinary penalties. Moreover, as detailed in Sections 1.2 to 1.4 of the Honor Code, students must respect others in the classroom, including those whose views deviate from their own. Failure to do so will prompt disciplinary action.
There is no reason to pretend like generative artificial intelligence (GAI) does not exist in the world out there. These systems have arrived, and they may revolutionize how higher education “works.” With this in mind, you are free to use ChatGPT and its analogues for class assignments—but you have to cite the GAI you are using.
Failure to do so amounts to plagiarism.
To reiterate:
Generative AI Policy
If you use a GAI tool (like ChatGPT) and do not cite it, it is a form of plagiarism.
You are expected to attend each and every class. If you do not, you will lose points for participation. That said, I am aware that you are all human beings whose lives are often fraught with uncertainty. If something comes up, please let me know and I will do my best to be as accommodating as possible. Extended absences may, however, require additional documentation (e.g., note from a physician).
Provisionally, I have decided to allow students to use laptops and tablets in class. This is, however, highly conditional. If I observe students using their electronic devices for non-academic pursuits (e.g., shopping, consuming social media and so on), I will institute a sweeping ban on electronics. Do not be the one to contravene our social contract.
On weekdays and non-holidays, I will respond to e-mails within 48 hours. If I fail to meet this standard, please send me a follow-up message with a gentle reminder. On weekends and breaks, I will not respond to e-mails unless you have an emergency. If you do, please include EMERGENCY in the subject line.
Assignments must be submitted on time. A late submission will result in a penalty of 5% for each day beyond the deadline. However, as noted, we are well aware that life can present unexpected challenges. If you anticipate missing a deadline or have an emergency, please inform us as soon as you can. Extensions may be granted on a case-by-case basis.
Read carefully but efficiently.
Practice coding as often as you can.
Participate in class conversations. We’ll all learn more that way.
Have fun!
Download RStudio by clicking here.
Briefly review some of the optional readings for Week 1.
Launch and pull in some data.
y = \beta_0 + \beta_1 x + \epsilon
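The equation above can be estimated in R with lm(). What follows is a minimal sketch rather than the exact code behind these slides: it assumes the {palmerpenguins} and {tibble} packages are installed, and it takes the model formula and the object name basic_model from the output and plotting code shown in this section.

```r
library(tibble)           # tibble-style printing of the data frame
library(palmerpenguins)   # provides the penguins data frame

# Print the first rows of the data
penguins

# Fit y = beta_0 + beta_1 * x + epsilon, with bill length as the outcome (y)
# and flipper length as the predictor (x)
basic_model <- lm(bill_length_mm ~ flipper_length_mm, data = penguins)

# Coefficients, standard errors and fit statistics
summary(basic_model)
```

Here, the estimate for flipper_length_mm corresponds to \beta_1 and the intercept to \beta_0.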
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
Call:
lm(formula = bill_length_mm ~ flipper_length_mm, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max
-8.5792 -2.6715 -0.5721  2.0148 19.1518

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -7.26487    3.20016   -2.27   0.0238 *
flipper_length_mm  0.25477    0.01589   16.03   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.126 on 340 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.4306, Adjusted R-squared:  0.4289
F-statistic: 257.1 on 1 and 340 DF,  p-value: < 2.2e-16
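library(tidyverse)        # provides %>%, as_tibble() and ggplot2
library(marginaleffects)

# basic_model is the lm() fit whose summary is printed above.
# Average predicted bill length across values of flipper length:
avg_predictions(basic_model,
                variables = "flipper_length_mm") %>%
  as_tibble() %>%
  ggplot(., aes(x = flipper_length_mm, y = estimate)) +
  geom_line(colour = "#b7a5d3", linewidth = 1.1) +
  # Shaded band: confidence interval around the predictions
  geom_ribbon(mapping = aes(ymin = conf.low,
                            ymax = conf.high),
              alpha = 0.1) +
  theme_bw() +
  labs(x = "Flipper Length (mm)",
       y = "Predicted Bill Length (mm)") +
  theme(panel.grid.minor = element_blank())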
To use such a model to actually represent social reality, one must map the processes of social life onto the algebra of linear transformations. This connection makes assumptions about social life—not the statistical assumptions required to estimate the equations, but philosophical assumptions about how the social world works.
(Abbott 1988:170, EMPHASIS ADDED)
Such representational use assumes that the social world consists of fixed entities (the units of analysis) that have attributes (the variables). These attributes interact, in causal or actual time, to create outcomes, themselves measurable as attributes of the fixed entities. The variable attributes have only one causal meaning (one pattern of effects) in a given study, although different studies may assign similar attributes different meanings. An attribute’s causal meaning cannot depend on the entity’s location in the attribute space (its context), since the linear transformation is the same throughout that space. For similar reasons, the past path of an entity through the attribute space (its history) has no influence on its future path, nor can the causal importance of an attribute change from one entity to the next. All must obey the same transformation.
(Abbott 1988:170, EMPHASIS ADDED)
Figure 2
from Syrda (2023)
Figure 5
from Zhao (2023)
[T]he categories social scientists use in our research belie the inherent fuzziness and vagueness of our most commonly used and important concepts. In many, if not most, cases, no particular feature is a common element in defining a concept. Rather, there are only family resemblances—criss-crossing patterns of similarities between different members.
(Monk 2022:9, EMPHASIS ADDED)
[C]onsider the immense heterogeneity of features possessed by members of the category “bird” … A prototypical bird, for instance, may be of a certain size (relatively small), have feathers and a beak, be able to fly and lay eggs, and so on. These features, in turn, are weighted in terms of importance to the prototype. Clusters of key categorical cues and the relations between these cues are known as prototypes—abstract summary representations of “best examples” of a concept … Flight, for instance, may be weighted more heavily than feathers. Even the relations between features may be weighted. Having feathers and a beak may be more important than a potential bird’s size and its ability to lay eggs. These features, the sets of properties we associate with a term, are called intensions and form the foundation of human thought.
(Monk 2022:9, EMPHASIS ADDED)
In a paper-in-progress, Martin Lukk and I posit that party politics can be understood through this lens, too.
Consequently, we focus on a party’s far right typicality in lieu of its nominal membership in the far right party family.
country | year | party | leader | far right typicality |
---|---|---|---|---|
United States of America | 2012 | | | |
United States of America | 2016 | | | |
Adaptation of Figure 5
from Elwert and Winship (2014)
Figure 1
from Lundberg, Johnson and Stewart (2021)
For the rest of today’s session, work with data from the wonderful {palmerpenguins} package.
Here’s one way to access the data:
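A minimal sketch, assuming the {palmerpenguins} package is already installed (the commented-out line shows how to install it if it is not):

```r
# install.packages("palmerpenguins")  # run once if the package is missing
library(palmerpenguins)

# Load the packaged data frame into the current session
data("penguins", package = "palmerpenguins")

# Quick check of its dimensions (rows, columns)
dim(penguins)
```

The call to dim() should report 344 rows and 8 columns, matching the tibble printed earlier.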
I want you to explore other ways to import the same data frame. To this end, download different penguins file types by accessing this GitHub repository or by simply clicking this link.
Here’s how you can access the .csv version of the dataset:
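A hedged sketch of the .csv route, assuming the {readr} package is installed. The file name penguins.csv is an assumption, not something the slides specify. So that the example runs anywhere, it first writes the packaged data to a temporary .csv as a stand-in for the downloaded file; in practice, point csv_path at wherever you saved your download.

```r
library(readr)
library(palmerpenguins)

# Stand-in for the downloaded file (hypothetical name "penguins.csv"):
# write the packaged data to a temporary folder so the sketch is runnable
csv_path <- file.path(tempdir(), "penguins.csv")
write_csv(penguins, csv_path)

# Read the .csv into a tibble; show_col_types = FALSE silences the
# column-type message
penguins_csv <- read_csv(csv_path, show_col_types = FALSE)

# Inspect the column names and types
glimpse_cols <- sapply(penguins_csv, class)
glimpse_cols
```

Note that read_csv() imports text columns such as species as character vectors rather than factors, so the result is not identical to the packaged penguins object.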
Working Directories
Make sure the files are available in a working directory that’s easy to find. Next week, we’ll discuss how to streamline this process using Projects in RStudio.
In small groups, try to (1) filter() observations/rows in the penguins data frame; and (2) select() variables/columns of substantive interest.
Note
If this feels a bit too complex, fear not. We will be going through all the relevant steps in detail next week.
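One possible starting point for the filter() and select() exercise is sketched below. It assumes {dplyr} and {palmerpenguins} are installed; the particular rows and columns chosen here are purely illustrative.

```r
library(dplyr)
library(palmerpenguins)

gentoo_subset <- penguins %>%
  # (1) Keep only rows for Gentoo penguins with a recorded body mass
  filter(species == "Gentoo", !is.na(body_mass_g)) %>%
  # (2) Keep only a few columns of substantive interest
  select(species, island, body_mass_g, sex)

gentoo_subset
```

Swapping in other conditions (for example, a different species or a threshold on flipper_length_mm) is a good way to get a feel for how the two verbs behave.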
Manipulate the penguins data frame and report basic descriptive statistics (e.g., group means, proportions and so on). Then, generate interesting data visualizations using ggplot2 to highlight cool patterns in the data. You may also fit and visualize models.
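As an illustration of what this could look like, the sketch below computes group means and proportions by species and then draws a simple scatterplot. It assumes {dplyr}, {ggplot2} and {palmerpenguins} are installed; the object names species_summary and mass_plot are hypothetical, and the variables chosen are only one of many reasonable options.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Group means and proportions by species
species_summary <- penguins %>%
  group_by(species) %>%
  summarise(n = n(),
            mean_body_mass_g = mean(body_mass_g, na.rm = TRUE)) %>%
  mutate(proportion = n / sum(n))

species_summary

# Scatterplot of body mass against flipper length, coloured by species
mass_plot <- ggplot(penguins,
                    aes(x = flipper_length_mm,
                        y = body_mass_g,
                        colour = species)) +
  geom_point(alpha = 0.7, na.rm = TRUE) +
  theme_bw() +
  labs(x = "Flipper Length (mm)",
       y = "Body Mass (g)",
       colour = "Species")

mass_plot
```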
Sign up for GitHub by clicking here.
usethis