TidyTuesday: Income Inequality & Health

SDG 10: Reduced Inequalities

Author

Nichodemus Amollo

Published

June 8, 2021

Overview

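This project explores the Income Inequality & Health dataset from TidyTuesday, focusing on data visualization and analysis techniques. The outputs shown on this page were generated from simulated placeholder data (a seeded random walk with three arbitrary categories), not from the TidyTuesday files themselves.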

SDG Alignment: SDG 10: Reduced Inequalities

Load Required Packages

Data Import

Rows: 100
Columns: 4
$ id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ value    <dbl> -0.5604756, -0.7906531, 0.7680552, 0.8385636, 0.9678513, 2.68…
$ category <chr> "B", "B", "A", "B", "C", "C", "A", "B", "A", "B", "C", "C", "…
$ date     <date> 2021-06-08, 2021-06-09, 2021-06-10, 2021-06-11, 2021-06-12, …

       id             value          category              date           
 Min.   :  1.00   Min.   :-2.667   Length:100         Min.   :2021-06-08  
 1st Qu.: 25.75   1st Qu.: 1.465   Class :character   1st Qu.:2021-07-02  
 Median : 50.50   Median : 2.192   Mode  :character   Median :2021-07-27  
 Mean   : 50.50   Mean   : 2.473                      Mean   :2021-07-27  
 3rd Qu.: 75.25   3rd Qu.: 3.383                      3rd Qu.:2021-08-21  
 Max.   :100.00   Max.   :10.303                      Max.   :2021-09-15
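
The figures above describe simulated data rather than a downloaded file. A minimal base-R sketch of that simulation, following the page source (seed 123, a cumulative sum of standard-normal draws over 100 consecutive days), shows why `value` behaves like a random walk. The names `sim`, `steps`, and `recovered` are assumptions introduced for illustration, and the `category` column is left out for brevity.

```r
# Simulated placeholder data: a seeded random walk over 100 consecutive days
set.seed(123)
steps <- rnorm(100)            # independent N(0, 1) increments

sim <- data.frame(
  id    = 1:100,
  value = cumsum(steps),       # running total, so each value builds on the previous one
  date  = seq.Date(from = as.Date("2021-06-08"), by = "day", length.out = 100)
)

# Differencing the walk recovers the original increments
recovered <- c(sim$value[1], diff(sim$value))
all.equal(recovered, steps)

# First and last simulated dates
range(sim$date)
```

Because consecutive values are strongly dependent, summaries such as the mean or standard deviation of `value` describe one particular simulated path and carry no information about income inequality or health.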

Data Exploration

  id      value category       date
1  1 -0.5604756        B 2021-06-08
2  2 -0.7906531        B 2021-06-09
3  3  0.7680552        A 2021-06-10
4  4  0.8385636        B 2021-06-11
5  5  0.9678513        C 2021-06-12
6  6  2.6829163        C 2021-06-13

'data.frame':   100 obs. of  4 variables:
 $ id      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ value   : num  -0.56 -0.791 0.768 0.839 0.968 ...
 $ category: chr  "B" "B" "A" "B" ...
 $ date    : Date, format: "2021-06-08" "2021-06-09" ...

      id    value category     date 
       0        0        0        0
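
The zero counts above are per-column totals of missing entries. A minimal sketch of that check on a small hypothetical data frame (the name `toy` and its values are invented for illustration) shows how `is.na()` and `colSums()` combine, and how incomplete rows can be dropped afterwards:

```r
# Hypothetical data frame with a few missing entries
toy <- data.frame(
  id       = 1:4,
  value    = c(0.5, NA, 1.2, NA),
  category = c("A", "B", NA, "C")
)

# is.na() gives a logical matrix; colSums() counts the TRUEs in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows where `value` is present, as the wrangling step does
toy_complete <- toy[!is.na(toy$value), ]
nrow(toy_complete)
```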

       id             value          category              date           
 Min.   :  1.00   Min.   :-2.667   Length:100         Min.   :2021-06-08  
 1st Qu.: 25.75   1st Qu.: 1.465   Class :character   1st Qu.:2021-07-02  
 Median : 50.50   Median : 2.192   Mode  :character   Median :2021-07-27  
 Mean   : 50.50   Mean   : 2.473                      Mean   :2021-07-27  
 3rd Qu.: 75.25   3rd Qu.: 3.383                      3rd Qu.:2021-08-21  
 Max.   :100.00   Max.   :10.303                      Max.   :2021-09-15

Data Wrangling
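
The wrangling step in the page source converts `category` to a factor and bins `value` into five labelled groups with `cut(value, breaks = 5, ...)`. A minimal sketch on a made-up vector `x` (an assumption for illustration) shows that a single number for `breaks` produces equal-width intervals over the observed range, not groups of equal size:

```r
# Made-up values spanning 0 to 10
x <- c(0, 1, 3, 5, 7, 9, 10)

# breaks = 5 splits the observed range into five equal-width intervals
groups <- cut(
  x,
  breaks = 5,
  labels = c("Low", "Medium-Low", "Medium", "Medium-High", "High")
)

# Pair each value with its assigned group
data.frame(x = x, group = groups)

# Group sizes differ because the bins are equal in width, not in count
table(groups)
```

If groups of roughly equal size are wanted instead, passing quantile cut points to `breaks` is a common alternative.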

Visualizations

Visualization 1: Main Analysis

Visualization 2: Distribution Analysis

Interactive Visualization

Analysis & Insights

Key Findings

Finding 1: [Description of key insight]
Finding 2: [Description of key insight]
Finding 3: [Description of key insight]

Statistical Summary

# A tibble: 3 × 6
  category  mean median    sd   min   max
  <fct>    <dbl>  <dbl> <dbl> <dbl> <dbl>
1 A         2.54   2.07  2.62 -2.52 10.3 
2 B         2.20   1.90  2.20 -1.68 10.1 
3 C         2.74   2.66  2.19 -2.67  8.77
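
The table above is a per-category summary of `value`. The page source computes it with `dplyr`; a roughly equivalent base-R sketch is given here on a small hypothetical data frame (the names `toy`, `summarise_one`, and `by_category` are assumptions for illustration):

```r
# Hypothetical data: two categories with known values
toy <- data.frame(
  category = factor(c("A", "A", "B", "B", "B")),
  value    = c(1, 3, 2, 4, 6)
)

# Summary statistics for one group of values
summarise_one <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), min = min(v), max = max(v))
}

# Split values by category, summarise each group, and put one category per row
by_category <- t(sapply(split(toy$value, toy$category), summarise_one))
print(round(by_category, 2))
```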

Policy Implications

[Provide policy-relevant insights and recommendations based on the analysis]

Next Steps

Additional statistical modeling
Geographic analysis if spatial data available
Time series forecasting
Comparative analysis across regions

References

Session Info

utils::sessionInfo()

R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Africa/Nairobi
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] plotly_4.12.0   patchwork_1.3.2 ggtext_0.1.2    showtext_0.9-7 
 [5] showtextdb_3.0  sysfonts_0.8.9  here_1.0.2      lubridate_1.9.5
 [9] forcats_1.0.1   stringr_1.6.0   dplyr_1.2.0     purrr_1.2.1    
[13] readr_2.1.6     tidyr_1.3.2     tibble_3.3.1    ggplot2_4.0.2  
[17] tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] utf8_1.2.6          generics_0.1.4      renv_1.0.7         
 [4] xml2_1.5.2          stringi_1.8.7       hms_1.1.4          
 [7] digest_0.6.39       magrittr_2.0.4      evaluate_1.0.5     
[10] grid_4.5.1          timechange_0.4.0    RColorBrewer_1.1-3 
[13] fastmap_1.2.0       rprojroot_2.1.1     jsonlite_2.0.0     
[16] httr_1.4.8          crosstalk_1.2.2     viridisLite_0.4.3  
[19] scales_1.4.0        lazyeval_0.2.2      cli_3.6.5          
[22] rlang_1.1.7         withr_3.0.2         yaml_2.3.12        
[25] otel_0.2.0          tools_4.5.1         tzdb_0.5.0         
[28] vctrs_0.7.1         R6_2.6.1            lifecycle_1.0.5    
[31] htmlwidgets_1.6.4   pkgconfig_2.0.3     pillar_1.11.1      
[34] gtable_0.3.6        data.table_1.18.2.1 glue_1.8.0         
[37] Rcpp_1.1.1          xfun_0.56           tidyselect_1.2.1   
[40] knitr_1.51          farver_2.1.2        htmltools_0.5.9    
[43] labeling_0.4.3      rmarkdown_2.30      compiler_4.5.1     
[46] S7_0.2.1            gridtext_0.1.5

---
title: "TidyTuesday: Income Inequality & Health"
subtitle: "SDG 10: Reduced Inequalities"
author: "Nichodemus Amollo"
date: "2021-06-08"
format:
  html:
    toc: true
    toc-depth: 2
    code-fold: show
    code-tools: true
    code-copy: true
    theme:
      light: [cosmo, ../../custom.scss]
      dark: [darkly, ../../custom.scss]
    css: ../../styles.scss
---

::: {.hero-banner}
# **Income Inequality & Health**

Analysis of income inequality & health data from TidyTuesday 2021 - Week of 2021-06-08
:::

## Overview

This project explores the **Income Inequality & Health** dataset from TidyTuesday, focusing on data visualization and analysis techniques.

**SDG Alignment:** SDG 10: Reduced Inequalities

## Load Required Packages

```{r load-packages, echo=FALSE, message=FALSE, warning=FALSE}
library(tidyverse)
library(lubridate)
library(here)
library(showtext)
library(ggtext)
library(patchwork)
library(plotly)  # For interactive visualizations

# Optional: Load additional packages based on analysis needs
# library(sf)             # For spatial data
# library(rnaturalearth)  # For map data
# library(gganimate)      # For animations
```

## Data Import

```{r load-data, echo=FALSE}
# Load data using tidytuesdayR
# library(tidytuesdayR)
# tuesdata <- tt_load('2021-06-08')
# df <- tuesdata$data_name

# Alternative: Direct CSV download
# df <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-08/data.csv')

# Sample data for demonstration
set.seed(123)
df <- data.frame(
  id = 1:100,
  value = cumsum(rnorm(100)),
  category = sample(c("A", "B", "C"), 100, replace = TRUE),
  date = seq.Date(from = as.Date("2021-06-08"), by = "day", length.out = 100)
)

glimpse(df)
summary(df)
```

## Data Exploration

```{r data-exploration, echo=FALSE}
# Explore data structure
head(df)
str(df)

# Check for missing values
colSums(is.na(df))

# Summary statistics
summary(df)
```

## Data Wrangling

```{r data-wrangling, echo=FALSE}
# Clean and prepare data
df_clean <- df %>%
  filter(!is.na(value)) %>%
  mutate(
    category = as.factor(category),
    value_group = cut(value, breaks = 5, labels = c("Low", "Medium-Low", "Medium", "Medium-High", "High"))
  )

# Group by category if applicable
df_summary <- df_clean %>%
  group_by(category) %>%
  summarise(
    mean_value = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    count = n()
  )
```

## Visualizations

### Visualization 1: Main Analysis

```{r visualization-1, echo=FALSE}
# Line widths use `linewidth`; `size` for lines is deprecated since ggplot2 3.4.0
p1 <- ggplot(df_clean, aes(x = date, y = value, color = category)) +
  geom_line(linewidth = 1.2, alpha = 0.7) +
  geom_point(size = 2, alpha = 0.8) +
  scale_color_manual(values = c("#667eea", "#764ba2", "#f093fb")) +
  labs(
    title = "Income Inequality & Health",
    subtitle = "Time series analysis",
    x = "Date",
    y = "Value",
    color = "Category",
    caption = "Source: TidyTuesday | Visualization: Nichodemus Amollo"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 18, hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray50"),
    legend.position = "bottom"
  )

print(p1)
```

### Visualization 2: Distribution Analysis

```{r visualization-2, echo=FALSE}
p2 <- ggplot(df_clean, aes(x = category, y = value, fill = category)) +
  geom_violin(alpha = 0.7) +
  geom_boxplot(width = 0.2, alpha = 0.5) +
  scale_fill_manual(values = c("#667eea", "#764ba2", "#f093fb")) +
  labs(
    title = "Distribution by Category",
    x = "Category",
    y = "Value"
  ) +
  theme_minimal()

print(p2)
```

### Interactive Visualization

```{r visualization-interactive, echo=FALSE}
# Create interactive plotly visualization
p_interactive <- plot_ly(
  df_clean,
  x = ~date,
  y = ~value,
  color = ~category,
  type = "scatter",
  mode = "lines+markers",
  hovertemplate = "Date: %{x} Value: %{y}<extra></extra>"
) %>%
  layout(
    title = "Income Inequality & Health",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Value"),
    hovermode = "x unified"
  )

p_interactive
```

## Analysis & Insights

### Key Findings

1. **Finding 1**: [Description of key insight]
2. **Finding 2**: [Description of key insight]
3. **Finding 3**: [Description of key insight]

### Statistical Summary

```{r statistical-summary, echo=FALSE}
# Statistical analysis
df_clean %>%
  group_by(category) %>%
  summarise(
    mean = mean(value),
    median = median(value),
    sd = sd(value),
    min = min(value),
    max = max(value)
  )
```

## Policy Implications

[Provide policy-relevant insights and recommendations based on the analysis]

## Next Steps

- [ ] Additional statistical modeling
- [ ] Geographic analysis if spatial data available
- [ ] Time series forecasting
- [ ] Comparative analysis across regions

## References

- [TidyTuesday GitHub Repository](https://github.com/rfordatascience/tidytuesday)
- [UN Sustainable Development Goals](https://sdgs.un.org/goals)
- [Data Source](https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-06-08)

## Session Info

```{r session-info}
utils::sessionInfo()
```

---

[⬅️ Back to TidyTuesday Index](index.qmd)