Resources & Learning Materials

Curated collection of data science tools, courses, and resources

Data Analytics & Research Resources

A comprehensive collection of free and paid resources for aspiring and practicing data analysts, researchers, and health economists.


📄 Resume

Latest resume for download and quick review.

View PDF ↗ | Download PDF ↓ | Open CV Page →

📊 Data Analytics Tools

R Programming

Essential Packages:

  • tidyverse - Data manipulation and visualization ecosystem
  • ggplot2 - Advanced data visualization
  • dplyr - Data manipulation
  • tidyr - Data tidying
  • readr - Data import
  • shiny - Interactive web applications
  • rmarkdown - Dynamic documents
  • caret - Machine learning workflows
  • survival - Survival analysis
  • data.table - High-performance data manipulation

Development Tools:

Python for Data Science

Core Libraries:

IDEs & Platforms:

Statistical Software

  • Stata - Comprehensive statistical package (Paid)
  • SPSS - User-friendly statistics (Paid)
  • SAS - Enterprise analytics (Paid)
  • JASP - Open-source alternative to SPSS (FREE)
  • jamovi - Free statistical software (FREE)

Survey Tools

Mobile Data Collection:

Online Surveys:

Visualization Tools

Business Intelligence:

Specialized Visualization:

  • D3.js - JavaScript visualization library (FREE)
  • Plotly - Interactive graphs (Freemium)
  • Flourish - Data storytelling (Freemium)
  • RAWGraphs - Vector-based visualizations (FREE)

Databases

Relational Databases:

NoSQL Databases:

  • MongoDB - Document database (Freemium)
  • Redis - In-memory database (FREE)
  • Firebase - Google’s backend platform (Freemium)

🎓 Learning Resources

Free Online Courses

Data Analytics Fundamentals:

  1. Google Data Analytics Certificate - Coursera (FREE audit)
  2. IBM Data Analyst Certificate - Coursera (FREE audit)
  3. Microsoft: Data Science for Beginners - GitHub curriculum (FREE)
  4. freeCodeCamp: Data Analysis with Python - Comprehensive course (FREE)

R Programming:

  1. R for Data Science (Online Book) - Hadley Wickham (FREE)
  2. swirl - Learn R interactively in the R console (FREE)
  3. R-bloggers - Community tutorials (FREE)
  4. DataCamp: Introduction to R - First chapter free (FREE)

Python for Data Science:

  1. Python for Everybody - Dr. Chuck (FREE)
  2. Kaggle Learn - Micro-courses (FREE)
  3. DataCamp: Intro to Python - First chapter (FREE)
  4. Harvard CS109: Data Science - Full course materials (FREE)

SQL:

  1. SQLBolt - Interactive SQL lessons (FREE)
  2. Mode SQL Tutorial - Comprehensive guide (FREE)
  3. W3Schools SQL - Reference and practice (FREE)
  4. Khan Academy: Intro to SQL - Video lessons (FREE)

Statistics:

  1. Khan Academy Statistics - Comprehensive stats course (FREE)
  2. Harvard Stat 110 - Probability course (FREE)
  3. Penn State STAT 500 - Applied statistics (FREE)
  4. Statistics with R (Duke) - Coursera (FREE audit)

Machine Learning:

  1. Andrew Ng’s Machine Learning - Coursera classic (FREE audit)
  2. Fast.ai - Practical deep learning (FREE)
  3. Google’s Machine Learning Crash Course - With TensorFlow (FREE)
  4. Elements of AI - AI basics (FREE)

YouTube Channels

General Data Science:

R Programming:

Python:

Health Data Science:

Books (Free & Paid)

Free Online Books:

  1. R for Data Science - Hadley Wickham & Garrett Grolemund
  2. An Introduction to Statistical Learning (ISLR) - With R examples
  3. Python Data Science Handbook - Jake VanderPlas
  4. Forecasting: Principles and Practice - Rob Hyndman & George Athanasopoulos
  5. The Effect: An Introduction to Research Design and Causality - Nick Huntington-Klein
  6. Causal Inference: The Mixtape - Scott Cunningham
  7. Advanced R - Hadley Wickham
  8. R Packages - Hadley Wickham & Jenny Bryan

Worth Buying:

  1. The Art of Statistics - David Spiegelhalter
  2. Storytelling with Data - Cole Nussbaumer Knaflic
  3. Naked Statistics - Charles Wheelan
  4. Data Science for Business - Foster Provost & Tom Fawcett
  5. Practical Statistics for Data Scientists - Peter Bruce & Andrew Bruce

🏥 Health Economics & Research

Key Journals

Health Economics:

Public Health:

Methods:

Data Sources

Global Health Data:

  1. DHS Program - Demographic & Health Surveys (FREE)
  2. World Bank Open Data - Development indicators (FREE)
  3. WHO Global Health Observatory - Health statistics (FREE)
  4. IHME GBD - Global Burden of Disease (FREE)
  5. UN Data - United Nations statistics (FREE)
  6. Gapminder - Development data (FREE)

Kenya-Specific:

  1. Kenya National Bureau of Statistics (KNBS) - National data
  2. Kenya Open Data - Government open data (FREE)
  3. Kenya Health Information System (KHIS) - Health data
  4. APHRC Data - Population & health research (FREE)

Research Repositories:

  1. Harvard Dataverse - Research data (FREE)
  2. Figshare - Research outputs (FREE)
  3. Zenodo - Research data repository (FREE)
  4. Open ICPSR - Social science data (FREE)

Survey & Microdata:

  1. World Bank Microdata Library (e.g., catalog entry 5911) - Household and health survey microdata (FREE with registration)
  2. DHS Program: Stata Indicator Library - Stata code templates for DHS survey indicators (FREE)
  3. IPUMS DHS - Harmonized DHS microdata (FREE with registration)
  4. UNICEF MICS Microdata - Multiple Indicator Cluster Surveys (FREE with registration)

Streaming & Real-Time Data:

  1. Awesome Public Real-Time Datasets - Curated feeds for streaming and pipeline projects (FREE)
  2. OpenSky Network API - Live air traffic data (FREE)
  3. USGS Earthquake API - Near real-time seismic data (FREE)
  4. Transport for London Unified API - Live transport status (FREE with key)

Setup notes (real-time feeds):

  • Quarto can mix R and Python chunks; use httr2/jsonlite in R or requests in Python for GET calls.
  • Most APIs return JSON; convert responses with tibble::as_tibble() in R or pandas.DataFrame in Python for quick plotting.
  • OpenSky and USGS require no keys for basic use; TfL needs a free app key (register, set as env var TFL_APP_KEY).
  • For streaming experiments, pair a feed from the Awesome Public Real-Time Datasets list (maintained by Bytewax) with websocket/httpuv in R or websockets/asyncio in Python.
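
The setup notes above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive client: it uses the standard library's urllib in place of requests so that it runs without extra installs, the helper names fetch_json and flatten_quakes are hypothetical, and the USGS "all earthquakes, past day" GeoJSON summary feed is assumed as the example endpoint.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone

# Assumed example endpoint: USGS GeoJSON summary feed, all earthquakes, past day.
USGS_ALL_DAY = (
    "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
)

# Keyed APIs such as TfL: read the key from the environment, never hard-code it.
TFL_APP_KEY = os.environ.get("TFL_APP_KEY")  # None when the variable is unset


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body (hypothetical helper).

    With requests installed, requests.get(url, timeout=timeout).json()
    would do the same job.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten_quakes(geojson):
    """Flatten a USGS GeoJSON FeatureCollection into a list of plain dicts."""
    rows = []
    for feature in geojson.get("features", []):
        props = feature["properties"]
        # GeoJSON coordinate order is longitude, latitude, then depth (km).
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        # USGS reports event time as milliseconds since the Unix epoch (UTC).
        when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
        rows.append(
            {
                "id": feature["id"],
                "place": props.get("place"),
                "mag": props.get("mag"),
                "time_utc": when.isoformat(),
                "lon": lon,
                "lat": lat,
                "depth_km": depth_km,
            }
        )
    return rows
```

Calling flatten_quakes(fetch_json(USGS_ALL_DAY)) should give one flat record per event, and the resulting list of dicts can be passed directly to pandas.DataFrame for plotting.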

Open Practice Datasets:

  1. Kaggle Datasets - Community datasets for modeling and dashboards (FREE)
  2. Google Dataset Search - Search engine for open datasets (FREE)
  3. Data.gov - US government open data (FREE)
  4. data.europa.eu - European Union open data portal (FREE)
  5. Registry of Open Data on AWS - Ready-to-use public datasets in the cloud (FREE)

Sample Project Ideas (R, Python, SQL in Quarto)

  1. Flight traffic monitor (OpenSky API + R/Python): Pull live flights every 60 seconds, cache them to SQLite with DBI::dbWriteTable() or pandas DataFrame.to_sql(), then map routes with leaflet (R) or plotly (Python).
  2. Earthquake alert map (USGS + R): Fetch the last 24 hours of quakes, bucket magnitudes with dplyr, visualize with ggplot2 geom_point(), and annotate nearby cities; add a SQL chunk to aggregate by country from SQLite.
  3. Transit reliability dashboard (TfL + SQL): Collect bus arrival predictions into a Postgres or SQLite table, compute headway irregularity with SQL window functions, and chart peak-hour gaps with ggplot2 or seaborn.
  4. Streaming sentiment mini-pipeline (Awesome Public Real-Time Datasets + Python): Pick any public websocket feed from the list, stream it into DuckDB using the duckdb Python API, run live SQL for hourly aggregates, and render the results in a Quarto page.
  5. Household survey microdata cleaner (DHS/MICS + R + SQL): Use the DHS Stata templates for indicator definitions, import microdata with haven, stage it in DuckDB via dbplyr, and build a reproducible ETL Quarto doc with R and Python chunks side by side.
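
As a starting point for the transit reliability idea, the headway calculation can be sketched with SQLite's LAG window function from Python. This is a sketch under assumptions rather than a finished pipeline: the arrivals table, its stop_id and arrival_ts columns, and the helper names are all hypothetical, arrival times are assumed to be stored as Unix seconds, and "irregularity" is reduced here to the range between the longest and shortest gap at each stop. Window functions need SQLite 3.25 or newer, which recent Python builds bundle.

```python
import sqlite3


def build_arrivals_db(rows):
    """Load (stop_id, arrival_ts) tuples into an in-memory SQLite table.

    The table and column names are illustrative, not a TfL schema.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE arrivals (stop_id TEXT, arrival_ts INTEGER)")
    con.executemany("INSERT INTO arrivals VALUES (?, ?)", rows)
    return con


# LAG() looks back one row within each stop, ordered by arrival time, so
# subtracting it from the current arrival gives the headway in seconds.
HEADWAY_SQL = """
WITH gaps AS (
    SELECT stop_id,
           arrival_ts - LAG(arrival_ts) OVER (
               PARTITION BY stop_id ORDER BY arrival_ts
           ) AS headway_s
    FROM arrivals
)
SELECT stop_id,
       COUNT(headway_s)                AS n_gaps,
       AVG(headway_s)                  AS mean_headway_s,
       MAX(headway_s) - MIN(headway_s) AS headway_range_s
FROM gaps
WHERE headway_s IS NOT NULL   -- the first arrival at a stop has no gap
GROUP BY stop_id
ORDER BY stop_id
"""


def headway_summary(con):
    """Return (stop_id, n_gaps, mean_headway_s, headway_range_s) per stop."""
    return con.execute(HEADWAY_SQL).fetchall()
```

The same query should carry over to Postgres with little change, and the per-stop summary rows can be handed to ggplot2 or seaborn for the peak-hour charts.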

Health Economics Tools

Cost-Effectiveness Analysis:

Software:


💻 Software Development

Version Control

Learn Git:

  1. Learn Git Branching - Interactive tutorial (FREE)
  2. Git Immersion - Guided tour (FREE)
  3. GitHub Skills - Interactive courses (FREE)

Package Development

R Packages:

Python Packages:


🌐 Communities & Networks

Online Communities

Forums & Q&A:

Social Learning:

African Data Science:

Professional Organizations

Statistics & Data Science:

Health Economics:

Research:


🚀 Career Development

Job Boards

General Data Science:

Data-Specific:

Africa/Kenya:

Freelancing Platforms

Portfolio Building

Showcase Platforms:


📖 My Recommendations

If I were starting over in data analytics, here’s what I’d focus on:

Months 1-3: Foundations

  1. Learn SQL (SQLBolt + Mode tutorials)
  2. Pick ONE language: R OR Python
  3. Master Excel/Google Sheets
  4. Learn basic statistics (Khan Academy)
  5. Build 3 simple projects

Months 4-6: Intermediate Skills

  1. Data visualization (ggplot2 or matplotlib)
  2. Exploratory data analysis
  3. Statistical inference
  4. Git & GitHub basics
  5. Build portfolio website
  6. Complete 2-3 Kaggle competitions

Months 7-9: Specialization

  1. Choose domain (health, finance, marketing)
  2. Learn domain-specific tools
  3. Machine learning basics
  4. Dashboard building (Tableau/Power BI)
  5. Write 5 blog posts explaining projects

Months 10-12: Job Ready

  1. Advanced analytics projects
  2. Real-world datasets (not just tutorials)
  3. Practice interviews (technical + behavioral)
  4. Network on LinkedIn
  5. Apply to entry-level roles
  6. Consider freelancing for experience

Key principle: Build projects, not just skills. Employers hire problem-solvers.


🤝 Connect & Collaborate

Have resources to recommend? Want to collaborate on learning materials?

📧 Email: nichodemuswerre@gmail.com
💼 LinkedIn: linkedin.com/in/nichodemusamollo
🐙 GitHub: github.com/gondamol
🐦 Twitter: @nwerre


This resource list is continually updated. Last update: October 2025
Bookmark this page and check back regularly for new additions!