Resources & Learning Materials
Curated collection of data science tools, courses, and resources
Data Analytics & Research Resources
A comprehensive collection of free and paid resources for aspiring and practicing data analysts, researchers, and health economists.
Resume
Latest resume for download and quick review.
Data Analytics Tools
R Programming
Essential Packages:
- tidyverse - Data manipulation and visualization ecosystem
- ggplot2 - Advanced data visualization
- dplyr - Data manipulation
- tidyr - Data tidying
- readr - Data import
- shiny - Interactive web applications
- rmarkdown - Dynamic documents
- caret - Machine learning workflows
- survival - Survival analysis
- data.table - High-performance data manipulation
Development Tools:
- RStudio - IDE for R (FREE)
- Posit Cloud - Cloud-based RStudio
- GitHub - Version control
Python for Data Science
Core Libraries:
- pandas - Data manipulation
- numpy - Numerical computing
- matplotlib - Plotting library
- seaborn - Statistical visualization
- scikit-learn - Machine learning
- statsmodels - Statistical modeling
- plotly - Interactive visualizations
- jupyter - Interactive notebooks
IDEs & Platforms:
- Anaconda - Python distribution for data science (FREE)
- VS Code - Versatile code editor (FREE)
- Google Colab - Cloud-based Jupyter (FREE)
- Kaggle Notebooks - Cloud computing + datasets (FREE)
Statistical Software
Survey Tools
Mobile Data Collection:
- ODK (Open Data Kit) - Open-source mobile data collection (FREE)
- KoboToolbox - Humanitarian data collection (FREE)
- SurveyCTO - Professional survey platform (Paid)
- REDCap - Research data capture (FREE for academic)
- CommCare - Mobile data collection (Freemium)
Online Surveys:
- Google Forms - Simple surveys (FREE)
- Qualtrics - Advanced survey platform (Paid)
- SurveyMonkey - Popular survey tool (Freemium)
- Typeform - Beautiful surveys (Freemium)
Visualization Tools
Business Intelligence:
- Tableau Public - Free version (FREE)
- Tableau Desktop - Professional BI (Paid, student discount)
- Power BI - Microsoft's BI platform (Freemium)
- Looker Studio (formerly Google Data Studio) - Free BI tool (FREE)
Specialized Visualization:
Databases
Relational Databases:
- PostgreSQL - Powerful open-source DB (FREE)
- MySQL - Popular open-source DB (FREE)
- SQLite - Lightweight database (FREE)
- Microsoft SQL Server - Enterprise DB (Paid)
NoSQL Databases:
- MongoDB - Document database (Freemium)
- Redis - In-memory database (FREE)
- Firebase - Google's backend platform (Freemium)
Learning Resources
Free Online Courses
Data Analytics Fundamentals:
- Google Data Analytics Certificate - Coursera (FREE audit)
- IBM Data Analyst Certificate - Coursera (FREE audit)
- Microsoft: Data Science for Beginners - GitHub curriculum (FREE)
- freeCodeCamp: Data Analysis with Python - Comprehensive course (FREE)
R Programming:
- R for Data Science (Online Book) - Hadley Wickham (FREE)
- Swirl - Learn R interactively in R console (FREE)
- R-bloggers - Community tutorials (FREE)
- DataCamp: Introduction to R - First chapter free (FREE)
Python for Data Science:
- Python for Everybody - Dr. Chuck (FREE)
- Kaggle Learn - Micro-courses (FREE)
- DataCamp: Intro to Python - First chapter (FREE)
- Harvard CS109: Data Science - Full course materials (FREE)
SQL:
- SQLBolt - Interactive SQL lessons (FREE)
- Mode SQL Tutorial - Comprehensive guide (FREE)
- W3Schools SQL - Reference and practice (FREE)
- Khan Academy: Intro to SQL - Video lessons (FREE)
Statistics:
- Khan Academy Statistics - Comprehensive stats course (FREE)
- Stat110 Harvard - Probability course (FREE)
- Penn State STAT 500 - Applied statistics (FREE)
- Statistics with R (Duke) - Coursera (FREE audit)
Machine Learning:
- Andrew Ng's Machine Learning - Coursera classic (FREE audit)
- Fast.ai - Practical deep learning (FREE)
- Google's Machine Learning Crash Course - With TensorFlow (FREE)
- Elements of AI - AI basics (FREE)
YouTube Channels
General Data Science:
- StatQuest with Josh Starmer - Best statistics explanations
- 3Blue1Brown - Math visualizations
- freeCodeCamp.org - Full courses
- Data School - Python & R tutorials
- Ken Jee - Data science career advice
R Programming:
- David Robinson - Tidy Tuesday screencasts
- Andrew Couch - R tutorials
- Business Science - R for business
Python:
- Corey Schafer - Python fundamentals
- Keith Galli - Data science with Python
- Krish Naik - ML & AI
Health Data Science:
- Mike Marin - Biostatistics
- MarinStatsLectures - R & statistics
Books (Free & Paid)
Free Online Books:
- R for Data Science - Hadley Wickham & Garrett Grolemund
- Introduction to Statistical Learning (ISLR) - With R examples
- Python Data Science Handbook - Jake VanderPlas
- Forecasting: Principles and Practice - Rob Hyndman
- The Effect: An Introduction to Research Design and Causality - Nick Huntington-Klein
- Causal Inference: The Mixtape - Scott Cunningham
- Advanced R - Hadley Wickham
- R Packages - Hadley Wickham & Jenny Bryan
Worth Buying:
- The Art of Statistics - David Spiegelhalter
- Storytelling with Data - Cole Nussbaumer Knaflic
- Naked Statistics - Charles Wheelan
- Data Science for Business - Foster Provost & Tom Fawcett
- Practical Statistics for Data Scientists - Peter Bruce & Andrew Bruce
Health Economics & Research
Key Journals
Health Economics:
- Health Economics
- Health Affairs
- Social Science & Medicine
- The Lancet Global Health
- BMC Health Services Research
- PLoS Medicine - Open access (FREE)
Public Health:
- American Journal of Public Health
- BMC Public Health - Open access (FREE)
- Global Health Action - Open access (FREE)
Methods:
Data Sources
Global Health Data:
- DHS Program - Demographic & Health Surveys (FREE)
- World Bank Open Data - Development indicators (FREE)
- WHO Global Health Observatory - Health statistics (FREE)
- IHME GBD - Global Burden of Disease (FREE)
- UN Data - United Nations statistics (FREE)
- Gapminder - Development data (FREE)
Kenya-Specific:
- Kenya National Bureau of Statistics (KNBS) - National data
- Kenya Open Data - Government open data (FREE)
- Kenya Health Information System (KHIS) - Health data
- APHRC Data - Population & health research (FREE)
Research Repositories:
- Harvard Dataverse - Research data (FREE)
- Figshare - Research outputs (FREE)
- Zenodo - Research data repository (FREE)
- Open ICPSR - Social science data (FREE)
Survey & Microdata:
- World Bank Microdata Library (catalog 5911 example) - Household and health survey microdata (FREE with registration)
- DHS Program: Stata Indicator Library - Stata code templates for DHS survey indicators (FREE)
- IPUMS DHS - Harmonized DHS microdata (FREE with registration)
- UNICEF MICS Microdata - Multiple Indicator Cluster Surveys (FREE with registration)
Streaming & Real-Time Data:
- Awesome Public Real-Time Datasets - Curated feeds for streaming and pipeline projects (FREE)
- OpenSky Network API - Live air traffic data (FREE)
- USGS Earthquake API - Near real-time seismic data (FREE)
- Transport for London Unified API - Live transport status (FREE with key)
Setup notes (real-time feeds):
- Quarto can mix R and Python chunks; use httr2/jsonlite in R or requests in Python for GET calls.
- Most APIs return JSON; wrap responses into tibble::as_tibble() or pandas.DataFrame for quick plotting.
- OpenSky and USGS require no keys for basic use; TfL needs a free app key (register, set as env var TFL_APP_KEY).
- For streaming experiments, pair the Bytewax list with websocket/httpuv in R or websockets/asyncio in Python.
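The setup notes above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not a definitive client: the feed URL, the GeoJSON field names (mag, place, time in milliseconds), and the app_key query parameter name are assumptions to check against each provider's documentation, and the sample payload is hand-written rather than real data. The helper names flatten_quakes and with_app_key are hypothetical.

```python
import json
import os
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed URL of the USGS "all earthquakes, past day" GeoJSON summary feed.
USGS_DAY_FEED = (
    "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
)


def flatten_quakes(payload):
    """Turn a GeoJSON FeatureCollection into a list of flat row dicts."""
    rows = []
    for feature in payload.get("features", []):
        props = feature.get("properties", {})
        lon, lat, depth_km = feature["geometry"]["coordinates"][:3]
        rows.append({
            "id": feature.get("id"),
            "place": props.get("place"),
            "mag": props.get("mag"),
            # Assumption: "time" is milliseconds since the Unix epoch.
            "time_utc": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
            "lon": lon,
            "lat": lat,
            "depth_km": depth_km,
        })
    return rows


def with_app_key(url, env_var="TFL_APP_KEY", param="app_key"):
    """Append an API key read from the environment as a query parameter, if set."""
    key = os.environ.get(env_var)
    if not key:
        return url  # No key configured: leave the URL untouched.
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{urlencode({param: key})}"


# Hand-written sample in the assumed feed shape (illustrative values only).
SAMPLE = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "id": "demo1",
      "properties": {"mag": 4.2, "place": "Example region", "time": 1700000000000},
      "geometry": {"type": "Point", "coordinates": [36.8, -1.3, 10.0]}
    }
  ]
}
""")

rows = flatten_quakes(SAMPLE)
```

For a live call, the payload could come from json.load(urllib.request.urlopen(USGS_DAY_FEED)) or from requests, and the resulting list of dicts can be passed straight to pandas.DataFrame(rows) for plotting.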
Open Practice Datasets:
- Kaggle Datasets - Community datasets for modeling and dashboards (FREE)
- Google Dataset Search - Search engine for open datasets (FREE)
- Data.gov - US government open data (FREE)
- data.europa.eu - European Union open data portal (FREE)
- Registry of Open Data on AWS - Ready-to-use public datasets in the cloud (FREE)
Sample Project Ideas (R, Python, SQL in Quarto)
- Flight traffic monitor (OpenSky API + R/Python): Pull live flights every 60 seconds, cache to SQLite with DBI::dbWriteTable() or pandas DataFrame.to_sql(), then map routes with leaflet (R) or plotly (Python).
- Earthquake alert map (USGS + R): Fetch last 24h quakes, bucket magnitudes in dplyr, visualize with ggplot2/geom_point() and annotate cities; add a SQL chunk to aggregate by country from SQLite.
- Transit reliability dashboard (TfL + SQL): Collect bus arrival predictions to a Postgres/SQLite table, compute headway irregularity via SQL window functions, chart peak-hour gaps with ggplot2 or seaborn.
- Streaming sentiment mini-pipeline (Bytewax list + Python): Pick any public websocket feed, stream into DuckDB using the duckdb Python API, run live SQL for hourly aggregates, and render in a Quarto page.
- Household survey microdata cleaner (DHS/MICS + R + SQL): Use DHS Stata templates for indicator definitions, import microdata with haven, stage to DuckDB via dbplyr, and build a reproducible ETL Quarto doc with R/Python chunks side by side.
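As one way to approach the transit reliability idea, the sketch below computes headways with a SQL window function in SQLite and summarizes them in Python. The arrivals table, its columns, and the sample rows are hypothetical, and "headway irregularity" is taken here to mean the coefficient of variation of the gaps between consecutive arrivals, which is only one possible definition. Window functions need SQLite 3.25 or newer.

```python
import sqlite3
import statistics

# Hypothetical schema: one row per observed arrival, timestamps in epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrivals (stop_id TEXT, line TEXT, arrival_ts INTEGER)")
conn.executemany(
    "INSERT INTO arrivals VALUES (?, ?, ?)",
    [
        ("stopA", "25", 0),
        ("stopA", "25", 600),
        ("stopA", "25", 1500),
        ("stopA", "25", 1800),
        ("stopB", "25", 100),
        ("stopB", "25", 700),
    ],
)

# LAG() looks back one row within each stop, so the difference is the headway.
HEADWAY_SQL = """
SELECT stop_id,
       arrival_ts,
       arrival_ts - LAG(arrival_ts) OVER (
           PARTITION BY stop_id ORDER BY arrival_ts
       ) AS headway_s
FROM arrivals
ORDER BY stop_id, arrival_ts
"""


def headway_irregularity(connection):
    """Return {stop_id: coefficient of variation of headways} per stop."""
    gaps = {}
    for stop_id, _arrival_ts, headway_s in connection.execute(HEADWAY_SQL):
        if headway_s is not None:  # The first arrival at a stop has no predecessor.
            gaps.setdefault(stop_id, []).append(headway_s)
    return {
        stop: statistics.pstdev(values) / statistics.mean(values)
        for stop, values in gaps.items()
        if len(values) >= 2  # A single gap carries no spread information.
    }


irregularity = headway_irregularity(conn)
```

A value near 0 means evenly spaced arrivals; larger values mean bunching and gaps. The same query should run largely unchanged on Postgres, and the per-stop result can be charted with ggplot2 or seaborn as the project idea suggests.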
Health Economics Tools
Cost-Effectiveness Analysis:
- CHEERS Checklist - Reporting guidelines
- iDSI Reference Case - HTA guidelines
- WHO-CHOICE - Cost-effectiveness
Software:
- TreeAge Pro - Decision tree analysis (Paid)
- R package: heemod - Health economic modeling (FREE)
- R package: dampack - Decision-analytic modeling (FREE)
Software Development
Version Control
- Git Documentation - Official docs (FREE)
- GitHub Guides - Learning resources (FREE)
- Pro Git Book - Free online book (FREE)
- GitHub Desktop - GUI for Git (FREE)
- GitKraken - Git client (Freemium)
Learn Git:
- Learn Git Branching - Interactive tutorial (FREE)
- Git Immersion - Guided tour (FREE)
- GitHub Skills - Interactive courses (FREE)
Package Development
R Packages:
- R Packages Book - Hadley Wickham (FREE)
- Writing R Extensions - Official guide (FREE)
- usethis package - Package development workflow (FREE)
- devtools package - Development tools (FREE)
Python Packages:
- Python Packaging Guide - Official guide (FREE)
- Real Python: Publishing Package - Tutorial (FREE)
Communities & Networks
Online Communities
Forums & Q&A:
- Stack Overflow - Programming Q&A (FREE)
- Cross Validated - Statistics Q&A (FREE)
- RStudio Community - R help (FREE)
- Reddit r/datascience - Data science community (FREE)
- Reddit r/rstats - R community (FREE)
Social Learning:
- Kaggle - Competitions & datasets (FREE)
- DataCamp Community - Tutorials (FREE)
- Towards Data Science - Medium publication (FREE)
- Analytics Vidhya - Data science blog (FREE)
African Data Science:
- Africa R Users - R users in Africa (FREE)
- Data Science Africa - DSA community (FREE)
- AfroCHI - African HCI community (FREE)
Professional Organizations
Statistics & Data Science:
- American Statistical Association (ASA)
- Royal Statistical Society (RSS)
- International Statistical Institute (ISI)
- Kenya Statistical Society
Health Economics:
- International Health Economics Association (iHEA)
- African Health Economics and Policy Association (AfHEA)
- Health Economics Research Unit (HERU)
Research:
Career Development
Job Boards
General Data Science:
- LinkedIn Jobs - Professional network
- Indeed - Job aggregator
- Glassdoor - Jobs + reviews
- AngelList - Startup jobs
Data-Specific:
- Kaggle Jobs - Data science roles
- DataJobs - Data careers
- iHire Data Science - Specialized board
- Remote OK - Remote data jobs
Africa/Kenya:
- BrighterMonday Kenya - Kenyan jobs
- FuZu - East African jobs
- Devex - Development sector jobs
- ReliefWeb - Humanitarian jobs
Freelancing Platforms
- Upwork - General freelancing
- Toptal - Top 3% freelancers
- Fiverr - Gig marketplace
- Freelancer - Project bidding
- Kolabtree - Scientific freelancing
Portfolio Building
Showcase Platforms:
- GitHub - Code repository (FREE)
- Kaggle - Competitions + notebooks (FREE)
- Tableau Public - Dashboards (FREE)
- Observable - JavaScript notebooks (FREE)
- RStudio Connect - App hosting (Paid)
- Shinyapps.io - Shiny hosting (Freemium)
My Recommendations
If I were starting over in data analytics, here's what I'd focus on:
Months 1-3: Foundations
1. Learn SQL (SQLBolt + Mode tutorials)
2. Pick ONE language: R OR Python
3. Master Excel/Google Sheets
4. Learn basic statistics (Khan Academy)
5. Build 3 simple projects
Months 4-6: Intermediate Skills
1. Data visualization (ggplot2 or matplotlib)
2. Exploratory data analysis
3. Statistical inference
4. Git & GitHub basics
5. Build portfolio website
6. Complete 2-3 Kaggle competitions
Months 7-9: Specialization
1. Choose domain (health, finance, marketing)
2. Learn domain-specific tools
3. Machine learning basics
4. Dashboard building (Tableau/Power BI)
5. Write 5 blog posts explaining projects
Months 10-12: Job Ready
1. Advanced analytics projects
2. Real-world datasets (not just tutorials)
3. Practice interviews (technical + behavioral)
4. Network on LinkedIn
5. Apply to entry-level roles
6. Consider freelancing for experience
Key principle: Build projects, not just skills. Employers hire problem-solvers.
Connect & Collaborate
Have resources to recommend? Want to collaborate on learning materials?
Email: nichodemuswerre@gmail.com
LinkedIn: linkedin.com/in/nichodemusamollo
GitHub: github.com/gondamol
Twitter: @nwerre
This resource list is continually updated. Last update: October 2025
Bookmark this page and check back regularly for new additions!