Git & GitHub for Data Analysts: Stop Losing Your Work (15-Minute Setup)

The Version Control Skills That Will Save Your Career

Author: Nichodemus Amollo

Published: October 15, 2025

Why Data Analysts Need Git

Scenario 1: The Classic Nightmare

final_analysis.xlsx
final_analysis_v2.xlsx
final_analysis_v2_FINAL.xlsx
final_analysis_v2_FINAL_revised.xlsx
final_analysis_v2_FINAL_revised_USE_THIS_ONE.xlsx

Scenario 2: The Code Disaster

You run a script, something breaks, you panic, make changes, break it more, and can’t remember what worked…

Solution: Git


What is Git?

Git = Time machine for your code/data

It tracks every change you make, so you can:

  • Go back to any previous version
  • See what changed and when
  • Work on experiments without breaking things
  • Collaborate without overwriting each other’s work

GitHub = Cloud storage for Git projects


5-Minute Git Setup

Step 1: Install Git

Windows:

  • Download the installer from git-scm.com
  • Run the installer (the default options are fine)

Mac:

# Open Terminal and paste (requires Homebrew):
brew install git

Linux:

sudo apt-get install git  # Ubuntu/Debian
sudo yum install git      # CentOS/RHEL

Step 2: Configure Git

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
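Before moving on, it is worth a quick check that the install and configuration took effect. The lines below are a minimal sketch that only reads settings and changes nothing; the fallback messages are illustrative wording, not Git output.

```bash
# Print the installed Git version
git --version

# Print the identity Git will attach to your commits.
# --get exits non-zero when a value is missing, so the fallback message is shown instead.
git config --global --get user.name  || echo "user.name is not set yet"
git config --global --get user.email || echo "user.email is not set yet"
```

If either fallback message appears, rerun the matching `git config --global` command from Step 2.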

Step 3: Create GitHub Account

  • Go to github.com
  • Sign up (it’s free!)
  • Verify email

Done! You’re ready.


The 10 Git Commands You’ll Use Daily

1. Create a Project

mkdir my-data-project
cd my-data-project
git init  # Initialize Git

2. Check Status

git status  # See what's changed

3. Stage Changes

git add analysis.py  # Stage one file
git add .            # Stage all files

4. Commit Changes

git commit -m "Add initial analysis script"

5. View History

git log  # See all commits
git log --oneline  # Compact view
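Commands 1 through 5 chain together into one short session. The sketch below runs in a throwaway directory; the temporary path, the "Demo Analyst" identity, and the file contents are placeholders chosen for illustration.

```bash
# Throwaway project folder (assumption: any empty folder works the same way)
project="$(mktemp -d)/my-data-project"
mkdir -p "$project" && cd "$project"

git init -q                                   # 1. create the repository
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo 'print("row count check")' > analysis.py
git status --short                            # 2. analysis.py is listed as untracked (??)

git add analysis.py                           # 3. stage the file
git commit -q -m "Add initial analysis script"  # 4. record the snapshot

git log --oneline                             # 5. one line per commit
```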

6. Create Branch (for experiments)

git branch feature-new-model  # Create branch
git checkout feature-new-model  # Switch to it
# Or both in one:
git checkout -b feature-new-model

7. Merge Branch

git checkout main  # Go back to main
git merge feature-new-model  # Merge changes
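To see what branching and merging actually do to your files, here is a small sketch in a throwaway repository. The file name `model.txt`, its contents, and the demo identity are made up for the example.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main         # call the first branch "main" on any Git version
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "baseline model" > model.txt
git add model.txt && git commit -q -m "Add baseline model notes"

git checkout -q -b feature-new-model          # create the branch and switch to it
echo "new model idea" >> model.txt
git commit -q -am "Describe new model"

git checkout -q main
cat model.txt                                 # only the baseline line: main has not changed
git merge -q feature-new-model                # bring the branch's commit into main
cat model.txt                                 # both lines are now on main
```

Because main had no new commits of its own, Git simply moves main forward to the branch's latest commit (a fast-forward merge).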

8. Push to GitHub

git remote add origin https://github.com/yourusername/repo-name.git
git branch -M main  # Name the current branch "main" (older Git versions default to "master")
git push -u origin main

9. Pull from GitHub

git pull  # Get latest changes

10. Clone Repository

git clone https://github.com/username/repo-name.git
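You can rehearse push, pull, and clone without a network connection. In this sketch a local bare repository stands in for GitHub, which is an assumption made so the example runs offline; with a real GitHub URL the commands are the same. All directory names are hypothetical.

```bash
work="$(mktemp -d)"                                    # throwaway workspace
git init -q --bare "$work/remote.git"                  # plays the role of the GitHub repo
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/laptop" && cd "$work/laptop"        # your own working copy
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Analyst"                    # placeholder identity
git config user.email "demo@example.com"
echo "SELECT 1;" > query.sql
git add query.sql && git commit -q -m "Add first query"

git remote add origin "$work/remote.git"               # 8. connect and push
git push -q -u origin main

git clone -q "$work/remote.git" "$work/teammate"       # 10. a second person clones

echo "SELECT 2;" >> query.sql                          # you keep working and push again
git commit -q -am "Add second query"
git push -q

cd "$work/teammate"
git pull -q                                            # 9. the teammate gets your latest commit
cat query.sql
```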

The Git Workflow for Data Analysts

Daily Workflow:

# 1. Start your day
git pull  # Get latest changes

# 2. Work on analysis
# ... make changes to files ...

# 3. Check what changed
git status
git diff  # See line-by-line changes

# 4. Stage your changes
git add analysis.py
git add cleaned_data.csv

# 5. Commit with meaningful message
git commit -m "Clean customer data and add age segmentation"

# 6. Push to GitHub
git push

# 7. Repeat throughout the day
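One detail in step 3 trips people up: `git diff` only shows changes that are not staged yet. Once you run `git add`, the same change moves to `git diff --staged`. A small sketch, using made-up CSV rows in a throwaway repository:

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

printf 'id,age\n1,34\n' > cleaned_data.csv
git add cleaned_data.csv && git commit -q -m "Add cleaned customer data"

printf 'id,age\n1,34\n2,51\n' > cleaned_data.csv   # simulate a morning of work
git status --short                            # " M cleaned_data.csv": modified, not staged
git diff                                      # the new row appears as a "+" line

git add cleaned_data.csv
git diff                                      # prints nothing: no unstaged changes remain
git diff --staged                             # the same change, now listed as staged
```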

Branching Workflow:

# Main branch = production code
# Feature branches = experiments

# Create experiment branch
git checkout -b experiment-new-feature

# Work on experiment
# ... make changes ...

# Commit changes
git add .
git commit -m "Test new clustering algorithm"

# If it works, merge to main
git checkout main
git merge experiment-new-feature

# If it doesn't work, switch back to main and delete the branch
git checkout main
git branch -D experiment-new-feature  # -D is required: -d refuses to delete an unmerged branch

# No harm done! Main branch is untouched.
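Here is the "experiment failed" path played out end to end, as a sketch with hypothetical file contents. It also shows why the lowercase `-d` flag is not enough for a branch that was never merged.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main         # call the first branch "main" on any Git version
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "kmeans, k=3" > clustering.txt
git add . && git commit -q -m "Add baseline clustering notes"

git checkout -q -b experiment-new-feature
echo "dbscan, eps=0.5" > clustering.txt
git commit -q -am "Test new clustering algorithm"

git checkout -q main                          # you cannot delete the branch you are on
git branch -d experiment-new-feature || echo "-d refused: the branch was never merged"
git branch -D experiment-new-feature          # force-delete the abandoned experiment
cat clustering.txt                            # kmeans, k=3
```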

GitHub for Your Portfolio

Why GitHub Matters:

  1. Recruiters search GitHub for candidates
  2. Shows your work is real, not just bullet points
  3. Demonstrates collaboration skills
  4. Proves you write clean, documented code
  5. FREE hosting for websites (GitHub Pages)

What to Put on GitHub:

✅ Great for GitHub:

  • Analysis notebooks (Jupyter, R Markdown)
  • Data cleaning scripts
  • Visualization code
  • Portfolio projects
  • Tutorials you’ve written
  • Practice exercises

❌ Don’t put on GitHub:

  • Passwords or API keys
  • Proprietary company code
  • Large data files (>100 MB)
  • Sensitive information
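One way to keep such files out of a repository is a `.gitignore` file, which tells Git which paths to skip. The sketch below uses three example patterns and invented file names; `git check-ignore -v` reports which pattern matched each file.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q

# One pattern per line; these three are examples, adjust them to your project
printf '%s\n' '.env' '*.csv' '__pycache__/' > .gitignore

echo "API_KEY=not-a-real-key" > .env          # fake secret for the demo
echo "id,value" > raw_data.csv
echo "print('ok')" > analysis.py

git add .                                     # ignored files are skipped silently
git status --short                            # only .gitignore and analysis.py are staged
git check-ignore -v .env raw_data.csv         # shows the pattern that matched each file
```

Note that `.gitignore` only affects files Git is not tracking yet. A secret that was already committed needs the cleanup described under the common problems.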


Make Your GitHub Shine:

1. Professional Profile

# README.md on your profile repo (username/username)

# Hi, I'm [Your Name] 👋

## About Me
Data Analyst passionate about turning data into actionable insights. 
Currently learning machine learning and building projects in Python.

## Skills
- **Languages:** Python, SQL, R
- **Tools:** Pandas, NumPy, Scikit-learn, Tableau
- **Databases:** PostgreSQL, MySQL

## Featured Projects
- [Project 1](link) - Description
- [Project 2](link) - Description
- [Project 3](link) - Description

## Connect With Me
- LinkedIn: [link]
- Portfolio: [link]
- Email: [email]

2. README for Each Project

# Project Name

## Problem Statement
What business problem does this solve?

## Data
- Source: [link]
- Size: X rows, Y columns
- Period: Date range

## Tools
- Python 3.9
- pandas, numpy, matplotlib, seaborn

## Key Findings
1. Finding 1 with impact
2. Finding 2 with impact
3. Finding 3 with impact

## How to Run
```bash
pip install -r requirements.txt
python analysis.py
```

## Results
[Screenshots or visualizations]

## Author
[Your name] - LinkedIn

3. .gitignore File

# Don't commit these files
__pycache__/
*.pyc
.env
.DS_Store
# If data files are large
*.csv
venv/
.vscode/
.idea/


Common Git Problems & Solutions

Problem 1: “I committed the wrong file!”

# Undo the last commit (your changes stay staged; only do this before pushing)
git reset --soft HEAD~1

# Then unstage the file that should not be in the commit
git reset HEAD file-to-remove
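A quick way to convince yourself that nothing is lost is to try it in a throwaway repository. In this sketch the file `notes.txt` plays the wrongly committed file; all names are placeholders.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "v1" > analysis.py
git add analysis.py && git commit -q -m "Add analysis script"
echo "scratch" > notes.txt
git add notes.txt && git commit -q -m "Add notes by mistake"

git reset --soft HEAD~1                       # drop the last commit, keep its changes staged
git status --short                            # "A  notes.txt": still staged, nothing lost
git reset -q HEAD notes.txt                   # unstage it; the file stays on disk
git status --short                            # "?? notes.txt": untracked again
```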

Problem 2: “I need to go back to a previous version!”

# See history
git log --oneline

# Look at a specific commit (detached HEAD: fine for inspecting, not for new work)
git checkout abc123  # Replace with commit hash

# Or create new branch from old commit
git checkout -b old-version abc123
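The branch-from-an-old-commit approach can be tried safely in a scratch repository. In the sketch below the hash is captured into a shell variable instead of being typed by hand; the file name and filter values are invented.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main         # call the first branch "main" on any Git version
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "filter: age > 18" > rules.txt
git add rules.txt && git commit -q -m "Add age filter"
good="$(git rev-parse --short HEAD)"          # hash of the version that worked

echo "filter: age > 81" > rules.txt           # a typo sneaks in
git commit -q -am "Change age filter"

git checkout -q -b old-version "$good"        # new branch starting at the good commit
cat rules.txt                                 # filter: age > 18
git checkout -q main
cat rules.txt                                 # filter: age > 81 (main is unchanged)
```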

Problem 3: “I accidentally committed a password!”

# Remove from last commit
git rm --cached file-with-password
git commit --amend

# If already pushed, you MUST change the password immediately
# Then use tools like BFG Repo-Cleaner to remove from history
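For the not-yet-pushed case, the sketch below shows the fix working on a fake secret. The file names and the password value are invented; `--no-edit` keeps the existing commit message so no editor opens.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "report" > report.py
git add report.py && git commit -q -m "Add report script"

echo "import os" > db_helper.py
echo "DB_PASSWORD=not-a-real-password" > secrets.env   # fake secret for the demo
git add . && git commit -q -m "Add database helper"    # oops: secrets.env went in too

git rm -q --cached secrets.env                # stop tracking it; the file stays on disk
git commit -q --amend --no-edit               # rewrite the last commit without the file
git ls-tree -r --name-only HEAD               # lists db_helper.py and report.py only
```

The amended commit replaces the old one locally. If the original commit was already pushed, the password is exposed regardless and must be changed.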

Problem 4: “Merge conflict!”

# Open conflicted file, you'll see:
<<<<<<< HEAD
Your changes
=======
Their changes
>>>>>>> branch-name

# Edit file to keep what you want
# Then:
git add conflicted-file
git commit -m "Resolve merge conflict"
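Merge conflicts are less scary once you have caused one on purpose. This sketch edits the same line on two branches, lets the merge fail, and then resolves it by keeping one side. The file name, the values, and the choice of which side wins are arbitrary.

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/main         # call the first branch "main" on any Git version
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "threshold = 10" > config.txt
git add config.txt && git commit -q -m "Add threshold"

git checkout -q -b tune-threshold
echo "threshold = 20" > config.txt
git commit -q -am "Raise threshold"

git checkout -q main
echo "threshold = 5" > config.txt             # same line, different edit
git commit -q -am "Lower threshold"

git merge tune-threshold || echo "conflict reported, as expected"
cat config.txt                                # shows the <<<<<<< / ======= / >>>>>>> markers

echo "threshold = 20" > config.txt            # resolve: keep the branch's value
git add config.txt
git commit -q -m "Resolve merge conflict"
```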

GitHub Features You Should Use

1. GitHub Issues

Track TODOs, bugs, ideas

2. GitHub Projects

Kanban board for organizing work

3. GitHub Actions

Automate testing, deployment (advanced)

4. GitHub Pages

FREE website hosting:

  • Your portfolio site
  • Project documentation

5. GitHub Gists

Share code snippets


FREE GitHub Resources

Learning:

  1. GitHub Skills - Interactive tutorials
  2. Git Handbook - Official guide
  3. Learn Git Branching - Visual, interactive
  4. Git Cheat Sheet - PDF reference

Tools:

  1. GitHub Desktop - GUI for Git (easier for beginners)
  2. GitKraken - Beautiful Git client (free tier)
  3. VS Code - Editor with built-in Git support

Practice:

  1. First Contributions - Practice contributing
  2. Awesome for Beginners - Beginner-friendly projects

Git Commit Message Best Practices

Bad:

fixed stuff
updates
asdfasdf
final version

Good:

Add customer segmentation analysis
Fix missing value handling in data cleaning
Update visualization colors for accessibility
Remove outdated product_sales.py script
Refactor SQL queries for better performance

Format:

[Type] Brief description (50 chars or less)

Optional longer description explaining:
- Why you made the change
- What problem it solves
- Any side effects

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Formatting
  • refactor: Code restructuring
  • test: Adding tests
  • chore: Maintenance
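You can write a subject line plus a longer description without opening an editor: each extra `-m` becomes its own paragraph in the commit message. A minimal sketch, with an invented file and message text:

```bash
repo="$(mktemp -d)" && cd "$repo"             # throwaway directory
git init -q
git config user.name "Demo Analyst"           # placeholder identity, local to this repo only
git config user.email "demo@example.com"

echo "segments" > segmentation.py
git add segmentation.py

# First -m is the short subject line, second -m is the longer description
git commit -q \
  -m "feat: Add customer segmentation analysis" \
  -m "Groups customers by age band so marketing can target each band separately."

git log -1 --format=%s                        # subject line only
git log -1 --format=%b                        # body only
```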


30-Day GitHub Challenge

Week 1:

Week 2:

Week 3:

Week 4:


Take Action (Next 30 Minutes)

  1. Install Git (10 min)
  2. Create GitHub account (5 min)
  3. Create first repository (5 min)
  4. Upload a project (10 min)

Don’t overthink it. Just start.



Tags: #Git #GitHub #VersionControl #Tools #Portfolio #Career