Why Data Analysts Need Git
Scenario 1: The Classic Nightmare
final_analysis.xlsx
final_analysis_v2.xlsx
final_analysis_v2_FINAL.xlsx
final_analysis_v2_FINAL_revised.xlsx
final_analysis_v2_FINAL_revised_USE_THIS_ONE.xlsx
Scenario 2: The Code Disaster
You run a script, something breaks, you panic, make changes, break it further, and can’t remember what worked…
Solution: Git
What is Git?
Git = Time machine for your code/data
It records every change you commit, so you can:
- Go back to any previous version
- See what changed and when
- Work on experiments without breaking things
- Collaborate without overwriting each other’s work
GitHub = Cloud hosting for Git projects, plus tools for collaborating on them
5-Minute Git Setup
Step 1: Install Git
Windows:
- Download the installer from git-scm.com
- Run it (the default options are fine)
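Mac:
```bash
# Open Terminal and paste (requires Homebrew, from brew.sh):
brew install git
```
Linux:
```bash
sudo apt-get install git   # Ubuntu/Debian
sudo yum install git       # CentOS/RHEL
```
Step 2: Configure Git
```bash
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"   # use your GitHub email so commits link to your account
# Optional (Git 2.28+): name the first branch of new repositories "main"
git config --global init.defaultBranch main
```
Step 3: Create GitHub Account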
- Go to github.com
- Sign up (it’s free!)
- Verify email
Done! You’re ready.
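To double-check the setup, a small helper can confirm that Git runs and that the two identity settings from Step 2 are present. This is a minimal Bash sketch; `check_git_setup` is a hypothetical helper name, not a Git command.

```bash
#!/usr/bin/env bash
# check_git_setup (hypothetical helper): confirm Git is installed and that
# the global user.name and user.email from Step 2 are set.
check_git_setup() {
  if ! command -v git >/dev/null 2>&1; then
    echo "Git is not installed (or not on your PATH)" >&2
    return 1
  fi
  git --version

  local key value missing=0
  for key in user.name user.email; do
    # `git config --get` exits with status 1 when the key is not set
    value=$(git config --global --get "$key") || value=""
    if [ -z "$value" ]; then
      echo "Missing: git config --global $key \"...\"" >&2
      missing=1
    else
      echo "$key = $value"
    fi
  done
  return "$missing"
}

check_git_setup || echo "Fix the settings above, then run this again."
```

If anything is reported as missing, rerun the matching `git config --global` command from Step 2.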
The 10 Git Commands You’ll Use Daily
1. Create a Project
```bash
mkdir my-data-project
cd my-data-project
git init                          # Initialize Git
```
2. Check Status
```bash
git status                        # See what's changed
```
3. Stage Changes
```bash
git add analysis.py               # Stage one file
git add .                         # Stage all changes in this folder
```
4. Commit Changes
```bash
git commit -m "Add initial analysis script"
```
5. View History
```bash
git log                           # See all commits
git log --oneline                 # Compact view
```
6. Create Branch (for experiments)
```bash
git branch feature-new-model      # Create branch
git checkout feature-new-model    # Switch to it
# Or create and switch in one step:
git checkout -b feature-new-model
```
7. Merge Branch
```bash
git checkout main                 # Go back to main
git merge feature-new-model       # Merge changes
```
8. Push to GitHub
```bash
# Create an empty repository on GitHub first, then:
git remote add origin https://github.com/yourusername/repo-name.git
git branch -M main                # Rename the current branch to main (if Git named it master)
git push -u origin main
```
9. Pull from GitHub
```bash
git pull                          # Get latest changes
```
10. Clone Repository
```bash
git clone https://github.com/username/repo-name.git
```
The Git Workflow for Data Analysts
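Before wiring up GitHub, you can rehearse the local commands in a throwaway folder. The script below is a sketch that assumes Git 2.28 or newer (for `git init -b main`); the file name and commit messages are invented for the exercise. Save it as, say, `rehearse.sh` and run `bash rehearse.sh` rather than pasting it into your terminal, because `set -e` would close an interactive shell on the first error.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a temporary folder so nothing real is at risk
demo=$(mktemp -d)
cd "$demo"

git init -b main                          # Git 2.28+: name the first branch "main"
git config user.name "Demo User"          # identity for this repo only
git config user.email "demo@example.com"

# Stage and commit a first version
echo "print('hello, data')" > analysis.py
git add analysis.py
git commit -m "Add initial analysis script"

# Experiment on a branch, then merge it back
git checkout -b feature-new-model
echo "print('new model')" >> analysis.py
git add analysis.py
git commit -m "Try a new model"
git checkout main
git merge feature-new-model               # fast-forward: main has no new commits
git branch -d feature-new-model           # safe delete: the branch is fully merged

# Review what happened
git log --oneline
```

The log should list two commits, newest first, and `git branch` should show only `main`.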
Daily Workflow:
```bash
# 1. Start your day
git pull                         # Get latest changes
# 2. Work on analysis
# ... make changes to files ...
# 3. Check what changed
git status
git diff                         # See line-by-line changes
# 4. Stage your changes
git add analysis.py
git add cleaned_data.csv         # only if the file is small and not sensitive
# 5. Commit with a meaningful message
git commit -m "Clean customer data and add age segmentation"
# 6. Push to GitHub
git push
# 7. Repeat throughout the day
```
Branching Workflow:
```bash
# Main branch = production code
# Feature branches = experiments

# Create an experiment branch
git checkout -b experiment-new-feature

# Work on the experiment
# ... make changes ...

# Commit changes
git add .
git commit -m "Test new clustering algorithm"

# If it works, merge it into main
git checkout main
git merge experiment-new-feature

# If it doesn't work, switch back to main and delete the unmerged branch
git checkout main
git branch -D experiment-new-feature   # -D forces deletion; -d refuses unmerged work
# No harm done! The main branch is untouched.
```
GitHub for Your Portfolio
Why GitHub Matters:
- Many recruiters and hiring managers look at candidates’ GitHub profiles
- Shows your work is real, not just bullet points
- Demonstrates collaboration skills
- Lets reviewers see that you write clean, documented code
- FREE website hosting (GitHub Pages)
What to Put on GitHub:
✅ Great for GitHub:
- Analysis notebooks (Jupyter, R Markdown)
- Data cleaning scripts
- Visualization code
- Portfolio projects
- Tutorials you’ve written
- Practice exercises
❌ Don’t put on GitHub:
- Passwords or API keys
- Proprietary company code
- Large data files (GitHub warns above 50 MB and rejects files over 100 MB)
- Sensitive information (such as customer or personal data)
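Because GitHub rejects any single file over 100 MB, it can be worth scanning a project before the first `git add .`. A minimal sketch using the standard `find` command; `list_large_files` is a hypothetical helper name, and the threshold is hard-coded to GitHub’s limit.

```bash
#!/usr/bin/env bash
# list_large_files (hypothetical helper): print every file larger than
# 100 MB under a folder (default: the current folder), skipping .git.
list_large_files() {
  local dir="${1:-.}"
  # -prune stops find from descending into any directory named .git;
  # -size +100M means "more than 100 MiB" (sizes are rounded up to whole MiB)
  find "$dir" -name .git -prune -o -type f -size +100M -print
}
```

Run `list_large_files` from your project folder; anything it prints belongs in `.gitignore` (or in a tool built for big files, such as Git LFS) rather than in a regular commit.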
Make Your GitHub Shine:
1. Professional Profile
Create a public repository named exactly like your username (username/username); GitHub displays its README.md on your profile page. For example:
```markdown
# Hi, I'm [Your Name] 👋

## About Me
Data Analyst passionate about turning data into actionable insights.
Currently learning machine learning and building projects in Python.

## Skills
- **Languages:** Python, SQL, R
- **Tools:** Pandas, NumPy, Scikit-learn, Tableau
- **Databases:** PostgreSQL, MySQL

## Featured Projects
- [Project 1](link) - Description
- [Project 2](link) - Description
- [Project 3](link) - Description

## Connect With Me
- LinkedIn: [link]
- Portfolio: [link]
- Email: [email]
```
2. README for Each Project
````markdown
# Project Name

## Problem Statement
What business problem does this solve?

## Data
- Source: [link]
- Size: X rows, Y columns
- Period: Date range

## Tools
- Python 3.9
- pandas, numpy, matplotlib, seaborn

## Key Findings
1. Finding 1 with impact
2. Finding 2 with impact
3. Finding 3 with impact

## How to Run
```bash
pip install -r requirements.txt
python analysis.py
```

## Results
[Screenshots or visualizations]
````
3. .gitignore File
```gitignore
# Don't commit these files
__pycache__/
*.pyc
.env
.DS_Store
# Data files (if they are large); comments must sit on their own line
*.csv
venv/
.vscode/
.idea/
```
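To confirm the patterns do what you expect, `git check-ignore -v` reports which `.gitignore` line (if any) matches a path. A small sketch in a temporary repository; the file names are invented for the example.

```bash
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q

# A trimmed-down .gitignore, with comments on their own lines
cat > .gitignore <<'EOF'
# Don't commit these files
__pycache__/
*.pyc
.env
# Data files (if they are large)
*.csv
EOF

touch analysis.py .env sales.csv

# -v prints "source:line:pattern<TAB>path" for each ignored path;
# exit status 1 would only mean that none of the paths matched
git check-ignore -v analysis.py .env sales.csv || true

# Files Git will still offer to commit
git status --short
```

`git status` should list only `.gitignore` and `analysis.py`; `.env` and `sales.csv` stay out of sight.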
Common Git Problems & Solutions
Problem 1: “I committed the wrong file!”
```bash
# Undo the last commit but keep its changes staged
git reset --soft HEAD~1
# Then unstage the file that shouldn't be there, and commit again
git reset HEAD file-to-remove
```
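To try this fix without risking real work, here is a sketch that commits a stray file by mistake, undoes the commit, unstages the stray file, and commits again. Everything happens in a temporary repository, and the file names are invented for the exercise.

```bash
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > analysis.py
git add analysis.py
git commit -qm "Add analysis script"

# Oops: a scratch file sneaks into the next commit
echo "v2" > analysis.py
echo "scratch" > notes.txt
git add .
git commit -qm "Update analysis"

git reset --soft HEAD~1        # undo the commit; both files stay staged
git reset -q HEAD notes.txt    # unstage only the stray file
git commit -qm "Update analysis"

git status --short             # notes.txt is now untracked
```

`git status --short` should list `notes.txt` with `??`, meaning it is no longer part of any commit.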
Problem 2: “I need to go back to a previous version!”
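```bash
# See history
git log --oneline
# Look at the project as it was at a specific commit (replace abc123 with a real hash)
git checkout abc123          # "detached HEAD": look around, then `git checkout main` to return
# Or create a new branch starting from that old commit
git checkout -b old-version abc123
```
Problem 3: “I accidentally committed a password!”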
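```bash
# Remove the file from the last commit (it stays on your disk)
git rm --cached file-with-password
# Add file-with-password to .gitignore so it isn't staged again, then:
git commit --amend
# If you already pushed, change the password immediately: it is in the remote history.
# Then use a tool like BFG Repo-Cleaner to remove it from history.
```
Problem 4: “Merge conflict!”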
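To see a conflict without risking real work, you can create one on purpose: two branches change the same line of a file, then one is merged into the other. This sketch assumes Git 2.28 or newer (for `git init -b main`); `config.py` and the branch name are invented for the exercise.

```bash
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "threshold = 0.5" > config.py
git add config.py
git commit -qm "Add threshold"

# Change the same line differently on two branches
git checkout -q -b tune-threshold
echo "threshold = 0.7" > config.py
git commit -qam "Raise threshold"

git checkout -q main
echo "threshold = 0.3" > config.py
git commit -qam "Lower threshold"

# The merge stops with a conflict (non-zero exit), so keep set -e from aborting
git merge tune-threshold || true
cat config.py
```

The `cat` shows the conflict markers described next. If you would rather back out than resolve, `git merge --abort` returns the repository to its pre-merge state.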
Open the conflicted file and you'll see markers like these:
```
<<<<<<< HEAD
Your changes
=======
Their changes
>>>>>>> branch-name
```
Edit the file to keep what you want, delete the marker lines, then:
```bash
git add conflicted-file
git commit -m "Resolve merge conflict"
```
GitHub Features You Should Use
1. GitHub Issues
Track TODOs, bugs, ideas
2. GitHub Projects
Kanban board for organizing work
3. GitHub Actions
Automate testing, deployment (advanced)
4. GitHub Pages
FREE website hosting for:
- Your portfolio site
- Project documentation
5. GitHub Gists
Share code snippets
FREE GitHub Resources
Learning:
- GitHub Skills - Interactive tutorials
- Git Handbook - Official guide
- Learn Git Branching - Visual, interactive
- Git Cheat Sheet - PDF reference
Tools:
- GitHub Desktop - GUI for Git (easier for beginners)
- GitKraken - Beautiful Git client (free tier)
- VS Code - Editor with built-in Git support
Practice:
- First Contributions - Practice contributing
- Awesome for Beginners - Beginner-friendly projects
Git Commit Message Best Practices
Bad:
fixed stuff
updates
asdfasdf
final version
Good:
Add customer segmentation analysis
Fix missing value handling in data cleaning
Update visualization colors for accessibility
Remove outdated product_sales.py script
Refactor SQL queries for better performance
Format:
type: Brief description (50 characters or fewer)

Optional longer description, separated from the first line by a blank line, explaining:
- Why you made the change
- What problem it solves
- Any side effects

Types (a common convention, as in Conventional Commits):
- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Formatting
- refactor: Code restructuring
- test: Adding tests
- chore: Maintenance
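If you adopt the type prefixes, a small check can flag subject lines that drift from the format. This is a sketch, not an official tool: `check_subject` is a hypothetical helper, and both the type list and the 50-character limit are conventions rather than rules Git enforces.

```bash
#!/usr/bin/env bash
# check_subject (hypothetical helper): succeed if a commit subject looks like
# "type: description", uses one of the types above, and is at most 50 characters.
check_subject() {
  local subject="$1"
  local pattern='^(feat|fix|docs|style|refactor|test|chore): [^ ].*$'

  if (( ${#subject} > 50 )); then
    echo "Too long (${#subject} > 50 chars): $subject" >&2
    return 1
  fi
  if [[ ! $subject =~ $pattern ]]; then
    echo "Expected 'type: description': $subject" >&2
    return 1
  fi
}
```

For example, `check_subject "feat: Add customer segmentation analysis"` passes, while `check_subject "fixed stuff"` fails. Git runs a `commit-msg` hook with the path of the message file as its only argument and aborts the commit if the hook exits non-zero, so one option is to paste the function into an executable `.git/hooks/commit-msg` and finish it with `check_subject "$(head -n 1 "$1")"`.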
GitHub in Your Job Search
Resume:
TECHNICAL SKILLS
- Version Control: Git, GitHub (50+ public repositories)
- Link: github.com/yourname
Cover Letter:
"I've built a portfolio of 15+ data analysis projects,
all available on my GitHub at github.com/yourname.
My most popular project [link] has been forked 50+ times
and demonstrates my ability to [skill]."
LinkedIn:
# In "Featured" section:
- Link your best GitHub repos
- Add screenshots
# In experience:
"All code available at: github.com/yourname/project-name"
30-Day GitHub Challenge
Week 1:
Week 2:
Week 3:
Week 4:
Take Action (Next 30 Minutes)
- Install Git (10 min)
- Create GitHub account (5 min)
- Create first repository (5 min)
- Upload a project (10 min)
Don’t overthink it. Just start.
Tags: #Git #GitHub #VersionControl #Tools #Portfolio #Career