Git Filter Branch
Learn about Git Filter Branch for advanced history rewriting
Git filter-branch rewrites Git history by applying filters to every commit in a branch or an entire repository. It is useful for large-scale history modifications that cannot easily be done with interactive rebase. Every rewritten commit gets a new hash, so the result has to be force-pushed and collaborators have to re-clone or reset their local branches.
What is Git Filter Branch?
Filter-branch walks through your repository's history and allows you to:
- Remove files from all commits
- Modify file contents across history
- Change author information
- Restructure directories
- Remove sensitive data
- Split repositories
Basic Usage
Basic Syntax
git filter-branch [options] [revision-range]
Simple Example
# Remove a file from all commits
git filter-branch --tree-filter 'rm -f passwords.txt' HEAD
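To see the effect without touching a real project, the simple example can be tried in a throwaway repository. The following is a minimal sketch: the temporary directory, file contents, and identity are made up for the demonstration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and delay that recent Git versions add before filter-branch starts.

```shell
set -eu
# Throwaway sandbox: every name and value below is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
echo "hunter2" > passwords.txt
git add . && git commit -q -m "Initial commit"
echo "more" >> README.md
git commit -q -am "Update README"

# Skip the deprecation warning and its delay on newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --tree-filter 'rm -f passwords.txt' HEAD >/dev/null 2>&1

# Count the commits on the current branch that still touch the file.
remaining=$(git log --oneline -- passwords.txt | wc -l | tr -d ' ')
echo "commits still containing passwords.txt: $remaining"
```

A count of 0 means the file no longer appears anywhere in the rewritten branch. The old commits are still kept under refs/original/ until they are cleaned up.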
Common Filter Types
1. Tree Filter (--tree-filter)
Modifies the working tree for each commit:
# Remove a file from all commits
git filter-branch --tree-filter 'rm -f secret.txt' HEAD
# Remove a directory from all commits
git filter-branch --tree-filter 'rm -rf old-directory' HEAD
# Rename files across all commits
git filter-branch --tree-filter 'find . -name "*.txt" -exec mv {} {}.bak \;' HEAD
2. Index Filter (--index-filter)
Modifies the index (staging area), which is faster than the tree filter:
# Remove a file from all commits (faster)
git filter-branch --index-filter 'git rm --cached --ignore-unmatch secret.txt' HEAD
# Remove all .log files (the glob is quoted so that Git, not the shell, expands it)
git filter-branch --index-filter 'git rm --cached --ignore-unmatch "*.log"' HEAD
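When a glob is passed to git rm, Git matches it against every path in the index, so it also reaches files in nested directories. The sketch below illustrates this in a throwaway repository; the file names are hypothetical and chosen only for the demonstration.

```shell
set -eu
# Throwaway sandbox; all file names here are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p logs/deep
echo "docs" > README.md
echo "root log" > app.log
echo "nested log" > logs/deep/debug.log
git add . && git commit -q -m "Add README and logs"

export FILTER_BRANCH_SQUELCH_WARNING=1
# The inner double quotes keep the shell from expanding the glob,
# so Git itself matches *.log against every path in the index.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch "*.log"' HEAD >/dev/null 2>&1

tracked=$(git ls-files)
echo "tracked files after rewrite: $tracked"
```

Only README.md should remain tracked, because the pattern matches both app.log and logs/deep/debug.log.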
3. Environment Filter (--env-filter)
Modifies environment variables such as the author and committer identity:
# Change author email for all commits
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
export GIT_AUTHOR_EMAIL="new@example.com"
fi
' HEAD
# Change both author and committer
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_NAME" = "Old Name" ]; then
export GIT_AUTHOR_NAME="New Name"
export GIT_AUTHOR_EMAIL="new@example.com"
export GIT_COMMITTER_NAME="New Name"
export GIT_COMMITTER_EMAIL="new@example.com"
fi
' HEAD
4. Message Filter (--msg-filter)
Modifies commit messages:
# Add prefix to all commit messages (the message arrives on stdin, not in $1)
git filter-branch --msg-filter 'printf "[MIGRATED] "; cat' HEAD
# Remove sensitive information from commit messages
git filter-branch --msg-filter 'sed "s/password=.*/password=***/"' HEAD
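The message filter receives the original commit message on standard input and must print the new message to standard output. A minimal sketch of a prefixing filter, run in a throwaway repository with a made-up commit, could look like this:

```shell
set -eu
# Throwaway sandbox; the commit and file are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x" > file.txt
git add file.txt && git commit -q -m "Add feature"

export FILTER_BRANCH_SQUELCH_WARNING=1
# printf writes the prefix, then cat copies the original message from stdin.
git filter-branch --msg-filter 'printf "[MIGRATED] "; cat' HEAD >/dev/null 2>&1

subject=$(git log -1 --format=%s)
echo "$subject"
```

The sed example works for the same reason: sed reads the message from stdin when no file argument is given.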
5. Subdirectory Filter (--subdirectory-filter)
Extracts a subdirectory as the new root:
# Make subdirectory the new root
git filter-branch --subdirectory-filter my-subdirectory HEAD
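The subdirectory filter keeps only the commits that touch the given directory and moves its contents to the repository root. The following sketch shows this on a throwaway repository; the directory other/ and the file names are hypothetical.

```shell
set -eu
# Throwaway sandbox; directory and file names are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p my-subdirectory other
echo "drop" > other/file.txt
echo "keep" > my-subdirectory/lib.txt
git add other && git commit -q -m "Add unrelated file"
git add my-subdirectory && git commit -q -m "Add library"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --subdirectory-filter my-subdirectory HEAD >/dev/null 2>&1

files=$(git ls-files)                # paths are now relative to the old subdirectory
commits=$(git rev-list --count HEAD) # commits that touched my-subdirectory
echo "files: $files, commits: $commits"
```

The commit that only touched other/ disappears from the rewritten branch, and lib.txt now sits at the top level.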
Advanced Usage Examples
Remove Sensitive Data
# Remove file with sensitive data from entire history
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch config/secrets.yml' \
--prune-empty --tag-name-filter cat -- --all
Change Author Information
# Change author info for specific email
git filter-branch --env-filter '
OLD_EMAIL="old@company.com"
CORRECT_NAME="Correct Name"
CORRECT_EMAIL="correct@company.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
export GIT_COMMITTER_NAME="$CORRECT_NAME"
export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
export GIT_AUTHOR_NAME="$CORRECT_NAME"
export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags
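After an identity rewrite it helps to list which author and committer identities remain. The sketch below creates one commit under the outdated identity in a throwaway repository, applies a shortened version of the filter, and prints the distinct identities left on the branch. The names and emails are the placeholder values from the example, not real accounts.

```shell
set -eu
# Throwaway sandbox; identities are placeholder values.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x" > file.txt
git add file.txt
# Commit under the outdated identity.
git -c user.name="Old Name" -c user.email="old@company.com" \
  commit -q -m "Initial commit"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --env-filter '
if [ "$GIT_COMMITTER_EMAIL" = "old@company.com" ]; then
  export GIT_COMMITTER_NAME="Correct Name"
  export GIT_COMMITTER_EMAIL="correct@company.com"
fi
if [ "$GIT_AUTHOR_EMAIL" = "old@company.com" ]; then
  export GIT_AUTHOR_NAME="Correct Name"
  export GIT_AUTHOR_EMAIL="correct@company.com"
fi
' HEAD >/dev/null 2>&1

# Print every distinct author and committer identity on the branch.
identities=$(git log --format='%an <%ae>%n%cn <%ce>' | sort -u)
echo "$identities"
```

If the old email still shows up in this list, the filter condition did not match every affected commit.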
Split Repository
# Extract subdirectory into new repository
git filter-branch --subdirectory-filter path/to/subdirectory -- --all
Remove Large Files
# Remove files larger than 10MB
git filter-branch --tree-filter '
find . -type f -size +10M -delete
' HEAD
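Before removing large files it is usually worth checking which blobs in the history are actually large. One common way to do that, sketched here under the assumption that paths contain no spaces, is to pipe git rev-list --objects into git cat-file --batch-check and sort by size. The repository and file names below are made up for the demonstration.

```shell
set -eu
# Throwaway sandbox; file names and sizes are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "small" > notes.txt
head -c 2000 /dev/zero > big.bin
git add . && git commit -q -m "Add files"

# For every object reachable from any ref, print its type, size and path,
# keep only blobs, and sort by size in descending order.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 5)
echo "$largest"
```

Each output line is a size in bytes followed by a path, so the first line names the largest blob in the history.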
Filter-Branch Options
Important Options
# Force operation (overwrites existing backup)
--force
# Remove empty commits after filtering
--prune-empty
# Filter all branches and tags
-- --all
# Filter all local branches
-- --branches
# Filter all tags
-- --tags
# Namespace where the original refs are backed up (default: refs/original)
--original <namespace>
Complete Example
git filter-branch \
--force \
--index-filter 'git rm --cached --ignore-unmatch large-file.zip' \
--prune-empty \
--tag-name-filter cat \
-- --all
Common Use Cases
1. Remove Sensitive Files
# Remove API keys file from all history
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch api-keys.txt' \
--prune-empty --tag-name-filter cat -- --all
# Clean up
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now --aggressive
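The cleanup step matters because the rewrite alone leaves the old objects in the repository. The sketch below records the object ID of a made-up secret file, rewrites the history, and checks whether the object exists before and after cleanup. It deletes the backup refs with git update-ref instead of rm -rf, an alternative that should also cover refs that have been packed.

```shell
set -eu
# Throwaway sandbox; file names and contents are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "readme" > README.md
echo "key=12345" > api-keys.txt
git add . && git commit -q -m "Initial commit"
blob=$(git rev-parse HEAD:api-keys.txt)   # object ID of the secret's contents

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch api-keys.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# The rewrite alone does not delete the old objects.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the backup refs, expire the reflog, then prune unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ |
  xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```

Copies of the secret in other clones or on a remote are not affected by this local cleanup.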
2. Repository Cleanup
# Remove all .DS_Store files (at the root and in every subdirectory)
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch .DS_Store "*/.DS_Store"' \
--prune-empty --tag-name-filter cat -- --all
# Remove all log files
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch -r "*.log"' \
--prune-empty --tag-name-filter cat -- --all
3. Directory Restructuring
# Move all files from root to subdirectory
git filter-branch --tree-filter '
mkdir -p new-structure
find . -maxdepth 1 -not -name "new-structure" -not -name ".git" -not -name "." \
-exec mv {} new-structure/ \;
' HEAD
4. Author Correction
# Fix author information for multiple authors
git filter-branch --env-filter '
case "$GIT_AUTHOR_EMAIL" in
"old1@example.com")
export GIT_AUTHOR_NAME="New Name 1"
export GIT_AUTHOR_EMAIL="new1@example.com"
;;
"old2@example.com")
export GIT_AUTHOR_NAME="New Name 2"
export GIT_AUTHOR_EMAIL="new2@example.com"
;;
esac
' HEAD
Best Practices
1. Backup Before Using
# Create a backup branch
git branch backup-branch
# Or clone the repository
git clone original-repo backup-repo
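A backup branch lives inside the repository that is about to be rewritten, so a copy stored outside it is safer. One option, sketched here with a hypothetical file name and location, is a bundle file that contains every ref:

```shell
set -eu
# Throwaway sandbox standing in for the repository to be rewritten.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x" > file.txt
git add file.txt && git commit -q -m "Initial commit"

# Hypothetical backup location; choose a path outside the repository.
backup="$(mktemp -d)/pre-rewrite.bundle"
git bundle create "$backup" --all >/dev/null 2>&1

# git bundle verify exits non-zero if the bundle is invalid or incomplete.
if git bundle verify "$backup" >/dev/null 2>&1; then
  status=ok
else
  status=broken
fi
echo "bundle status: $status"
```

If the rewrite goes wrong, the bundle can be cloned like a normal repository, for example with git clone pre-rewrite.bundle restored-repo.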
2. Work on a Copy
# Work on a separate repository
git clone original-repo temp-repo
cd temp-repo
# Perform filter-branch operations
3. Clean Up Afterwards
# Remove original refs
rm -rf .git/refs/original/
# Expire reflog
git reflog expire --expire=now --all
# Garbage collect
git gc --prune=now --aggressive
4. Force Push Carefully
# Force push to update remote (DANGEROUS!)
git push --force-with-lease origin --all
git push --force-with-lease origin --tags
Performance Tips
Use Index Filter Instead of Tree Filter
# Slower (checks out each commit)
git filter-branch --tree-filter 'rm -f file.txt' HEAD
# Faster (works with index only)
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file.txt' HEAD
Filter Specific Branches
# Filter only main branch
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file.txt' main
# Filter specific range
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file.txt' HEAD~10..HEAD
Troubleshooting
Common Issues
1. Filter-Branch Refuses to Run
# Error: Cannot create a new backup
# Solution: Use --force or remove existing backup
git filter-branch --force --index-filter '...' HEAD
# Or remove backup
rm -rf .git/refs/original/
2. Empty Repository After Filter
# Check if all commits were pruned
git log --oneline
# Recovery from backup
git reset --hard backup-branch
3. Performance Issues
# Use index filter instead of tree filter
# Work on smaller ranges
# Use --prune-empty to remove empty commits
Recovery Options
# Reset to original state
git reset --hard refs/original/refs/heads/main
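# Or find the pre-rewrite commit in the reflog and reset to it
git reflog
git reset --hard HEAD@{5}  # replace 5 with the entry number shown by git reflog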
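When several branches or tags were rewritten, resetting one branch is not enough. The sketch below restores every ref from its backup under refs/original/ in a loop. It assumes the backups have not been deleted yet, and it runs in a throwaway repository with made-up content.

```shell
set -eu
# Throwaway sandbox; file names are made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "readme" > README.md
echo "secret" > secret.txt
git add . && git commit -q -m "Initial commit"
before=$(git rev-parse HEAD)

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' HEAD >/dev/null 2>&1

# Point every rewritten ref back at its backup under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r backup; do
  git update-ref "${backup#refs/original/}" "$(git rev-parse "$backup")"
done
# Bring the index and working tree back in line with the restored HEAD.
git reset -q --hard

after=$(git rev-parse HEAD)
echo "restored: $after"
```

After the loop the branch points at the same commit as before the rewrite, and the removed file is back in the working tree.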
Alternatives to Filter-Branch
Git Filter-Repo (Recommended)
# Modern replacement for filter-branch
pip install git-filter-repo
# Remove file
git filter-repo --path secret.txt --invert-paths
# Change author
git filter-repo --mailmap mailmap.txt
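The mailmap file uses Git's standard mailmap format: the correct name and email first, followed by the email to replace. The sketch below writes a one-line mailmap.txt with placeholder identities and previews the mapping with git check-mailmap, so it does not require git-filter-repo to be installed.

```shell
set -eu
# Throwaway sandbox; identities are placeholder values.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# One mapping per line: correct identity first, then the email to replace.
cat > "$repo/mailmap.txt" <<'EOF'
Correct Name <correct@example.com> <old@example.com>
EOF

# Preview how an identity would be mapped under this file.
mapped=$(git -c mailmap.file="$repo/mailmap.txt" \
  check-mailmap "Old Name <old@example.com>")
echo "$mapped"
```

The printed identity is the one that commits recorded under old@example.com should carry after the rewrite.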
BFG Repo-Cleaner
# Fast alternative for removing unwanted files by name
java -jar bfg.jar --delete-files secret.txt my-repo.git
Migration Script Example
#!/bin/bash
# Complete migration script
set -euo pipefail  # stop at the first failing step
# 1. Create backup
git branch backup-original
# 2. Remove sensitive files
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch secrets.txt config/database.yml' \
--prune-empty --tag-name-filter cat -- --all
# 3. Fix author information
git filter-branch --force --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
export GIT_AUTHOR_NAME="Correct Name"
export GIT_AUTHOR_EMAIL="correct@example.com"
export GIT_COMMITTER_NAME="Correct Name"
export GIT_COMMITTER_EMAIL="correct@example.com"
fi
' --tag-name-filter cat -- --all
# 4. Clean up
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# 5. Verify results
git log --oneline -10
git show-branch --all
echo "Migration complete. Review changes before force pushing."
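The verification step in a script like this can be made stricter by checking that a removed path no longer appears in any branch or tag. The function assert_path_gone below is a hypothetical helper, not part of Git, and the sandbox around it exists only to exercise it. It deliberately uses --branches --tags rather than --all, because --all would also walk the backups under refs/original/.

```shell
set -eu
# assert_path_gone is a hypothetical helper, not part of Git.
assert_path_gone() {
  # --branches --tags skips refs/original/, which still holds the old history.
  hits=$(git log --branches --tags --format=%H -- "$1" | wc -l | tr -d ' ')
  if [ "$hits" -ne 0 ]; then
    echo "FAIL: $1 still appears in $hits commit(s)" >&2
    return 1
  fi
  echo "OK: $1 is absent from all branches and tags"
}

# Throwaway sandbox to exercise the helper; file names are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "readme" > README.md
echo "token" > secrets.txt
git add . && git commit -q -m "Initial commit"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secrets.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

result=$(assert_path_gone secrets.txt)
echo "$result"
```

In a script that runs with set -e, a failing check stops the run before the force push.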