jiji262 / douyin-downloader
A practical Douyin downloader for both single-item and profile batch downloads, with progress display, retries, SQLite deduplication, and browser fallback support. 抖音批量下载工具,去水印,支持视频、图集、合集、音乐(原声)。免费!免费!免费!
AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing jiji262/douyin-downloader in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Summary (README)
PreviewDouyin Downloader V2.0
中文文档 (Chinese): README.zh-CN.md
A practical Douyin downloader supporting videos, image-notes, collections, music, and profile batch downloads, with progress display, retries, SQLite deduplication, download integrity checks, and browser fallback support.
This document targets V2.0 (
mainbranch).
For the legacy version, switch to V1.0:git fetch --all && git switch V1.0
Feature Overview
Supported
| Feature | Description |
|---|---|
| Single video download | /video/{aweme_id} |
| Single image-note download | /note/{note_id} |
| Single collection download | /collection/{mix_id} and /mix/{mix_id} |
| Single music download | /music/{music_id} (prefers direct audio, fallback to first related aweme) |
| Short link parsing | https://v.douyin.com/... |
| Profile batch download | /user/{sec_uid} + mode: [post, like, mix, music] |
| No-watermark preferred | Automatically selects watermark-free video source |
| Extra assets | Cover, music, avatar, JSON metadata |
| Video transcription | Optional, using OpenAI Transcriptions API |
| Concurrent downloads | Configurable concurrency, default 5 |
| Retry with backoff | Exponential backoff (1s, 2s, 5s) |
| Rate limiting | Default 2 req/s |
| SQLite deduplication | Database + local file dual dedup |
| Incremental downloads | increase.post/like/mix/music |
| Time filters | start_time / end_time |
| Browser fallback | Launches browser when pagination is blocked, manual CAPTCHA supported |
| Download integrity check | Content-Length validation, auto-cleanup of incomplete files |
| Progress display | Rich progress bars, supports progress.quiet_logs quiet mode |
| Docker deployment | Dockerfile included |
| CI/CD | GitHub Actions for testing and linting |
Current Limitations
- Browser fallback is fully validated for
post;like/mix/musiccurrently relies on API pagination number.allmix/increase.allmixare retained as compatibility aliases and normalized tomix
Quick Start
1) Requirements
- Python 3.8+
- macOS / Linux / Windows
2) Install dependencies
pip install -r requirements.txt
For browser fallback and automatic cookie capture:
pip install playwright
python -m playwright install chromium
3) Copy config file
cp config.example.yml config.yml
4) Get cookies (recommended: automatic)
python -m tools.cookie_fetcher --config config.yml
After logging into Douyin, return to the terminal and press Enter. Cookies will be written to your config automatically.
5) Docker deployment (optional)
docker build -t douyin-downloader .
docker run -v $(pwd)/config.yml:/app/config.yml -v $(pwd)/Downloaded:/app/Downloaded douyin-downloader
Minimal Working Config
link:
- https://www.douyin.com/user/MS4wLjABAAAAxxxx
path: ./Downloaded/
mode:
- post
number:
post: 0
thread: 5
retry_times: 3
database: true
progress:
quiet_logs: true
cookies:
msToken: ""
ttwid: YOUR_TTWID
odin_tt: YOUR_ODIN_TT
passport_csrf_token: YOUR_CSRF_TOKEN
sid_guard: ""
browser_fallback:
enabled: true
headless: false
max_scrolls: 240
idle_rounds: 8
wait_timeout_seconds: 600
transcript:
enabled: false
model: gpt-4o-mini-transcribe
output_dir: ""
response_formats: ["txt", "json"]
api_url: https://api.openai.com/v1/audio/transcriptions
api_key_env: OPENAI_API_KEY
api_key: ""
Usage
Run with a config file
python run.py -c config.yml
Append CLI arguments
python run.py -c config.yml \
-u "https://www.douyin.com/video/7604129988555574538" \
-t 8 \
-p ./Downloaded
Arguments
| Argument | Description |
|---|---|
-u, --url | Append download link(s), can be repeated |
-c, --config | Specify config file (default: config.yml) |
-p, --path | Specify download directory |
-t, --thread | Specify concurrency |
--show-warnings | Show warning/error logs |
-v, --verbose | Show info/warning/error logs |
--version | Show version number |
Typical Scenarios
Download one video
link:
- https://www.douyin.com/video/7604129988555574538
Download one image-note
link:
- https://www.douyin.com/note/7341234567890123456
Download a collection
link:
- https://www.douyin.com/collection/7341234567890123456
Download a music track
link:
- https://www.douyin.com/music/7341234567890123456
Batch download a creator's posts
link:
- https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
- post
number:
post: 50
Batch download a creator's liked posts
link:
- https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
- like
number:
like: 0 # 0 means download all
Download multiple modes at once
link:
- https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
- post
- like
- mix
- music
Cross-mode deduplication: the same aweme_id won't be downloaded twice across different modes.
Incremental download (only new items)
increase:
post: true
database: true # incremental mode requires database
Full crawl (no item limit)
number:
post: 0
Optional Feature: Video Transcription (transcript)
Current behavior applies to video items only (image-note items do not generate transcripts).
1) Enable in config
transcript:
enabled: true
model: gpt-4o-mini-transcribe
output_dir: "" # empty: same folder as video; non-empty: mirrored to target dir
response_formats:
- txt
- json
api_key_env: OPENAI_API_KEY
api_key: "" # can be set directly, or via environment variable
Recommended to provide key through environment variable:
export OPENAI_API_KEY="sk-xxxx"
2) Output files
When enabled, it generates:
xxx.transcript.txtxxx.transcript.json
If database: true, job status is also recorded in SQLite table transcript_job (success/failed/skipped).
Key Config Fields
| Field | Description |
|---|---|
mode | Supports post/like/mix/music, can be combined |
number.post/like/mix/music | Per-mode download limit, 0 = unlimited |
increase.post/like/mix/music | Per-mode incremental toggle |
start_time / end_time | Time filter (format: YYYY-MM-DD) |
folderstyle | Create per-item subdirectories |
browser_fallback.* | Browser fallback for post when pagination is restricted |
progress.quiet_logs | Quiet logs during progress stage |
transcript.* | Optional transcription after video download |
database | Enable SQLite deduplication and history |
thread | Concurrent download count |
retry_times | Retry count on failure |
Output Structure
Default with folderstyle: true:
Downloaded/
├── download_manifest.jsonl
├── dy_downloader.db # when database: true
└── AuthorName/
├── post/
│ └── 2024-02-07_Title_aweme_id/
│ ├── ...mp4
│ ├── ..._cover.jpg
│ ├── ..._music.mp3
│ ├── ..._data.json
│ ├── ..._avatar.jpg
│ ├── ...transcript.txt
│ └── ...transcript.json
├── like/
│ └── ...
├── mix/
│ └── ...
└── music/
└── ...
Re-downloading Content
The program uses a database record + local file dual check to decide whether to skip already-downloaded content. To force re-download, you need to clean up accordingly:
Re-download a specific item
# Delete local files (folder name contains the aweme_id)
rm -rf Downloaded/AuthorName/post/*_<aweme_id>/
# Delete database record
sqlite3 dy_downloader.db "DELETE FROM aweme WHERE aweme_id = '<aweme_id>';"
Re-download all items from a specific author
rm -rf Downloaded/AuthorName/
sqlite3 dy_downloader.db "DELETE FROM aweme WHERE author_name = 'AuthorName';"
Full reset (re-download everything)
rm -rf Downloaded/
rm dy_downloader.db
Note: Deleting only the database but keeping files will NOT trigger re-download — the program scans local filenames for aweme_id to detect existing downloads. Deleting only files but keeping the database WILL trigger re-download (the program treats "in DB but missing locally" as needing retry).
FAQ
1) Why do I only get around 20 posts?
This is a common pagination risk-control behavior. Make sure:
browser_fallback.enabled: truebrowser_fallback.headless: false- complete verification manually in the browser popup, and do not close it too early
2) Why is the progress output noisy/repeated?
By default, progress.quiet_logs: true suppresses logs during progress stage.
Use --show-warnings or -v temporarily when debugging.
3) What if cookies are expired?
Run:
python -m tools.cookie_fetcher --config config.yml
4) Why are transcript files not generated?
Check in order:
- whether
transcript.enabledistrue - whether downloaded items are videos (image-notes are not transcribed)
- whether
OPENAI_API_KEY(ortranscript.api_key) is valid - whether
response_formatsincludestxtorjson
5) How to view download history?
sqlite3 dy_downloader.db "SELECT aweme_id, title, author_name, datetime(download_time, 'unixepoch', 'localtime') FROM aweme ORDER BY download_time DESC LIMIT 20;"
Legacy Version (V1.0)
If you prefer the legacy script style (V1.0):
git fetch --all
git switch V1.0
Community Group
Disclaimer
This project is for technical research, learning, and personal data management only. Please use it legally and responsibly:
- Do not use it to infringe others' privacy, copyright, or other legal rights
- Do not use it for any illegal purpose
- Users are solely responsible for all risks and liabilities arising from usage
- If platform policies or interfaces change and features break, this is a normal technical risk
By continuing to use this project, you acknowledge and accept the statements above.
License
This project is licensed under the MIT License. See LICENSE for details.