back to home

jiji262 / douyin-downloader

A practical Douyin downloader for both single-item and profile batch downloads, with progress display, retries, SQLite deduplication, and browser fallback support. 抖音批量下载工具,去水印,支持视频、图集、合集、音乐(原声)。免费!免费!免费!

6,749 stars
1,049 forks
2 issues
PythonDockerfile

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing jiji262/douyin-downloader in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/jiji262/douyin-downloader)
Preview:Analyzed by RepoMind

Repository Summary (README)

Preview

Douyin Downloader V2.0

douyin-downloader

中文文档 (Chinese): README.zh-CN.md

A practical Douyin downloader supporting videos, image-notes, collections, music, and profile batch downloads, with progress display, retries, SQLite deduplication, download integrity checks, and browser fallback support.

This document targets V2.0 (main branch).
For the legacy version, switch to V1.0: git fetch --all && git switch V1.0

Feature Overview

Supported

FeatureDescription
Single video download/video/{aweme_id}
Single image-note download/note/{note_id}
Single collection download/collection/{mix_id} and /mix/{mix_id}
Single music download/music/{music_id} (prefers direct audio, fallback to first related aweme)
Short link parsinghttps://v.douyin.com/...
Profile batch download/user/{sec_uid} + mode: [post, like, mix, music]
No-watermark preferredAutomatically selects watermark-free video source
Extra assetsCover, music, avatar, JSON metadata
Video transcriptionOptional, using OpenAI Transcriptions API
Concurrent downloadsConfigurable concurrency, default 5
Retry with backoffExponential backoff (1s, 2s, 5s)
Rate limitingDefault 2 req/s
SQLite deduplicationDatabase + local file dual dedup
Incremental downloadsincrease.post/like/mix/music
Time filtersstart_time / end_time
Browser fallbackLaunches browser when pagination is blocked, manual CAPTCHA supported
Download integrity checkContent-Length validation, auto-cleanup of incomplete files
Progress displayRich progress bars, supports progress.quiet_logs quiet mode
Docker deploymentDockerfile included
CI/CDGitHub Actions for testing and linting

Current Limitations

  • Browser fallback is fully validated for post; like/mix/music currently relies on API pagination
  • number.allmix / increase.allmix are retained as compatibility aliases and normalized to mix

Quick Start

1) Requirements

  • Python 3.8+
  • macOS / Linux / Windows

2) Install dependencies

pip install -r requirements.txt

For browser fallback and automatic cookie capture:

pip install playwright
python -m playwright install chromium

3) Copy config file

cp config.example.yml config.yml

4) Get cookies (recommended: automatic)

python -m tools.cookie_fetcher --config config.yml

After logging into Douyin, return to the terminal and press Enter. Cookies will be written to your config automatically.

5) Docker deployment (optional)

docker build -t douyin-downloader .
docker run -v $(pwd)/config.yml:/app/config.yml -v $(pwd)/Downloaded:/app/Downloaded douyin-downloader

Minimal Working Config

link:
  - https://www.douyin.com/user/MS4wLjABAAAAxxxx

path: ./Downloaded/
mode:
  - post

number:
  post: 0

thread: 5
retry_times: 3
database: true

progress:
  quiet_logs: true

cookies:
  msToken: ""
  ttwid: YOUR_TTWID
  odin_tt: YOUR_ODIN_TT
  passport_csrf_token: YOUR_CSRF_TOKEN
  sid_guard: ""

browser_fallback:
  enabled: true
  headless: false
  max_scrolls: 240
  idle_rounds: 8
  wait_timeout_seconds: 600

transcript:
  enabled: false
  model: gpt-4o-mini-transcribe
  output_dir: ""
  response_formats: ["txt", "json"]
  api_url: https://api.openai.com/v1/audio/transcriptions
  api_key_env: OPENAI_API_KEY
  api_key: ""

Usage

Run with a config file

python run.py -c config.yml

Append CLI arguments

python run.py -c config.yml \
  -u "https://www.douyin.com/video/7604129988555574538" \
  -t 8 \
  -p ./Downloaded

Arguments

ArgumentDescription
-u, --urlAppend download link(s), can be repeated
-c, --configSpecify config file (default: config.yml)
-p, --pathSpecify download directory
-t, --threadSpecify concurrency
--show-warningsShow warning/error logs
-v, --verboseShow info/warning/error logs
--versionShow version number

Typical Scenarios

Download one video

link:
  - https://www.douyin.com/video/7604129988555574538

Download one image-note

link:
  - https://www.douyin.com/note/7341234567890123456

Download a collection

link:
  - https://www.douyin.com/collection/7341234567890123456

Download a music track

link:
  - https://www.douyin.com/music/7341234567890123456

Batch download a creator's posts

link:
  - https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
  - post
number:
  post: 50

Batch download a creator's liked posts

link:
  - https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
  - like
number:
  like: 0    # 0 means download all

Download multiple modes at once

link:
  - https://www.douyin.com/user/MS4wLjABAAAAxxxx
mode:
  - post
  - like
  - mix
  - music

Cross-mode deduplication: the same aweme_id won't be downloaded twice across different modes.

Incremental download (only new items)

increase:
  post: true
database: true    # incremental mode requires database

Full crawl (no item limit)

number:
  post: 0

Optional Feature: Video Transcription (transcript)

Current behavior applies to video items only (image-note items do not generate transcripts).

1) Enable in config

transcript:
  enabled: true
  model: gpt-4o-mini-transcribe
  output_dir: ""        # empty: same folder as video; non-empty: mirrored to target dir
  response_formats:
    - txt
    - json
  api_key_env: OPENAI_API_KEY
  api_key: ""           # can be set directly, or via environment variable

Recommended to provide key through environment variable:

export OPENAI_API_KEY="sk-xxxx"

2) Output files

When enabled, it generates:

  • xxx.transcript.txt
  • xxx.transcript.json

If database: true, job status is also recorded in SQLite table transcript_job (success/failed/skipped).

Key Config Fields

FieldDescription
modeSupports post/like/mix/music, can be combined
number.post/like/mix/musicPer-mode download limit, 0 = unlimited
increase.post/like/mix/musicPer-mode incremental toggle
start_time / end_timeTime filter (format: YYYY-MM-DD)
folderstyleCreate per-item subdirectories
browser_fallback.*Browser fallback for post when pagination is restricted
progress.quiet_logsQuiet logs during progress stage
transcript.*Optional transcription after video download
databaseEnable SQLite deduplication and history
threadConcurrent download count
retry_timesRetry count on failure

Output Structure

Default with folderstyle: true:

Downloaded/
├── download_manifest.jsonl
├── dy_downloader.db          # when database: true
└── AuthorName/
    ├── post/
    │   └── 2024-02-07_Title_aweme_id/
    │       ├── ...mp4
    │       ├── ..._cover.jpg
    │       ├── ..._music.mp3
    │       ├── ..._data.json
    │       ├── ..._avatar.jpg
    │       ├── ...transcript.txt
    │       └── ...transcript.json
    ├── like/
    │   └── ...
    ├── mix/
    │   └── ...
    └── music/
        └── ...

Re-downloading Content

The program uses a database record + local file dual check to decide whether to skip already-downloaded content. To force re-download, you need to clean up accordingly:

Re-download a specific item

# Delete local files (folder name contains the aweme_id)
rm -rf Downloaded/AuthorName/post/*_<aweme_id>/

# Delete database record
sqlite3 dy_downloader.db "DELETE FROM aweme WHERE aweme_id = '<aweme_id>';"

Re-download all items from a specific author

rm -rf Downloaded/AuthorName/
sqlite3 dy_downloader.db "DELETE FROM aweme WHERE author_name = 'AuthorName';"

Full reset (re-download everything)

rm -rf Downloaded/
rm dy_downloader.db

Note: Deleting only the database but keeping files will NOT trigger re-download — the program scans local filenames for aweme_id to detect existing downloads. Deleting only files but keeping the database WILL trigger re-download (the program treats "in DB but missing locally" as needing retry).

FAQ

1) Why do I only get around 20 posts?

This is a common pagination risk-control behavior. Make sure:

  • browser_fallback.enabled: true
  • browser_fallback.headless: false
  • complete verification manually in the browser popup, and do not close it too early

2) Why is the progress output noisy/repeated?

By default, progress.quiet_logs: true suppresses logs during progress stage.
Use --show-warnings or -v temporarily when debugging.

3) What if cookies are expired?

Run:

python -m tools.cookie_fetcher --config config.yml

4) Why are transcript files not generated?

Check in order:

  • whether transcript.enabled is true
  • whether downloaded items are videos (image-notes are not transcribed)
  • whether OPENAI_API_KEY (or transcript.api_key) is valid
  • whether response_formats includes txt or json

5) How to view download history?

sqlite3 dy_downloader.db "SELECT aweme_id, title, author_name, datetime(download_time, 'unixepoch', 'localtime') FROM aweme ORDER BY download_time DESC LIMIT 20;"

Legacy Version (V1.0)

If you prefer the legacy script style (V1.0):

git fetch --all
git switch V1.0

Community Group

qun

Disclaimer

This project is for technical research, learning, and personal data management only. Please use it legally and responsibly:

  • Do not use it to infringe others' privacy, copyright, or other legal rights
  • Do not use it for any illegal purpose
  • Users are solely responsible for all risks and liabilities arising from usage
  • If platform policies or interfaces change and features break, this is a normal technical risk

By continuing to use this project, you acknowledge and accept the statements above.

License

This project is licensed under the MIT License. See LICENSE for details.