soundwave/docs/LYRICS_FEATURE.md
Iulian 51679d1943 Initial commit - SoundWave v1.0
2025-12-16 23:43:07 +00:00

Lyrics Feature - SoundWave

Overview

SoundWave now includes automatic lyrics fetching and synchronized display powered by the LRCLIB API. This feature provides:

  • Automatic lyrics fetching for newly downloaded audio
  • Synchronized lyrics display with real-time highlighting
  • Caching system to minimize API requests
  • Background polling to gradually build the lyrics library
  • Manual controls for fetching, updating, and managing lyrics

How It Works

1. Automatic Fetching

When you download an audio file, SoundWave automatically:

  1. Extracts metadata (title, artist, duration)
  2. Queries the LRCLIB API for lyrics
  3. Stores synchronized (.lrc) or plain text lyrics
  4. Caches results to avoid duplicate requests
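
The lookup in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not SoundWave's actual implementation: the helper names build_lrclib_url and pick_lyrics are hypothetical, and the /api/get path, its query parameters, and the response fields (syncedLyrics, plainLyrics, instrumental) are assumptions based on LRCLIB's public API that should be checked against the LRCLIB documentation.

```python
from urllib.parse import urlencode

LRCLIB_INSTANCE = "https://lrclib.net"  # default instance named in this document


def build_lrclib_url(title, artist, duration, instance=LRCLIB_INSTANCE):
    """Build a lookup URL from the extracted metadata (hypothetical helper)."""
    query = urlencode({
        "track_name": title,
        "artist_name": artist,
        "duration": int(round(duration)),  # whole seconds
    })
    return f"{instance}/api/get?{query}"


def pick_lyrics(payload):
    """Prefer synchronized lyrics, then plain text (hypothetical helper)."""
    if payload.get("instrumental"):
        return {"kind": "instrumental", "text": ""}
    if payload.get("syncedLyrics"):
        return {"kind": "synced", "text": payload["syncedLyrics"]}
    if payload.get("plainLyrics"):
        return {"kind": "plain", "text": payload["plainLyrics"]}
    return {"kind": "not_found", "text": ""}
```

The URL can then be requested with any HTTP client and the decoded JSON passed to pick_lyrics; the result says whether to store synchronized lyrics, plain lyrics, or an instrumental / not-found marker.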

2. Background Polling

A Celery beat schedule runs periodic tasks:

  • Every hour: Auto-fetch lyrics for up to 50 tracks without lyrics
  • Weekly (Sunday 3 AM): Clean up old cache entries (30+ days)
  • Weekly (Sunday 4 AM): Retry failed lyrics fetches (7+ days old)

3. Smart Caching

Two-level caching system:

  1. Django Cache: In-memory cache for API responses (7 days)
  2. Database Cache: LyricsCache table stores lyrics by title/artist/duration

This ensures:

  • Minimal API requests (respecting rate limits)
  • Fast lyrics retrieval
  • Shared cache across tracks with the same metadata
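
The two-level lookup can be pictured roughly as below. This sketch uses plain dictionaries as stand-ins for the Django cache and the LyricsCache table, and the class name, the cache_key helper, and the normalization rules are all assumptions made for illustration; the real service may key and expire entries differently.

```python
import time

API_CACHE_TTL = 7 * 24 * 3600  # 7 days, as stated above


def cache_key(title, artist, duration):
    """Normalize metadata so equivalent tracks share one entry (assumed rules)."""
    return (title.strip().lower(), artist.strip().lower(), int(round(duration)))


class TwoLevelLyricsCache:
    def __init__(self, now=time.time):
        self._memory = {}    # stand-in for the Django cache: key -> (expires_at, lyrics)
        self._database = {}  # stand-in for the LyricsCache table: key -> lyrics
        self._now = now

    def get(self, title, artist, duration):
        key = cache_key(title, artist, duration)
        entry = self._memory.get(key)
        if entry and entry[0] > self._now():
            return entry[1]              # level 1 hit
        lyrics = self._database.get(key)
        if lyrics is not None:
            self._remember(key, lyrics)  # promote to level 1
        return lyrics                    # None means the API has to be asked

    def put(self, title, artist, duration, lyrics):
        key = cache_key(title, artist, duration)
        self._database[key] = lyrics
        self._remember(key, lyrics)

    def _remember(self, key, lyrics):
        self._memory[key] = (self._now() + API_CACHE_TTL, lyrics)
```

Because the key is built from title, artist, and duration rather than from the YouTube ID, two uploads of the same song resolve to the same cached entry.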

API Endpoints

Get Lyrics for Audio

GET /api/audio/{youtube_id}/lyrics/

Returns the stored lyrics data. If no fetch has been attempted yet for this track, an asynchronous fetch is triggered instead.

Manually Fetch Lyrics

POST /api/audio/{youtube_id}/lyrics/fetch/
Body: { "force": true }

Forces an immediate lyrics fetch from the LRCLIB API.
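
As an illustration, the manual fetch request could be built from a script like this. The function name build_fetch_request and the base URL in the usage note are assumptions; only the path and body come from the endpoint description above.

```python
import json
from urllib.request import Request


def build_fetch_request(base_url, youtube_id, force=True):
    """Build (but do not send) the manual-fetch request (hypothetical helper)."""
    body = json.dumps({"force": force}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/audio/{youtube_id}/lyrics/fetch/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, build_fetch_request("http://localhost:8000", "abc123") can be passed to urllib.request.urlopen to send it. Any authentication headers your deployment requires are not covered here and would need to be added.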

Update Lyrics Manually

PUT /api/audio/{youtube_id}/lyrics/
Body: {
  "synced_lyrics": "[00:12.00]Lyrics text...",
  "plain_lyrics": "Plain text lyrics...",
  "is_instrumental": false,
  "language": "en"
}

Delete Lyrics

DELETE /api/audio/{youtube_id}/lyrics/

Batch Fetch

POST /api/audio/lyrics/fetch_batch/
Body: { "youtube_ids": ["abc123", "def456"] }

Fetch All Missing

POST /api/audio/lyrics/fetch_all_missing/
Body: { "limit": 50 }

Statistics

GET /api/audio/lyrics/stats/

Returns:

{
  "total_audio": 1250,
  "total_lyrics_attempted": 980,
  "with_synced_lyrics": 720,
  "with_plain_lyrics": 150,
  "instrumental": 30,
  "failed": 80,
  "coverage_percentage": 72.0
}
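
The sample figures are consistent with coverage being the share of tracks that have synchronized lyrics, plain lyrics, or an instrumental flag. That reading is inferred from the numbers above rather than confirmed from the code, so treat the following as a sketch of the arithmetic only:

```python
def coverage_percentage(stats):
    """Share of tracks with a usable result: synced, plain, or instrumental."""
    covered = (
        stats["with_synced_lyrics"]
        + stats["with_plain_lyrics"]
        + stats["instrumental"]
    )
    if stats["total_audio"] == 0:
        return 0.0
    return round(100.0 * covered / stats["total_audio"], 1)


sample = {
    "total_audio": 1250,
    "total_lyrics_attempted": 980,
    "with_synced_lyrics": 720,
    "with_plain_lyrics": 150,
    "instrumental": 30,
    "failed": 80,
}
```

With the sample values, 720 + 150 + 30 = 900 covered tracks out of 1250, which gives 72.0. The four outcome counts (including the 80 failures) also add up to the 980 attempted tracks.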

Frontend Components

LyricsPlayer Component

import LyricsPlayer from '@/components/LyricsPlayer';

<LyricsPlayer
  youtubeId="abc123"
  currentTime={45.2}
  onClose={() => setShowLyrics(false)}
  embedded={false}
/>

Features:

  • Real-time synchronized highlighting
  • Auto-scroll with toggle
  • Synced/Plain text tabs
  • Retry fetch button
  • Instrumental detection

Props

  • youtubeId: YouTube video ID
  • currentTime: Current playback time in seconds
  • onClose: Callback when closed (optional)
  • embedded: Compact mode flag (optional)
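
Synchronized highlighting comes down to finding the last lyric line whose timestamp is not later than currentTime. The component itself is a React component; the sketch below shows the lookup in Python for consistency with the other examples, and the function name active_line_index is hypothetical.

```python
from bisect import bisect_right


def active_line_index(timestamps, current_time):
    """Return the index of the line to highlight, or -1 before the first line.

    `timestamps` must be sorted in ascending order, in seconds.
    """
    return bisect_right(timestamps, current_time) - 1
```

A binary search keeps the lookup cheap even when currentTime changes many times per second during playback.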

Database Models

Lyrics Model

# Simplified field listing: field options (on_delete, max_length, defaults) are omitted.
class Lyrics(models.Model):
    audio = OneToOneField(Audio)
    synced_lyrics = TextField()
    plain_lyrics = TextField()
    is_instrumental = BooleanField()
    source = CharField()  # 'lrclib', 'genius', 'manual'
    language = CharField()
    fetched_date = DateTimeField()
    fetch_attempted = BooleanField()
    fetch_attempts = IntegerField()
    last_error = TextField()

LyricsCache Model

# Simplified field listing: field options (max_length, defaults) are omitted.
class LyricsCache(models.Model):
    title = CharField()
    artist_name = CharField()
    album_name = CharField()
    duration = IntegerField()
    synced_lyrics = TextField()
    plain_lyrics = TextField()
    is_instrumental = BooleanField()
    language = CharField()
    source = CharField()
    cached_date = DateTimeField()
    last_accessed = DateTimeField()
    access_count = IntegerField()
    not_found = BooleanField()

Celery Tasks

fetch_lyrics_for_audio

from audio.tasks_lyrics import fetch_lyrics_for_audio

fetch_lyrics_for_audio.delay('youtube_id', force=False)

fetch_lyrics_batch

from audio.tasks_lyrics import fetch_lyrics_batch

fetch_lyrics_batch.delay(['id1', 'id2', 'id3'], delay_seconds=2)

auto_fetch_lyrics

from audio.tasks_lyrics import auto_fetch_lyrics

auto_fetch_lyrics.delay(limit=50, max_attempts=3)

cleanup_lyrics_cache

from audio.tasks_lyrics import cleanup_lyrics_cache

cleanup_lyrics_cache.delay(days_old=30)

refetch_failed_lyrics

from audio.tasks_lyrics import refetch_failed_lyrics

refetch_failed_lyrics.delay(days_old=7, limit=20)

Configuration

Celery Beat Schedule

Located in backend/config/celery.py:

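from celery.schedules import crontab

# `app` is the Celery application defined in this file.
app.conf.beat_schedule = {
    'auto-fetch-lyrics': {
        'task': 'audio.auto_fetch_lyrics',
        'schedule': crontab(minute=0),  # At minute 0 of every hour
        'kwargs': {'limit': 50, 'max_attempts': 3},
    },
    # ... more tasks
}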

LRCLIB Instance

Default: https://lrclib.net

To use a custom instance:

from audio.lyrics_service import LyricsService

service = LyricsService(lrclib_instance='https://custom.lrclib.net')

LRC Format

Synchronized lyrics use the LRC format:

[ar: Artist Name]
[ti: Song Title]
[al: Album Name]
[00:12.00]First line of lyrics
[00:15.50]Second line of lyrics
[00:18.20]Third line of lyrics

Timestamp format: [mm:ss.xx]

  • mm: Minutes (2 digits)
  • ss: Seconds (2 digits)
  • xx: Centiseconds (2 digits)
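
A small parser makes the format concrete. This is an illustrative sketch rather than the parser SoundWave ships: the function name parse_lrc is hypothetical, metadata tags such as [ar: ...] are simply skipped, and lines carrying several timestamps are expanded into one entry per timestamp.

```python
import re

# A single [mm:ss.xx] tag; the fractional part is optional.
_TIMESTAMP = re.compile(r"\[(\d{1,3}):(\d{2}(?:\.\d{1,3})?)\]")


def parse_lrc(text):
    """Return a sorted list of (seconds, line) pairs; metadata tags are skipped."""
    lines = []
    for raw in text.splitlines():
        raw = raw.strip()
        pos = 0
        stamps = []
        # Consume every timestamp tag at the start of the line.
        while True:
            match = _TIMESTAMP.match(raw, pos)
            if not match:
                break
            stamps.append(int(match.group(1)) * 60 + float(match.group(2)))
            pos = match.end()
        lyric = raw[pos:].strip()
        for seconds in stamps:
            lines.append((seconds, lyric))
    lines.sort(key=lambda pair: pair[0])
    return lines
```

Applied to the example above, the three metadata lines produce no entries and the three lyric lines come back with timestamps of 12.0, 15.5, and 18.2 seconds.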

Admin Interface

Django Admin provides:

Lyrics Admin

  • List view with filters (source, language, fetch status)
  • Search by audio title/channel/youtube_id
  • Edit synced/plain lyrics
  • View fetch attempts and errors

LyricsCache Admin

  • List view with filters (source, not_found, date)
  • Search by title/artist
  • View access count statistics
  • Bulk action: Clear not_found entries

Rate Limiting

To avoid overwhelming the LRCLIB API:

  1. Request delays: 1-2 second delays between batch requests
  2. Caching: 7-day cache for successful fetches, 1-day for not_found
  3. Max attempts: Stop after 3-5 failed attempts
  4. Retry backoff: Wait 7+ days before retrying failed fetches
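
Rules 3 and 4 combine into a simple eligibility check. The sketch below assumes a limit of 3 attempts and a 7-day backoff, matching the defaults shown elsewhere in this document; the function name should_retry is hypothetical and the actual task may apply these rules differently.

```python
from datetime import datetime, timedelta


def should_retry(fetch_attempts, last_attempt, now, max_attempts=3, backoff_days=7):
    """Decide whether a failed lyrics fetch is due for another try."""
    if fetch_attempts >= max_attempts:
        return False  # give up after too many failures
    return now - last_attempt >= timedelta(days=backoff_days)
```

For example, a track with one failed attempt eight days ago is eligible, while one that failed two days ago, or one that has already failed three times, is not.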

Troubleshooting

No lyrics found

  • Check if track metadata (title, artist) is accurate
  • Try manual fetch with force=true
  • Check that the LRCLIB database has lyrics for this track
  • Verify track isn't instrumental

Sync issues

  • Ensure audio duration matches lyrics timing
  • Check LRC format is valid (use validator)
  • Verify that the currentTime prop is updated correctly

Performance

  • Monitor cache hit rate: /api/audio/lyrics-cache/stats/
  • Clear old not_found entries regularly
  • Adjust Celery beat schedule if needed

Credits

Lyrics are provided by the LRCLIB API (https://lrclib.net).

License

This feature is part of SoundWave and follows the same MIT license.