About Our Project

Text-to-Speech (TTS) Bot Challenge

Challenge Overview

The task is to develop a Python program that converts Persian text into speech by stitching together pre-recorded phonemes of the developer's own voice. The system must not rely on external APIs or libraries for the TTS conversion itself; the point is to demonstrate the text processing and sound synthesis steps directly.

Input

A text string provided by the user.

Output

The user-provided text, spoken using the pre-recorded samples of the developer's voice and saved as an audio file.

Requirements & Features

Key Components and Functionality

Phoneme Directory and Mapping


    # Example of phoneme mapping
    PHONEME_DIR = 'path_to_phoneme_directory/'
    phoneme_map = {
        'ا': 'a.wav',
        'ب': 'b.wav',
        'پا': 'pa.wav',
        # More mappings here
    }
    

The script uses a directory containing WAV files for each phoneme. A mapping dictionary associates Persian letters or letter combinations with phoneme file names.
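A mapping entry whose recording is missing would only surface once synthesis tries to open the file, so it can help to check the mapping against the directory up front. The following is a minimal sketch, assuming the `phoneme_map` layout described above; the helper name `find_missing_phonemes` is hypothetical and not part of the original script.

```python
import os

def find_missing_phonemes(phoneme_dir, phoneme_map):
    """Return the sorted file names in phoneme_map that do not exist in phoneme_dir."""
    missing = {
        filename
        for filename in phoneme_map.values()
        if not os.path.isfile(os.path.join(phoneme_dir, filename))
    }
    return sorted(missing)
```

Calling `find_missing_phonemes(PHONEME_DIR, phoneme_map)` before synthesis gives a list of recordings that still need to be made; an empty list means every mapped phoneme has a file.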

Text Normalization


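    import unicodedata

    def normalize_text(text):
        # Drop combining marks (diacritics) and punctuation; letters, digits and spaces stay
        kept = (ch for ch in text if unicodedata.category(ch)[0] not in 'MP')
        # Collapse runs of whitespace into single spaces
        return ' '.join(''.join(kept).split())
    

The function strips diacritics and punctuation and collapses whitespace, so the same word always produces the same phoneme lookup. Lowercasing is not needed because Persian script has no letter case.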
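Persian text typed on different keyboards often mixes Arabic and Persian code points for letters that look the same, which makes dictionary lookups in the phoneme map miss. One possible extra normalization step is sketched below; the table `ARABIC_TO_PERSIAN` and the function `unify_persian_letters` are hypothetical names introduced for illustration, and the table covers only the most common cases.

```python
# Hypothetical table: Arabic code points mapped to their Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({
    '\u064a': '\u06cc',  # Arabic Yeh -> Persian Yeh
    '\u0643': '\u06a9',  # Arabic Kaf -> Persian Keheh
    '\u0640': None,      # Tatweel (kashida) carries no sound, so drop it
})

def unify_persian_letters(text):
    """Replace Arabic look-alike letters so phoneme lookups are consistent."""
    return text.translate(ARABIC_TO_PERSIAN)
```

Running this before the phoneme lookup means the mapping only needs one key per letter, regardless of which keyboard layout produced the input.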

Text to Phonemes Conversion


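    def text_to_phonemes(text):
        phonemes, i = [], 0
        max_len = max(map(len, phoneme_map))
        while i < len(text):
            # Longest match first, so 'پا' is not split into 'پ' + 'ا'
            size = next((n for n in range(min(max_len, len(text) - i), 0, -1)
                         if text[i:i + n] in phoneme_map), None)
            if size:
                phonemes.append(phoneme_map[text[i:i + size]])
            i += size or 1  # skip characters with no mapping
        return phonemes
    

The function maps normalized text to phoneme files using the phoneme map defined above. It tries the longest letter combination at each position first, because a plain character-by-character loop would never match multi-letter keys such as 'پا'. Characters with no mapping are skipped.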

Audio Processing Functions


    def trim_silence(phoneme_file):
        # Remove silence from phoneme file
        pass
    
    def normalize_volume(phoneme_file):
        # Adjust volume for consistency
        pass
    

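These functions are placeholders for two clean-up steps: trimming leading and trailing silence from each recording, and normalizing volume so that phonemes recorded at different levels sound consistent.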
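Both steps can be done with the standard library alone. The sketch below is one possible approach, not the project's actual implementation: it assumes mono 16-bit PCM audio already loaded into an `array('h')` of samples, uses a simple amplitude threshold for silence and peak normalization for volume, and the function names and the default values `threshold=500` and `target_peak=30000` are illustrative assumptions.

```python
from array import array

def trim_silence_samples(samples, threshold=500):
    """Drop leading and trailing samples whose absolute value is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return array('h')  # the whole clip is below the threshold
    return samples[loud[0]:loud[-1] + 1]

def normalize_volume_samples(samples, target_peak=30000):
    """Scale samples so the loudest one reaches target_peak (peak normalization)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array('h', samples)  # nothing to scale
    gain = target_peak / peak
    # Clamp to the signed 16-bit range in case rounding pushes a value over
    return array('h', (max(-32768, min(32767, round(s * gain))) for s in samples))
```

The samples themselves can be read with the standard `wave` module (`readframes`) and `array.frombytes`. A fixed threshold is crude; quiet consonants may be clipped, so the value would need tuning against the actual recordings.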

Speech Synthesis


    def synthesize_speech(phonemes):
        # Combine phoneme WAV files into a single audio file
        speech_output = 'output.wav'
        # Audio processing logic to combine phonemes
        pass
    

This function combines the phoneme files into a single speech output, and it is where pauses and transitions between phonemes are handled.
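The combining step can be sketched with the standard `wave` module. This is a minimal illustration under stated assumptions, not the project's implementation: every phoneme file is assumed to share the same channel count, sample width and sample rate, pauses are modeled as a fixed gap of silence, and smoother transitions such as crossfades are left out. The function name `concatenate_wavs` and the `pause_ms` parameter are hypothetical.

```python
import os
import wave

def concatenate_wavs(phoneme_files, phoneme_dir, output_path, pause_ms=0):
    """Write the given WAV files back to back into output_path."""
    params = None
    chunks = []
    for filename in phoneme_files:
        with wave.open(os.path.join(phoneme_dir, filename), 'rb') as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt  # the first file defines the output format
            elif fmt != params:
                raise ValueError(f'{filename} has a different audio format')
            chunks.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError('no phoneme files to combine')
    nchannels, sampwidth, framerate = params
    # Gap between phonemes; zero bytes are silence for 16-bit PCM
    gap_frames = framerate * pause_ms // 1000
    gap = b'\x00' * (gap_frames * nchannels * sampwidth)
    with wave.open(output_path, 'wb') as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(gap.join(chunks))
    return output_path
```

A call such as `concatenate_wavs(phonemes, PHONEME_DIR, 'output.wav', pause_ms=20)` would take the list returned by `text_to_phonemes` and write one file. Rejecting mismatched formats up front avoids silently producing audio that plays at the wrong speed.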

Usage and Application

Input Requirement: The user inputs Persian text via console.
Output: The synthesized speech is saved as an audio file.
Customization: Modify phoneme mapping and add recordings for other languages or combinations.

Challenges and Considerations

This project provides insight into the mechanics of TTS systems and offers a foundational approach for further enhancements.
