This repo provides a command-line tool for performing automatic speech-to-text tasks (i.e., "transcription") using open-source models from the Hugging Face Hub. For interactive tasks, it allows users to ...
Abstract: Image caption generation is one of the challenging tasks in the field of artificial intelligence. It is used to generate a textual description for a given picture. But due to the recent ...
An ESP32 client that captures audio over I2S and posts WAV to a server. A lightweight Flask/Gunicorn server that returns JSON transcriptions via speech_recognition. Designed for deterministic embedded ...