This role is a Software Engineer position on Speechify's AI team, focused on data collection for model training operations. It plays a crucial role in building high-quality datasets at petabyte scale, enabling the development of next-generation products. The team works in a 100% distributed environment that fosters collaboration and innovation.
You'll be responsible for
🔍
Finding new sources of audio data
Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline.
☁️
Operating cloud infrastructure
Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
🤝
Collaborating with scientists
Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at bigger scale and lower cost.
Skills you'll need
🐍
Proficiency with bash/Python scripting
Ability to write scripts in bash or Python within Linux environments to automate tasks and processes.
🐳
Proficiency in Docker and Infrastructure-as-Code
Experience with Docker for containerization and using Infrastructure-as-Code principles for managing cloud resources.
☁️
Experience with cloud providers (GCP)
Professional experience working with Google Cloud Platform to manage cloud infrastructure and services.