Job Type: Contract
Job Category: IT
Job Description
Hiring: Unstructured.io Developer
Location: Remote (Boston, MA)
Contract: 6–12 Months (Extendable)
Job Summary: We are seeking an experienced Unstructured.io Developer to work on enterprise-grade data ingestion and document processing solutions. The ideal candidate will have strong hands-on experience with the Unstructured.io framework, data transformation pipelines, and integration with LLM, vector database, and search platforms. In this role, you will develop and optimize workflows for parsing, cleaning, and indexing complex enterprise documents.
Key Responsibilities
- Develop and enhance data processing pipelines using Unstructured.io to convert unstructured data (PDF, DOCX, HTML, emails, scanned documents) into structured formats.
- Integrate extracted data with Vector Databases or Search Indexing workflows for LLM/RAG applications.
- Optimize parsing performance, accuracy, and consistency across various document formats.
- Work with Python-based microservices, APIs, and orchestration frameworks.
- Collaborate with Data Engineering, ML, and Product teams to design scalable ingestion architectures.
- Implement best practices for scalable, reusable pipeline components.
- Monitor, debug, and resolve pipeline issues across staging and production environments.
Required Skills & Experience
- Overall IT Experience: 8+ Years
- 3+ years of hands-on experience implementing Unstructured.io in production environments.
- Strong experience with Python, including parsing, data transformation, and API development.
- Experience building RAG (Retrieval-Augmented Generation) or Document AI workflows.
- Hands-on experience with Vector Databases (Pinecone, Weaviate, Chroma, FAISS, Milvus, etc.).
- Familiarity with Cloud Platforms (AWS preferred).
- Experience with Docker, Git, and CI/CD pipelines.
Nice to Have
- Experience with frameworks like LangChain / LlamaIndex.
- Knowledge of NLP, embeddings, and tokenization.
- Experience integrating with LLM providers (OpenAI, Anthropic, Azure OpenAI, etc.).
- Familiarity with document OCR tools (Tesseract, Azure Form Recognizer, AWS Textract).
Required Skills
Cloud Developer SQL Application Developer