Khant Sint Heinn
NLP Engineer & Data Foundation Specialist
Pakokku, Myanmar
About
Passionate Machine Learning Engineer focused on NLP and data-centric AI. Experienced in building robust data foundations for low-resource languages, with a proven track record of curating open-source datasets and developing specialized linguistic tools to democratize AI innovation.
Experience
- Founder & Lead AI Scientist @ DatarrX Foundation
- Global - Remote
Summary:
- Leading DatarrX as a non-profit foundation dedicated to building a high-quality data foundation for the Burmese language in the AI era. Responsible for defining the long-term vision, mission, and technical roadmap while orchestrating a vibrant open-source community.
Responsibilities:
- Setting the strategic vision and roadmap to transform Burmese into a data-rich language for AI innovation.
- Reviewing, validating, and merging community contributions across GitHub, Hugging Face, and Kaggle repositories.
- Mentoring contributors and maintaining high coding/data quality standards through rigorous code reviews and documentation.
- Designing and implementing scalable data collection and synthetic generation pipelines to bridge language gaps.
- Facilitating collaborative workflows between developers, linguists, and tech enthusiasts to foster an inclusive AI ecosystem.
Achievements:
- Built and maintained an ecosystem of 30+ open-source datasets and tools.
- Translated the comprehensive LLM Course and contributed it to the local community, making AI education more accessible.
- Maintained a transparent and open contribution model that encourages multi-disciplinary participation.
Projects
Skills
Education
University of the People
Computer Science
Languages
- Burmese: Native speaker
- English: Intermediate
Interests
- Artificial Intelligence: NLP Research, Large Language Models, Data-Centric AI
- Open Source: Community Building, Democratizing AI, Collaborative Development
- Linguistics: Computational Linguistics, Script Morphology, Language Preservation