We are excited to announce the release of SynapseML v1.1 marking a host of powerful new features introduced since the initial v1.0 release. SynapseML is an open-source library that aims to streamline the development of massively scalable machine learning pipelines. It unifies several existing ML Frameworks and new Microsoft algorithms in a single, scalable API that is usable across Python, R, Scala, and Java. SynapseML is usable from any Apache Spark platform with first class enterprise support on Microsoft Fabric.
Highlights
![]() |
![]() |
![]() |
|---|---|---|
| Microsoft Fabric | AI Functions | OneLake |
| Build and operationalize distributed ML with SynapseML in Fabric | Apply Pandas and Spark LLM transformations with one line of code | Automatically derive AI insights for unstructured data in OneLake |
| Build Your First Model | Explore AI Functions | Learn More |
![]() |
![]() |
|---|---|
| Hugging Face | Azure AI Foundry |
| Use open source models hosted on Hugging Face | Run Azure AI Foundry models in your notebook |
| Try an Example | View Notebook |
More Hightlights
Spark 3.5 Support – In this version we transitioned to Spark 3.5 as our main Spark platform.
OpenAI Ecosystem – Comprehensive improvements including global parameter defaults, GPT-4 enablement, custom endpoints/headers, GPU-accelerated embeddings with KNN, and fine-grained control over model parameters (top_p, seed, responseFormat, temperature).
ML Innovation – HuggingFaceCausalLM transformer for distributed language model evaluation, custom embedder support, and synthetic difference-in-differences causal inference module.
Platform features – Spark Native OneLake support; MSI for Azure Storage; OpenAITranslate transformer.
AI Functions in Data Wrangler on Fabric – AI Functions built into Data Wrangler in Fabric allow you to apply LLM-powered operations to your dataframe without writing a single line of code.
New Features
Documentation 📚
- AI Functions
- AI Powered Transforms in OneLake
- Azure OpenAI for Big Data in SynapseML
- AI Functions in Data Wrangler
- AI Foundry
- Hugging Face
AI Functions ⚡
- Added support for AI Functions in Pandas (#1579613, #1585011, #1596611, #1509195, #1501185, #1494610, #1494951)
- Added support for AI Functions in PySpark (#1460790, #1572928, #1599735, #1439858, #1463533)
- Added support for async AI Functions execution (#1529058, #1523727)
- JSON response support & improved language validation. (#1551823, #1566189)
- Seed param for reproducible chat/completions (API 2024-10-21). (#1551883)
- Fuzzy case-insensitive matching for Classify. (#1515064)
- Add AI Functions Operations to Data Wrangler (#1590130, #1638257, #1718967, #1725101, #1730446)
Azure OpenAI 🌸
- Enhanced Model Parameters – Added top_p, seed, responseFormat, temperature, and subscription key support (#2410, #2329, #2324)
- GPT-4 Enablement – Full GPT-4 support in OpenAIPrompt (#2248)
- Custom Endpoints & Headers – Support for custom URL endpoints and HTTP headers (#2232)
- GPU-Accelerated Embeddings – OpenAI embeddings with GPU-based KNN pipeline (#2157)
- Embedding Dimensions Control – Configurable dimensions parameter for OpenAIEmbedding (#2215)
- Global Parameter Defaults – Centralized OpenAI parameter management with Python wrapper support (#2318, #2327)
- Updated OpenAI API version to 2024 (#2190)
- Updated OpenAIDefaults implementation (#2415)
- OpenAIPrompt bug fixes and improvements (#2334)
- Added responseFormat parameter to Chat Completion (#2329)
- Optimized getOptionalParams in HasOpenAITextParams (#2315)
OneLake 🌊
- Add Spark Native OneLake support (#1190687)
Machine Learning 🕸️
- HuggingFaceCausalLM – Transformer for evaluating language models on Spark clusters (#2301)
- Custom Embedder – Extensible custom embedding transformer support (#2236)
- Synthetic DiD – Synthetic difference-in-differences module for causal inference (#2095)
Azure AI Foundry 🔨
- AIFoundryChatCompletion – New transformer for Azure AI Foundry chat models (#2398)
- AI Foundry + OpenAI Prompt – Unified interface for OpenAI and Foundry deployments (#2404)
General ✨
- Add Spark 3.5 Support – Added full Spark 3.5 compatibility with new build variants (#2052)
- Python 3.11 Baseline – Upgraded to Python 3.11 as minimum version (#2193)
- Fabric Billing Integration – Enhanced Fabric Cognitive Service token for billing support (#2291)
- Fabric WSPL FQDN Selection – Configurable Fabric workspace FQDN endpoints (#2376)
- Added Bool input support for ONNX models (#2130)
- Switched to MSI-based tokens (#2221)
- Updated Azure KeyVault task version (#2313)
- Updated build system service principals (#2181)
- Fabric & Azure Integration (#2160, #2403, #2335, #2175, #2225, #2222, #2173)
Azure AI Services & Search 🧠
- Azure Search GeoJSON Support – AzureSearchWriter now handles GeoJSON data types (#2422)
- Async Language Summarization – Cognitive Services Language Service with asynchronous summarization support (#2342)
- Auto-convert DateType/TimestampType to ISO-8601 format (#2381)
- Added support for scoring profiles in index parsing (#2383)
- Fixed model checking logic (#2379)
Additional Updates
Bug Fixes 🐞
- Early-exit behavior for async first-group errors; validated max_concurrency; add Conf getters/setters. (#1566664)
- Early exit async + missing NaN for Series. (#1595139)
- Public-preview: appropriate client init errors. (#1579646)
- Handle BadRequestError for content_filter input flag. (#1552020)
- AIRateLimitError param passthrough; clearer AIFuncRateLimitError messages. (#1545892, #1530448)
- Prevent double delete in Spark notebook test cleanup. (#1569780)
- Ensure value_counts works with numeric outputs. (#1585011)
- Fix for _ResultDtype.name type. (#1586122)
- Fix failed Scala tests. (#1395337)
- Fixed KeyError in Python environments (#2264, #2368)
- Fixed pyCodeGenImpl tag handling (#2194)
- Fixed telemetry-properties header handling (#2375)
- Fixed GPT review action (#2112)
- Updated default Anomaly Detector version to non-preview (#2280)
- Repaired failing speech tests (#2179)
- Fixed TranslatorSuite unit test break (#2111)
- Fixed LightGBM model loading when placed in Spark Pipeline with custom transformers (#2357)
- Fixed companionModelClassName generic type variable issue (#2195)
- Fixed Java class loader usage (#2135)
- Fixed LangChain crash on OpenAI versions >1.0.0 (#2307)
- Improved OpenAI prompt behavior on RAI (Responsible AI) errors (#2279)
- Enhanced error handling in networking layer (#2412)
- Fixed error handling for LangChain transformer (#2137)
- Fixed token acquisition on system context (#2378)
- Made setCustomHeaders compatible with PySpark (#2247)
- Added trailing slash normalization for URL sets (#2364)
- Switched to MWC token for tenant setting checks (#2300)
- Stopped Double Releases – Streamlined release process (#2419)
Installation & Setup 💾
- Fixed install instructions (#2136)
- Updated CONTRIBUTING.md (#2138)
- Updated Developer Setup to remove WinUtils and include ScalaTest configuration (#2244)
- Removed Spark 3.2 instructions (no longer supported)
- Fixed README install instructions
API & Service Documentation 📃
- Updated Anomaly Detector docs (#2103)
- Updated Isolation Forest documentation (#2210)
- Updated multivariate anomaly detection Fabric doc (#2191)
- Added analyze text document examples (#2127)
- Updated to use new AnalyzeText API in docs (#2126)
- Updated find_secret on Fabric documentation (#2132)
- Pointed Cognitive APIs documentation to Azure AI (#2119)
- Clarified default dataTransferMode is streaming, not bulk (#2377)
- Added audiobook paper to README
- Raised error with documentation link for find_secret (#2180)
Contributor Spotlight
We are excited to highlight the contributions of the following SynapseML contributors:
![]() |
![]() |
![]() |
|---|---|---|
| Rana Singh | Farrukh Masud | Tom Finley |
| Rana is the Senior Engineering Manager for SynapseML and was instrumental in improving the prompt engineering that powers AI Functions. Working alongside Tom, he helped build the feature from the ground up, ensuring high-quality and reliable AI-powered transformations. His attention to detail in refining prompts has made AI Functions more accurate and his leadership has been essential to the initiative’s success. | Farrukh is a Principal Engineer on the Code-First AI team and a prolific contributor this release. He was key in lighting up AI-powered transforms in OneLake, allowing users to apply AI transformations directly through shortcuts, dramatically expanding the reach of cognitive services across Fabric. Farrukh's contributions to Fabric integrations continue to expand the possibilities for AI-powered data workflows. | Tom is a Principal Engineer on the Code-First AI team and was pivotal in the API design for AI Functions. His thoughtful decision-making shaped AI Functions that are both powerful and intuitive. Working closely with Rana, he helped architect the feature from its earliest stages, making key choices that have shaped how users interact with AI Functions. Tom's design sensibility and technical expertise have been foundational to the feature's success. |
![]() |
![]() |
![]() |
| Jessica Wang | Wendong Li | Samhitha Mamindla |
| Jessica is a Software Engineer on the SynapseML team and architected the AI Foundry integration. She has been a consistent and reliable contributor across multiple releases, building robust features and working directly with customers to understand their needs. Her integrations with AI Foundry and Hugging Face have been invaluable in helping SynapseML bridge the gap between closed and open source communities. We're excited for her continued impact on the team. | Wendong is a Software Engineer who recently joined the SynapseML team and has already emerged as a rising star. He single-handedly built multimodal AI Functions, enabling seamless transformations across text, images, and other data types. These are significant contributions for someone so new to the team, demonstrating both technical prowess and the ability to deliver complex features independently. We're eager to see what Wendong builds next. | Samhitha is a Software Engineer II who recently joined the SynapseML team and has quickly become pivotal in evaluating the quality and reliability of AI Functions. Samhitha works through logging infrastructure to ensure we can track, measure, and improve the performance of these features. Her meticulous approach to monitoring has been essential for continuously improving the user experience. |
![]() |
![]() |
![]() |
| Virginia Roman | Elias Yousefi | Shyam Sai |
| Virginia is a Senior Product Manager on the SynapseML team and leads AI Functions in Fabric. She presents and shares AI Functions with customers, gathering critical feedback that shapes the roadmap. Virginia also champions the integration of AI capabilities into Data Wrangler, making AI-powered data transformations accessible to a broader audience of data professionals. Virginia's product leadership is invaluabel, and we're lucky to have her as a collaborator. | Elias is a Senior Engineer on the Code-First AI team and is building core service infrastructure for AI Functions. He is driving the next generation of integrations with AI Functions, though many of these exciting developments are still under wraps. We look forward to sharing more of his contributions as they roll out publicly. | Shyam was a key contributor during his time with SynapseML and is now Co-Founder at Frizzle. Shyam brought AI Functions to PySpark, giving Spark users access to the same capabilities already available in Pandas and expanding the feature's reach to a broader developer community. Though Shyam has moved on to his next venture at Frizzle, his contributions and presence will be missed by the SynapseML team. |
Acknowledgements ❤️
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML
| Mark Hamilton @mhamilton723 | Markus Weimer | Avrilia Floratou |
| Eric Dettinger @sandshadow | Virginia Roman | Eren Orbey @orbey |
| Rana Singh @ranadeepsingh | Farrukh Masud @FarrukhMasud | Tom Finley @TomFinley |
| Brendan Walsh @BrendanWalsh | Jessica Wang @JessicaXYWang | Wendong Li @levscaut |
| Elias Yousefi @elyousef | Shyam Sai @sss04 | Shirley Huang |
| Chongyu Wang | Ruoyu Jing | Bo Zhou |
| Paul Wang @pwang347 | Kyle Cutler @kycutler | Dan Vilnoiu |
| Scott Votaw @svotaw | Mark Niehaus @niehaus59 | Aydan Aksoylar @aydan-at-microsoft |
| Sheryl Zhao @sherylZhaoCode | Markus Cozowicz @eisber | Sailesh Baidya @saileshbaidya |
| Keerthi Yanda @KeerthiYandaOS | Kyle Rush @k-rush | Aadharsh Kannan @AKannanMSFT |
| Serena Ruan @serena-ruan | Cruise Li @mslhrotk @lhrotk | Jason Wang @memoryz |
| Haizhou (Dylan) Wang @dylanw-oss | Sarah Shy @sarahshy | Kashyap Patel @ms-kashyap |
| Puneet Pruthi @ppruthi | Ilya Matiach @imatiach-msft | Amir Jafari @amhjf |
| Nellie Gustafsson | Bogdan Crivat | Justyna Lucznik @juluczni |
| Richard Wydrowski @richwyd | Tania Arya @taniaarya | Adithya Mukund @adithyamukund |
| Roman Batoukov @RomanBat | Alexandra Savelieva @alsavelv | Jessica Wolk @msplants |
| Luis França @luisffranca | Paul Koch @paulbkoch | Rich Caruana |
| Martha Laguna @martthalch @marthalc | Jeff Zheng | Sicong Yang |
| Peixian Gong | Ruixin Xu | Chris Hoder |
| Derek Legenzoff | Misha Desai | Beverly Kodhek |
| Louise Han @jr-MS | Raj Rikhy | Brice Chung |
| Marcos Campos | Mike Estee | Kim Manis |
| Mitrabhanu Mohanty | Anand Raman | Sudarshan Raghunathan @drdarshan |
| William T. Freeman | Gregory B. Newby | John Moyer |
| Vidip Acharya | Ashit Gosalia | Miguel Fierro @miguelgfierro |
| Ismaël Mejía @iemejia | Kartavya Neema @kartavyaneema | Daniel Ciborowski @dciborow |
| Mark Tabladillo @marktab | Guilherme Beltramini @gcbeltramini | Akshaya Annavajhala (AK) |
| James Verbus @jverbus | Mopé Akande @msakande | Ikko Eltociear Ashimine @eltociear |
| Alexander Spiridonov @vonodiripsa | Hiroshi Yoshioka @hyoshioka0128 | Frank Solomon @fbsolo-ms1 |
| Leonard Herold @LeonardHd | David Smith @dsmith111 | Denniz Svens @DennizSvens |
| João Moura @operte | Sean Marihugh | ONNX Team |
| Azure Global | Vowpal Wabbit Team | LightGBM Team |
| MSFT Garage Team | MSR Outreach Team | Speech SDK Team |
| MLflow Team | Azure Docs Team |
Learn More
Full Changelog: v1.0.0...v1.1.0



















