Release SynapseML v1.1.0 · microsoft/SynapseML

We are excited to announce the release of SynapseML v1.1 marking a host of powerful new features introduced since the initial v1.0 release. SynapseML is an open-source library that aims to streamline the development of massively scalable machine learning pipelines. It unifies several existing ML Frameworks and new Microsoft algorithms in a single, scalable API that is usable across Python, R, Scala, and Java. SynapseML is usable from any Apache Spark platform with first class enterprise support on Microsoft Fabric.

Highlights


Microsoft Fabric	AI Functions	OneLake
Build and operationalize distributed ML with SynapseML in Fabric	Apply Pandas and Spark LLM transformations with one line of code	Automatically derive AI insights for unstructured data in OneLake
Build Your First Model	Explore AI Functions	Learn More


Hugging Face	Azure AI Foundry
Use open source models hosted on Hugging Face	Run Azure AI Foundry models in your notebook
Try an Example	View Notebook

More Hightlights

Spark 3.5 Support – In this version we transitioned to Spark 3.5 as our main Spark platform.

OpenAI Ecosystem – Comprehensive improvements including global parameter defaults, GPT-4 enablement, custom endpoints/headers, GPU-accelerated embeddings with KNN, and fine-grained control over model parameters (top_p, seed, responseFormat, temperature).

ML Innovation – HuggingFaceCausalLM transformer for distributed language model evaluation, custom embedder support, and synthetic difference-in-differences causal inference module.

Platform features – Spark Native OneLake support; MSI for Azure Storage; OpenAITranslate transformer.

AI Functions in Data Wrangler on Fabric – AI Functions built into Data Wrangler in Fabric allow you to apply LLM-powered operations to your dataframe without writing a single line of code.

New Features

Documentation 📚

AI Functions ⚡

Added support for AI Functions in Pandas (#1579613, #1585011, #1596611, #1509195, #1501185, #1494610, #1494951)
Added support for AI Functions in PySpark (#1460790, #1572928, #1599735, #1439858, #1463533)
Added support for async AI Functions execution (#1529058, #1523727)
JSON response support & improved language validation. (#1551823, #1566189)
Seed param for reproducible chat/completions (API 2024-10-21). (#1551883)
Fuzzy case-insensitive matching for Classify. (#1515064)
Add AI Functions Operations to Data Wrangler (#1590130, #1638257, #1718967, #1725101, #1730446)

Azure OpenAI 🌸

Enhanced Model Parameters – Added top_p, seed, responseFormat, temperature, and subscription key support (#2410, #2329, #2324)
GPT-4 Enablement – Full GPT-4 support in OpenAIPrompt (#2248)
Custom Endpoints & Headers – Support for custom URL endpoints and HTTP headers (#2232)
GPU-Accelerated Embeddings – OpenAI embeddings with GPU-based KNN pipeline (#2157)
Embedding Dimensions Control – Configurable dimensions parameter for OpenAIEmbedding (#2215)
Global Parameter Defaults – Centralized OpenAI parameter management with Python wrapper support (#2318, #2327)
Updated OpenAI API version to 2024 (#2190)
Updated OpenAIDefaults implementation (#2415)
OpenAIPrompt bug fixes and improvements (#2334)
Added responseFormat parameter to Chat Completion (#2329)
Optimized getOptionalParams in HasOpenAITextParams (#2315)

OneLake 🌊

Add Spark Native OneLake support (#1190687)

Machine Learning 🕸️

HuggingFaceCausalLM – Transformer for evaluating language models on Spark clusters (#2301)
Custom Embedder – Extensible custom embedding transformer support (#2236)
Synthetic DiD – Synthetic difference-in-differences module for causal inference (#2095)

Azure AI Foundry 🔨

AIFoundryChatCompletion – New transformer for Azure AI Foundry chat models (#2398)
AI Foundry + OpenAI Prompt – Unified interface for OpenAI and Foundry deployments (#2404)

General ✨

Add Spark 3.5 Support – Added full Spark 3.5 compatibility with new build variants (#2052)
Python 3.11 Baseline – Upgraded to Python 3.11 as minimum version (#2193)
Fabric Billing Integration – Enhanced Fabric Cognitive Service token for billing support (#2291)
Fabric WSPL FQDN Selection – Configurable Fabric workspace FQDN endpoints (#2376)
Added Bool input support for ONNX models (#2130)
Switched to MSI-based tokens (#2221)
Updated Azure KeyVault task version (#2313)
Updated build system service principals (#2181)
Fabric & Azure Integration (#2160, #2403, #2335, #2175, #2225, #2222, #2173)

Azure AI Services & Search 🧠

Azure Search GeoJSON Support – AzureSearchWriter now handles GeoJSON data types (#2422)
Async Language Summarization – Cognitive Services Language Service with asynchronous summarization support (#2342)
Auto-convert DateType/TimestampType to ISO-8601 format (#2381)
Added support for scoring profiles in index parsing (#2383)
Fixed model checking logic (#2379)

Additional Updates

Bug Fixes 🐞

Early-exit behavior for async first-group errors; validated max_concurrency; add Conf getters/setters. (#1566664)
Early exit async + missing NaN for Series. (#1595139)
Public-preview: appropriate client init errors. (#1579646)
Handle BadRequestError for content_filter input flag. (#1552020)
AIRateLimitError param passthrough; clearer AIFuncRateLimitError messages. (#1545892, #1530448)
Prevent double delete in Spark notebook test cleanup. (#1569780)
Ensure value_counts works with numeric outputs. (#1585011)
Fix for _ResultDtype.name type. (#1586122)
Fix failed Scala tests. (#1395337)
Fixed KeyError in Python environments (#2264, #2368)
Fixed pyCodeGenImpl tag handling (#2194)
Fixed telemetry-properties header handling (#2375)
Fixed GPT review action (#2112)
Updated default Anomaly Detector version to non-preview (#2280)
Repaired failing speech tests (#2179)
Fixed TranslatorSuite unit test break (#2111)
Fixed LightGBM model loading when placed in Spark Pipeline with custom transformers (#2357)
Fixed companionModelClassName generic type variable issue (#2195)
Fixed Java class loader usage (#2135)
Fixed LangChain crash on OpenAI versions >1.0.0 (#2307)
Improved OpenAI prompt behavior on RAI (Responsible AI) errors (#2279)
Enhanced error handling in networking layer (#2412)
Fixed error handling for LangChain transformer (#2137)
Fixed token acquisition on system context (#2378)
Made setCustomHeaders compatible with PySpark (#2247)
Added trailing slash normalization for URL sets (#2364)
Switched to MWC token for tenant setting checks (#2300)
Stopped Double Releases – Streamlined release process (#2419)

Installation & Setup 💾

Fixed install instructions (#2136)
Updated CONTRIBUTING.md (#2138)
Updated Developer Setup to remove WinUtils and include ScalaTest configuration (#2244)
Removed Spark 3.2 instructions (no longer supported)
Fixed README install instructions

API & Service Documentation 📃

Updated Anomaly Detector docs (#2103)
Updated Isolation Forest documentation (#2210)
Updated multivariate anomaly detection Fabric doc (#2191)
Added analyze text document examples (#2127)
Updated to use new AnalyzeText API in docs (#2126)
Updated find_secret on Fabric documentation (#2132)
Pointed Cognitive APIs documentation to Azure AI (#2119)
Clarified default dataTransferMode is streaming, not bulk (#2377)
Added audiobook paper to README
Raised error with documentation link for find_secret (#2180)

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:


Rana Singh	Farrukh Masud	Tom Finley
Rana is the Senior Engineering Manager for SynapseML and was instrumental in improving the prompt engineering that powers AI Functions. Working alongside Tom, he helped build the feature from the ground up, ensuring high-quality and reliable AI-powered transformations. His attention to detail in refining prompts has made AI Functions more accurate and his leadership has been essential to the initiative’s success.	Farrukh is a Principal Engineer on the Code-First AI team and a prolific contributor this release. He was key in lighting up AI-powered transforms in OneLake, allowing users to apply AI transformations directly through shortcuts, dramatically expanding the reach of cognitive services across Fabric. Farrukh's contributions to Fabric integrations continue to expand the possibilities for AI-powered data workflows.	Tom is a Principal Engineer on the Code-First AI team and was pivotal in the API design for AI Functions. His thoughtful decision-making shaped AI Functions that are both powerful and intuitive. Working closely with Rana, he helped architect the feature from its earliest stages, making key choices that have shaped how users interact with AI Functions. Tom's design sensibility and technical expertise have been foundational to the feature's success.

Jessica Wang	Wendong Li	Samhitha Mamindla
Jessica is a Software Engineer on the SynapseML team and architected the AI Foundry integration. She has been a consistent and reliable contributor across multiple releases, building robust features and working directly with customers to understand their needs. Her integrations with AI Foundry and Hugging Face have been invaluable in helping SynapseML bridge the gap between closed and open source communities. We're excited for her continued impact on the team.	Wendong is a Software Engineer who recently joined the SynapseML team and has already emerged as a rising star. He single-handedly built multimodal AI Functions, enabling seamless transformations across text, images, and other data types. These are significant contributions for someone so new to the team, demonstrating both technical prowess and the ability to deliver complex features independently. We're eager to see what Wendong builds next.	Samhitha is a Software Engineer II who recently joined the SynapseML team and has quickly become pivotal in evaluating the quality and reliability of AI Functions. Samhitha works through logging infrastructure to ensure we can track, measure, and improve the performance of these features. Her meticulous approach to monitoring has been essential for continuously improving the user experience.

Virginia Roman	Elias Yousefi	Shyam Sai
Virginia is a Senior Product Manager on the SynapseML team and leads AI Functions in Fabric. She presents and shares AI Functions with customers, gathering critical feedback that shapes the roadmap. Virginia also champions the integration of AI capabilities into Data Wrangler, making AI-powered data transformations accessible to a broader audience of data professionals. Virginia's product leadership is invaluabel, and we're lucky to have her as a collaborator.	Elias is a Senior Engineer on the Code-First AI team and is building core service infrastructure for AI Functions. He is driving the next generation of integrations with AI Functions, though many of these exciting developments are still under wraps. We look forward to sharing more of his contributions as they roll out publicly.	Shyam was a key contributor during his time with SynapseML and is now Co-Founder at Frizzle. Shyam brought AI Functions to PySpark, giving Spark users access to the same capabilities already available in Pandas and expanding the feature's reach to a broader developer community. Though Shyam has moved on to his next venture at Frizzle, his contributions and presence will be missed by the SynapseML team.

Acknowledgements ❤️

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML


Mark Hamilton @mhamilton723	Markus Weimer	Avrilia Floratou
Eric Dettinger @sandshadow	Virginia Roman	Eren Orbey @orbey
Rana Singh @ranadeepsingh	Farrukh Masud @FarrukhMasud	Tom Finley @TomFinley
Brendan Walsh @BrendanWalsh	Jessica Wang @JessicaXYWang	Wendong Li @levscaut
Elias Yousefi @elyousef	Shyam Sai @sss04	Shirley Huang
Chongyu Wang	Ruoyu Jing	Bo Zhou
Paul Wang @pwang347	Kyle Cutler @kycutler	Dan Vilnoiu
Scott Votaw @svotaw	Mark Niehaus @niehaus59	Aydan Aksoylar @aydan-at-microsoft
Sheryl Zhao @sherylZhaoCode	Markus Cozowicz @eisber	Sailesh Baidya @saileshbaidya
Keerthi Yanda @KeerthiYandaOS	Kyle Rush @k-rush	Aadharsh Kannan @AKannanMSFT
Serena Ruan @serena-ruan	Cruise Li @mslhrotk @lhrotk	Jason Wang @memoryz
Haizhou (Dylan) Wang @dylanw-oss	Sarah Shy @sarahshy	Kashyap Patel @ms-kashyap
Puneet Pruthi @ppruthi	Ilya Matiach @imatiach-msft	Amir Jafari @amhjf
Nellie Gustafsson	Bogdan Crivat	Justyna Lucznik @juluczni
Richard Wydrowski @richwyd	Tania Arya @taniaarya	Adithya Mukund @adithyamukund
Roman Batoukov @RomanBat	Alexandra Savelieva @alsavelv	Jessica Wolk @msplants
Luis França @luisffranca	Paul Koch @paulbkoch	Rich Caruana
Martha Laguna @martthalch @marthalc	Jeff Zheng	Sicong Yang
Peixian Gong	Ruixin Xu	Chris Hoder
Derek Legenzoff	Misha Desai	Beverly Kodhek
Louise Han @jr-MS	Raj Rikhy	Brice Chung
Marcos Campos	Mike Estee	Kim Manis
Mitrabhanu Mohanty	Anand Raman	Sudarshan Raghunathan @drdarshan
William T. Freeman	Gregory B. Newby	John Moyer
Vidip Acharya	Ashit Gosalia	Miguel Fierro @miguelgfierro
Ismaël Mejía @iemejia	Kartavya Neema @kartavyaneema	Daniel Ciborowski @dciborow
Mark Tabladillo @marktab	Guilherme Beltramini @gcbeltramini	Akshaya Annavajhala (AK)
James Verbus @jverbus	Mopé Akande @msakande	Ikko Eltociear Ashimine @eltociear
Alexander Spiridonov @vonodiripsa	Hiroshi Yoshioka @hyoshioka0128	Frank Solomon @fbsolo-ms1
Leonard Herold @LeonardHd	David Smith @dsmith111	Denniz Svens @DennizSvens
João Moura @operte	Sean Marihugh	ONNX Team
Azure Global	Vowpal Wabbit Team	LightGBM Team
MSFT Garage Team	MSR Outreach Team	Speech SDK Team
MLflow Team	Azure Docs Team

Learn More


Visit our website for the latest docs, demos, and examples	See how to use AI Functions in Fabric Notebooks	Learn the basics of SynapseML

Read our highlights from Fabric Community Conference	Try Out AI Functions in Data Wrangler	Read our Latest Paper on Large Scale Audiobook Generation