We’re excited to announce that 191 ACCESS community members have used the web user interface (UI) to access the new inference service on Jetstream2. Since its release in April 2025, we have continued to develop and improve the service to keep pace with rapid developments and evolving community needs.
We continue to update the models offered based on community use and feedback. To strengthen performance on complex reasoning tasks, we upgraded the DeepSeek R1 model to the 0528 minor version. We also retired the Llama 3.3 model in favor of the multi-modal Llama 4 Scout.
These technologies evolve daily, so keep that in mind when using any guide or reference material. To assist the Jetstream2 community, we wrote an Orientation to Running Large Language Models on Jetstream2 guide, which is available on the documentation site.
With support for the service in mind, we built automated monitoring that detects and alerts when any model back-end goes offline. This lets our team react quickly should the community hit any roadblocks when using the service.
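The core of such a monitor is simple: probe each back-end's health endpoint and collect the names of any that fail to respond. The sketch below shows this logic; the back-end names, URLs, and the existence of a plain HTTP health route are assumptions for illustration, not the production setup.

```python
import urllib.request
import urllib.error

def check_backend(url: str, timeout: float = 5.0) -> bool:
    """Return True if the back-end's (assumed) health URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def find_offline(backends: dict[str, str], probe=check_backend) -> list[str]:
    """Return the names of back-ends whose health probe failed.

    The probe is injectable so the detection logic can be tested
    without touching the network.
    """
    return [name for name, url in backends.items() if not probe(url)]

# Hypothetical back-end map; any names returned would be sent to an alerting hook.
backends = {
    "deepseek-r1": "http://deepseek.internal/health",
    "llama-4-scout": "http://llama.internal/health",
}
```

In practice a loop like this would run on a schedule, with the returned list fed into whatever alerting channel the team uses.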
Future plans for the service include a self-healing mechanism that will automatically recover a model back-end when it goes offline. You can monitor the progress here.
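One common way to build such a recovery step on top of health monitoring is to restart a back-end only after several consecutive failed checks, to avoid flapping on a single transient error. This is a minimal sketch of that idea, not the planned implementation; the probe and restart hooks are injected placeholders.

```python
def should_restart(recent_checks: list[bool], threshold: int = 3) -> bool:
    """Restart only once the last `threshold` health checks have all failed."""
    return len(recent_checks) >= threshold and not any(recent_checks[-threshold:])

def heal_pass(backends, probe, restart, history, threshold: int = 3):
    """Run one monitoring pass; restart any back-end past the failure threshold.

    `probe(url) -> bool` and `restart(name)` are caller-supplied hooks
    (e.g. an HTTP health check and a container restart command).
    """
    for name, url in backends.items():
        history.setdefault(name, []).append(probe(url))
        if should_restart(history[name], threshold):
            restart(name)
            history[name].clear()  # reset the window after a recovery attempt
```

Keeping the decision (`should_restart`) separate from the side effects makes the policy easy to test and tune independently of the infrastructure it drives.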
Another update coming soon is an authenticated option for accessing the inference APIs from networks outside of Jetstream2.
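Once authenticated access lands, a client request would likely carry a credential in a header. The sketch below builds such a request assuming an OpenAI-style chat-completions schema and bearer-token auth; the base URL, model name, and auth scheme are placeholders, not the actual service contract.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated chat-completion request.

    Assumes an OpenAI-compatible endpoint and bearer-token
    authentication; adjust to whatever scheme the service announces.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical usage; send with urllib.request.urlopen(req).
req = build_chat_request("https://inference.example.org/v1",
                         "YOUR_API_KEY", "llama-4-scout", "Hello!")
```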
Users can engage with the broader Jetstream2 community for support and collaboration by joining the inference-service channel in the Jetstream2 community chat. This space enables discussions on best practices, troubleshooting, and sharing ideas on how these LLMs can be effectively applied across various domains.