
# Deploy InferenceService with models from ModelScope Hub

To deploy models from the ModelScope Hub, specify the `storageUri` field of the InferenceService YAML in the following format:

```
ms://${NAMESPACE}/${MODEL}:${REVISION}
```

The `:${REVISION}` suffix is optional.

For example: `ms://qwen/Qwen2-0.5B-Instruct`
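To pin a specific revision, append it after a colon. A minimal sketch, assuming the repository's default `master` branch as the revision:

```
ms://qwen/Qwen2-0.5B-Instruct:master
```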

ModelScope is one of the largest model hubs in China, hosting popular models such as Qwen, DeepSeek, and many others.

## Public ModelScope Models

If no credential is provided, an anonymous client will be used to download the model from the ModelScope repository.

## Private ModelScope Models

KServe supports authenticating with an `MS_TOKEN` when downloading models. Create a Kubernetes Secret to store the ModelScope token:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: storage-config
type: Opaque
data:
  MS_TOKEN: bXN0X1ZOVXdSV0FHQmtJeFpmTEx1a3NlR3lvVVZvbnVOaUR1VU0=
```
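The value under `data` must be base64-encoded. As a quick sketch (the token below is a placeholder, not a real credential), the encoding can be produced with:

```shell
# Base64-encode a ModelScope token for the Secret's `data` field.
# "example-token" is a placeholder; substitute your real MS_TOKEN value.
echo -n "example-token" | base64
```

Alternatively, `kubectl create secret generic storage-config --from-literal=MS_TOKEN=<token>` performs the encoding for you.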

## Deploy InferenceService with Models from ModelScope Hub

### Option 1: Use Service Account with Secret Ref

Create a Kubernetes ServiceAccount that references the ModelScope token Secret, and set `serviceAccountName` in the InferenceService spec:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: msserviceacc
secrets:
  - name: storage-config
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: modelscope-qwen
spec:
  predictor:
    serviceAccountName: msserviceacc # Option 1 for authenticating with MS_TOKEN
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=qwen
        - --model_dir=/mnt/models
      storageUri: ms://qwen/Qwen2-0.5B-Instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
```

### Option 2: Use Environment Variable with Secret Ref

Create the ModelScope token Secret and reference it through the `MS_TOKEN` environment variable in the InferenceService spec:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: modelscope-qwen
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=qwen
        - --model_dir=/mnt/models
      storageUri: ms://qwen/Qwen2-0.5B-Instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
      env:
        - name: MS_TOKEN # Option 2 for authenticating with MS_TOKEN
          valueFrom:
            secretKeyRef:
              name: storage-config
              key: MS_TOKEN
              optional: false
```

Check the InferenceService status:

```shell
kubectl get inferenceservices modelscope-qwen
```

Expected output:

```
NAME              URL                                          READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                       AGE
modelscope-qwen   http://modelscope-qwen.default.example.com   True           100                              modelscope-qwen-predictor-default-47q2g   7d23h
```
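Once the service reports `READY`, you can send it an inference request. A minimal sketch, assuming the huggingface serving runtime's OpenAI-compatible completions route at `/openai/v1/completions` (replace the hostname with your cluster's actual ingress address):

```shell
# Sketch of an inference request against the deployed InferenceService.
# SERVICE_HOSTNAME below is the example hostname from the status output;
# substitute the URL reported by your own cluster.
SERVICE_HOSTNAME="modelscope-qwen.default.example.com"

# "model" must match the --model_name argument from the manifest ("qwen").
INPUT='{"model": "qwen", "prompt": "What is KServe?", "max_tokens": 64}'

# Outside the cluster this hostname will not resolve, so surface a message
# instead of aborting on failure.
curl --fail --max-time 10 \
  -H "Content-Type: application/json" \
  -d "${INPUT}" \
  "http://${SERVICE_HOSTNAME}/openai/v1/completions" \
  || echo "request failed: is the service reachable from this machine?"
```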