Deployment - MLRun

Managing models with a Model Registry
Packaging models as containers with:
- Amazon SageMaker
- MLflow

1. Model serving tools

1.1. Model serving tools

Category	SageMaker	Vertex AI	MLRun	Seldon	KServe	Triton
Open source	No	No	Yes	Yes	Yes	Yes
Managed option	AWS	GCP	cloud + on-prem	cloud + on-prem	No	No
Serverless	Yes	Yes	Yes	No	No	No
Protocol	Proprietary	Proprietary	Standard	Standard	Standard	Standard
Multi-stage pipelines	No	No	Yes	Yes	No	No
Streaming	No	No	Yes	Basic	No	No
Model monitoring	Yes	Yes	Yes	Yes	No	No

1.2. MLRun Serving

MLRun is an MLOps framework with strong support for serving models and applications in production. It supports building multi-stage real-time pipelines and deploying them quickly on Nuclio, an open-source, high-performance, elastic serverless engine focused on data-, I/O-, and compute-intensive workloads.

Nuclio supports many trigger types, such as HTTP, cron, Kafka, and Kinesis. It is self-healing and auto-scaling, and it has built-in monitoring and observability.

MLRun supports two model serving topologies:

- Router: basic serving of one or more models (the default)
- Flow: a multi-stage pipeline (a DAG) in which each step can be customized, for example API integration, data enrichment and processing, model serving, routing, and storage
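
The difference between the two topologies can be illustrated with a small plain-Python sketch. This is only a conceptual illustration, not MLRun's implementation: the helper names `route` and `run_flow` and the toy models are hypothetical, and the flow is reduced to a linear chain rather than a general DAG.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a deployed model: feature rows in, predictions out.
Model = Callable[[List[List[float]]], List[int]]


def route(models: Dict[str, Model], path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Router-style serving: select one model by the name embedded in the path."""
    # Path layout follows the /v2/models/<name>/infer convention used in this section.
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["v2", "models"] or parts[3] != "infer":
        raise ValueError(f"unsupported path: {path}")
    name = parts[2]
    if name not in models:
        raise KeyError(f"unknown model: {name}")
    return {"model_name": name, "outputs": models[name](body["inputs"])}


def run_flow(steps: List[Callable[[Any], Any]], event: Any) -> Any:
    """Flow-style serving (linear case): each step receives the previous step's output."""
    for step in steps:
        event = step(event)
    return event


# Toy models used only for this illustration.
models: Dict[str, Model] = {
    "always-zero": lambda rows: [0 for _ in rows],
    "row-length": lambda rows: [len(row) for row in rows],
}

# Router: the request path decides which model answers.
router_result = route(models, "/v2/models/row-length/infer", {"inputs": [[5.1, 3.5], [7.7]]})

# Flow: preprocess -> model -> postprocess, chained in order.
flow_result = run_flow(
    [
        lambda text: {"inputs": [[float(len(text))]]},        # preprocess
        lambda body: models["always-zero"](body["inputs"]),   # model step
        lambda preds: {"prediction": preds[0]},               # postprocess
    ],
    "good morning",
)
```

In the router case the caller chooses the model through the URL, while in the flow case the caller sends one request and the graph decides which steps run.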

MLRun provides built-in serving classes for many popular ML/DL frameworks, such as scikit-learn, TensorFlow, ONNX, XGBoost, LightGBM, PyTorch, and Hugging Face, and it supports a standard serving protocol like the one used by KServe, Seldon, and Triton.

MLRun also ships several container images with the required libraries preinstalled, or lets users specify a base image and the packages they need so that a suitable image is built automatically.

import mlrun

# Assumption: deployment needs a project handle; the project name is a placeholder
project = mlrun.get_or_create_project("my-project", context="./")

# Create a new serving function
serving_fn = mlrun.new_function("serving", image="mlrun/mlrun",
                                kind="serving", requirements=[])

# Add a model object or file (can be in S3, GCS, local file, etc.)
# model_uri is assumed to be defined earlier and to point at the stored model
serving_fn.add_model(
    "my-model",
    model_path=model_uri,
    class_name="mlrun.frameworks.sklearn.SklearnModelServer")

# Create a mock server (simulator) and test/debug the endpoint
server = serving_fn.to_mock_server()
sample = {"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}
server.test(path="/v2/models/my-model/infer", body=sample)

# Result:
# {'id': '2b2e1703f98846b386965ce834a6c4ab',
#  'model_name': 'my-model',
#  'outputs': [0, 2]}

# Deploy the serving function to the cluster
project.deploy_function(serving_fn)

# Send prediction request to the live endpoint
serving_fn.invoke(path="/v2/models/my-model/infer", body=sample)
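
The request path and the response shape in the snippet above can be made explicit with a small standard-library sketch. The helper names `build_infer_request` and `parse_infer_response` are hypothetical, and the sketch only mirrors the simplified `{"inputs": ...}` / `{"outputs": ...}` shapes shown in this section; it does not send anything over the network.

```python
import json
from typing import Any, Dict, List, Tuple


def build_infer_request(model_name: str, rows: List[List[float]]) -> Tuple[str, str]:
    """Return the (path, JSON body) pair for an inference call on one model."""
    path = f"/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": rows})
    return path, body


def parse_infer_response(raw: str) -> List[Any]:
    """Extract the predictions from a response shaped like the example above."""
    payload: Dict[str, Any] = json.loads(raw)
    if "outputs" not in payload:
        raise ValueError("response has no 'outputs' field")
    return payload["outputs"]


# Build the same request that the mock-server test sends.
path, body = build_infer_request("my-model", [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]])

# Parse a response with the same fields as the sample result.
outputs = parse_infer_response(
    '{"id": "2b2e1703f98846b386965ce834a6c4ab", "model_name": "my-model", "outputs": [0, 2]}'
)
```

Keeping the model name in the path is what lets a single router-style function serve several models behind one endpoint.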

Using a graph with MLRun:

# Create an MLRun serving function from custom code
serving_function = mlrun.code_to_function(
    filename="src/serving.py",
    kind="serving",
    image="mlrun/mlrun",
    requirements=[],
)

# Set the serving topology
graph = serving_function.set_topology("flow", engine="async")

# Define a 3 step graph (preprocess -> hugging face model -> postprocess) 
# the custom preprocess and postprocess functions are in serving.py 
# while the HuggingFaceModelServer is a built-in MLRun class 
graph.to(handler="preprocess", name="preprocess")\
     .to(mlrun.frameworks.huggingface.HuggingFaceModelServer(
              name="sentiment-analysis",
              task="sentiment-analysis",
              model_name="distilbert-base-uncased",
              model_class="AutoModelForSequenceClassification",
              tokenizer_name="distilbert-base-uncased",
              tokenizer_class="AutoTokenizer"))\
     .to(handler="postprocess", name="postprocess").respond()

# Plot the graph
serving_function.plot(rankdir='LR')

# Deploy the pipeline
project.deploy_function(serving_function)

# Send a text request and get the sentiment results
response = serving_function.invoke(path='/predict', body="good morning")
print(response)

# Result:
# ['The sentiment is POSITIVE', 'The prediction score is 0.7876932144165039']
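
The graph above refers to `preprocess` and `postprocess` handlers in `src/serving.py`, which the source does not show. A minimal sketch of what such a file might contain is given below. It is an assumption, not the original file: in particular, it assumes the model step returns `{"outputs": [{"label": ..., "score": ...}]}` with the labels `LABEL_0` (negative) and `LABEL_1` (positive).

```python
from typing import Any, Dict, List


def preprocess(text: Any) -> Dict[str, List[str]]:
    """Wrap the raw request text in the {"inputs": [...]} body passed to the model step."""
    # Request bodies may arrive as bytes, so decode them before wrapping.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return {"inputs": [str(text)]}


def postprocess(model_response: Dict[str, Any]) -> List[str]:
    """Turn the model step's raw output into human-readable strings."""
    # ASSUMPTION: output layout and label names are hypothetical (see the text above).
    output = model_response["outputs"][0]
    label = "NEGATIVE" if output["label"] == "LABEL_0" else "POSITIVE"
    return [
        f"The sentiment is {label}",
        f"The prediction score is {output['score']}",
    ]
```

Because both handlers are plain functions, they can be unit-tested locally before the graph is deployed.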
