Serving

The main and baseline models are exposed behind a single FastAPI app (grnti_text_classifier.serving.main:app). Both are loaded lazily from artifacts/{main,baseline}/hf/ at first request and held in memory thereafter. The active model is selected via a query parameter on each call.
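The lazy-load pattern can be sketched as follows. This is illustrative only: the `get_model` name, the `_MODEL_DIRS` cache, and the placeholder load step stand in for the app's actual `from_pretrained` loading.

```python
from functools import lru_cache
from pathlib import Path

# Alias -> snapshot directory, mirroring artifacts/{main,baseline}/hf/.
_MODEL_DIRS = {
    "main": Path("artifacts/main/hf"),
    "baseline": Path("artifacts/baseline/hf"),
}

@lru_cache(maxsize=None)
def get_model(alias: str) -> dict:
    """Load the model for `alias` on first use; held in memory thereafter."""
    if alias not in _MODEL_DIRS:
        raise KeyError(f"unknown model alias: {alias}")
    # Placeholder for the real load (e.g. from_pretrained on the snapshot dir).
    return {"alias": alias, "dir": str(_MODEL_DIRS[alias])}
```

Because `lru_cache` keys on the alias, the first `/classify` call per model pays the load cost and every later call reuses the in-memory instance.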

Run

# local dev (auto-reload)
uvicorn grnti_text_classifier.serving.main:app --host 0.0.0.0 --port 8000 --reload

# production (4 workers)
uvicorn grnti_text_classifier.serving.main:app --host 0.0.0.0 --port 8000 --workers 4

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Liveness probe; returns {"status": "ok", "version": <model_version>}. |
| GET | /labels | Returns the full list of 28 GRNTI label codes and human-readable names loaded from label_encoder.json. |
| POST | /classify | Classify a single Russian text. Query param model selects main (default) or baseline. |

/classify returns HTTP 422 if text is empty or whitespace-only, and HTTP 503 if the HF snapshot directory is missing from disk.

Request — TextPayload

Schema source: grnti_text_classifier.serving.schemas.TextPayload.

class TextPayload(BaseModel):
    text: str   # Russian scientific text to classify (required, non-empty)
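The empty/whitespace-only check behind the 422 response can be sketched as a plain function. The `validate_text` helper is illustrative; in the app the constraint is enforced at the schema/endpoint level.

```python
def validate_text(text: str) -> str:
    """Reject empty or whitespace-only input, mirroring the 422 behaviour."""
    if not text or not text.strip():
        raise ValueError("text must be non-empty")
    return text
```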

Response schemas

LabelProb

class LabelProb(BaseModel):
    label: str    # GRNTI class code, e.g. "27" (Mathematics)
    name: str     # Human-readable section name
    prob: float   # Softmax probability for this class

LabelEntry

class LabelEntry(BaseModel):
    code: str   # GRNTI top-level code (2-digit string)
    name: str   # Section name in Russian

ClassificationResponse

class ClassificationResponse(BaseModel):
    top1_label: str          # GRNTI code of the most likely class
    top1_name: str           # Human-readable name of the top-1 class
    top1_prob: float         # Softmax probability of the top-1 class
    top5: list[LabelProb]    # Top-5 classes with probabilities
    truncated: bool          # True if input exceeded max_length and was truncated
    input_length_tokens: int # Token count before any truncation
    request_id: str          # 12-char UUID prefix for tracing
    model_name: str          # "xlm-roberta-base" or "rubert-base-cased"
    model_version: str       # e.g. "v0.1.0"
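The top5 list is a softmax over the model's raw logits followed by a sort. A self-contained sketch (`top_k` is a hypothetical helper, not the app's code):

```python
import math

def top_k(logits: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Softmax the logits, then return the k most probable (label, prob) pairs."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Note that the probabilities in the full distribution sum to 1, so the top-5 slice generally sums to less than 1 (here 0.991 in the example response below).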

Environment variables

| Variable | Default | Purpose |
| --- | --- | --- |
| GRNTI_MAIN_DIR | artifacts/main/hf | Path to the XLM-RoBERTa save_pretrained snapshot. |
| GRNTI_BASELINE_DIR | artifacts/baseline/hf | Path to the ruBERT save_pretrained snapshot. |
| GRNTI_LABEL_ENCODER | data/processed/label_encoder.json | Path to the label encoder JSON. |
| GRNTI_MODEL_VERSION | v0.1.0 | Reported in /health and in the response body. |
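Reading these settings reduces to environment lookups with the defaults from the table above. The `get_config` function is an illustrative sketch, not the app's actual settings loader:

```python
import os

def get_config() -> dict:
    """Resolve serving settings from the environment, falling back to defaults."""
    return {
        "main_dir": os.environ.get("GRNTI_MAIN_DIR", "artifacts/main/hf"),
        "baseline_dir": os.environ.get("GRNTI_BASELINE_DIR", "artifacts/baseline/hf"),
        "label_encoder": os.environ.get(
            "GRNTI_LABEL_ENCODER", "data/processed/label_encoder.json"
        ),
        "model_version": os.environ.get("GRNTI_MODEL_VERSION", "v0.1.0"),
    }
```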

curl examples

GET /health

curl http://localhost:8000/health
{"status": "ok", "version": "v0.1.0"}

GET /labels

curl http://localhost:8000/labels
[
  {"code": "01", "name": "Общенаучное и междисциплинарное знание"},
  {"code": "03", "name": "История. Исторические науки"},
  ...
]

POST /classify — main model

curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"text":"Исследование квантовой электродинамики в кристаллах."}'
{
  "top1_label": "29",
  "top1_name": "Физика",
  "top1_prob": 0.923,
  "top5": [
    {"label": "29", "name": "Физика", "prob": 0.923},
    {"label": "30", "name": "Химия", "prob": 0.031},
    {"label": "44", "name": "Энергетика", "prob": 0.018},
    {"label": "27", "name": "Математика", "prob": 0.012},
    {"label": "50", "name": "Автоматика", "prob": 0.007}
  ],
  "truncated": false,
  "input_length_tokens": 14,
  "request_id": "a1b2c3d4e5f6",
  "model_name": "xlm-roberta-base",
  "model_version": "v0.1.0"
}

POST /classify — baseline model

curl -X POST "http://localhost:8000/classify?model=baseline" \
  -H "Content-Type: application/json" \
  -d '{"text":"Исследование квантовой электродинамики в кристаллах."}'
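The same request can be assembled from the Python standard library. The `build_classify_request` helper is hypothetical and only constructs the request object; sending it requires a running server:

```python
import json
import urllib.request

def build_classify_request(base_url: str, text: str, model: str = "main"):
    """Build a POST request matching the /classify contract (does not send it)."""
    url = f"{base_url}/classify?model={model}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the request to `urllib.request.urlopen` (or port it to any HTTP client) to actually call the endpoint.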

Response field notes

| Field | Notes |
| --- | --- |
| truncated | True when input_length_tokens > max_length (256). The model still produces a prediction, but context beyond 256 tokens was dropped. |
| input_length_tokens | Raw token count before truncation; useful for monitoring distribution shift at inference time. |
| request_id | First 12 characters of a UUID4 generated per request. Log this for end-to-end tracing. |
| model_name | Reflects the actual HF model identifier, not the alias (main/baseline). |
| model_version | Read from the GRNTI_MODEL_VERSION environment variable; matches the git tag of the published checkpoint. |
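The request_id shape can be reproduced with the standard uuid module. `make_request_id` is illustrative; it is shown here taking the first 12 characters of the dash-free hex form of a UUID4, which matches the example id above:

```python
import uuid

def make_request_id() -> str:
    """First 12 hex characters of a UUID4, as returned in `request_id`."""
    return uuid.uuid4().hex[:12]
```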