
Build Python and TypeScript SDKs with Streaming


Streaming data is a common feature when working with AI Large Language Models (LLMs) like ChatGPT, Claude, Llama, or Mistral. In this tutorial, you'll learn how to create Python and TypeScript SDKs with streaming features using the liblab SDK generator.

For this tutorial, you'll use Ollama to host an LLM locally on your computer. The principles in this guide can be applied to any LLM API that provides an OpenAPI file.

Prerequisites

To follow this tutorial, you need Node.js and npm (to install the liblab CLI), a liblab account, and a machine capable of running Ollama with the Llama 3.1 8B model.

Steps

  1. Set up Ollama and install Llama 3.1
  2. Configure the liblab CLI
  3. Initialize and configure the project
  4. Generate the SDKs
  5. Test the SDKs

1. Set up Ollama and install Llama 3.1

First, install Ollama on your machine so you can run the LLM locally. Follow these steps to install and run Ollama with Llama 3.1.

  1. Visit Ollama and download the latest version.
  2. Once Ollama is installed and running, execute the following command on your terminal to download the latest version of Llama 3.1 8b:
ollama pull llama3.1:8b

This tutorial uses the smaller Llama 3.1 model (8b) to ensure quicker response times. However, you can use any Llama 3.1 model if your machine can run it smoothly.

  3. To verify that Llama 3.1 is installed, run the following command:
ollama run llama3.1:8b
  4. If the setup is successful, you can send prompts to the model and receive responses. For example, ask Llama to tell you a joke:
ollama run llama3.1:8b
>>> tell me a joke
Here's one:

What do you call a fake noodle?

An impasta!

>>>

2. Configure the liblab CLI

With Ollama installed and running on your machine, you can start working with liblab. First, install the liblab CLI by running the following command:

npm install -g @liblab/cli

Once installed, you need to log in to your liblab account. Run the following command and follow the prompts to log in:

liblab login
Installation options

For more options and details on installing the liblab CLI, see the Install the CLI page.

3. Initialize and configure the project

With the CLI installed and authenticated, you can now create a new project. Run the following commands to create a new liblab project in the streaming directory:

mkdir -p streaming
cd streaming
liblab init

A new liblab.config.json file will be created, which contains all the project's configurations.

3.1 Create the API spec

Before editing the generated liblab.config.json file, you need the API spec that liblab will use to generate the SDKs. For this tutorial, you'll create a new spec that targets the Llama model running locally.

note

In most cases, you don't need to create the API specification manually, as the API provider typically supplies it. However, since Ollama runs locally and doesn't ship an OpenAPI specification, you'll create one yourself.

Inside the streaming directory, create a file named ollama-open-api.yaml with the following content:

openapi: 3.0.0
info:
  title: Ollama API
  description: This is an OpenAPI spec for Ollama, created internally by liblab. This is not an official API spec.
  version: 1.0.0
servers:
  - url: 'http://localhost:11434'
paths:
  /api/generate:
    post:
      description: Send a prompt to an LLM.
      operationId: generate
      x-liblab-streaming: true
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateResponse'

components:
  schemas:
    GenerateRequest:
      type: object
      required:
        - model
        - prompt
      properties:
        model:
          type: string
        prompt:
          type: string
        stream:
          type: boolean

    GenerateResponse:
      type: object
      required:
        - model
        - created_at
        - response
      properties:
        model:
          type: string
        created_at:
          type: string
        response:
          type: string
        done:
          type: boolean
        done_reason:
          type: string
        context:
          type: array
          items:
            type: integer
        total_duration:
          type: integer
        load_duration:
          type: integer
        prompt_eval_count:
          type: integer
        prompt_eval_duration:
          type: integer
        eval_count:
          type: integer
        eval_duration:
          type: integer

The API spec above already enables liblab's streaming feature for the /api/generate endpoint through the x-liblab-streaming: true annotation. It also defines the server as http://localhost:11434, the default address of a local Ollama instance.
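If you want to confirm what the raw endpoint returns before generating any SDK, you can call it directly. The following is a minimal sketch that uses the requests library (not part of this tutorial's tooling) and assumes Ollama is running locally with llama3.1:8b pulled; with stream set to true, Ollama returns one JSON object per line, matching the GenerateResponse schema above.

# stream_check.py - optional sanity check against the local Ollama server.
# Assumes Ollama is running on http://localhost:11434 and llama3.1:8b is pulled.
import json
import requests

payload = {"model": "llama3.1:8b", "prompt": "Tell me a joke", "stream": True}

with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)           # one GenerateResponse object per line
        print(chunk["response"], end="", flush=True)
        if chunk.get("done"):              # final chunk carries done=true plus stats
            print()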

3.2 Configure the project

After creating the API spec file, you can configure the project. Use the following configuration in your liblab.config.json file:

{
  "sdkName": "ollama-sdk",
  "apiVersion": "1.0.0",
  "apiName": "ollama-api",
  "specFilePath": "./ollama-open-api.yaml",
  "languages": [
    "python",
    "typescript"
  ],
  "auth": [],
  "customizations": {
    "includeOptionalSnippetParameters": true,
    "devContainer": false,
    "generateEnv": true,
    "inferServiceNames": false,
    "injectedModels": [],
    "license": {
      "type": "MIT"
    },
    "responseHeaders": false,
    "retry": {
      "enabled": true,
      "maxAttempts": 3,
      "retryDelay": 150
    },
    "endpointCustomizations": {
      "/api/generate": {
        "post": {
          "streaming": true
        }
      }
    }
  },
  "languageOptions": {
    "python": {
      "alwaysInitializeOptionals": false,
      "pypiPackageName": "",
      "githubRepoName": "",
      "ignoreFiles": [],
      "sdkVersion": "1.0.0",
      "liblabVersion": "2"
    },
    "typescript": {
      "bundle": true,
      "exportClassDefault": false,
      "httpClient": "fetch",
      "npmName": "",
      "npmOrg": "",
      "githubRepoName": "",
      "ignoreFiles": [],
      "sdkVersion": "1.0.0",
      "liblabVersion": "2",
      "generateEnumAs": "union"
    }
  },
  "publishing": {
    "githubOrg": ""
  }
}

The above file defines that SDKs will be generated for Python and TypeScript (lines 6-9), based on the API spec at ./ollama-open-api.yaml (line 5). In addition, the streaming feature is enabled for the /api/generate endpoint (lines 26-32).

Enabling Streaming

Streaming is enabled by default for endpoints returning text/event-stream. For other endpoints, you can enable it with one of the following options:

  1. Edit the liblab.config.json: Add streaming: true under endpointCustomizations for each endpoint where you want to enable streaming.
  2. Modify the OpenAPI spec: Include the x-liblab-streaming: true annotation for the relevant endpoint.

In this example, both methods are demonstrated to illustrate how streaming can be enabled. However, you only need to implement one of these options.

4. Generate the SDKs

Now that you have the API spec and have configured liblab.config.json, it's time to generate the SDKs. Execute the following command in your terminal:

liblab build -y

liblab validates the liblab.config.json and ollama-open-api.yaml files and notifies you about any issues. Once the build starts, you should see messages like the following when it finishes:

Your SDKs are being generated. Visit the liblab portal (https://app.liblab.com/apis/ollama-api/builds/8364) to view more details on your build(s).
✓ Python built
✓ TypeScript built
✓ Generate package-lock.json for TypeScript
Successfully generated SDKs for TypeScript, Python. ♡ You can find them inside: <path-to-the-project-directory>/streaming/output

The SDKs are available in the output directory.
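The exact contents depend on the generator version, but you can expect a layout roughly like this:

output/
  python/
    examples/     # contains sample.py and the install scripts used in section 5.1
    ...           # generated Python SDK source
  typescript/
    examples/     # contains src/index.ts and the npm scripts used in section 5.2
    ...           # generated TypeScript SDK source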

5. Test the SDKs

To test the SDKs, you can use the examples created by liblab when generating the SDKs. The following sections describe how to test the Python and TypeScript SDKs using the examples available.

5.1 Testing the Python SDK

To test the Python SDK, follow the steps:

  1. In your terminal, navigate to the output/python/examples directory.
  2. Run the install script for your operating system:
./install.sh
note

The commands to run the install script or activate the Python environment may differ on macOS or Linux.

  3. Activate the Python environment:
$ source .venv/Scripts/activate
  4. Open the file sample.py and change the model and prompt, as displayed below:
  request_body = GenerateRequest(model="llama3.1:8b", prompt="Tell me a joke", stream=True)
  5. Execute the example by running the following command in the terminal:
python sample.py 

In your terminal, you'll see the streamed response:

GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:46.5467575Z',
response='Here',
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:46.5772503Z',
response="'s",
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:46.5934289Z',
response=' one',
done=False
)
...
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:47.0848765Z',
response=' hear',
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:47.1016931Z',
response=' another',
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:47.1173826Z',
response=' one',
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:47.1327678Z',
response='?',
done=False
)
GenerateResponse(
model='llama3.1:8b',
created_at='2024-12-10T00:14:47.1484953Z',
response='',
done=True,
done_reason='stop',
context=[128006, 882, 128007, 271, 41551, 757, 264, 22380, 128009, 128006, 78191, 128007, 271, 8586, 596, 832, 1473, 3923, 656, 499, 1650, 264, 12700, 46895, 273, 1980, 65192, 369, 433, 62927, 2127, 3242, 14635, 2268, 40, 3987, 430, 1903, 499, 12835, 0, 3234, 499, 1390, 311, 6865, 2500, 832, 30],
total_duration=8415490600,
load_duration=7410168000,
prompt_eval_count=14,
prompt_eval_duration=400000000,
eval_count=37,
eval_duration=603000000
)

The SDK streams the response token by token through the response field until generation completes. A final message with done=True confirms that the stream has finished.
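For reference, the core of sample.py follows this pattern. The client class and method names below (OllamaSdk, sdk.generate) are assumptions for illustration and may differ from the generated code; check the README in output/python for the exact identifiers in your build.

# Rough sketch of sample.py. OllamaSdk and sdk.generate are assumed names;
# use the identifiers from your generated SDK.
from ollama_sdk import OllamaSdk
from ollama_sdk.models import GenerateRequest

sdk = OllamaSdk(base_url="http://localhost:11434")

request_body = GenerateRequest(model="llama3.1:8b", prompt="Tell me a joke", stream=True)

# With streaming enabled, the call yields one GenerateResponse per chunk
# instead of returning a single object.
for chunk in sdk.generate(request_body=request_body):
    print(chunk.response, end="", flush=True)
    if chunk.done:
        break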

5.2 Testing the TypeScript SDK

To test the TypeScript SDK, follow the steps:

  1. In your terminal, navigate to the output/typescript/examples directory.
  2. Run the following command to install the SDK:
npm run setup
  3. Open the src/index.ts file and change the model and prompt as presented in the following code snippet:
const generateRequest: GenerateRequest = {
  model: 'llama3.1:8b',
  prompt: 'Tell me a joke',
  stream: true,
};
  4. Execute the example by running the following command in the terminal:
npm run start

In your terminal, you'll see the streamed response:

> ollama-sdk@1.0.0 start
> tsc && node dist/index.js

{
model: 'llama3.1:8b',
createdAt: '2024-12-10T00:25:03.9792376Z',
response: 'Here',
done: false,
doneReason: undefined,
context: undefined,
totalDuration: undefined,
loadDuration: undefined,
promptEvalCount: undefined,
promptEvalDuration: undefined,
evalCount: undefined,
evalDuration: undefined
}
...
{
model: 'llama3.1:8b',
createdAt: '2024-12-10T00:25:04.229332Z',
response: 'asta',
done: false,
doneReason: undefined,
context: undefined,
totalDuration: undefined,
loadDuration: undefined,
promptEvalCount: undefined,
promptEvalDuration: undefined,
evalCount: undefined,
evalDuration: undefined
}
{
model: 'llama3.1:8b',
createdAt: '2024-12-10T00:25:04.2435589Z',
response: '.',
done: false,
doneReason: undefined,
context: undefined,
totalDuration: undefined,
loadDuration: undefined,
promptEvalCount: undefined,
promptEvalDuration: undefined,
evalCount: undefined,
evalDuration: undefined
}
{
model: 'llama3.1:8b',
createdAt: '2024-12-10T00:25:04.2602098Z',
response: '',
done: true,
doneReason: 'stop',
context: [
128006, 882, 128007, 271, 41551,
757, 264, 22380, 128009, 128006,
78191, 128007, 271, 8586, 596,
832, 1473, 3923, 656, 499,
1650, 264, 12700, 46895, 273,
1980, 2127, 3242, 14635, 13
],
totalDuration: 3104377400,
loadDuration: 2724683300,
promptEvalCount: 14,
promptEvalDuration: 95000000,
evalCount: 18,
evalDuration: 283000000
}

The SDK streams the response token by token through the response field until generation completes. A final message with done: true confirms that the stream has finished.

Conclusion

Following this guide, you've learned how to create Python and TypeScript SDKs with streaming support using liblab. These SDKs simplify API integration and enhance real-time data handling. Explore further customization options in the liblab documentation.