notdiamond.llms

notdiamond.llms.client

class notdiamond.llms.client.NotDiamond(nd_api_url: str | None = 'https://not-diamond-server.onrender.com', user_agent: str | None = 'Python-SDK/0.3.12', *args, api_key: str, llm_configs: List[LLMConfig | str] | None = None, default: LLMConfig | int | str, max_model_depth: int | None = None, latency_tracking: bool, hash_content: bool, tradeoff: str | None = None, preference_id: str | None = None, tools: Sequence[Dict[str, Any] | Callable] | None = None, callbacks: List | None = None)[source]

Bases: _NDRouterClient

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters:
  • nd_api_url (str | None)

  • user_agent (str | None)

  • api_key (str)

  • llm_configs (List[LLMConfig | str] | None)

  • default (LLMConfig | int | str)

  • max_model_depth (int | None)

  • latency_tracking (bool)

  • hash_content (bool)

  • tradeoff (str | None)

  • preference_id (str | None)

  • tools (Sequence[Dict[str, Any] | Callable] | None)

  • callbacks (List | None)

class Config[source]

Bases: object

arbitrary_types_allowed = True
api_key: str

API key required for making calls to NotDiamond. You can get an API key via our dashboard: https://app.notdiamond.ai. If an API key is not set, the client will look for NOTDIAMOND_API_KEY in a .env file.

default: LLMConfig | int | str

Set a default LLM so that if anything goes wrong in the flow (for example, the NotDiamond API call fails), your code won't break and you have a fallback model. There are several ways to configure the default model:

  • An integer, specifying the index of the default provider in the llm_configs list

  • A string of the form 'provider_name/model_name', just like the entries of llm_configs

  • An LLMConfig, directly specifying the provider object

If no default is set, the first LLM in the list is used, as sketched below.
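
A minimal sketch of the three options, assuming the documented 'provider_name/model_name' string format; the model names and API key below are placeholders, and unspecified fields fall back to their documented defaults:

    from notdiamond.llms.client import NotDiamond
    from notdiamond.llms.config import LLMConfig

    # Hypothetical setup with a string default.
    client = NotDiamond(
        api_key="YOUR_NOTDIAMOND_API_KEY",
        llm_configs=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20240620"],
        default="openai/gpt-4o",  # string form: 'provider_name/model_name'
    )

    # Equivalent alternatives:
    #   default=0                                             # index into llm_configs
    #   default=LLMConfig(provider="openai", model="gpt-4o")  # explicit LLMConfig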

hash_content: bool

Whether to hash content before it is sent to the NotDiamond API. By default this is False.

latency_tracking: bool

Whether to track the latency of LLM calls and send it to the NotDiamond server as feedback, so we can improve our router. This is turned on by default; set it to False to turn it off.

llm_configs: List[LLMConfig | str] | None

The list of LLMs that are available to route between.

max_model_depth: int | None

If your top recommended model is down, specify how deep into the routing ranking you are willing to go. If max_model_depth is not set, it defaults to the length of the llm_configs list. If max_model_depth is set to 0, initialization fails. If the value is larger than the length of the llm_configs list, it is reset to len(llm_configs).

nd_api_url: str | None

The URL of the NotDiamond API. Defaults to settings.NOTDIAMOND_API_URL.

preference_id: str | None

The ID of the router preference that was configured via the Dashboard. Defaults to None.

tools: Sequence[Dict[str, Any] | Callable] | None

Bind tools to the LLM object. The tools will be passed to the LLM object when invoking it.

tradeoff: str | None

Define the tradeoff between “cost” and “latency” for the router to determine the best LLM for a given query. If None, the router considers neither cost nor latency.

Supported values: “cost”, “latency”.

Defaults to None.

user_agent: str | None

notdiamond.llms.config

class notdiamond.llms.config.LLMConfig(provider: str, model: str, is_custom: bool = False, system_prompt: str | None = None, context_length: int | None = None, input_price: float | None = None, output_price: float | None = None, latency: float | None = None, api_key: str | None = None, **kwargs)[source]

Bases: object

A NotDiamond LLM provider config (or LLMConfig) is represented by a combination of provider and model. Provider refers to the company behind the foundation model, such as openai, anthropic, or google. Model is the model name as defined by the owning company, such as gpt-3.5-turbo. Besides this, you can also specify the API key for each provider, extra arguments that are also supported by LangChain (e.g. temperature), and a system prompt to be used with the provider. If the provider is selected during routing, its system prompt will be used, replacing any system prompt in the message array.

All supported providers and models can be found in our docs.

If the API key is not specified, the config will try to pick it up from a .env file before failing. For example, for OpenAI it will look for OPENAI_API_KEY.
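
An illustrative sketch; the system prompt is a placeholder, and the temperature kwarg passes through to the LangChain integration as described above:

    from notdiamond.llms.config import LLMConfig

    # Omitting api_key falls back to OPENAI_API_KEY from a .env file.
    gpt = LLMConfig(
        provider="openai",
        model="gpt-4o",
        system_prompt="You are a concise assistant.",
        temperature=0.2,  # extra kwarg forwarded to LangChain
    )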

provider

The name of the LLM provider (e.g., “openai”, “anthropic”). Must be one of the predefined providers in POSSIBLE_PROVIDERS.

Type:

str

model

The name of the LLM model to use (e.g., “gpt-3.5-turbo”). Must be one of the predefined models in POSSIBLE_MODELS.

Type:

str

system_prompt

The system prompt to use for the provider. Defaults to None.

Type:

Optional[str], optional

api_key

The API key for accessing the LLM provider’s services. Defaults to None, in which case it tries to fetch from the settings.

Type:

Optional[str], optional

openrouter_model

The OpenRouter model equivalent for this provider / model

Type:

str

**kwargs

Additional keyword arguments that might be necessary for specific providers or models.

Parameters:
  • provider (str) – The name of the LLM provider (e.g., “openai”, “anthropic”).

  • model (str) – The name of the LLM model to use (e.g., “gpt-3.5-turbo”).

  • is_custom (bool) – Whether this is a custom model. Defaults to False.

  • system_prompt (Optional[str], optional) – The system prompt to use for the provider. Defaults to None.

  • context_length (Optional[int], optional) – Custom context window length for the provider/model.

  • input_price (Optional[float], optional) – Custom input price (USD) per million tokens for provider/model.

  • output_price (Optional[float], optional) – Custom output price (USD) per million tokens for provider/model.

  • latency (Optional[float], optional) – Custom latency (time to first token) for provider/model.

  • api_key (Optional[str], optional) – The API key for accessing the LLM provider’s services. Defaults to None.

  • **kwargs – Additional keyword arguments that might be necessary for specific providers or models.

Raises:

UnsupportedLLMProvider – If the provider or model specified is not supported.

__init__(provider: str, model: str, is_custom: bool = False, system_prompt: str | None = None, context_length: int | None = None, input_price: float | None = None, output_price: float | None = None, latency: float | None = None, api_key: str | None = None, **kwargs)[source]

Parameters:
  • provider (str) – The name of the LLM provider (e.g., “openai”, “anthropic”).

  • model (str) – The name of the LLM model to use (e.g., “gpt-3.5-turbo”).

  • is_custom (bool) – Whether this is a custom model. Defaults to False.

  • system_prompt (Optional[str], optional) – The system prompt to use for the provider. Defaults to None.

  • context_length (Optional[int], optional) – Custom context window length for the provider/model.

  • input_price (Optional[float], optional) – Custom input price (USD) per million tokens for provider/model.

  • output_price (Optional[float], optional) – Custom output price (USD) per million tokens for provider/model.

  • latency (Optional[float], optional) – Custom latency (time to first token) for provider/model.

  • api_key (Optional[str], optional) – The API key for accessing the LLM provider’s services. Defaults to None.

  • **kwargs – Additional keyword arguments that might be necessary for specific providers or models.

Raises:

UnsupportedLLMProvider – If the provider or model specified is not supported.

classmethod from_string(llm_provider: str)[source]

We allow users to specify LLM providers for NotDiamond in the string format ‘provider_name/model_name’, for example ‘openai/gpt-3.5-turbo’. Since our workflows use LLMConfig as the underlying base type, this class method converts a string specification of an LLM provider into an LLMConfig object.

Parameters:

llm_provider (str) – the string definition of the LLM provider

Returns:

an initialized object with the correct provider and model

Return type:

LLMConfig
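
Based on the string format described above:

    from notdiamond.llms.config import LLMConfig

    config = LLMConfig.from_string("openai/gpt-3.5-turbo")
    # config.provider == "openai", config.model == "gpt-3.5-turbo"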

prepare_for_request()[source]

Converts the LLMConfig object to a dict in the format accepted by the NotDiamond API.

Returns:

dict

set_api_key(api_key: str) → LLMConfig[source]
Parameters:

api_key (str)

Return type:

LLMConfig
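
A short sketch combining the two methods above; the key is a placeholder, and chaining is implied by the LLMConfig return type:

    from notdiamond.llms.config import LLMConfig

    config = LLMConfig.from_string("anthropic/claude-3-5-sonnet-20240620")
    payload = config.set_api_key("YOUR_ANTHROPIC_API_KEY").prepare_for_request()
    # payload is a dict in the format accepted by the NotDiamond API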

property openrouter_model

notdiamond.llms.providers

class notdiamond.llms.providers.NDLLMProviders(value)[source]

Bases: Enum

NDLLMProviders serves as a registry for the supported LLM models by NotDiamond. It allows developers to easily specify available LLM providers for the router.

GPT_3_5_TURBO

refers to ‘gpt-3.5-turbo’ model by OpenAI

Type:

NDLLMProvider

GPT_3_5_TURBO_0125

refers to ‘gpt-3.5-turbo-0125’ model by OpenAI

Type:

NDLLMProvider

GPT_4

refers to ‘gpt-4’ model by OpenAI

Type:

NDLLMProvider

GPT_4_0613

refers to ‘gpt-4-0613’ model by OpenAI

Type:

NDLLMProvider

GPT_4_1106_PREVIEW

refers to ‘gpt-4-1106-preview’ model by OpenAI

Type:

NDLLMProvider

GPT_4_TURBO

refers to ‘gpt-4-turbo’ model by OpenAI

Type:

NDLLMProvider

GPT_4_TURBO_PREVIEW

refers to ‘gpt-4-turbo-preview’ model by OpenAI

Type:

NDLLMProvider

GPT_4_TURBO_2024_04_09

refers to ‘gpt-4-turbo-2024-04-09’ model by OpenAI

Type:

NDLLMProvider

GPT_4o_2024_05_13

refers to ‘gpt-4o-2024-05-13’ model by OpenAI

Type:

NDLLMProvider

GPT_4o_2024_08_06

refers to ‘gpt-4o-2024-08-06’ model by OpenAI

Type:

NDLLMProvider

GPT_4o

refers to ‘gpt-4o’ model by OpenAI

Type:

NDLLMProvider

GPT_4o_MINI_2024_07_18

refers to ‘gpt-4o-mini-2024-07-18’ model by OpenAI

Type:

NDLLMProvider

GPT_4o_MINI

refers to ‘gpt-4o-mini’ model by OpenAI

Type:

NDLLMProvider

GPT_4_0125_PREVIEW

refers to ‘gpt-4-0125-preview’ model by OpenAI

Type:

NDLLMProvider

CLAUDE_2_1

refers to ‘claude-2.1’ model by Anthropic

Type:

NDLLMProvider

CLAUDE_3_OPUS_20240229

refers to ‘claude-3-opus-20240229’ model by Anthropic

Type:

NDLLMProvider

CLAUDE_3_SONNET_20240229

refers to ‘claude-3-sonnet-20240229’ model by Anthropic

Type:

NDLLMProvider

CLAUDE_3_5_SONNET_20240620

refers to ‘claude-3-5-sonnet-20240620’ model by Anthropic

Type:

NDLLMProvider

CLAUDE_3_HAIKU_20240307

refers to ‘claude-3-haiku-20240307’ model by Anthropic

Type:

NDLLMProvider

GEMINI_PRO

refers to ‘gemini-pro’ model by Google

Type:

NDLLMProvider

GEMINI_1_PRO_LATEST

refers to ‘gemini-1.0-pro-latest’ model by Google

Type:

NDLLMProvider

GEMINI_15_PRO_LATEST

refers to ‘gemini-1.5-pro-latest’ model by Google

Type:

NDLLMProvider

GEMINI_15_PRO_EXP_0801

refers to ‘gemini-1.5-pro-exp-0801’ model by Google

Type:

NDLLMProvider

GEMINI_15_FLASH_LATEST

refers to ‘gemini-1.5-flash-latest’ model by Google

Type:

NDLLMProvider

COMMAND_R

refers to ‘command-r’ model by Cohere

Type:

NDLLMProvider

COMMAND_R_PLUS

refers to ‘command-r-plus’ model by Cohere

Type:

NDLLMProvider

MISTRAL_LARGE_LATEST

refers to ‘mistral-large-latest’ model by Mistral AI

Type:

NDLLMProvider

MISTRAL_LARGE_2407

refers to ‘mistral-large-2407’ model by Mistral AI

Type:

NDLLMProvider

MISTRAL_LARGE_2402

refers to ‘mistral-large-2402’ model by Mistral AI

Type:

NDLLMProvider

MISTRAL_MEDIUM_LATEST

refers to ‘mistral-medium-latest’ model by Mistral AI

Type:

NDLLMProvider

MISTRAL_SMALL_LATEST

refers to ‘mistral-small-latest’ model by Mistral AI

Type:

NDLLMProvider

OPEN_MISTRAL_7B

refers to ‘open-mistral-7b’ model by Mistral AI

Type:

NDLLMProvider

OPEN_MIXTRAL_8X7B

refers to ‘open-mixtral-8x7b’ model by Mistral AI

Type:

NDLLMProvider

OPEN_MIXTRAL_8X22B

refers to ‘open-mixtral-8x22b’ model by Mistral AI

Type:

NDLLMProvider

TOGETHER_PHIND_CODELLAMA_34B_V2

refers to ‘Phind-CodeLlama-34B-v2’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_MISTRAL_7B_INSTRUCT_V0_2

refers to ‘Mistral-7B-Instruct-v0.2’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_MIXTRAL_8X7B_INSTRUCT_V0_1

refers to ‘Mixtral-8x7B-Instruct-v0.1’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_MIXTRAL_8X22B_INSTRUCT_V0_1

refers to ‘Mixtral-8x22B-Instruct-v0.1’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_LLAMA_3_70B_CHAT_HF

refers to ‘Llama-3-70b-chat-hf’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_LLAMA_3_8B_CHAT_HF

refers to ‘Llama-3-8b-chat-hf’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_QWEN2_72B_INSTRUCT

refers to ‘Qwen2-72B-Instruct’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_LLAMA_3_1_8B_INSTRUCT_TURBO

refers to ‘Meta-Llama-3.1-8B-Instruct-Turbo’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_LLAMA_3_1_70B_INSTRUCT_TURBO

refers to ‘Meta-Llama-3.1-70B-Instruct-Turbo’ model served via TogetherAI

Type:

NDLLMProvider

TOGETHER_LLAMA_3_1_405B_INSTRUCT_TURBO

refers to ‘Meta-Llama-3.1-405B-Instruct-Turbo’ model served via TogetherAI

Type:

NDLLMProvider

REPLICATE_MISTRAL_7B_INSTRUCT_V0_2

refers to “mistral-7b-instruct-v0.2” model served via Replicate

Type:

NDLLMProvider

REPLICATE_MIXTRAL_8X7B_INSTRUCT_V0_1

refers to “mixtral-8x7b-instruct-v0.1” model served via Replicate

Type:

NDLLMProvider

REPLICATE_META_LLAMA_3_70B_INSTRUCT

refers to “meta-llama-3-70b-instruct” model served via Replicate

Type:

NDLLMProvider

REPLICATE_META_LLAMA_3_8B_INSTRUCT

refers to “meta-llama-3-8b-instruct” model served via Replicate

Type:

NDLLMProvider

REPLICATE_META_LLAMA_3_1_405B_INSTRUCT

refers to “meta-llama-3.1-405b-instruct” model served via Replicate

Type:

NDLLMProvider

LLAMA_3_1_SONAR_LARGE_128K_ONLINE

refers to “llama-3.1-sonar-large-128k-online” model by Perplexity

Type:

NDLLMProvider

Note

This class is static and designed to be used without instantiation. Access its attributes directly to obtain configurations for specific LLM providers.
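
For example, members can be referenced directly and passed wherever LLM configs are accepted:

    from notdiamond.llms.providers import NDLLMProviders

    # Each member resolves to an LLMConfig, per the values listed below.
    llm_configs = [
        NDLLMProviders.GPT_4o,
        NDLLMProviders.CLAUDE_3_5_SONNET_20240620,
    ]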

CLAUDE_2_1 = LLMConfig(anthropic/claude-2.1)
CLAUDE_3_5_SONNET_20240620 = LLMConfig(anthropic/claude-3-5-sonnet-20240620)
CLAUDE_3_HAIKU_20240307 = LLMConfig(anthropic/claude-3-haiku-20240307)
CLAUDE_3_OPUS_20240229 = LLMConfig(anthropic/claude-3-opus-20240229)
CLAUDE_3_SONNET_20240229 = LLMConfig(anthropic/claude-3-sonnet-20240229)
CODESTRAL_LATEST = LLMConfig(mistral/codestral-latest)
COMMAND_R = LLMConfig(cohere/command-r)
COMMAND_R_PLUS = LLMConfig(cohere/command-r-plus)
GEMINI_15_FLASH_LATEST = LLMConfig(google/gemini-1.5-flash-latest)
GEMINI_15_PRO_EXP_0801 = LLMConfig(google/gemini-1.5-pro-exp-0801)
GEMINI_15_PRO_LATEST = LLMConfig(google/gemini-1.5-pro-latest)
GEMINI_1_PRO_LATEST = LLMConfig(google/gemini-1.0-pro-latest)
GEMINI_PRO = LLMConfig(google/gemini-pro)
GPT_3_5_TURBO = LLMConfig(openai/gpt-3.5-turbo)
GPT_3_5_TURBO_0125 = LLMConfig(openai/gpt-3.5-turbo-0125)
GPT_4 = LLMConfig(openai/gpt-4)
GPT_4_0125_PREVIEW = LLMConfig(openai/gpt-4-0125-preview)
GPT_4_0613 = LLMConfig(openai/gpt-4-0613)
GPT_4_1106_PREVIEW = LLMConfig(openai/gpt-4-1106-preview)
GPT_4_TURBO = LLMConfig(openai/gpt-4-turbo)
GPT_4_TURBO_2024_04_09 = LLMConfig(openai/gpt-4-turbo-2024-04-09)
GPT_4_TURBO_PREVIEW = LLMConfig(openai/gpt-4-turbo-preview)
GPT_4o = LLMConfig(openai/gpt-4o)
GPT_4o_2024_05_13 = LLMConfig(openai/gpt-4o-2024-05-13)
GPT_4o_2024_08_06 = LLMConfig(openai/gpt-4o-2024-08-06)
GPT_4o_MINI = LLMConfig(openai/gpt-4o-mini)
GPT_4o_MINI_2024_07_18 = LLMConfig(openai/gpt-4o-mini-2024-07-18)
LLAMA_3_1_SONAR_LARGE_128K_ONLINE = LLMConfig(perplexity/llama-3.1-sonar-large-128k-online)
MISTRAL_LARGE_2402 = LLMConfig(mistral/mistral-large-2402)
MISTRAL_LARGE_2407 = LLMConfig(mistral/mistral-large-2407)
MISTRAL_LARGE_LATEST = LLMConfig(mistral/mistral-large-latest)
MISTRAL_MEDIUM_LATEST = LLMConfig(mistral/mistral-medium-latest)
MISTRAL_SMALL_LATEST = LLMConfig(mistral/mistral-small-latest)
OPEN_MISTRAL_7B = LLMConfig(mistral/open-mistral-7b)
OPEN_MIXTRAL_8X22B = LLMConfig(mistral/open-mixtral-8x22b)
OPEN_MIXTRAL_8X7B = LLMConfig(mistral/open-mixtral-8x7b)
REPLICATE_META_LLAMA_3_1_405B_INSTRUCT = LLMConfig(replicate/meta-llama-3.1-405b-instruct)
REPLICATE_META_LLAMA_3_70B_INSTRUCT = LLMConfig(replicate/meta-llama-3-70b-instruct)
REPLICATE_META_LLAMA_3_8B_INSTRUCT = LLMConfig(replicate/meta-llama-3-8b-instruct)
REPLICATE_MISTRAL_7B_INSTRUCT_V0_2 = LLMConfig(replicate/mistral-7b-instruct-v0.2)
REPLICATE_MIXTRAL_8X7B_INSTRUCT_V0_1 = LLMConfig(replicate/mixtral-8x7b-instruct-v0.1)
TOGETHER_LLAMA_3_1_405B_INSTRUCT_TURBO = LLMConfig(togetherai/Meta-Llama-3.1-405B-Instruct-Turbo)
TOGETHER_LLAMA_3_1_70B_INSTRUCT_TURBO = LLMConfig(togetherai/Meta-Llama-3.1-70B-Instruct-Turbo)
TOGETHER_LLAMA_3_1_8B_INSTRUCT_TURBO = LLMConfig(togetherai/Meta-Llama-3.1-8B-Instruct-Turbo)
TOGETHER_LLAMA_3_70B_CHAT_HF = LLMConfig(togetherai/Llama-3-70b-chat-hf)
TOGETHER_LLAMA_3_8B_CHAT_HF = LLMConfig(togetherai/Llama-3-8b-chat-hf)
TOGETHER_MISTRAL_7B_INSTRUCT_V0_2 = LLMConfig(togetherai/Mistral-7B-Instruct-v0.2)
TOGETHER_MIXTRAL_8X22B_INSTRUCT_V0_1 = LLMConfig(togetherai/Mixtral-8x22B-Instruct-v0.1)
TOGETHER_MIXTRAL_8X7B_INSTRUCT_V0_1 = LLMConfig(togetherai/Mixtral-8x7B-Instruct-v0.1)
TOGETHER_PHIND_CODELLAMA_34B_V2 = LLMConfig(togetherai/Phind-CodeLlama-34B-v2)
TOGETHER_QWEN2_72B_INSTRUCT = LLMConfig(togetherai/Qwen2-72B-Instruct)

notdiamond.llms.request

async notdiamond.llms.request.amodel_select(messages: List[Dict[str, str]], llm_configs: List[LLMConfig], metric: Metric, notdiamond_api_key: str, max_model_depth: int, hash_content: bool, tradeoff: str | None = None, preference_id: str | None = None, tools: Sequence[Dict[str, Any] | Callable] | None = [], previous_session: str | None = None, timeout: int | None = 5, nd_api_url: str | None = 'https://not-diamond-server.onrender.com', _user_agent: str = 'Python-SDK/0.3.12')[source]

This endpoint receives the prompt and routing settings, and makes a call to the NotDiamond API. It returns the best fitting LLM to call and a session ID that can be used for feedback.

Parameters:
  • messages (List[Dict[str, str]]) – list of messages to be used for the LLM call

  • llm_configs (List[LLMConfig]) – a list of available LLMs that the router can decide from

  • metric (Metric) – the metric on which the router bases its decision. Currently only ‘accuracy’ is supported.

  • notdiamond_api_key (str) – API key generated via the NotDiamond dashboard.

  • max_model_depth (int) – if your top recommended model is down, specify up to which depth of routing you’re willing to go.

  • hash_content (Optional[bool]) – Flag for hashing content before sending it to the NotDiamond API.

  • tradeoff (Optional[str], optional) – Define the “cost” or “latency” tradeoff for the router to determine the best LLM for a given query.

  • preference_id (Optional[str], optional) – The ID of the router preference that was configured via the Dashboard. Defaults to None.

  • previous_session (Optional[str], optional) – The session ID of a previous session, allowing you to link requests.

  • timeout (int, optional) – timeout for the request. Defaults to 5.

  • nd_api_url (Optional[str], optional) – The URL of the NotDiamond API. Defaults to settings.NOTDIAMOND_API_URL.

  • tools (Sequence[Dict[str, Any] | Callable] | None)

  • _user_agent (str)

Returns:

A tuple of the chosen LLMConfig to call and a session ID string.

In case of an error, the LLM defaults to None and the session ID to ‘NO-SESSION-ID’.

Return type:

tuple(LLMConfig, str)
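
A minimal async sketch; the import path and constructor of Metric are assumptions, and the API key is a placeholder:

    import asyncio

    from notdiamond.llms.config import LLMConfig
    from notdiamond.llms.request import amodel_select
    from notdiamond.metrics.metric import Metric  # assumed import path

    async def main():
        best_llm, session_id = await amodel_select(
            messages=[{"role": "user", "content": "Summarize this article."}],
            llm_configs=[LLMConfig.from_string("openai/gpt-4o")],
            metric=Metric("accuracy"),  # only 'accuracy' is supported
            notdiamond_api_key="YOUR_NOTDIAMOND_API_KEY",
            max_model_depth=1,
            hash_content=False,
        )
        if best_llm is None:  # error path: session_id == 'NO-SESSION-ID'
            print("routing failed:", session_id)
        return best_llm, session_id

    asyncio.run(main())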

notdiamond.llms.request.create_preference_id(notdiamond_api_key: str, name: str | None = None, nd_api_url: str | None = 'https://not-diamond-server.onrender.com', _user_agent: str = 'Python-SDK/0.3.12') → str[source]

Create a preference id with an optional name. The preference name will appear in your dashboard on Not Diamond.

Parameters:
  • notdiamond_api_key (str)

  • name (str | None)

  • nd_api_url (str | None)

  • _user_agent (str)

Return type:

str
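
For example (the name is an arbitrary label):

    from notdiamond.llms.request import create_preference_id

    preference_id = create_preference_id(
        notdiamond_api_key="YOUR_NOTDIAMOND_API_KEY",
        name="my-router-preference",  # appears on the Not Diamond dashboard
    )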

notdiamond.llms.request.get_tools_in_openai_format(tools: Sequence[Dict[str, Any] | Callable] | None)[source]

This function converts the tools list into the format that OpenAI expects. It does this by using LangChain’s model wrapper, which automatically creates the dictionary when bind_tools is called.

Parameters:

tools (Optional[Sequence[Union[Dict[str, Any], Callable]]]) – list of tools to be converted

Returns:

dictionary of tools in the format that OpenAI expects

Return type:

dict
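
A sketch with a plain callable as the tool; the function itself is illustrative:

    from notdiamond.llms.request import get_tools_in_openai_format

    def add(a: int, b: int) -> int:
        """Add two integers."""
        return a + b

    # LangChain's bind_tools machinery converts the callable into the
    # schema that OpenAI expects.
    tools_payload = get_tools_in_openai_format([add])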

notdiamond.llms.request.model_select(messages: List[Dict[str, str]], llm_configs: List[LLMConfig], metric: Metric, notdiamond_api_key: str, max_model_depth: int, hash_content: bool, tradeoff: str | None = None, preference_id: str | None = None, tools: Sequence[Dict[str, Any] | Callable] | None = [], previous_session: str | None = None, timeout: int | None = 5, nd_api_url: str | None = 'https://not-diamond-server.onrender.com', _user_agent: str = 'Python-SDK/0.3.12')[source]

This endpoint receives the prompt and routing settings, and makes a call to the NotDiamond API. It returns the best fitting LLM to call and a session ID that can be used for feedback.

Parameters:
  • messages (List[Dict[str, str]]) – list of messages to be used for the LLM call

  • llm_configs (List[LLMConfig]) – a list of available LLMs that the router can decide from

  • metric (Metric) – the metric on which the router bases its decision. Currently only ‘accuracy’ is supported.

  • notdiamond_api_key (str) – API key generated via the NotDiamond dashboard.

  • max_model_depth (int) – if your top recommended model is down, specify up to which depth of routing you’re willing to go.

  • hash_content (Optional[bool]) – Flag for hashing content before sending it to the NotDiamond API.

  • tradeoff (Optional[str], optional) – Define the “cost” or “latency” tradeoff for the router to determine the best LLM for a given query.

  • preference_id (Optional[str], optional) – The ID of the router preference that was configured via the Dashboard. Defaults to None.

  • previous_session (Optional[str], optional) – The session ID of a previous session, allowing you to link requests.

  • timeout (int, optional) – timeout for the request. Defaults to 5.

  • nd_api_url (Optional[str], optional) – The URL of the NotDiamond API. Defaults to settings.NOTDIAMOND_API_URL.

  • tools (Sequence[Dict[str, Any] | Callable] | None)

  • _user_agent (str)

Returns:

A tuple of the chosen LLMConfig to call and a session ID string.

In case of an error, the LLM defaults to None and the session ID to ‘NO-SESSION-ID’.

Return type:

tuple(LLMConfig, str)
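
The synchronous call mirrors the amodel_select sketch above, with the same assumption about Metric’s import path:

    from notdiamond.llms.config import LLMConfig
    from notdiamond.llms.request import model_select
    from notdiamond.metrics.metric import Metric  # assumed import path

    best_llm, session_id = model_select(
        messages=[{"role": "user", "content": "Write a haiku about routing."}],
        llm_configs=[LLMConfig.from_string("openai/gpt-4o")],
        metric=Metric("accuracy"),
        notdiamond_api_key="YOUR_NOTDIAMOND_API_KEY",
        max_model_depth=1,
        hash_content=False,
    )
    # best_llm is None and session_id == 'NO-SESSION-ID' on error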

notdiamond.llms.request.model_select_parse(response_code, response_json, llm_configs)[source]
notdiamond.llms.request.model_select_prepare(messages: List[Dict[str, str]], llm_configs: List[LLMConfig], metric: Metric, notdiamond_api_key: str, max_model_depth: int, hash_content: bool, tradeoff: str | None = None, preference_id: str | None = None, tools: Sequence[Dict[str, Any] | Callable] | None = [], previous_session: str | None = None, nd_api_url: str | None = 'https://not-diamond-server.onrender.com', _user_agent: str = 'Python-SDK/0.3.12')[source]

This is the core method for the model_select endpoint. It returns the best fitting LLM to call and a session ID that can be used for feedback.

Parameters:
  • messages (List[Dict[str, str]]) – list of messages to be used for the LLM call

  • llm_configs (List[LLMConfig]) – a list of available LLMs that the router can decide from

  • metric (Metric) – the metric on which the router bases its decision. Currently only ‘accuracy’ is supported.

  • notdiamond_api_key (str) – API key generated via the NotDiamond dashboard.

  • max_model_depth (int) – if your top recommended model is down, specify up to which depth of routing you’re willing to go.

  • hash_content (Optional[bool]) – Flag for hashing content before sending it to the NotDiamond API.

  • tradeoff (Optional[str], optional) – Define the “cost” or “latency” tradeoff for the router to determine the best LLM for a given query.

  • preference_id (Optional[str], optional) – The ID of the router preference that was configured via the Dashboard. Defaults to None.

  • previous_session (Optional[str], optional) – The session ID of a previous session, allowing you to link requests.

  • nd_api_url (Optional[str], optional) – The URL of the NotDiamond API. Defaults to settings.NOTDIAMOND_API_URL.

  • tools (Sequence[Dict[str, Any] | Callable] | None)

  • _user_agent (str)

Returns:

the data to be used for the model_select API call

Return type:

tuple(url, payload, headers)

notdiamond.llms.request.report_latency(session_id: str, llm_config: LLMConfig, tokens_per_second: float, notdiamond_api_key: str, nd_api_url: str | None = 'https://not-diamond-server.onrender.com', _user_agent: str = 'Python-SDK/0.3.12')[source]

This method makes an API call to the NotDiamond server to report the latency of an LLM call. It helps fine-tune our model router and ensures we offer recommendations that meet your latency expectations.

This feature can be disabled at the NotDiamond class level by setting latency_tracking to False.

Parameters:
  • session_id (str) – the session ID that was returned from the invoke or model_select calls, so we know which router call your latency report refers to.

  • llm_config (LLMConfig) – the LLM provider for which the latency is reported

  • tokens_per_second (float) – latency of the model call, calculated based on time elapsed, input tokens, and output tokens

  • notdiamond_api_key (str) – NotDiamond API key used for authentication

  • nd_api_url (Optional[str], optional) – The URL of the NotDiamond API. Defaults to settings.NOTDIAMOND_API_URL.

  • _user_agent (str)

Returns:

status code of the API call; 200 on success

Return type:

int

Raises:

ApiError – raised if the API call to the NotDiamond backend fails
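
A sketch of the feedback call, assuming you timed the call yourself and kept the session ID from model_select; the throughput figure is illustrative:

    from notdiamond.llms.config import LLMConfig
    from notdiamond.llms.request import report_latency

    session_id = "SESSION-ID-FROM-MODEL-SELECT"  # returned by model_select / invoke
    llm = LLMConfig.from_string("openai/gpt-4o")  # the config that served the call

    status = report_latency(
        session_id=session_id,
        llm_config=llm,
        tokens_per_second=42.0,  # measured throughput
        notdiamond_api_key="YOUR_NOTDIAMOND_API_KEY",
    )
    # status == 200 on success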