# Large Language Models

### Salamandra7b-Instruct

* Model: [BSC-LT/salamandra-7b-instruct](https://huggingface.co/BSC-LT/salamandra-7b-instruct)
* Inference: <https://cx5unbuv4o2z8fhp.us-east-1.aws.endpoints.huggingface.cloud>
* GPU: A100

### Salamandra2b-Instruct

* Model: [BSC-LT/salamandra-2b-instruct](https://huggingface.co/BSC-LT/salamandra-2b-instruct)
* Inference: <https://o9wl2lsjfs4966jz.eu-west-1.aws.endpoints.huggingface.cloud>
* GPU: A10

### Salamandra2b-Instruct-Aina-hack (Recommended)

<mark style="color:green;">**This is the new version of Salamandra. It is based on the same foundation model but is tuned to better follow the system prompt.**</mark>

* Model: [BSC-LT/salamandra-2b-instruct-aina-hack](https://huggingface.co/BSC-LT/salamandra-2b-instruct-aina-hack)
* Inference: <https://j292uzvvh7z6h2r4.us-east-1.aws.endpoints.huggingface.cloud>
* GPU: A10

### Salamandra7b-Instruct-Aina-hack (Recommended)

<mark style="color:green;">**This is the new version of Salamandra. It is based on the same foundation model but is tuned to better follow the system prompt.**</mark>

* Model: [BSC-LT/salamandra-7b-instruct-aina-hack](https://huggingface.co/BSC-LT/salamandra-7b-instruct-aina-hack)
* Inference: <https://hijbc1ux6ie03ouo.us-east-1.aws.endpoints.huggingface.cloud>
* GPU: A100
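If you switch between the endpoints above, it can help to keep them in one place. The sketch below uses hypothetical names (`ENDPOINTS`, `base_url`, `chat_completions_url` are illustrative, not part of any library) to map each model ID listed on this page to its inference URL and to build the `/v1/chat/completions` route that the curl chat example below calls.

```python
# Hypothetical registry of the endpoints listed on this page (model ID -> inference URL)
ENDPOINTS = {
    "BSC-LT/salamandra-7b-instruct": "https://cx5unbuv4o2z8fhp.us-east-1.aws.endpoints.huggingface.cloud",
    "BSC-LT/salamandra-2b-instruct": "https://o9wl2lsjfs4966jz.eu-west-1.aws.endpoints.huggingface.cloud",
    "BSC-LT/salamandra-2b-instruct-aina-hack": "https://j292uzvvh7z6h2r4.us-east-1.aws.endpoints.huggingface.cloud",
    "BSC-LT/salamandra-7b-instruct-aina-hack": "https://hijbc1ux6ie03ouo.us-east-1.aws.endpoints.huggingface.cloud",
}


def base_url(model_id: str) -> str:
    """Return the endpoint URL for a model ID, without a trailing slash."""
    try:
        return ENDPOINTS[model_id].rstrip("/")
    except KeyError:
        raise ValueError(f"No endpoint listed for {model_id!r}") from None


def chat_completions_url(model_id: str) -> str:
    """Build the OpenAI-compatible chat completions route for a model."""
    return base_url(model_id) + "/v1/chat/completions"


print(chat_completions_url("BSC-LT/salamandra-2b-instruct-aina-hack"))
```

The value returned by `base_url(...)` is what the examples below expect in the `BASE_URL` environment variable.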

### Code examples

OpenAI Chat Completions

```python
# pip install openai python-dotenv
import os

from dotenv import load_dotenv
from openai import OpenAI

# Read HF_TOKEN and BASE_URL (the endpoint URL) from a local .env file
load_dotenv(".env")

HF_TOKEN = os.environ["HF_TOKEN"]
BASE_URL = os.environ["BASE_URL"]

# The endpoint exposes an OpenAI-compatible API under /v1/
client = OpenAI(
    base_url=BASE_URL + "/v1/",
    api_key=HF_TOKEN,
)

messages = [{"role": "system", "content": "you are a helpful assistant"}]
messages.append({"role": "user", "content": "Tell me something about AI"})

stream = False
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=messages,
    stream=stream,
    max_tokens=1000,
    # temperature=0.1,
    # top_p=0.95,
    # frequency_penalty=0.2,
)

text = ""
if stream:
    # Each chunk carries an incremental delta; its content may be None
    for chunk in chat_completion:
        delta = chunk.choices[0].delta.content or ""
        text += delta
        print(delta, end="", flush=True)
    print()
else:
    text = chat_completion.choices[0].message.content
    print(text)
```
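In streaming mode each chunk carries only an incremental `delta`, and some chunks (often the last one) can have `content` set to `None`. A small helper such as the hypothetical `collect_stream` below joins the pieces safely. It relies only on the `chunk.choices[0].delta.content` attribute path used in the example above; the stand-in chunks built with `types.SimpleNamespace` are there so the sketch runs without an endpoint.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = False) -> str:
    """Join the delta contents of streamed chat completion chunks into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        piece = chunk.choices[0].delta.content or ""  # content may be None
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    if echo:
        print()
    return "".join(parts)


def fake_chunk(content):
    """Stand-in object with the same attribute path as an OpenAI stream chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Deep "), fake_chunk("learning"), fake_chunk(None)]
print(collect_stream(fake_stream))  # Deep learning
```

With a real client and `stream=True`, you would call `text = collect_stream(chat_completion, echo=True)` instead of the manual loop.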

Generate with requests

```python
# pip install requests transformers python-dotenv
import os

import requests
from dotenv import load_dotenv
from transformers import AutoTokenizer

load_dotenv(".env")
HF_TOKEN = os.environ["HF_TOKEN"]
BASE_URL = os.environ["BASE_URL"]

# Use the tokenizer of the same model that BASE_URL serves
model_name = "BSC-LT/salamandra-7b-instruct-aina-hack"
tokenizer = AutoTokenizer.from_pretrained(model_name)

headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

system_prompt = "you are a helpful assistant"
text = "Tell me something about AI"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": text},
]

# /generate takes raw text, so render the model's chat template first
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 1000},
}
api_url = BASE_URL + "/generate"
response = requests.post(api_url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["generated_text"])
```
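Unlike the chat route, `/generate` takes sampling options inside a `parameters` object and returns the text under `generated_text`. The hypothetical helpers below (`build_generate_payload`, `extract_generated_text` are illustrative names) sketch that shape, assuming the endpoints run Text Generation Inference (TGI). Unset options are left out so the server defaults apply; the list handling covers the assumption that posting to the endpoint root instead of `/generate` wraps the result in a one-element list.

```python
from typing import Optional


def build_generate_payload(prompt: str, max_new_tokens: int = 1000,
                           temperature: Optional[float] = None,
                           top_p: Optional[float] = None) -> dict:
    """Build a TGI-style /generate body, omitting sampling options that are not set."""
    parameters = {"max_new_tokens": max_new_tokens}
    if temperature is not None:
        parameters["temperature"] = temperature
    if top_p is not None:
        parameters["top_p"] = top_p
    return {"inputs": prompt, "parameters": parameters}


def extract_generated_text(body) -> str:
    """Return the generated text from a /generate response body."""
    # Assumption: some routes wrap the result in a one-element list
    if isinstance(body, list):
        body = body[0]
    return body["generated_text"]


payload = build_generate_payload("what is AI", max_new_tokens=150, temperature=0.1)
print(payload)
# Stand-in response body, not real model output
print(extract_generated_text({"generated_text": "AI stands for artificial intelligence."}))
```

In the `requests` example above, `build_generate_payload(prompt)` could replace the literal `payload` dictionary and `extract_generated_text(response.json())` the final lookup.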

Curl Chat Completions

```bash
URL=replace_with_endpoint_hf_url
TOKEN=replace_with_provided_token
curl "${URL}/v1/chat/completions" -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
	"model": "tgi",
	"messages": [
    	{
        	"role": "user",
        	"content": "What is deep learning?"
    	}
	],
	"max_tokens": 150,
	"stream": true
}'
```
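With `"stream": true`, the chat route answers with server-sent events: one `data:` line per chunk, typically ending with `data: [DONE]` in the OpenAI style. The hypothetical `iter_stream_content` below sketches how to turn those lines into text in Python; the sample lines are illustrative stand-ins, not captured server output, and the parser tolerates both `data:` and `data: ` prefixes.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta content from server-sent event lines of a streamed chat completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()  # the space after "data:" is optional
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream marker
        event = json.loads(data)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Deep"}}]}',
    "",
    'data:{"choices":[{"index":0,"delta":{"role":"assistant","content":" learning"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Deep learning
```

With `requests`, post with `stream=True` and pass `(line.decode("utf-8") for line in response.iter_lines())`; decoding explicitly as UTF-8 keeps accented Catalan and Spanish characters intact.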

Curl Generate

```bash
URL=replace_with_endpoint_hf_url
TOKEN=replace_with_provided_token
curl "${URL}/generate" -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
	"inputs": "what is AI",
	"parameters": {
		"max_new_tokens": 150
	}
}'
```
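Hugging Face Inference Endpoints that are paused or scaled to zero typically answer with HTTP 503 while they start up; whether the endpoints on this page are configured that way is an assumption. The hypothetical `post_with_retry` below retries only on 503, with exponential backoff. `send` is any zero-argument callable that performs the request, so the same helper works with `requests` or with a stand-in, as in the demo.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def post_with_retry(send: Callable[[], Any], retries: int = 5,
                    backoff: float = 2.0,
                    sleep: Callable[[float], Any] = time.sleep) -> Any:
    """Call send() again while it returns HTTP 503, waiting longer each time."""
    response = send()
    for attempt in range(retries):
        if getattr(response, "status_code", None) != 503:
            break
        sleep(backoff * (2 ** attempt))  # 2 s, 4 s, 8 s, ... with the defaults
        response = send()
    return response


# Demo with a stand-in sender: two 503 responses, then success.
# Waits are recorded instead of slept so the demo finishes instantly.
codes = iter([503, 503, 200])
waits = []
result = post_with_retry(lambda: SimpleNamespace(status_code=next(codes)), sleep=waits.append)
print(result.status_code, waits)  # 200 [2.0, 4.0]
```

With `requests`, the call would look like `post_with_retry(lambda: requests.post(api_url, headers=headers, json=payload))`. Other errors (for example 401 for a wrong token) are returned immediately rather than retried.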

### How to fine-tune a model

You can follow this example from Meta, pointing it to the Salamandra models instead of Llama:

<https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/finetuning/quickstart_peft_finetuning.ipynb>
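Whatever recipe you follow, instruction-tuned models are usually fine-tuned on conversations in the same `messages` format used in the chat examples above. The hypothetical `to_messages` helper below converts question/answer records into that format; the field names `question` and `answer` are assumptions about your dataset. Each conversation would then be rendered with the model's `tokenizer.apply_chat_template(...)` before training.

```python
from typing import Optional


def to_messages(example: dict, system_prompt: Optional[str] = None) -> list:
    """Convert a {"question", "answer"} record into a chat-style messages list."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": example["question"]})
    messages.append({"role": "assistant", "content": example["answer"]})
    return messages


# Illustrative record, not taken from any real dataset
dataset = [
    {"question": "What is deep learning?",
     "answer": "A family of machine learning methods based on neural networks."},
]
conversations = [to_messages(ex, "you are a helpful assistant") for ex in dataset]
print(conversations[0])
```

Keeping the system prompt in the training data matches how the Aina-hack variants are meant to be used, since they are tuned to follow it.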


