> For the complete documentation index, see [llms.txt](https://langtech-bsc.gitbook.io/alia-kit/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://langtech-bsc.gitbook.io/alia-kit/modelos/modelos-multimodales.md). # Modelos multimodales

Descripción / Función	Nombre modelo	Model card	Comentario
LLM especializado en imágenes y videos	Salamandra-VL-7B-2512	https://huggingface.co/BSC-LT/Salamandra-VL-7B-2512	Versión más reciente de la familia de modelos multimodales Salamandra. Combina el codificador visual SigLIP 2 Giant con Salamandra 7B ajustado para seguir instrucciones, con especial atención a las lenguas europeas, y mejora la comprensión visual detallada y el conteo mediante datos PixMo.
LLM especializado en imágenes y videos	salamandra-7b-vision	https://huggingface.co/BSC-LT/salamandra-7b-vision	Modelo salamandra-7b adaptado para el procesamiento de imágenes y videos.
Traducción de voz a texto	SalamandraTAV-7b	https://huggingface.co/BSC-LT/salamandra-TAV-7b	Modelo de lenguaje multimodal especializado en voz y traducción, afinado a partir de salamandraTA-7b-instruct, Admite seis lenguas ibéricas además del inglés y puede realizar reconocimiento automático del habla, traducción de texto, traducción de voz a texto e identificación de la lengua hablada.
Modelo multimodal (voz y texto)	speech-salamandra-es-en	https://huggingface.co/BSC-LT/speech-salamandra-es-en	Modelo multimodal de voz y texto en español e inglés, basado en Salamandra-7B-Instruct e integrado con el codificador de voz de SeamlessM4T v2. Puede recibir instrucciones de texto y audio para realizar reconocimiento automático del habla y responder preguntas sobre contenidos hablados.
Modelo multimodal y muiltilingüe instruido	Latxa-Qwen3.5-2B	https://huggingface.co/HiTZ/Latxa-Qwen3.5-2B	Modelo multimodal y multilingüe instruido, basado en Qwen3-VL-2B-Instruct, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera y, en su variante multilingüe, también en gallego y catalán.
Modelo multimodal y muiltilingüe instruido	Latxa-Qwen3.5-4B	https://huggingface.co/HiTZ/Latxa-Qwen3.5-4B	Modelo multimodal y multilingüe instruido, basado en Qwen3.5-4B, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera, gallego y catalán.
Modelo multimodal y muiltilingüe instruido	Latxa Qwen-3 VL 2B	https://huggingface.co/HiTZ/Latxa-Qwen3-VL-2B-Instruct	Modelo multimodal y multilingüe instruido, basado en Qwen3-VL-2B-Instruct, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera y, en su variante multilingüe, también en gallego y catalán.
Modelo multimodal y muiltilingüe instruido	Latxa Qwen-3 VL 4B	https://huggingface.co/HiTZ/Latxa-Qwen3-VL-4B-Instruct	Modelo multimodal y multilingüe instruido, basado en Qwen3-VL-4B-Instruct, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera y, en su variante multilingüe, también en gallego y catalán.
Modelo multimodal y muiltilingüe instruido	Latxa-Qwen3-VL-8B-Instruct	https://huggingface.co/HiTZ/Latxa-Qwen3-VL-8B-Instruct	Modelo multimodal y multilingüe instruido, basado en Qwen3-VL-8B-Instruct, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera y, en su variante multilingüe, también en gallego y catalán.
Modelo multimodal y muiltilingüe instruido	Latxa-Qwen3-VL-32B-Instruct	https://huggingface.co/HiTZ/Latxa-Qwen3-VL-32B-Instruct	Modelo multimodal y multilingüe instruido, basado en Qwen3-VL-32B-Instruct, capaz de comprender y generar texto, procesar imágenes y seguir instrucciones. Ha sido adaptado para mejorar su rendimiento en euskera y, en su variante multilingüe, también en gallego y catalán.

--- # Agent Instructions This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com. ## Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter: ``` GET https://langtech-bsc.gitbook.io/alia-kit/modelos/modelos-multimodales.md?ask=&goal= ``` `ask` is the immediate question: it should be specific, self-contained, and written in natural language. `goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.