Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

PawelPeczek-Roboflow · 2024-09-30T12:59:38Z

Llama 3.2 Vision in Workflows

Are you ready to make a difference this Hacktoberfest? We’re excited to invite you to contribute by integrating LLama 3.2 Vision into our Workflows ecosystem! This new block for image generation will be a fantastic addition, broadening the horizons of what our platform can achieve.

Join us in enhancing our capabilities and empowering users to harness the power of vision technology. Whether you're a seasoned developer or just starting your journey in open source, your contributions will play a vital role in shaping the future of our ecosystem. Let’s collaborate and bring this innovative functionality to life!

Task description

The task is to integrate the new Llama 3.2 Vision into workflows
We haven't discover the model capabilities yet - that is also the part of the task 🥳
We prefer light integration to REST API through requests library - we've found that OpenRouter provides REST API access (see this) - but if you find a better option - feel free to discuss
We imagine the model can be implemented in similar way as other VLMs:
- OpenAI GPT
- Gemini
- Claude
- Florence 2
please raise any issues with the task in the discussion below

Cheatsheet

The text was updated successfully, but these errors were encountered:

AHB102 · 2024-11-14T06:13:57Z

@PawelPeczek-Roboflow Can I have a go at it ? And can you tell me what is expected, are we talking about complete integration end to end or breaking down this issue into sub issues which can be tackled.

PawelPeczek-Roboflow · 2024-11-14T10:48:45Z

Hi @AHB102, thanks for engaging into the issue.

Sure, you can pick up the task - so the point is we would like to:
a) find a suitable hosted version of Llama vision model such that we can integrate via making HTTP requests
b) once this is agreed - we need to create Workflow blocks similar to https://github.com/roboflow/inference/blob/main/inference/core/workflows/core_steps/models/foundation/openai/v2.py wrapping up the model prompting for various tasks - that would require a little bit of exploration of model capabilities

PawelPeczek-Roboflow · 2024-11-14T10:51:29Z

First step would definitely be agreeing on API that host llama
Options I see:

but was not investigating all of the options, which would be good to do.

I would try to find cheap and reliable third party

AHB102 · 2024-11-14T13:38:03Z

@PawelPeczek-Roboflow I looked into hosted Llama 3.2 Vision APIs and found a few options: Together.ai (https://api.together.xyz/models) , Google Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/llama-3-2-metas-new-generation-models-vertex-ai) , Azure (https://techcommunity.microsoft.com/blog/machinelearningblog/meta%E2%80%99s-new-llama-3-2-slms-and-image-reasoning-models-now-available-on-azure-ai-m/4255167) and AWS Bedrock(https://aws.amazon.com/blogs/machine-learning/vision-use-cases-with-llama-3-2-11b-and-90b-models-from-meta/).Except Together.ai all of the other options have massive scale , it would be reliable and cheap. Hugging face also has a offering for inferencing. I checked out OpenRouter's API limits. The Llama 3.2 11B model is currently free, and the usage rates are pretty good. I think 20 requests per minute should be plenty for most things Any thoughts ?

PawelPeczek-Roboflow · 2024-11-14T13:48:32Z

I do not have particular bias towards any of the vendor - I even see that the decision which is most handy for people to use is strictly related to individual preferences of the consumer.
I believe AWS / Google / MS would be the "stable" choice, whereas I expect other third parties to be more attractive cost-wise.
One thing to keep in mind is also how easy it is to integrate - I remember that at least part of google API clients are bulky and require setting API key at process-level, not for individual invocation (which is 🔴 flag for multi-tennant deployments which we do with workflows).

I see the construction of the block in the following way:

we support parameters required to deal with model
and on the "orthogonal" axis - we do support 2 parameters - api_key and provider - which will choose backend to use. This approach do also have cons, but at least we do not need multiple blocks to handle the same model from different providers
we could start easy with one provider, ensuring extensibility for the future
wdyt?

AHB102 · 2024-11-14T14:35:30Z

That sounds great! This approach provides a solid foundation for future scalability and flexibility. By not committing to a single vendor upfront, we can adapt to evolving needs and avoid potential vendor lock-in.

To start, I suggest we explore OpenRouter. It offers free API usage for Llama 2.3 11B, making it ideal for initial testing and development. Additionally, its compatibility with familiar libraries like requests and openai can streamline the integration process and minimize security risks.

Once we have a robust core structure in place, we can easily pivot to other providers. wdyt ?

PawelPeczek-Roboflow added the Hacktoberfest 2024 label Sep 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

PawelPeczek-Roboflow commented Sep 30, 2024

AHB102 commented Nov 14, 2024 •

edited

Loading

PawelPeczek-Roboflow commented Nov 14, 2024

PawelPeczek-Roboflow commented Nov 14, 2024

AHB102 commented Nov 14, 2024

PawelPeczek-Roboflow commented Nov 14, 2024 •

edited

Loading

AHB102 commented Nov 14, 2024

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

Comments

PawelPeczek-Roboflow commented Sep 30, 2024

Llama 3.2 Vision in Workflows

Task description

Cheatsheet

AHB102 commented Nov 14, 2024 • edited Loading

PawelPeczek-Roboflow commented Nov 14, 2024

PawelPeczek-Roboflow commented Nov 14, 2024

AHB102 commented Nov 14, 2024

PawelPeczek-Roboflow commented Nov 14, 2024 • edited Loading

AHB102 commented Nov 14, 2024

AHB102 commented Nov 14, 2024 •

edited

Loading

PawelPeczek-Roboflow commented Nov 14, 2024 •

edited

Loading