Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

Open
PawelPeczek-Roboflow opened this issue Sep 30, 2024 · 6 comments
Open

Hacktoberfest 2024 | Llama 3.2 Vision 🤝 Workflows #694

PawelPeczek-Roboflow opened this issue Sep 30, 2024 · 6 comments

Comments

@PawelPeczek-Roboflow
Copy link
Collaborator

Llama 3.2 Vision in Workflows

Are you ready to make a difference this Hacktoberfest? We’re excited to invite you to contribute by integrating LLama 3.2 Vision into our Workflows ecosystem! This new block for image generation will be a fantastic addition, broadening the horizons of what our platform can achieve.

Join us in enhancing our capabilities and empowering users to harness the power of vision technology. Whether you're a seasoned developer or just starting your journey in open source, your contributions will play a vital role in shaping the future of our ecosystem. Let’s collaborate and bring this innovative functionality to life!

Task description

  • The task is to integrate the new Llama 3.2 Vision into workflows
  • We haven't discover the model capabilities yet - that is also the part of the task 🥳
  • We prefer light integration to REST API through requests library - we've found that OpenRouter provides REST API access (see this) - but if you find a better option - feel free to discuss
  • We imagine the model can be implemented in similar way as other VLMs:
  • please raise any issues with the task in the discussion below

Cheatsheet

@AHB102
Copy link

AHB102 commented Nov 14, 2024

@PawelPeczek-Roboflow Can I have a go at it ? And can you tell me what is expected, are we talking about complete integration end to end or breaking down this issue into sub issues which can be tackled.

@PawelPeczek-Roboflow
Copy link
Collaborator Author

Hi @AHB102, thanks for engaging into the issue.

Sure, you can pick up the task - so the point is we would like to:
a) find a suitable hosted version of Llama vision model such that we can integrate via making HTTP requests
b) once this is agreed - we need to create Workflow blocks similar to https://github.com/roboflow/inference/blob/main/inference/core/workflows/core_steps/models/foundation/openai/v2.py wrapping up the model prompting for various tasks - that would require a little bit of exploration of model capabilities

@PawelPeczek-Roboflow
Copy link
Collaborator Author

First step would definitely be agreeing on API that host llama
Options I see:

but was not investigating all of the options, which would be good to do.

I would try to find cheap and reliable third party

@AHB102
Copy link

AHB102 commented Nov 14, 2024

@PawelPeczek-Roboflow I looked into hosted Llama 3.2 Vision APIs and found a few options: Together.ai (https://api.together.xyz/models) , Google Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/llama-3-2-metas-new-generation-models-vertex-ai) , Azure (https://techcommunity.microsoft.com/blog/machinelearningblog/meta%E2%80%99s-new-llama-3-2-slms-and-image-reasoning-models-now-available-on-azure-ai-m/4255167) and AWS Bedrock(https://aws.amazon.com/blogs/machine-learning/vision-use-cases-with-llama-3-2-11b-and-90b-models-from-meta/).Except Together.ai all of the other options have massive scale , it would be reliable and cheap. Hugging face also has a offering for inferencing. I checked out OpenRouter's API limits. The Llama 3.2 11B model is currently free, and the usage rates are pretty good. I think 20 requests per minute should be plenty for most things Any thoughts ?

@PawelPeczek-Roboflow
Copy link
Collaborator Author

PawelPeczek-Roboflow commented Nov 14, 2024

I do not have particular bias towards any of the vendor - I even see that the decision which is most handy for people to use is strictly related to individual preferences of the consumer.
I believe AWS / Google / MS would be the "stable" choice, whereas I expect other third parties to be more attractive cost-wise.
One thing to keep in mind is also how easy it is to integrate - I remember that at least part of google API clients are bulky and require setting API key at process-level, not for individual invocation (which is 🔴 flag for multi-tennant deployments which we do with workflows).

I see the construction of the block in the following way:

  • we support parameters required to deal with model
  • and on the "orthogonal" axis - we do support 2 parameters - api_key and provider - which will choose backend to use. This approach do also have cons, but at least we do not need multiple blocks to handle the same model from different providers
  • we could start easy with one provider, ensuring extensibility for the future
    wdyt?

@AHB102
Copy link

AHB102 commented Nov 14, 2024

That sounds great! This approach provides a solid foundation for future scalability and flexibility. By not committing to a single vendor upfront, we can adapt to evolving needs and avoid potential vendor lock-in.

To start, I suggest we explore OpenRouter. It offers free API usage for Llama 2.3 11B, making it ideal for initial testing and development. Additionally, its compatibility with familiar libraries like requests and openai can streamline the integration process and minimize security risks.

Once we have a robust core structure in place, we can easily pivot to other providers. wdyt ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants