[Challenge] Metadata and Clusters #14

Open
alinapark opened this issue Jul 12, 2023 · 2 comments

@alinapark
Contributor

The objective of this challenge is to train a deep learning model that predicts the coordinates of a text, or the coordinates of its region cluster, while improving on Yachay’s original infrastructure.

We offer an annotated dataset for training and testing, comprising texts with their region cluster IDs, coordinates, post metadata, and more. We recommend considering the post metadata field, but you are free to include or exclude any of the provided dataset fields if doing so improves validation metrics on your end. Regression, classification, multi-task, or something else: all solutions and suggestions are welcome!

The Yachay team will evaluate the model on a test dataset that is not shared here.

Note: the metadata and clusters challenge allows for a greater number and variety of experiments. There are no hard MSE or EER requirements; we're looking for innovative ideas for infrastructure development.

The provided dataset is here. It includes:

  • an annotated corpus of ~600k+ texts, with their respective regions (clusters), timestamps, and over 40k user_ids
  • a median of 415 texts per region (cluster)
  • at least 6 texts per user
  • an additional list of cluster_ids with the coordinates of each cluster, for mapping texts to coordinates.
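
For a classification-style solution, the cluster list is what turns a predicted cluster_id back into coordinates. A minimal sketch of that lookup follows. It assumes the list can be read as rows of (cluster_id, latitude, longitude); the function names and the row layout are hypothetical and not taken from the dataset itself.

```python
def build_cluster_lookup(rows):
    """Build a cluster_id -> (lat, lon) mapping.

    `rows` is assumed to be an iterable of (cluster_id, lat, lon) tuples;
    the real file layout may differ, so adapt the unpacking accordingly.
    """
    return {cluster_id: (float(lat), float(lon)) for cluster_id, lat, lon in rows}


def clusters_to_coordinates(cluster_ids, lookup):
    """Map predicted cluster ids to coordinates, failing loudly on unknown ids."""
    coords = []
    for cluster_id in cluster_ids:
        if cluster_id not in lookup:
            raise KeyError(f"unknown cluster_id: {cluster_id!r}")
        coords.append(lookup[cluster_id])
    return coords


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    lookup = build_cluster_lookup([(7, "40.0", "-74.0"), (9, 51.5, -0.1)])
    print(clusters_to_coordinates([9, 7], lookup))
```

A regression model can skip this step and output coordinates directly; the lookup only matters when the model predicts cluster IDs.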

As for the deliverables, we are looking for:

  • a model that takes a text as input and returns coordinates as output
  • evaluation metrics obtained on the development dataset, including Mean Absolute Error in Haversine Distance
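
The Haversine-based metric can be sketched as below. This is an illustration rather than the evaluation script the Yachay team uses: the function names, the (latitude, longitude) ordering in degrees, the kilometre unit, and the Earth radius constant are all assumptions.

```python
import math

# Mean Earth radius in km; an assumption, since the issue does not specify one.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_haversine_error_km(y_true, y_pred):
    """Mean of per-sample Haversine distances between true and predicted points.

    Both arguments are sequences of (lat, lon) pairs of equal, non-zero length.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = sum(haversine_km(t[0], t[1], p[0], p[1])
                for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

Since the distance is already non-negative, averaging it over the samples gives the mean absolute error directly.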

Send a Pull Request with your results, comment here with questions, or ping us on Discord with requests!

Thank you for contributing to Open Source and making a difference! ʕ•́ᴥ•̀ʔ

@ingakaspar added the good first issue and help wanted labels on Jul 14, 2023

@AnuravModak

Is this challenge still open and looking for contributions? @alinapark @ingakaspar

@alinapark
Contributor Author

@AnuravModak it is, as long as the issue is here

@alinapark self-assigned this on Nov 30, 2023