Ethical-Document

This is the replication package for "Documenting Ethical Considerations in Open Source AI Models", accepted at the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM'24).

Folder Walkthrough

The qualitative codes folder contains our thematic analysis codes for documents from the three data sources.
The mine_repositories folder contains the code for mining repositories on both GitHub and Hugging Face.
The duplicate_detection folder contains the code we used to detect document reuse.
The Curated documents folder contains the documents we mined from the three data sources. You can cross-reference these raw documents with our codes in the qualitative codes folder.

A more detailed explanation of each folder follows.

qualitative codes

categories_mindmap.pdf contains a mind map showing how base codes are synthesised into concepts, and concepts into categories.

codes lists each code number together with its description.

sample_hf_mc contains the document name, key points, and codes for documents from the HF_CARD data source.

sample_gh_rm contains the document name, key points, and codes for documents from the GH_README data source.

sample_gh_mc contains the document name, key points, and codes for documents from the GH_CARD data source.
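The hierarchy in categories_mindmap.pdf (base codes grouped into concepts, concepts grouped into categories) can be held in a nested mapping. The sketch below is illustrative only: every category, concept, and code name in it is a hypothetical placeholder rather than an actual code from the codes file, and the helper functions are not part of this repository. It shows how the base codes assigned to one document in a sample_* file could be rolled up into per-category counts.

```python
from collections import Counter

# Hypothetical hierarchy: category -> concept -> base codes.
# The real hierarchy is in categories_mindmap.pdf; these names are placeholders.
HIERARCHY: dict[str, dict[str, list[str]]] = {
    "category_A": {
        "concept_A1": ["code_01", "code_02"],
        "concept_A2": ["code_03"],
    },
    "category_B": {
        "concept_B1": ["code_04", "code_05"],
    },
}


def code_to_category(hierarchy: dict[str, dict[str, list[str]]]) -> dict[str, str]:
    """Invert the tree so each base code maps to its top-level category."""
    lookup: dict[str, str] = {}
    for category, concepts in hierarchy.items():
        for codes in concepts.values():
            for code in codes:
                lookup[code] = category
    return lookup


def count_categories(
    doc_codes: list[str], hierarchy: dict[str, dict[str, list[str]]]
) -> Counter:
    """Count how many of a document's base codes fall under each category.

    Codes that do not appear anywhere in the hierarchy are ignored.
    """
    lookup = code_to_category(hierarchy)
    return Counter(lookup[code] for code in doc_codes if code in lookup)


if __name__ == "__main__":
    # Codes assigned to one hypothetical document row from a sample_* file.
    print(count_categories(["code_01", "code_03", "code_05"], HIERARCHY))
```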

mine_repositories

This folder contains the mining procedure as well as the keyword expansion procedure. keyword-base.txt holds the initial keywords used for filtering. Running keyword_filter_expansion.py, which expands the keyword set following the process described in the paper, produces extra_keywords_paragraph.txt.
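The initial keyword filtering step can be sketched as follows. This is a minimal illustration, not the code in keyword_filter_expansion.py: it assumes keyword-base.txt lists one keyword or phrase per line, and that a document passes the filter if it mentions at least one keyword as a whole word, ignoring case. The expansion step that yields extra_keywords_paragraph.txt follows the process in the paper and is not reproduced here. The function names and sample documents are hypothetical.

```python
import re
from pathlib import Path


def load_keywords(path: Path) -> list[str]:
    """Read one keyword (or phrase) per line, skipping blank lines.

    Assumption: this is the layout of keyword-base.txt.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def matched_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that occur in text as whole words, ignoring case."""
    return [
        kw
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE)
    ]


def filter_documents(docs: dict[str, str], keywords: list[str]) -> dict[str, list[str]]:
    """Keep documents that mention at least one keyword, with the keywords they matched."""
    kept: dict[str, list[str]] = {}
    for name, text in docs.items():
        hits = matched_keywords(text, keywords)
        if hits:
            kept[name] = hits
    return kept


if __name__ == "__main__":
    # Hypothetical keywords and documents; the real list lives in keyword-base.txt.
    keywords = ["bias", "fairness"]
    docs = {
        "model_a/README.md": "This model may exhibit gender bias.",
        "model_b/README.md": "Install with pip and run train.py.",
    }
    print(filter_documents(docs, keywords))
    # {'model_a/README.md': ['bias']}
```

Whole-word matching is a conservative choice: "bias" will not match "biased". Substring matching would catch more variants at the cost of more false positives, and the actual script may do either.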

duplicate_detection

This folder contains the code for detecting duplicate documents; see Section 3.3 of our paper for details. duplicate_detection.py generates the similarity matrix and performs the clustering.
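The two stages of duplicate_detection.py (build a similarity matrix, then cluster it) can be illustrated with a small standard-library sketch. The similarity measure and clustering method here are assumptions chosen for illustration and are not necessarily those described in Section 3.3: documents are compared by Jaccard similarity over word 3-shingles, and any pair at or above a hypothetical threshold of 0.8 is placed in the same cluster (connected components via union-find).

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Word k-shingles of a document (lower-cased, whitespace-tokenised)."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty documents count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(docs: list[str], k: int = 3) -> list[list[float]]:
    """Symmetric pairwise Jaccard similarity matrix with 1.0 on the diagonal."""
    sets = [shingles(d, k) for d in docs]
    n = len(sets)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard(sets[i], sets[j])
    return matrix


def cluster(matrix: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Group documents linked by similarity >= threshold (connected components)."""
    n = len(matrix)
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "this model may produce biased outputs and should not be used for hiring decisions",
        "this model may produce biased outputs and should not be used for hiring decisions",
        "install the package with pip and run the training script",
    ]
    print(cluster(similarity_matrix(docs)))
    # [[0, 1], [2]]
```

Connected-components clustering is transitive: if A is similar to B and B to C, all three land in one cluster even when A and C fall below the threshold. That suits tracing chains of reused text but can merge loosely related documents; the actual script may make a different choice.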

Curated documents

This folder contains our collected documents. Its three subfolders, github_model_card_documents, github_readme_documents, and huggingface_model_card_documents, correspond to the terms GH_CARD, GH_README, and HF_CARD, respectively, in our paper. Note that this is the raw data, before any filtering was applied.
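The subfolder-to-term correspondence above can be written directly as a mapping, for example to count how many raw documents each data source contributes. This is a hypothetical helper, not part of the repository; it assumes each document is stored as a file somewhere under its subfolder and skips hidden files such as .DS_Store.

```python
from collections import Counter
from pathlib import Path

# Subfolder name -> data-source term used in the paper (as stated in this README).
SOURCE_LABELS = {
    "github_model_card_documents": "GH_CARD",
    "github_readme_documents": "GH_README",
    "huggingface_model_card_documents": "HF_CARD",
}


def is_hidden(name: str) -> bool:
    """Treat dot-files (e.g. macOS .DS_Store) as non-documents."""
    return name.startswith(".")


def count_documents(root: Path) -> Counter:
    """Count files under each known subfolder of root, keyed by paper term.

    Assumption: every non-hidden file (at any depth) is one document.
    Subfolders that do not exist are skipped.
    """
    counts: Counter = Counter()
    for folder, label in SOURCE_LABELS.items():
        sub = root / folder
        if not sub.is_dir():
            continue
        counts[label] = sum(
            1 for p in sub.rglob("*") if p.is_file() and not is_hidden(p.name)
        )
    return counts


if __name__ == "__main__":
    # The folder name in this repository contains a space.
    print(count_documents(Path("Curated documents")))
```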
