Projects.txt (3.54 kB)
SQL database dump of source code repository logs for the 90 top-ranked Java projects hosted on GitHub, extracted using the CVSAnaly toolset
This dataset is used in a study of the semantic content of source code
produced in a collaborative environment. The semantic content is
described as the "dictionary" of key terms contained within a source
artifact. We posit that the semantic content of a Java class increases
as more developers add content to that class, and that this growth
directly affects the class's complexity, maintainability and
understandability.
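
To make the notion of a term "dictionary" concrete, here is a minimal Python sketch. The description above does not specify how the study extracts key terms, so the approach here (splitting identifiers on camelCase boundaries, lowercasing, and dropping Java keywords and very short tokens) is an assumption for illustration only. The function name term_dictionary and the keyword list are hypothetical.

```python
import re

# Partial list of Java keywords to ignore (an assumption, not exhaustive).
JAVA_KEYWORDS = {
    "class", "public", "private", "protected", "static", "final", "void",
    "int", "long", "double", "boolean", "new", "return", "import", "package",
    "extends", "implements", "this", "null", "true", "false", "for", "while",
}

def term_dictionary(source: str) -> set:
    """Return the set of distinct lowercase terms found in a source string.

    Note: this naive tokenizer also picks up words from comments and
    string literals; a real analysis would likely parse the code first.
    """
    terms = set()
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        # Split camelCase and acronyms, e.g. "HTTPServer" -> "HTTP", "Server".
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier):
            word = part.lower()
            if len(word) > 2 and word not in JAVA_KEYWORDS:
                terms.add(word)
    return terms

# Two hypothetical revisions of the same class: the second adds a method.
revision_1 = "class Account { int balance; }"
revision_2 = "class Account { int balance; void applyInterestRate() {} }"

before = term_dictionary(revision_1)
after = term_dictionary(revision_2)
print(sorted(after - before))  # terms introduced by the second revision
```

Under these assumptions, comparing the term sets of successive revisions of a class gives one simple way to observe whether its dictionary grows as contributions accumulate.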