Projects.txt (3.54 kB)
SQL database dump of source code repository logs for 90 top-ranked Java projects (hosted on GitHub), extracted using the CVSAnaly toolset
This dataset is used in a study of the semantic content of source code
produced in a collaborative environment. The semantic content is
described as the "dictionary" of key terms contained within a source
artifact. We posit that the semantic content of a Java class increases
as more developers contribute content to that class, and that this
growth directly affects the class's complexity, maintainability and
understandability.
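
To make the notion of a "dictionary" of key terms concrete, here is a minimal sketch of one way such a dictionary might be built from the text of a Java class. It is an illustration only, not the procedure used in the study: the function name term_dictionary, the camel-case splitting rule, and the abbreviated keyword list are all assumptions introduced here.

```python
import re

# Abbreviated list of Java keywords to ignore (an assumption for this
# sketch; a real analysis would use the full list from the language spec).
JAVA_KEYWORDS = {
    "public", "private", "protected", "class", "interface", "static",
    "final", "void", "int", "long", "double", "boolean", "return",
    "new", "if", "else", "for", "while", "import", "package", "this",
}


def term_dictionary(java_source: str) -> set[str]:
    """Return the set of lowercase terms found in the identifiers of a
    Java source string (hypothetical helper, for illustration only)."""
    terms = set()
    # Find identifier-like tokens in the source text.
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", java_source):
        if identifier in JAVA_KEYWORDS:
            continue
        # Split camelCase and acronyms: "HTTPServer" -> "HTTP", "Server".
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier):
            terms.add(part.lower())
    return terms


source = "public class OrderManager { private int orderCount; void addOrder() {} }"
dictionary = term_dictionary(source)
```

Under these assumptions, the size of the returned set is one possible measure of a class's semantic content, and it can be recomputed after each commit to see how the dictionary grows as more developers touch the class.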