SQL database dump of source code repository logs for 90 top-ranked Java projects hosted on GitHub, extracted using the CVSAnaly toolset
This dataset is used in a study of the semantic content of source code produced in a collaborative environment. The semantic content is described as the "dictionary" of key terms contained within a source artifact. We posit that the semantic content of a Java class increases as more developers add content to that class, which has a direct effect on its complexity, maintainability and understandability.
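To make the notion of a term "dictionary" concrete, the following is a minimal sketch of one way such a dictionary could be derived from a Java class. It is an illustration under stated assumptions, not the extraction method used in the study: the function name `term_dictionary`, the partial keyword list, and the rule of splitting identifiers on camelCase boundaries are all hypothetical choices made for this example.

```python
import re

# Partial list of Java reserved words (assumption: a real tool would use the full list).
JAVA_KEYWORDS = {
    "abstract", "boolean", "break", "case", "catch", "class", "double", "else",
    "extends", "final", "for", "if", "implements", "import", "int", "interface",
    "long", "new", "package", "private", "protected", "public", "return",
    "static", "this", "throw", "try", "void", "while",
}

# Matches identifier-like tokens in the source text.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits an identifier into words: acronym runs ("HTTP") or capitalised/lowercase words.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def term_dictionary(java_source: str) -> set:
    """Return the set of lowercase terms found in the identifiers of a Java source.

    Simplification: comments and string literals are not stripped, so words
    inside them are counted as terms too.
    """
    terms = set()
    for identifier in IDENTIFIER.findall(java_source):
        if identifier in JAVA_KEYWORDS:
            continue  # language keywords carry no project-specific meaning
        for word in WORD.findall(identifier):
            terms.add(word.lower())
    return terms


# Hypothetical example: the same class before and after a second contribution.
v1 = "class Account { private int balance; }"
v2 = ("class Account { private int balance; "
      "void applyInterestRate(double rate) { balance += rate; } }")

new_terms = term_dictionary(v2) - term_dictionary(v1)
print(sorted(new_terms))
```

In this sketch the second revision adds terms to the dictionary of the class without removing any, which is the kind of growth the hypothesis above refers to. Applying it to the dataset would additionally require retrieving the file contents for each revision, which the repository logs alone may not contain.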