Skip to main content

GSoC 2024

Refactor Scraper and Integrate GitHub Discussions in Leaderboard

Objective

The Google Summer of Code 2024 Project for Leaderboard aims to integrate GitHub Discussions into the leaderboard. To achieve this, we need to scrape GitHub discussions using the GitHub API. However, the current Python scraper relies on the REST API, whereas GitHub Discussions can only be fetched using the GraphQL API. Additionally, Python does not natively support GraphQL integration as outlined here.


Therefore, the project can be divided into two parts as below:

1. Refactor the GitHub Scraper

Issues with the Current Python Scraper

  1. Type Safety: The existing scraper directly uses the GitHub REST API, resulting in a lack of type safety.
  2. GraphQL Support: Python does not natively support GitHub GraphQL, which is required for fetching GitHub discussions.
  3. Code Organization: The current implementation is contained within a single file, leading to reduced readability and maintainability.

Proposed Solutions

The goal is to enhance the GitHub scraper by:

  • Type Safety: Utilizing the Octokit library to ensure type-safe interactions with GitHub Discussions.
  • GraphQL Integration: Implementing GraphQL to facilitate more efficient and meaningful queries.
  • Improved Code Structure: Refactoring the codebase to improve readability and maintainability.

2. Integrate the GitHub Discussions Feature into the Leaderboard

To enrich the leaderboard with the GitHub Discussions feature, we will:

  1. Develop a user interface to display the fetched GitHub Discussions.
  2. Enable reward system for discussion