Project Guidelines

Objectives

  • Working in small teams, propose, design, present, and demonstrate a project that is based on processing large data sets.
  • Through project work
    • Apply your understanding of data structures and Python programming to develop a Python program that processes data sets of interest.
    • Benefit from the diversity of talent and experiences of the students in the class through their sharing of the work and participation in the review process.

The project learning activities will give you more practice with:

  1. Python data structures including nested structures and text files
  2. Version control and project management with git and GitHub
  3. Debugging a more complex codebase
  4. Making individual contributions and collaborating with peers
  5. Communicating your ideas to diverse audiences

Project Structure

The project codebase resides in a remote repository in the GitHub organization for this course. The codebase consists of Python modules that form the Python packages of your project implementation.

Locally, have a project directory in your comp801. Structure the project directory as follows:

Documentation

  • README.md briefly describes WHAT the project is about:
    • what the project investigates
    • what data is used for the investigation.
  • HOWTO.md has instructions about how to run the cloned repository locally, including any package installations
    • description of the project Python virtual environment: type and list of installed packages
    • description of the run configurations for individual features and the whole program
    • description of how to run the test cases.
  • In the docs folder (see Python Codebase Structure below)
    • PROPOSAL.md
    • REPORT.md

Python Codebase Structure

  • Two .py modules that implement the investigative questions
    • Each team member has their own .py module.
    • Each module implements a Python class. The class methods answer the questions assigned to each team member.
  • Testing modules.
    • Each team member has two testing modules.
    • Each testing module has at least three tests.
  • One client module called client.py
    • The module demonstrates the functionality of the .py modules.
    • The team decides how to structure client.py
  • docs directory has PROPOSAL.md and REPORT.md files.
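
As a rough illustration of the "one class per team member" layout described above, a member's module might look like the sketch below. The class name `RecordQuestions`, its methods, and the field names used with it are hypothetical; the guidelines do not prescribe any of them, and your own questions will determine the real methods.

```python
"""Hypothetical per-member module that answers questions about a data set."""
from collections import Counter


class RecordQuestions:
    """Answer investigative questions over a list of record dictionaries.

    The class name, method names, and field names are illustrative
    assumptions, not part of the project requirements.
    """

    def __init__(self, records):
        """Store the records (a list of dicts, one dict per data row)."""
        self.records = list(records)

    def count_by(self, field):
        """Return a dict mapping each value of `field` to its frequency."""
        return dict(Counter(row[field] for row in self.records))

    def average(self, field):
        """Return the mean of a numeric field, or None if there are no records."""
        if not self.records:
            return None
        total = sum(float(row[field]) for row in self.records)
        return total / len(self.records)
```

Each method here corresponds to one investigative question, which keeps every question separately testable from the testing modules.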

Other Directories and Files

  • data directory has multiple data files from your data set, including:
    • Two small-size datasets for testing purposes (e.g., 10-15 records)
    • Complete CSV data files to demonstrate the full functionality of the program.
  • .gitignore must be included in the project directory
  • requirements.txt must be included with package dependencies
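
One possible way to read either the small test files or the complete CSV files into a nested Python structure is sketched below. The function name `load_records` is an assumption made for illustration; the sketch relies only on the standard library `csv` module and assumes each file has a header row.

```python
import csv
from pathlib import Path


def load_records(path):
    """Read a CSV file with a header row into a list of dictionaries.

    Each dictionary maps a column name to the string value in that row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Because the same function serves both the small and the complete files, results checked on the 10-15 record files carry over to the full run.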

Development Process

  • Project Board describes the project development plan: who did what and when

    • Use Table view to add Start Date and End Date fields
    • Create project board notes based on assigned tasks
      • Assign a team member to each task
      • Determine start and end dates for each task
    • Use Table or Board view to monitor progress
      • Update task status: from to-do to in progress to complete
      • Add tasks as necessary
  • Branches correspond to units of development associated with the features or bug fixes of the project

    • Create a branch for each individual contribution
    • The name of the branch has the following format: `name-number-description`
      • name is the team member’s first name
      • number is the sequential number of the branch for that team member
      • description briefly describes the contribution made on the branch.
    • Synchronize local main with the remote origin/main before you create your own branch
    • Commits must have meaningful messages.
    • Do not merge or rebase your branch into main. See pull request below.
    • Push your branch to the remote
    • Do NOT delete any of the branches. Branches are evidence of your work.
  • A pull request is opened in GitHub after a local branch has been pushed to the remote repository

    • Assign your teammate as the reviewer of the branch
    • Upon reviewer’s approval, merge the branch into remote main using GitHub
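
The `name-number-description` branch format above can be checked mechanically. The sketch below is one way to do that. The regular expression assumes lowercase names and hyphen-separated descriptions, which the guidelines do not state. The helper `parse_branch_name` and the example branch name are hypothetical.

```python
import re

# Three parts in order: first name, sequential number, short description.
# Lowercase letters and hyphenated descriptions are assumptions.
BRANCH_PATTERN = re.compile(
    r"^(?P<name>[a-z]+)-(?P<number>\d+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_branch_name(branch):
    """Return the parts of a well-formed branch name, or None otherwise."""
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        return None
    parts = match.groupdict()
    parts["number"] = int(parts["number"])
    return parts
```

For example, `parse_branch_name("maria-2-load-csv")` yields the name `maria`, the number `2`, and the description `load-csv`, while a name without a number, such as `fix-stuff`, is rejected.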

Documenting, Testing, Implementation, and Debugging

  • All modules, classes, methods, and functions have docstrings that accurately describe the unit of code they document
  • Coding style requirements are fully met (no pylint or pycodestyle errors, or PyCharm problems)
  • Small test data files are used to provide evidence of testing: they are small enough that expected results can be derived independently (for example, by hand) and compared with the program's output
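
To illustrate deriving expected results independently from a small data file, here is a minimal sketch of a testing module with three tests. It uses the standard library `unittest` module, which is an assumption since the guidelines do not name a test framework. The inline data, the `total_for` function, and the test class are all made up for the example.

```python
import csv
import io
import unittest

# Made-up three-row data set, small enough to total by hand.
SMALL_CSV = """item,group,amount
alpha,g1,10
beta,g1,25
gamma,g2,40
"""


def total_for(rows, group):
    """Return the sum of `amount` over the rows that belong to `group`."""
    return sum(int(row["amount"]) for row in rows if row["group"] == group)


class TestTotalFor(unittest.TestCase):
    """Check total_for against totals worked out by hand."""

    def setUp(self):
        """Parse the small data set before each test."""
        self.rows = list(csv.DictReader(io.StringIO(SMALL_CSV)))

    def test_group_with_two_rows(self):
        """10 + 25, added by hand, gives 35."""
        self.assertEqual(total_for(self.rows, "g1"), 35)

    def test_group_with_one_row(self):
        """A single row contributes its own amount."""
        self.assertEqual(total_for(self.rows, "g2"), 40)

    def test_missing_group(self):
        """A group with no rows totals zero."""
        self.assertEqual(total_for(self.rows, "g3"), 0)
```

A module like this can be run with `python -m unittest`. In a real project the rows would come from one of the small files in the data directory rather than an inline string.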

Other Requirements

The project

  • Has a client.py module with a main() function in the client package.
  • Is runnable using python client/client.py.
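
A minimal sketch of what client/client.py could look like is given below, assuming the team passes an optional data file path on the command line. The default path `data/small.csv` and the printed message are placeholders; the team decides the real structure.

```python
"""Hypothetical client/client.py that demonstrates the project modules."""
import sys


def main(argv=None):
    """Run the demonstration and return an exit status of 0 on success."""
    args = sys.argv[1:] if argv is None else argv
    # Placeholder default; the real file name is up to the team.
    data_path = args[0] if args else "data/small.csv"
    print(f"Running demonstration on {data_path}")
    # Calls into each team member's class would go here.
    return 0


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard lets `python client/client.py` run the demonstration while still allowing the testing modules to import `main` without side effects.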

The code will be run automatically using GitHub Actions, and all tests, code-formatting checks, and docstring checks must pass.