Comfac GPU Scaling and AI Research Goals

This is the GPU scaling project: how Comfac will sell GPU solutions.

Objective: To develop and scale a high-performance AMD-based AI compute cluster capable of running large-scale models (e.g., Qwen3 235B) and of supporting educational and R&D initiatives through open collaboration with partner schools.
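A rough sizing calculation helps show what "running a 235B-parameter model" implies for cluster size. The sketch below estimates weight memory as parameters × bits per weight ÷ 8 and divides by per-GPU VRAM, using 32 GiB (Radeon AI PRO R9700) and 24 GiB (RX 7900 XTX). The 1.2 overhead factor for KV cache, activations, and runtime buffers is an assumption, not a measured figure; real requirements depend on context length, batch size, and the inference engine.

```python
import math

# Rough VRAM sizing for holding a model's weights across several GPUs.
# Assumption: weights dominate memory; KV cache, activations and runtime
# buffers are folded into one hypothetical overhead factor.


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def min_gpus(params_billion: float, bits_per_weight: float,
             vram_gib_per_gpu: float, overhead: float = 1.2) -> int:
    """Smallest GPU count whose combined VRAM covers weights * overhead."""
    needed = weight_gib(params_billion, bits_per_weight) * overhead
    return math.ceil(needed / vram_gib_per_gpu)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        n32 = min_gpus(235, bits, vram_gib_per_gpu=32)
        n24 = min_gpus(235, bits, vram_gib_per_gpu=24)
        print(f"{bits:>2}-bit: {weight_gib(235, bits):6.1f} GiB of weights, "
              f"at least {n32} x 32 GiB or {n24} x 24 GiB GPUs")
```

Under these assumptions, 4-bit weights for a 235B model come to roughly 109 GiB, which points to at least 5 cards at 32 GiB or 6 at 24 GiB before any headroom for long contexts.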

Goals and Steps:

Platform and Motherboard Selection
- Identify and procure a motherboard or server platform with enough PCIe lanes and slot bifurcation support to host many GPUs (similar to the setup demonstrated by PewDiePie).
- Ensure compatibility with ROCm and vLLM for distributed inference and multi-GPU coordination.
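When comparing candidate boards, it can help to write the planned slot layout down and check it against the platform's lane count. The sketch below models each slot as a number of electrical lanes split across one or more GPUs via bifurcation. The slot names, the `cpu_lanes=64` value, and the x4 minimum link width are placeholders, not figures from any specific board; whether a slot can actually be bifurcated depends on the board and BIOS.

```python
from dataclasses import dataclass

# Hypothetical PCIe lane budget check for a multi-GPU board.
# Assumption: the BIOS exposes per-slot bifurcation (x16 -> x8/x8 or
# x4/x4/x4/x4); whether it does depends on the specific board.


@dataclass
class Slot:
    name: str
    lanes: int      # electrical lanes wired to the slot
    split: int = 1  # number of GPUs sharing the slot via bifurcation

    def lanes_per_gpu(self) -> int:
        if self.split < 1 or self.lanes % self.split:
            raise ValueError(f"{self.name}: cannot split x{self.lanes} {self.split} ways")
        return self.lanes // self.split


def gpu_links(slots: list[Slot], cpu_lanes: int, min_lanes: int = 4) -> list[tuple[str, int]]:
    """Return one (slot name, link width) pair per GPU, or raise if the plan is invalid."""
    used = sum(s.lanes for s in slots)
    if used > cpu_lanes:
        raise ValueError(f"plan needs {used} lanes but the platform offers {cpu_lanes}")
    links = []
    for slot in slots:
        width = slot.lanes_per_gpu()
        if width < min_lanes:
            raise ValueError(f"{slot.name}: x{width} is below the x{min_lanes} floor")
        links.extend((slot.name, width) for _ in range(slot.split))
    return links


if __name__ == "__main__":
    # Placeholder layout: three x16 slots split 4, 2 and 1 ways.
    plan = [Slot("SLOT1", 16, 4), Slot("SLOT2", 16, 2), Slot("SLOT3", 16)]
    links = gpu_links(plan, cpu_lanes=64)
    print(f"{len(links)} GPUs:", links)
```

Narrower links (x4) allow more GPUs per board but reduce host-to-GPU and GPU-to-GPU bandwidth, which matters most for multi-GPU tensor-parallel inference; that trade-off is what the pilot benchmarks should measure.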
Initial Scaling (Pilot Models)
- Begin with well-known, stable models to validate infrastructure performance and reliability.
- Pilot hardware: AMD Radeon AI PRO R9700 or an equivalent AI-focused GPU.
- Validate thermal performance, power delivery, and driver stability for continuous inference workloads.
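One way to make "validate thermal performance, power delivery, and driver stability" concrete is a pass/fail check over samples logged during a long inference soak test. The sketch below assumes samples are already being collected (for example by periodically polling the GPU monitoring tool and recording whether each inference request succeeded); the collection step is left out, and the 90 °C, 300 W, and 99.9% thresholds are hypothetical placeholders to be replaced with the card's specifications and the lab's own targets.

```python
from dataclasses import dataclass

# Sketch of a soak-test checker for the pilot GPU.
# Sample collection is out of scope; all thresholds are hypothetical.


@dataclass
class Sample:
    t_s: float      # seconds since test start
    temp_c: float   # GPU temperature reported by the monitoring tool
    power_w: float  # board power draw
    ok: bool        # did the inference request in this interval succeed?


def soak_report(samples: list[Sample], max_temp_c: float = 90.0,
                max_power_w: float = 300.0, min_success: float = 0.999) -> dict:
    """Summarize a soak test and decide whether it passed the thresholds."""
    if not samples:
        raise ValueError("no samples collected")
    peak_temp = max(s.temp_c for s in samples)
    peak_power = max(s.power_w for s in samples)
    success = sum(s.ok for s in samples) / len(samples)
    return {
        "duration_s": samples[-1].t_s - samples[0].t_s,
        "peak_temp_c": peak_temp,
        "peak_power_w": peak_power,
        "success_rate": success,
        "passed": (peak_temp <= max_temp_c
                   and peak_power <= max_power_w
                   and success >= min_success),
    }


if __name__ == "__main__":
    # Synthetic 24-hour run sampled once a minute, for illustration only.
    run = [Sample(i * 60.0, 70 + i % 5, 250.0, True) for i in range(1440)]
    print(soak_report(run))
```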
Progressive Hardware Replication
- Once the Radeon AI PRO R9700 setup is stable, replicate the same environment on the RX 7900 XTX and other AMD GPUs to benchmark performance scaling.
- Document compatibility issues, driver updates, and quantization performance metrics.
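Benchmarks across GPU models, driver versions, and quantization formats are easier to compare if every run is recorded in the same shape. The sketch below uses a hypothetical `BenchResult` record, exports runs to CSV, and computes scaling efficiency as throughput at N GPUs divided by N times the single-GPU throughput (1.0 means perfectly linear scaling). The throughput numbers in the usage block are placeholders for illustration, not measurements.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BenchResult:
    gpu: str             # e.g. "Radeon AI PRO R9700", "RX 7900 XTX"
    num_gpus: int
    quant: str           # quantization label, e.g. "fp16", "int8", "int4"
    rocm: str            # ROCm/driver version under test
    tokens_per_s: float  # measured generation throughput


def scaling_efficiency(results: list[BenchResult]) -> dict[int, float]:
    """Throughput at N GPUs / (N * single-GPU throughput).

    Assumes every result comes from the same GPU model, quantization and
    software stack, and that a 1-GPU baseline is present.
    """
    base = next(r.tokens_per_s for r in results if r.num_gpus == 1)
    return {r.num_gpus: r.tokens_per_s / (r.num_gpus * base) for r in results}


def to_csv(results: list[BenchResult]) -> str:
    """Serialize results so runs on different GPUs and drivers can be compared."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BenchResult)])
    writer.writeheader()
    for r in results:
        writer.writerow(asdict(r))
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    runs = [
        BenchResult("RX 7900 XTX", 1, "int4", "x.y", 40.0),
        BenchResult("RX 7900 XTX", 2, "int4", "x.y", 72.0),
        BenchResult("RX 7900 XTX", 4, "int4", "x.y", 128.0),
    ]
    print(scaling_efficiency(runs))
    print(to_csv(runs))
```

Keeping the driver version as a column means a regression after a ROCm update shows up directly in the CSV history rather than only in notes.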
Cluster and Swarm Development
- Establish a Cluster System for large-model distributed inference and training.
- Build a Swarm System that runs many smaller AI instances in parallel on lower-end GPU nodes (e.g., Radeon RX 7700-class cards and below) for local and academic deployment.
- Optimize inter-node communication, synchronization, and monitoring tools for mixed hardware setups.
Funding and Laboratory Deployment
- Fund the creation of a dedicated AI Lab focused on testing, documentation, and educational use.
- Provide access to partner schools for research, benchmarking, and AI model fine-tuning.
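Shared lab access needs some rule for dividing limited GPU time among partner schools. One common choice is max-min fairness: every school gets an equal share, schools that need less than their share return the surplus, and the surplus is redistributed among the rest. The sketch below applies that rule to weekly GPU-hours; the school names, demands, and capacity are hypothetical, and this is one possible policy rather than a decided one.

```python
# Max-min fair split of a GPU-hour budget across partner schools.
# School names and numbers are hypothetical placeholders.


def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Allocate capacity so no school can gain without a poorer-served one losing."""
    alloc = {k: 0.0 for k in demands}
    remaining = capacity
    unsatisfied = {k for k, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for k in sorted(unsatisfied):
            give = min(share, demands[k] - alloc[k])
            alloc[k] += give
            remaining -= give
        # Schools whose demand is now met drop out; leftover goes to the rest.
        unsatisfied = {k for k in unsatisfied if demands[k] - alloc[k] > 1e-9}
    return alloc


if __name__ == "__main__":
    weekly = max_min_fair(100.0, {"school_a": 20.0, "school_b": 50.0, "school_c": 60.0})
    print(weekly)
```

With 100 GPU-hours and demands of 20, 50, and 60, the smallest request is met in full and the other two split the remainder evenly at 40 each.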
Open Compute and Tokenization Participation
- Study and participate in open-source projects that allow community-based compute contributions (similar to Folding@home).
- Learn and experiment with decentralized compute-sharing models that enable contributors to sell tokens or compute time securely and transparently.
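"Transparently" in a compute-sharing scheme usually means contributions are recorded so that later edits are detectable. The sketch below illustrates only that idea with a hash-chained ledger: each entry's SHA-256 hash covers its contents and the previous entry's hash, so altering any past record breaks verification. It is not how any particular project (Folding@home or a token network) works; real systems add signatures, consensus, and proof that the compute was actually performed, none of which is shown here.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    contributor: str
    gpu_seconds: float
    prev_hash: str
    hash: str


def _digest(contributor: str, gpu_seconds: float, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the entry plus its predecessor's hash."""
    payload = json.dumps({"c": contributor, "s": gpu_seconds, "p": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class Ledger:
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def record(self, contributor: str, gpu_seconds: float) -> Entry:
        prev = self.entries[-1].hash if self.entries else self.GENESIS
        entry = Entry(contributor, gpu_seconds, prev, _digest(contributor, gpu_seconds, prev))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry makes this False."""
        prev = self.GENESIS
        for e in self.entries:
            if e.prev_hash != prev or e.hash != _digest(e.contributor, e.gpu_seconds, e.prev_hash):
                return False
            prev = e.hash
        return True

    def totals(self) -> dict[str, float]:
        """GPU-seconds credited per contributor, e.g. as a basis for payouts."""
        out: dict[str, float] = {}
        for e in self.entries:
            out[e.contributor] = out.get(e.contributor, 0.0) + e.gpu_seconds
        return out
```

Note that whoever holds the ledger could still rebuild the whole chain from scratch; publishing the latest hash or having contributors sign entries is what makes tampering visible to outsiders.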

Reference: Inspirational video:

End Goal: To make Comfac and its academic partners a recognized hub for open, scalable, and sustainable AI research using AMD technologies.