Lecture 1
prof's first time teaching the course; will fill out the syllabus (topics, etc.) later - partly because 260 and 340 are not prereqs
because of this, some material from 340 and 260 will be discussed toward the beginning of the course
prof may be late to class b/c previous class is far away
course book - no official textbook. original plan was to base the course on the previous prof's version, but there will be major differences. may provide links to additional material for the course
will try to cover all material in class, but may assign some self-study material
will assign one extra assignment to masters students (official requirement)
current grading idea - 3 assignments (4 for masters students), midterm and final exams
will try to make the 340-material part of the course interactive, b/c ~50% of the class has already taken 340
no attendance grade except same check as 260
grading percentages will be decided on eventually
similar rules to 260 w.r.t. homework requirements
slack - contact via slack channel, link will be in syllabus, same as 260
office hours are joint with 340 students, on days when there are classes (same as 260) - days & times tbd
language will be c++, but will be doing mpi stuff as well
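as a small taste of the mpi side, a minimal sketch (assumes an MPI implementation like Open MPI is installed; compile with mpic++, run with mpirun):

    // minimal MPI sketch - every process prints its rank
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);               // start the MPI runtime
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's id
        MPI_Comm_size(MPI_COMM_WORLD, &size); // total number of processes
        std::printf("hello from process %d of %d\n", rank, size);
        MPI_Finalize();                       // shut down the MPI runtime
        return 0;
    }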
first, discuss parallel computing (high-level)
then, discuss 340 material - processes, threads, synchronization
then continue to actual parallel computing material
parallel computing rationale:
- want fast computers
- computation-intensive applications:
- games, graphics, llms/ml, gene sequencing/medical research, simulations
methods:
- cpu speed limit reached a while ago - power wall
- memory speed limit is even worse (the "memory wall")
- modern approach: add more cores to cpus, etc.
- instead of faster, do things simultaneously
- same approach in memory with dual channel memory, and in individual cpu cores with pipelining
- concept of "SuperScalar" cpu:
- batch fetching: fetch multiple instructions, have multiple pipelines with multiple issue stations, multiple ALUs, analyze instructions and reorder for better performance (out-of-order execution), etc.
- (not covered in our course - not something actionable to us, just built-in to cpu)
- vector instructions, SIMD (won't deal with in this course)
- special vector registers that can store multiple values in the cpu, and special vector instructions that process all of them at the same time (tiny sketch below)
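for illustration only (again, not course material), a minimal sketch using x86 SSE intrinsics, which add four floats with one vector instruction; assumes an x86 cpu and a compiler providing <immintrin.h>:

    // SIMD sketch: 4 additions done by one SSE instruction
    #include <immintrin.h>
    #include <cstdio>

    int main() {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float out[4];
        __m128 va = _mm_loadu_ps(a);    // load 4 floats into a 128-bit vector register
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb); // one instruction, 4 additions
        _mm_storeu_ps(out, vc);
        std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }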
- next idea:
- first, there were multiple CPUs in a computer (supercomputers)
- however, expensive - individual caches, etc.
- next, cpu cores - individual/(almost) independent execution unit within cpu
- kind of like jamming multiple cpus into a single cpu
- but can do better
- next, clusters - multiple computers working on same task simultaneously
- e.g., beowulf - free software to create compute clusters
- graphics - computationally intense, very parallelism-friendly
- GPU - a processor optimized for graphical tasks
- essentially a cpu with many weak cores
- modern cpu may have 64 cores, modern gpu may have 10s of 1000s of cores
- gpus can be used for non-graphical tasks as well:
- CUDA - library for nvidia gpus
- examples: crypto mining, llm training
however, a program can only take advantage of multiple cores if it is optimized for them / written with multiple cores in mind; by default it just runs on a single core
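as a sketch of what "written with multiple cores in mind" looks like in C++ (std::thread is 340 material we'll review; the data and thread count here are made up for illustration):

    // sketch: summing a vector on 4 threads, one chunk per thread
    #include <thread>
    #include <vector>
    #include <numeric>
    #include <cstdio>

    int main() {
        std::vector<int> data(1000000, 1);
        const int nthreads = 4;
        std::vector<long long> partial(nthreads, 0);
        std::vector<std::thread> threads;
        const std::size_t chunk = data.size() / nthreads;

        for (int t = 0; t < nthreads; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t == nthreads - 1) ? data.size() : begin + chunk;
            // each thread sums its own chunk into its own slot - no sharing, no locks
            threads.emplace_back([&, t, begin, end] {
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& th : threads) th.join(); // wait for all workers

        long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
        std::printf("total = %lld\n", total); // 1000000
        return 0;
    }

without the threads, the same sum would run on one core no matter how many cores the machine has.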
note - can only expect a speedup proportional to the number of cores for the parts of the program that are parallelism-friendly
- example: difference between bringing stack of bricks from one place to another (parallelism-friendly), and baking a cake (not parallelism-friendly)
as a rule: if half of a program can be parallelized and the other half cannot, the program will take at least half the original (unparallelized) time regardless of the amount of compute thrown at it - i.e., at most a 2x speedup
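written as a formula, this is Amdahl's law: with parallelizable fraction p and n cores, speedup <= 1 / ((1 - p) + p/n). a quick sketch plugging in p = 0.5 from the rule above:

    // Amdahl's law sketch: best-case speedup for parallel fraction p on n cores
    #include <cstdio>

    double amdahl(double p, double n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main() {
        // half parallelizable: the speedup creeps toward 2x but never reaches it
        double ns[] = {2.0, 4.0, 64.0, 1000000.0};
        for (double n : ns)
            std::printf("p=0.5, n=%.0f cores -> %.3fx speedup\n", n, amdahl(0.5, n));
        return 0;
    }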
a classic parallel computing course would have discussed e.g. scientific computations; practitioners in those areas typically use libraries/packages that implement these techniques
but we will also discuss concurrency - have multiple cores doing different things simultaneously
- example: when asked to read file from hard disk, program is put to sleep - the call to read the file is "blocking"
- usually program is more complicated, however - e.g. for a text editor, still need to run user interface
- can do these things simultaneously - see the sketch below
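a minimal sketch of that idea with std::thread (the blocking file read is faked with a sleep, just for illustration):

    // concurrency sketch: one thread "blocks" on I/O while main stays responsive
    #include <thread>
    #include <chrono>
    #include <cstdio>

    int main() {
        // worker thread: pretends to be a blocking file read
        std::thread reader([] {
            std::this_thread::sleep_for(std::chrono::seconds(2)); // fake read()
            std::printf("file loaded\n");
        });

        // main thread: keeps the "user interface" running meanwhile
        for (int i = 0; i < 4; ++i) {
            std::printf("ui still responding (%d)\n", i);
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
        }

        reader.join(); // wait for the read to finish before exiting
        return 0;
    }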
concurrency is very related to parallel programming - in the parallel case, workers do work on the same task; in the concurrent case, they do different tasks/different work