CS159 Homework Project 3

The primary objective of this lab is to learn the fundamental OpenMP parallel programming constructs. Recall that OpenMP is not a stand-alone language; it is an API for an existing language. The OpenMP API consists of a set of compiler directives (called pragmas), runtime library routines, and environment variables. The pragmas added to the original sequential program appear to be nothing more than comments to a non-OpenMP compiler and are simply ignored. However, an OpenMP-capable compiler (with that feature activated by the proper compiler switch or option) can use the pragmas as additional information to parallelize the program. OpenMP APIs are currently available for the C, C++, and Fortran languages. We will use the C version via gcc.

A secondary objective of this project is to examine race conditions and scaling effects that can be observed on the 64-core Intel MTL system. Use the Intel MTL system whenever possible for this project; however, keep in mind that in previous classes, certain students have not been able to connect (or stay connected for more than a few minutes). The potential problems were many and varied, and in some cases unsolvable. If MTL is not usable for you, team with another member of the class who is able to get access, or use Knoppix instead (on a multi-core machine if at all possible).

To compile with OpenMP activated, use the command

    gcc -fopenmp yourprogram.c

For the following programs, turn in a printout (e.g., a screenshot) of the execution trace. In some cases, you may need to run several executions of the program. It is not necessary to turn in printouts of all executions, just a representative sample of them. It is also not necessary to show the entire trace of an execution.
Some problems will generate output larger than what will fit in one screen of the terminal emulator (e.g., PuTTY). Just include what you feel is necessary to effectively answer the question.

1) OpenMP uses the fork-join model, which is essentially based on the same fork system call that was used in the previous projects. Just like fork, we can tell OpenMP to spawn a child process that will execute the exact same code as the parent process. In OpenMP, the forking is handled automatically, in a more transparent and convenient manner for the programmer. In addition, with a single pragma, multiple child processes (actually threads) can be created. A thread is a runtime entity that is able to independently execute a stream of instructions. In OpenMP, a Parallel Region is a block of code executed by all threads simultaneously. No thread is thought of as a parent or a child; rather, the group is referred to as a team. The region of the program that should be run in parallel by the team is simply bounded as follows:

    #pragma omp parallel
    {
    }

If the number of desired threads is not explicitly specified by the programmer, then OpenMP will by default set the number of threads to match the number of logical hardware cores in the system. Run the following program sequentially, that is, compile it without the -fopenmp option; then run it in parallel with the -fopenmp option specified during compilation. Explain what happens in each case. Recall that the Intel MTL is a 32-core machine, where each physical core is Hyper-Threaded.
[5 pts]

    #include <stdio.h>

    int main() {
        #pragma omp parallel
        {
            printf(" Hello ");
        }
        printf("\n\n GoodBye Team Destroyed Exiting Program \n\n");
    }

2) Now, specify the number of threads in the team by using

    omp_set_num_threads(nthreads);

Although some compilers may not require it, note that you may also need to add the following include clause as well:

    #include <omp.h>

The program from (1) should now appear as follows for nthreads = 2:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        omp_set_num_threads(2);
        #pragma omp parallel
        {
            printf(" Hello ");
        }
        printf("\n\n GoodBye Team Destroyed Exiting Program \n\n");
    }

Run the program above with the nthreads parameter set to 2, 4, and 8. Verify that the output is as expected for the different numbers of threads created in the team. Is it possible to specify an odd number of threads? Is it possible to specify a number of threads greater than the number of logical cores (or physical cores if no Hyper-Threading is present) available in the hardware? [5 pts]

3) The OpenMP function call omp_get_thread_num() retrieves the ID of the calling thread. Note that the master thread always has thread ID 0. That is, the numbering of the threads in the team starts from zero, not one. The other threads are given IDs of consecutively higher integers (1, 2, 3, etc.) up to nthreads-1. By using omp_get_thread_num(), we can have each thread print out its ID in addition to " Hello ". Change the single printf line containing " Hello " from (2) above to get the following:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        omp_set_num_threads(2);
        #pragma omp parallel
        {
            printf("\n Hello from thread = %d ", omp_get_thread_num());
        }
        printf("\n\n GoodBye Team Destroyed Exiting Program \n\n");
    }

Run the program for various values of nthreads. Perform several runs for each value of nthreads and comment on the order in which the threads are run. Note that you can also specify an nthreads value of 1 to default back to the sequential, single-thread case.
[5 pts]

4) More often than not, a programmer is not interested in merely having several threads all execute an identical section of code; (s)he wants to distribute the work among the threads so that it is completed in parallel, in less time. In the previous labs that used fork, the programmer had to manually separate the work for parent vs. child by testing the return value of the fork system call: one branch of the IF held the code executed by the parent; the other branch held the code executed by the child. OpenMP can do this partitioning of work among the available threads automatically. A work-sharing construct of OpenMP divides the execution of an enclosed code region (typically a loop) among the members of the team; in other words, OpenMP splits the work between the threads. The OpenMP construct that does this for loops is the keyword for appended onto the previously seen parallel pragma:

    #pragma omp parallel for

Inserting this statement before a C for loop will distribute the iterations of the loop among the threads. Enter this program to trace the 16 iterations of the loop and observe how the iterations are distributed among the threads as a function of the number of threads. Run the program for nthreads = 2, 4, 8, 16, and 32. Explain what happens. What happens if the number of iterations in the for loop is not evenly divisible by the number of threads? For example, explain what happens in the program below if nthreads were set to 5 or 7, instead of to 2 as currently shown. [10 pts]

    #include <stdio.h>
    #include <omp.h>

    int main() {
        int i;
        omp_set_num_threads(2);
        #pragma omp parallel for
        for (i = 0; i < 16; i++) {
            printf("Hello from thread number: %d Iteration: %d \n",
                   omp_get_thread_num(), i);
        }
        printf("\n GoodBye Team Destroyed Exiting Program \n\n");
    }

Notice that the iterations are split into contiguous chunks (rather than in round-robin fashion), and each thread gets one chunk of iterations.
By default, OpenMP splits the iterations of a loop into chunks of equal (or roughly equal) size, assigns each chunk to a thread, and lets each thread loop through its subset of the iterations. So, for example, given 4 threads and 12 iterations, each thread gets three iterations:

    T0: Iterations 0, 1, 2
    T1: Iterations 3, 4, 5
    T2: Iterations 6, 7, 8
    T3: Iterations 9, 10, 11

5) Although OpenMP makes most variables shared by default and visible to all threads, this is not always what is needed for proper and correct behavior of the parallel version of the code. The code segment below produces the sum of the integers from 0 up to (COUNT-1). When run in sequential mode, it produces the correct answer. However, when parallelized incorrectly as shown below, non-deterministic errors can occur. The errors are non-deterministic because they do not always happen; they occur only under certain conditions; the erroneous answers differ from run to run; and there is no apparent error or warning message, so it appears that everything is fine.

Run the code sequentially first for the three values of COUNT to produce the correct answers. Then run it for the following values of NTHREADS and COUNT. Suggested values for NTHREADS include 2, 4, 8, 32, and 64. Suggested values for COUNT include 10, 100, and 1000. Create a table similar to the one below, making at least ten runs at each of the 15 data points. For each of the 15 data points, estimate the probability of an incorrect answer. If you do exactly ten runs at each data point, then this probability will be easy to report. (You can optionally do a larger number of runs at each data point to get a better statistical average.)

Why are some answers incorrect? What is causing the non-deterministic errors? What is the probability of obtaining an incorrect answer as a function of NTHREADS and COUNT? Is there a trend in the probability, and what intuitive explanation might there be for it?
Optionally, repeat the experiment on a single/dual/quad core machine, if you have access to one. [25 pts]

Percent of Incorrect Answers Generated by Flawed Parallel OpenMP Program

    COUNT    NTHREADS=2    NTHREADS=4    NTHREADS=8    NTHREADS=32    NTHREADS=64
    10
    100
    1000

    #include <stdio.h>
    #include <omp.h>

    int main() {
        int i;
        int sum = 0;
        omp_set_num_threads(NTHREADS);
        #pragma omp parallel for
        for (i = 0; i < COUNT; i++) {
            sum = sum + i;
        }
        printf("\n Sum = %d \n\n", sum);
    }

(Replace NTHREADS and COUNT with the values under test.) One correct way to parallelize the summation is OpenMP's reduction clause. With reduction(+:sum), each thread accumulates into its own private copy of sum, and the per-thread copies are added into the shared sum when the loop completes:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        int i;
        int sum = 0;
        omp_set_num_threads(NTHREADS);
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < COUNT; i++) {
            sum = sum + i;
        }
        printf("\n Sum = %d \n\n", sum);
    }

The final program below estimates pi by numerical integration of 4/(1+x*x) over the interval [0,1]:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    long long num_steps = 1000000000;
    double step;

    int main(int argc, char* argv[]) {
        double x, pi, sum = 0.0;
        int i;
        step = 1. / (double)num_steps;
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("pi = %f\n", pi);
    }

Parallelize this program using the constructs above, and time its execution for increasing values of NTHREADS, including NTHREADS > 32, the number of physical cores, and NTHREADS > 64, the number of logical cores in MTL. Explain any observations. Optionally, repeat the experiment on single/dual/quad core machine(s), if you have access to these alternate hardware platforms. [25 pts]