Commit e06b185
GEMM: bump MR to min(16,M) for skinny-N (n<=16) BF16 and F32 shapes (#524)
The default ZMM DE returns mr=6, nr=64, so for n<=16 shapes the JIT only reaches the lt16-mask kernel (bReg=1) and 6 of the 32 ZMMs hold C accumulators while the other ~25 sit idle.
Overriding mr to min(16, m) lets each cached B line be consumed by up to 16 rows of A instead of 6, recovering the otherwise-wasted register file.
Change:
* gemmBF16DEBackend / gemmF32DEBackend (ZMM fast path):
  For n<=16 and m>0, set mr = min(16, m). nr stays at nr_hint so
  the existing NR=64 packed-B layout, N-direction blocking, and
  rsB-divisor math are reused unchanged. F32 is additionally
  gated on !invokeRD and kc != 1.
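The override condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `TileChoice`, `chooseTileBF16`, `chooseTileF32`, `mrHint`, and `nrHint` are hypothetical names, and the real backends write into kernelInfo rather than returning a struct.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical result type (assumption): the real backends fill in kernelInfo.
struct TileChoice {
    int64_t mr;
    int64_t nr;
    bool skinnyN;
};

// Sketch of the BF16 rule: bump MR only for skinny-N shapes.
inline TileChoice chooseTileBF16(int64_t m, int64_t n,
                                 int64_t mrHint, int64_t nrHint) {
    TileChoice t{mrHint, nrHint, false};
    if (n <= 16 && m > 0) {
        t.mr = std::min<int64_t>(16, m);  // nr deliberately stays at nrHint
        t.skinnyN = true;                 // set only when the bump actually fires
    }
    return t;
}

// Sketch of the F32 rule: same condition, additionally gated on
// !invokeRD and kc != 1.
inline TileChoice chooseTileF32(int64_t m, int64_t n, int64_t kc, bool invokeRD,
                                int64_t mrHint, int64_t nrHint) {
    if (invokeRD || kc == 1) {
        return TileChoice{mrHint, nrHint, false};
    }
    return chooseTileBF16(m, n, mrHint, nrHint);
}
```

With the default hints (mr=6, nr=64), a 128x8 problem would get mr=16 and a 4x16 problem would get mr=4, while any shape with n>16 keeps mr=6 and skinnyN=false.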
Guards added so the bumped-MR path doesn't break the rest of the kernel set:
* New `skinnyN` flag on kernel_frame::kernelInfo (threaded through
  the ctors, copy/move, operator== and
  gemmDEBackendUtils::checkPostOpsAndCreateKernelInfo). Set true
  only at the two ZMM override sites, when the MR bump actually
  fires; false everywhere else.
* jitAmdZenFP32 / jitAmdZenBF16 generateAllKernels honor skinnyN by
  skipping nr>=2. Those wider NR variants (lt32 / 32 / lt48 / 48 /
  lt64 / 64) are unreachable for n<=16 and exceed the 32-ZMM budget
  at MR=16 (especially with post-ops and column-major beta scaling),
  so generating them only produced badKernelInfo aborts that took
  down the whole kernel set.
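To illustrate why the wider variants are skipped, here is a rough register-budget sketch. It rests on assumptions that are not in the commit: each NR variant is taken to use ceil(nr/16) B registers (bReg), the budget is modelled as mr*bReg accumulators plus bReg B registers plus one A broadcast register, and `NrVariant`, `fitsZmmBudget`, and `variantsToGenerate` are hypothetical names. The real generateAllKernels also has to account for post-op and beta-scaling temporaries.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one NR variant (assumption: bReg = ceil(nr / 16)).
struct NrVariant {
    std::string name;
    int bReg;
};

// Simplified budget model (assumption): accumulators + B registers + one A broadcast.
inline bool fitsZmmBudget(int mr, int bReg) {
    const int kNumZmm = 32;
    return mr * bReg + bReg + 1 <= kNumZmm;
}

// Sketch of the guard: with skinnyN set, keep only the bReg == 1 variant.
inline std::vector<NrVariant> variantsToGenerate(bool skinnyN) {
    const std::vector<NrVariant> all = {
        {"lt16", 1}, {"lt32", 2}, {"32", 2}, {"lt48", 3},
        {"48", 3},   {"lt64", 4}, {"64", 4}};
    std::vector<NrVariant> out;
    for (const NrVariant& v : all) {
        if (skinnyN && v.bReg >= 2) {
            continue;  // unreachable for n<=16 and over budget at MR=16
        }
        out.push_back(v);
    }
    return out;
}
```

Under this model MR=6 with bReg=4 needs 29 registers and fits, MR=16 with bReg=1 needs 18 and fits, while MR=16 with bReg=2 already needs 35, which is consistent with the aborts described above.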
The default (n>16) path is untouched: skinnyN stays false and every
NR variant continues to be generated and dispatched as before.
[ AMD-Internal - SWLCSG-4250 ]
1 parent 74536a8 commit e06b185
4 files changed
Lines changed: 187 additions & 13 deletions
File tree
- src
- include
- decision_engine
- kernel_frame
- jit/amdzen