Context-Switching Weirdness

Updated January 31, 2004

© Copyright 2004 by CyberLife Labs, LLC
All rights reserved



This page documents a minor research project whose purpose is to understand exactly why two very similar computers exhibit such wildly different context-switching performance.



Background

CyberLife Labs uses FreeBSD almost exclusively for its scientific, production and administrative needs. One machine in particular (appserver.geo) has always been known to run very fast, but it was never given much thought until recently. As an exercise, the BYTE Benchmark suite was installed and run on that unit along with another, nearly identical but slightly faster machine (beastie.lab). Most of the results were as expected, with beastie.lab edging out appserver.geo by a narrow margin. However, the context-switching results clocked appserver.geo at nearly four times the rate of its slightly faster cousin. Considering the types of jobs these two machines typically run (very process-switching oriented), that difference explained why appserver.geo always seemed to get things done more quickly. The only question was why.



System Configurations

The following table lists the configuration details of the two systems:

Component         appserver.geo                   beastie.lab
Motherboard       ASUS A7V333                     ASUS A7V333
CPU               AMD Athlon XP 2100+ (1.73 GHz)  AMD Athlon XP 2200+ (1.8 GHz)
RAM               PC2100 1024MB                   PC2100 512MB
Chipset           VIA KT333                       VIA KT333
ATA Controller    VIA 8233 ATA133                 VIA 8233 ATA133
System Drive      Western Digital UDMA100 100GB   Western Digital UDMA33 30GB
Secondary Drive   ----                            Western Digital UDMA66 10GB
CD-ROM Drive      ----                            Creative WDMA2 52X
Operating System  FreeBSD 4.9-RELEASE             FreeBSD 4.9-RELEASE

Both systems were run with the exact same kernel configuration, compiled with the same compiler settings.



BYTE Benchmark Results

The following table lists the raw figures output from the BYTE Benchmark suite:

Test                                    appserver.geo  beastie.lab
Dhrystone 2 without register variables      3880692.0    4011065.9
Dhrystone 2 using register variables        3860733.5    4009818.1
Arithmetic Test (type = arithoh)            7851836.7    8163230.7
Arithmetic Test (type = register)            356915.5     370548.3
Arithmetic Test (type = short)               342698.9     355824.8
Arithmetic Test (type = int)                 356759.3     370485.0
Arithmetic Test (type = long)                350642.9     370560.9
Arithmetic Test (type = float)               794357.5     826194.1
Arithmetic Test (type = double)              794815.8     826337.9
System Call Overhead Test                    755378.3     740416.6
Pipe Throughput Test                         899956.5     985937.3
Pipe-based Context Switching Test            340616.2      89968.4
Process Creation Test                          9790.1       8763.2
Execl Throughput Test                          1338.0        276.6
File Read (10 seconds)                      1783104.0    1851455.0
File Write (10 seconds)                       40943.0      11400.0
File Copy (10 seconds)                        36109.0      11050.0
File Read (30 seconds)                      1797987.0    1838905.0
File Write (30 seconds)                       38704.0      11333.0
File Copy (30 seconds)                        36126.0      10130.0
C Compiler Test                                2445.4        333.7
Shell scripts (1 concurrent)                   3522.0        337.0
Shell scripts (2 concurrent)                   1763.3        168.0
Shell scripts (4 concurrent)                    889.0         84.0
Shell scripts (8 concurrent)                    444.7         40.0
Dc: sqrt(2) to 99 decimal places             290604.9     118030.2
Recursion Test--Tower of Hanoi                56435.2      58651.8

All tests were run in single-user mode to eliminate any chance of interference from other processes. The first nine tests all measure CPU performance and, as expected, beastie.lab performed about 4% faster than appserver.geo. The file-read and recursion tests are highly CPU-dependent and show a similar 4% difference; the pipe-throughput test favors beastie.lab by a somewhat wider margin (closer to 10%). The file-write and file-copy tests rate significantly higher on appserver.geo, but this is due to its ATA100 disk versus the ATA33 disk on beastie.lab. (This was verified by forcing appserver.geo down to ATA33 and re-running the benchmark, at which point the file-write and file-copy results were identical between the systems.) The remaining tests are highly dependent on context switching and thus rate higher on appserver.geo, most of them by a wide margin.
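The percentages quoted above can be re-derived from the raw figures. The following is a minimal sketch in Python (chosen purely for illustration; it is not part of the BYTE suite) that divides each appserver.geo score by the corresponding beastie.lab score. The names RESULTS, ratio and tests_favoring_appserver are illustrative, and the 1.5x threshold is an arbitrary assumption.

```python
# Raw BYTE scores copied from the results table: (appserver.geo, beastie.lab).
RESULTS = {
    "Dhrystone 2 without register variables": (3880692.0, 4011065.9),
    "Dhrystone 2 using register variables": (3860733.5, 4009818.1),
    "Arithmetic Test (type = arithoh)": (7851836.7, 8163230.7),
    "Arithmetic Test (type = register)": (356915.5, 370548.3),
    "Arithmetic Test (type = short)": (342698.9, 355824.8),
    "Arithmetic Test (type = int)": (356759.3, 370485.0),
    "Arithmetic Test (type = long)": (350642.9, 370560.9),
    "Arithmetic Test (type = float)": (794357.5, 826194.1),
    "Arithmetic Test (type = double)": (794815.8, 826337.9),
    "System Call Overhead Test": (755378.3, 740416.6),
    "Pipe Throughput Test": (899956.5, 985937.3),
    "Pipe-based Context Switching Test": (340616.2, 89968.4),
    "Process Creation Test": (9790.1, 8763.2),
    "Execl Throughput Test": (1338.0, 276.6),
    "File Read (10 seconds)": (1783104.0, 1851455.0),
    "File Write (10 seconds)": (40943.0, 11400.0),
    "File Copy (10 seconds)": (36109.0, 11050.0),
    "File Read (30 seconds)": (1797987.0, 1838905.0),
    "File Write (30 seconds)": (38704.0, 11333.0),
    "File Copy (30 seconds)": (36126.0, 10130.0),
    "C Compiler Test": (2445.4, 333.7),
    "Shell scripts (1 concurrent)": (3522.0, 337.0),
    "Shell scripts (2 concurrent)": (1763.3, 168.0),
    "Shell scripts (4 concurrent)": (889.0, 84.0),
    "Shell scripts (8 concurrent)": (444.7, 40.0),
    "Dc: sqrt(2) to 99 decimal places": (290604.9, 118030.2),
    "Recursion Test--Tower of Hanoi": (56435.2, 58651.8),
}


def ratio(test):
    """Return appserver.geo's score divided by beastie.lab's for one test."""
    appserver, beastie = RESULTS[test]
    return appserver / beastie


def tests_favoring_appserver(threshold=1.5):
    """List tests where appserver.geo leads by at least `threshold` times."""
    return [name for name in RESULTS if ratio(name) >= threshold]


if __name__ == "__main__":
    # One line per test: values below 1.0 mean beastie.lab is ahead.
    for name in RESULTS:
        print(f"{name:<40} {ratio(name):6.2f}")
```

A ratio just under 1.0 means beastie.lab is slightly ahead, as on the CPU-bound tests, while the pipe-based context-switching test comes out near 3.8.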

So why does appserver.geo task-switch so much better? It's not a CPU issue, as the benchmarks correctly show beastie.lab to be the faster system. It's not a disk issue since, again, the benchmarks show beastie.lab to be faster (when both run UDMA33). Even if there were a large difference in disk performance, the context-switching tests don't use the disk subsystem at all. It's not a network issue since the NICs were completely shut down. appserver.geo has twice the memory capacity of beastie.lab but, as with the disk subsystem, capacity should not matter because the context-switching tests are not memory-intensive. That leaves only the speed of memory and cache.
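To make concrete why such a test touches neither disk nor network, here is a rough sketch of a pipe-based context-switch test. It is written in Python for brevity and is not the BYTE suite's own code; the function name and the default round-trip count are illustrative assumptions. Two processes pass a single byte back and forth over a pair of pipes, so on a single-CPU machine each round trip requires the kernel to switch to the other process and back.

```python
import os
import time


def pipe_context_switch_rate(round_trips=20000):
    """Bounce one byte between parent and child; return round trips per second."""
    # One pipe per direction: parent -> child and child -> parent.
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: close the ends it does not use, then echo every byte
        # back until the parent closes its write end (read returns b"").
        os.close(to_child_w)
        os.close(to_parent_r)
        try:
            while True:
                token = os.read(to_child_r, 1)
                if not token:
                    break
                os.write(to_parent_w, token)
        finally:
            os._exit(0)  # never return into the caller's code in the child

    # Parent: close the ends it does not use.
    os.close(to_child_r)
    os.close(to_parent_w)

    start = time.perf_counter()
    for _ in range(round_trips):
        os.write(to_child_w, b"x")   # wake the child
        os.read(to_parent_r, 1)      # block until the child answers
    elapsed = time.perf_counter() - start

    os.close(to_child_w)             # EOF tells the child to exit
    os.close(to_parent_r)
    os.waitpid(pid, 0)               # reap the child
    return round_trips / elapsed


if __name__ == "__main__":
    print(f"{pipe_context_switch_rate():.1f} round trips/s")
```

Interpreter overhead means the absolute numbers are not comparable with the BYTE figures, but running the same script on two machines should still expose a large relative difference if one exists.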



RAMSPEED Benchmark

To determine if memory or cache was affecting the results, Alasir's RAMSPEED benchmark was used to directly stress-test both subsystems. The following table shows the results (in KB/s):

Block Size  appserver.geo   appserver.geo    beastie.lab     beastie.lab
(in KB)     (integer read)  (integer write)  (integer read)  (integer write)
         1         11650.8           7943.8         12192.7           8066.0
         2         11915.6           7825.2         12483.1           8192.0
         4         11650.8           7943.8         12192.7           8192.0
         8         11915.6           7825.2         12192.7           8192.0
        16         11915.6           7943.8         12483.1           8192.0
        32         12192.7           7825.2         12787.5           8456.3
        64         11915.6           7943.8         12483.1           8322.0
       128          4519.7           4333.0          5349.9           4443.1
       256          4481.1           4228.1          4559.0           4333.0
       512           853.9            574.9           849.7            546.7
      1024           852.5            576.1           841.6            547.9
      2048           848.4            578.1           838.9            550.7
      4096           844.3            578.1           834.9            548.4
      8192           844.3            581.3           834.9            543.3
     16384           844.3            581.9           833.5            545.6

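Once again, beastie.lab shows an approximate 4% advantage over appserver.geo at the small block sizes that run at full cache speed (<= 64KB), just as it should, and it also stays ahead at the intermediate 128KB and 256KB sizes. At the largest block sizes (512KB and up), where throughput drops to main-memory levels, the two machines are within a few percent of each other, with appserver.geo about 1% ahead on reads and 5 to 7% ahead on writes. Nothing in these figures comes close to the nearly fourfold gap seen in the context-switching test, which suggests that neither memory nor cache is the culprit.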
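The comparison can be checked directly against the table. The sketch below, in Python and with illustrative names (RAMSPEED, mean_advantage), averages beastie.lab's throughput relative to appserver.geo over a chosen range of block sizes. The three ranges used in the usage example are simply where the table's figures change sharply, not an authoritative statement about cache sizes.

```python
# RAMSPEED figures copied from the table. Key: block size in KB. Value:
# (appserver.geo read, appserver.geo write, beastie.lab read, beastie.lab write)
RAMSPEED = {
    1: (11650.8, 7943.8, 12192.7, 8066.0),
    2: (11915.6, 7825.2, 12483.1, 8192.0),
    4: (11650.8, 7943.8, 12192.7, 8192.0),
    8: (11915.6, 7825.2, 12192.7, 8192.0),
    16: (11915.6, 7943.8, 12483.1, 8192.0),
    32: (12192.7, 7825.2, 12787.5, 8456.3),
    64: (11915.6, 7943.8, 12483.1, 8322.0),
    128: (4519.7, 4333.0, 5349.9, 4443.1),
    256: (4481.1, 4228.1, 4559.0, 4333.0),
    512: (853.9, 574.9, 849.7, 546.7),
    1024: (852.5, 576.1, 841.6, 547.9),
    2048: (848.4, 578.1, 838.9, 550.7),
    4096: (844.3, 578.1, 834.9, 548.4),
    8192: (844.3, 581.3, 834.9, 543.3),
    16384: (844.3, 581.9, 833.5, 545.6),
}


def mean_advantage(min_kb, max_kb, column):
    """Average beastie.lab / appserver.geo ratio over a block-size range.

    `column` is "read" or "write". A result above 1.0 favors beastie.lab,
    a result below 1.0 favors appserver.geo.
    """
    offset = {"read": 0, "write": 1}[column]
    ratios = [
        row[2 + offset] / row[offset]
        for size, row in RAMSPEED.items()
        if min_kb <= size <= max_kb
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for low, high in ((1, 64), (128, 256), (512, 16384)):
        for column in ("read", "write"):
            value = mean_advantage(low, high, column)
            print(f"{low:>5}-{high:<5} KB {column:<5} {value:.3f}")
```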



What's Next?

We've tested virtually everything we can think of and are still no closer to the answer than when we started. We figure the next step is to track down some of the FreeBSD kernel developers and find out exactly what affects context-switching performance. Hopefully that will give us an idea of where we should be looking.