While most would say to use C or C++ or Rust or C# or Java... I decided I wanted to look at the edges of Python performance and memory usage. Specifically, I set out to figure out the best approach for efficiently approximating C structs or classes that are more about properties than functionality.
I wanted to find what Python3 "class" or dictionary was most memory efficient but also fast for creating, updating and reading a single object. I chose to look at the following:
In the end I borrowed from these gists to create some Python code to test all of the above. Then I also found a "total size" function to estimate the size of the data structures in memory from this gist. Here's the code to measure and test Python's performance and memory usage.
The test involved the following:
Test | Python dict | Python class | Python class + slots | dataclass | recordclass | NamedTuple |
---|---|---|---|---|---|---|
creates / sec | 369,377 | 264,405 | 354,373 | 274,175 | 418,359 | 307,269 |
reads / sec | 17,076,394 | 24,402,513 | 25,380,031 | 21,810,119 | 28,528,798 | 24,930,480 |
sub-reads / sec | 8,383,577 | 7,880,326 | 10,508,616 | 7,355,073 | 8,597,880 | 6,196,159 |
read + calc / sec | 1,434,553 | 1,386,120 | 1,478,981 | 1,183,287 | 1,298,735 | 1,129,088 |
top-level change / sec | 6,586,636 | 5,969,859 | 8,098,519 | 6,318,340 | 7,849,210 | 849,501 |
class read / sec | 1,437,646 | 1,165,268 | 1,265,406 | 1,143,719 | 1,101,867 | 1,000,530 |
class update / sec | 10,171,955 | 4,178,176 | 5,626,841 | 4,727,680 | 4,761,006 | 787,321 |
bytes per entry (memory) | 658 | 170 | 178 | 170 | 154 | 346 |
Overall, performance was pretty high across the board. However creation of new objects was consistently slower across all approaches. No doubt this is due to the fact that memory has to be managed at some point. It was encouraging to see Python can generally manage millions of reads and updates per second in a single process/single thread. It's also pretty apparent that our default dictionary approach does indeed have some cost in terms of memory.
Test | Python dict | Python class | Python class + slots | dataclass | recordclass | NamedTuple |
---|---|---|---|---|---|---|
creates / sec | 0.88 | 0.63 | 0.85 | 0.66 | 1.00 | 0.73 |
reads / sec | 0.60 | 0.86 | 0.89 | 0.76 | 1.00 | 0.87 |
sub-reads / sec | 0.80 | 0.75 | 1.00 | 0.70 | 0.82 | 0.59 |
read + calc / sec | 0.97 | 0.94 | 1.00 | 0.80 | 0.88 | 0.76 |
top-level change / sec | 0.81 | 0.74 | 1.00 | 0.78 | 0.97 | 0.10 |
class read / sec | 1.00 | 0.81 | 0.88 | 0.80 | 0.77 | 0.70 |
class update / sec | 1.00 | 0.41 | 0.55 | 0.46 | 0.47 | 0.08 |
bytes per entry (memory) | 0.23 | 0.91 | 0.87 | 0.91 | 1.00 | 0.45 |
This view helps make a quick comparison of all options. Every row is normalized against the best entry in that row. Reading the first row, recordclass is the best in terms of creates per second while Python classes perform at 63% of recordclass's performance for that same creates per second metric.
Let's just go with the ranked list approach from best to worst: