Python Performance: Why 'if not list' is 2x Faster Than Using len()

232 likes

Submitted 7 months ago by abhi9u@lemmy.world to technology@lemmy.world

https://blog.codingconfessions.com/p/python-performance-why-if-not-list

Comments

thebestaquaman@lemmy.world 7 months ago
I write a lot of Python. I hate it when people use “X is more pythonic” as some kind of argument for what is a better solution to a problem. I also have a hang-up with people acting like Python has any form of type safety, instead of just embracing duck typing. This lands us at the following:

The article states that “you can check a list for emptiness in two ways: if not mylist or if len(mylist) == 0”. Already here, a fundamental mistake has been made: You don’t know (and shouldn’t care) whether mylist is a list. These two checks are not different ways of doing the same thing, but two different checks altogether. The first checks whether the object is “falsey”, and the second checks whether the object has a well-defined length that is zero. They often (but far from always) overlap. Embrace the duck type; type safe python is a myth.
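To make the distinction concrete, here is a minimal sketch using only built-in values (the helper names is_falsy and has_zero_length are made up for illustration). The truthiness check accepts anything, while the length check raises TypeError for objects such as None or 0 that have no length at all.

```python
def is_falsy(x) -> bool:
    """Truthiness check: what `if not x:` tests."""
    return not x


def has_zero_length(x) -> bool:
    """Length check: what `if len(x) == 0:` tests; raises TypeError if x has no length."""
    return len(x) == 0


for value in ([], "", {}, None, 0, 0.0):
    try:
        zero_len = has_zero_length(value)
    except TypeError:
        zero_len = "TypeError (no len)"
    print(f"{value!r:>6}: falsy={is_falsy(value)}, zero length={zero_len}")
```

Both checks agree on empty containers; only the truthiness check works on None and numbers.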

- iAvicenna@lemmy.world 7 months ago
  Isn’t the expected behaviour exactly identical on any object that has __len__ defined? Per the Python docs:
  
  “By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.”
  
  - CompassRed@discuss.tchncs.de 7 months ago
    It’s not the same, and you kinda answered your own question with that quote. Consider what happens when an object defines both dunder bool and dunder len. It’s possible for dunder len to return 0 while dunder bool returns True, in which case the falsiness of the instance would not depend at all on the value of len.
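A small sketch of that case, using a hypothetical class AlwaysTrue: when a class defines both dunder methods, Python's truth test uses __bool__ and never consults __len__, so the two checks disagree.

```python
class AlwaysTrue:
    """Hypothetical container whose __bool__ disagrees with its __len__."""

    def __len__(self) -> int:
        return 0  # reports zero length...

    def __bool__(self) -> bool:
        return True  # ...but is still truthy: __bool__ takes precedence over __len__


obj = AlwaysTrue()
print(len(obj) == 0)  # True: the length check says "empty"
print(not obj)        # False: the truthiness check says "not empty"
```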
    
  - thebestaquaman@lemmy.world 7 months ago
    Exactly as you said yourself: checking falsiness does not guarantee that the object has a length. There is considerable overlap between the two, and if it turns out that this check is a performance bottleneck (which I have a hard time imagining), it can be appropriate to check for falsiness instead of zero length. But in that case, don’t be surprised if you suddenly get an obscure bug because of some custom object not behaving the way you assumed it would.
    
    I guess my primary point is that we should be checking for what we actually care about, because that makes intent clear and reduces the chance for obscure bugs.
    
- sugar_in_your_tea@sh.itjust.works 7 months ago
  
  type safe python is a myth
  
  Sure, but type hints provide a ton of value in documenting for your users what the code expects. I use type hints everywhere, and it’s fantastic! Yes, there’s no guarantee that the types are correct, but with static analysis and the assumption that your users want their code to work correctly, there’s a very high chance that the types are correct.
  
  That said, I lie about types all the time. For example, if my function accepts a class instance as an argument, the intention is that the code accept any class that implements the same methods as the one I’ve defined in the parameter list, and you don’t necessarily have to pass an instance of that class in (or one of its sub-classes). But I feel like putting something reasonable in there makes a lot more sense than nothing, and I can clarify in the docstring that I really just need something that looks like that object. One of these days I’ll get around to switching that to Protocol classes to reduce type errors.
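For reference, a minimal sketch of the Protocol approach mentioned above, assuming Python 3.8+ (typing.Protocol). SupportsRead, FileLike and consume are hypothetical names. A static checker such as mypy treats any object with a matching read() method as compatible, with no inheritance required. Protocols are not enforced at runtime unless decorated with typing.runtime_checkable and checked explicitly.

```python
from typing import Protocol


class SupportsRead(Protocol):
    """Hypothetical structural type: anything with read() -> str matches."""

    def read(self) -> str: ...


class FileLike:
    """Does not inherit from SupportsRead, yet satisfies it structurally."""

    def read(self) -> str:
        return "hello"


def consume(source: SupportsRead) -> str:
    # Static checkers accept any argument whose read() signature matches.
    return source.read().upper()


print(consume(FileLike()))  # HELLO
```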
  
  That said, I don’t type hint everything. A lot of private methods and private functions don’t have types, because they’re usually short and aren’t used outside the class/file anyway, so what’s the point?
  
  - thebestaquaman@lemmy.world 7 months ago
    Type hints are usually great, as long as they’re kept up to date and the IDE interprets them correctly. Recently I’ve had some problems with PyCharm acting up and insisting that matplotlib doesn’t accept numpy arrays, leading me to just disable the type checker altogether.
    
    All in all, I’m a bit divided on type hints, because I’m unsure whether I think the (huge) value added from correct type hints outweighs the frustration I’ve experienced from incorrect type hints. For now I’m leaning towards “type hints are good, as long as you never blindly trust them and only treat them as a coarse indicator of what some dev thought at some point.”
    
PattyMcB@lemmy.world 7 months ago
I know I’m gonna get downvoted to oblivion for this, but… Serious question: why use Python if you’re concerned about performance?

- lengau@midwest.social 7 months ago
  It’s all about trade-offs. Here are a few reasons why one might care about performance in their Python code:
  
  Performance is often more tied to the code than to the interpreter - an O(n³) algorithm in blazing fast C won’t necessarily perform any better than an O(n log n) algorithm in Python.
  
  Just because this particular Python code isn’t particularly performance constrained doesn’t mean you’re okay with it taking twice as long.
  
  Rewriting a large code base can be very expensive and error-prone. Converting small, very performance-sensitive parts of the code to a compiled language while keeping the bulk of the business logic in Python is often a much better value proposition.
  
  These are also performance benefits one can get essentially for free with linter rules.
  
  Anecdotally: in my final year of university I took a computational physics class. Many of my classmates wrote their simulations in C or C++. I would rotate between Matlab, Octave and Python. During one of our labs where we wrote particle simulations, I wrote and ran Octave and Python simulations in the time it took my classmates to write their C/C++ versions, and the two fastest simulations in the class were my Octave and Python ones, respectively. (The professor’s own sim came in third place). The overhead my classmates had dealing with poorly optimised code that caused constant cache misses was far greater than the interpreter overhead in my code (though at the time I don’t think I could have explained why their code was so slow compared to mine).
  
  - PattyMcB@lemmy.world 7 months ago
    I appreciate the large amount of info. Great answer. It just doesn’t make sense to me, all things being equal (including performant algorithms), why choose Python and then make a small performance tweak like in the article? I understand preferring the faster implementation, but it seems to me like waxing your car to reduce wind resistance to make it go faster, when installing a turbo-charger would be much more effective.
    
  - uis@lemm.ee 7 months ago
    
    Performance is often more tied to the code than to the interpreter - an O(n³) algorithm in blazing fast C won’t necessarily perform any better than an O(n log n) algorithm in Python.
    
    An O(n³) algorithm in Python won’t necessarily perform any better than an O(n log n) algorithm in C. Ever heard of galactic algorithms?
    
- JustAnotherKay@lemmy.world 7 months ago
  Honestly most people use Python because it has fantastic libraries. They optimize it because the language is middling, but the libraries are gorgeous.
  
  - ThirdConsul@lemmy.ml 7 months ago
    
    Honestly most people use Python because it has fantastic libraries
    
    In C++ if I remember correctly…
    
- pastermil@sh.itjust.works 7 months ago
  Because, while you don’t want to nitpick on each instruction cycle, sometimes the code runs millions of times and each microsecond adds up.
  
  Keep in mind that people use this kind of thing for work, serving real-world customers who are doing their work.
  
  Yes, the language itself is not optimal, even by design, but it’s easy to work with, so they make it worthwhile. There’s no shortage of people who can work with it. It is easy to develop and maintain stuff with it, cutting development cost. Yes, we’re talking real businesses with real resource constraints.
  
  - sugar_in_your_tea@sh.itjust.works 7 months ago
    Exactly. We picked it for the reasons you mentioned, and I still think it’s a good choice.
    
    That said, some of our heavier logic is in a lower-level language. We had some Fortran code until recently (we rewrote it in Python and just ate the perf cost to lower the barrier for other devs fixing stuff), and we’re introducing some C++ code in the next month or two. But the bulk of our code is in Python, because that’s what glues everything together, and the code is fast enough for our needs.
    
- Takapapatapaka@lemmy.world 7 months ago
  You may want to benefit from a small performance boost even though you mostly don’t need it and still need Python’s advantages. Being interested in performance isn’t always about squeezing the very best performance out of any language; it can also mean using little tips to go a tiny bit faster when you can.
  
- sugar_in_your_tea@sh.itjust.works 7 months ago
  Yes, Python is the wrong choice if performance is your top priority.
  
  But here’s another perspective: why leave easy performance wins on the table? Especially if the cost is simpler code that works as you probably wanted anyway with both None and []?
  
  Python is great if you want a really fast development cycle, because the code is generally quite simple and it’s “fast enough.” Any wins for “fast enough” are appreciated, because they delay me needing to actually look into little performance issues. It’s pretty easy for me to write a simple regex to fix this code (s/if len\((\w+)\) == 0:/if not \1:/), and my codebase will be slightly faster. That’s awesome! I could even write up a quick pylint or ruff rule to catch these cases for developers going forward (if there isn’t one already).
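The sed-style substitution above translates directly to Python's re module. This is a minimal sketch (rewrite_len_checks is a hypothetical helper): it only matches a bare name inside len(), so something like len(self.items) is left alone, and the rewrite is only safe where the variable really is a sequence (or None), per the caveats raised elsewhere in this thread.

```python
import re

# The substitution from the comment, expressed with Python's re module.
LEN_CHECK = re.compile(r"if len\((\w+)\) == 0:")


def rewrite_len_checks(source: str) -> str:
    """Replace `if len(name) == 0:` with `if not name:` for simple names only."""
    return LEN_CHECK.sub(r"if not \1:", source)


before = "if len(items) == 0:\n    return None\n"
print(rewrite_len_checks(before))
```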
  
  If I’m actively tweaking things in my Python code to get a little better performance, you’re right, I should probably just use something else (writing a native module is probably a better use of time). But the author isn’t arguing that you should do that, just that, in this case, if not foo is preferred over if len(foo) == 0 for technical reasons, and I’ll add that it makes a ton of sense for readability reasons as well.
  
- jerkface@lemmy.ca 7 months ago
  Alternatively, why wait twice as long for your python code to execute as you have to?
  
- Randelung@lemmy.world 7 months ago
  It comes down to the question “Is YOUR C++ code faster than Python?” (and of course the reverse).
  
  I’ve built a SCADA from scratch and performance requirements are low to begin with, seeing as it’s all network bound and real world objects take time to react, but I’m finding everything is very timely.
  
  A colleague used SQLAlchemy for a similar task and got abysmal performance. No wonder, it’s constantly querying the DB for single results.
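The pattern described here (one query per row) is often called the N+1 query problem. A minimal sketch using the standard-library sqlite3 module rather than SQLAlchemy, with a made-up sensor table, comparing per-row lookups against a single batched IN query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO sensor (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1, 101)],
)

wanted = list(range(1, 51))

# Anti-pattern: one round trip per row (50 queries for 50 ids).
one_by_one = [
    conn.execute("SELECT value FROM sensor WHERE id = ?", (i,)).fetchone()[0]
    for i in wanted
]

# Batched: a single query fetches every requested row at once.
placeholders = ",".join("?" * len(wanted))
batched = [
    row[0]
    for row in conn.execute(
        f"SELECT value FROM sensor WHERE id IN ({placeholders}) ORDER BY id",
        wanted,
    )
]

print(one_by_one == batched)  # True: same data, 1 query instead of 50
```

Both approaches return the same values; the difference is the number of round trips, which costs far more against a networked database than against an in-memory SQLite one.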
  
  - sugar_in_your_tea@sh.itjust.works 7 months ago
    Exactly!
    
    We rewrote some Fortran code (known for fast perf) into Python and the net result was faster. Why? They used bubble sort in a hot loop, whereas we used Python’s built-in sort (Timsort, which is O(n log n)). So despite Python being “slower” on average, good architecture matters a lot more.
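A rough sketch of that comparison (bubble_sort is written here for illustration; the original Fortran is not shown in the thread). Absolute timings depend on the machine and Python version, but the O(n²) routine should fall far behind sorted() even at a couple of thousand elements.

```python
import random
import timeit


def bubble_sort(values: list[int]) -> list[int]:
    """O(n^2) bubble sort with an early exit when a pass makes no swaps."""
    items = list(values)
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # already sorted: stop early
            break
    return items


data = [random.randint(0, 10_000) for _ in range(2_000)]
assert bubble_sort(data) == sorted(data)  # same result either way

slow = timeit.timeit(lambda: bubble_sort(data), number=1)
fast = timeit.timeit(lambda: sorted(data), number=1)
print(f"bubble sort: {slow:.3f}s, sorted(): {fast:.5f}s")
```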
    
    And your Python code doesn’t have to be 100% Python, you can write performance-critical code in something else, like C++ or Rust. This is very common, and it’s why popular Python libraries like numpy and scipy are written in a more performant language with a Python wrapper.
    
- Reptorian@lemmy.zip 7 months ago
  I have the same question. I prefer other languages. I use G’MIC for image processing over Python and C++.
  
- WolfLink@sh.itjust.works 7 months ago
  I’ve worked on a library that’s Python because the users of said library are used to Python.
  
  The original version of the project made heavy use of numpy, so the actual performance-sensitive code was effectively C++ and Fortran, which is what numpy is under the hood.
  
  We eventually replaced the performance-sensitive part of the code with Rust (and still some Fortran because of BLAS), which ended up being about 10x faster.
  
sirber@lemmy.ca 7 months ago
How does Python know if it’s my list or not?

- 2xsaiko@discuss.tchncs.de 7 months ago
  Telemetry
  
- JasonDJ@lemmy.zip 7 months ago
  if isinstance(mylist, list) and not mylist
  
  Problem solved.
  
  Or if not mylist # check if list is empty
  
  - sirber@lemmy.ca 7 months ago
    I think you missed the joke 😅
    
  - gravitas_deficiency@sh.itjust.works 7 months ago
    You’re checking if mylist is falsey. Sometimes that’s the same as checking if it’s empty, if it’s actually a list, but that’s not guaranteed.
    
- jj4211@lemmy.world 7 months ago
  else: # not my list, it is ourlist
  
- gargolito@lemm.ee 7 months ago
  Python likes giving lists.
  
iAvicenna@lemmy.world 7 months ago
Yeah, and then you use “not” with a variable name that does not make it obvious that it is a list, and another person who reads the code thinks it is a bool. Hell, a couple of months later you yourself won’t even understand that it is a list. You should not sacrifice code readability for over-optimization; this is Python after all, and I don’t think list lengths will be your bottleneck.

- jerkface@lemmy.ca 7 months ago
  Strongly disagree that not x implies to programmers that x is a bool.
  
  - taladar@sh.itjust.works 7 months ago
    It does if you are used to sane languages instead of the implicit conversion nonsense C and the “dynamic” languages are doing
    
  - iAvicenna@lemmy.world 7 months ago
    well it does not imply directly per se since you can “not” many things but I feel like my first assumption would be it is used in a bool context
    
  - acosmichippo@lemmy.world 7 months ago
    i haven’t programmed since college 15 years ago and even i know that 0 == false.
    
  - jj4211@lemmy.world 7 months ago
    In context, one can consider it a bool.
    
    Besides, I see C code all the time that treats pointers as bool for the purposes of an if statement. !pointer is very common, and no one thinks that means a pointer is exclusively a Boolean concept.
    
  - sugar_in_your_tea@sh.itjust.works 7 months ago
    Maybe, but that serves as a very valuable teaching opportunity about what the concept of “empty” means in Python. It’s pretty intuitive IMO, and it can make a lot of things clearer once you understand it.
    
    That said, larger projects should be using type hints everywhere, and that should make the intention here painfully obvious:
    
    def do_work(foo: list | None):
        if not foo:
            ...  # handle empty list
        ...
    
    That’s obviously not a boolean, but it’s being treated as one. If the meaning there isn’t obvious, then look it up/ask someone about Python semantics.
    
    I’m generally not a fan of learning a ton of jargon/big frameworks to get the benefits of more productivity (e.g. many design patterns are a bit obtuse IMO), but learning language semantics that are used pretty much everywhere seems pretty reasonable to me. And it’s a lot nicer than doing something like this everywhere:
    
    if foo is None or len(foo) == 0:
    
  - JustAnotherKay@lemmy.world 7 months ago
    Doesn’t matter what it implies. The entire purpose of programming is to make it so a human doesn’t have to go do something manually.
    
    not x tells me I need to go manually check what type x is in Python.
    
    len(x) == 0 tells me that it’s being type-checked automatically
    
- sugar_in_your_tea@sh.itjust.works 7 months ago
  That’s why we use type-hinting at my company:
  
  def do_work(foo: list | None):
      if not foo:
          return
      ...
  
  Boom, self-documenting, faster, and very simple.
  
  - LegoBrickOnFire@lemmy.world 7 months ago
    Well, in your case it is not clear whether you intended to branch in the variable foo being None, or on the list being empty which is semantically very different…
    
    That’s why it’s better to explicitly express whether you want an empty collection (len = 0) or a None value.
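A minimal sketch of that explicit style, with a hypothetical describe function: None and the empty list are handled as separate branches, so the intent is visible in the code instead of being folded into one falsiness test. The __future__ import lets the list | None annotation work on Python versions before 3.10.

```python
from __future__ import annotations


def describe(foo: list | None) -> str:
    # Two explicit branches instead of a single `if not foo:`.
    if foo is None:
        return "no list was provided"
    if len(foo) == 0:
        return "an empty list was provided"
    return f"{len(foo)} item(s)"


print(describe(None))    # no list was provided
print(describe([]))      # an empty list was provided
print(describe([1, 2]))  # 2 item(s)
```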
    
- acosmichippo@lemmy.world 7 months ago
  if you’re worried about readability you can leave a comment.
  
  - chunkystyles@sopuli.xyz 7 months ago
    Comments shouldn’t explain code. Code should explain code by being readable.
    
    Comments are for whys. Why is the code doing the things it’s doing. Why is the code doing this strange thing here. Why does a thing need to be in this order. Why do I need to store this value here.
    
    Stuff like that.
    
  - thebestaquaman@lemmy.world 7 months ago
    There is no guarantee that the comment is kept up to date with the code. “Self documenting code” is a meme, but clearly written code is pretty much always preferable to unclear code with a comment, largely because you can actually be sure that the code does what it says it does.
    
    Note: You still need to comment your code, kids.
    
  - sugar_in_your_tea@sh.itjust.works 7 months ago
    Better yet, a type hint. list | None can be checked by static analysis, # foo is a list isn’t.
    
  - iAvicenna@lemmy.world 7 months ago
    If there is an alternative through which I can achieve the same intended effect and that is a bit safer (because it will verify that the object has __len__ implemented), I would prefer that to commenting. Also, if I have to comment every use of not as a length check, that sounds quite redundant, as length checks are very common.
    
- LegoBrickOnFire@lemmy.world 7 months ago
  I really dislike using boolean operators on anything that is not a boolean. I recently made an exception to my rule and got punished… Yeah, it is a skill issue on my part that I tried to check that a variable equal to 0 was not None using “if variable…”. But many programming rules are there to avoid bugs caused by this kind of inattention.
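A minimal sketch of that exact bug, with hypothetical functions label and label_fixed: a truthiness test treats a legitimate 0 the same as None, while an explicit is None check does not.

```python
from __future__ import annotations


def label(reading: int | None) -> str:
    # Buggy: 0 is falsy, so a real reading of 0 is reported as missing.
    if not reading:
        return "missing"
    return str(reading)


def label_fixed(reading: int | None) -> str:
    # Correct: only None means "missing"; 0 is a valid value.
    if reading is None:
        return "missing"
    return str(reading)


print(label(0), label_fixed(0))  # missing 0
```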
  
- Artyom@lemm.ee 7 months ago
  In my experience, if you didn’t write the function that creates the list, there’s a solid chance it could be None too, and if you try to check the length of None, you get an error. This is also why returning None when a function fails is bad practice IMO, but that doesn’t seem to stop my coworkers.
  
  - LegoBrickOnFire@lemmy.world 7 months ago
    Passing None to a function expecting a list is the error…
    
  - iAvicenna@lemmy.world 7 months ago
    Good point. I try to initialize None collections to empty collections at the beginning, but that’s not guaranteed, and len would catch it.
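One common way to do that initialization, sketched with a hypothetical average function: default the parameter to None (not [], since a mutable default list is shared across calls) and normalize it once at the top, so every later len() call sees a real list.

```python
from __future__ import annotations


def average(values: list[float] | None = None) -> float:
    # Normalize once at the boundary; `values=[]` as a default would be
    # a single list object shared between all calls.
    if values is None:
        values = []
    # From here on `values` is always a list, so len() never sees None.
    if len(values) == 0:
        return 0.0
    return sum(values) / len(values)


print(average())            # 0.0
print(average([1.0, 3.0]))  # 2.0
```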
    
Opisek@lemmy.world 7 months ago
The graph makes no sense. Did a generative AI make it?

- sugar_in_your_tea@sh.itjust.works 7 months ago
  I think there’s a good chance of that:
  
  -2x instead of ~2x - a human is unlikely to make that mistake
  
  no space here: ==0 - there’s a space every other time it’s done, including the screenshot
  
  the numbers are wrong - the screenshot has different data than the image
  
  why are there three bars? A naive approach would have two.
  
- gerryflap@feddit.nl 7 months ago
  Looks like it. It’s a complete fever dream graph. I really don’t get how someone can use an image like that. Personally I don’t really like AI art anyways, but I could somewhat understand it as a sort of “filler” image to make your article a bit more interesting. But a graph that is supposed to convey actual information? No idea why anyone would AI gen that without checking
  
- pyre@lemmy.world 7 months ago
  yeah I got angry just looking at it
  
- iknowitwheniseeit@lemmynsfw.com 7 months ago
  My ad blocker has blocked all pictures on this article, so I can’t say. 😄
  
  - nutsack@lemmy.dbzer0.com 7 months ago
    thanks I appreciate it
    
uis@lemm.ee 7 months ago
There are decades of articles on C++ optimizations that say “use empty() instead of size()”, which is the same as here.

- dreugeworst@lemmy.ml 7 months ago
  Except that for C++ it was just to avoid a single function call, not extra indirection. Also, on modern compilers size() will get inlined, and the instructions ultimately generated by the compiler will likely be the same.
  
knighthawk0811@lemmy.ml 7 months ago

so these are the only 2 ways then? huge if true

sugar_in_your_tea@sh.itjust.works 7 months ago

Oh, there are plenty of other terrible ways:

for _ in mylist:
    break
else:
    ...  # whatever you'd do if mylist was empty

if not any(True for _ in mylist):
    ...  # whatever you'd do if mylist was empty

try:
    def do_raise(): raise ValueError

    _ = [do_raise() for _ in mylist]
except ValueError:
    pass
else:
    ...  # whatever you'd do if mylist was empty

I could probably come up with a few others as well.

antlion@lemmy.dbzer0.com 7 months ago
Could also compare against:

if not len(mylist)

That way this version isn’t evaluating two functions. The bool evaluation of an integer is false when zero, otherwise true.

- FooBarrington@lemmy.world 7 months ago
  This is honestly the worst version regarding readability. Don’t rely on implicit coercion, people.
  
  - antlion@lemmy.dbzer0.com 7 months ago
    But the first example does the same thing for an empty list. I guess the lesson is that if you’re measuring the speed of arbitrary stylistic syntax choices, maybe Python isn’t the best language for you.
    
- sugar_in_your_tea@sh.itjust.works 7 months ago
  That’s worse. IMO, solve this problem with two things:
  
  type hint mylist as list | None or just list
  
  use if not mylist:
  
  The first documents intent and gives you static analysis tools some context to check for type consistency/compatibility, and the second shows that None vs empty isn’t an important distinction here.
  
ne0n@lemmy.world 7 months ago
Isn’t “-2x faster” 2x slower?

- ChaoticNeutralCzech@feddit.org 7 months ago
  That would be 0.5x. −2x implies negative duration, which makes no sense. Neither does the layout of anything else in the image.
  
- sugar_in_your_tea@sh.itjust.works 7 months ago
  I think it was supposed to be a ~, since they use that in the paragraph below the image.
  
  - phoenixz@lemmy.ca 7 months ago
    Blame AI
    
- Randelung@lemmy.world 7 months ago
  Maybe they mean up to?
  
gigachad@sh.itjust.works 7 months ago
I don’t like it very much, my variable could also be None here

Harvey656@lemmy.world 7 months ago
I could have tripped, knocked over my keyboard, cried for 13 straight minutes on the floor, picked my keyboard back up, accidentally hit the enter key making a graph and it would have made more sense than this thing.

-2x faster. What does that even mean?

AnUnusualRelic@lemmy.world 7 months ago
From that little image, they’re happy it takes a tenth of a second to check if a list is empty?

What kind of dorito chip is that code even running on?

Archr@lemmy.world 7 months ago
I haven’t read the article. But I’d assume this is for the same reason that not not string is faster than bool(string). Which is to say that it has to do with having to look up a global function rather than a known keyword.
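One way to check that guess without reading the article: compile both expressions, look at the names they need at runtime, then time them with timeit. This is a sketch, not the article's benchmark; exact bytecode and timings vary by CPython version and machine. len(x) == 0 has to look up the name len (which could be shadowed), call it, and compare the result, while not x is a single truth test.

```python
import timeit

# Names each expression must resolve at runtime (co_names lists them).
print(compile("not x", "<expr>", "eval").co_names)        # ('x',)
print(compile("len(x) == 0", "<expr>", "eval").co_names)  # ('len', 'x')

# Rough timings; absolute numbers depend on machine and CPython version.
setup = "x = [1, 2, 3]"
for stmt in ("not x", "len(x) == 0", "not len(x)"):
    seconds = timeit.timeit(stmt, setup=setup, number=1_000_000)
    print(f"{stmt:<12} {seconds:.3f}s per 1M runs")
```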

borokov@lemmy.world 7 months ago
Isn’t it because list is a linked list, so to get the len it has to iterate over the whole list, whereas to get emptiness it just has to check if there is a first element?

I’m too lazy to read the article BTW.
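For what it's worth, CPython's list is a dynamic array of pointers that stores its own size, not a linked list, so len() is O(1) either way. A quick sketch that should show len() taking about the same time on 10 and 1,000,000 elements (timings are machine-dependent):

```python
import timeit

small = list(range(10))
large = list(range(1_000_000))

# len() on a CPython list reads the stored size and does not walk the
# elements, so both timings should be in the same ballpark.
t_small = timeit.timeit(lambda: len(small), number=200_000)
t_large = timeit.timeit(lambda: len(large), number=200_000)
print(f"len(small): {t_small:.4f}s  len(large): {t_large:.4f}s")
```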
