Efficiently Manage Memory Usage in Pandas with Large Datasets

55 likes

Submitted 6 months ago by Aalonatech@lemmy.world to technology@lemmy.world

https://geekpython.in/copy-on-write-in-pandas

Comments

Dremor@lemmy.world 6 months ago
This should probably be posted in a programming community.

phlegmy@sh.itjust.works 6 months ago
This could really do with an explanation for wtf ‘pandas’ is, and why this is relevant.

Nomecks@lemmy.ca 6 months ago
Is there a benefit to doing CoW in pandas vs. offloading it to the storage layer? Practically all modern storage systems support CoW snapshots. The pattern I’m used to (infra, not big data) is to leverage storage APIs to offload storage operations from client systems.

- sem@lemmy.ml 6 months ago
  If you are doing data processing in pandas, CoW lets you avoid a lot of redundant computation in intermediate steps. Before CoW, any data processing in pandas required careful, manual handling in code to avoid the case described in the blog post. To be honest, I cannot imagine offloading the result of each operation in the pipeline to storage…
  
  - Nomecks@lemmy.ca 6 months ago
    So you would be using CoW in-memory in this case?
    
- LunarLoony@lemmy.sdf.org 6 months ago
  I’m confused by all this talk of black-and-white animals. Can we instead use a Zebra node and put it behind a TuxedoCat cluster? I’ve also heard good things about barred-knifejaw as a data warehouse.
  
  (Genuine question: what are Pandas and Cows in this context?)
  
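
For anyone wondering what in-memory CoW looks like in practice, here is a minimal sketch of the behaviour discussed in this thread. It assumes pandas 2.x, where copy-on-write is switched on through the `mode.copy_on_write` option (in pandas 3.0 it is the default); the DataFrame and the variable names are made up for illustration and do not come from the linked blog post.

```python
import warnings

import numpy as np
import pandas as pd

# Assumption: pandas 2.x, where copy-on-write is opt-in. In pandas 3.0 it is
# the default, so the option may warn or be unavailable; ignore that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        pd.set_option("mode.copy_on_write", True)
    except Exception:
        pass  # option not present in this pandas version

# Hypothetical toy data, purely for illustration.
df = pd.DataFrame({"a": [1, 2, 3]})

# reset_index returns a new DataFrame, but under CoW no data is copied yet:
# both objects still point at the same underlying buffer.
df2 = df.reset_index(drop=True)
shared_before = np.shares_memory(df["a"].to_numpy(), df2["a"].to_numpy())

# The first write to df2 triggers the actual copy; df is left untouched.
df2.iloc[0, 0] = 100
shared_after = np.shares_memory(df["a"].to_numpy(), df2["a"].to_numpy())

print("shared before write:", shared_before)
print("shared after write:", shared_after)
print("df:", df["a"].tolist(), "df2:", df2["a"].tolist())
```

The point is that intermediate results in a pipeline are cheap until something actually writes to them, which is a different layer from storage-level CoW snapshots: everything here happens in process memory.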