take6056@feddit.nl 11 months ago
I would look into a library that manipulates odt (or docx) files. Code whatever algorithm you need to do the restructuring. Now you're left with an in-memory representation of the document, and you can hopefully either work out how many pages it spans or save it to a temporary file.
All depends really on how feature rich the odt libraries are and/or how deep you want to dive into the spec.
I feel like this is an XY problem. Is there an underlying issue you're trying to resolve?
Red1C3@lemmy.world 11 months ago
Yeah, my main issue is trying to figure out how many pages it spans. I’ve looked at some docx and odt libs, and none seemed to have an API for getting the number of pages or the height of a component (except for things with fixed heights, like images…).
The underlying issue is that I want to create an exam paper that uses as few sheets of paper as possible per exam, so I guess I should at least be able to get the height of each question and rearrange them (using an algorithm) so that they take up fewer pages.
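If the height of each question could be measured somehow (which is exactly the open problem here), the rearranging step looks like bin packing. Below is a minimal sketch using the first-fit decreasing heuristic. It assumes heights and the usable page height are in the same unit, that a question is never split across pages, and that question order is free to change. The function name `pack_questions` and the example numbers are made up for illustration.

```python
from typing import List


def pack_questions(heights: List[int], page_height: int) -> List[List[int]]:
    """Group question indices onto pages using first-fit decreasing.

    This is a heuristic: it usually gets close to the minimum number of
    pages but does not guarantee it.
    """
    # Visit the tallest questions first.
    order = sorted(range(len(heights)), key=lambda i: heights[i], reverse=True)

    pages: List[List[int]] = []   # question indices placed on each page
    free: List[int] = []          # remaining height on each page

    for i in order:
        h = heights[i]
        if h > page_height:
            raise ValueError(f"question {i} is taller than a page")
        # Put the question on the first page that still has room.
        for p, room in enumerate(free):
            if h <= room:
                pages[p].append(i)
                free[p] -= h
                break
        else:
            # No existing page fits, so start a new one.
            pages.append([i])
            free.append(page_height - h)
    return pages


if __name__ == "__main__":
    # Hypothetical heights for five questions on a page of height 10.
    print(pack_questions([6, 5, 4, 3, 2], page_height=10))
```

Each inner list is one page, holding the indices of the questions placed on it.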
lupec@lemm.ee 11 months ago
Is using something like Typst to generate your exams an option? There’d be a learning curve, but it’s full of utilities for formatting and arranging content, so it feels like a less hacky way of achieving what you want. Plus, it’d make iterating easier and give you more consistency going forward.
Red1C3@lemmy.world 11 months ago
Not really, no. I need something that I can embed into my application rather than 3rd party software, and my application must work offline too :/
ericjmorey@programming.dev 11 months ago
Use Google Apps Script to open the document in Google Docs, read the number of pages that Google Docs renders, close the document, then delete the document (optional).
Red1C3@lemmy.world 11 months ago
I need to automate the process so I can use it inside an algorithm; this is far from practical.
ericjmorey@programming.dev 11 months ago
My suggestion was to use Google Apps Script to automate exactly that process. You’ve not given a lot of details about what you actually want to do, but for what you did give, Google Apps Script would let you automate the task.
Turun@feddit.de 11 months ago
How about generating LaTeX source code, compiling it, and getting the page count of the generated PDF? Reorder your set of questions and see if the result is better or worse. Optionally do it in a smart way to reduce the number of PDF compilations you have to do. (Simulated annealing comes to mind, for example.)
I think it would be easier to find a library that locates the last line on a PDF page than to parse unzipped odt files and basically write a layout engine that does the same as LibreOffice just to get the number of pages.
Maybe you can even get TeX to put it in the log during compilation. That would be the most convenient option.
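A rough sketch of what that reordering loop could look like with simulated annealing. Everything here is illustrative: `anneal_order` and its parameters are made-up names, and `sequential_page_count` is only a stand-in for "compile the document and count the pages", assuming questions flow top to bottom and are never split. Results are cached so that the same ordering is never evaluated (compiled) twice.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def sequential_page_count(order: Sequence[int], heights: Sequence[int],
                          page_height: int) -> int:
    """Stand-in cost: pages used when questions flow in the given order."""
    pages, used = 0, page_height  # forces a new page for the first question
    for i in order:
        if used + heights[i] > page_height:
            pages, used = pages + 1, 0
        used += heights[i]
    return pages


def anneal_order(n: int, page_count: Callable[[Tuple[int, ...]], int],
                 steps: int = 500, start_temp: float = 1.0,
                 cooling: float = 0.99, seed: int = 0) -> Tuple[List[int], int]:
    """Search for a question order with few pages via simulated annealing."""
    rng = random.Random(seed)
    cache: Dict[Tuple[int, ...], int] = {}

    def cost(order: List[int]) -> int:
        key = tuple(order)
        if key not in cache:          # only evaluate unseen orderings
            cache[key] = page_count(key)
        return cache[key]

    current = list(range(n))
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    if n < 2:
        return best, best_cost

    temp = start_temp
    for _ in range(steps):
        # Neighbour move: swap two randomly chosen questions.
        i, j = rng.sample(range(n), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse orders with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling
    return best, best_cost


if __name__ == "__main__":
    heights = [6, 5, 4, 3, 2]  # hypothetical question heights
    order, pages = anneal_order(
        len(heights), lambda o: sequential_page_count(o, heights, 10))
    print(order, pages)
```

In a real setup the lambda would be replaced by a function that writes the questions out in the given order, compiles them, and returns the resulting page count.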
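For what it's worth, pdfTeX already ends its log with a line of the form `Output written on exam.pdf (3 pages, 45678 bytes).`, so the page count can be read from there. A minimal sketch, assuming `pdflatex` is installed and on the PATH; the helper names `page_count_from_log` and `compile_and_count` are made up, and the file names are placeholders.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Matches the summary line pdfTeX writes at the end of a successful run.
# DOTALL lets the match survive a long file name that the log wraps
# onto the next line.
_PAGES_RE = re.compile(r"Output written on .*?\((\d+) pages?, \d+ bytes\)",
                       re.DOTALL)


def page_count_from_log(log_text: str) -> Optional[int]:
    """Return the page count reported in a pdfTeX log, or None if absent."""
    match = _PAGES_RE.search(log_text)
    return int(match.group(1)) if match else None


def compile_and_count(tex_file: Path, out_dir: Path) -> Optional[int]:
    """Compile tex_file with pdflatex and read the page count from its log.

    Assumption: pdflatex is available locally; nothing here needs a network.
    """
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
         f"-output-directory={out_dir}", str(tex_file)],
        check=True, capture_output=True,
    )
    log_path = out_dir / (tex_file.stem + ".log")
    return page_count_from_log(log_path.read_text(errors="replace"))


if __name__ == "__main__":
    sample = "Output written on exam.pdf (3 pages, 45678 bytes).\n"
    print(page_count_from_log(sample))
```

Only the log parsing is exercised in the example at the bottom; `compile_and_count` shows how it might be wired to an actual compile.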