Comment on I'm Starting A Search Engine For The Fediverse

scrubbles@poptalk.scrubbles.tech ⁨1⁩ ⁨year⁩ ago

I disagree. Post privacy, sure, but the fediverse is by definition public. Anything you put out there can be used for pretty much anything; the original rules of the internet apply. I’d be happy to see an easy opt-out on the engine to remove yourself, but if everything is opt-in it’ll never get off the ground.


TimLovesTech@badatbeing.social ⁨1⁩ ⁨year⁩ ago
As the fediverse is almost exclusively run by volunteers who pay the server bills and act as admins, I could see some larger instances not taking kindly to this, especially depending on how much stress it would put on servers that are already at capacity.

- loobkoob@kbin.social ⁨1⁩ ⁨year⁩ ago
  Ideally, OP's crawlers will just come from their own instance that other instance owners can defederate from if they want to opt out.
  
  - lautan@lemmy.ca ⁨1⁩ ⁨year⁩ ago
    Yeah that would be the case.
    
    scrubbles@poptalk.scrubbles.tech ⁨1⁩ ⁨year⁩ ago
    That’s a good idea. Listen to public data being broadcast out, and then you aren’t worrying people with scraping or anything. It would only cover posts from go-live onward, but you would just be listening to the protocol.
    
- TrickDacy@lemmy.world ⁨1⁩ ⁨year⁩ ago
  How much bandwidth do you suppose a crawler would use? I’d guess very little.
  
  - TimLovesTech@badatbeing.social ⁨1⁩ ⁨year⁩ ago
    I was thinking more in terms of resources (number of spider threads × posts/communities/users being indexed) that would now be dedicated to a bot, not so much network traffic, which is probably tiny if it isn’t downloading images.
    
    TrickDacy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Right, it would be an initial hit, but if the bot were properly built it wouldn’t need to do full reindexing very often. I’m no expert, but I think it could be done in a way that causes no noticeable spike in traffic.
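
The point about avoiding full reindexing can be sketched as follows. This is only an illustration under stated assumptions, not how OP's engine works: `IncrementalIndexer` and `update` are hypothetical names, and posts are assumed to arrive as `(post_id, published, text)` tuples. A real crawler would also pass the cutoff to the server so that old posts are never requested; here the filtering is done locally to keep the example self-contained.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Tuple

Post = Tuple[str, datetime, str]  # (post_id, published, text); hypothetical shape


class IncrementalIndexer:
    """Hypothetical sketch: index only posts newer than the last one seen."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, datetime] = {}  # community -> newest indexed timestamp
        self.index: Dict[str, str] = {}           # post_id -> text

    def update(self, community: str, posts: Iterable[Post]) -> int:
        """Index posts published after the stored cutoff; return how many were added."""
        cutoff: Optional[datetime] = self.last_seen.get(community)
        newest = cutoff
        added = 0
        for post_id, published, text in posts:
            if cutoff is not None and published <= cutoff:
                continue  # already covered by an earlier pass
            self.index[post_id] = text
            added += 1
            if newest is None or published > newest:
                newest = published
        if newest is not None:
            self.last_seen[community] = newest
        return added


indexer = IncrementalIndexer()
t1 = datetime(2023, 7, 1, tzinfo=timezone.utc)
t2 = datetime(2023, 7, 2, tzinfo=timezone.utc)
t3 = datetime(2023, 7, 3, tzinfo=timezone.utc)
first_pass = [("a", t1, "hello"), ("b", t2, "world")]
second_pass = first_pass + [("c", t3, "new post")]
first_added = indexer.update("example_community", first_pass)
second_added = indexer.update("example_community", second_pass)
```

After the initial pass, each later pass only does work proportional to the number of new posts, which is the "initial hit, then little load" behaviour described above.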
    
  - lautan@lemmy.ca ⁨1⁩ ⁨year⁩ ago
    It will be very little if it isn’t downloading full HTML pages.
    
gabe@literature.cafe ⁨1⁩ ⁨year⁩ ago
That’s not how the fediverse functions, and approaching it that way is a problem waiting to happen. I’m saying so as a warning to be mindful of the culture of the fediverse. This is not Reddit; we share the fediverse with other software that has different uses and features, and we need to be mindful of that, especially when building these kinds of tools. Making it opt-out not only places a burden on smaller instances but also presents a potential harassment risk for instances with vulnerable people on other fediverse platforms. It is also contrary to the way certain other ActivityPub instances operate. The fediverse is like a city we share with others; if Lemmy is not mindful of that city’s culture, people will promptly give it the boot.

I’m not saying user-by-user opt-in either, but instance-by-instance. Lemmy especially needs an archiving tool. There are already cultural clashes that I see occurring with the rest of the fediverse. Posts like these about potential tools, when it seems the creator doesn’t know the messy history behind previous projects like them in the fediverse, make me fearful of those clashes coming to fruition.

- lautan@lemmy.ca ⁨1⁩ ⁨year⁩ ago
  Well that’s why I’m asking for input. And I won’t launch this on every instance without letting them know. Baby steps.
  
  - gabe@literature.cafe ⁨1⁩ ⁨year⁩ ago
    My Matrix is open if you are actually interested in doing this in a way that won’t make the rest of the fediverse flip shit. I support this tool’s creation, especially for Lemmy, but if it isn’t done the right way it’ll be received poorly. Making it behave differently on Lemmy than on other software might be an idea too.
    
  - Kierunkowy74@kbin.social ⁨1⁩ ⁨year⁩ ago
    Since version 4.2, Mastodon allows its users to opt into appearing in search results. Just respect this flag for Mastodon users, and you will be fine, IMHO.
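
Respecting that flag could look roughly like the sketch below. It assumes the opt-in is exposed as a boolean `indexable` property on the user's ActivityPub actor document, which is my understanding of how Mastodon 4.2 publishes it; the function name `may_index` and the sample actors are hypothetical. A missing flag is treated as "not opted in", which is the conservative reading.

```python
from typing import Any, Mapping


def may_index(actor: Mapping[str, Any]) -> bool:
    """Return True only if the actor document explicitly opts in to search indexing.

    Assumption: the opt-in is a boolean `indexable` property on the actor JSON.
    Anything other than a literal True (missing, False, a string) counts as "no".
    """
    return actor.get("indexable") is True


# Hypothetical actor documents, trimmed to the fields used here.
opted_in = {"type": "Person", "preferredUsername": "alice", "indexable": True}
opted_out = {"type": "Person", "preferredUsername": "bob", "indexable": False}
no_flag = {"type": "Person", "preferredUsername": "carol"}

indexable_users = [
    a["preferredUsername"] for a in (opted_in, opted_out, no_flag) if may_index(a)
]
```

Defaulting to "no" when the flag is absent keeps the crawler opt-in for software that publishes the flag, while leaving room for a separate instance-level policy for software that does not.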
    
- scrubbles@poptalk.scrubbles.tech ⁨1⁩ ⁨year⁩ ago
  But it publishes all of the data out. I don’t think this is going out to servers asking for data; it’s listening to public data being broadcast out. If people are broadcasting over ActivityPub, then they’re okay with it being shared.
  
- 0x1C3B00DA@kbin.social ⁨1⁩ ⁨year⁩ ago
  
  That’s not how the fediverse functions
  
  That is how the fediverse functions. Instances send posts to anyone who requests them, unless a block is in place. ActivityPub is opt-out, and the web has always worked this way.
  
  be mindful of the culture
  
  There is no "the culture" on the fediverse. You're talking about a subgroup, which has a different opinion from other subgroups. They don't get to define "culture" on the fediverse.
  
skullgiver@popplesburger.hilciferous.nl ⁨1⁩ ⁨year⁩ ago
[deleted]
- scrubbles@poptalk.scrubbles.tech ⁨1⁩ ⁨year⁩ ago
  Again, it’s not going to servers and scraping data; it would be sitting somewhere receiving public data that is pushed out. There’s no malicious getting around privacy settings: if it’s pushed out, then it’s fair game. I agree about post privacy, but again, ActivityPub already takes care of that.
  