Thread

2 tweets

1
A tweet comes with a lot of embedded metadata (e.g. profile information including follower counts), so it's more like 1K per tweet. When I was working with the subsampled feed that was publicly available, it came to about 5GB of raw data per day.
2
@srchvrs It was never disclosed what the subsample rate was, but I think it was more like 1%. Now you could of course discard all the metadata and just stick with the text of the tweet, but then you'd also be discarding a lot of data that could help you identify bots.
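
A rough back-of-envelope sketch of what those numbers would imply for the full feed. The inputs are assumptions, not disclosed figures: "1K" is read as 1,000 bytes, "5GB" as 5e9 bytes, and the sample rate is the guessed 1%. The function name `estimate_full_feed` is made up for this illustration.

```python
# Back-of-envelope estimate from the figures quoted in the thread.
# All three inputs are rough guesses, not disclosed values.
BYTES_PER_TWEET = 1_000                 # "1K per tweet", read as 1,000 bytes
SAMPLE_BYTES_PER_DAY = 5_000_000_000    # "5GB of raw data per day"
SAMPLE_RATE_PERCENT = 1                 # "more like 1%" (never disclosed)


def estimate_full_feed(sample_bytes, bytes_per_tweet, sample_rate_percent):
    """Scale the sampled volume up to the full feed (integer arithmetic)."""
    sampled_tweets = sample_bytes // bytes_per_tweet
    full_tweets = sampled_tweets * 100 // sample_rate_percent
    full_bytes = full_tweets * bytes_per_tweet
    return sampled_tweets, full_tweets, full_bytes


sampled, full, full_bytes = estimate_full_feed(
    SAMPLE_BYTES_PER_DAY, BYTES_PER_TWEET, SAMPLE_RATE_PERCENT
)
print(f"sampled tweets/day: {sampled:,}")
print(f"full-feed tweets/day: {full:,}")
print(f"full-feed raw data/day: {full_bytes / 1e9:,.0f} GB")
```

Under these assumptions the sample holds about 5 million tweets per day, which would put the full feed at roughly 500 million tweets and 500GB of raw data per day. If the real sample rate was different, the totals scale accordingly.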