Category: 4chan

  • My First Corpus (5: Dispute Markers — It’s Really Complicated [CW])

    CW: slurs in the data. Markers: As I discovered while writing the previous post, there are more kinds of identity dispute than I had previously thought, and so far I’ve located them by grepping through the data for tropes that are likely to be received with skepticism, for example “I’m not gay but …” (which…
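As a rough illustration of grepping for a trope of the "I'm not X but" shape, here is a minimal Python sketch. The pattern, the function name `find_disclaimers`, and the sample posts are all hypothetical and are not taken from the post itself; the actual search may well have used plain `grep` with different patterns.

```python
import re

# Hypothetical pattern for the "I'm not X but ..." trope.
# Accepts a straight or curly apostrophe and an optional article before X.
TROPE = re.compile(r"\bI['’]m not (?:an? )?(\w+),? but\b", re.IGNORECASE)

def find_disclaimers(bodies):
    """Return (disclaimed_word, body) pairs for posts matching the trope."""
    hits = []
    for body in bodies:
        match = TROPE.search(body)
        if match:
            hits.append((match.group(1), body))
    return hits

# Invented sample post bodies, for demonstration only.
sample_bodies = [
    "I’m not a carpenter but that joint looks weak",
    "im not sure",
    "Nice build, what wood is that?",
    "i'm not an expert, but sand it first",
]
hits = find_disclaimers(sample_bodies)
for word, body in hits:
    print(word, "|", body)
```

A pattern this narrow will miss spelling variants such as "im not", so in practice it would probably need loosening after a look at what it fails to catch.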

  • My First Corpus (4: Three Terms, One Analysis [CW])

    CW: slurs in the data. After poking and prodding at the data for a while, I saw plenty of amusing exchanges. I noticed that part of what was amusing, aside from 4chan’s characteristic vitriol, was that users were arguing about each other’s identities on an anonymous image board. “Identity construction” is the expression I use…

  • My First Corpus (3: Poking the Data [CW])

    CW: slurs in the data. Data collection: As I continue collecting data, I’ve settled on a format for now: a large CSV in which each record consists of [board], [thread number], [post number], [poster name], [timestamp (seconds)], [body]. The [thread number] corresponds to the [post number] of the OP; thus the OP will have the…
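The record layout described above could be read along these lines. This is only a sketch: the field order follows the excerpt, but the `Post` type, the helper names, the absence of a header row, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from typing import NamedTuple

class Post(NamedTuple):
    board: str
    thread_number: int
    post_number: int
    poster_name: str
    timestamp: int  # seconds
    body: str

def read_posts(handle):
    """Yield one Post per CSV record (assumes six columns and no header row)."""
    for row in csv.reader(handle):
        board, thread, post, name, timestamp, body = row
        yield Post(board, int(thread), int(post), name, int(timestamp), body)

def is_op(post):
    """The thread number is the OP's post number, so an OP has the two equal."""
    return post.post_number == post.thread_number

# Invented sample records, for demonstration only.
sample = io.StringIO(
    'diy,1000,1000,Anonymous,1400000000,"First post, with a comma"\n'
    'diy,1000,1003,Anonymous,1400000060,A reply\n'
)
posts = list(read_posts(sample))
ops = [p for p in posts if is_op(p)]
print(len(posts), "posts,", len(ops), "thread starter(s)")
```

Using the `csv` module rather than splitting on commas matters here, since post bodies routinely contain commas and quotes.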

  • My First Corpus (2: wgetting Data)

    Before I began, I knew that eventually I wanted to get as much text data from 4chan as possible. I knew that there were boards, which were subdirectories of the site (e.g. the board /diy/ appears at 4chan.org/diy/), and that each of these corresponded to a topic, had its own moderators, and, to some extent, its…

  • Tumamdai-Vauva (1s: Tsuite)

    If you’re interested in linguistics, at some point you’ve probably run into “Corpus Analysis”, which is linguistic work done on a substantial body (“corpus”) of text. There are all kinds of corpora: one might be all the works by Shakespeare, or all of the language used on a particular news site in a given year,…