MEANRedd

A work-in-progress data visualization tool for discovering meaningful words and emergent topics in a corpus of Reddit submissions.

This tool is in the prototyping stage and does not yet represent its full intended functionality.

How to use

Step One: Data Collection

  • Specify a valid subreddit name stub (e.g. spacex).

  • Choose which Reddit submission listing to scrape data from: New, Top or Hot. Top can additionally be filtered by day, week, month or year.

  • Including comments greatly increases the available data, at the cost of making many more API calls to Reddit.
    • Comment text is generally of lower quality than the text of the posts themselves.
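To illustrate why including comments multiplies the number of requests, the sketch below builds the request URLs for one scrape. It assumes the tool reads Reddit's public JSON endpoints (`/r/<subreddit>/<listing>.json`, with the `t` parameter selecting the Top time filter, and `/r/<subreddit>/comments/<id>.json` for a post's comments); the helper name `build_request_urls` and the one-request-per-post comment model are assumptions for illustration, not MEANRedd's actual scraper.

```python
from urllib.parse import urlencode

# Hypothetical helper: assumes Reddit's public JSON endpoints, not MEANRedd's code.
BASE = "https://www.reddit.com"
LISTINGS = {"new", "top", "hot"}
TOP_FILTERS = {"day", "week", "month", "year"}


def build_request_urls(subreddit, listing, post_ids=(), time_filter=None, limit=100):
    """Return the listing URL followed by one comment URL per post ID.

    Each post whose comments are requested costs one extra API call,
    which is why including comments is expensive.
    """
    listing = listing.lower()
    if listing not in LISTINGS:
        raise ValueError(f"unknown listing: {listing!r}")
    params = {"limit": limit}
    if listing == "top":
        # Only the Top listing takes a time filter in this sketch.
        if time_filter not in TOP_FILTERS:
            raise ValueError(f"unknown time filter: {time_filter!r}")
        params["t"] = time_filter
    urls = [f"{BASE}/r/{subreddit}/{listing}.json?{urlencode(params)}"]
    urls += [f"{BASE}/r/{subreddit}/comments/{pid}.json" for pid in post_ids]
    return urls


# The post IDs below are made up for the example.
urls = build_request_urls("spacex", "top", post_ids=["abc123", "def456"], time_filter="week")
print(urls[0])   # https://www.reddit.com/r/spacex/top.json?limit=100&t=week
print(len(urls)) # 3: one listing request plus one per post
```

With a full page of 100 posts, fetching comments in this model turns one request into 101.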

Step Two: Data Visualization

  • The tool processes the retrieved data and attempts to build a network linking meaningful words that frequently occur together (frequent itemsets).

  • If no frequent itemsets are found, try lowering the minimum support threshold.
    • A threshold that is too low will lengthen computation time.

  • Remove undesired or subreddit-specific stop words by selecting them in the meaningful word listing.
    • A list of common English stop words has already been loaded.

  • If the tool becomes unresponsive, refresh the page and try again.
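The network-building step can be pictured roughly as follows. This is a minimal sketch, not MEANRedd's implementation: it treats each post as a transaction of distinct words, removes stop words, and keeps word pairs whose support (the fraction of posts containing both words) meets `min_support`; each surviving pair becomes an edge. It mines only pairs, using the Apriori rule that a pair can be frequent only if both words are, rather than full Apriori over larger itemsets. The names `tokenize`, `frequent_pair_edges` and the tiny `STOP_WORDS` set are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny stand-in for the preloaded English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "on", "for", "in"}


def tokenize(text, stop_words):
    """Lowercase the text, split it into words, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in stop_words}


def frequent_pair_edges(posts, min_support, stop_words=STOP_WORDS):
    """Return {(word_a, word_b): support} for pairs meeting min_support.

    Support is the fraction of posts in which both words appear. A lower
    min_support keeps more words and pairs, so there is more to count.
    """
    transactions = [tokenize(p, stop_words) for p in posts]
    n = len(transactions)
    if n == 0:
        return {}
    # Apriori pruning: a pair can only be frequent if each word is frequent.
    word_counts = Counter(w for t in transactions for w in t)
    frequent_words = {w for w, c in word_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(t & frequent_words)
        pair_counts.update(combinations(kept, 2))
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


# Illustrative posts, not real scraped data.
posts = [
    "Falcon landing on the drone ship",
    "Starship static fire today",
    "Falcon launch and landing",
    "Starship launch window",
]
edges = frequent_pair_edges(posts, min_support=0.5)
print(edges)  # {('falcon', 'landing'): 0.5}
```

Lowering `min_support` to 0.25 here would keep every word and every co-occurring pair, which shows why a very low threshold makes the network larger and slower to compute.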

Meaningfulness

Our proposed meaningfulness score considers not only how frequently a word appears across the subreddit but also the karma of the posts in which it occurs. This helps distinguish a merely common word from a meaningful one.
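The exact formula is not specified here, so the following is only one hypothetical way to combine the two signals. Each occurrence of a word adds 1 to its frequency component and adds the log-scaled karma of its post to its karma component; a word's score is the sum of the two. Clamping karma at zero, the `log1p` scaling and the function name `meaningfulness` are all assumptions.

```python
import math
import re
from collections import defaultdict


def meaningfulness(posts, stop_words=frozenset()):
    """Score words from (text, karma) pairs.

    Returns {word: (frequency, karma_score)}; a word's total score is the
    sum of the two. Hypothetical weighting: karma is clamped at zero and
    log-scaled so a single very popular post does not dominate.
    """
    freq = defaultdict(int)
    karma_score = defaultdict(float)
    for text, karma in posts:
        weight = math.log1p(max(karma, 0))
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word in stop_words:
                continue
            freq[word] += 1
            karma_score[word] += weight
    return {w: (freq[w], karma_score[w]) for w in freq}


# Illustrative posts: one popular, one with no karma.
scores = meaningfulness([("Starship launch", 999), ("launch delayed", 0)])
ranked = sorted(scores, key=lambda w: sum(scores[w]), reverse=True)
print(ranked)  # ['launch', 'starship', 'delayed']
```

Both "starship" and "delayed" appear once, but "starship" ranks higher because it occurs in a high-karma post, which is the distinction between a common word and a meaningful word that the score is meant to capture.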

This data visualization tool was made as a course project for CSCI4210U Information Visualization taught by Dr. Christopher Collins @ UOIT.

Created by

View the source @ GitHub



Meaningful Words

To filter words from the visualization, select the bar of a word below.

Each bar combines two components: the word's karma score and its frequency.
