Investigate, fix and migrate a social news aggregator's relational database ✍️
PS :
bad_post
has 50K records whilebad_comments
has 100K records.
bad_posts
is poorly structured to contain data that should ideally be in other tables and only then referenced from thebad_posts
table. The way the username, upvotes and downvotes column is used is proof of this fact.bad_posts
has very little data constraints and validation where it should. Since the url column is optional, the title and text_content need to be not null, at least conditionally.- The upvotes and downvotes columns in
bad_posts
being text will make filtering or aggregating on them very difficult . bad_comments
seems to allow empty comments since the text_content column is not constrained with no null.- Intuitively, comments are usually threaded, but
bad_comments
does not seem to indicate any support for that, - Placing the actual text username of the author into
bad_posts
orbad_comments
means the data can easily become inconsistent. At least it will be a nightmare to maintain data consistency and data integrity in the face of change. Ideally, username should only be updatable in a single place.
Summarized below 👇 Full details are 👉 in here
Guideline 1: Features and specifications that Udacious needs to support its website and admin interface 😎 🤓
- Allow new users to register
- Allow registered users to create new topics
- Allow registered users to create new posts on existing topics
- Allow registered users to comment on existing posts
- Make sure that a given user can only vote once on a given post
- List all users who haven’t logged in in the last year.
- List all users who haven’t created any post.
- Find a user by their username.
- List all topics that don’t have any posts.
- Find a topic by its name.
- List the latest 20 posts for a given topic.
- List the latest 20 posts made by a given user.
- Find all posts that link to a specific URL, for moderation purposes.
- List all the top-level comments (those that don’t have a parent comment) for a given post.
- List all the direct children of a parent comment.
- List the latest 20 comments made by a given user.
- Compute the score of a post, defined as the difference between the number of upvotes and the number of downvotes
The DML for migrating data is in src/migrate/dml.sql, but below are the snippets for migrating users
:
-- users
DROP TABLE IF EXISTS "users";
CREATE TABLE "users" (
"id" SERIAL PRIMARY KEY,
"username" VARCHAR(25) UNIQUE NOT NULL,
"last_seen" TIMESTAMP WITH TIME ZONE,
"created_at" TIMESTAMP WITH TIME ZONE NOT NULL default CURRENT_TIMESTAMP
);
ALTER TABLE "users"
ADD CONSTRAINT "non_empty_username" CHECK (LENGTH(TRIM("username")) > 0);
WITH allusers AS (
SELECT username FROM bad_posts
UNION SELECT unnest(string_to_array(upvotes, ',')) AS username FROM bad_posts
UNION SELECT unnest(string_to_array(downvotes, ',')) AS username FROM bad_posts
UNION SELECT username FROM bad_comments
)
INSERT INTO "users" ("username") SELECT DISTINCT username FROM allusers;
There are lots of duplicate users in different places e.g
SELECT COUNT(users) FROM (SELECT username AS users FROM bad_posts) t1
shows 50K users vsSELECT COUNT(users) FROM (SELECT DISTINCT username AS users FROM bad_posts) t1
shows 100 distinct users who've created posts. Also, querying for upvotes withSELECT unnest(string_to_array(upvotes, ',')) AS users
vsSELECT DISTINCT unnest(string_to_array(upvotes, ',')) AS users
results in 249.7k vs 1.1k users while a smilimar query for downvotes results in 249.9k vs 1.1k users. Comsequently, total users (acrossusername
,upvotes
, anddownvotes
in thebad_posts
table andusername
in thebad_comments
table) sits at ~11.0k