BelumS/spring-git-scraper

GitScraper

An app that scrapes user data from GitHub.

Overview

The challenge

Users should be able to:

  • Retrieve API data about a GitHub user by pinging a REST endpoint.
  • See the user data displayed as JSON.

UML Diagram

Architecture Diagram

Screenshot

My process

Built With

  • Spring Boot v2.7.7
    • Spring Web
    • Spring Cache
    • Spring Retry v1.3.2
    • Spring AOP (required dependency for Retry)
  • Java 11
  • Project Lombok
  • Testing
    • JUnit 5
    • AssertJ v3.24.1
  • Springfox API Documentation v3.0.0
  • GitHub REST API

How to Scrape Data - Native

  1. Start the app using the ./gradlew bootRun command
    • If on Windows, run: gradlew.bat bootRun (or gradle bootRun if Gradle is installed locally)
  2. Call the REST endpoint with the command: curl -v localhost:8080/scraper/api/v1/git/${username} | json_pp, or use Postman.
    • Replace ${username} with a valid GitHub username.
  3. The endpoint will return your desired user data as JSON.
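
The same request can also be made from Java. The following is a minimal sketch using the JDK 11 HttpClient, assuming the app is running locally on port 8080 as in the curl command above. The class name ScraperClient and its method names are hypothetical and are not part of this project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class ScraperClient {

    // Base URL taken from the curl command; assumes a local instance on port 8080.
    private static final String BASE_URL = "http://localhost:8080/scraper/api/v1/git/";

    /** Builds the endpoint URI for a username, percent-encoding it for use in the path. */
    public static URI endpointFor(String username) {
        // URLEncoder targets form encoding, so spaces become '+'; convert them to %20 for a path segment.
        String encoded = URLEncoder.encode(username, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(BASE_URL + encoded);
    }

    /** Sends a GET request to the endpoint and returns the raw JSON body. */
    public static String fetchUserJson(String username) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpointFor(username))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }
}
```

With the app running, ScraperClient.fetchUserJson("octocat") should return the JSON document for that user; "octocat" is only a placeholder username.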

How to Scrape Data - Docker (Optional)

  1. Ensure Docker is installed; if it isn't, install it first.
  2. Pull the image from Docker Hub: docker pull belum/spring-git-scraper:latest
  3. Check that the image was downloaded successfully: docker images
  4. Run the image with: docker run -it -p 8080:8080 belum/spring-git-scraper:latest
  5. Interact with the endpoint as described in steps 2 and 3 of the Native instructions.

What I Learned

I learned how to get Jackson JSON to serialize JDK 8 Date/Time types.

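// build.gradle
implementation 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.13.4'
testRuntimeOnly 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.13.4'

// ApplicationConfig.java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ApplicationConfig {
  @Bean
  public ObjectMapper objectMapper() {
    ObjectMapper mapper = new ObjectMapper();
    // Register JSR-310 support so java.time types (e.g. LocalDateTime) can be (de)serialized
    mapper.registerModule(new JavaTimeModule());
    return mapper;
  }
}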

I learned how to cache the data using Spring Cache.

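// build.gradle
implementation 'org.springframework.boot:spring-boot-starter-cache'

// ApplicationConfig.java
import java.util.List;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCache;
import org.springframework.cache.support.SimpleCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class ApplicationConfig {
  // Registers two in-memory caches, one per cached service method
  @Bean
  public CacheManager cacheManager() {
    SimpleCacheManager cacheManager = new SimpleCacheManager();
    cacheManager.setCaches(List.of(
            new ConcurrentMapCache("users"),
            new ConcurrentMapCache("repos")
    ));
    return cacheManager;
  }
}

// GithubClientImpl.java (excerpt; other members omitted)
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Component;

@Component
public class GithubClientImpl implements GithubClient {

  // Builds a header-only request entity carrying a Cache-Control header
  private HttpEntity<String> httpEntity() {
    HttpHeaders headers = new HttpHeaders();
    headers.set(HttpHeaders.CACHE_CONTROL, "public, max-age=60, s-maxage=60");
    return new HttpEntity<>(headers);
  }
}

// GithubServiceImpl.java (excerpt)
import java.util.List;

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class GithubServiceImpl implements GithubService {

  @Override
  @Cacheable("users") // result is stored in the "users" cache, keyed by username
  public GitUser getUserData(String username) {
    // body omitted for brevity
  }

  @Override
  @Cacheable("repos") // result is stored in the "repos" cache, keyed by username
  public List<GitRepo> getRepoData(String username) {
    // body omitted for brevity
  }
}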
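
Conceptually, a @Cacheable method with a single argument behaves like a memoized lookup: the first call for a given username runs the method body, and later calls with the same username return the stored value. The following plain-Java sketch illustrates that idea without Spring. It is not how Spring implements caching internally, and the names CachedLookup and loadCount are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration of the caching idea; not part of this project or of Spring.
public class CachedLookup<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    // Counts how often the underlying loader actually ran (cache misses).
    private final AtomicInteger loads = new AtomicInteger();

    public CachedLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the key, invoking the loader only on a miss. */
    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** Number of times the loader was invoked. */
    public int loadCount() {
        return loads.get();
    }
}
```

For example, after new CachedLookup<String, String>(name -> "user:" + name) is asked for "octocat" twice, the loader has run only once. Like ConcurrentMapCache, this sketch never expires or evicts entries, so cached data stays until the application restarts.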

Useful resources

Author