Replies: 3 comments
-
After some further thought, it makes sense to consider the "IPFS Controller" to be the main concept, and the "interfaces" to simply be clients for the controller, which can then be implemented in whatever way works best. The concept of creating a CDN using this system would be an application of the IPFS Controller, and can be separated from this write-up.
-
This design seems fine at a high level, assuming that your private nodes are peered with each other sufficiently. I'd take a use case (e.g. CDN), flesh out the specific requirements and metrics, and then iterate based on that. You may want to look at IPFS Search. It seems to have a system similar to your "controller" above, where you can query for CIDs based on certain metadata. I think the Filecoin Slack (https://filecoin.io/slack) is probably a good place to get some more feedback. It's used for both IPFS and Filecoin discussion.
-
IPFS Controller
IPFS Controller is a system that lets existing large-scale applications leverage the benefits of IPFS for data distribution and replication. Specifically, applications can query a centralized controller to learn about the current state of content in their network, and access the data relevant to their use case.
Motivation
Suppose you operate a thriving video-sharing platform, whose users view and upload short-lived video blobs that then spread virally throughout your user network. To achieve a snappy user experience, you aggregate and cache videos on servers located near the users they serve.
You don't know ahead of time which blobs you will need to download, only some specific parameters you'd like to search by.
For example, a server you operate in the Southwest United States would periodically run a query for content matching its parameters.
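As a purely hypothetical illustration (every field and argument name below is invented for this sketch), such a query might look like:

```graphql
# Hypothetical query; field and argument names are illustrative only.
query SouthwestRefresh {
  contents(
    region: "us-southwest"   # content relevant near this server
    contentType: VIDEO       # only the short-lived video blobs
    minViews: 10000          # a rough popularity floor
    limit: 500               # cap the refresh batch size
  ) {
    cid                      # the IPFS content identifier to download
    title
    views
  }
}
```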
The query returns a listing of content identifiers, along with the metadata and attributes associated with each piece of content. After querying for this data, your server has a list of all the content IDs it needs to download within that refresh cycle.
From this point forward, all the server needs to do is ask its local IPFS node to download those exact IDs.
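As a minimal sketch of that refresh step, assuming the controller exposes a GraphQL endpoint over HTTP (controller.internal is a made-up hostname) and each server runs a standard Kubo (go-ipfs) node:

```sh
# Ask the controller for the CIDs matching this server's parameters,
# then pin each one so the local IPFS node fetches and keeps it.
curl -s https://controller.internal/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { contents(region: \"us-southwest\", limit: 500) { cid } }"}' \
  | jq -r '.data.contents[].cid' \
  | xargs -n1 ipfs pin add
```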
The beauty of going through IPFS for this is that your nodes no longer need to track or maintain the exact locations where content can be found. This provides a separation of concerns between the application logic, which cares about what content it is serving to users, and the data-fetching logic, which is only tasked with obtaining the desired content identifiers.
Architecture
Overview
The big picture of this architecture is summarized as follows:
Application <--> IPFS Controller
Of all the concepts presented in this outline, this is the most important one.
The communication between the application and the IPFS controller must be simple: it should take minimal effort for the application to request data from the controller and to publish new data.
Here's a rough outline of the interaction flow. There are three main actors: the application, the interface (a client for the controller), and the controller itself.
The application asks the interface for some data, say the 50 most downloaded movies in the United States, and the interface then asks the controller for content IDs matching that filter.
Once the interface has the content IDs, it downloads that data from the IPFS network and provides the results to the application. The downloaded content is then kept as a cache on the local IPFS node.
This means that not only will subsequent requests for the same content be served quickly from the cache, but other nodes downloading the same content will be able to fetch it from those that already have it.
Because nodes can download content from each other in addition to the central content store,
this has the added benefit of reducing load on the central store.
Content Orchestration and Events
Another aspect of this would be the ability for the IPFS controller, or another overseeing application, to orchestrate the data cached by a given server.
In theory, you can already do this using IPFS without a controller, but the bottleneck ends up being that you don't know details about content popularity within the network.
You may configure metrics exporters and loggers within your application logic, but where would those metrics end up going? A centralized database, of course, which would then dictate content popularity and, you guessed it, issue directives for the servers to download said content.
The solution to this problem ends up being the IPFS controller once again. The ability to orchestrate content should therefore be provided either through the controller itself or through an auxiliary application subscribed to the controller's events, as sketched below.
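For instance, assuming the GraphQL API described later supports subscriptions, a cache server or auxiliary orchestrator might subscribe to directives along these lines (the subscription and field names are hypothetical):

```graphql
# Hypothetical subscription: a cache server (or auxiliary orchestrator)
# listens for directives the controller derives from popularity metrics.
subscription CacheDirectives {
  contentDirectives(region: "us-southwest") {
    action   # e.g. PIN or UNPIN
    cid      # the content the directive applies to
    reason   # e.g. "views exceeded threshold in region"
  }
}
```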
Load on IPFS Controller
The IPFS Controller never stores any actual IPFS data. Instead, it is solely responsible for storing the records associated with IPFS content IDs.
Because these records are small, the controller can stay available while processing an intense volume of requests without breaking a sweat, and this opens up room to integrate an eventing system where clients can subscribe to receive certain events, such as the publication of new records matching some set of parameters.
IPFS Controller
This is essentially the "brains" of the operation. The IPFS Controller has the following responsibilities, detailed in the sections below:
- storing records that map IPFS content IDs to their metadata and attributes
- answering parameterized queries for content IDs
- emitting events, such as the publication of new records, that clients can subscribe to
- orchestrating which content gets cached on which server
- managing roles and permissions for content within the network
API
With respect to the following points, we represent the API using GraphQL, for its ability to easily express these types of queries.
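As a rough sketch of what that schema could contain (all type and field names here are illustrative, not a finalized design):

```graphql
# Hypothetical schema sketch for the controller's query/publish surface.
type Content {
  cid: String!              # the IPFS content identifier
  title: String
  contentType: ContentType
  region: String            # where this content is most relevant
  views: Int                # see the popularity section below
  createdAt: String
  attributes: [Attribute!]  # mutable metadata attached to the record
}

type Attribute {
  key: String!
  value: String!
}

enum ContentType {
  VIDEO
  IMAGE
  OBJECT_3D
}

type Query {
  # The core read path: a filtered listing of content records.
  contents(
    region: String
    contentType: ContentType
    minViews: Int
    limit: Int
  ): [Content!]!
}

type Mutation {
  # Publish a new record once the underlying data has been added to IPFS.
  publishContent(cid: String!, title: String, region: String): Content!
}
```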
The above schema is a sample of what it could be.
The following takes popularity data into account, allowing the controller to return IPFS CIDs based on how many views certain pieces of content are receiving.
IPFS Controller API w/ Popularity & Metrics
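Again hypothetically, the schema sketched above could be extended with popularity fields and a popularity-ordered query:

```graphql
# Hypothetical extension: the controller aggregates view metrics and
# exposes them for popularity-based queries.
extend type Content {
  viewsLastHour: Int!   # a recent-window popularity signal
}

extend type Query {
  # Most-viewed content for a region, e.g. the 50 most downloaded
  # movies in the United States.
  topContents(region: String, limit: Int = 50): [Content!]!
}
```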
Controller storage
Storing this data should ideally be handled by a horizontally scalable store: a NoSQL document database such as MongoDB, a distributed SQL database such as CockroachDB, or a key-value database such as Redis or etcd.
A traditional single-node SQL database should be avoided, due to the difficulty of scaling it in the way this type of system would require.
Assuming MongoDB as the database, the records would be stored in a primary collection titled content.
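A hypothetical document in that collection (all field names and values are invented for illustration) might have the following shape:

```json
{
  "_id": "64f0c2...",
  "cid": "QmYwAPJzv5CZsnAzt8auVZRn1pfejNyUfM8jxyJzCg3QVG",
  "title": "example-clip.mp4",
  "contentType": "video",
  "region": "us-southwest",
  "createdAt": "2023-01-01T00:00:00Z",
  "views": 18234,
  "attributes": {
    "campaign": "spring-launch",
    "ttlHours": 72
  }
}
```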
Just like in object storage, there are additional attributes associated with the data. These attributes would be mutable pieces of data which assist in knowing which data to interact with.
Use as a CDN
The IPFS Controller system in theory allows you to have CDN boxes which download data through a mechanism similar to Amazon S3. The benefit of this system is that, when it comes to storing immutable blobs of data (images, videos, 3D objects, etc.), CDN nodes would be able to spread the load of downloading from a central source across the other CDN servers carrying the same content.
In this instance, the controller would take on the role of talking with nodes and dictating which content gets stored on which server.
Comparison to Existing CDN Solutions
A roughly similar CDN system is AWS CloudFront, which commonly uses S3 as its storage origin; this leads me to think that something similar could be accomplished by leveraging IPFS as a storage layer.
The unknowns here are how exactly this system would provide a real benefit over current solutions, what problems current solutions have, and whether this system could be used to address any of them.
Roles & Permissions
By having the controller act as a central ledger of what content exists within a system, it's also possible to provide role-based access control (RBAC) for content and to manage all the data available within a network, while still leveraging the benefits of handling data over a decentralized network internally.
Data Visibility
The idea for this system is that all IPFS sidecar nodes would ideally be running on the same
private network, disconnected from the public IPFS DHT.
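For reference, Kubo (go-ipfs) already supports this kind of isolation: nodes sharing a pre-shared swarm key form a private network and reject peers that don't have it, and the default public bootstrap list can be cleared. A rough setup might look like:

```sh
# Generate a shared key once and distribute it to every sidecar node.
# (ipfs-swarm-key-gen is a small community helper; any pre-shared key
# in the expected format works.)
ipfs-swarm-key-gen > ~/.ipfs/swarm.key

# Drop the public bootstrap peers so nodes only discover each other.
ipfs bootstrap rm --all
ipfs bootstrap add /ip4/10.0.0.2/tcp/4001/p2p/<peer-id-of-a-private-node>

# Refuse to start unless the private network key is present.
export LIBP2P_FORCE_PNET=1
ipfs daemon
```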
It would also be possible to run this type of system in public; however, you would lose the ability to control access to content once it has been distributed.