Ownership of data (DRAFT)
This is a work in progress; the name and content may change. The language is not formalised yet.
Each bounded context (BC) manages its own data. Two BCs should not share a data store. Instead, each BC is responsible for its own private data store, which other services cannot access directly.
- Each BC owns its data but may not necessarily be the 'master' of the data
- Be careful with the word 'master'. Trying to come up with a single source of truth always fails because data is mastered in many places, so don't get hung up on it. There is always a logical flow of data and a source of data, but data gets added and removed along the way
- Each service should expose its data via an API.
- Each service is responsible for describing the data it wants to expose, eg metadata such as how old the data is. See Evolutionary systems. Bulk data could be Avro, Parquet or CSV
- The product team should be a point of contact for describing their own data, eg via a Slack channel
- Consumers have a choice about going to the source service or a service in between, eg stock can come from the stock service or from a search service that consumes the stock service. However, the easiest option may not have the freshest data
- Data should be stored and schemas designed by each service. Data can be transformed and services should represent their copy of data in whatever way is best for them
- Think about changes that may be breaking, eg use API versioning. Consumers should be tolerant readers
- Google has a Data Loss Prevention (DLP) API that can automatically mask sensitive data - could look into this
- Consider which data is sensitive as part of data modelling / design
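The 'tolerant reader' and 'how old the data is' points above can be illustrated with a minimal sketch. It assumes a hypothetical stock payload; the names StockLevel, read_stock, age_seconds and the fields sku, quantity and as_of are made up for illustration and are not an agreed contract. The consumer reads only the fields it needs, ignores anything else the producer adds, and treats the freshness metadata as optional.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class StockLevel:
    """The consumer's own view of stock data (hypothetical shape)."""
    sku: str
    quantity: int
    as_of: Optional[datetime]  # when the producer says the data was current


def read_stock(payload: Mapping[str, Any]) -> StockLevel:
    """Tolerant reader: pick out only the fields we need, ignore the rest."""
    as_of_raw = payload.get("as_of")  # optional metadata, may be absent
    return StockLevel(
        sku=str(payload["sku"]),            # required: fail loudly if missing
        quantity=int(payload["quantity"]),  # accept 5 or "5"
        as_of=datetime.fromisoformat(as_of_raw) if as_of_raw else None,
    )


def age_seconds(level: StockLevel, now: datetime) -> Optional[float]:
    """How old the data is, or None if the producer did not say."""
    if level.as_of is None:
        return None
    return (now - level.as_of).total_seconds()


# An older payload and a newer one with extra fields both parse.
old_style = read_stock({"sku": "A1", "quantity": 5})
new_style = read_stock({
    "sku": "A1",
    "quantity": "7",
    "as_of": "2024-01-01T12:00:00+00:00",
    "warehouse": "X",        # unknown to this consumer, silently ignored
    "schema_version": 2,     # unknown to this consumer, silently ignored
})
```

Because unknown fields are ignored, the producing service can add fields without breaking this consumer. Removing or renaming sku or quantity would still be a breaking change and would need a new API version.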
Legacy data TBC
- Access to data for people is going to be 'solved' by the Partnership Data Platform
- Data lakes are a good use case for this