Papers

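Paper study notes, mainly covering distributed systems, databases, and storage. I read papers on work-related systems first (such as Kafka, Elasticsearch, Raft, and Dynamo), then the rest. Distributed systems are vast and complex. The gap between classic theory and concrete product implementations, and the trade-offs made along the way, can be explained with models, but the details are hard to reason through and often hard to verify. I end up relying on intuition, because my theoretical grounding is not solid enough and my practical experience is not deep enough. These papers therefore serve as a blueprint for piecing together distributed systems, from which I can form my own understanding.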

Planned Reading

| ID | Name | Author | Date Read | Status |
| --- | --- | --- | --- | --- |
| 1 | Highly Available Transactions: Virtues and Limitations | | | Done |
| 2 | A Critique of the CAP Theorem | | | Done |
| 3 | Consistency Tradeoffs in Modern Distributed Database System Design | | | Done |
| 4 | Consensus in the Cloud: Paxos Systems Demystified | | | Done |
| 5 | Understanding Replication in Databases and Distributed Systems | | | Done |
| 6 | Eventually Consistent: Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability. | Werner Vogels | | Done |
| 7 | Life beyond Distributed Transactions: an Apostate’s Opinion | Pat Helland | | ToDo |
| 8 | ZooKeeper: Wait-free coordination for Internet-scale systems | | | |
| 9 | Zab: High-performance broadcast for primary-backup systems | | | |
| 10 | Raft: In Search of an Understandable Consensus Algorithm (Extended Version) | | | |
| 11 | Distributed systems for fun and profit | | | |
| 12 | Kafka: a Distributed Messaging System for Log Processing | | | |
| 13 | Designing Access Methods: The RUM Conjecture | | | |
| 14 | Consistency in Non-Transactional Distributed Storage Systems | | | |
| 15 | Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications | | | |
| 16 | Facebook: Cassandra - A Decentralized Structured Storage System | | | |
| 17 | Google: The Chubby lock service for loosely-coupled distributed systems | | | |
| 18 | Google: Bigtable: A Distributed Storage System for Structured Data | | | |
| 19 | Amazon: Dynamo: Amazon's Highly Available Key-value Store | | | |
| 20 | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | | | |
| 21 | Probabilistically Bounded Staleness for Practical Partial Quorums | | | |
| 22 | A Survey on the Evolution of Stream Processing Systems | | | |
| 23 | The Case for Shared Nothing | | | |

Paper Collection

| ID | Name | Author | Date Read | Status |
| --- | --- | --- | --- | --- |
| 1 | The Anatomy of a Large-Scale Hypertextual Web Search Engine | | | |
| 2 | Web Search for a Planet: The Google Cluster Architecture | | | |
| 3 | The Google File System | | | |
| 4 | MapReduce: Simplified Data Processing on Large Clusters | | | |
| 5 | Bigtable: A Distributed Storage System for Structured Data | | | |
| 6 | The Chubby lock service for loosely-coupled distributed systems | | | |
| 7 | Interpreting the Data: Parallel Analysis with Sawzall | | | |
| 8 | Pregel: a system for large-scale graph processing | | | |
| 9 | Dremel: Interactive Analysis of Web-Scale Datasets | | | |
| 10 | Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications | | | |
| 11 | MegaStore: Providing Scalable, Highly Available Storage for Interactive Services | | | |
| 12 | Case Study GFS: Evolution on Fast-forward | | | |
| 13 | Google File System II: Dawn of the Multiplying Master Nodes | | | |
| 14 | Tenzing – A SQL Implementation on the MapReduce Framework | | | |
| 15 | F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business | | | |
| 16 | Elmo: Building a Globally Distributed, Highly Available Database | | | |
| 17 | PowerDrill: Processing a Trillion Cells per Mouse Click | | | |
| 18 | Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers | | | |
| 19 | Spanner: Google’s Globally-Distributed Database | | | |
| 20 | Dapper, a Large-Scale Distributed Systems Tracing Infrastructure | | | |
| 21 | Omega: flexible, scalable schedulers for large compute clusters | | | |
| 22 | CPI²: CPU performance isolation for shared compute clusters | | | |
| 23 | Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams | | | |
| 24 | F1: A Distributed SQL Database That Scales | | | |
| 25 | MillWheel: Fault-Tolerant Stream Processing at Internet Scale | | | |
| 26 | B4: Experience with a Globally-Deployed Software Defined WAN | | | |
| 27 | The Datacenter as a Computer | | | |
| 28 | Google Brain - Building High-level Features Using Large Scale Unsupervised Learning | | | |
| 29 | Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | | | |
| 30 | Shasta: Interactive Reporting at Scale | | | |
| 31 | Goods: Organizing Google’s Datasets | | | |
| 32 | FlumeJava: Easy, Efficient Data-Parallel Pipelines | | | |
| 33 | Large-scale cluster management at Google with Borg | | | |
| 34 | Borg: The Predecessor to Kubernetes | | | |
| 35 | Spanner: Becoming a SQL System | | | |
| 36 | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing | | | |
| 37 | Appraising Two Decades of Distributed Computing Theory Research | | | |
| 38 | A brief history of Consensus, 2PC and Transaction Commit | | | |
| 39 | The Byzantine Generals Problem | | | |
| 40 | Impossibility of distributed consensus with one faulty process | | | |
| 41 | Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency | | | |
| 42 | Time, Clocks, and the Ordering of Events in a Distributed System | | | |
| 43 | The Part-Time Parliament | | | |
| 44 | How to Build a Highly Available System Using Consensus | | | |
| 45 | Paxos Made Simple | | | |
| 46 | Paxos Made Live | | | |
| 47 | Consensus on Transaction Commit | | | |
| 48 | Why Do Computers Stop and What Can Be Done About It | | | |
| 49 | On Designing and Deploying Internet-Scale Services | | | |
| 50 | Single-Message Communication | | | |
| 51 | How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs | | | |
| 52 | Distributed Snapshots: Determining Global States of a Distributed System | | | |
| 53 | Self-stabilizing systems in spite of distributed control | | | |
| 54 | Wait-Free Synchronization | | | |
| 55 | Solution of a Problem in Concurrent Programming Control | | | |
| 56 | A New Solution of Dijkstra’s Concurrent Programming Problem | | | |
| 57 | Life beyond Distributed Transactions: an Apostate’s Opinion | | | |
| 58 | Hints for Computer System Design | | | |
| 59 | Virtual Time and Global States of Distributed Systems | | | |
| 60 | Timestamps in Message-Passing Systems That Preserve the Partial Ordering | | | |
| 61 | Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems | | | |
| 62 | Knowledge and Common Knowledge in a Distributed Environment | | | |
| 63 | Understanding Failures in Petascale Computers | | | |
| 64 | Why Do Internet services fail, and What Can Be Done About It? | | | |
| 65 | End-To-End Arguments in System Design | | | |
| 66 | Rethinking the Design of the Internet: The End-to-End Arguments vs. the Brave New World | | | |
| 67 | The Design Philosophy of the DARPA Internet Protocols | | | |
| 68 | Uniform consensus is harder than consensus | | | |
| 69 | Paxos made code – Implementing a high throughput Atomic Broadcast | | | |
| 70 | Raft: In Search of an Understandable Consensus Algorithm | | | |
| 71 | Problems, Unsolved Problems and Problems in Concurrency | | | |
| 72 | Implementing fault-tolerant services using the state machine approach | | | |
| 73 | White Paper Introduction to IEEE 1588 & Transparent Clocks | | | |
| 74 | Unreliable Failure Detectors for Reliable Distributed Systems | | | |
| 75 | A Relational Model of Data for Large Shared Data Banks | | | |
| 76 | SEQUEL: A Structured English Query Language | | | |
| 77 | Implementation of a Structured English Query Language | | | |
| 78 | System R: Relational Approach to Database Management | | | |
| 79 | Granularity of Locks and Degrees of Consistency in a Shared Data Base | | | |
| 80 | Access Path Selection in a RDBMS | | | |
| 81 | Notes on Data Base Operating Systems | | | |
| 82 | The Transaction Concept: Virtues and Limitations | | | |
| 83 | Nonblocking Commit Protocols | | | |
| 84 | MVCC: Multiversion Concurrency Control - Theory and Algorithms | | | |
| 85 | ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging | | | |
| 86 | A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem | | | |
| 87 | A Formal Model of Crash Recovery in a Distributed System | | | |
| 88 | What Goes Around Comes Around | | | |
| 89 | Anatomy of a Database System | | | |
| 90 | Architecture of a Database System | | | |
| 91 | Towards Robust Distributed Systems | | | |
| 92 | Harvest, Yield, and Scalable Tolerant Systems | | | |
| 93 | BASE: An ACID Alternative | | | |
| 94 | MapReduce: A major step backwards | | | |
| 95 | The Log: What every software engineer should know about real-time data’s unifying abstraction | | | |
| 96 | Dynamo: Amazon’s Highly Available Key-value Store | | | |
| 97 | Cassandra – A Decentralized Structured Storage System | | | |
| 98 | PNUTS: Yahoo!’s Hosted Data Serving Platform | | | |
| 99 | Designs, Lessons and Advice from Building Large Distributed Systems | | | |
| 100 | The Tail at Scale | | | |
| 101 | How To Design A Good API and Why it Matters | | | |
| 102 | The Ganglia distributed monitoring system: design, implementation, and experience | | | |
| 103 | Chukwa: A large-scale monitoring system | | | |
| 104 | Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services | | | |
| 105 | Practical Byzantine Fault Tolerance | | | |
| 106 | PacificA: Replication in Log-Based Distributed Storage Systems (2008) | | | |
| 107 | Dominant Resource Fairness: Fair Allocation of Multiple Resource Types | | | |
| 108 | Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing | | | |
| 109 | Consistent Hashing with Bounded Loads (Google, 2017) | | | |
| 110 | The φ Accrual Failure Detector | | | |
| 111 | CAP Twelve Years Later: How the "Rules" Have Changed | | | |
| 112 | A simple totally ordered broadcast protocol (2008) | | | |
| 113 | Virtual Time and Global States of Distributed Systems (2002) | | | |
| 114 | Paxos Made Practical | | | |
| 115 | Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore (2011) | | | |
| 116 | Consensus in the Presence of Partial Synchrony (1988) | | | |
| 117 | Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications (2003) | | | |
| 118 | Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems (2001) | | | |
| 119 | Kademlia: A Peer-to-Peer Information System Based on the XOR Metric (2002) | | | |
| 120 | A Scalable Content-Addressable Network (2001) | | | |
| 121 | Ceph: A Scalable, High-Performance Distributed File System (OSDI 2006) | Sage Weil | | |
| 122 | The Log-Structured Merge-Tree (LSM-Tree) (1996) | | | |
| 123 | HBase: A NoSQL Database (2017) | Hiren Patel | | |
| 124 | Tango: Distributed Data Structures over a Shared Log (2013) | | | |
| 125 | Finding a needle in Haystack: Facebook's photo storage (2010) | | | |
| 126 | Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency (2011) | | | |
| 127 | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (2012) | | | |
| 128 | Scaling Distributed Machine Learning with the Parameter Server (2014) | | | |
| 129 | S4: Distributed Stream Computing Platform (2010) | | | |
| 130 | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases (2017) | | | |
| 131 | Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes (2018) | | | |
| 132 | Chain Replication for Supporting High Throughput and Availability | | | |
| 133 | No compromises: distributed transactions with consistency, availability, and performance | | | |
| 134 | Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS | | | |
| 135 | Secure Untrusted Data Repository (SUNDR) | | | |
| 136 | The Case for Shared Nothing | Michael Stonebraker | | |
| 137 | The Red Book | | | |
| 138 | Sagas | Hector Garcia-Molina | | |
| 139 | ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging | | | |
| 140 | Big Data: A Survey | | | |

Paper Sources

  1. Translated collection of classic distributed systems papers: http://duanple.com/?p=170
  2. DDIA: https://github.com/ept/ddia-references
  3. MIT: http://dsrg.pdos.csail.mit.edu/papers/
  4. MIT6.824: https://pdos.csail.mit.edu/6.824/schedule.html
  5. Principles of Distributed Database Systems: https://cs.uwaterloo.ca/~ddbook/downloads/appendix/References.pdf
  6. Kafka papers: https://kafka.apache.org/books-and-papers
  7. CMU15721: https://15721.courses.cs.cmu.edu/spring2020/schedule.html
  8. Distributed Systems Reading List: https://dancres.github.io/Pages/
  9. Lamport: https://lamport.azurewebsites.net/pubs/pubs.html
  10. Consensus: https://github.com/dgryski/awesome-consensus
  11. Awesome Distributed Systems: https://github.com/theanalyst/awesome-distributed-systems
  12. Readings in distributed systems: http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
  13. Foundational distributed systems papers: http://muratbuffalo.blogspot.com/2021/02/foundational-distributed-systems-papers.html
  14. Distributed Systems Readings: https://henryr.github.io/distributed-systems-readings/
  15. Developer And Architect: https://github.com/xiaozhiliaoo/my-collect
  16. 100 open source Big Data architecture papers for data professionals: https://github.com/xiaozhiliaoo/bigdata/blob/main/paper/100paper-paypal-anil-madan.md
  17. Papers collected from courses at top universities worldwide: https://xiaozhiliaoo.github.io/distributed-system-practice/course/
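  18. Various distributed systems textbooks, such as "Distributed Systems: Principles and Paradigms" and "Distributed Systems: Concepts and Design"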
  19. https://github.com/rxin/db-readings
  20. The CMU_15445 and CMU_15721 courses.