This paper proposes an integrated framework for optimal control of opinion dynamics in social networks, addressing three progressively challenging scenarios: (i) model-based stochastic control, where agent interactions follow known probability distributions, enabling analytically optimal policies; (ii) model-free Reinforcement Learning (RL), where the interaction randomness has an unknown distribution but the system dynamics are known; and (iii) data-driven RL for unknown systems, where the time-varying network dynamics (subject to stochasticity constraints) are fully unknown and must be learned purely from observations. By designing an RL control framework grounded in convex quadratic optimization, we bridge model-based control and data-driven learning, offering new insights into social network manipulation and multi-agent coordination. Numerical simulations demonstrate the framework's effectiveness.
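As a minimal illustration of the model-based scenario, the sketch below assumes a controlled DeGroot-style opinion update (the abstract does not specify the paper's exact model, so the dynamics, cost weights, and consensus-at-zero target here are illustrative assumptions): opinions evolve as x_{t+1} = A x_t + B u_t with a row-stochastic influence matrix A, and a quadratic cost yields a standard LQR policy, matching the convex quadratic optimization setting.

```python
import numpy as np

# Illustrative assumption: controlled DeGroot-style opinion dynamics
#   x_{t+1} = A x_t + B u_t,
# where x_t holds agent opinions, A is a row-stochastic influence
# matrix, and u_t is the external control input. A quadratic cost
# x'Qx + u'Ru makes the optimal policy a linear feedback u_t = -K x_t.

np.random.seed(0)
n = 4                                  # number of agents
A = np.random.rand(n, n)
A /= A.sum(axis=1, keepdims=True)      # row-stochastic influence weights
B = np.eye(n)                          # control acts on each agent directly
Q, R = np.eye(n), 0.1 * np.eye(n)      # state / control cost weights

# Solve the discrete-time Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Simulate the closed loop: steer all opinions toward the origin.
x = np.array([1.0, -0.5, 0.8, -1.0])
for _ in range(50):
    x = A @ x - B @ (K @ x)

print(np.max(np.abs(x)))  # residual opinion magnitude after control
```

The uncontrolled row-stochastic dynamics merely average opinions (eigenvalue 1), so consensus at an arbitrary target requires external input; the LQR gain stabilizes the closed loop A - BK and drives opinions to the target. The model-free scenarios in the paper replace the known (A, B) above with policies learned from interaction data.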